Distributed Inference Monitoring
Graphsignal has built-in support for distributed inference, e.g. multi-node and multi-GPU inference. When runs involve multiple workers, the dashboards seamlessly aggregate, structure, and visualize data from all workers.
The ranks of workers are recorded automatically for some frameworks. In other cases, to identify each worker, you can provide a rank tag to the configure method. Tags can also be used to identify and compare runs and jobs.
graphsignal.configure(
    api_key='my-api-key',
    deployment='my-model-prod',
    tags=dict(rank=0))
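In a real multi-worker job the rank is usually not hard-coded. A minimal sketch of deriving it from the environment, assuming your launcher exports a RANK variable (for example, torchrun does); the variable name and the fallback to 0 are assumptions to adjust for your setup:

```python
import os

def worker_rank() -> int:
    # Read this worker's rank from the environment. Assumption: the
    # launcher (e.g. torchrun) exports RANK; single-process runs fall
    # back to rank 0.
    return int(os.environ.get('RANK', '0'))

def init_monitoring() -> None:
    # Tag the Graphsignal session with this worker's rank so dashboards
    # can attribute data to individual workers.
    import graphsignal
    graphsignal.configure(
        api_key='my-api-key',        # replace with your API key
        deployment='my-model-prod',
        tags=dict(rank=worker_rank()))
```

Each worker process calls init_monitoring() once at startup, before running inference.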
Tags can also be provided via an environment variable.
for sample in dataset:
    with graphsignal.start_trace(endpoint='predict'):
        # inference code
The DeepSpeed GPT Neo example illustrates a distributed inference use case.