Hugging Face Inference Monitoring and Profiling

See the Quick Start guide for instructions on installing and configuring Graphsignal.
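
For reference, a minimal setup might look like the following (the API key is a placeholder, and argument names may vary by version; the Quick Start guide is authoritative):

import graphsignal

# placeholder key; see the Quick Start guide for all configuration options
graphsignal.configure(api_key='my-api-key')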

For pipelines:

import graphsignal
from transformers import pipeline

pipe = pipeline(task="text-generation")

with graphsignal.start_trace(endpoint='predict', profiler='pytorch'):
    output = pipe('some text')
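
Each start_trace context records a single inference; to trace multiple inputs, wrap each call individually (the loop and texts below are only illustrative):

for text in ['first text', 'second text']:
    # each call is recorded as a separate trace for the 'predict' endpoint
    with graphsignal.start_trace(endpoint='predict', profiler='pytorch'):
        pipe(text)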

For Trainer or a Trainer subclass:

import graphsignal
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments("test_trainer")

class MyTrainer(Trainer):
    def prediction_step(self, *args, **kwargs):
        # trace and profile each prediction step
        with graphsignal.start_trace(endpoint='predict', profiler='pytorch'):
            return super().prediction_step(*args, **kwargs)

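The subclass is then used like a regular Trainer. For example, during evaluation every prediction step is traced (model and eval_dataset are placeholders for your own objects):

trainer = MyTrainer(
    model=model,
    args=training_args,
    eval_dataset=eval_dataset)
trainer.evaluate()
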
For optimized ONNX models exported by Optimum:

import graphsignal
from graphsignal.profilers.onnxruntime_profiler import ONNXRuntimeProfiler
from transformers import AutoTokenizer, PretrainedConfig, pipeline
from optimum.onnxruntime import ORTModelForSequenceClassification
import onnxruntime

ort_profiler = ONNXRuntimeProfiler()

# enable profiling in the session options before creating the session
sess_options = onnxruntime.SessionOptions()
ort_profiler.initialize_options(sess_options)

# Load ONNX model exported by optimum
session = onnxruntime.InferenceSession('model.onnx', sess_options, providers=[...])
model = ORTModelForSequenceClassification(
    model=session,
    config=PretrainedConfig.from_json_file('path-to-config.json'))
tokenizer = AutoTokenizer.from_pretrained('path-to-tokenizer')

pipe = pipeline('text-classification', model=model, tokenizer=tokenizer)

# pass the session to the profiler so it can collect profiling data
ort_profiler.set_onnx_session(session)

with graphsignal.start_trace(endpoint='predict', profiler=ort_profiler):
    pipe('some text')
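
The snippet above assumes model.onnx and its config were exported beforehand. One way to do this, depending on your Optimum version, is via from_pretrained (from_transformers is an older Optimum argument name; newer releases use export=True):

from optimum.onnxruntime import ORTModelForSequenceClassification

model = ORTModelForSequenceClassification.from_pretrained(
    'distilbert-base-uncased-finetuned-sst-2-english', from_transformers=True)
# writes model.onnx and config.json to the given directory
model.save_pretrained('onnx-model-dir')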

Examples

The Hugging Face BERT example illustrates how to add tracing with the start_trace function.

The ONNX Runtime BERT example shows how to benchmark various model optimizations.

Model serving

Graphsignal provides built-in support for server applications. See the Model Serving guide for more information.
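
As a sketch, per-request tracing in a server application can follow the same pattern as above (the Flask handler below is hypothetical and assumes a pipe object created as in the pipeline example):

import graphsignal
from flask import Flask, request

app = Flask(__name__)

@app.route('/predict', methods=['POST'])
def predict():
    # each request is recorded as one trace for the 'predict' endpoint
    with graphsignal.start_trace(endpoint='predict', profiler='pytorch'):
        return dict(output=pipe(request.json['text']))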