Hugging Face Inference Monitoring

See the Quick Start guide for how to install and configure Graphsignal.
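
Graphsignal is typically imported and configured once at application startup; a minimal sketch (the api_key and deployment values below are placeholders, and parameter details are covered in the Quick Start guide):

import graphsignal

graphsignal.configure(api_key='my-api-key', deployment='my-model-prod')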

Pipeline:

import graphsignal
from transformers import pipeline

pipe = pipeline(task="text-generation")

with graphsignal.start_trace(endpoint='predict'):
    output = pipe('some text')
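
Each pipeline call wrapped in start_trace is recorded as a separate trace, so multiple inputs can be traced in a loop; a minimal sketch using the pipe object from above:

for text in ['first text', 'second text']:
    # One trace is recorded per prediction
    with graphsignal.start_trace(endpoint='predict'):
        output = pipe(text)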

Prediction with Trainer or a Trainer subclass:

import graphsignal
from transformers import Trainer
from transformers import TrainingArguments

training_args = TrainingArguments("test_trainer")

class MyTrainer(Trainer):
    def prediction_step(self, *args, **kwargs):
        # Trace each prediction step as a separate inference
        with graphsignal.start_trace(endpoint='predict'):
            return super().prediction_step(*args, **kwargs)
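
The subclass is then used like a regular Trainer; a minimal usage sketch, assuming model and eval_dataset are defined elsewhere:

trainer = MyTrainer(model=model, args=training_args, eval_dataset=eval_dataset)

# Each prediction step performed by predict() is traced
predictions = trainer.predict(eval_dataset)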

Optimized ONNX models exported by Optimum:

import graphsignal
import onnxruntime
from transformers import AutoTokenizer, PretrainedConfig, pipeline
from optimum.onnxruntime import ORTModelForSequenceClassification

session = onnxruntime.InferenceSession('model.onnx')

# Load the ONNX model exported by Optimum, wrapping the ONNX Runtime session
model = ORTModelForSequenceClassification(
    session,
    config=PretrainedConfig.from_json_file('path-to-config.json'))

tokenizer = AutoTokenizer.from_pretrained('path-to-tokenizer')

pipe = pipeline('text-classification', model=model, tokenizer=tokenizer)

with graphsignal.start_trace(endpoint='predict'):
    pipe('some text')

Examples

The Hugging Face BERT example illustrates how to add the start_trace call.
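
For illustration, a minimal sketch of tracing a direct BERT inference; the model name is an example and may differ from the one used in the example project:

import graphsignal
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModelForSequenceClassification.from_pretrained('bert-base-uncased')

inputs = tokenizer('some text', return_tensors='pt')

with graphsignal.start_trace(endpoint='predict'):
    with torch.no_grad():
        outputs = model(**inputs)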

The ONNX Runtime BERT example shows how to perform benchmarks for various model optimizations.
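
A benchmark typically traces many repeated inferences so that latency statistics can be compared across optimization variants; a minimal sketch, assuming pipe is a pipeline like the ones above:

# Latency distributions for each variant can be compared per endpoint
for _ in range(100):
    with graphsignal.start_trace(endpoint='predict-onnx'):
        pipe('some text')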

Model serving

Graphsignal provides built-in support for server applications. See the Model Serving guide for more information.
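
As an illustration, a minimal sketch of tracing inferences inside a FastAPI handler; FastAPI is just one example of a server framework, and the Model Serving guide describes the recommended setup:

import graphsignal
from fastapi import FastAPI
from transformers import pipeline

app = FastAPI()
pipe = pipeline(task="text-generation")

@app.post('/predict')
def predict(text: str):
    # Trace each request's inference
    with graphsignal.start_trace(endpoint='predict'):
        return pipe(text)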