Hugging Face Inference Monitoring
See the Quick Start guide for instructions on installing and configuring Graphsignal.
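Configuration normally amounts to a single call at application startup. A minimal sketch, where the api_key and deployment values are placeholders:

import graphsignal

# Configure once at startup; both values below are placeholders.
graphsignal.configure(api_key='my-api-key', deployment='my-model-prod')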
Pipeline:
from transformers import pipeline
import graphsignal

pipe = pipeline(task="text-generation")

# Trace the pipeline call to measure and monitor the inference
with graphsignal.start_trace(endpoint='predict'):
    output = pipe('some text')
Prediction with Trainer or a Trainer subclass:
from transformers import Trainer, TrainingArguments
import graphsignal

training_args = TrainingArguments("test_trainer")

class MyTrainer(Trainer):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def prediction_step(self, *args, **kwargs):
        # Trace each prediction step to measure and monitor inferences
        with graphsignal.start_trace(endpoint='predict'):
            return super().prediction_step(*args, **kwargs)
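The subclass is then used like a regular Trainer. A minimal usage sketch, assuming model and eval_dataset are defined elsewhere:

trainer = MyTrainer(model=model, args=training_args, eval_dataset=eval_dataset)

# Each prediction step inside predict() is traced by the override above.
predictions = trainer.predict(eval_dataset)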
Optimized ONNX models exported by Optimum:
from transformers import AutoTokenizer, PretrainedConfig, pipeline
from optimum.onnxruntime import ORTModelForSequenceClassification
import onnxruntime
import graphsignal

session = onnxruntime.InferenceSession('model.onnx')

# Load the ONNX model exported by Optimum
model = ORTModelForSequenceClassification(
    model=session,
    config=PretrainedConfig.from_json_file('path-to-config.json'))

tokenizer = AutoTokenizer.from_pretrained('path-to-tokenizer')

pipe = pipeline('text-classification', model=model, tokenizer=tokenizer)

# Trace the pipeline call to measure and monitor the inference
with graphsignal.start_trace(endpoint='predict'):
    pipe('some text')
Examples
The Hugging Face BERT example illustrates how to add the start_trace call.
The ONNX Runtime BERT example shows how to benchmark various model optimizations.
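One way to structure such a benchmark is to trace each model variant under its own endpoint, so that Graphsignal reports latency for each variant separately. A minimal sketch, assuming hypothetical pipe_baseline and pipe_optimized pipelines built as shown above:

# Hypothetical pipeline variants; build each as shown in the sections above.
variants = {
    'predict-baseline': pipe_baseline,
    'predict-onnx': pipe_optimized,
}

for endpoint, pipe in variants.items():
    for _ in range(100):
        # Each variant gets its own endpoint, so latency percentiles
        # can be compared side by side in the dashboard.
        with graphsignal.start_trace(endpoint=endpoint):
            pipe('some text')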
Model serving
Graphsignal provides built-in support for server applications. See the Model Serving guide for more information.
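For instance, traces can be started inside a request handler. A minimal sketch, assuming a Flask application and a hypothetical /predict route:

import graphsignal
from flask import Flask, request, jsonify
from transformers import pipeline

# Placeholder values; see the Quick Start guide.
graphsignal.configure(api_key='my-api-key', deployment='text-gen-prod')

app = Flask(__name__)
pipe = pipeline(task='text-generation')

@app.route('/predict', methods=['POST'])
def predict():
    # Trace each request to record latency and errors per endpoint
    with graphsignal.start_trace(endpoint='predict'):
        output = pipe(request.json['text'])
    return jsonify(output)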