Concepts

Trace

A trace is a detailed record of operations executed during a single inference run. It captures the sequence and timing of key steps, such as preprocessing, model execution, and postprocessing, along with relevant inputs, outputs, and metadata.

A trace may also include one or more profiles, providing deeper performance statistics for specific parts of the run.
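
To make the idea concrete, the following is a minimal sketch of what a trace record could look like. The `Trace` and `Step` classes and the `timed` helper are hypothetical names used only for illustration; they are not Graphsignal's data model or API.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Step:
    """One timed operation inside a trace (hypothetical structure)."""
    name: str
    duration_s: float


@dataclass
class Trace:
    """Record of a single inference run (illustrative, not Graphsignal's schema)."""
    operation: str
    steps: List[Step] = field(default_factory=list)
    metadata: Dict[str, Any] = field(default_factory=dict)

    def timed(self, name, func, *args):
        """Run func, record how long it took, and return its result."""
        start = time.perf_counter()
        result = func(*args)
        self.steps.append(Step(name, time.perf_counter() - start))
        return result


# Stand-in steps for preprocessing, model execution, and postprocessing.
trace = Trace(operation="predict", metadata={"model": "demo"})
tokens = trace.timed("preprocessing", str.split, "hello trace world")
lengths = trace.timed("model execution", lambda ts: [len(t) for t in ts], tokens)
total = trace.timed("postprocessing", sum, lengths)

for step in trace.steps:
    print(f"{step.name}: {step.duration_s:.6f}s")
```

The steps are stored in execution order, so the record preserves both the sequence and the timing of the run.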

Profile

A profile is a collection of detailed performance statistics captured during a trace or over a period of time. It provides insight into resource usage, execution time distribution, and performance bottlenecks across operations.
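
As a rough illustration of the kind of statistics a profile holds, the sketch below uses Python's standard `cProfile` and `pstats` modules. This is the standard-library profiler, not Graphsignal's own, and `workload` is a made-up function standing in for real application code.

```python
import cProfile
import io
import pstats


def workload():
    """Stand-in for the code being profiled."""
    return sum(i * i for i in range(50_000))


# Collect statistics only while the workload runs.
profiler = cProfile.Profile()
profiler.enable()
result = workload()
profiler.disable()

# Render the five entries with the highest cumulative time into a string.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
stats.print_stats(5)
report = buffer.getvalue()
print(report)
```

The report lists call counts and time per function, which is the execution time distribution a profile is meant to expose.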

Metric

Metrics provide quantitative measurements of execution, performance, and system behavior over time. Graphsignal records various categories of metrics, including:

  • Performance metrics – function duration, counts, number of errors.
  • System metrics – CPU, memory, and network usage.
  • Device metrics – GPU utilization, memory usage, temperature.
  • Framework metrics – framework-level statistics such as batch processing times.
  • Inference metrics – model-specific statistics such as latency, token usage, and throughput.
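
As an example of the first category, the sketch below records duration, call count, and error count for a function. The `metrics` store and the `record` and `measured_call` helpers are hypothetical and exist only for this illustration; Graphsignal collects such metrics itself.

```python
import time
from collections import defaultdict

# Hypothetical in-memory metric store: metric name -> list of recorded values.
metrics = defaultdict(list)


def record(name, value):
    """Append one measurement to the named metric."""
    metrics[name].append(value)


def measured_call(func, *args):
    """Call func and record its duration, call count, and error count."""
    start = time.perf_counter()
    record("call_count", 1)
    try:
        return func(*args)
    except Exception:
        record("error_count", 1)
        raise
    finally:
        record("duration_s", time.perf_counter() - start)


measured_call(int, "42")
try:
    measured_call(int, "not a number")  # raises ValueError, counted as an error
except ValueError:
    pass

summary = {
    "calls": sum(metrics["call_count"]),
    "errors": sum(metrics["error_count"]),
    "mean_duration_s": sum(metrics["duration_s"]) / len(metrics["duration_s"]),
}
print(summary)
```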

Error

An error represents any exception, failure, or unexpected event recorded by the tracer during execution. Errors include runtime exceptions, failed requests, and framework-level issues.

Tracer

The Graphsignal tracer is a module integrated into applications and scripts.

It automatically traces and profiles execution for natively supported libraries, and also provides APIs for manual tracing of custom operations.

By integrating the tracer, developers can observe performance, diagnose issues, and optimize resource usage without extensive instrumentation.
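
Manual tracing of a custom operation can be pictured with the sketch below. It is a simplified stand-in built on Python's `contextlib`, not the Graphsignal API: `trace_operation` and the `recorded` list are hypothetical names, and a real tracer would upload its records rather than keep them in memory.

```python
import time
from contextlib import contextmanager

# Hypothetical sink for finished trace records.
recorded = []


@contextmanager
def trace_operation(name):
    """Time the enclosed block and record any exception it raises."""
    entry = {"operation": name, "error": None}
    start = time.perf_counter()
    try:
        yield entry
    except Exception as exc:
        # Keep the error on the record, then let it propagate to the caller.
        entry["error"] = f"{type(exc).__name__}: {exc}"
        raise
    finally:
        entry["duration_s"] = time.perf_counter() - start
        recorded.append(entry)


# A successful custom operation.
with trace_operation("generate") as entry:
    entry["output"] = "hello".upper()

# A failing operation: the exception is recorded and still re-raised.
try:
    with trace_operation("failing-step"):
        raise RuntimeError("model unavailable")
except RuntimeError:
    pass

for item in recorded:
    print(item["operation"], item["error"])
```

Wrapping the operation in a context manager keeps the instrumentation to a single line at the call site and ensures the duration is recorded even when the operation fails.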