Introduction

Graphsignal is a production-scale inference profiling platform that helps engineers optimize AI performance across models, engines, GPUs, and other accelerators. It provides essential visibility across the inference stack, including:

  • Continuous, high-resolution profiling timelines exposing operation durations and resource utilization across inference workloads.
  • LLM generation tracing with per-step timing, token throughput, and latency breakdowns for major inference frameworks.
  • System-level metrics for inference engines and hardware (CPU, GPU, accelerators).
  • Error monitoring for device-level failures and inference errors.
  • Inference telemetry for AI agents to identify bottlenecks and drive targeted improvements across the inference stack.
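
To make the generation-tracing metrics above concrete, here is a minimal sketch of how time to first token, inter-token latency, and token throughput can be derived from raw token timestamps. It only illustrates the kind of quantities involved and is not Graphsignal's implementation; the names GenerationTiming and summarize_generation are hypothetical and not part of any Graphsignal API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GenerationTiming:
    # Hypothetical container, for illustration only.
    time_to_first_token: float       # seconds from request start to first token
    mean_inter_token_latency: float  # average gap between consecutive tokens, seconds
    tokens_per_second: float         # output tokens divided by total generation time


def summarize_generation(request_start: float, token_times: List[float]) -> GenerationTiming:
    """Summarize one generation from the request start time and the
    timestamps (in seconds) at which each output token was emitted."""
    if not token_times:
        raise ValueError("token_times must contain at least one timestamp")

    # Latency until the first token arrives (dominated by prefill).
    ttft = token_times[0] - request_start

    # Gaps between consecutive tokens (decode steps).
    gaps = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    mean_gap = sum(gaps) / len(gaps) if gaps else 0.0

    # End-to-end throughput over the whole request.
    total = token_times[-1] - request_start
    tps = len(token_times) / total if total > 0 else 0.0

    return GenerationTiming(ttft, mean_gap, tps)


# Example: four tokens emitted at these offsets from a request started at t=0.
timing = summarize_generation(0.0, [0.5, 0.75, 1.0, 1.25])
print(timing)
```

Separating time to first token from inter-token latency matters because the two usually point to different bottlenecks: the first reflects prompt processing, the second reflects the per-step decode cost.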

The name Graphsignal blends "graph" (the structure underlying inference) with "signal" (the telemetry and profiling data emitted during execution).

The Graphsignal Profiler runs as a sidecar process alongside your inference workload. Start it with the graphsignal-run CLI or by calling graphsignal.watch() from Python.

The profiler sends the recorded performance data to Graphsignal servers, where it is post-processed and made available for analysis at app.graphsignal.com.

  • Sign up for an account.
  • See the Quick Start guide to learn how to add Graphsignal to your application.