Introduction

What is Graphsignal

Graphsignal is a production-scale inference profiling platform that helps engineers optimize AI performance across models, engines, GPUs, and other accelerators. It provides essential visibility across the inference stack, including:

Continuous, high-resolution profiling timelines exposing operation durations and resource utilization across inference workloads.
LLM generation tracing with per-step timing, token throughput, and latency breakdowns for major inference frameworks.
System-level metrics for inference engines and hardware (CPU, GPU, accelerators).
Error monitoring for device-level failures and inference errors.
Automatic engine flag optimization, AI chat for bottleneck investigation, and profiling context for AI coding agents.

The name Graphsignal blends graph (the structure underlying inference) with signal (the telemetry and profiling data emitted during execution).

How it works

The Graphsignal Profiler runs as a sidecar process alongside your inference workload. It is started with the graphsignal-run CLI or with graphsignal.watch() from Python.

The profiler sends recorded performance data to Graphsignal servers, where it is post-processed and made available for analysis at app.graphsignal.com.

Getting started

Sign up for an account.
See the Quick Start guide to learn how to add Graphsignal to your application.