
Profiler CLI

Graphsignal observes your workload from a sidecar process called the profiler, which runs alongside the workload rather than inside it. graphsignal-run launches a workload with the profiler attached.

Install the CLI as an isolated uv tool so it doesn’t pollute the workload environment:

UV_TOOL_BIN_DIR=/usr/local/bin uv tool install 'graphsignal[cu12]' # CUDA 12.x
# or
UV_TOOL_BIN_DIR=/usr/local/bin uv tool install 'graphsignal[cu13]' # CUDA 13.x

Setting UV_TOOL_BIN_DIR=/usr/local/bin places graphsignal-run in a directory that is on the default PATH of most Linux systems, so it resolves in every shell, including non-interactive scripts and containers.
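You can confirm that graphsignal-run resolves the way a non-interactive script or container entrypoint sees it with a quick check like the one below. This is a minimal POSIX sh sketch. require_cmd is a hypothetical helper name and is not part of Graphsignal.

```shell
#!/bin/sh
# require_cmd: hypothetical helper (not part of Graphsignal) that reports
# where a command resolves on the current PATH, or fails if it does not.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1 -> $(command -v "$1")"
  else
    echo "$1 not found on PATH: $PATH" >&2
    return 1
  fi
}

# Run this from a non-interactive shell (a CI step, an entrypoint script, sh -c).
require_cmd graphsignal-run || echo "graphsignal-run is missing; re-run the uv tool install step." >&2
```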

Alternative: install into your workload environment

If you prefer a single environment, or you use the graphsignal.watch() Python API (which requires graphsignal to be importable by your application), install it directly into your workload’s environment instead:

pip install 'graphsignal[cu12]' # CUDA 12.x
# or
pip install 'graphsignal[cu13]' # CUDA 13.x

Usage

Wrap any launch command with graphsignal-run. It starts the profiler sidecar, enables GPU profiling, and launches your workload so that process managers (init systems, container runtimes, etc.) see only the workload.

graphsignal-run <command> [args...]

Examples:

graphsignal-run vllm serve <model> --port 8001
graphsignal-run --enable-otel sglang serve --model-path <model>
graphsignal-run python -m sglang.launch_server --model-path <model>
graphsignal-run trtllm-serve <model> --port 8000
graphsignal-run --metrics-port 8000 trtllm-serve <model> --port 8000
graphsignal-run python myapp.py
graphsignal-run app.py

Options (must precede the command):

  • --enable-otel: Enable OpenTelemetry trace capture for supported engines (vLLM, SGLang). The engine’s request traces are captured via a local OTLP/gRPC collector. Requires OpenTelemetry to be installed in the engine’s environment. Off by default.
  • --metrics-port PORT: Port on which to scrape the workload’s Prometheus /metrics endpoint. Overrides the port derived from the engine’s --port flag or its default (e.g. 8000 for vLLM/TensorRT-LLM, 30000 for SGLang). Use this when metrics are exposed on a different port than the HTTP server. Not forwarded to the workload.
  • --cuda-graph-trace graph|node: CUDA graph tracing granularity. graph (default) traces each CUDA graph as one aggregated cuda.graph event, with lower CUPTI overhead. node traces individual graph node activities (kernels, memory copies, etc.) as separate cuda.kernel / cuda.memcpy events, with higher CUPTI overhead; use it when you need node-level visibility inside captured CUDA graphs. Not forwarded to the workload.
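The rule that options must precede the command matters because everything from the first non-option argument onward belongs to the workload. The POSIX sh sketch below illustrates that split. It is an assumption about how such a wrapper can parse its arguments, not graphsignal-run's actual parser, and parse_wrapper_args and its variables are hypothetical names.

```shell
#!/bin/sh
# Hypothetical sketch (not graphsignal-run's real parser): consume wrapper
# options until the first non-option argument, which starts the workload.
parse_wrapper_args() {
  ENABLE_OTEL=0
  METRICS_PORT=""
  CUDA_GRAPH_TRACE=graph            # documented default
  while [ $# -gt 0 ]; do
    case "$1" in
      --enable-otel)
        ENABLE_OTEL=1
        shift ;;
      --metrics-port)
        [ $# -ge 2 ] || { echo "--metrics-port needs a PORT" >&2; return 2; }
        METRICS_PORT="$2"
        shift 2 ;;
      --cuda-graph-trace)
        [ $# -ge 2 ] || { echo "--cuda-graph-trace needs graph|node" >&2; return 2; }
        case "$2" in
          graph|node) CUDA_GRAPH_TRACE="$2" ;;
          *) echo "invalid --cuda-graph-trace value: $2" >&2; return 2 ;;
        esac
        shift 2 ;;
      *)
        break ;;                    # first non-option: the workload command starts here
    esac
  done
  # Joined with spaces for display only; a launcher would pass "$@" through
  # unchanged so the workload receives its arguments as given.
  WORKLOAD="$*"
}

# --metrics-port comes after the command here, so the wrapper leaves it
# in the workload's arguments instead of consuming it.
parse_wrapper_args vllm serve my-model --metrics-port 9000
echo "metrics_port='$METRICS_PORT' workload='$WORKLOAD'"
# prints: metrics_port='' workload='vllm serve my-model --metrics-port 9000'
```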

Behavior:

  • Detects the engine from your command (vLLM, SGLang, TensorRT-LLM, or a generic fallback).
  • For OTEL-aware workloads (vLLM, SGLang), captures the engine’s request traces via a local OTLP/gRPC collector when --enable-otel is set.
  • Scrapes Prometheus metrics from http://127.0.0.1:<port>/metrics when a metrics port is resolved (from --metrics-port, the engine’s --port, or the engine default).
  • Collects CUDA kernel activity via CUPTI as soon as CUDA initializes.
  • Launches your workload with the profiler sidecar running alongside it.
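The metrics-port precedence described above can be sketched as a small POSIX sh function. An explicit --metrics-port wins, then the engine's --port flag, then the engine default. Two parts of this sketch are assumptions made for illustration: engine detection by substring match, and treating generic workloads as having no default port. resolve_metrics_port is a hypothetical name and is not part of Graphsignal.

```shell
#!/bin/sh
# Hypothetical sketch of the documented precedence, not Graphsignal's code.
# Usage: resolve_metrics_port "<--metrics-port value or empty>" <command...>
resolve_metrics_port() {
  override="$1"; shift
  if [ -n "$override" ]; then
    echo "$override"                # step 1: explicit --metrics-port wins
    return 0
  fi
  # Assumed engine detection by substring match; sets the engine default (step 3).
  case "$*" in
    *trtllm-serve*|*vllm*) port=8000 ;;
    *sglang*) port=30000 ;;
    *) return 1 ;;                  # generic workload: assume no port is resolved
  esac
  # Step 2: the engine's own --port flag (either "--port N" or "--port=N")
  # overrides the default.
  prev=""
  for arg in "$@"; do
    case "$arg" in
      --port=*) port="${arg#--port=}" ;;
    esac
    if [ "$prev" = "--port" ]; then
      port="$arg"
    fi
    prev="$arg"
  done
  echo "$port"
}

resolve_metrics_port "" vllm serve my-model --port 8001                        # prints 8001
resolve_metrics_port "" python -m sglang.launch_server --model-path my-model   # prints 30000
```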

Configuration

The profiler reads its configuration from environment variables. Set them before invoking graphsignal-run (or before calling graphsignal.watch()).

Variables:

  • GRAPHSIGNAL_API_KEY (required): Your account API key.
  • GRAPHSIGNAL_API_BASE: Overrides the API endpoint (defaults to https://api.graphsignal.com).
  • GRAPHSIGNAL_TAG_<KEY>=<value>: An arbitrary tag attached to all signals (e.g. GRAPHSIGNAL_TAG_DEPLOYMENT=us-prod).

To get an API key, sign up for a free account at graphsignal.com; the key is in Settings / API Keys.
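A missing key is easy to overlook in scripts and container entrypoints, so a small pre-flight check before launching can help. The sketch below assumes POSIX sh. graphsignal_preflight is a hypothetical helper and is not part of Graphsignal, and <your-api-key> is a placeholder. The helper only reads the variables listed above and never prints the key.

```shell
#!/bin/sh
# Hypothetical pre-flight helper (not part of Graphsignal): fail early when the
# required key is missing and show which endpoint and tags are configured.
graphsignal_preflight() {
  if [ -z "${GRAPHSIGNAL_API_KEY:-}" ]; then
    echo "GRAPHSIGNAL_API_KEY is not set" >&2
    return 1
  fi
  echo "endpoint: ${GRAPHSIGNAL_API_BASE:-https://api.graphsignal.com}"
  # List tag variables; sort keeps the output stable.
  env | grep '^GRAPHSIGNAL_TAG_' | sort
}

export GRAPHSIGNAL_API_KEY="<your-api-key>"   # placeholder, replace with your key
export GRAPHSIGNAL_TAG_DEPLOYMENT=us-prod
graphsignal_preflight
```

In a launch script you could chain it in front of the wrapper, for example graphsignal_preflight && graphsignal-run python myapp.py. This way the workload does not start under the profiler without a key.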