SGLang Profiling, Tracing, and Monitoring

See the Quick Start guide on how to install and configure Graphsignal.

For GPU profiling with SGLang on Linux, install the CUPTI extra matching your CUDA version: pip install graphsignal[cu12] (CUDA 12.x) or pip install graphsignal[cu13] (CUDA 13.x).

Graphsignal automatically instruments and profiles SGLang.

What is captured

Profiling: SGLang millisecond-level operations.
Tracing: SGLang OTEL spans.
Metrics: SGLang Prometheus metrics.

Integration into a Python application that runs SGLang

Call graphsignal.configure(...) in your app and run SGLang normally.

import graphsignal

graphsignal.configure(api_key='my-api-key')
# or pass the API key via the GRAPHSIGNAL_API_KEY environment variable

Run SGLang server with Graphsignal runner

Set your API key, then start sglang serve via graphsignal-run:

export GRAPHSIGNAL_API_KEY="..."

graphsignal-run sglang serve \
  --model-path Qwen/Qwen1.5-7B-Chat \
  --port 8000

Add Graphsignal to an SGLang Docker image

If your image does not include Graphsignal (or CUPTI), install Graphsignal at container startup and run SGLang through graphsignal-run.

docker run --gpus all \
  -p 8000:8000 \
  --ipc=host \
  -e GRAPHSIGNAL_API_KEY=YOUR_API_KEY \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --entrypoint bash \
  your-sglang-image:latest \
  -lc 'pip install --no-cache-dir graphsignal[cu12] \
        && exec graphsignal-run sglang serve \
            --model-path Qwen/Qwen2.5-1.5B-Instruct \
            --port 8000'