SGLang Profiling, Tracing, and Monitoring
See the Quick Start guide on how to install and configure Graphsignal.
For GPU profiling with SGLang on Linux, install the CUPTI extra matching your CUDA version: pip install graphsignal[cu12] (CUDA 12.x) or pip install graphsignal[cu13] (CUDA 13.x).
Graphsignal automatically instruments and profiles SGLang.
What is captured
- Profiling: SGLang millisecond-level operations.
- Tracing: SGLang OTEL spans.
- Metrics: SGLang Prometheus metrics.
Integration into a Python application that runs SGLang
Call graphsignal.configure(...) in your app and run SGLang normally.
import graphsignal
graphsignal.configure(api_key='my-api-key')
# or pass the API key via the GRAPHSIGNAL_API_KEY environment variable
Run SGLang server with Graphsignal runner
Set your API key, then start sglang serve via graphsignal-run:
export GRAPHSIGNAL_API_KEY="..."
graphsignal-run sglang serve \
--model-path Qwen/Qwen1.5-7B-Chat \
--port 8000
Add Graphsignal to an SGLang Docker image
If your image does not include Graphsignal (or CUPTI), install Graphsignal at container startup and run SGLang through graphsignal-run.
docker run --gpus all \
-p 8000:8000 \
--ipc=host \
-e GRAPHSIGNAL_API_KEY=YOUR_API_KEY \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--entrypoint bash \
your-sglang-image:latest \
-lc 'pip install --no-cache-dir graphsignal[cu12] \
&& exec graphsignal-run sglang serve \
--model-path Qwen/Qwen2.5-1.5B-Instruct \
--port 8000'