Graphsignal on dstack
See the Quick Start guide on how to install Graphsignal.
dstack provisions GPUs and runs your inference workloads as services. Graphsignal attaches to whatever you’re running — vLLM, SGLang, TensorRT-LLM, raw PyTorch, custom CUDA — with the same one-line pattern: install Graphsignal in the service image (or at startup), pass GRAPHSIGNAL_API_KEY via secrets/env, and wrap your launch command with graphsignal-run.
graphsignal-run recognizes supported inference engines by their argv and configures their OTEL / Prometheus / CUPTI integration automatically. Any other command (python my_app.py, …) is launched with CUPTI attached but without engine-specific argv mutations.
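To make the idea of recognizing an engine by its argv concrete, here is a minimal illustrative sketch. It is not Graphsignal's actual implementation: the `ENGINE_SIGNATURES` table and the `detect_engine` function are hypothetical names, and the signatures are simply the launch commands that appear in the examples on this page.

```python
from typing import Optional, Sequence

# Hypothetical argv signatures, taken from the launch commands used on this page.
# This is NOT Graphsignal's real detection table.
ENGINE_SIGNATURES = {
    "vllm": ("vllm", "serve"),
    "sglang": ("-m", "sglang.launch_server"),
}


def detect_engine(argv: Sequence[str]) -> Optional[str]:
    """Return the engine whose signature appears as consecutive argv tokens."""
    for engine, signature in ENGINE_SIGNATURES.items():
        width = len(signature)
        # Slide a window of the signature's width across argv.
        for start in range(len(argv) - width + 1):
            if tuple(argv[start:start + width]) == signature:
                return engine
    # Unrecognized command: a launcher would take its generic path here.
    return None
```

Under these assumptions, `detect_engine(["python", "serve.py", "--port", "8000"])` matches no signature and returns `None`, which corresponds to the case where a command is launched without engine-specific argv changes.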
Pattern
A dstack service file with Graphsignal attached:
type: service
name: my-inference-service

image: <your inference image>
env:
  - GRAPHSIGNAL_API_KEY  # supplied via dstack secrets/env

commands:
  - |
    pip install --no-cache-dir 'graphsignal[cu12]' && \
    graphsignal-run <your launch command>

port: 8000

resources:
  gpu: 24GB

The pip install step is only needed when the image doesn’t already bundle Graphsignal; see the Bake Graphsignal into your image section below.
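The GRAPHSIGNAL_API_KEY entry in env carries no value, so the key has to come from somewhere at deploy time. The sketch below is a small pre-flight check that fails early when the key is missing. It assumes dstack resolves a value-less env entry from the environment of the process that runs dstack apply. The helper names `build_apply_command` and `deploy` are hypothetical and not part of dstack or Graphsignal.

```python
import os
import subprocess
from typing import List, Mapping


def build_apply_command(env: Mapping[str, str], config_path: str) -> List[str]:
    """Return the `dstack apply` argv, or raise if the API key is not set."""
    if not env.get("GRAPHSIGNAL_API_KEY"):
        raise RuntimeError("GRAPHSIGNAL_API_KEY is not set in the environment")
    return ["dstack", "apply", "-f", config_path]


def deploy(config_path: str = "service.dstack.yml") -> None:
    """Check the key, then run `dstack apply` with the current environment."""
    cmd = build_apply_command(os.environ, config_path)
    # Assumption: the child process inherits GRAPHSIGNAL_API_KEY from this one.
    subprocess.run(cmd, check=True)
```

Calling `deploy()` after exporting the key in your shell runs the same dstack apply -f command used elsewhere on this page. Without the key, it stops before anything is provisioned.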
Example: SGLang on dstack
The dstack SGLang example starts the server with python3 -m sglang.launch_server. Wrapping it with graphsignal-run (and supplying the API key via env) is the only change:
type: service
name: deepseek-r1

image: lmsysorg/sglang:latest
env:
  - MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Llama-8B
  - GRAPHSIGNAL_API_KEY

commands:
  - |
    pip install --no-cache-dir 'graphsignal[cu12]' && \
    graphsignal-run python3 -m sglang.launch_server \
      --model-path $MODEL_ID \
      --port 8000 \
      --trust-remote-code

port: 8000
model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B

resources:
  gpu: 24GB

Deploy with dstack apply:

dstack apply -f service.dstack.yml

Example: vLLM on dstack
type: service
name: qwen-vllm

image: vllm/vllm-openai:latest
env:
  - GRAPHSIGNAL_API_KEY

commands:
  - |
    pip install --no-cache-dir 'graphsignal[cu12]' && \
    graphsignal-run vllm serve Qwen/Qwen1.5-7B-Chat --port 8000

port: 8000

resources:
  gpu: 24GB

Example: a custom Python entry point
For any non-engine workload (research script, custom serving code, etc.), wrap the launch command the same way. graphsignal-run falls back to attaching CUPTI and spawning the watcher, without any engine-specific argv mutation:
type: service
name: my-custom-inference

image: my-team/my-inference-image:latest
env:
  - GRAPHSIGNAL_API_KEY

commands:
  - graphsignal-run python serve.py --port 8000

port: 8000

resources:
  gpu: 24GB

Bake Graphsignal into your image
If you ship a custom image, install Graphsignal once at build time and drop the pip install from commands:
# In your Dockerfile:
RUN pip install --no-cache-dir 'graphsignal[cu12]'

The commands entry in the dstack service file then reduces to graphsignal-run <your launch command>. This gives faster cold starts and avoids a per-deploy install.