
Graphsignal on dstack

See the Quick Start guide for how to install Graphsignal.

dstack provisions GPUs and runs your inference workloads as services. Graphsignal attaches to whatever you run there (vLLM, SGLang, TensorRT-LLM, raw PyTorch, or custom CUDA) with the same one-line pattern: install Graphsignal in the service image (or at startup), pass GRAPHSIGNAL_API_KEY via dstack secrets or env, and wrap your launch command with graphsignal-run.

graphsignal-run recognizes supported inference engines by their argv and configures their OTEL / Prometheus / CUPTI integration automatically. Any other command (python my_app.py, …) is launched with CUPTI attached but without engine-specific argv mutations.
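To make argv-based recognition concrete, here is a hypothetical sketch of how a wrapper could classify a launch command. It is not Graphsignal's code. The function name detect_engine and the matched patterns (taken from the launch commands on this page) are illustrative assumptions, and TensorRT-LLM detection is left out.

```python
import os
from typing import Optional, Sequence


def detect_engine(argv: Sequence[str]) -> Optional[str]:
    """Classify a launch command by its argv (hypothetical sketch).

    Returns an engine label, or None for "any other command".
    """
    if not argv:
        return None
    prog = os.path.basename(argv[0])

    # Engine CLI invoked directly, e.g. `vllm serve <model> --port 8000`.
    if prog == "vllm" and len(argv) > 1 and argv[1] == "serve":
        return "vllm"

    # Engine launched as a Python module, e.g. `python3 -m sglang.launch_server ...`.
    if prog.startswith("python") and len(argv) > 2 and argv[1] == "-m":
        package = argv[2].split(".", 1)[0]
        if package in ("vllm", "sglang"):
            return package

    # Anything else (python serve.py, ...): no engine-specific handling.
    return None
```

What graphsignal-run actually configures once an engine is recognized (OTEL, Prometheus, CUPTI) is not modeled here.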

A dstack service file with Graphsignal attached:

```yaml
type: service
name: my-inference-service
image: <your inference image>
env:
  - GRAPHSIGNAL_API_KEY  # supplied via dstack secrets/env
commands:
  - |
    pip install --no-cache-dir 'graphsignal[cu12]' && \
    graphsignal-run <your launch command>
port: 8000
resources:
  gpu: 24GB
```

The pip install step is needed only when the image doesn't already bundle Graphsignal; see the Bake Graphsignal into your image section below.
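The wrapping is mechanical enough to script. The helper below is a hypothetical sketch (service_command is an illustrative name, not part of Graphsignal or dstack) that builds the shell string for a commands entry and prepends the pip install step only when the image does not already bundle Graphsignal.

```python
import shlex
from typing import Sequence

# Same install step as in the service files on this page.
PIP_INSTALL = "pip install --no-cache-dir 'graphsignal[cu12]'"


def service_command(launch_argv: Sequence[str], graphsignal_in_image: bool) -> str:
    """Return the shell string for a dstack `commands` entry (hypothetical helper)."""
    # Quote each argument so the launch command survives the shell unchanged.
    wrapped = "graphsignal-run " + shlex.join(launch_argv)
    if graphsignal_in_image:
        # Image already bundles Graphsignal: no per-deploy install.
        return wrapped
    return f"{PIP_INSTALL} && {wrapped}"
```

Because shlex.join quotes every argument, a reference such as $MODEL_ID would be passed literally instead of being expanded by the shell; pass resolved values, or write such commands by hand.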

The dstack SGLang example starts the server with python3 -m sglang.launch_server. Wrapping it with graphsignal-run (and supplying the API key via env) is the only change:

```yaml
type: service
name: deepseek-r1
image: lmsysorg/sglang:latest
env:
  - MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Llama-8B
  - GRAPHSIGNAL_API_KEY
commands:
  - |
    pip install --no-cache-dir 'graphsignal[cu12]' && \
    graphsignal-run python3 -m sglang.launch_server \
      --model-path $MODEL_ID \
      --port 8000 \
      --trust-remote-code
port: 8000
model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
resources:
  gpu: 24GB
```

Deploy with dstack apply:

```shell
dstack apply -f service.dstack.yml
```
A vLLM service follows the same pattern: wrap vllm serve with graphsignal-run.

```yaml
type: service
name: qwen-vllm
image: vllm/vllm-openai:latest
env:
  - GRAPHSIGNAL_API_KEY
commands:
  - |
    pip install --no-cache-dir 'graphsignal[cu12]' && \
    graphsignal-run vllm serve Qwen/Qwen1.5-7B-Chat --port 8000
port: 8000
resources:
  gpu: 24GB
```

For any non-engine workload (a research script, custom serving code, and so on), wrap the launch command the same way. graphsignal-run then falls back to attaching CUPTI and spawning the watcher, without any engine-specific argv mutation:

```yaml
type: service
name: my-custom-inference
image: my-team/my-inference-image:latest  # Graphsignal already installed in this image
env:
  - GRAPHSIGNAL_API_KEY
commands:
  - graphsignal-run python serve.py --port 8000
port: 8000
resources:
  gpu: 24GB
```

Bake Graphsignal into your image

If you ship a custom image, install Graphsignal once at build time and drop the pip install step from commands:

```dockerfile
# In your Dockerfile:
RUN pip install --no-cache-dir 'graphsignal[cu12]'
```

The commands section of the dstack service file then becomes just graphsignal-run <your launch command>, which means faster cold starts and no per-deploy install.
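Before relying on a baked image, you might run a quick check inside the container to confirm that Graphsignal is present and the API key reaches the process. This is a hypothetical sketch: the function name preflight is illustrative, and it assumes the installed distribution is named graphsignal (matching the pip spec above) and that the wrapper is on PATH as graphsignal-run.

```python
import os
import shutil
from importlib import metadata
from typing import List, Mapping, Optional


def preflight(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of problems; an empty list means the image looks ready."""
    env = os.environ if env is None else env
    problems = []
    # Is the graphsignal distribution installed in this Python environment?
    try:
        metadata.version("graphsignal")
    except metadata.PackageNotFoundError:
        problems.append("graphsignal package is not installed")
    # Is the wrapper on PATH? (shutil.which reads PATH from the real process env.)
    if shutil.which("graphsignal-run") is None:
        problems.append("graphsignal-run is not on PATH")
    # Is the API key passed through via dstack secrets/env?
    if not env.get("GRAPHSIGNAL_API_KEY"):
        problems.append("GRAPHSIGNAL_API_KEY is not set")
    return problems
```

An empty result suggests the pip install step can stay out of commands; any entries name what is missing.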