Graphsignal on dstack

See the Quick Start guide on how to install Graphsignal.

dstack provisions GPUs and runs your inference workloads as services. Graphsignal attaches to whatever you’re running — vLLM, SGLang, TensorRT-LLM, raw PyTorch, custom CUDA — with the same one-line pattern: install Graphsignal in the service image (or at startup), pass GRAPHSIGNAL_API_KEY via secrets/env, and wrap your launch command with graphsignal-run.

graphsignal-run detects supported inference engines and configures their OTEL / Prometheus / GPU profiling automatically. Any other command (python my_app.py, …) is launched with GPU profiling attached but without engine-specific configuration.

Pattern

A dstack service file with Graphsignal attached:

type: service
name: my-inference-service

image: <your inference image>
env:
  - GRAPHSIGNAL_API_KEY        # supplied via dstack secrets/env

commands:
  - |
    pip install --no-cache-dir 'graphsignal[cu12]' && \
    graphsignal-run <your launch command>

port: 8000

resources:
  gpu: 24GB

The pip install step is only needed when the image doesn’t already bundle Graphsignal — see the Bake Graphsignal into your image section below.

Example: SGLang on dstack

The dstack SGLang example starts the server with python3 -m sglang.launch_server. Wrapping it with graphsignal-run (and supplying the API key via env) is the only change:

type: service
name: deepseek-r1

image: lmsysorg/sglang:latest
env:
  - MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Llama-8B
  - GRAPHSIGNAL_API_KEY

commands:
  - |
    pip install --no-cache-dir 'graphsignal[cu12]' && \
    graphsignal-run python3 -m sglang.launch_server \
      --model-path $MODEL_ID \
      --port 8000 \
      --trust-remote-code

port: 8000
model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B

resources:
  gpu: 24GB

Deploy with dstack apply:

dstack apply -f service.dstack.yml

Example: vLLM on dstack

type: service
name: qwen-vllm

image: vllm/vllm-openai:latest
env:
  - GRAPHSIGNAL_API_KEY

commands:
  - |
    pip install --no-cache-dir 'graphsignal[cu12]' && \
    graphsignal-run vllm serve Qwen/Qwen1.5-7B-Chat --port 8000

port: 8000

resources:
  gpu: 24GB

Example: a custom Python entry point

For any non-engine workload (research script, custom serving code, etc.), wrap the launch the same way — graphsignal-run falls back to attaching GPU profiling and the profiler sidecar without any engine-specific configuration:

type: service
name: my-custom-inference

image: my-team/my-inference-image:latest
env:
  - GRAPHSIGNAL_API_KEY

commands:
  - graphsignal-run python serve.py --port 8000

port: 8000

resources:
  gpu: 24GB

Bake Graphsignal into your image

If you ship a custom image, install Graphsignal once at build time and drop the pip install from commands:

# In your Dockerfile:
RUN pip install --no-cache-dir 'graphsignal[cu12]'

The dstack service file’s commands then becomes just graphsignal-run <your launch command>. Faster cold starts, no per-deploy install.