NVIDIA GPU Profiling and Monitoring

See the Quick Start guide for instructions on installing and configuring Graphsignal.

Graphsignal automatically detects and monitors NVIDIA GPUs via NVML (the NVIDIA Management Library).

What’s captured

  • Metrics: GPU utilization, memory usage (used/free/total/reserved), temperature, power, PCIe and NVLink throughput/utilization, and GPU error indicators (ECC errors, PCIe and NVLink errors, and XID events where supported).
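To give a feel for what a few of these readings look like at the NVML level, here is a minimal sketch using the pynvml bindings (shipped in the nvidia-ml-py package). It is not Graphsignal's implementation; the GpuSample type and sample_gpu function are hypothetical names for illustration, and only utilization, memory, temperature, and power are covered. The NVML module is passed in as a parameter so the function can be exercised without a GPU.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class GpuSample:
    """One point-in-time reading for a single GPU (hypothetical type)."""
    utilization_pct: int      # % of the last sample period in which a kernel was executing
    memory_used_bytes: int
    memory_total_bytes: int
    temperature_c: int
    power_w: float

    @property
    def memory_used_pct(self) -> float:
        return 100.0 * self.memory_used_bytes / self.memory_total_bytes


def sample_gpu(nvml: Any, index: int = 0) -> GpuSample:
    """Read a few NVML metrics for GPU `index`.

    `nvml` is the pynvml module (or any object exposing the same functions);
    nvml.nvmlInit() must already have been called.
    """
    handle = nvml.nvmlDeviceGetHandleByIndex(index)
    util = nvml.nvmlDeviceGetUtilizationRates(handle)   # .gpu and .memory, in percent
    mem = nvml.nvmlDeviceGetMemoryInfo(handle)          # .total, .free, .used, in bytes
    temp = nvml.nvmlDeviceGetTemperature(handle, nvml.NVML_TEMPERATURE_GPU)  # degrees C
    power_mw = nvml.nvmlDeviceGetPowerUsage(handle)     # reported in milliwatts
    return GpuSample(
        utilization_pct=util.gpu,
        memory_used_bytes=mem.used,
        memory_total_bytes=mem.total,
        temperature_c=temp,
        power_w=power_mw / 1000.0,
    )
```

On a machine with an NVIDIA driver and nvidia-ml-py installed, you would call pynvml.nvmlInit(), loop over range(pynvml.nvmlDeviceGetCount()) calling sample_gpu(pynvml, i), and finish with pynvml.nvmlShutdown().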

Integration into your Python application

Call graphsignal.configure(...) in your app and run your workload normally:

import graphsignal

graphsignal.configure(api_key="my-api-key")
# or pass the API key via the GRAPHSIGNAL_API_KEY environment variable
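If you supply the key through the environment, it can help to fail fast when the variable is missing rather than discovering it later. The sketch below reads GRAPHSIGNAL_API_KEY itself and passes it to graphsignal.configure(api_key=...), the same call shown above. The helper name configure_from_env and its error message are hypothetical and not part of the graphsignal package.

```python
import os
from typing import Mapping, Optional


def configure_from_env(environ: Optional[Mapping[str, str]] = None) -> None:
    """Configure Graphsignal from GRAPHSIGNAL_API_KEY, failing fast if it is unset.

    Hypothetical helper for illustration; not part of the graphsignal package.
    """
    env = os.environ if environ is None else environ
    api_key = env.get("GRAPHSIGNAL_API_KEY")
    if not api_key:
        raise RuntimeError("GRAPHSIGNAL_API_KEY is not set; cannot configure Graphsignal")

    # Imported after the check so a missing key is reported before anything else runs.
    import graphsignal

    graphsignal.configure(api_key=api_key)
```

Call configure_from_env() where you would otherwise call graphsignal.configure(...). The optional environ argument exists only to make the check easy to test.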

Run an application with the Graphsignal runner

GPU monitoring is enabled automatically when you launch your app with graphsignal-run (for example, graphsignal-run vllm serve ...).