NVIDIA GPU Profiling and Monitoring
See the Quick Start guide for instructions on installing and configuring Graphsignal.
Graphsignal automatically detects and monitors NVIDIA GPUs via NVML.
What’s captured
- Metrics: GPU utilization, memory usage (used/free/total/reserved), temperature, power, PCIe and NVLink throughput/utilization, and GPU error indicators (ECC, PCIe/NVLink errors, XID events when supported).
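To give a sense of the kind of data NVML exposes, here is a minimal sketch that reads a few of the metrics listed above directly through the pynvml bindings. It is not Graphsignal's implementation. It assumes the nvidia-ml-py package (which provides the pynvml module) is installed; the function name collect_gpu_metrics and the dictionary keys are hypothetical and chosen only for illustration.

```python
def collect_gpu_metrics():
    """Return one dict of basic NVML readings per visible GPU.

    Returns an empty list when pynvml is not installed or NVML cannot be
    initialized (for example, no NVIDIA driver on this machine).
    """
    try:
        import pynvml  # assumption: provided by the nvidia-ml-py package
    except ImportError:
        return []

    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        return []

    def query(fn, *args):
        # Some queries are unsupported on some devices; report those as None.
        try:
            return fn(*args)
        except pynvml.NVMLError:
            return None

    try:
        metrics = []
        for index in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(index)
            util = query(pynvml.nvmlDeviceGetUtilizationRates, handle)
            mem = query(pynvml.nvmlDeviceGetMemoryInfo, handle)
            temp = query(pynvml.nvmlDeviceGetTemperature, handle,
                         pynvml.NVML_TEMPERATURE_GPU)
            power_mw = query(pynvml.nvmlDeviceGetPowerUsage, handle)
            metrics.append({
                "index": index,
                "gpu_util_percent": util.gpu if util else None,
                "memory_used_bytes": mem.used if mem else None,
                "memory_free_bytes": mem.free if mem else None,
                "memory_total_bytes": mem.total if mem else None,
                "temperature_c": temp,
                # NVML reports power in milliwatts; convert to watts.
                "power_w": power_mw / 1000.0 if power_mw is not None else None,
            })
        return metrics
    finally:
        pynvml.nvmlShutdown()


if __name__ == "__main__":
    for gpu in collect_gpu_metrics():
        print(gpu)
```

With Graphsignal configured, none of this is needed in your own code. The sketch only illustrates where such readings come from.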
Integration into your Python application
Call graphsignal.configure(...) in your app and run your workload normally:
import graphsignal
graphsignal.configure(api_key="my-api-key")
# or pass the API key via the GRAPHSIGNAL_API_KEY environment variable
Run an application with Graphsignal runner
GPU monitoring is also enabled automatically when you launch your app with graphsignal-run (for example, graphsignal-run vllm serve ...).