Inference Observability

Trace LLM generations, profile CUDA operations, track 100+ GPU and inference metrics, monitor XID errors and code exceptions — all in one place, all automatically.

Works with NVIDIA, PyTorch, Hugging Face, and vLLM.

Inference tracing and profiling

Trace and profile LLM generations, communication, CUDA kernels, batching, and more.
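As a rough sketch of what kernel-level tracing covers, here is a plain torch.profiler session around one generation step; the gpt2 model is only a placeholder, and any causal LM works:

```python
# Minimal sketch: profile the CUDA kernels behind one LLM generation step.
import torch
from torch.profiler import profile, ProfilerActivity
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).to("cuda")

inputs = tokenizer("The quick brown fox", return_tensors="pt").to("cuda")

with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    record_shapes=True,
) as prof:
    with torch.no_grad():
        model.generate(**inputs, max_new_tokens=32)

# Print the most expensive CUDA kernels for this generation.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```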

GPU and server monitoring

Monitor inference performance, CPU/GPU utilization, memory usage, and server metrics.
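A minimal sketch of the host-side polling this automates, using NVML via the nvidia-ml-py package; a real agent would ship these samples to a metrics backend rather than print them:

```python
# Poll GPU utilization, memory, and temperature with NVML.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

for _ in range(5):
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # % GPU / memory busy
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)         # bytes used / total
    temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
    print(f"gpu={util.gpu}% mem={mem.used / mem.total:.0%} temp={temp}C")
    time.sleep(1)

pynvml.nvmlShutdown()
```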

Error monitoring

Track and get alerts on errors and exceptions, with contextual data, stack traces, and triggering conditions.
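A rough sketch of the capture pattern in plain Python; `report_error` is a hypothetical stand-in for whatever client actually ships events, and the model/route context values are placeholders:

```python
# Capture exceptions with stack traces and call context, then re-raise.
import functools
import traceback

def report_error(exc: BaseException, context: dict) -> None:
    # Hypothetical sink: a real agent would send this to a backend.
    print("ERROR:", type(exc).__name__, context)
    print("".join(traceback.format_exception(type(exc), exc, exc.__traceback__)))

def monitored(**context):
    """Decorator that reports exceptions with stack trace and call context."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except Exception as exc:
                report_error(exc, {"function": fn.__name__, **context})
                raise
        return inner
    return wrap

@monitored(model="llama-3-8b", route="/generate")  # placeholder context
def generate(prompt: str) -> str:
    raise RuntimeError("CUDA out of memory")  # simulated failure

try:
    generate("hello")
except RuntimeError:
    pass  # already reported above
```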

Performance analysis

Compare performance across models, versions, hardware setups, and optimization configurations.
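A bare-bones sketch of that kind of A/B comparison; `run_inference` is a placeholder for whatever serving call is being measured:

```python
# Run the same prompts through two configurations and compare
# median latency and throughput.
import time
import statistics

def run_inference(prompt: str, config: dict) -> str:
    time.sleep(0.01 if config["quantized"] else 0.02)  # stand-in for a real call
    return "output"

def benchmark(config: dict, prompts: list[str]) -> dict:
    latencies = []
    for p in prompts:
        t0 = time.perf_counter()
        run_inference(p, config)
        latencies.append(time.perf_counter() - t0)
    return {
        "p50_s": statistics.median(latencies),
        "throughput_rps": len(prompts) / sum(latencies),
    }

prompts = ["hello"] * 20
for cfg in ({"quantized": False}, {"quantized": True}):
    print(cfg, benchmark(cfg, prompts))
```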