Introduction

What is Graphsignal

Graphsignal is an inference observability platform that helps developers accelerate and troubleshoot AI systems. With Graphsignal, developers can:

  • Trace and profile LLM generations, communication, CUDA kernels, batching, and more.
  • Monitor inference performance, CPU/GPU utilization, memory usage, and server metrics.
  • Track and get alerts on errors and exceptions, with contextual data, stack traces, and triggering conditions.
  • Compare performance across models, versions, hardware setups, and optimization configurations.

How it works

The Graphsignal tracer is added to application code. It automatically measures and records operations and sessions in single-run scripts as well as long-running server applications.
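As a rough illustration of what recording an operation involves, the sketch below wraps a block of code in a context manager that captures its name, duration, and any exception. The names `OperationRecord`, `trace_operation`, and `RECORDS` are hypothetical and are not part of the Graphsignal API; the Quick Start guide describes the actual integration.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OperationRecord:
    """One recorded operation (hypothetical structure, for illustration only)."""
    name: str
    latency_s: float
    error: Optional[str] = None


# In-memory store standing in for whatever a real tracer would buffer.
RECORDS: List[OperationRecord] = []


@contextmanager
def trace_operation(name: str):
    """Time the wrapped block and record it, including any raised exception."""
    start = time.perf_counter()
    error = None
    try:
        yield
    except Exception as exc:
        # Keep a short description of the failure, then let it propagate.
        error = f"{type(exc).__name__}: {exc}"
        raise
    finally:
        RECORDS.append(
            OperationRecord(name, time.perf_counter() - start, error)
        )


# Usage: wrap the code to be measured.
with trace_operation("generate"):
    sum(range(1000))  # stand-in for a model call
```

The same wrapper works in a one-off script and inside a request handler of a long-running server, since each `with` block produces an independent record.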

Graphsignal measures latency, throughput, tokens, compute, and device performance. It also computes speed deviations and other performance and reliability indicators.
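To make these indicators concrete, here is a minimal sketch of how token throughput and a speed deviation could be derived from raw measurements. The definition used here (relative difference from the median of a baseline) is an assumption chosen for illustration; the exact formulas Graphsignal uses are not specified in this document, and the function names are hypothetical.

```python
from statistics import median
from typing import Sequence


def throughput(tokens: int, latency_s: float) -> float:
    """Tokens generated per second for a single operation."""
    if latency_s <= 0:
        raise ValueError("latency_s must be positive")
    return tokens / latency_s


def speed_deviation(current: float, baseline: Sequence[float]) -> float:
    """Relative difference of `current` from the baseline median.

    Assumed definition: -0.25 means 25% slower than the baseline.
    """
    base = median(baseline)
    if base == 0:
        raise ValueError("baseline median must be non-zero")
    return (current - base) / base


# Three earlier runs form the baseline: 100, 110, and 90 tokens/s.
baseline = [throughput(200, 2.0), throughput(220, 2.0), throughput(180, 2.0)]
current = throughput(150, 2.0)  # 75 tokens/s

print(speed_deviation(current, baseline))  # -0.25
```

Using the median rather than the mean keeps a single unusually slow or fast run from shifting the baseline.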

After recording, the performance data is sent to Graphsignal servers, post-processed, and made available for analysis at app.graphsignal.com. This allows Graphsignal to run in any environment without installing any additional software.

Getting started

  • Sign up for an account.
  • See the Quick Start guide to learn how to add Graphsignal to your application.