Introduction

What is Graphsignal

Graphsignal is an AI observability platform. It helps ML engineers and MLOps teams make AI applications run faster and more reliably by monitoring and analyzing performance, resources, data and errors. Its capabilities provide full visibility into AI applications for any model, data and deployment:

  • Inference tracing and monitoring.
  • Automatic inference profiling.
  • Error and exception tracking.
  • Data monitoring and anomaly detection.

How it works

The Graphsignal agent is added to application code. It measures and profiles single and batch inferences, or any other functions or data, in one-time scripts as well as long-running server applications.

Graphsignal measures latency and throughput, and records operator-level statistics, execution traces and compute utilization, including GPU utilization and memory.
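To make the latency and throughput measurements concrete, here is a minimal sketch using only the Python standard library. It is not the Graphsignal agent or its API; InferenceStats, measure and fake_predict are hypothetical names chosen for illustration, and the sketch only shows the general idea of timing single or batch inference calls.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class InferenceStats:
    """Accumulates per-call latencies and item counts (hypothetical helper)."""
    latencies: list = field(default_factory=list)  # seconds per measured call
    items: int = 0                                  # total inputs processed

    @contextmanager
    def measure(self, batch_size=1):
        # Time one inference call; batch_size is 1 for a single inference.
        start = time.perf_counter()
        try:
            yield
        finally:
            # Recorded even if the call raises, so failed calls are counted too.
            self.latencies.append(time.perf_counter() - start)
            self.items += batch_size

    def mean_latency(self):
        # Average seconds per measured call.
        return sum(self.latencies) / len(self.latencies)

    def throughput(self):
        # Items processed per second of measured inference time.
        return self.items / sum(self.latencies)


def fake_predict(batch):
    # Stand-in for a real model call (assumption for this example).
    time.sleep(0.01)
    return [x * 2 for x in batch]


stats = InferenceStats()
for batch in ([1, 2, 3, 4], [5, 6, 7, 8]):
    with stats.measure(batch_size=len(batch)):
        fake_predict(batch)

print(f"calls={len(stats.latencies)} items={stats.items}")
print(f"mean latency: {stats.mean_latency() * 1000:.1f} ms")
print(f"throughput: {stats.throughput():.0f} items/s")
```

Wrapping each call in a context manager keeps the timing code out of the model function itself, which is roughly how an agent can be added to existing application code with few changes.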

To provide operator and kernel statistics, depending on the ML framework, Graphsignal uses built-in framework profilers internally, which in turn may use NVIDIA® CUDA® profiling capabilities for GPU profiling.
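As a rough analogy for what a built-in profiler provides, the sketch below uses Python's standard cProfile module to collect per-function call counts and cumulative times. This is a stand-in only: framework profilers report operators and GPU kernels rather than Python functions, and profile_call, preprocess and predict are hypothetical names, not part of Graphsignal or any ML framework.

```python
import cProfile
import pstats


def preprocess(values):
    # Toy preprocessing step: scale inputs to the [0, 1] range.
    return [v / 255.0 for v in values]


def predict(values):
    # Stand-in for a model forward pass (assumption for this example).
    return sum(preprocess(values))


def profile_call(fn, *args):
    """Run fn under cProfile and return {function name: (calls, cumulative seconds)}."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        fn(*args)
    finally:
        profiler.disable()
    # pstats keys each entry by (file, line, function name); the value holds
    # (primitive calls, total calls, own time, cumulative time, callers).
    raw = pstats.Stats(profiler).stats
    return {key[2]: (nc, ct) for key, (cc, nc, tt, ct, callers) in raw.items()}


result = profile_call(predict, list(range(100)))
for name in ("predict", "preprocess"):
    calls, cumulative = result[name]
    print(f"{name}: calls={calls} cumulative={cumulative:.6f}s")
```

The per-function table plays the same role here that operator and kernel statistics play for a model: it shows where the time inside one call is spent.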

After recording, the performance data is sent to Graphsignal servers, post-processed, and made available for analysis at app.graphsignal.com. Because post-processing and analysis happen on Graphsignal servers, Graphsignal can run in any environment without the need to install any additional software.

Getting started

  • Sign up for an account.
  • See the Quick Start guide on how to add Graphsignal to your ML notebook, batch job or application.