Introduction

What is Graphsignal

Graphsignal is a machine learning profiler. It helps data scientists and ML engineers make model training and inference faster and more efficient. It is built for real-world use cases and allows ML practitioners to:

  • Optimize training and inference by benchmarking and analyzing performance summaries, resource usage and ML operations.
  • Start profiling notebooks, scripts and model serving automatically by adding a few lines of code.
  • Use the profiler in local, remote or cloud environments without installing any additional software or opening inbound ports.
  • Keep data private; no code or data is sent to Graphsignal cloud, only run statistics and metadata.

How it works

Graphsignal Profiler is a Python module that is installed and added to machine learning code. It automatically starts and stops profiling for steps, which represent repeatable parts of training or inference code. An example of a step is a training batch in a notebook or a prediction call/batch in a model serving application.
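The idea of a step can be illustrated with a minimal sketch that uses only the Python standard library. The `StepProfiler` class, its `step()` context manager and the `train_batch` function are hypothetical names made up for this illustration. They are not the Graphsignal API; see the Quick Start Guide for the actual calls.

```python
import time
from contextlib import contextmanager


class StepProfiler:
    """Hypothetical stand-in for a step-based profiler (not the Graphsignal API)."""

    def __init__(self):
        self.durations = []  # wall-clock seconds, one entry per step

    @contextmanager
    def step(self):
        # Start timing when the step begins and record the duration when
        # it ends, even if the wrapped code raises an exception.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.durations.append(time.perf_counter() - start)

    def summary(self):
        count = len(self.durations)
        total = sum(self.durations)
        mean = total / count if count else 0.0
        return {"steps": count, "total_s": total, "mean_s": mean}


def train_batch(batch):
    # Placeholder for real training work on one batch.
    return sum(x * x for x in batch)


profiler = StepProfiler()
for batch in ([1, 2, 3], [4, 5, 6], [7, 8, 9]):
    with profiler.step():  # one training batch is one step
        train_batch(batch)

stats = profiler.summary()
print(stats)
```

Wrapping each batch in a context manager keeps the start and stop calls paired, so a failing step is still recorded and cannot leave the profiler running.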

The profiler measures step statistics and resource usage, including GPU utilization and memory, and records environment information.
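As a rough illustration of what such measurements might look like, the sketch below collects a step duration, peak Python memory and basic environment details using only the standard library. The helper names `profile_step` and `environment_info` are assumptions made for this example and do not reflect how Graphsignal gathers its data. GPU metrics are left out because they require vendor tooling such as NVIDIA's NVML.

```python
import platform
import time
import tracemalloc


def profile_step(fn, *args):
    """Run fn once and return (result, stats). Hypothetical helper."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn(*args)
        elapsed = time.perf_counter() - start
        # get_traced_memory() returns (current, peak) in bytes for
        # allocations made by Python since tracemalloc.start().
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, {"duration_s": elapsed, "peak_python_bytes": peak}


def environment_info():
    """Collect basic, non-sensitive environment metadata."""
    return {
        "python": platform.python_version(),
        "os": platform.system(),
        "machine": platform.machine(),
    }


def build_squares(n):
    # Placeholder workload standing in for a training or inference step.
    return [i * i for i in range(n)]


result, stats = profile_step(build_squares, 10_000)
env = environment_info()
print(stats)
print(env)
```

Note that `tracemalloc` only tracks memory allocated by the Python interpreter, so it understates the footprint of native libraries.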

To provide ML operation and kernel-level statistics, Graphsignal uses the TensorFlow or PyTorch profiler internally, depending on the ML framework. These profilers may in turn use NVIDIA® CUDA® profiling capabilities for GPU profiling.

After a profile is recorded, it is sent to Graphsignal cloud, post-processed and made available for analysis at app.graphsignal.com. This allows the profiler to run in any environment without the need to install any additional software.

Getting started

  • Sign up for an account.
  • See the Quick Start Guide for how to add the profiler to your ML notebook, batch job or application.