Profiling Machine Learning Using Graphsignal
By Dmitri Melikyan | 1 min read

Learn how to profile machine learning training and inference using Graphsignal.

Graphsignal is a machine learning profiler that helps data scientists and ML engineers understand, benchmark and analyze model training and inference, so that they can make both faster and more computationally efficient.

By adding a few lines of code to machine learning notebooks, scripts or applications, Graphsignal automatically profiles TensorFlow, Keras, PyTorch, Hugging Face and other frameworks.

To add the profiler, install and import the Python module, then configure it with an API key and a workload name. You can get an API key by signing up for a free account. Finally, register the profiler callback or use the profile functions, depending on the framework.

Here is a minimal example for the Keras framework:

# 1. Import Graphsignal modules
import graphsignal
from graphsignal.profilers.keras import GraphsignalCallback

# 2. Configure
graphsignal.configure(api_key='my_key', workload_name='training_example')

# ...

# 3. Add profiler callback
model.fit(..., callbacks=[GraphsignalCallback()])
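To make the callback mechanism more concrete, here is a minimal, framework-free sketch of how a callback can time each batch. It is not Graphsignal's implementation: `BatchTimer`, `run_training` and `simulated_step` are hypothetical names used only for illustration, and only the hook names mirror Keras's `on_train_batch_begin` and `on_train_batch_end`.

```python
import time


class BatchTimer:
    """Hypothetical callback that records how long each batch takes."""

    def __init__(self):
        self.batch_times = []  # seconds per batch
        self._start = None

    def on_train_batch_begin(self, batch):
        self._start = time.perf_counter()

    def on_train_batch_end(self, batch):
        self.batch_times.append(time.perf_counter() - self._start)


def run_training(num_batches, step_fn, callbacks):
    """Stand-in for a training loop that notifies callbacks around each batch."""
    for batch in range(num_batches):
        for cb in callbacks:
            cb.on_train_batch_begin(batch)
        step_fn(batch)
        for cb in callbacks:
            cb.on_train_batch_end(batch)


def simulated_step(batch):
    # Placeholder for a real forward/backward pass.
    time.sleep(0.001)


timer = BatchTimer()
run_training(num_batches=5, step_fn=simulated_step, callbacks=[timer])
mean_batch_time = sum(timer.batch_times) / len(timer.batch_times)
print(f"mean batch time: {mean_batch_time * 1000:.2f} ms")
```

Because the training loop only calls the hooks, the timing logic stays out of the model code, which is why adding a profiler this way takes a single extra argument to `model.fit`.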

You can find more information on other integrations in the Docs.

Once the run starts, profiles are recorded automatically and become available for analysis in the cloud dashboard. For each run or experiment, you can compare basic statistics such as batch time, batch rate, GPU memory and utilization.
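As a rough illustration of what comparing runs by batch time and batch rate involves, the sketch below derives both statistics from per-batch durations. The function `summarize_run`, the batch size and the durations are all hypothetical; the dashboard computes its own statistics, and this only shows the arithmetic.

```python
from statistics import mean


def summarize_run(batch_times_s, batch_size):
    """Return mean batch time (s) and throughput for one run."""
    avg = mean(batch_times_s)
    return {
        "batch_time_s": avg,
        "batches_per_s": 1.0 / avg,
        "samples_per_s": batch_size / avg,
    }


# Hypothetical per-batch durations (seconds) from two runs.
baseline = summarize_run([0.25, 0.25, 0.25, 0.25], batch_size=32)
optimized = summarize_run([0.125, 0.125, 0.125, 0.125], batch_size=32)

speedup = baseline["batch_time_s"] / optimized["batch_time_s"]
print(f"baseline:  {baseline['batches_per_s']:.1f} batches/s")
print(f"optimized: {optimized['batches_per_s']:.1f} batches/s")
print(f"speedup:   {speedup:.2f}x")
```

Batch rate is simply the inverse of mean batch time, so halving the batch time doubles the rate.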

[Image: Profile Timeline]

Opening a profile allows you to see much more detailed information about a particular phase of the run.

A profile includes a performance summary, detailed information about the run environment, and the training or inference speed:

[Image: Summary]

Compute resource utilization:

[Image: Compute]

A profile also includes operation-level and kernel-level statistics, which are instrumental for understanding exactly where most of the time and compute are spent. For example, in this case, we can see that most of the time is spent on data input operations, so optimizing those may result in significant speed gains.
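The idea behind operation-level statistics can be sketched in a few lines: sum the time recorded for each operation and rank operations by their share of the total. The function `time_share_by_operation`, the operation names and the durations below are invented for illustration and do not come from a real profile.

```python
from collections import defaultdict


def time_share_by_operation(events):
    """Aggregate (operation, duration_ms) events into each operation's share of total time."""
    totals = defaultdict(float)
    for op_name, duration_ms in events:
        totals[op_name] += duration_ms
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    shares = {op: t / grand_total for op, t in totals.items()}
    # Most expensive operation first.
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)


# Hypothetical trace: operation names and durations are invented for illustration.
events = [
    ("data_input", 60.0),
    ("matmul", 15.0),
    ("data_input", 60.0),
    ("conv2d", 25.0),
    ("matmul", 15.0),
    ("conv2d", 25.0),
]

ranking = time_share_by_operation(events)
for op_name, share in ranking:
    print(f"{op_name:<12}{share:6.1%}")
```

In this made-up trace the data input operation tops the ranking, which is the kind of signal that would point to the input pipeline as the first thing to optimize.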

[Image: Operations]

To see how runs perform over time, use the Metrics dashboard.

[Image: Metrics]

Add Graphsignal to all your ML workloads, whether they are run manually or executed periodically in ML pipelines, so that your team always has access to the automatically recorded profiles and metrics.

As you have seen, trying it out is easy. See the Quick Start Guide for instructions, or learn more at graphsignal.com.