Machine Learning Profiler for Training and Inference
By Dmitri Melikyan | 2 min read

Optimized machine learning training and inference lead to faster model iterations, reduced costs and a better user experience.

What is a Machine Learning Profiler?

Profilers are essential tools for optimizing and troubleshooting application speed, latency and resource consumption. They help reduce computation costs, fix performance issues and improve user experience. Such improvements benefit machine learning profoundly: training jobs that run for hours or days can be shortened considerably, and inference latency can be reduced, resulting in significantly lower costs and a better user experience.

Similar to traditional profilers, a machine learning profiler provides execution statistics; however, it focuses on ML operations and compute kernels instead of plain method calls. Additionally, ML profilers report GPU utilization information relevant in a machine learning context.

Graphsignal Profiler

TensorFlow and PyTorch provide built-in ML profilers, which use the NVIDIA® CUDA® Profiling Tools Interface (CUPTI) under the hood for GPU profiling. One way to use these profilers is via a locally installed TensorBoard. Graphsignal, in turn, builds on the built-in profilers as well as other tools to enable automatic profiling in any environment, including notebooks, training pipelines, periodic batch jobs and model serving, without installing additional software. It also allows teams to share profiles and collaborate online.
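For comparison, here is a minimal sketch of PyTorch's built-in profiler writing a trace that TensorBoard's profiler plugin can display; the model, input and log directory are arbitrary choices for illustration:

import torch
from torch.profiler import profile, ProfilerActivity, tensorboard_trace_handler

# Profile CPU activity, and CUDA kernels when a GPU is present.
activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    activities.append(ProfilerActivity.CUDA)

model = torch.nn.Linear(1, 1)
x = torch.randn(100, 1)

# Write the trace to ./logs, then inspect it with:
#   tensorboard --logdir ./logs
with profile(activities=activities,
             on_trace_ready=tensorboard_trace_handler('./logs')) as prof:
    model(x)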

Getting Started

Setting up Graphsignal Profiler is simple. Just follow the few instructions in the Quick Start Guide.
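Assuming the package is published on PyPI under the name graphsignal, installation comes down to a single command before configuring the API key:

pip install graphsignal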

Here is a simple model training example with the profiler enabled:

import torch

import graphsignal
from graphsignal.profilers.pytorch import profile_step

# Configure the profiler with an API key and a name for this workload.
graphsignal.configure(api_key='my_key', workload_name='training_example')

# Synthetic data for a simple linear regression task.
x = torch.arange(-5, 5, 0.1).view(-1, 1)
y = -5 * x + 0.1 * torch.randn(x.size())

model = torch.nn.Linear(1, 1)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(10):
    # Profile the training step; recorded profiles are reported
    # to the Graphsignal Dashboard.
    with profile_step():
        y1 = model(x)
        loss = criterion(y1, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

After running this example application, profiles will be available and ready for analysis in the Graphsignal Dashboard.

Profile view

In addition to operation and kernel statistics, profiles contain resource usage information, such as GPU and memory utilization, power usage and temperature, collected prior to recording a profile. This information is essential when benchmarking performance optimizations and working toward full resource utilization and efficiency.

Resource usage subsection
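For a sense of how such metrics can be read programmatically, here is a hedged sketch using the nvidia-ml-py (pynvml) bindings; Graphsignal's own collection mechanism may differ, and this requires an NVIDIA GPU with drivers installed:

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU

# Utilization, memory, power and temperature, as surfaced in the profile view.
util = pynvml.nvmlDeviceGetUtilizationRates(handle)
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0  # milliwatts to watts
temp_c = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)

print(f'GPU utilization: {util.gpu}%, memory used: {mem.used / 1e9:.2f} GB')
print(f'power: {power_w:.1f} W, temperature: {temp_c} C')

pynvml.nvmlShutdown()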

Learn more about Graphsignal or contact us for a quick demo.