Natively supported frameworks and technologies

TensorFlow, Keras, PyTorch, PyTorch Lightning, Hugging Face, XGBoost, JAX

Machine learning profiler for full visibility

Benchmark training speed

Compare changes in parameters, speed and compute across runs.

Analyze time distribution

Understand where the most time and resources are spent.

Analyze operation and kernel statistics

Understand which ML operations and compute kernels consume the most time.

See detailed device utilization

Make sure all CPUs and GPUs are utilized as expected.

Monitor speed and usage metrics

Monitor run metrics to catch issues such as memory leaks.

Get visibility into distributed workloads

See all distributed training or inference performance data in one place.

Enable team access

Easily share improvements with team members and others.

Ensure data privacy

Keep data private. No code or data is sent to the Graphsignal cloud, only run statistics and metadata.

