Manual Profiling
Profiling Python functions
See the Quick Start for instructions on installing and configuring the Graphsignal SDK.
The Graphsignal Python SDK allows profiling of specific functions to monitor and analyze their performance over time. Some important functions from supported frameworks are profiled automatically. Profiling can be enabled for a function either by registering the function directly or by specifying the function via its import path.
Profiling is available in Python 3.12+.
To profile a function, register it using profile_function() before the function is called:
def slow_transform(x):
    ...

graphsignal.profile_function(func=slow_transform, category='transform', event_name='data-transform')

slow_transform(data)
You can also profile methods and async functions:
class Worker:
    async def process(self):
        ...

graphsignal.profile_function(func=Worker.process)
To avoid importing the function being profiled directly, or when applying profiling across modules, use profile_function_path():
graphsignal.profile_function_path(path='myapp.tasks.prepare_data', category='preprocessing')
The function referenced by the import path will be resolved at runtime and profiled whenever executed within a traced span.
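Runtime resolution of a dotted import path works like a standard Python import followed by an attribute lookup. A minimal sketch of the idea (an illustration only, not the SDK's implementation; it handles a single module-level attribute, not nested attributes such as methods):

```python
import importlib

def resolve_function_path(path):
    # Split at the last dot into a module path and an attribute name,
    # import the module, and look up the attribute on it.
    module_name, _, attr_name = path.rpartition('.')
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)

# For example, 'json.dumps' resolves to the json.dumps function:
fn = resolve_function_path('json.dumps')
print(fn({'a': 1}))  # -> {"a": 1}
```

Because the lookup happens only when the path is resolved, the target module does not need to be imported at registration time.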
Profiling CUDA kernels
When profiling GPU workloads, CUDA kernel names are grouped by kernel patterns so that related kernels (e.g. all cuBLAS GEMM kernels) appear under a single event in the profile. You can register custom patterns with profile_cuda_kernel():
# Group kernels whose names contain "cublas" or "my_custom_kernel" under one event
graphsignal.profile_cuda_kernel(kernel_pattern="cublas", event_name="custom_kernels")
graphsignal.profile_cuda_kernel(kernel_pattern="my_custom_kernel", event_name="custom_kernels")
- kernel_pattern — Substring to match against kernel names (case-insensitive). Any kernel whose name contains the pattern is grouped under the given event.
- event_name — Display name for the group in the profile (e.g. gpu.compute); used for duration and call count.
Register patterns before the kernels run (e.g. at startup or before the first GPU work). Multiple patterns can share the same event_name; all matching kernels are aggregated under that event. The SDK already registers built-in groups (e.g. NCCL, matmul/GEMM, attention); use profile_cuda_kernel() to add or override grouping for your own kernels.
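The case-insensitive substring matching described above can be illustrated with a small sketch (a hypothetical illustration of the documented matching rule, not the SDK's internal code):

```python
def group_kernel_event(kernel_name, patterns):
    """Return the event name for a kernel, using case-insensitive
    substring matching against registered (pattern, event_name) pairs."""
    name = kernel_name.lower()
    for pattern, event_name in patterns:
        if pattern.lower() in name:
            return event_name
    # Fall back to the raw kernel name when no pattern matches
    # (illustration only).
    return kernel_name

# Both registered patterns map to the same event, so matching
# kernels are aggregated under 'custom_kernels':
patterns = [
    ('cublas', 'custom_kernels'),
    ('my_custom_kernel', 'custom_kernels'),
]
group_kernel_event('ampere_sgemm_cublas_128x64', patterns)  # -> 'custom_kernels'
```

In this sketch, a kernel matching either pattern is attributed to the shared event, which mirrors how multiple profile_cuda_kernel() registrations with the same event_name aggregate their durations and call counts.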