API Reference

graphsignal.configure

configure(
    api_key: str = None, 
    workload_name: str = None, 
    run_id: Optional[str] = None, 
    node_rank: Optional[int] = None, 
    local_rank: Optional[int] = None, 
    world_rank: Optional[int] = None, 
    debug_mode: Optional[bool] = False) -> None

Configures and initializes the profiler.

All arguments can also be passed via environment variables: GRAPHSIGNAL_{ARG_NAME}. Arguments passed directly to the function take precedence.

Arguments:

  • api_key: The access key for communication with the Graphsignal cloud.
  • workload_name: Workload name that uniquely identifies this script, job or application.
  • run_id: Unique identifier for a group of worker processes in a distributed workload, e.g. multi-process or multi-node training. It is used to internally group and aggregate profiles collected from different worker processes. See Distributed Workloads section for more information.
  • node_rank: Rank of the worker node.
  • local_rank: Rank of the worker process within the node.
  • world_rank: Global rank of the worker process.
  • debug_mode: Enable/disable debug output.
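
For example, a minimal initialization at the start of a script might look as follows; the key and workload name below are placeholders:

import graphsignal

# Initialize the profiler once, early in the script. Both values
# can also come from GRAPHSIGNAL_API_KEY and GRAPHSIGNAL_WORKLOAD_NAME.
graphsignal.configure(
    api_key='my-api-key',
    workload_name='training-example')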

graphsignal.log_parameter

log_parameter(name: str, value: Any) -> None

Logs a run parameter, e.g. a hyperparameter or script argument, to be included in the profiles. Providing parameters enables tracking changes between multiple runs or deployments.

Arguments:

  • name: Parameter name.
  • value: Parameter value. If not of type str, the value will be converted using str().

Raises:

  • ValueError - When arguments are missing or invalid.
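
For example, hyperparameters could be recorded right after initialization; the names and values below are illustrative:

# Record hyperparameters so changes are tracked between runs.
graphsignal.log_parameter('batch_size', 64)
graphsignal.log_parameter('learning_rate', 0.01)  # non-str values are converted with str()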

graphsignal.log_metric

log_metric(name: str, value: Union[int, float]) -> None

Logs a run metric, e.g. current accuracy or loss, to be included in the profiles. Providing metrics enables tracking metric dynamics between multiple runs or deployments.

Arguments:

  • name: Metric name.
  • value: Metric value.

Raises:

  • ValueError - When arguments are missing or invalid.
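
For example, metrics could be logged as they are computed during a run; the values below are illustrative:

# Record numeric metrics; values must be int or float.
graphsignal.log_metric('accuracy', 0.91)
graphsignal.log_metric('loss', 0.27)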

graphsignal.shutdown

shutdown() -> None

Cleans up and shuts down the profiler.

Normally, this method is called automatically when the Python script exits. Use it if you want to explicitly clean up and shut down the profiler earlier.
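
For example, a process that finishes its profiled work long before exiting could shut the profiler down explicitly:

# Explicitly flush and stop the profiler instead of relying
# on the automatic call at interpreter exit.
graphsignal.shutdown()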

graphsignal.profilers.tensorflow.profile_step

profile_step(
    phase_name: Optional[str] = None, 
    effective_batch_size: Optional[int] = None, 
    ensure_profile: Optional[bool] = False) -> ProfilingStep

Starts the TensorFlow profiler for a step, e.g. a training batch or prediction. Only some steps will be profiled; the profiler decides which steps to profile for optimal statistics and low overhead.

The returned ProfilingStep object can be used as a context manager in a with statement around the profiled code. Otherwise, its stop() method should be called.

Arguments:

  • phase_name: Name of the run phase, e.g. 'training', 'test' or 'prediction'. Used to group and aggregate profiles.
  • effective_batch_size: The number of samples processed in one step by all devices. It is used to calculate run speed.
  • ensure_profile: Enforce profiling for the current step. The number of ensured profiles is limited.

Returns:

  • ProfilingStep - a step object representing the current profiling activity.
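
A sketch of a typical usage in a training loop; dataset and train_step are assumed to be defined elsewhere:

from graphsignal.profilers.tensorflow import profile_step

for batch in dataset:
    # The profiler decides internally whether this particular step is profiled.
    with profile_step(phase_name='training', effective_batch_size=128):
        train_step(batch)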

graphsignal.profilers.pytorch.profile_step

profile_step(
    phase_name: Optional[str] = None, 
    effective_batch_size: Optional[int] = None, 
    ensure_profile: Optional[bool] = False) -> ProfilingStep

Starts the PyTorch profiler for a step, e.g. a training batch or prediction. Only some steps will be profiled; the profiler decides which steps to profile for optimal statistics and low overhead.

The returned ProfilingStep object can be used as a context manager in a with statement around the profiled code. Otherwise, its stop() method should be called.

Arguments:

  • phase_name: Name of the run phase, e.g. 'training', 'test' or 'prediction'. Used to group and aggregate profiles.
  • effective_batch_size: The number of samples processed in one step by all devices. It is used to calculate run speed.
  • ensure_profile: Enforce profiling for the current step. The number of ensured profiles is limited.

Returns:

  • ProfilingStep - a step object representing the current profiling activity.
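
A sketch of a typical usage in a PyTorch training loop; loader, model, criterion and optimizer are assumed to be defined elsewhere:

from graphsignal.profilers.pytorch import profile_step

for inputs, labels in loader:
    # Steps are profiled selectively; unprofiled steps add minimal overhead.
    with profile_step(phase_name='training', effective_batch_size=inputs.size(0)):
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        optimizer.step()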

graphsignal.profilers.generic.profile_step

profile_step(
    phase_name: Optional[str] = None, 
    effective_batch_size: Optional[int] = None, 
    ensure_profile: Optional[bool] = False) -> ProfilingStep

Starts a generic profiler for a step, e.g. a training batch or prediction. ML operation and kernel statistics are not supported by the generic profiler.

The returned ProfilingStep object can be used as a context manager in a with statement around the profiled code. Otherwise, its stop() method should be called.

Arguments:

  • phase_name: Name of the run phase, e.g. 'training', 'test' or 'prediction'. Used to group and aggregate profiles.
  • effective_batch_size: The number of samples processed in one step by all devices. It is used to calculate run speed.
  • ensure_profile: Enforce profiling for the current step. The number of ensured profiles is limited.

Returns:

  • ProfilingStep - a step object representing the current profiling activity.
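
When a with block is inconvenient, stop() can be called on the returned object instead; predict and inputs below are assumptions:

from graphsignal.profilers.generic import profile_step

step = profile_step(phase_name='prediction')
predictions = predict(inputs)
step.stop()  # must be called if no with statement is used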

graphsignal.ProfilingStep

A ProfilingStep object represents the current profiling activity. It is returned by framework-specific profilers and should not be instantiated directly.

graphsignal.ProfilingStep.set_effective_batch_size

set_effective_batch_size(effective_batch_size: int) -> None

Sets the effective batch size for the current step.

Arguments:

  • effective_batch_size: Effective batch size is the number of samples processed in one step by all devices. It is used to calculate run speed.

graphsignal.ProfilingStep.stop

stop() -> None

Stops profiling for the current step, if profiling is active. This method is called automatically when a with statement is used around the profiled code.
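
For example, when the batch size only becomes known inside the step, it can be set on the step object before stopping it; batches and train_step below are assumptions:

from graphsignal.profilers.generic import profile_step

step = profile_step(phase_name='training')
batch = next(batches)
step.set_effective_batch_size(len(batch))
train_step(batch)
step.stop()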

graphsignal.profilers.keras.GraphsignalCallback

GraphsignalCallback()

Keras callback interface for automatic profiling of training and/or inference. Only some batches will be profiled; the profiler decides which batches to profile for optimal statistics and low overhead.

Usage: model.fit(..., callbacks=[GraphsignalCallback()]) or model.predict(..., callbacks=[GraphsignalCallback()]).

See Model class for more information on adding callbacks.
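
A sketch of profiling Keras training; model, x_train and y_train are assumed to be defined elsewhere:

from graphsignal.profilers.keras import GraphsignalCallback

# The callback profiles selected batches automatically.
model.fit(x_train, y_train,
          batch_size=64,
          epochs=10,
          callbacks=[GraphsignalCallback()])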

graphsignal.profilers.pytorch_lightning.GraphsignalCallback

GraphsignalCallback(batch_size: Optional[int] = None)

PyTorch Lightning callback for automatic profiling of training. Only some batches will be profiled; the profiler decides which batches to profile for optimal statistics and low overhead.

Usage: Trainer(..., callbacks=[GraphsignalCallback()]).

See Trainer class for more information on adding callbacks.

Arguments:

  • batch_size: Batch size of the DataLoader. It is used to calculate run speed.
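
A sketch of profiling Lightning training; model is assumed to be a LightningModule and the batch size is illustrative:

from pytorch_lightning import Trainer
from graphsignal.profilers.pytorch_lightning import GraphsignalCallback

trainer = Trainer(
    max_epochs=10,
    callbacks=[GraphsignalCallback(batch_size=64)])
trainer.fit(model)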

graphsignal.profilers.huggingface.GraphsignalPTCallback

GraphsignalPTCallback()

Hugging Face PyTorch callback for automatic profiling of training. Only some steps will be profiled; the profiler decides which steps to profile for optimal statistics and low overhead.

Usage: Trainer(..., callbacks=[GraphsignalPTCallback()]) or trainer.add_callback(GraphsignalPTCallback()).

See Trainer class for more information on adding callbacks.
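
A sketch of profiling Hugging Face training with PyTorch; model, training_args and train_dataset are assumed to be defined elsewhere:

from transformers import Trainer
from graphsignal.profilers.huggingface import GraphsignalPTCallback

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    callbacks=[GraphsignalPTCallback()])
trainer.train()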

graphsignal.profilers.huggingface.GraphsignalTFCallback

GraphsignalTFCallback()

Hugging Face TensorFlow callback for automatic profiling of training. Only some steps will be profiled; the profiler decides which steps to profile for optimal statistics and low overhead.

Usage: Trainer(..., callbacks=[GraphsignalTFCallback()]) or trainer.add_callback(GraphsignalTFCallback()).

See Trainer class for more information on adding callbacks.
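
The TensorFlow variant is attached the same way, for example via add_callback() on an existing trainer object (the trainer itself is assumed):

from graphsignal.profilers.huggingface import GraphsignalTFCallback

trainer.add_callback(GraphsignalTFCallback())
trainer.train()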