API Reference
graphsignal.configure
configure(
api_key: str = None,
workload_name: str = None,
run_id: Optional[str] = None,
node_rank: Optional[int] = None,
local_rank: Optional[int] = None,
world_rank: Optional[int] = None,
debug_mode: Optional[bool] = False) -> None
Configures and initializes the profiler.
All arguments can also be passed via environment variables: GRAPHSIGNAL_{ARG_NAME}. Arguments passed directly to the function take precedence.
Arguments:
api_key: The access key for communication with the Graphsignal cloud.
workload_name: Workload name that uniquely identifies this script, job or application.
run_id: Unique identifier for a group of worker processes in a distributed workload, e.g. multi-process or multi-node training. It is used to internally group and aggregate profiles collected from different worker processes. See the Distributed Workloads section for more information.
node_rank: Rank of the worker node.
local_rank: Rank of the worker process within the node.
world_rank: Global rank of the worker process.
debug_mode: Enable/disable debug output.
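A minimal initialization sketch; the API key and workload name below are placeholders, not real values:

```python
import graphsignal

# Configure once at the start of the script. Any of these arguments can
# instead be supplied via environment variables, e.g. GRAPHSIGNAL_API_KEY.
graphsignal.configure(
    api_key='my-api-key',          # placeholder
    workload_name='training-job',  # placeholder
    debug_mode=False)
```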
graphsignal.log_parameter
log_parameter(name: str, value: Any) -> None
Log any run parameters, e.g. hyperparameters or script arguments, to be included in the profiles. Providing parameters enables tracking changes between multiple runs or deployments.
Arguments:
name: Parameter name.
value: Parameter value. If not of type str, it will be converted using str().
Raises:
ValueError: When arguments are missing or invalid.
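A short sketch of logging run parameters; the key and names are placeholders:

```python
import graphsignal

graphsignal.configure(api_key='my-api-key', workload_name='training-job')  # placeholders

# Non-string values are converted with str() before being recorded.
graphsignal.log_parameter('learning_rate', 0.01)
graphsignal.log_parameter('optimizer', 'adam')
```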
graphsignal.log_metric
log_metric(name: str, value: Union[int, float]) -> None
Log any run metrics, e.g. current accuracy or loss, to be included in the profiles. Providing metrics enables tracking metric dynamics between multiple runs or deployments.
Arguments:
name: Metric name.
value: Metric value.
Raises:
ValueError: When arguments are missing or invalid.
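A sketch of logging metrics during a run; the key, name and values are placeholders:

```python
import graphsignal

graphsignal.configure(api_key='my-api-key', workload_name='training-job')  # placeholders

# Log current metric values; repeated calls allow tracking dynamics
# across runs or deployments.
graphsignal.log_metric('accuracy', 0.92)
graphsignal.log_metric('loss', 0.31)
```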
graphsignal.shutdown
shutdown() -> None
Clean up and shut down the profiler.
Normally, this method is called automatically when the Python script exits. Use it if you want to explicitly clean up and shut down the profiler.
graphsignal.profilers.tensorflow.profile_step
profile_step(
phase_name: Optional[str] = None,
effective_batch_size: Optional[int] = None,
ensure_profile: Optional[bool] = False) -> ProfilingStep
Starts the TensorFlow profiler for a step, e.g. a training batch or prediction. Only some steps will be profiled; the profiler decides which steps to profile for optimal statistics and low overhead.
The with context manager can be used around the profiled code. Otherwise, the stop() method should be called on the returned ProfilingStep object.
Arguments:
phase_name: Name of the run phase, e.g. 'training', 'test' or 'prediction'. Used to group and aggregate profiles.
effective_batch_size: The number of samples processed in one step by all devices. It is used to calculate run speed.
ensure_profile: Enforce profiling for the current step. The number of ensured profiles is limited.
Returns:
ProfilingStep: Step object representing the current profiling activity.
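A sketch of profiling TensorFlow training steps with the context manager; the dataset and train_step function are assumed to exist in the surrounding script:

```python
import graphsignal
from graphsignal.profilers.tensorflow import profile_step

graphsignal.configure(api_key='my-api-key', workload_name='tf-training')  # placeholders

for batch in dataset:  # assumed to exist
    # The profiler decides internally which steps to actually profile.
    with profile_step(phase_name='training', effective_batch_size=128):
        train_step(batch)  # assumed to exist
```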
graphsignal.profilers.pytorch.profile_step
profile_step(
phase_name: Optional[str] = None,
effective_batch_size: Optional[int] = None,
ensure_profile: Optional[bool] = False) -> ProfilingStep
Starts the PyTorch profiler for a step, e.g. a training batch or prediction. Only some steps will be profiled; the profiler decides which steps to profile for optimal statistics and low overhead.
The with context manager can be used around the profiled code. Otherwise, the stop() method should be called on the returned ProfilingStep object.
Arguments:
phase_name: Name of the run phase, e.g. 'training', 'test' or 'prediction'. Used to group and aggregate profiles.
effective_batch_size: The number of samples processed in one step by all devices. It is used to calculate run speed.
ensure_profile: Enforce profiling for the current step. The number of ensured profiles is limited.
Returns:
ProfilingStep: Step object representing the current profiling activity.
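A sketch of profiling a PyTorch training loop; the data loader, model, optimizer and loss function are assumed to exist:

```python
import graphsignal
from graphsignal.profilers.pytorch import profile_step

graphsignal.configure(api_key='my-api-key', workload_name='pt-training')  # placeholders

for inputs, targets in data_loader:  # assumed to exist
    with profile_step(phase_name='training', effective_batch_size=64):
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)  # model, loss_fn assumed to exist
        loss.backward()
        optimizer.step()
```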
graphsignal.profilers.generic.profile_step
profile_step(
phase_name: Optional[str] = None,
effective_batch_size: Optional[int] = None,
ensure_profile: Optional[bool] = False) -> ProfilingStep
Starts a generic profiler for a step, e.g. a training batch or prediction. ML operation and kernel statistics are not supported by the generic profiler.
The with context manager can be used around the profiled code. Otherwise, the stop() method should be called on the returned ProfilingStep object.
Arguments:
phase_name: Name of the run phase, e.g. 'training', 'test' or 'prediction'. Used to group and aggregate profiles.
effective_batch_size: The number of samples processed in one step by all devices. It is used to calculate run speed.
ensure_profile: Enforce profiling for the current step. The number of ensured profiles is limited.
Returns:
ProfilingStep: Step object representing the current profiling activity.
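A sketch of using the generic profiler, e.g. for a framework without a dedicated integration; the predict function and batch are assumed to exist:

```python
import graphsignal
from graphsignal.profilers.generic import profile_step

graphsignal.configure(api_key='my-api-key', workload_name='inference-job')  # placeholders

# Generic profiling: no ML operation or kernel statistics, only step timing.
with profile_step(phase_name='prediction', effective_batch_size=32):
    predictions = predict(batch)  # assumed to exist
```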
graphsignal.ProfilingStep
A ProfilingStep object represents the current profiling activity. It is returned by framework-specific profilers and should not be initialized directly.
graphsignal.ProfilingStep.set_effective_batch_size
set_effective_batch_size(self, effective_batch_size: int) -> None
Sets effective batch size for the current step.
Arguments:
effective_batch_size: The number of samples processed in one step by all devices. It is used to calculate run speed.
graphsignal.ProfilingStep.stop
stop() -> None
Stops profiling for the current step, if profiling is active. This method is called automatically if the with context manager is used around the profiled code.
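A sketch of managing a step manually instead of using the context manager; the model and inputs are assumed to exist:

```python
from graphsignal.profilers.generic import profile_step

step = profile_step(phase_name='prediction')
predictions = model(inputs)                 # model and inputs assumed to exist
step.set_effective_batch_size(len(inputs))  # can be set after the fact
step.stop()                                 # required when no `with` block is used
```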
graphsignal.profilers.keras.GraphsignalCallback
GraphsignalCallback()
Keras callback interface for automatic profiling of training and/or inference. Only some batches will be profiled; the profiler decides which batches to profile for optimal statistics and low overhead.
Usage: model.fit(..., callbacks=[GraphsignalCallback()]) or model.predict(..., callbacks=[GraphsignalCallback()]).
See Model class for more information on adding callbacks.
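A sketch of attaching the Keras callback; the model and training data are assumed to exist:

```python
import graphsignal
from graphsignal.profilers.keras import GraphsignalCallback

graphsignal.configure(api_key='my-api-key', workload_name='keras-training')  # placeholders

# The callback profiles selected batches automatically during fit().
model.fit(x_train, y_train,  # model and data assumed to exist
          epochs=5,
          callbacks=[GraphsignalCallback()])
```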
graphsignal.profilers.pytorch_lightning.GraphsignalCallback
GraphsignalCallback(batch_size: Optional[int] = None)
PyTorch Lightning callback for automatic profiling of training. Only some batches will be profiled; the profiler decides which batches to profile for optimal statistics and low overhead.
Usage: Trainer(..., callbacks=[GraphsignalCallback()]).
See Trainer class for more information on adding callbacks.
Arguments:
batch_size: Batch size of the DataLoader. It is used to calculate run speed.
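A sketch of attaching the PyTorch Lightning callback; the LightningModule is assumed to exist:

```python
import graphsignal
from pytorch_lightning import Trainer
from graphsignal.profilers.pytorch_lightning import GraphsignalCallback

graphsignal.configure(api_key='my-api-key', workload_name='pl-training')  # placeholders

# batch_size matches the DataLoader and is used to calculate run speed.
trainer = Trainer(max_epochs=5,
                  callbacks=[GraphsignalCallback(batch_size=32)])
trainer.fit(model)  # model is assumed to be a LightningModule
```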
graphsignal.profilers.huggingface.GraphsignalPTCallback
GraphsignalPTCallback()
Hugging Face PyTorch callback for automatic profiling of training. Only some steps will be profiled; the profiler decides which step to profile for optimal statistics and low overhead.
Usage: Trainer(..., callbacks=[GraphsignalPTCallback()]) or trainer.add_callback(GraphsignalPTCallback()).
See Trainer class for more information on adding callbacks.
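A sketch of attaching the callback to a Hugging Face PyTorch Trainer; the model, training arguments and dataset are assumed to exist:

```python
import graphsignal
from transformers import Trainer
from graphsignal.profilers.huggingface import GraphsignalPTCallback

graphsignal.configure(api_key='my-api-key', workload_name='hf-training')  # placeholders

trainer = Trainer(model=model,                 # assumed to exist
                  args=training_args,          # assumed to exist
                  train_dataset=train_dataset, # assumed to exist
                  callbacks=[GraphsignalPTCallback()])
# Equivalently: trainer.add_callback(GraphsignalPTCallback())
trainer.train()
```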
graphsignal.profilers.huggingface.GraphsignalTFCallback
GraphsignalTFCallback()
Hugging Face TensorFlow callback for automatic profiling of training. Only some steps will be profiled; the profiler decides which step to profile for optimal statistics and low overhead.
Usage: Trainer(..., callbacks=[GraphsignalTFCallback()]) or trainer.add_callback(GraphsignalTFCallback()).
See Trainer class for more information on adding callbacks.
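A sketch of attaching the TensorFlow callback to a Hugging Face trainer, mirroring the PyTorch variant above; the trainer object is assumed to exist and to accept callbacks as documented:

```python
import graphsignal
from graphsignal.profilers.huggingface import GraphsignalTFCallback

graphsignal.configure(api_key='my-api-key', workload_name='hf-tf-training')  # placeholders

# Either pass the callback at construction time or add it afterwards.
trainer.add_callback(GraphsignalTFCallback())  # trainer assumed to exist
trainer.train()
```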