noether.training.callbacks.profiler

Classes

PyTorchProfilerCallback

Profiles the training loop with torch.profiler.profile.

Module Contents

class noether.training.callbacks.profiler.PyTorchProfilerCallback(callback_config, **kwargs)

Bases: noether.core.callbacks.periodic.PeriodicCallback

Profiles the training loop with torch.profiler.profile.

The profiler is entered in before_training(), stepped once per optimizer update in track_after_update_step(), and exited in after_training(). Traces are written to <run_output_path>/<trace_subdir> via tensorboard_trace_handler and can be loaded in TensorBoard (tensorboard --logdir <path>) or inspected in chrome://tracing.
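The same lifecycle can be reproduced with torch.profiler directly. The sketch below is illustrative (output directory, step count, and workload are placeholders, not noether code); it mirrors what the callback does in its three hooks:

```python
import torch
from torch.profiler import ProfilerActivity, profile, schedule, tensorboard_trace_handler

prof = profile(
    activities=[ProfilerActivity.CPU],  # add ProfilerActivity.CUDA for GPU runs
    schedule=schedule(wait=1, warmup=1, active=3, repeat=2),
    on_trace_ready=tensorboard_trace_handler("./traces"),  # illustrative output dir
    record_shapes=True,
)

prof.start()            # what before_training() does
for step in range(10):  # stand-in for the training loop
    x = torch.randn(64, 64)
    (x @ x).sum()       # stand-in for forward/backward/optimizer work
    prof.step()         # what track_after_update_step() does, once per update
prof.stop()             # what after_training() does
```

The resulting *.pt.trace.json files under ./traces can then be opened with tensorboard --logdir ./traces or in chrome://tracing.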

Note

every_n_updates is set to 1 in the example below by convention, but any every_n_* value works: it gates only the unused periodic_callback() hook, not the tracking hooks, so track_after_update_step() is called on every update regardless.

Example

callbacks:
  - kind: callbacks.PyTorchProfilerCallback
    every_n_updates: 1
    wait: 1
    warmup: 1
    active: 3
    repeat: 2
    record_shapes: true
    profile_memory: false
    with_stack: false
    with_flops: false
    with_modules: true
    activities:
      - cpu
      - cuda
before_training(*, update_counter)

Hook called once before the training loop starts.

This method is intended to be overridden by derived classes to perform initialization tasks before training begins. Common use cases include:

  • Initializing experiment tracking (e.g., logging hyperparameters)

  • Printing model summaries or architecture details

  • Initializing specific data structures or buffers needed during training

  • Performing sanity checks on the data or configuration

Note

This method is executed within a torch.no_grad() context.

Parameters:

update_counter (noether.core.utils.training.counter.UpdateCounter) – UpdateCounter instance to access current training progress.

Return type:

None

track_after_update_step(*, update_counter, times)

Hook called after each optimizer update step.

This method is invoked after a successful optimizer step and parameter update. It is typically used for tracking metrics that should be recorded once per update cycle, such as:

  • Latest loss values

  • Learning rates

  • Model parameter statistics (norms, etc.)

  • Training throughput and timing measurements

Unlike periodic_callback(), this hook is called on every update step, making it suitable for maintaining running averages or high-frequency telemetry.
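A minimal pure-Python sketch of this dispatch (illustrative only, not the noether implementation): the tracking hook fires on every update, while periodic_callback() is gated by every_n_updates:

```python
class PeriodicDispatchSketch:
    """Illustrative stand-in for how a periodic callback's hooks are driven."""

    def __init__(self, every_n_updates):
        self.every_n_updates = every_n_updates
        self.calls = []

    def track_after_update_step(self, update):
        self.calls.append(("track", update))      # invoked on every update

    def periodic_callback(self, update):
        self.calls.append(("periodic", update))   # invoked only every n-th update

    def after_update(self, update):
        self.track_after_update_step(update)
        if update % self.every_n_updates == 0:
            self.periodic_callback(update)

cb = PeriodicDispatchSketch(every_n_updates=3)
for update in range(1, 7):
    cb.after_update(update)

tracked = [u for kind, u in cb.calls if kind == "track"]
periodic = [u for kind, u in cb.calls if kind == "periodic"]
# tracked covers every update 1..6; periodic fires only at updates 3 and 6
```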

Note

This method is executed within a torch.no_grad() context.

Parameters:

update_counter (noether.core.utils.training.counter.UpdateCounter) – UpdateCounter instance to access current training progress.

times – timing measurements for the current update step.

Return type:

None

after_training(*, update_counter)

Hook called once after the training loop finishes.

This method is intended to be overridden by derived classes to perform cleanup or final reporting tasks after training is complete. Common use cases include:

  • Performing a final evaluation on the test set

  • Saving final model weights or artifacts

  • Sending notifications (e.g., via Slack or email) about the completed run

  • Closing or finalizing experiment tracking sessions

Note

This method is executed within a torch.no_grad() context.

Parameters:

update_counter (noether.core.utils.training.counter.UpdateCounter) – UpdateCounter instance to access current training progress.

Return type:

None