noether.core.callbacks.base

Classes

CallbackBase

Base class for callbacks that execute something before/after training.

Module Contents

class noether.core.callbacks.base.CallbackBase(trainer, model, data_container, tracker, log_writer, checkpoint_writer, metric_property_provider, name=None)

Base class for callbacks that execute something before/after training.

Allows overriding before_training and after_training.

If the callback is stateful (i.e., it tracks something across the training process that needs to be loaded if the run is resumed), there are two ways to implement loading the callback state:

  • state_dict: write the current state into a state dict. When the trainer saves a checkpoint to disk, it also stores the state_dict of all callbacks within the trainer state_dict. Once a run is resumed, a callback can load its state from the previously stored state_dict by overriding load_state_dict.

  • resume_from_checkpoint: if a callback stores large files to disk, it would be redundant to also store them within its state_dict. Therefore, this method is called on resume to allow callbacks to load their state from files on the disk.
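The state_dict/load_state_dict round trip can be sketched as follows. The minimal CallbackBase stub stands in for the real base class so the example runs on its own; BestAccuracyCallback is a hypothetical stateful callback.

```python
class CallbackBase:
    """Minimal stand-in for noether's CallbackBase (illustration only)."""

    def state_dict(self):
        # Non-stateful by default.
        return None

    def load_state_dict(self, state_dict):
        # Non-stateful by default: nothing to restore.
        pass


class BestAccuracyCallback(CallbackBase):
    """Tracks the best accuracy seen so far, surviving run resumption."""

    def __init__(self):
        self.best_accuracy = 0.0

    def state_dict(self):
        # Saved inside the trainer state_dict whenever a checkpoint is written.
        return {"best_accuracy": self.best_accuracy}

    def load_state_dict(self, state_dict):
        # Called with the previously saved state when the run resumes.
        self.best_accuracy = state_dict["best_accuracy"]


# Round trip: save state from one instance, restore it into a fresh one.
saved = BestAccuracyCallback()
saved.best_accuracy = 0.9
restored = BestAccuracyCallback()
restored.load_state_dict(saved.state_dict())
```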

Callbacks have access to a LogWriter, which they can use to log metrics. The LogWriter is a singleton.

Examples

# THIS IS INSIDE A CUSTOM CALLBACK

# log only to experiment tracker, not stdout
self.writer.add_scalar(key="classification_accuracy", value=0.2)
# log to experiment tracker and stdout (as "0.24")
self.writer.add_scalar(
    key="classification_accuracy",
    value=0.23623,
    logger=self.logger,
    format_str=".2f",
)

Note

As evaluations are almost always performed within torch.no_grad() contexts, the hooks implemented by callbacks are always executed within a torch.no_grad() context.

Parameters:
trainer: noether.training.trainers.BaseTrainer

Trainer of the current run. Can be used to access training state.

model: noether.core.models.ModelBase

Model of the current run. Can be used to access model parameters.

data_container: noether.data.container.DataContainer

Data container of the current run. Can be used to access all datasets.

tracker: noether.core.trackers.BaseTracker

Tracker of the current run. Can be used for direct access to the experiment tracking platform.

log_writer: noether.core.writers.LogWriter

Log writer of the current run. Can be used to log metrics to stdout/disk/online platform.

metric_property_provider: noether.core.providers.metric_property.MetricPropertyProvider

Metric property provider of the current run. Defines properties of metrics (e.g., whether higher values are better).

checkpoint_writer: noether.core.writers.CheckpointWriter

Checkpoint writer of the current run. Can be used to store checkpoints during training.

name = None
state_dict()

If a callback is stateful, its state is saved whenever a checkpoint is written to disk.

Returns:

State of the callback. By default, callbacks are non-stateful and return None.

Return type:

dict[str, torch.Tensor] | None

load_state_dict(state_dict)

If a callback is stateful, its state is saved whenever a checkpoint is written to disk and can be loaded with this method upon resuming a run.

Parameters:

state_dict (dict[str, Any]) – State to be loaded. By default, callbacks are non-stateful and load_state_dict does nothing.

Return type:

None

resume_from_checkpoint(resumption_paths, model)

If a callback stores large files to disk and is stateful (e.g., an EMA of the model), it would be unnecessarily wasteful to also store the state in the callback's state_dict. Therefore, resume_from_checkpoint is called when resuming a run, which allows callbacks to load their state from any file that was stored on the disk.

Parameters:

resumption_paths – Paths to the files stored by the run being resumed.

model (noether.core.models.ModelBase) – Model of the current run.

Return type:

None
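A minimal sketch of the disk-based pattern, assuming for illustration that resumption_paths points at a directory and that the callback previously wrote a file named "ema.json" there (both are assumptions, not part of the documented API):

```python
import json
import os
import tempfile


class EmaCallback:
    """Sketch of a callback that keeps its (large) EMA state on disk."""

    def __init__(self):
        self.ema_state = None

    def state_dict(self):
        # The EMA state already lives on disk, so storing it again in the
        # trainer state_dict would be redundant.
        return None

    def resume_from_checkpoint(self, resumption_paths, model):
        # Reload the previously written file instead of a state_dict entry.
        with open(os.path.join(resumption_paths, "ema.json")) as f:
            self.ema_state = json.load(f)


# Demo: write a fake EMA file, then "resume" from it.
checkpoint_dir = tempfile.mkdtemp()
with open(os.path.join(checkpoint_dir, "ema.json"), "w") as f:
    json.dump({"decay": 0.999}, f)

callback = EmaCallback()
callback.resume_from_checkpoint(checkpoint_dir, model=None)
```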

property logger: logging.Logger

Logger for logging to stdout.

Return type:

logging.Logger

before_training(*, update_counter)

Hook called once before the training loop starts.

This method is intended to be overridden by derived classes to perform initialization tasks before training begins. Common use cases include:

  • Initializing experiment tracking (e.g., logging hyperparameters)

  • Printing model summaries or architecture details

  • Initializing specific data structures or buffers needed during training

  • Performing sanity checks on the data or configuration

Note

This method is executed within a torch.no_grad() context.

Parameters:

update_counter (noether.core.utils.training.counter.UpdateCounter) – UpdateCounter instance to access current training progress.

Return type:

None
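A sketch of the hyperparameter-logging use case, with a plain list standing in for the real tracker/log-writer calls (HyperparamLoggingCallback and its attributes are hypothetical):

```python
class HyperparamLoggingCallback:
    """Sketch of a before_training hook that records hyperparameters once."""

    def __init__(self, hyperparams):
        self.hyperparams = hyperparams
        self.logged = []

    def before_training(self, *, update_counter):
        # Runs once before the training loop starts (inside torch.no_grad()
        # in a real run); a good place for one-time setup and sanity checks.
        for key, value in sorted(self.hyperparams.items()):
            self.logged.append(f"{key}={value}")


cb = HyperparamLoggingCallback({"lr": 3e-4, "batch_size": 256})
cb.before_training(update_counter=None)
```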

after_training(*, update_counter)

Hook called once after the training loop finishes.

This method is intended to be overridden by derived classes to perform cleanup or final reporting tasks after training is complete. Common use cases include:

  • Performing a final evaluation on the test set

  • Saving final model weights or artifacts

  • Sending notifications (e.g., via Slack or email) about the completed run

  • Closing or finalizing experiment tracking sessions

Note

This method is executed within a torch.no_grad() context.

Parameters:

update_counter (noether.core.utils.training.counter.UpdateCounter) – UpdateCounter instance to access current training progress.

Return type:

None
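The final-evaluation use case can be sketched as below; evaluate_fn is a hypothetical stand-in for whatever evaluation the callback runs on the trainer's model and data:

```python
class FinalEvalCallback:
    """Sketch of an after_training hook that runs a final test-set evaluation."""

    def __init__(self, evaluate_fn):
        self.evaluate_fn = evaluate_fn
        self.final_metrics = None

    def after_training(self, *, update_counter):
        # Runs once after the training loop finishes (inside torch.no_grad()
        # in a real run); results could then be logged via the LogWriter.
        self.final_metrics = self.evaluate_fn()


cb = FinalEvalCallback(lambda: {"test_accuracy": 0.91})
cb.after_training(update_counter=None)
```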