noether.core.callbacks

Submodules

Exceptions

EarlyStopIteration

Custom StopIteration exception for Early Stoppers.

Classes

CallbackBase

Base class for callbacks that execute something before/after training.

BestCheckpointCallback

Callback to save the best model based on a metric.

CheckpointCallback

Callback to save the model and optimizer state periodically.

EmaCallback

Callback for exponential moving average (EMA) of model weights.

DatasetStatsCallback

A callback that logs the length of each dataset in the data container. It is initialized by the BaseTrainer and should not be added manually to the trainer's callbacks.

EtaCallback

Callback to print the progress and estimated duration until the periodic callback will be invoked.

LrCallback

Callback to log the learning rate of the optimizer.

OnlineLossCallback

Callback to track the loss of the model after every gradient accumulation step and log the average loss.

ParamCountCallback

Callback to log the number of trainable and frozen parameters of the model.

PeakMemoryCallback

Callback to log the peak memory usage of the model. It is initialized by the BaseTrainer and should not be added manually to the trainer's callbacks.

ProgressCallback

Callback to print the progress of the training such as number of epochs and updates.

TrainTimeCallback

Callback to log the time spent on dataloading. It is initialized by the BaseTrainer and should not be added manually to the trainer's callbacks.

EarlyStopperBase

Base class for early stoppers that is used to define the interface for early stoppers used by the trainers.

FixedEarlyStopper

Early stopper (training) based on a fixed number of epochs, updates, or samples.

MetricEarlyStopper

Early stopper (training) based on a metric value to be monitored.

BestMetricCallback

A callback that keeps track of the best metric value over a training run for a certain metric (i.e., source_metric_key) while also logging one or more target metrics.

TrackAdditionalOutputsCallback

Callback that is invoked during training after every gradient step to track certain outputs from the update step.

PeriodicCallback

Base class for callbacks that are invoked periodically during training.

PeriodicDataIteratorCallback

Base class for callbacks that perform periodic iterations over a dataset.

Package Contents

class noether.core.callbacks.CallbackBase(trainer, model, data_container, tracker, log_writer, checkpoint_writer, metric_property_provider, name=None)

Base class for callbacks that execute something before/after training.

Allows overriding before_training and after_training.

If the callback is stateful (i.e., it tracks something across the training process that needs to be loaded if the run is resumed), there are two ways to implement loading the callback state:

  • state_dict: write the current state into a state dict. When the trainer saves the current checkpoint to disk, it also stores the state_dict of all callbacks within the trainer state_dict. Once a run is resumed, a callback can load its state from the previously stored state_dict by overriding load_state_dict.

  • resume_from_checkpoint: If a callback is storing large files on disk, it would be redundant to also store them within its state_dict. Therefore, this method is called on resume to allow callbacks to load their state from files on the disk.

Callbacks have access to a LogWriter, with which callbacks can log metrics. The LogWriter is a singleton.

Examples

# THIS IS INSIDE A CUSTOM CALLBACK

# log only to experiment tracker, not stdout
self.writer.add_scalar(key="classification_accuracy", value=0.2)
# log to experiment tracker and stdout (as "0.24")
self.writer.add_scalar(
    key="classification_accuracy",
    value=0.23623,
    logger=self.logger,
    format_str=".2f",
)

Note

As evaluations are pretty much always done in torch.no_grad() contexts, the hooks implemented by callbacks are always executed within a torch.no_grad() context.
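As a hedged sketch of the state_dict/load_state_dict path described above: the CallbackBase stand-in below only mirrors the documented defaults, and RunningLossCallback is a hypothetical stateful callback, not part of the library.

```python
class CallbackBase:
    """Stand-in mirroring the documented defaults of
    noether.core.callbacks.CallbackBase (illustration only)."""

    def state_dict(self):
        return None  # callbacks are non-stateful by default

    def load_state_dict(self, state_dict):
        pass  # no-op by default


class RunningLossCallback(CallbackBase):
    """Hypothetical stateful callback whose state survives run resumption."""

    def __init__(self):
        self.running_loss = 0.0

    def state_dict(self):
        # Stored inside the trainer state_dict whenever a checkpoint is written.
        return {"running_loss": self.running_loss}

    def load_state_dict(self, state_dict):
        # Called with the previously stored state when the run is resumed.
        self.running_loss = state_dict["running_loss"]
```

Because the trainer persists every callback's state_dict alongside its own, no extra file handling is needed for small state like this.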

Parameters:
trainer: noether.training.trainers.BaseTrainer

Trainer of the current run. Can be used to access training state.

model: noether.core.models.ModelBase

Model of the current run. Can be used to access model parameters.

data_container: noether.data.container.DataContainer

Data container of the current run. Can be used to access all datasets.

tracker: noether.core.trackers.BaseTracker

Tracker of the current run. Can be used for direct access to the experiment tracking platform.

writer: noether.core.writers.LogWriter

Log writer of the current run. Can be used to log metrics to stdout/disk/online platform.

metric_property_provider: noether.core.providers.metric_property.MetricPropertyProvider

Metric property provider of the current run. Defines properties of metrics (e.g., whether higher values are better).

checkpoint_writer: noether.core.writers.CheckpointWriter

Checkpoint writer of the current run. Can be used to store checkpoints during training.

name = None
state_dict()

If a callback is stateful, the state will be stored when a checkpoint is stored to the disk.

Returns:

State of the callback. By default, callbacks are non-stateful and return None.

Return type:

dict[str, torch.Tensor] | None

load_state_dict(state_dict)

If a callback is stateful, the state will be stored when a checkpoint is stored to the disk and can be loaded with this method upon resuming a run.

Parameters:

state_dict (dict[str, Any]) – State to be loaded. By default, callbacks are non-stateful and load_state_dict does nothing.

Return type:

None

resume_from_checkpoint(resumption_paths, model)

If a callback stores large files to disk and is stateful (e.g., an EMA of the model), it would be unnecessarily wasteful to also store the state in the callback's state_dict. Therefore, resume_from_checkpoint is called when resuming a run, which allows callbacks to load their state from any file that was stored on the disk.

Parameters:
Return type:

None

property logger: logging.Logger

Logger for logging to stdout.

Return type:

logging.Logger

before_training(*, update_counter)

Hook called once before the training loop starts.

This method is intended to be overridden by derived classes to perform initialization tasks before training begins. Common use cases include:

  • Initializing experiment tracking (e.g., logging hyperparameters)

  • Printing model summaries or architecture details

  • Initializing specific data structures or buffers needed during training

  • Performing sanity checks on the data or configuration

Note

This method is executed within a torch.no_grad() context.

Parameters:

update_counter (noether.core.utils.training.counter.UpdateCounter) – UpdateCounter instance to access current training progress.

Return type:

None

after_training(*, update_counter)

Hook called once after the training loop finishes.

This method is intended to be overridden by derived classes to perform cleanup or final reporting tasks after training is complete. Common use cases include:

  • Performing a final evaluation on the test set

  • Saving final model weights or artifacts

  • Sending notifications (e.g., via Slack or email) about the completed run

  • Closing or finalizing experiment tracking sessions

Note

This method is executed within a torch.no_grad() context.

Parameters:

update_counter (noether.core.utils.training.counter.UpdateCounter) – UpdateCounter instance to access current training progress.

Return type:

None

class noether.core.callbacks.BestCheckpointCallback(callback_config, **kwargs)

Bases: noether.core.callbacks.periodic.PeriodicCallback

Callback to save the best model based on a metric.

This callback monitors a specified metric and saves the model checkpoint whenever a new best value is achieved. It supports storing different model components when using a composite model and can save checkpoints at different tolerance thresholds.

Example config:

callbacks:
  - kind: noether.core.callbacks.BestCheckpointCallback
    name: BestCheckpointCallback
    every_n_epochs: 1
    metric_key: loss/val/total
    model_names:  # only applies when training a CompositeModel
      - encoder
Parameters:
metric_key
model_names
higher_is_better
best_metric_value
save_frozen_weights
tolerances_is_exceeded
tolerance_counter = 0
metric_at_exceeded_tolerance: dict[float, float]
state_dict()

Return the state of the callback for checkpointing.

Returns:

Dictionary containing the best metric value, tolerance tracking state, and counter information.

Return type:

dict[str, Any]

load_state_dict(state_dict)

Load the callback state from a checkpoint.

Note

This modifies the input state_dict in place.

Parameters:

state_dict (dict[str, Any]) – Dictionary containing the saved callback state.

Return type:

None

before_training(*, update_counter)

Validate callback configuration before training starts.

Parameters:

update_counter – The training update counter.

Raises:

NotImplementedError – If resuming training with tolerances is attempted.

Return type:

None

periodic_callback(**_)

Execute the periodic callback to check and save best model.

This method is called at the configured frequency (e.g., every N epochs). It checks if the current metric value is better than the previous best, and if so, saves the model checkpoint. Also tracks tolerance-based checkpoints.

Raises:

KeyError – If the log cache is empty or the metric key is not found.

Return type:

None
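The decision behind "a new best value" can be sketched as follows. This is a hypothetical helper, not the callback's actual code; it only assumes the higher_is_better semantics described above.

```python
def is_new_best(current, best, higher_is_better):
    """Return True if `current` improves on `best` (None means no best yet)."""
    if best is None:
        return True
    return current > best if higher_is_better else current < best
```

For a loss-like metric (higher_is_better=False), a smaller value triggers a checkpoint save; for an accuracy-like metric, a larger one does.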

after_training(**kwargs)

Log the best metric values at different tolerance thresholds after training completes.

Parameters:

**kwargs – Additional keyword arguments (unused).

Return type:

None

class noether.core.callbacks.CheckpointCallback(callback_config, **kwargs)

Bases: noether.core.callbacks.periodic.PeriodicCallback

Callback to save the model and optimizer state periodically.

Example config:

- kind: noether.core.callbacks.CheckpointCallback
  name: CheckpointCallback
  every_n_epochs: 1
  save_weights: true
  save_optim: true
Parameters:
save_weights
save_optim
save_latest_weights
save_latest_optim
model_names
before_training(*, update_counter)

Hook called once before the training loop starts.

This method is intended to be overridden by derived classes to perform initialization tasks before training begins. Common use cases include:

  • Initializing experiment tracking (e.g., logging hyperparameters)

  • Printing model summaries or architecture details

  • Initializing specific data structures or buffers needed during training

  • Performing sanity checks on the data or configuration

Note

This method is executed within a torch.no_grad() context.

Parameters:

update_counter (noether.core.utils.training.UpdateCounter) – UpdateCounter instance to access current training progress.

Return type:

None

periodic_callback(*, interval_type, update_counter, **kwargs)

Hook called periodically based on the configured intervals.

This method is the primary entry point for periodic actions in subclasses. It is triggered when any of the configured intervals (every_n_epochs, every_n_updates, or every_n_samples) are reached.

Subclasses should override this method to implement periodic logic such as:

  • Calculating and logging expensive validation metrics

  • Saving specific model checkpoints or artifacts

  • Visualizing training progress (e.g., plotting samples)

  • Adjusting training hyperparameters or model state

Note

This method is executed within a torch.no_grad() context.

Parameters:
  • interval_type (noether.core.callbacks.periodic.IntervalType) – “epoch”, “update”, “sample” or “eval” indicating which interval triggered this callback.

  • update_counter (noether.core.utils.training.UpdateCounter) – UpdateCounter instance providing details about the current training progress (epoch, update, sample counts).

  • **kwargs – Additional keyword arguments passed from the triggering hook (e.g., from after_epoch() or after_update()).

Return type:

None
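Assuming the hook signature shown above, overriding periodic_callback in a subclass might look like this. The PeriodicCallback stand-in and ProgressPrintCallback are illustrative only; the real base class additionally handles the interval bookkeeping.

```python
class PeriodicCallback:
    """Stand-in for noether.core.callbacks.periodic.PeriodicCallback
    (illustration only; the real base handles interval bookkeeping)."""

    def periodic_callback(self, *, interval_type, update_counter, **kwargs):
        raise NotImplementedError


class ProgressPrintCallback(PeriodicCallback):
    """Hypothetical subclass that records each periodic invocation."""

    def __init__(self):
        self.invocations = []

    def periodic_callback(self, *, interval_type, update_counter, **kwargs):
        # interval_type is "epoch", "update", "sample" or "eval", indicating
        # which configured interval triggered this call.
        self.invocations.append((interval_type, update_counter))
```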

after_training(**_)

Hook called once after the training loop finishes.

This method is intended to be overridden by derived classes to perform cleanup or final reporting tasks after training is complete. Common use cases include:

  • Performing a final evaluation on the test set

  • Saving final model weights or artifacts

  • Sending notifications (e.g., via Slack or email) about the completed run

  • Closing or finalizing experiment tracking sessions

Note

This method is executed within a torch.no_grad() context.

Parameters:

update_counter – UpdateCounter instance to access current training progress.

Return type:

None

class noether.core.callbacks.EmaCallback(callback_config, **kwargs)

Bases: noether.core.callbacks.periodic.PeriodicCallback

Callback for exponential moving average (EMA) of model weights.

Example config:

- kind: noether.core.callbacks.EmaCallback
  every_n_epochs: 10
  save_weights: false
  save_last_weights: false
  save_latest_weights: true
  target_factors:
    - 0.9999
  name: EmaCallback
Parameters:
model_paths
target_factors
save_weights
save_last_weights
save_latest_weights
parameters: dict[tuple[str | None, float], dict[str, torch.Tensor]]
buffers: dict[str | None, dict[str, torch.Tensor]]
resume_from_checkpoint(resumption_paths, model)

Resume EMA state from a checkpoint.

Parameters:
Return type:

None

before_training(**_)

Hook called once before the training loop starts.

This method is intended to be overridden by derived classes to perform initialization tasks before training begins. Common use cases include:

  • Initializing experiment tracking (e.g., logging hyperparameters)

  • Printing model summaries or architecture details

  • Initializing specific data structures or buffers needed during training

  • Performing sanity checks on the data or configuration

Note

This method is executed within a torch.no_grad() context.

Parameters:

update_counter – UpdateCounter instance to access current training progress.

Return type:

None

apply_ema(cur_model, model_path, target_factor)

Apply the EMA update to the current model parameters using a fused, in-place implementation.
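The EMA update itself follows ema ← factor · ema + (1 − factor) · current. A minimal sketch with plain floats is shown below; the actual implementation operates in place on torch tensors (typically via fused element-wise ops), and `ema_update` is a hypothetical helper, not the callback's real method.

```python
def ema_update(ema_params, cur_params, target_factor):
    """Sketch of one EMA step: ema <- factor * ema + (1 - factor) * cur.

    Both arguments map parameter names to values; the real callback
    applies the same formula in place to torch tensors.
    """
    for name, cur in cur_params.items():
        ema_params[name] = (
            target_factor * ema_params[name] + (1.0 - target_factor) * cur
        )
    return ema_params
```

With target_factor close to 1 (e.g., 0.9999 as in the config above), the EMA weights change slowly and smooth out per-update noise in the raw model weights.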

track_after_update_step(**_)

Hook called after each optimizer update step.

This method is invoked after a successful optimizer step and parameter update. It is typically used for tracking metrics that should be recorded once per update cycle, such as:

  • Latest loss values

  • Learning rates

  • Model parameter statistics (norms, etc.)

  • Training throughput and timing measurements

Unlike periodic_callback(), this hook is called on every update step, making it suitable for maintaining running averages or high-frequency telemetry.

Note

This method is executed within a torch.no_grad() context.

Parameters:
  • update_counter – UpdateCounter instance to access current training progress.

  • times – Dictionary containing time measurements for various parts of the training step (e.g., ‘data_time’, ‘forward_time’, ‘backward_time’, ‘update_time’).

Return type:

None

periodic_callback(*, interval_type, update_counter, **_)

Hook called periodically based on the configured intervals.

This method is the primary entry point for periodic actions in subclasses. It is triggered when any of the configured intervals (every_n_epochs, every_n_updates, or every_n_samples) are reached.

Subclasses should override this method to implement periodic logic such as:

  • Calculating and logging expensive validation metrics

  • Saving specific model checkpoints or artifacts

  • Visualizing training progress (e.g., plotting samples)

  • Adjusting training hyperparameters or model state

Note

This method is executed within a torch.no_grad() context.

Parameters:
  • interval_type (noether.core.callbacks.periodic.IntervalType) – “epoch”, “update”, “sample” or “eval” indicating which interval triggered this callback.

  • update_counter – UpdateCounter instance providing details about the current training progress (epoch, update, sample counts).

  • **kwargs – Additional keyword arguments passed from the triggering hook (e.g., from after_epoch() or after_update()).

Return type:

None

after_training(**_)

Hook called once after the training loop finishes.

This method is intended to be overridden by derived classes to perform cleanup or final reporting tasks after training is complete. Common use cases include:

  • Performing a final evaluation on the test set

  • Saving final model weights or artifacts

  • Sending notifications (e.g., via Slack or email) about the completed run

  • Closing or finalizing experiment tracking sessions

Note

This method is executed within a torch.no_grad() context.

Parameters:

update_counter – UpdateCounter instance to access current training progress.

Return type:

None

class noether.core.callbacks.DatasetStatsCallback(trainer, model, data_container, tracker, log_writer, checkpoint_writer, metric_property_provider, name=None)

Bases: noether.core.callbacks.base.CallbackBase

A callback that logs the length of each dataset in the data container. It is initialized by the BaseTrainer and should not be added manually to the trainer’s callbacks.

Parameters:
before_training(**_)

Hook called once before the training loop starts.

This method is intended to be overridden by derived classes to perform initialization tasks before training begins. Common use cases include:

  • Initializing experiment tracking (e.g., logging hyperparameters)

  • Printing model summaries or architecture details

  • Initializing specific data structures or buffers needed during training

  • Performing sanity checks on the data or configuration

Note

This method is executed within a torch.no_grad() context.

Parameters:

update_counter – UpdateCounter instance to access current training progress.

Return type:

None

class noether.core.callbacks.EtaCallback(callback_config, **kwargs)

Bases: noether.core.callbacks.periodic.PeriodicCallback

Callback to print the progress and estimated duration until the periodic callback will be invoked.

It also counts up the current epoch/update/sample totals and reports the average update duration. It is only used in “unmanaged” runs, i.e., it is not used when the run was started via SLURM.

This callback is initialized by the BaseTrainer and should not be added manually to the trainer’s callbacks.

Parameters:
class LoggerWasCalledHandler

Bases: logging.Handler

Handler instances dispatch logging events to specific destinations.

The base handler class. Acts as a placeholder which defines the Handler interface. Handlers can optionally use Formatter instances to format records as desired. By default, no formatter is specified; in this case, the ‘raw’ message as determined by record.message is logged.

Initializes the instance - basically setting the formatter to None and the filter list to empty.

was_called = False
emit(_)

Do whatever it takes to actually log the specified logging record.

This version is intended to be implemented by subclasses and so raises a NotImplementedError.

total_time = 0.0
time_since_last_log = 0.0
handler
before_training(*, update_counter)

Hook called once before the training loop starts.

This method is intended to be overridden by derived classes to perform initialization tasks before training begins. Common use cases include:

  • Initializing experiment tracking (e.g., logging hyperparameters)

  • Printing model summaries or architecture details

  • Initializing specific data structures or buffers needed during training

  • Performing sanity checks on the data or configuration

Note

This method is executed within a torch.no_grad() context.

Parameters:

update_counter (noether.core.utils.training.UpdateCounter) – UpdateCounter instance to access current training progress.

Return type:

None

track_after_update_step(*, update_counter, times)

Hook called after each optimizer update step.

This method is invoked after a successful optimizer step and parameter update. It is typically used for tracking metrics that should be recorded once per update cycle, such as:

  • Latest loss values

  • Learning rates

  • Model parameter statistics (norms, etc.)

  • Training throughput and timing measurements

Unlike periodic_callback(), this hook is called on every update step, making it suitable for maintaining running averages or high-frequency telemetry.

Note

This method is executed within a torch.no_grad() context.

Parameters:
  • update_counter (noether.core.utils.training.UpdateCounter) – UpdateCounter instance to access current training progress.

  • times – Dictionary containing time measurements for various parts of the training step (e.g., ‘data_time’, ‘forward_time’, ‘backward_time’, ‘update_time’).

Return type:

None

periodic_callback(*, interval_type, **_)

Hook called periodically based on the configured intervals.

This method is the primary entry point for periodic actions in subclasses. It is triggered when any of the configured intervals (every_n_epochs, every_n_updates, or every_n_samples) are reached.

Subclasses should override this method to implement periodic logic such as:

  • Calculating and logging expensive validation metrics

  • Saving specific model checkpoints or artifacts

  • Visualizing training progress (e.g., plotting samples)

  • Adjusting training hyperparameters or model state

Note

This method is executed within a torch.no_grad() context.

Parameters:
  • interval_type – “epoch”, “update”, “sample” or “eval” indicating which interval triggered this callback.

  • update_counter – UpdateCounter instance providing details about the current training progress (epoch, update, sample counts).

  • **kwargs – Additional keyword arguments passed from the triggering hook (e.g., from after_epoch() or after_update()).

Return type:

None

after_training(**_)

Hook called once after the training loop finishes.

This method is intended to be overridden by derived classes to perform cleanup or final reporting tasks after training is complete. Common use cases include:

  • Performing a final evaluation on the test set

  • Saving final model weights or artifacts

  • Sending notifications (e.g., via Slack or email) about the completed run

  • Closing or finalizing experiment tracking sessions

Note

This method is executed within a torch.no_grad() context.

Parameters:

update_counter – UpdateCounter instance to access current training progress.

Return type:

None

class noether.core.callbacks.LrCallback(callback_config, trainer, model, data_container, tracker, log_writer, checkpoint_writer, metric_property_provider, name=None)

Bases: noether.core.callbacks.periodic.PeriodicCallback

Callback to log the learning rate of the optimizer.

This callback is initialized by the BaseTrainer and should not be added manually to the trainer’s callbacks.

Parameters:
periodic_callback(**_)

Hook called periodically based on the configured intervals.

This method is the primary entry point for periodic actions in subclasses. It is triggered when any of the configured intervals (every_n_epochs, every_n_updates, or every_n_samples) are reached.

Subclasses should override this method to implement periodic logic such as:

  • Calculating and logging expensive validation metrics

  • Saving specific model checkpoints or artifacts

  • Visualizing training progress (e.g., plotting samples)

  • Adjusting training hyperparameters or model state

Note

This method is executed within a torch.no_grad() context.

Parameters:
  • interval_type – “epoch”, “update”, “sample” or “eval” indicating which interval triggered this callback.

  • update_counter – UpdateCounter instance providing details about the current training progress (epoch, update, sample counts).

  • **kwargs – Additional keyword arguments passed from the triggering hook (e.g., from after_epoch() or after_update()).

Return type:

None

class noether.core.callbacks.OnlineLossCallback(callback_config, **kwargs)

Bases: noether.core.callbacks.periodic.PeriodicCallback

Callback to track the loss of the model after every gradient accumulation step and log the average loss.

This callback is initialized by the BaseTrainer and should not be added manually to the trainer’s callbacks.

Initialize the OnlineLossCallback.

Parameters:
verbose
tracked_losses: collections.defaultdict[str, list[torch.Tensor]]
track_after_accumulation_step(*, losses, **_)

Hook called after each individual gradient accumulation step.

This method is invoked for every batch processed during training, regardless of whether an optimizer update is performed in that step (i.e., when accumulation_steps > 1). It is primarily used for tracking metrics that should be averaged or aggregated across accumulation steps.

Common use cases include:

  • Logging per-batch losses for high-frequency monitoring

  • Accumulating statistics across batches before an optimizer update

  • Implementing custom logging that needs access to individual batch data

Note

This method is generally intended to be called within a torch.no_grad() context by the trainer to ensure no gradients are tracked during logging operations.

Parameters:
  • update_counter – UpdateCounter instance to access current training progress.

  • batch – The current data batch processed in this accumulation step.

  • losses – Dictionary of computed losses for the current batch.

  • update_outputs – Optional dictionary of model outputs for the current batch.

  • accumulation_steps – Total number of accumulation steps before an optimizer update.

  • accumulation_step – The current accumulation step index (0-indexed).

Return type:

None
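The tracking pattern described above can be sketched as follows: collect per-batch losses across accumulation steps, then average and reset when the periodic hook fires. OnlineLossTracker is a simplified hypothetical stand-in; the real callback stores torch tensors in its tracked_losses defaultdict and logs the averages via the LogWriter.

```python
from collections import defaultdict


class OnlineLossTracker:
    """Hypothetical sketch of the OnlineLossCallback tracking pattern."""

    def __init__(self):
        self.tracked_losses = defaultdict(list)

    def track_after_accumulation_step(self, *, losses, **_):
        # Called for every batch, even when no optimizer step happens.
        for key, value in losses.items():
            self.tracked_losses[key].append(value)

    def periodic_callback(self, **_):
        # Average the collected losses and reset; the real callback
        # logs each average through the LogWriter instead of returning it.
        means = {k: sum(v) / len(v) for k, v in self.tracked_losses.items()}
        self.tracked_losses.clear()
        return means
```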

periodic_callback(*, interval_type, **_)

Hook called periodically based on the configured intervals.

This method is the primary entry point for periodic actions in subclasses. It is triggered when any of the configured intervals (every_n_epochs, every_n_updates, or every_n_samples) are reached.

Subclasses should override this method to implement periodic logic such as:

  • Calculating and logging expensive validation metrics

  • Saving specific model checkpoints or artifacts

  • Visualizing training progress (e.g., plotting samples)

  • Adjusting training hyperparameters or model state

Note

This method is executed within a torch.no_grad() context.

Parameters:
  • interval_type (noether.core.callbacks.periodic.IntervalType) – “epoch”, “update”, “sample” or “eval” indicating which interval triggered this callback.

  • update_counter – UpdateCounter instance providing details about the current training progress (epoch, update, sample counts).

  • **kwargs – Additional keyword arguments passed from the triggering hook (e.g., from after_epoch() or after_update()).

Return type:

None

class noether.core.callbacks.ParamCountCallback(trainer, model, data_container, tracker, log_writer, checkpoint_writer, metric_property_provider, name=None)

Bases: noether.core.callbacks.base.CallbackBase

Callback to log the number of trainable and frozen parameters of the model.

This callback is initialized by the BaseTrainer and should not be added manually to the trainer’s callbacks.

Parameters:
before_training(**_)

Hook called once before the training loop starts.

This method is intended to be overridden by derived classes to perform initialization tasks before training begins. Common use cases include:

  • Initializing experiment tracking (e.g., logging hyperparameters)

  • Printing model summaries or architecture details

  • Initializing specific data structures or buffers needed during training

  • Performing sanity checks on the data or configuration

Note

This method is executed within a torch.no_grad() context.

Parameters:

update_counter – UpdateCounter instance to access current training progress.

Return type:

None

class noether.core.callbacks.PeakMemoryCallback(callback_config, trainer, model, data_container, tracker, log_writer, checkpoint_writer, metric_property_provider, name=None)

Bases: noether.core.callbacks.periodic.PeriodicCallback

Callback to log the peak memory usage of the model. It is initialized by the BaseTrainer and should not be added manually to the trainer’s callbacks.

Parameters:
periodic_callback(**__)

Hook called periodically based on the configured intervals.

This method is the primary entry point for periodic actions in subclasses. It is triggered when any of the configured intervals (every_n_epochs, every_n_updates, or every_n_samples) are reached.

Subclasses should override this method to implement periodic logic such as:

  • Calculating and logging expensive validation metrics

  • Saving specific model checkpoints or artifacts

  • Visualizing training progress (e.g., plotting samples)

  • Adjusting training hyperparameters or model state

Note

This method is executed within a torch.no_grad() context.

Parameters:
  • interval_type – “epoch”, “update”, “sample” or “eval” indicating which interval triggered this callback.

  • update_counter – UpdateCounter instance providing details about the current training progress (epoch, update, sample counts).

  • **kwargs – Additional keyword arguments passed from the triggering hook (e.g., from after_epoch() or after_update()).

Return type:

None

class noether.core.callbacks.ProgressCallback(callback_config, trainer, model, data_container, tracker, log_writer, checkpoint_writer, metric_property_provider, name=None)

Bases: noether.core.callbacks.periodic.PeriodicCallback

Callback to print the progress of the training such as number of epochs and updates.

This callback is initialized by the BaseTrainer and should not be added manually to the trainer’s callbacks.

Parameters:
before_training(**_)

Hook called once before the training loop starts.

This method is intended to be overridden by derived classes to perform initialization tasks before training begins. Common use cases include:

  • Initializing experiment tracking (e.g., logging hyperparameters)

  • Printing model summaries or architecture details

  • Initializing specific data structures or buffers needed during training

  • Performing sanity checks on the data or configuration

Note

This method is executed within a torch.no_grad() context.

Parameters:

update_counter – UpdateCounter instance to access current training progress.

Return type:

None

periodic_callback(*, interval_type, update_counter, **_)

Hook called periodically based on the configured intervals.

This method is the primary entry point for periodic actions in subclasses. It is triggered when any of the configured intervals (every_n_epochs, every_n_updates, or every_n_samples) are reached.

Subclasses should override this method to implement periodic logic such as:

  • Calculating and logging expensive validation metrics

  • Saving specific model checkpoints or artifacts

  • Visualizing training progress (e.g., plotting samples)

  • Adjusting training hyperparameters or model state

Note

This method is executed within a torch.no_grad() context.

Parameters:
  • interval_type – “epoch”, “update”, “sample” or “eval” indicating which interval triggered this callback.

  • update_counter (noether.core.utils.training.UpdateCounter) – UpdateCounter instance providing details about the current training progress (epoch, update, sample counts).

  • **kwargs – Additional keyword arguments passed from the triggering hook (e.g., from after_epoch() or after_update()).

Return type:

None

track_after_update_step(*, update_counter, **_)

Hook called after each optimizer update step.

This method is invoked after a successful optimizer step and parameter update. It is typically used for tracking metrics that should be recorded once per update cycle, such as:

  • Latest loss values

  • Learning rates

  • Model parameter statistics (norms, etc.)

  • Training throughput and timing measurements

Unlike periodic_callback(), this hook is called on every update step, making it suitable for maintaining running averages or high-frequency telemetry.

Note

This method is executed within a torch.no_grad() context.

Parameters:
  • update_counter (noether.core.utils.training.UpdateCounter) – UpdateCounter instance to access current training progress.

  • times – Dictionary containing time measurements for various parts of the training step (e.g., ‘data_time’, ‘forward_time’, ‘backward_time’, ‘update_time’).

Return type:

None

class noether.core.callbacks.TrainTimeCallback(callback_config, **kwargs)

Bases: noether.core.callbacks.periodic.PeriodicCallback

Callback to log the time spent on dataloading. Is initialized by the BaseTrainer and should not be added manually to the trainer’s callbacks.

Parameters:
  • train_times (dict[str, list[float]])

  • total_train_times (dict[str, torch.Tensor])
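
As a rough illustration of what these attributes hold, the following sketch accumulates the per-step times dictionary into per-key lists and reduces them to totals. The class name and totals() helper are hypothetical, and plain floats stand in for the torch.Tensors the real callback stores:

```python
from collections import defaultdict


class TimeAccumulator:
    """Hypothetical stand-in for TrainTimeCallback's time bookkeeping."""

    def __init__(self):
        # mirrors the documented train_times: dict[str, list[float]] attribute
        self.train_times: dict = defaultdict(list)

    def track_after_update_step(self, *, times: dict) -> None:
        # append each measured duration under its key ('data_time', ...)
        for key, value in times.items():
            self.train_times[key].append(value)

    def totals(self) -> dict:
        # stand-in for total_train_times (the real class stores torch.Tensors)
        return {key: sum(values) for key, values in self.train_times.items()}


acc = TimeAccumulator()
acc.track_after_update_step(times={"data_time": 0.1, "forward_time": 0.3})
acc.track_after_update_step(times={"data_time": 0.2, "forward_time": 0.4})
print(acc.totals())
```
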
track_after_update_step(*, times, **_)

Hook called after each optimizer update step.

This method is invoked after a successful optimizer step and parameter update. It is typically used for tracking metrics that should be recorded once per update cycle, such as:

  • Latest loss values

  • Learning rates

  • Model parameter statistics (norms, etc.)

  • Training throughput and timing measurements

Unlike periodic_callback(), this hook is called on every update step, making it suitable for maintaining running averages or high-frequency telemetry.

Note

This method is executed within a torch.no_grad() context.

Parameters:
  • update_counter – UpdateCounter instance to access current training progress.

  • times (dict[str, float]) – Dictionary containing time measurements for various parts of the training step (e.g., ‘data_time’, ‘forward_time’, ‘backward_time’, ‘update_time’).

Return type:

None

periodic_callback(**_)

Hook called periodically based on the configured intervals.

This method is the primary entry point for periodic actions in subclasses. It is triggered when any of the configured intervals (every_n_epochs, every_n_updates, or every_n_samples) are reached.

Subclasses should override this method to implement periodic logic such as:

  • Calculating and logging expensive validation metrics

  • Saving specific model checkpoints or artifacts

  • Visualizing training progress (e.g., plotting samples)

  • Adjusting training hyperparameters or model state

Note

This method is executed within a torch.no_grad() context.

Parameters:
  • interval_type – “epoch”, “update”, “sample” or “eval” indicating which interval triggered this callback.

  • update_counter – UpdateCounter instance providing details about the current training progress (epoch, update, sample counts).

  • **kwargs – Additional keyword arguments passed from the triggering hook (e.g., from after_epoch() or after_update()).

Return type:

None

after_training(**_)

Hook called once after the training loop finishes.

This method is intended to be overridden by derived classes to perform cleanup or final reporting tasks after training is complete. Common use cases include:

  • Performing a final evaluation on the test set

  • Saving final model weights or artifacts

  • Sending notifications (e.g., via Slack or email) about the completed run

  • Closing or finalizing experiment tracking sessions

Note

This method is executed within a torch.no_grad() context.

Parameters:

update_counter – UpdateCounter instance to access current training progress.

Return type:

None

exception noether.core.callbacks.EarlyStopIteration

Bases: StopIteration

Custom StopIteration exception for Early Stoppers.
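
To illustrate how this exception is typically consumed, here is a hedged sketch of a training loop that treats it as a clean stop signal. The class below is a stand-in for the real exception, and the loop is not noether's actual trainer code:

```python
class EarlyStopIteration(StopIteration):
    """Stand-in for noether.core.callbacks.EarlyStopIteration."""


def run_training(max_updates: int, stop_at_update: int) -> int:
    updates_done = 0
    try:
        for update in range(max_updates):
            # ... forward/backward/optimizer step would happen here ...
            updates_done += 1
            if update + 1 >= stop_at_update:  # hypothetical stopping criterion
                raise EarlyStopIteration
    except EarlyStopIteration:
        pass  # training ends cleanly; after_training hooks can still run
    return updates_done


print(run_training(max_updates=100, stop_at_update=10))  # → 10
```

Because EarlyStopIteration subclasses StopIteration, a trainer can distinguish an early stop from other failures while still unwinding the loop normally.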

class noether.core.callbacks.EarlyStopperBase(callback_config, trainer, model, data_container, tracker, log_writer, checkpoint_writer, metric_property_provider, name=None)

Bases: noether.core.callbacks.periodic.PeriodicCallback

Base class for early stoppers that is used to define the interface for early stoppers used by the trainers.

to_short_interval_string()

Convert the interval to a short string representation used for logging.

Return type:

str

periodic_callback(*, interval_type, update_counter, **kwargs)

Check if training should stop and raise exception if needed.

Parameters:
  • interval_type (noether.core.callbacks.periodic.IntervalType) – Type of interval that triggered this callback.

  • update_counter (noether.core.utils.training.UpdateCounter) – UpdateCounter instance with current training state.

  • **kwargs – Additional keyword arguments.

Raises:

EarlyStopIteration – If training should be stopped based on the stopping criterion.

Return type:

None

class noether.core.callbacks.FixedEarlyStopper(callback_config, **kwargs)

Bases: noether.core.callbacks.early_stoppers.base.EarlyStopperBase

Early stopper (training) based on a fixed number of epochs, updates, or samples.

Example config:

- kind: noether.core.callbacks.FixedEarlyStopper
  stop_at_epoch: 10
  name: FixedEarlyStopper
Parameters:
  • stop_at_sample

  • stop_at_update

  • stop_at_epoch
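
The stopping criterion can be sketched as a simple threshold check. This is an illustration of the documented semantics only; the function name and signature are assumptions, not the actual implementation:

```python
from typing import Optional


def should_stop(*, epoch: int, update: int, sample: int,
                stop_at_epoch: Optional[int] = None,
                stop_at_update: Optional[int] = None,
                stop_at_sample: Optional[int] = None) -> bool:
    """Stop as soon as any configured fixed threshold is reached."""
    if stop_at_epoch is not None and epoch >= stop_at_epoch:
        return True
    if stop_at_update is not None and update >= stop_at_update:
        return True
    if stop_at_sample is not None and sample >= stop_at_sample:
        return True
    return False


print(should_stop(epoch=10, update=5000, sample=640000, stop_at_epoch=10))
```

With the example config above (stop_at_epoch: 10), the stopper would raise EarlyStopIteration the first time the epoch counter reaches 10.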
class noether.core.callbacks.MetricEarlyStopper(callback_config, **kwargs)

Bases: noether.core.callbacks.early_stoppers.base.EarlyStopperBase

Early stopper (training) based on a metric value to be monitored.

Example config:

- kind: noether.core.callbacks.MetricEarlyStopper
  every_n_epochs: 1
  metric_key: loss/val/total
  tolerance: 0.10
  name: MetricEarlyStopper
Parameters:
  • metric_key

  • higher_is_better

  • tolerance

  • tolerance_counter = 0

  • best_metric
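
One plausible reading of these attributes is sketched below: a new best value resets the tolerance counter, and stopping triggers once the metric trails the best value by more than tolerance. The class name is hypothetical and the exact tolerance semantics in noether may differ:

```python
from typing import Optional


class ToleranceStopper:
    """Hedged sketch of a tolerance-based metric early stopper."""

    def __init__(self, tolerance: float, higher_is_better: bool = False):
        self.tolerance = tolerance
        self.higher_is_better = higher_is_better
        self.tolerance_counter = 0
        self.best_metric: Optional[float] = None

    def check(self, metric: float) -> bool:
        """Return True if training should stop."""
        sign = 1.0 if self.higher_is_better else -1.0
        if self.best_metric is None or sign * metric > sign * self.best_metric:
            # new best: remember it and reset the counter
            self.best_metric = metric
            self.tolerance_counter = 0
            return False
        self.tolerance_counter += 1
        # stop once the metric trails the best by more than the tolerance
        return sign * (self.best_metric - metric) > self.tolerance
```

An EarlyStopperBase subclass would raise EarlyStopIteration from periodic_callback() when such a check returns True.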
class noether.core.callbacks.BestMetricCallback(callback_config, **kwargs)

Bases: noether.core.callbacks.periodic.PeriodicCallback

A callback that keeps track of the best metric value over a training run for a certain metric (i.e., source_metric_key) while also logging one or more target metrics.

For example, track the test loss at the epoch with the best validation loss to simulate early stopping.

Example config:

- kind: noether.core.callbacks.BestMetricCallback
  every_n_epochs: 1
  source_metric_key: loss/val/total
  target_metric_keys:
    -  loss/test/total

In this example, whenever a new best validation loss is found, the corresponding test loss is logged under the key loss/test/total/at_best/loss/val/total.
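
The best-metric tracking and the /at_best/ key construction described above can be sketched as follows. The class and method names are illustrative, not noether's actual implementation:

```python
class BestMetricTracker:
    """Hypothetical sketch of BestMetricCallback's bookkeeping."""

    def __init__(self, source_metric_key, target_metric_keys, higher_is_better=False):
        self.source_metric_key = source_metric_key
        self.target_metric_keys = target_metric_keys
        self.higher_is_better = higher_is_better
        self.best_metric_value = None

    def update(self, log_values: dict) -> dict:
        """Return the target metrics to log if a new best was found."""
        value = log_values[self.source_metric_key]
        sign = 1.0 if self.higher_is_better else -1.0
        if self.best_metric_value is None or sign * value > sign * self.best_metric_value:
            self.best_metric_value = value
            # build "target/at_best/source" keys as in the example above
            return {
                f"{key}/at_best/{self.source_metric_key}": log_values[key]
                for key in self.target_metric_keys
            }
        return {}


tracker = BestMetricTracker("loss/val/total", ["loss/test/total"])
print(tracker.update({"loss/val/total": 0.8, "loss/test/total": 0.9}))
```
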

Parameters:
  • source_metric_key

  • target_metric_keys

  • optional_target_metric_keys

  • higher_is_better

  • best_metric_value

  • previous_log_values (dict[str, Any])
periodic_callback(**__)

Hook called periodically based on the configured intervals.

This method is the primary entry point for periodic actions in subclasses. It is triggered when any of the configured intervals (every_n_epochs, every_n_updates, or every_n_samples) are reached.

Subclasses should override this method to implement periodic logic such as:

  • Calculating and logging expensive validation metrics

  • Saving specific model checkpoints or artifacts

  • Visualizing training progress (e.g., plotting samples)

  • Adjusting training hyperparameters or model state

Note

This method is executed within a torch.no_grad() context.

Parameters:
  • interval_type – “epoch”, “update”, “sample” or “eval” indicating which interval triggered this callback.

  • update_counter – UpdateCounter instance providing details about the current training progress (epoch, update, sample counts).

  • **kwargs – Additional keyword arguments passed from the triggering hook (e.g., from after_epoch() or after_update()).

Return type:

None

class noether.core.callbacks.TrackAdditionalOutputsCallback(callback_config, **kwargs)

Bases: noether.core.callbacks.periodic.PeriodicCallback

Callback that is invoked during training after every gradient accumulation step to track certain outputs from the update step. The update_outputs that are provided in the track_after_accumulation_step method are the additional_outputs field from the TrainerResult returned by the trainer’s update step.

The provided update_outputs are assumed to be a dictionary; outputs whose keys match the configured keys or patterns are tracked. An update output matches if either its key matches exactly, e.g. {“some_output”: …} with keys = [“some_output”], or if one of the patterns is a substring of the output key, e.g. {“some_loss”: …} with patterns = [“loss”].
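
The matching rule can be expressed as a small predicate. This is a sketch; the callback's actual helper may be named and structured differently:

```python
def matches(output_key: str, *, keys=(), patterns=()) -> bool:
    """True if the key matches exactly or contains any of the patterns."""
    return output_key in keys or any(p in output_key for p in patterns)


update_outputs = {"some_output": 1.0, "surface_pressure_loss": 0.2, "lr": 3e-4}
tracked = {k: v for k, v in update_outputs.items()
           if matches(k, keys=["some_output"], patterns=["loss"])}
print(sorted(tracked))  # → ['some_output', 'surface_pressure_loss']
```
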

Example config:

- kind: noether.core.callbacks.TrackAdditionalOutputsCallback
  name: TrackAdditionalOutputsCallback
  every_n_updates: 1
  keys:
    - "surface_pressure_loss"
Parameters:
  • out (pathlib.Path | None)

  • patterns

  • keys

  • verbose

  • tracked_values (collections.defaultdict[str, list])

  • reduce

  • log_output

  • save_output
track_after_accumulation_step(*, update_counter, update_outputs, **_)

Track the specified outputs after each accumulation step.

Parameters:
  • update_counter – UpdateCounter object to track the number of updates.

  • update_outputs – The additional_outputs field from the TrainerResult returned by the trainer’s update step. Note that the base train_step method in the base trainer does not provide any additional outputs by default, and hence this callback can only be used if the train_step is modified to provide additional outputs.

  • **_ – Additional unused keyword arguments.

Return type:

None

periodic_callback(*, update_counter, **_)

Hook called periodically based on the configured intervals.

This method is the primary entry point for periodic actions in subclasses. It is triggered when any of the configured intervals (every_n_epochs, every_n_updates, or every_n_samples) are reached.

Subclasses should override this method to implement periodic logic such as:

  • Calculating and logging expensive validation metrics

  • Saving specific model checkpoints or artifacts

  • Visualizing training progress (e.g., plotting samples)

  • Adjusting training hyperparameters or model state

Note

This method is executed within a torch.no_grad() context.

Parameters:
  • interval_type – “epoch”, “update”, “sample” or “eval” indicating which interval triggered this callback.

  • update_counter (noether.core.utils.training.UpdateCounter) – UpdateCounter instance providing details about the current training progress (epoch, update, sample counts).

  • **kwargs – Additional keyword arguments passed from the triggering hook (e.g., from after_epoch() or after_update()).

Return type:

None

class noether.core.callbacks.PeriodicCallback(callback_config, trainer, model, data_container, tracker, log_writer, checkpoint_writer, metric_property_provider, name=None)

Bases: noether.core.callbacks.base.CallbackBase

Base class for callbacks that are invoked periodically during training.

PeriodicCallback extends CallbackBase to support periodic execution based on training progress. Callbacks can be configured to run at regular intervals defined by epochs, updates (optimizer steps), or samples (data points processed). This class implements the infrastructure for periodic invocation while child classes define the actual behavior via the periodic_callback() method.

Interval Configuration:

Callbacks can be configured to run periodically using one or more of:

  • every_n_epochs: Execute callback every N epochs

  • every_n_updates: Execute callback every N optimizer updates

  • every_n_samples: Execute callback every N samples processed
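
The interval semantics above can be sketched as modular arithmetic. The actual scheduling is handled by the trainer and UpdateCounter, and the sample-interval alignment shown here (fire on the update whose batch crosses a multiple of every_n_samples) is an assumption:

```python
def fires(*, epoch=None, update=None, samples=None, batch_size=1,
          every_n_epochs=None, every_n_updates=None, every_n_samples=None):
    """Hedged sketch: does any configured interval trigger at this point?"""
    if every_n_epochs is not None and epoch is not None and epoch % every_n_epochs == 0:
        return True
    if every_n_updates is not None and update is not None and update % every_n_updates == 0:
        return True
    if every_n_samples is not None and samples is not None:
        # fire on the update whose batch crossed a multiple of every_n_samples
        return samples // every_n_samples > (samples - batch_size) // every_n_samples
    return False


print(fires(update=30, every_n_updates=10))                        # → True
print(fires(samples=1024, batch_size=256, every_n_samples=1000))   # → True
```
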

Tracking vs. Periodic Execution:

The class provides two types of hooks:

  • Tracking hooks (track_after_accumulation_step(), track_after_update_step()): Called on every accumulation/update step to track metrics continuously (e.g., for running averages). I.e., if you want to log an exponential moving average of the loss every epoch, the logging is done in the periodic callback; however, the tracking of the loss values for computing the moving average is done in the tracking hook.

  • Periodic hook (periodic_callback()): Called only when the configured interval is reached, typically for expensive operations like evaluation or checkpointing.

Examples

Creating a custom periodic callback that logs metrics every 10 epochs:

class CustomMetricCallback(PeriodicCallback):
    def periodic_callback(
        self,
        *,
        interval_type: IntervalType,
        update_counter: UpdateCounter,
        **kwargs,
    ) -> None:
        # This method is called every 10 epochs
        metric_value = self.compute_expensive_metric()
        self.writer.add_scalar(
            key="custom_metric",
            value=metric_value,
            logger=self.logger,
        )


# Configure in YAML:
# callbacks:
#   - kind: path.to.CustomMetricCallback
#     every_n_epochs: 10

Tracking metrics at every update and logging periodically:

class RunningAverageCallback(PeriodicCallback):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.loss_accumulator = []

    def track_after_update_step(self, *, update_counter: UpdateCounter, times: dict[str, float]) -> None:
        # Track at every update
        self.loss_accumulator.append(self.trainer.last_loss)

    def periodic_callback(
        self,
        *,
        interval_type: IntervalType,
        update_counter: UpdateCounter,
        **kwargs,
    ) -> None:
        # Log periodically
        if not self.loss_accumulator:
            return  # nothing tracked since the last invocation
        avg_loss = sum(self.loss_accumulator) / len(self.loss_accumulator)
        self.writer.add_scalar("avg_loss", avg_loss, logger=self.logger)
        self.loss_accumulator.clear()
every_n_epochs

If set, callback is invoked every N epochs.

every_n_updates

If set, callback is invoked every N optimizer updates.

every_n_samples

If set, callback is invoked every N samples processed.

batch_size

Batch size used during training.

Parameters:
  • every_n_epochs

  • every_n_updates

  • every_n_samples

  • batch_size
periodic_callback(*, interval_type, update_counter, **kwargs)

Hook called periodically based on the configured intervals.

This method is the primary entry point for periodic actions in subclasses. It is triggered when any of the configured intervals (every_n_epochs, every_n_updates, or every_n_samples) are reached.

Subclasses should override this method to implement periodic logic such as:

  • Calculating and logging expensive validation metrics

  • Saving specific model checkpoints or artifacts

  • Visualizing training progress (e.g., plotting samples)

  • Adjusting training hyperparameters or model state

Note

This method is executed within a torch.no_grad() context.

Return type:

None

track_after_accumulation_step(*, update_counter, batch, losses, update_outputs, accumulation_steps, accumulation_step)

Hook called after each individual gradient accumulation step.

This method is invoked for every batch processed during training, regardless of whether an optimizer update is performed in that step (i.e., when accumulation_steps > 1). It is primarily used for tracking metrics that should be averaged or aggregated across accumulation steps.

Common use cases include:

  • Logging per-batch losses for high-frequency monitoring

  • Accumulating statistics across batches before an optimizer update

  • Implementing custom logging that needs access to individual batch data

Note

This method is generally intended to be called within a torch.no_grad() context by the trainer to ensure no gradients are tracked during logging operations.

Parameters:
  • update_counter (noether.core.utils.training.counter.UpdateCounter) – UpdateCounter instance to access current training progress.

  • batch (Any) – The current data batch processed in this accumulation step.

  • losses (dict[str, torch.Tensor]) – Dictionary of computed losses for the current batch.

  • update_outputs (dict[str, torch.Tensor] | None) – Optional dictionary of model outputs for the current batch.

  • accumulation_steps (int) – Total number of accumulation steps before an optimizer update.

  • accumulation_step (int) – The current accumulation step index (0-indexed).

Return type:

None

track_after_update_step(*, update_counter, times)

Hook called after each optimizer update step.

This method is invoked after a successful optimizer step and parameter update. It is typically used for tracking metrics that should be recorded once per update cycle, such as:

  • Latest loss values

  • Learning rates

  • Model parameter statistics (norms, etc.)

  • Training throughput and timing measurements

Unlike periodic_callback(), this hook is called on every update step, making it suitable for maintaining running averages or high-frequency telemetry.

Note

This method is executed within a torch.no_grad() context.

Return type:

None

after_epoch(update_counter, **kwargs)

Invoked after every epoch to check if callback should be invoked.

Applies torch.no_grad() context.

Return type:

None

after_update(update_counter, **kwargs)

Invoked after every update to check if callback should be invoked.

Applies torch.no_grad() context.

Return type:

None

at_eval(update_counter, **kwargs)
Parameters:

update_counter (noether.core.utils.training.counter.UpdateCounter)

Return type:

None

updates_till_next_invocation(update_counter)

Calculate how many updates remain until this callback is invoked.

Parameters:

update_counter (noether.core.utils.training.counter.UpdateCounter) – UpdateCounter instance to access current training progress.

Returns:

Number of updates remaining until the next callback invocation.

Return type:

int

updates_per_interval(update_counter)

Calculate how many updates elapse from one invocation of this callback to the next.

Parameters:

update_counter (noether.core.utils.training.counter.UpdateCounter) – UpdateCounter instance to access current training progress.

Returns:

Number of updates between callback invocations.

Return type:

int
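
For an epoch-based interval, the two helpers above reduce to simple arithmetic. The worked example below assumes a fixed number of updates per epoch; the function names mirror the documented methods but are free functions here for illustration:

```python
def updates_per_interval(every_n_epochs: int, updates_per_epoch: int) -> int:
    """Updates between two invocations of an epoch-based periodic callback."""
    return every_n_epochs * updates_per_epoch


def updates_till_next_invocation(current_update: int, interval_updates: int) -> int:
    """Updates remaining until the callback fires again."""
    return interval_updates - (current_update % interval_updates)


interval = updates_per_interval(every_n_epochs=2, updates_per_epoch=100)  # 200
print(updates_till_next_invocation(current_update=150, interval_updates=interval))  # → 50
```
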

get_interval_string_verbose()

Return interval configuration as a verbose string.

Returns:

Interval as, e.g., “every_n_epochs=1” for epoch-based intervals.

Return type:

str

to_short_interval_string()

Return interval configuration as a short string.

Returns:

Interval as, e.g., “E1” if every_n_epochs=1 for epoch-based intervals.

Return type:

str
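
Based on the documented “E1” example, a plausible mapping looks like this; the letters for update- and sample-based intervals are assumptions:

```python
def to_short_interval_string(*, every_n_epochs=None, every_n_updates=None,
                             every_n_samples=None) -> str:
    """Sketch of the short interval string, e.g. 'E1' for every_n_epochs=1."""
    if every_n_epochs is not None:
        return f"E{every_n_epochs}"
    if every_n_updates is not None:
        return f"U{every_n_updates}"  # assumed letter
    if every_n_samples is not None:
        return f"S{every_n_samples}"  # assumed letter
    return ""


print(to_short_interval_string(every_n_epochs=1))  # → E1
```
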

class noether.core.callbacks.PeriodicDataIteratorCallback(callback_config, trainer, model, data_container, tracker, log_writer, checkpoint_writer, metric_property_provider, name=None)

Bases: PeriodicCallback

Base class for callbacks that perform periodic iterations over a dataset.

PeriodicDataIteratorCallback extends PeriodicCallback to support evaluations or computations that require iterating over an entire dataset. This is commonly used for validation/test set evaluation, computing metrics on held-out data, or any operation that needs to process batches from a dataset at regular training intervals.

The class integrates with the training data pipeline by registering samplers that control when and how data is loaded. It handles the complete iteration workflow: data loading, batch processing, result collation across distributed ranks, and final processing.

Workflow:
  1. Iteration (_iterate_over_dataset()): When the periodic interval is reached, iterate through the dataset in batches.

  2. Process Data (process_data()): Process a single batch (e.g., run model inference) and return results.

  3. Collation (_collate_result()): Aggregate results across all batches and distributed ranks.

  4. Processing (process_results()): Compute final metrics or perform actions with the aggregated results.
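
Step 3 of this workflow can be sketched with plain Python lists standing in for tensors (distributed gathering and padding removal omitted); the function name is illustrative, not the actual _collate_result implementation:

```python
def collate(batch_results: list) -> dict:
    """Merge per-batch result dicts key-wise into one aggregated dict."""
    collated: dict = {}
    for result in batch_results:
        for key, value in result.items():
            collated.setdefault(key, []).extend(value)
    return collated


results = collate([
    {"predictions": [0, 1], "labels": [0, 0]},
    {"predictions": [1, 1], "labels": [1, 0]},
])
print(results)
```

process_results() would then receive the aggregated dict and compute metrics over the whole dataset.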

Key Features:
  • Distributed Support: Automatically handles distributed evaluation with proper gathering across ranks and padding removal.

  • Flexible Collation: Supports collating various result types (tensors, dicts of tensors, lists).

  • Data Pipeline Integration: Uses SamplerIntervalConfig to integrate with the interleaved sampler for efficient data loading.

  • Progress Tracking: Provides progress bars and timing information for data loading.

Template Methods to Override:

Child classes must implement process_data() and process_results().

Examples

Basic validation accuracy callback that evaluates on a test set every epoch:

class AccuracyCallback(PeriodicDataIteratorCallback):
    def __init__(self, *args, dataset_key="test", **kwargs):
        super().__init__(*args, **kwargs)
        self.dataset_key = dataset_key

    def process_data(self, batch, *, trainer_model):
        # Run inference on batch
        x = batch["x"].to(trainer_model.device)
        y_true = batch["class"].clone()
        y_pred = trainer_model(x)
        return {"predictions": y_pred, "labels": y_true}

    def process_results(self, results, *, interval_type, update_counter, **_):
        # Compute accuracy from aggregated results
        y_pred = results["predictions"]
        y_true = results["labels"]
        accuracy = (y_pred.argmax(dim=1) == y_true).float().mean()

        self.writer.add_scalar(
            key="test/accuracy",
            value=accuracy.item(),
            logger=self.logger,
            format_str=".4f",
        )


# Configure in YAML:
# callbacks:
#   - kind: path.to.AccuracyCallback
#     every_n_epochs: 1
#     dataset_key: "test"

Advanced example with multiple return values and custom collation:

class DetailedEvaluationCallback(PeriodicDataIteratorCallback):
    def process_data(self, batch, *, trainer_model):
        x = batch["x"].to(trainer_model.device)
        y = batch["label"]

        # Return multiple outputs as tuple
        logits = trainer_model(x)
        embeddings = trainer_model.get_embeddings(x)
        return logits, embeddings, y

    def process_results(self, results, *, interval_type, update_counter, **_):
        # results is a tuple: (all_logits, all_embeddings, all_labels)
        logits, embeddings, labels = results

        # Compute multiple metrics
        accuracy = (logits.argmax(dim=1) == labels).float().mean()
        mean_embedding_norm = embeddings.norm(dim=-1).mean()

        self.writer.add_scalar("accuracy", accuracy.item())
        self.writer.add_scalar("embedding_norm", mean_embedding_norm.item())
dataset_key

Key to identify the dataset to iterate over from self.data_container. Automatically set from the callback config.

sampler_config

Configuration for the sampler that controls dataset iteration. Automatically set when the dataset is initialized.

total_data_time

Cumulative time spent waiting for data loading across all periodic callbacks.

Note

  • The process_data() method is called within a torch.no_grad() context automatically.

  • For distributed training, results are automatically gathered across all ranks with proper padding removal.

Parameters:
  • dataset_key

  • total_data_time

  • sampler_config
_sampler_config_from_key(key, properties=None, max_size=None)

Register the dataset that is used for this callback in the dataloading pipeline.

Parameters:
  • key (str | None) – Key for identifying the dataset from self.data_container. Uses the first dataset if None.

  • properties (set[str] | None) – Optionally specifies a subset of properties to load from the dataset.

  • max_size (int | None) – If provided, only uses a subset of the full dataset. Default: None (no subset).

Returns:

SamplerIntervalConfig for the registered dataset.

Return type:

noether.data.samplers.SamplerIntervalConfig

abstractmethod process_data(batch, *, trainer_model)

Template method that is called for each batch that is loaded from the dataset.

This method should process a single batch and return results that will be collated.

Parameters:
  • batch – The loaded batch.

  • trainer_model (torch.nn.Module) – Model of the current training run.

Returns:

Processed results for this batch. Can be a tensor, dict of tensors, list, or tuple.

Return type:

Any

process_results(results, *, interval_type, update_counter, **_)

Template method that is called with the collated results from dataset iteration.

For example, metrics can be computed from the results for the entire test/validation dataset and logged.

Return type:

None

periodic_callback(*, interval_type, update_counter, data_iter, trainer_model, batch_size, **_)

Hook called periodically based on the configured intervals.

This method is the primary entry point for periodic actions in subclasses. It is triggered when any of the configured intervals (every_n_epochs, every_n_updates, or every_n_samples) are reached.

Subclasses should override this method to implement periodic logic such as:

  • Calculating and logging expensive validation metrics

  • Saving specific model checkpoints or artifacts

  • Visualizing training progress (e.g., plotting samples)

  • Adjusting training hyperparameters or model state

Note

This method is executed within a torch.no_grad() context.

Parameters:
  • interval_type (IntervalType) – “epoch”, “update”, “sample” or “eval” indicating which interval triggered this callback.

  • update_counter (noether.core.utils.training.counter.UpdateCounter) – UpdateCounter instance providing details about the current training progress (epoch, update, sample counts).

  • **kwargs – Additional keyword arguments passed from the triggering hook (e.g., from after_epoch() or after_update()).

  • data_iter (collections.abc.Iterator)

  • batch_size (int)

Return type:

None

after_training(**_)

Hook called once after the training loop finishes.

This method is intended to be overridden by derived classes to perform cleanup or final reporting tasks after training is complete. Common use cases include:

  • Performing a final evaluation on the test set

  • Saving final model weights or artifacts

  • Sending notifications (e.g., via Slack or email) about the completed run

  • Closing or finalizing experiment tracking sessions

Note

This method is executed within a torch.no_grad() context.

Parameters:

update_counter – UpdateCounter instance to access current training progress.

Return type:

None