noether.core.callbacks.checkpoint.best_checkpoint
Classes
BestCheckpointCallbackConfig | Internal base class for all registry-based configs.
BestCheckpointCallback | Callback to save the best model based on a metric.
Module Contents
- class noether.core.callbacks.checkpoint.best_checkpoint.BestCheckpointCallbackConfig(/, **data)
Bases: noether.core.callbacks.base.CallBackBaseConfig
Internal base class for all registry-based configs.
Provides auto-registration via __init_subclass__. Not meant to be used directly; use the specific config base classes instead.
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
data (Any)
- name: Literal['BestCheckpointCallback'] = None
- tolerances: list[int] | None = None
If provided, this callback produces multiple best models that differ in how many intervals they allow the metric to go without improving. For example, tolerances=[5] with every_n_epochs=1 stores a checkpoint for which at most 5 epochs passed without the metric improving. Additionally, the best checkpoint over the whole training is always stored (i.e., tolerance = infinity). Setting several tolerances lets one evaluate different early-stopping configurations with a single training run.
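The tolerance semantics above can be illustrated with a small, self-contained simulation. This is a sketch under assumptions, not the callback's implementation: simulate_tolerances is a hypothetical helper that works on an in-memory list of metric values (one per interval) instead of saving checkpoints, and it reads a tolerance of N as "N consecutive non-improving intervals are still allowed"; the exact off-by-one convention in noether may differ.

```python
import math
from typing import Dict, List, Optional


def simulate_tolerances(
    history: List[float],
    tolerances: List[int],
    higher_is_better: bool = False,
) -> Dict[float, Optional[int]]:
    """Hypothetical helper: for each tolerance, return the index of the
    interval whose checkpoint would be kept. math.inf represents the
    unconditional best over the whole run, which is always tracked."""
    kept: Dict[float, Optional[int]] = {}
    for tolerance in [*tolerances, math.inf]:
        # Fresh-run initialization: the first value always counts as an improvement.
        best_value = -math.inf if higher_is_better else math.inf
        best_index: Optional[int] = None
        intervals_without_improvement = 0
        for index, value in enumerate(history):
            improved = value > best_value if higher_is_better else value < best_value
            if improved:
                best_value, best_index = value, index
                intervals_without_improvement = 0
            else:
                intervals_without_improvement += 1
                if intervals_without_improvement > tolerance:
                    # Tolerance exceeded: this checkpoint is never updated again.
                    break
        kept[tolerance] = best_index
    return kept


# Validation loss per epoch (every_n_epochs=1), lower is better.
print(simulate_tolerances([1.0, 0.9, 0.95, 0.96, 0.5], tolerances=[1, 2]))
# {1: 1, 2: 4, inf: 4}
```

With a tolerance of 1 the simulated run gives up at index 3 and keeps index 1, while a tolerance of 2 survives long enough to see the improvement at index 4, which the unconditional best also keeps.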
- model_names: list[str] | None = None
Which models to save (e.g., if only the encoder of an autoencoder should be stored, one could pass model_names=['encoder'] here). This only applies when training a CompositeModel. If None, all models are saved.
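A minimal sketch of the model_names filter, assuming the composite model's submodels can be viewed as a name-to-module mapping. select_models is a hypothetical helper and plain strings stand in for modules; how noether actually enumerates CompositeModel submodels, and whether it rejects unknown names, are assumptions.

```python
from typing import Dict, List, Optional


def select_models(
    submodels: Dict[str, object], model_names: Optional[List[str]]
) -> Dict[str, object]:
    """Hypothetical helper: pick the submodels of a composite model to save."""
    if model_names is None:
        # None means "save every submodel".
        return dict(submodels)
    unknown = [name for name in model_names if name not in submodels]
    if unknown:
        # Assumed behavior: fail loudly on names the composite model lacks.
        raise KeyError(f"unknown model names: {unknown}")
    return {name: submodels[name] for name in model_names}


autoencoder = {"encoder": "<encoder module>", "decoder": "<decoder module>"}
print(select_models(autoencoder, ["encoder"]))  # {'encoder': '<encoder module>'}
```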
- eval_callbacks: list[Annotated[Any, Discriminated(CallBackBaseConfig)]] | None = None
Optional nested callbacks to dispatch whenever a new best model is detected. Each child’s metric keys are automatically prefixed with best=<metric_key>/ (slashes in the metric key are replaced with dots) so they don’t collide with the live-model metrics. Children are invoked via their at_eval hook, which bypasses their own schedule: the trigger is the new-best event, not the child’s every_n_*. Tolerance-exceeded saves do not trigger children. before_training and after_training are forwarded unconditionally so children can initialize and finalize cleanly. PeriodicDataIteratorCallback children get a dedicated DataLoader built from their sampler_config; they are not registered on the shared InterleavedSampler. This means a child’s every_n_* is irrelevant here (only the dataset_key/batch_size/pipeline matter), and the child’s schedule does not need to match this callback’s.
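The two routing rules above, the metric-key prefix and the separation of iterator children from the rest, can be sketched in plain Python. Everything here is hypothetical: the stub classes stand in for PeriodicCallback and PeriodicDataIteratorCallback, the child metric key loss/test/total is only an example, and the helpers mirror the documented behavior rather than noether's code.

```python
from typing import List, Sequence, Tuple


class PeriodicCallbackStub:
    """Hypothetical stand-in for noether's PeriodicCallback."""


class PeriodicDataIteratorCallbackStub(PeriodicCallbackStub):
    """Hypothetical stand-in for PeriodicDataIteratorCallback."""


def prefix_child_metric_key(metric_key: str, child_key: str) -> str:
    # Slashes in the monitored key become dots, so the whole monitored key
    # forms a single "best=..." path segment in front of the child's key.
    return f"best={metric_key.replace('/', '.')}/{child_key}"


def split_children(
    children: Sequence[PeriodicCallbackStub],
) -> Tuple[List[PeriodicCallbackStub], List[PeriodicCallbackStub]]:
    # Iterator children get their own DataLoader and stay private to the parent;
    # the remaining children are the ones that may be exposed (cf. get_children()).
    iterator_children = [c for c in children if isinstance(c, PeriodicDataIteratorCallbackStub)]
    other_children = [c for c in children if not isinstance(c, PeriodicDataIteratorCallbackStub)]
    return other_children, iterator_children


print(prefix_child_metric_key("loss/val/total", "loss/test/total"))
# best=loss.val.total/loss/test/total
```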
- class noether.core.callbacks.checkpoint.best_checkpoint.BestCheckpointCallback(callback_config, **kwargs)
Bases: noether.core.callbacks.periodic.PeriodicCallback
Callback to save the best model based on a metric.
This callback monitors a specified metric and saves the model checkpoint whenever a new best value is achieved. It supports storing different model components when using a composite model and can save checkpoints at different tolerance thresholds.
Example config:
```yaml
callbacks:
  - kind: noether.core.callbacks.BestCheckpointCallback
    name: BestCheckpointCallback
    every_n_epochs: 1
    metric_key: loss/val/total
    model_names: # only applies when training a CompositeModel
      - encoder
    eval_callbacks:
      - kind: noether.training.callbacks.OfflineLossCallback
        every_n_epochs: 1 # ignored; the parent triggers on new-best
        dataset_key: test
```
- Parameters:
callback_config (BestCheckpointCallbackConfig) – Configuration for the callback. See BestCheckpointCallbackConfig for available options, including metric key, model names, and tolerance settings.
**kwargs – Additional arguments passed to the parent class.
- metric_key
- model_names
- higher_is_better
- best_metric_value
- save_frozen_weights
- tolerances_is_exceeded
- tolerance_counter = 0
- eval_callbacks: list[noether.core.callbacks.periodic.PeriodicCallback] = []
- get_children()
Return only the non-iterator children. Iterator children are owned end-to-end by this callback and must not be registered on the shared InterleavedSampler (their loaders are built on dispatch instead). The trainer always passes batch_size to every PeriodicCallback hook, so child loaders can be built without the trainer’s iterator-args bundle.
- Return type:
- state_dict()
Return the state of the callback for checkpointing.
- load_state_dict(state_dict)
Load the callback state from a checkpoint.
Note
This modifies the input state_dict in place.
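One plausible reason for the in-place note is that loading consumes entries with dict.pop. The sketch below assumes exactly that; BestStateSketch is a hypothetical stand-in holding two of the attributes listed above, and the state-dict key names are assumptions rather than noether's checkpoint format.

```python
import math
from typing import Any, Dict


class BestStateSketch:
    """Hypothetical stand-in holding two of the callback's attributes."""

    def __init__(self, higher_is_better: bool = False) -> None:
        self.best_metric_value = -math.inf if higher_is_better else math.inf
        self.tolerance_counter = 0

    def state_dict(self) -> Dict[str, Any]:
        # Key names are assumptions, not noether's checkpoint format.
        return {
            "best_metric_value": self.best_metric_value,
            "tolerance_counter": self.tolerance_counter,
        }

    def load_state_dict(self, state_dict: Dict[str, Any]) -> None:
        # pop() consumes the entries, so the caller's dict is modified in place.
        self.best_metric_value = state_dict.pop("best_metric_value")
        self.tolerance_counter = state_dict.pop("tolerance_counter")


saved = BestStateSketch()
saved.best_metric_value = 0.25
checkpoint = saved.state_dict()

restored = BestStateSketch()
restored.load_state_dict(dict(checkpoint))  # pass a copy to keep `checkpoint` intact
print(restored.best_metric_value, checkpoint)
# 0.25 {'best_metric_value': 0.25, 'tolerance_counter': 0}
```

A shallow dict() copy is enough for a flat state dict like this one; nested state would need copy.deepcopy.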
- before_training(*, update_counter, **kwargs)
Validate callback configuration before training starts.
- Parameters:
update_counter – The training update counter.
**kwargs – Additional keyword arguments forwarded to child eval callbacks.
- Raises:
NotImplementedError – If resuming training with tolerances is attempted.
- Return type:
None
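The documented NotImplementedError can be pictured as a guard that runs before any training step. This is a hedged sketch: validate_before_training is hypothetical, and is_resuming is an assumed boolean flag; the real callback derives resumption from update_counter, whose API is not reproduced here.

```python
from typing import List, Optional


def validate_before_training(tolerances: Optional[List[int]], is_resuming: bool) -> None:
    """Hypothetical guard mirroring the documented NotImplementedError.

    is_resuming is an assumed flag standing in for whatever the real
    callback reads from update_counter.
    """
    if tolerances and is_resuming:
        # Resuming with tolerances is documented as unsupported, so fail
        # before any training step runs.
        raise NotImplementedError("resuming training with tolerances is not supported")


validate_before_training(tolerances=None, is_resuming=True)   # fine: no tolerances
validate_before_training(tolerances=[5], is_resuming=False)   # fine: fresh run
try:
    validate_before_training(tolerances=[5], is_resuming=True)
except NotImplementedError as err:
    print(f"rejected: {err}")
```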
- periodic_callback(*, interval_type, **kwargs)
Execute the periodic callback to check for and save the best model.
This method is called at the configured frequency (e.g., every N epochs). It checks whether the current metric value is better than the previous best and, if so, saves the model checkpoint. It also tracks tolerance-based checkpoints.
When a new best is detected, child eval callbacks (if configured) are dispatched against the live (newly best) model. Iterator children iterate over their own DataLoader (built on first use); they do not consume from the trainer’s shared data_iter.
On interval_type="eval" (post-training eval, where the trainer loads the saved best checkpoint into the live model and calls every callback’s at_eval), children are dispatched unconditionally so they evaluate the loaded best model. No checkpoint saving or tolerance bookkeeping runs in eval mode (the in-memory best_metric_value starts at ±inf in a fresh eval process).
- Raises:
KeyError – If the log cache is empty or the metric key is not found.
- Parameters:
interval_type (noether.core.callbacks.periodic.IntervalType)
- Return type:
None
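The new-best decision and the dispatch rule described above can be sketched as two small functions. Assumptions, stated plainly: check_new_best and should_dispatch_children are hypothetical helpers, the log cache is modelled as a plain dict of the latest values, interval types are plain strings rather than noether's IntervalType, and checkpoint saving is left out entirely.

```python
import math
from typing import Dict, Tuple


def check_new_best(
    log_cache: Dict[str, float],
    metric_key: str,
    best_metric_value: float,
    higher_is_better: bool,
) -> Tuple[bool, float]:
    """Hypothetical helper: return (is_new_best, updated_best_value)."""
    if not log_cache:
        raise KeyError("log cache is empty")
    if metric_key not in log_cache:
        raise KeyError(f"metric key {metric_key!r} not found in log cache")
    value = log_cache[metric_key]
    if higher_is_better:
        improved = value > best_metric_value
    else:
        improved = value < best_metric_value
    return (True, value) if improved else (False, best_metric_value)


def should_dispatch_children(interval_type: str, is_new_best: bool) -> bool:
    # In "eval" mode the loaded best checkpoint is evaluated unconditionally;
    # otherwise children only run when a new best was just found.
    return interval_type == "eval" or is_new_best


best = math.inf  # fresh process, lower is better
for epoch_loss in [0.40, 0.45, 0.30]:
    is_new_best, best = check_new_best({"loss/val/total": epoch_loss}, "loss/val/total", best, False)
    print(epoch_loss, is_new_best, should_dispatch_children("epoch", is_new_best))
```

Starting from math.inf mirrors the ±inf initialization mentioned above, which is why the first logged value always counts as a new best.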
- after_training(**kwargs)
Log the best metric values at different tolerance thresholds after training completes.
- Parameters:
**kwargs – Additional keyword arguments forwarded to child eval callbacks.
- Return type:
None