noether.core.presets.base
Attributes
Classes
DomainPreset | Base class for domain-specific configuration presets.
Module Contents
- noether.core.presets.base.logger
- noether.core.presets.base.CHECKPOINT_CALLBACK = 'noether.core.callbacks.CheckpointCallback'
- noether.core.presets.base.BEST_CHECKPOINT_CALLBACK = 'noether.core.callbacks.BestCheckpointCallback'
- noether.core.presets.base.EMA_CALLBACK = 'noether.core.callbacks.EmaCallback'
- noether.core.presets.base.OFFLINE_LOSS_CALLBACK = 'noether.training.callbacks.OfflineLossCallback'
- noether.core.presets.base.MEAN_STD_NORMALIZER = 'noether.data.preprocessors.normalizers.MeanStdNormalization'
- noether.core.presets.base.POSITION_NORMALIZER = 'noether.data.preprocessors.normalizers.PositionNormalizer'
- noether.core.presets.base.LR_SCHEDULE_LINEAR_WARMUP_COSINE = 'noether.core.schedules.LinearWarmupCosineDecaySchedule'
- noether.core.presets.base.OPTIMIZER_LION = 'noether.core.optimizer.Lion'
- class noether.core.presets.base.DomainPreset
Bases: abc.ABC
Base class for domain-specific configuration presets.
A preset encapsulates domain knowledge - data specs, dataset statistics, normalizer conventions, pipeline defaults - so that training scripts only specify what’s unique to the experiment (model architecture, hyperparameters, dataset path).
- Subclasses must define:
data_specs: property returning the domain’s data specification object
dataset_statistics: property returning pre-computed dataset statistics
normalizer_spec: property returning a declarative normalizer mapping
excluded_properties: properties to exclude from dataset loading
target_properties: list of target property names for this domain
- Subclasses should also set class attributes:
dataset_kind: fully qualified dataset class path
stats: raw statistics dict
pipeline_defaults: default pipeline parameters
pipeline_model_overrides: per-model pipeline parameter overrides
forward_properties_map: per-model forward property lists (with "_default" fallback)
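As a rough illustration of the contract listed above, the following self-contained sketch uses a stand-in base class instead of importing noether. `PresetSketch`, `ToyPreset`, the `toy.datasets.ToyDataset` path, and every value are hypothetical. Only the attribute and property names come from the lists above, and only a subset of the required members is shown.

```python
from abc import ABC, abstractmethod
from typing import Any


class PresetSketch(ABC):
    """Stand-in for DomainPreset (hypothetical, not the noether class)."""

    # Class attributes that subclasses are expected to set.
    dataset_kind: str = ""
    stats: dict[str, Any] = {}
    pipeline_defaults: dict[str, Any] = {}
    pipeline_model_overrides: dict[str, dict[str, Any]] = {}
    forward_properties_map: dict[str, list[str]] = {}

    @property
    @abstractmethod
    def data_specs(self) -> Any: ...

    @property
    @abstractmethod
    def normalizer_spec(self) -> dict[str, Any]: ...

    @abstractmethod
    def target_properties(self) -> list[str]: ...


class ToyPreset(PresetSketch):
    dataset_kind = "toy.datasets.ToyDataset"  # hypothetical class path
    stats = {"surface_pressure_mean": 0.0, "surface_pressure_std": 1.0}
    pipeline_defaults = {"num_points": 1024}  # hypothetical parameter
    forward_properties_map = {"_default": ["surface_position"]}

    @property
    def data_specs(self) -> Any:
        return {"position_dim": 3}  # placeholder for a real spec object

    @property
    def normalizer_spec(self) -> dict[str, Any]:
        return {"surface_pressure": "mean_std"}

    def target_properties(self) -> list[str]:
        return ["surface_pressure"]


preset = ToyPreset()
```

Because the abstract members are declared with `abc`, instantiating the stand-in base directly raises `TypeError`, while the concrete subclass can be created once every abstract member is defined.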
The build_normalizers, build_pipeline, build_dataset, build_model, forward_properties, standard_callbacks, and build_config methods have default implementations that can be overridden when needed.
- property data_specs: Any
- Abstractmethod:
- Return type:
Any
Return the domain’s data specification (e.g., AeroDataSpecs).
- property dataset_statistics: dict[str, list[float] | float]
Return pre-computed dataset statistics as a flat dict.
Resolution order:
1. If stats_file is set on the preset, loads from that YAML file.
2. If the stats dict is set on the preset, returns a copy.
3. If dataset_kind points to a class with a STATS_FILE attribute, loads from that.
Subclasses can override this property for custom logic.
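The resolution order above can be sketched as a plain function. This is not the library's code: `resolve_statistics` is a hypothetical helper, the YAML loader is passed in as a callable so the sketch stays dependency-free, the dataset class is passed directly rather than resolved from the `dataset_kind` path, and the final `ValueError` is an assumption because the text does not say what happens when no source is configured.

```python
from typing import Any, Callable, Optional


def resolve_statistics(
    stats_file: Optional[str],
    stats: Optional[dict[str, Any]],
    dataset_cls: Optional[type],
    load_yaml: Callable[[str], dict[str, Any]],
) -> dict[str, Any]:
    """Return statistics using the three-step fallback described above."""
    if stats_file:  # 1. explicit file set on the preset
        return load_yaml(stats_file)
    if stats:  # 2. inline dict, returned as a copy
        return dict(stats)
    dataset_file = getattr(dataset_cls, "STATS_FILE", None)
    if dataset_file:  # 3. file declared by the dataset class
        return load_yaml(dataset_file)
    raise ValueError("no statistics source configured")  # assumed behaviour


class ToyDataset:  # hypothetical dataset class
    STATS_FILE = "toy_stats.yaml"


# A dict lookup stands in for reading YAML from disk.
fake_files = {"toy_stats.yaml": {"pressure_mean": 1.5}}
inline = {"pressure_mean": 0.0}

from_inline = resolve_statistics(None, inline, ToyDataset, fake_files.__getitem__)
from_class = resolve_statistics(None, None, ToyDataset, fake_files.__getitem__)
```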
- property normalizer_spec: dict[str, str | tuple[str, dict[str, Any]]]
Declarative normalizer mapping.
Keys are data source names (e.g., "surface_pressure"). Values are either:
"mean_std" - auto-builds MeanStdNormalizerConfig from stats
("position", {"scale": 1000}) - auto-builds PositionNormalizerConfig
Stats are resolved by convention: for a key "surface_pressure" with type "mean_std", the builder looks up "surface_pressure_mean" and "surface_pressure_std" in dataset_statistics.
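To make the two accepted value forms concrete, here is a small sketch that reduces each spec entry to a `(kind, params)` pair. `split_spec_entry` and the `"surface_position"` key are hypothetical names used only for illustration; how the library parses the spec internally is not stated in the text.

```python
from typing import Any, Union

SpecValue = Union[str, tuple[str, dict[str, Any]]]


def split_spec_entry(value: SpecValue) -> tuple[str, dict[str, Any]]:
    """Normalise a spec value to a (kind, params) pair."""
    if isinstance(value, str):
        return value, {}  # bare string: no extra parameters
    kind, params = value
    return kind, dict(params)  # tuple form: kind plus parameter dict


spec = {
    "surface_pressure": "mean_std",
    "surface_position": ("position", {"scale": 1000}),  # hypothetical key
}
parsed = {key: split_spec_entry(value) for key, value in spec.items()}
```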
- abstractmethod target_properties()
Return the list of target property names for this domain.
- forward_properties(model_kind)
Return the list of forward properties for the given model architecture.
Looks up forward_properties_map by model kind, falling back to "_default".
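The lookup with a "_default" fallback can be sketched as follows. `lookup_forward_properties`, the model path, and the property names are hypothetical, and the real method may differ in details such as whether it copies the list.

```python
def lookup_forward_properties(
    forward_properties_map: dict[str, list[str]], model_kind: str
) -> list[str]:
    """Return the entry for model_kind, or the "_default" entry if absent."""
    if model_kind in forward_properties_map:
        return list(forward_properties_map[model_kind])
    return list(forward_properties_map["_default"])


fp_map = {
    "_default": ["surface_position"],
    "toy.models.ToyTransformer": ["surface_position", "volume_position"],
}
known = lookup_forward_properties(fp_map, "toy.models.ToyTransformer")
unknown = lookup_forward_properties(fp_map, "toy.models.Other")
```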
- build_pipeline(model_kind, **overrides)
Build a pipeline config by merging defaults, model overrides, and user overrides.
Subclasses must override this to construct the appropriate pipeline config. The default implementation merges pipeline_defaults, model-specific overrides from pipeline_model_overrides, and any caller-provided overrides into a single dict and returns it. Subclasses should call super() to get the merged params.
- Parameters:
model_kind (str)
overrides (Any)
- Return type:
Any
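A minimal sketch of the three-way merge, assuming later sources take precedence in the order the text lists them (defaults, then model overrides, then caller overrides). `merge_pipeline_params` and the parameter names are hypothetical; the precedence rule is an assumption, since the text names the sources but not how conflicts are resolved.

```python
from typing import Any


def merge_pipeline_params(
    pipeline_defaults: dict[str, Any],
    pipeline_model_overrides: dict[str, dict[str, Any]],
    model_kind: str,
    **overrides: Any,
) -> dict[str, Any]:
    """Assumed precedence: defaults < model overrides < caller overrides."""
    return {
        **pipeline_defaults,
        **pipeline_model_overrides.get(model_kind, {}),
        **overrides,
    }


defaults = {"num_points": 1024, "use_physics_features": False}
per_model = {"toy.models.ToyTransformer": {"num_points": 4096}}

merged = merge_pipeline_params(
    defaults, per_model, "toy.models.ToyTransformer", use_physics_features=True
)
```

A subclass that follows the "call super()" advice would take a dict like `merged` and pass it to whatever pipeline config class its domain uses.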
- abstractmethod build_dataset(*, split, root, model_kind, wrappers=None, **overrides)
Build a dataset config for the given split.
- Parameters:
- Return type:
- build_normalizers()
Build normalizer configs from the declarative normalizer_spec.
- Uses naming conventions to look up statistics:
"mean_std" -> looks for {key}_mean and {key}_std
"position" -> looks for raw_pos_min and raw_pos_max
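The naming conventions above amount to a key lookup in the statistics dict. The sketch below shows only that lookup; `stats_for` and the shape of its return value are hypothetical, and the real method builds normalizer config objects rather than plain dicts.

```python
from typing import Any


def stats_for(key: str, kind: str, statistics: dict[str, Any]) -> dict[str, Any]:
    """Pick the statistics a normalizer of the given kind would need."""
    if kind == "mean_std":
        # Per-key statistics: {key}_mean and {key}_std.
        return {"mean": statistics[f"{key}_mean"], "std": statistics[f"{key}_std"]}
    if kind == "position":
        # Shared statistics: the same two entries regardless of key.
        return {
            "raw_pos_min": statistics["raw_pos_min"],
            "raw_pos_max": statistics["raw_pos_max"],
        }
    raise ValueError(f"unknown normalizer kind: {kind!r}")


statistics = {
    "surface_pressure_mean": -0.2,
    "surface_pressure_std": 0.7,
    "raw_pos_min": [-1.0, -1.0, 0.0],
    "raw_pos_max": [4.0, 1.0, 1.5],
}
pressure = stats_for("surface_pressure", "mean_std", statistics)
position = stats_for("surface_position", "position", statistics)
```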
- static standard_callbacks(*, log_every_n_epochs=1, save_every_n_epochs=10, eval_dataset_key='test', batch_size=1, ema=True, ema_factors=None, best_metric_key='loss/test/total')
Build a standard set of training callbacks.
- Returns a list containing:
CheckpointCallback (periodic checkpoints)
OfflineLossCallback (validation loss)
BestCheckpointCallback (saves best model by metric)
EmaCallback (optional exponential moving average)
- Parameters:
log_every_n_epochs (int) – frequency for loss logging and validation.
save_every_n_epochs (int) – frequency for checkpoint saving and EMA.
eval_dataset_key (str) – dataset key for offline evaluation.
batch_size (int) – batch size for evaluation callbacks.
ema (bool) – whether to include EMA callback.
ema_factors (set[float] | None) – EMA decay factors. Defaults to None, which is treated as {0.9999}.
best_metric_key (str) – metric key for best checkpoint selection.
- Return type:
- build_optimizer(*, kind=OPTIMIZER_LION, lr=5e-05, weight_decay=0.05, clip_grad_norm=1.0, warmup_percent=0.05, end_lr=1e-06)
Build an optimizer config with sensible defaults.
- Parameters:
kind (str) – optimizer class path.
lr (float) – learning rate.
weight_decay (float) – weight decay.
clip_grad_norm (float | None) – gradient clipping norm. None to disable.
warmup_percent (float) – fraction of training for linear warmup.
end_lr (float | None) – final learning rate for cosine decay. None to disable scheduling.
- Return type:
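To show how lr, warmup_percent, and end_lr interact, here is a generic linear-warmup plus cosine-decay formula using the documented defaults. It is a textbook formulation, not the code of LinearWarmupCosineDecaySchedule: `lr_at` is a hypothetical function, and details such as per-step versus per-epoch updates and the warmup start value are assumptions.

```python
import math


def lr_at(step: int, total_steps: int, lr: float = 5e-5,
          warmup_percent: float = 0.05, end_lr: float = 1e-6) -> float:
    """Learning rate at `step` under linear warmup followed by cosine decay."""
    warmup_steps = int(total_steps * warmup_percent)
    if step < warmup_steps:
        # Linear ramp that reaches lr on the last warmup step.
        return lr * (step + 1) / warmup_steps
    # Cosine decay from lr down to end_lr over the remaining steps.
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return end_lr + 0.5 * (lr - end_lr) * (1.0 + math.cos(math.pi * progress))


peak = lr_at(49, 1000)     # last warmup step of 50
middle = lr_at(525, 1000)  # halfway through the decay phase
final = lr_at(1000, 1000)  # end of training
```

With these defaults the rate climbs to `lr` during the first 5% of steps, passes through the midpoint between `lr` and `end_lr` halfway through the decay phase, and finishes at `end_lr`.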
- build_model(*, model_kind, optimizer=None, **model_params)
Build a model config from the model kind and parameters.
Automatically injects data_specs, forward_properties, optimizer_config, and kind so the user only provides architecture knobs.
If the model kind has registered defaults in _MODEL_DEFAULTS, those are applied before constructing the config (e.g., AB-UPT sub-configs).
- Parameters:
model_kind (str) – fully qualified class path of the model.
optimizer (noether.core.schemas.optimizers.OptimizerConfig | None) – optimizer config. Uses build_optimizer() defaults if None.
**model_params (Any) – model-specific parameters (e.g., hidden_dim, num_heads).
- Returns:
A model config object.
- Return type:
Any
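The injection behaviour can be pictured as three layers of parameters. In the sketch below, `MODEL_DEFAULTS_SKETCH`, `assemble_model_params`, and the model path are hypothetical stand-ins; the contents of the real _MODEL_DEFAULTS registry are not shown in the text, and letting user knobs override registered defaults is an assumption. The sketch returns a plain dict, whereas the real method returns a config object.

```python
from typing import Any

# Hypothetical registry contents, standing in for _MODEL_DEFAULTS.
MODEL_DEFAULTS_SKETCH: dict[str, dict[str, Any]] = {
    "toy.models.ToyTransformer": {"num_heads": 4, "dropout": 0.0},
}


def assemble_model_params(
    model_kind: str,
    data_specs: Any,
    forward_properties: list[str],
    optimizer_config: Any,
    **model_params: Any,
) -> dict[str, Any]:
    """Combine registered defaults, user knobs, and injected fields."""
    params = dict(MODEL_DEFAULTS_SKETCH.get(model_kind, {}))
    params.update(model_params)  # assumed: user knobs override defaults
    params.update(  # fields the caller never has to pass explicitly
        kind=model_kind,
        data_specs=data_specs,
        forward_properties=forward_properties,
        optimizer_config=optimizer_config,
    )
    return params


model_cfg = assemble_model_params(
    "toy.models.ToyTransformer",
    data_specs={"position_dim": 3},
    forward_properties=["surface_position"],
    optimizer_config={"lr": 5e-5},
    hidden_dim=192,
    num_heads=8,
)
```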
- build_config(*, model_kind, model_params=None, model_config=None, optimizer=None, trainer_kind, trainer_params=None, dataset_root, output_path=None, datasets=None, extra_datasets=None, callbacks_override=None, extra_callbacks=None, accelerator=None, max_epochs=500, batch_size=1, seed=42, **config_overrides)
Assemble a complete ConfigSchema with all domain defaults filled in.
Provide either model_config (pre-built) or model_params (dict of architecture knobs like hidden_dim, num_heads). If model_params is used, build_model() is called automatically.
- Parameters:
model_kind (str) – fully qualified class path of the model.
model_params (dict[str, Any] | None) – model architecture parameters (used with build_model).
model_config (Any | None) – pre-built model config object. Mutually exclusive with model_params.
optimizer (noether.core.schemas.optimizers.OptimizerConfig | None) – optimizer config. Defaults to Lion with cosine decay via build_optimizer().
trainer_kind (str) – fully qualified class path of the trainer.
trainer_params (dict[str, Any] | None) – additional trainer-specific parameters (e.g., loss weights).
dataset_root (str) – root directory of the dataset.
output_path (str | None) – output directory. Defaults to {dataset_root}/outputs.
datasets (dict[str, str] | list[str] | None) – splits to create. Either a list of split names (e.g., ["train", "test"]) where keys equal splits, or a dict mapping keys to splits for custom naming (e.g., {"my_train": "train"}). Defaults to ["train", "test"].
extra_datasets (dict[str, noether.core.schemas.dataset.DatasetBaseConfig] | None) – additional pre-built dataset configs to merge in (e.g., repeated test sets).
callbacks_override (list | None) – replace the default callback list entirely. Defaults to standard_callbacks().
extra_callbacks (list | None) – additional callbacks appended to the default (or overridden) list.
accelerator (str | None) – “cpu”, “gpu”, or “mps”. Auto-detected if None.
max_epochs (int) – maximum training epochs.
batch_size (int) – effective batch size.
seed (int) – random seed.
**config_overrides (Any) – additional fields passed to ConfigSchema.
- Returns:
A fully populated ConfigSchema ready for HydraRunner().main().
- Return type:
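Two of the argument conventions documented for build_config, the list-or-dict form of datasets and the default for output_path, can be sketched as small helpers. `normalize_datasets` and `default_output_path` are hypothetical names; they mirror the documented behaviour but are not taken from the library.

```python
from typing import Optional, Union


def normalize_datasets(
    datasets: Optional[Union[dict[str, str], list[str]]] = None,
) -> dict[str, str]:
    """Map dataset keys to split names for either accepted input form."""
    if datasets is None:
        datasets = ["train", "test"]  # documented default
    if isinstance(datasets, dict):
        return dict(datasets)  # custom key -> split
    return {split: split for split in datasets}  # key equals split


def default_output_path(dataset_root: str, output_path: Optional[str] = None) -> str:
    """Fall back to {dataset_root}/outputs when no path is given."""
    return output_path if output_path is not None else f"{dataset_root}/outputs"


by_default = normalize_datasets()
custom = normalize_datasets({"my_train": "train"})
out_dir = default_output_path("/data/toy")  # hypothetical dataset root
```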