noether.core.presets.base

Attributes

Classes

DomainPreset

Base class for domain-specific configuration presets.

Module Contents

noether.core.presets.base.logger
noether.core.presets.base.CHECKPOINT_CALLBACK = 'noether.core.callbacks.CheckpointCallback'
noether.core.presets.base.BEST_CHECKPOINT_CALLBACK = 'noether.core.callbacks.BestCheckpointCallback'
noether.core.presets.base.EMA_CALLBACK = 'noether.core.callbacks.EmaCallback'
noether.core.presets.base.OFFLINE_LOSS_CALLBACK = 'noether.training.callbacks.OfflineLossCallback'
noether.core.presets.base.MEAN_STD_NORMALIZER = 'noether.data.preprocessors.normalizers.MeanStdNormalization'
noether.core.presets.base.POSITION_NORMALIZER = 'noether.data.preprocessors.normalizers.PositionNormalizer'
noether.core.presets.base.LR_SCHEDULE_LINEAR_WARMUP_COSINE = 'noether.core.schedules.LinearWarmupCosineDecaySchedule'
noether.core.presets.base.OPTIMIZER_LION = 'noether.core.optimizer.Lion'
class noether.core.presets.base.DomainPreset

Bases: abc.ABC

Base class for domain-specific configuration presets.

A preset encapsulates domain knowledge (data specs, dataset statistics, normalizer conventions, pipeline defaults) so that training scripts only specify what’s unique to the experiment (model architecture, hyperparameters, dataset path).

Subclasses must define:
  • data_specs: property returning the domain’s data specification object

  • dataset_statistics: property returning pre-computed dataset statistics

  • normalizer_spec: property returning a declarative normalizer mapping

  • excluded_properties: properties to exclude from dataset loading

  • target_properties: list of target property names for this domain

Subclasses should also set class attributes:
  • dataset_kind: fully qualified dataset class path

  • stats: raw statistics dict

  • pipeline_defaults: default pipeline parameters

  • pipeline_model_overrides: per-model pipeline parameter overrides

  • forward_properties_map: per-model forward property lists (with "_default" fallback)

The build_normalizers, build_pipeline, build_dataset, build_model, forward_properties, standard_callbacks, and build_config methods have default implementations that can be overridden when needed.
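As a sketch of the subclass contract, the snippet below defines a minimal stand-in base (reduced to a few of the members listed above, so it runs standalone) and a hypothetical subclass; the preset name, dataset path, and pipeline knob are all illustrative:

```python
from abc import ABC, abstractmethod
from typing import Any


class DomainPreset(ABC):
    """Minimal stand-in for noether.core.presets.base.DomainPreset,
    reduced to a few members of the contract described above."""

    dataset_kind: str = ""
    stats: dict[str, list[float]] = {}
    pipeline_defaults: dict[str, Any] = {}

    @property
    @abstractmethod
    def normalizer_spec(self) -> dict[str, Any]: ...

    @abstractmethod
    def target_properties(self) -> list[str]: ...


class CarAeroPreset(DomainPreset):
    """Hypothetical preset for an automotive aerodynamics domain."""

    dataset_kind = "noether.data.datasets.CarAeroDataset"  # hypothetical path
    stats = {"surface_pressure_mean": [0.0], "surface_pressure_std": [1.0]}
    pipeline_defaults = {"num_points": 16384}  # hypothetical pipeline knob

    @property
    def normalizer_spec(self) -> dict[str, Any]:
        # Declarative: resolved against stats by the build_normalizers helper.
        return {"surface_pressure": "mean_std"}

    def target_properties(self) -> list[str]:
        return ["surface_pressure"]
```

A training script then instantiates the preset once and asks it for datasets, models, and configs, instead of repeating domain constants.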

dataset_kind: str = ''
stats: dict[str, list[float]]
stats_file: str | None = None
pipeline_defaults: dict[str, Any]
pipeline_model_overrides: dict[str, dict[str, Any]]
forward_properties_map: dict[str, list[str]]
property data_specs: Any
Abstractmethod:

Return type:

Any

Return the domain’s data specification (e.g., AeroDataSpecs).

property dataset_statistics: dict[str, list[float] | float]

Return pre-computed dataset statistics as a flat dict.

Resolution order:
  1. If stats_file is set on the preset, load from that YAML file.

  2. If the stats dict is set on the preset, return a copy.

  3. If dataset_kind points to a class with a STATS_FILE attribute, load from that.

Subclasses can override this property for custom logic.

Return type:

dict[str, list[float] | float]

property normalizer_spec: dict[str, str | tuple[str, dict[str, Any]]]
Abstractmethod:

Return type:

dict[str, str | tuple[str, dict[str, Any]]]

Declarative normalizer mapping.

Keys are data source names (e.g., "surface_pressure"). Values are either:

  • "mean_std" - auto-builds MeanStdNormalizerConfig from stats

  • ("position", {"scale": 1000}) - auto-builds PositionNormalizerConfig

Stats are resolved by convention: for a key "surface_pressure" with type "mean_std", the builder looks up "surface_pressure_mean" and "surface_pressure_std" in dataset_statistics.
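A spec combining both value forms might look like the following (the data source names are illustrative):

```python
# Values are either a bare kind string or a (kind, params) tuple.
normalizer_spec = {
    "surface_pressure": "mean_std",           # resolves surface_pressure_mean/_std
    "position": ("position", {"scale": 1000}),  # explicit builder parameters
}
```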

property excluded_properties: set[str] | None
Abstractmethod:

Return type:

set[str] | None

Properties to exclude from dataset loading, or None.

abstractmethod target_properties()

Return the list of target property names for this domain.

Return type:

list[str]

forward_properties(model_kind)

Return the list of forward properties for the given model architecture.

Looks up forward_properties_map by model kind, falling back to "_default".

Parameters:

model_kind (str)

Return type:

list[str]
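The lookup-with-fallback behaviour can be sketched as follows (the registered model kind and property names are hypothetical):

```python
forward_properties_map = {
    "_default": ["position"],
    "noether.models.ABUPT": ["position", "surface_normals"],  # hypothetical entry
}


def forward_properties(model_kind: str) -> list[str]:
    # Fall back to the "_default" entry for unregistered model kinds.
    return forward_properties_map.get(model_kind, forward_properties_map["_default"])
```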

build_pipeline(model_kind, **overrides)

Build a pipeline config by merging defaults, model overrides, and user overrides.

The default implementation merges pipeline_defaults, model-specific overrides from pipeline_model_overrides, and any caller-provided overrides into a single dict and returns it. Subclasses should override this to construct the appropriate pipeline config object, calling super() to obtain the merged parameters.

Parameters:
  • model_kind (str)

  • overrides (Any)

Return type:

Any
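The merge order (defaults, then model overrides, then caller overrides, with later entries winning) can be illustrated with plain dicts; the parameter names and model kind are hypothetical:

```python
pipeline_defaults = {"num_points": 16384, "shuffle": True}
pipeline_model_overrides = {"abupt": {"num_points": 32768}}


def merged_params(model_kind: str, **overrides) -> dict:
    # Later unpackings win, so caller overrides take highest precedence.
    return {
        **pipeline_defaults,
        **pipeline_model_overrides.get(model_kind, {}),
        **overrides,
    }
```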

abstractmethod build_dataset(*, split, root, model_kind, wrappers=None, **overrides)

Build a dataset config for the given split.

Parameters:
  • split (str)

  • root (str)

  • model_kind (str)

  • wrappers (list[noether.core.schemas.dataset.DatasetWrappers] | None)

  • overrides (Any)

Return type:

noether.core.schemas.dataset.DatasetBaseConfig

build_normalizers()

Build normalizer configs from the declarative normalizer_spec.

Uses naming conventions to look up statistics:
  • "mean_std" -> looks for {key}_mean and {key}_std

  • "position" -> looks for raw_pos_min and raw_pos_max

Returns:

Dict mapping data source names to lists of normalizer configs.

Return type:

dict[str, list[noether.core.schemas.normalizers.AnyNormalizer]]
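The "mean_std" naming convention amounts to a key lookup like the following (the helper name is hypothetical, used only to illustrate the convention):

```python
def resolve_mean_std(key: str, statistics: dict) -> tuple:
    # "mean_std" entries resolve to "<key>_mean" and "<key>_std" in the stats dict.
    return statistics[f"{key}_mean"], statistics[f"{key}_std"]
```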

static standard_callbacks(*, log_every_n_epochs=1, save_every_n_epochs=10, eval_dataset_key='test', batch_size=1, ema=True, ema_factors=None, best_metric_key='loss/test/total')

Build a standard set of training callbacks.

Returns a list containing:
  • CheckpointCallback (periodic checkpoints)

  • OfflineLossCallback (validation loss)

  • BestCheckpointCallback (saves best model by metric)

  • EmaCallback (optional exponential moving average)

Parameters:
  • log_every_n_epochs (int) – frequency for loss logging and validation.

  • save_every_n_epochs (int) – frequency for checkpoint saving and EMA.

  • eval_dataset_key (str) – dataset key for offline evaluation.

  • batch_size (int) – batch size for evaluation callbacks.

  • ema (bool) – whether to include EMA callback.

  • ema_factors (set[float] | None) – EMA decay factors. Defaults to None, in which case {0.9999} is used.

  • best_metric_key (str) – metric key for best checkpoint selection.

Return type:

list

build_optimizer(*, kind=OPTIMIZER_LION, lr=5e-05, weight_decay=0.05, clip_grad_norm=1.0, warmup_percent=0.05, end_lr=1e-06)

Build an optimizer config with sensible defaults.

Parameters:
  • kind (str) – optimizer class path.

  • lr (float) – learning rate.

  • weight_decay (float) – weight decay.

  • clip_grad_norm (float | None) – gradient clipping norm. None to disable.

  • warmup_percent (float) – fraction of training for linear warmup.

  • end_lr (float | None) – final learning rate for cosine decay. None to disable scheduling.

Return type:

noether.core.schemas.optimizers.OptimizerConfig

build_model(*, model_kind, optimizer=None, **model_params)

Build a model config from the model kind and parameters.

Automatically injects data_specs, forward_properties, optimizer_config, and kind so the user only provides architecture knobs.

If the model kind has registered defaults in _MODEL_DEFAULTS, those are applied before constructing the config (e.g., AB-UPT sub-configs).

Parameters:
  • model_kind (str) – fully qualified class path of the model.

  • optimizer (noether.core.schemas.optimizers.OptimizerConfig | None) – optimizer config. Uses build_optimizer() defaults if None.

  • **model_params (Any) – model-specific parameters (e.g., hidden_dim, num_heads).

Returns:

A model config object.

Return type:

Any

build_config(*, model_kind, model_params=None, model_config=None, optimizer=None, trainer_kind, trainer_params=None, dataset_root, output_path=None, datasets=None, extra_datasets=None, callbacks_override=None, extra_callbacks=None, accelerator=None, max_epochs=500, batch_size=1, seed=42, **config_overrides)

Assemble a complete ConfigSchema with all domain defaults filled in.

Provide either model_config (pre-built) or model_params (dict of architecture knobs like hidden_dim, num_heads). If model_params is used, build_model() is called automatically.

Parameters:
  • model_kind (str) – fully qualified class path of the model.

  • model_params (dict[str, Any] | None) – model architecture parameters (used with build_model).

  • model_config (Any | None) – pre-built model config object. Mutually exclusive with model_params.

  • optimizer (noether.core.schemas.optimizers.OptimizerConfig | None) – optimizer config. Defaults to Lion with cosine decay via build_optimizer().

  • trainer_kind (str) – fully qualified class path of the trainer.

  • trainer_params (dict[str, Any] | None) – additional trainer-specific parameters (e.g., loss weights).

  • dataset_root (str) – root directory of the dataset.

  • output_path (str | None) – output directory. Defaults to {dataset_root}/outputs.

  • datasets (dict[str, str] | list[str] | None) – splits to create. Either a list of split names (e.g., ["train", "test"]) where keys equal splits, or a dict mapping keys to splits for custom naming (e.g., {"my_train": "train"}). Defaults to ["train", "test"].

  • extra_datasets (dict[str, noether.core.schemas.dataset.DatasetBaseConfig] | None) – additional pre-built dataset configs to merge in (e.g., repeated test sets).

  • callbacks_override (list | None) – replace the default callback list entirely. Defaults to standard_callbacks().

  • extra_callbacks (list | None) – additional callbacks appended to the default (or overridden) list.

  • accelerator (str | None) – “cpu”, “gpu”, or “mps”. Auto-detected if None.

  • max_epochs (int) – maximum training epochs.

  • batch_size (int) – effective batch size.

  • seed (int) – random seed.

  • **config_overrides (Any) – additional fields passed to ConfigSchema.

Returns:

A fully populated ConfigSchema ready for HydraRunner().main().

Return type:

noether.core.schemas.schema.ConfigSchema
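The documented handling of the datasets argument (a list where keys equal splits, or a dict mapping custom keys to splits) can be sketched as follows; the helper name is hypothetical:

```python
def resolve_dataset_keys(datasets=None) -> dict[str, str]:
    # None -> the default ["train", "test"]; a list maps each split to itself;
    # a dict is taken as an explicit key -> split mapping.
    if datasets is None:
        datasets = ["train", "test"]
    if isinstance(datasets, list):
        return {split: split for split in datasets}
    return dict(datasets)
```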