noether.core.presets.base

Attributes

Classes

DomainPreset

Base class for domain-specific configuration presets.

Module Contents

noether.core.presets.base.logger
noether.core.presets.base.CHECKPOINT_CALLBACK = 'noether.core.callbacks.CheckpointCallback'
noether.core.presets.base.BEST_CHECKPOINT_CALLBACK = 'noether.core.callbacks.BestCheckpointCallback'
noether.core.presets.base.EMA_CALLBACK = 'noether.core.callbacks.EmaCallback'
noether.core.presets.base.OFFLINE_LOSS_CALLBACK = 'noether.training.callbacks.OfflineLossCallback'
noether.core.presets.base.MEAN_STD_NORMALIZER = 'noether.data.preprocessors.normalizers.MeanStdNormalization'
noether.core.presets.base.POSITION_NORMALIZER = 'noether.data.preprocessors.normalizers.PositionNormalizer'
noether.core.presets.base.LR_SCHEDULE_LINEAR_WARMUP_COSINE = 'noether.core.schedules.LinearWarmupCosineDecaySchedule'
noether.core.presets.base.OPTIMIZER_LION = 'noether.core.optimizer.Lion'
class noether.core.presets.base.DomainPreset

Bases: abc.ABC

Base class for domain-specific configuration presets.

A preset encapsulates domain knowledge (data specs, dataset statistics, normalizer conventions, pipeline defaults) so that training scripts only specify what’s unique to the experiment (model architecture, hyperparameters, dataset path).

Subclasses must define:
  • data_specs: property returning the domain’s data specification object

  • dataset_statistics: property returning pre-computed dataset statistics

  • normalizer_spec: property returning a declarative normalizer mapping

  • excluded_properties: properties to exclude from dataset loading

  • target_properties: list of target property names for this domain

Subclasses should also set class attributes:
  • dataset_kind: fully qualified dataset class path

  • stats: raw statistics dict

  • pipeline_defaults: default pipeline parameters

  • pipeline_model_overrides: per-model pipeline parameter overrides

  • forward_properties_map: per-model forward property lists (with "_default" fallback)

The build_normalizers, build_pipeline, build_dataset, build_model, forward_properties, standard_callbacks, and build_config methods have default implementations that can be overridden when needed.
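As a sketch of the subclass contract, the snippet below defines a minimal stand-in base (reduced to a few of the members listed above, so it runs standalone) and a hypothetical subclass; the preset name, dataset path, and pipeline knob are all illustrative:

```python
from abc import ABC, abstractmethod
from typing import Any


class DomainPreset(ABC):
    """Minimal stand-in for noether.core.presets.base.DomainPreset,
    reduced to a few members of the contract described above."""

    dataset_kind: str = ""
    stats: dict[str, list[float]] = {}
    pipeline_defaults: dict[str, Any] = {}

    @property
    @abstractmethod
    def normalizer_spec(self) -> dict[str, Any]: ...

    @abstractmethod
    def target_properties(self) -> list[str]: ...


class CarAeroPreset(DomainPreset):
    """Hypothetical preset for an automotive aerodynamics domain."""

    dataset_kind = "noether.data.datasets.CarAeroDataset"  # hypothetical path
    stats = {"surface_pressure_mean": [0.0], "surface_pressure_std": [1.0]}
    pipeline_defaults = {"num_points": 16384}  # hypothetical pipeline knob

    @property
    def normalizer_spec(self) -> dict[str, Any]:
        # Declarative: resolved against stats by the build_normalizers helper.
        return {"surface_pressure": "mean_std"}

    def target_properties(self) -> list[str]:
        return ["surface_pressure"]
```

A training script then instantiates the preset once and asks it for datasets, models, and configs, instead of repeating domain constants.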

dataset_kind: str = ''
stats: dict[str, list[float]]
stats_file: str | None = None
pipeline_defaults: dict[str, Any]
pipeline_model_overrides: dict[str, dict[str, Any]]
forward_properties_map: dict[str, list[str]]
property data_specs: Any
Abstractmethod:

Return type:

Any

Return the domain’s data specification (e.g., AeroDataSpecs).

property dataset_statistics: dict[str, list[float] | float]

Return pre-computed dataset statistics as a flat dict.

Resolution order:
  1. If stats_file is set on the preset, load from that YAML file.

  2. If the stats dict is set on the preset, return a copy.

  3. If dataset_kind points to a class with a STATS_FILE attribute, load from that.

Subclasses can override this property for custom logic.

Return type:

dict[str, list[float] | float]

property normalizer_spec: dict[str, str | tuple[str, dict[str, Any]]]
Abstractmethod:

Return type:

dict[str, str | tuple[str, dict[str, Any]]]

Declarative normalizer mapping.

Keys are data source names (e.g., "surface_pressure"). Values are either:

  • "mean_std" - auto-builds MeanStdNormalizerConfig from stats

  • ("position", {"scale": 1000}) - auto-builds PositionNormalizerConfig

Stats are resolved by convention: for a key "surface_pressure" with type "mean_std", the builder looks up "surface_pressure_mean" and "surface_pressure_std" in dataset_statistics.
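A spec combining both value forms might look like the following (the data source names are illustrative):

```python
# Values are either a bare kind string or a (kind, params) tuple.
normalizer_spec = {
    "surface_pressure": "mean_std",           # resolves surface_pressure_mean/_std
    "position": ("position", {"scale": 1000}),  # explicit builder parameters
}
```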

property excluded_properties: set[str] | None
Abstractmethod:

Return type:

set[str] | None

Properties to exclude from dataset loading, or None.

abstractmethod target_properties()

Return the list of target property names for this domain.

Return type:

list[str]

forward_properties(model_kind)

Return the list of forward properties for the given model architecture.

Looks up forward_properties_map by model kind, falling back to "_default".

Parameters:

model_kind (str)

Return type:

list[str]
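The lookup-with-fallback behaviour can be sketched as follows (the registered model kind and property names are hypothetical):

```python
forward_properties_map = {
    "_default": ["position"],
    "noether.models.ABUPT": ["position", "surface_normals"],  # hypothetical entry
}


def forward_properties(model_kind: str) -> list[str]:
    # Fall back to the "_default" entry for unregistered model kinds.
    return forward_properties_map.get(model_kind, forward_properties_map["_default"])
```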

build_pipeline(model_kind, **overrides)

Build a pipeline config by merging defaults, model overrides, and user overrides.

The default implementation merges pipeline_defaults, model-specific overrides from pipeline_model_overrides, and any caller-provided overrides into a single dict and returns it. Subclasses should override this to construct the appropriate pipeline config object, calling super() to obtain the merged parameters.

Parameters:
  • model_kind (str)

  • overrides (Any)

Return type:

Any
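The merge order (defaults, then model overrides, then caller overrides, with later entries winning) can be illustrated with plain dicts; the parameter names and model kind are hypothetical:

```python
pipeline_defaults = {"num_points": 16384, "shuffle": True}
pipeline_model_overrides = {"abupt": {"num_points": 32768}}


def merged_params(model_kind: str, **overrides) -> dict:
    # Later unpackings win, so caller overrides take highest precedence.
    return {
        **pipeline_defaults,
        **pipeline_model_overrides.get(model_kind, {}),
        **overrides,
    }
```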

abstractmethod build_dataset(*, split, root, model_kind, wrappers=None, **overrides)

Build a dataset config for the given split.

Parameters:
  • split (str)

  • root (str)

  • model_kind (str)

  • wrappers (list[noether.core.schemas.dataset.DatasetWrappers] | None)

  • overrides (Any)

Return type:

noether.core.schemas.dataset.DatasetBaseConfig

build_normalizers()

Build normalizer configs from the declarative normalizer_spec.

Uses naming conventions to look up statistics:
  • "mean_std" -> looks for {key}_mean and {key}_std

  • "position" -> looks for raw_pos_min and raw_pos_max

Returns:

Dict mapping data source names to lists of normalizer configs.

Return type:

dict[str, list[noether.core.schemas.normalizers.AnyNormalizer]]
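The "mean_std" naming convention amounts to a key lookup like the following (the helper name is hypothetical, used only to illustrate the convention):

```python
def resolve_mean_std(key: str, statistics: dict) -> tuple:
    # "mean_std" entries resolve to "<key>_mean" and "<key>_std" in the stats dict.
    return statistics[f"{key}_mean"], statistics[f"{key}_std"]
```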

static standard_callbacks(*, log_every_n_epochs=1, save_every_n_epochs=10, eval_dataset_key='test', batch_size=1, ema=True, ema_factors=None, best_metric_key='loss/test/total')

Build a standard set of training callbacks.

Returns a list containing:
  • CheckpointCallback (periodic checkpoints)

  • OfflineLossCallback (validation loss)

  • BestCheckpointCallback (saves best model by metric)

  • EmaCallback (optional exponential moving average)

Parameters:
  • log_every_n_epochs (int) – frequency for loss logging and validation.

  • save_every_n_epochs (int) – frequency for checkpoint saving and EMA.

  • eval_dataset_key (str) – dataset key for offline evaluation.

  • batch_size (int) – batch size for evaluation callbacks.

  • ema (bool) – whether to include EMA callback.

  • ema_factors (set[float] | None) – EMA decay factors. Defaults to None, in which case {0.9999} is used.

  • best_metric_key (str) – metric key for best checkpoint selection.

Return type:

list

build_optimizer(*, kind=OPTIMIZER_LION, lr=5e-05, weight_decay=0.05, clip_grad_norm=1.0, warmup_percent=0.05, end_lr=1e-06)

Build an optimizer config with sensible defaults.

Parameters:
  • kind (str) – optimizer class path.

  • lr (float) – learning rate.

  • weight_decay (float) – weight decay.

  • clip_grad_norm (float | None) – gradient clipping norm. None to disable.

  • warmup_percent (float) – fraction of training for linear warmup.

  • end_lr (float | None) – final learning rate for cosine decay. None to disable scheduling.

Return type:

noether.core.schemas.optimizers.OptimizerConfig

build_model(*, model_kind, optimizer=None, **model_params)

Build a model config from the model kind and parameters.

Automatically injects data_specs, forward_properties, optimizer_config, and kind so the user only provides architecture knobs.

If the model kind has registered defaults in _MODEL_DEFAULTS, those are applied before constructing the config (e.g., AB-UPT sub-configs).

Parameters:
  • model_kind (str) – fully qualified class path of the model.

  • optimizer (noether.core.schemas.optimizers.OptimizerConfig | None) – optimizer config. Uses build_optimizer() defaults if None.

  • **model_params (Any) – model-specific parameters (e.g., hidden_dim, num_heads).

Returns:

A model config object.

Return type:

Any

build_config(*, model_kind, model_params=None, model_config=None, optimizer=None, trainer_kind, trainer_params=None, dataset_root, output_path=None, datasets=None, extra_datasets=None, callbacks_override=None, extra_callbacks=None, accelerator=None, max_epochs=500, batch_size=1, seed=42, **config_overrides)

Assemble a complete ConfigSchema with all domain defaults filled in.

Provide either model_config (pre-built) or model_params (dict of architecture knobs like hidden_dim, num_heads). If model_params is used, build_model() is called automatically.

Parameters:
  • model_kind (str) – fully qualified class path of the model.

  • model_params (dict[str, Any] | None) – model architecture parameters (used with build_model).

  • model_config (Any | None) – pre-built model config object. Mutually exclusive with model_params.

  • optimizer (noether.core.schemas.optimizers.OptimizerConfig | None) – optimizer config. Defaults to Lion with cosine decay via build_optimizer().

  • trainer_kind (str) – fully qualified class path of the trainer.

  • trainer_params (dict[str, Any] | None) – additional trainer-specific parameters (e.g., loss weights).

  • dataset_root (str) – root directory of the dataset.

  • output_path (str | None) – output directory. Defaults to {dataset_root}/outputs.

  • datasets (dict[str, str] | list[str] | None) – splits to create. Either a list of split names (e.g., ["train", "test"]) where keys equal splits, or a dict mapping keys to splits for custom naming (e.g., {"my_train": "train"}). Defaults to ["train", "test"].

  • extra_datasets (dict[str, noether.core.schemas.dataset.DatasetBaseConfig] | None) – additional pre-built dataset configs to merge in (e.g., repeated test sets).

  • callbacks_override (list | None) – replace the default callback list entirely. Defaults to standard_callbacks().

  • extra_callbacks (list | None) – additional callbacks appended to the default (or overridden) list.

  • accelerator (str | None) – “cpu”, “gpu”, or “mps”. Auto-detected if None.

  • max_epochs (int) – maximum training epochs.

  • batch_size (int) – effective batch size.

  • seed (int) – random seed.

  • **config_overrides (Any) – additional fields passed to ConfigSchema.

Returns:

A fully populated ConfigSchema ready for HydraRunner().main().

Return type:

noether.core.schemas.schema.ConfigSchema
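The documented handling of the datasets argument (a list where keys equal splits, or a dict mapping custom keys to splits) can be sketched as follows; the helper name is hypothetical:

```python
def resolve_dataset_keys(datasets=None) -> dict[str, str]:
    # None -> the default ["train", "test"]; a list maps each split to itself;
    # a dict is taken as an explicit key -> split mapping.
    if datasets is None:
        datasets = ["train", "test"]
    if isinstance(datasets, list):
        return {split: split for split in datasets}
    return dict(datasets)
```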