Configuration¶

Note

This section assumes familiarity with Hydra configuration management and Pydantic schemas. If you’re new to these tools, we recommend reviewing their official documentation before proceeding.

The configuration is the backbone of the Noether Framework, enabling reproducible, modular, and type-safe experiment definitions. All experiments are defined through YAML configuration files that use:

Hydra for hierarchical composition and command-line overrides
Pydantic for runtime data validation and type safety

For a higher-level overview of how configs and code-driven workflows compare, see Key Concepts.

Configuration architecture¶

This tutorial uses a hierarchical configuration pattern where:

Base configurations define default settings for each component (datasets, models, trainers, etc.)
Experiment configurations compose and override base configs for specific experiments
Command-line overrides allow quick parameter sweeps without file changes

The main entry point for any experiment is a top-level configuration file like configs/train_shapenet.yaml, which serves as the composition root that brings together all required components.

Example: ShapeNet-Car configuration¶

train_shapenet.yaml demonstrates the structure of a complete experiment configuration. Let’s break down its key components:

# @package _global_

# Define key values here that are used multiple times in the config files.
dataset_root: <path to your shapenet dataset root>
dataset_kind: noether.data.datasets.cfd.ShapeNetCarDataset
excluded_properties: 
  - surface_friction
  - surface_area
  - volume_pressure
  - volume_vorticity
defaults:
  - data_specs: shapenet_car
  - dataset_normalizers: shapenet_dataset_normalizers
  - model: ??? # models are undefined and will be defined per experiment 
  - trainer: shapenet_trainer
  - datasets: shapenet_dataset
  - tracker: ?? # trackers are undefined and will be defined depending on either development, training or evaluation
  - callbacks: training_callbacks_shapenet
  - pipeline: shapenet_pipeline
  - optimizer: adamw
  - slurm: slurm_config 
  - _self_

Each entry like dataset_normalizers: shapenet_dataset_normalizers tells Hydra to load configs/dataset_normalizers/shapenet_dataset_normalizers.yaml and merge it into the final configuration.

The ??? marker indicates required fields that must be specified in experiment configs. The _self_ marker controls when the current file’s values override inherited ones (placing it last gives the current file the highest priority).

Complete configuration structure:

To run an experiment, you need configurations for:

Model: Architecture and hyperparameters
Trainer: Trainer config
Callbacks: Evaluation, logging, and monitoring
Tracker: tracker
Dataset(s): Dataset config
Pipeline: Data preprocessing and collation
Optimizer: Optimization algorithm

Most components remain constant across experiments on the same dataset. For example, when training different models on ShapeNet-Car, only the model and tracker configurations typically change, while dataset, pipeline, trainer, and callbacks remain fixed.

Example: Dataset configuration

The base dataset configuration configs/datasets/shapenet_dataset.yaml demonstrates config composition:

train:
  root: ${dataset_root}
  kind: ${dataset_kind}
  split: train
  pipeline: ${pipeline}
  dataset_normalizers: ${dataset_normalizers}
  excluded_properties: ${excluded_properties}
test:
  root:  ${dataset_root}
  kind: ${dataset_kind}
  split: test
  pipeline: ${pipeline}
  dataset_normalizers: ${dataset_normalizers}
  excluded_properties: ${excluded_properties}
test_repeat:
  root:  ${dataset_root}
  kind: ${dataset_kind}
  split: test
  pipeline: ${pipeline}
  dataset_normalizers: ${dataset_normalizers}
  excluded_properties: ${excluded_properties}
  dataset_wrappers:
    - kind: noether.data.base.wrappers.RepeatWrapper
      repetitions: 10
    

Notice the ${variable_name} references? These resolve to values defined in the top-level train_shapenet.yaml. This pattern avoids duplication: dataset_root is defined once, used everywhere.

Config groups and directory structure:

The configs/ directory roughly mirrors the component structure:

configs/
├── train_shapenet.yaml          # Top-level composition
├── datasets/                    # Dataset config group
│   ├── shapenet_dataset.yaml
│   ├── ahmedml_dataset.yaml
│   └── ...
├── model/                       # Model config group
│   ├── transformer.yaml
│   ├── upt.yaml
│   └── ...
├── trainer/                     # Trainer config group
│   └── shapenet_trainer.yaml
└── experiment/                  # Experiment-specific overrides
    └── shapenet/
        ├── transformer.yaml
        ├── upt.yaml
        └── ...

Defining experiment configurations¶

Experiment-specific configurations compose base configs and apply targeted overrides. An experiment file should:

Select a specific model variant
Choose a tracker (W&B, trackio, Tensorboard or disabled)
Override any experiment-specific hyperparameters

Example: Transformer experiment

The Transformer experiment configuration configs/experiment/shapenet/transformer.yaml:

# @package _global_
defaults:
  - override /model: transformer  
  - override /tracker: development_tracker
  - override /optimizer: lion
  
name: shapenet-car-transformer-float16

trainer:
  precision: float16

Breaking down the experiment config:

override /model: transformer: Use configs/model/transformer.yaml instead of the placeholder ??? in the base config
override /tracker: development_tracker: Select the W&B tracker configuration
override /optimizer: lion: Override the default AdamW optimizer with Lion
trainer.precision: float16: Override the trainer’s default float32 precision

The override keyword ensures the experiment’s choice takes precedence over any defaults, preventing accidental config merging issues.

Creating new experiments:

To run a different model on the same dataset:

Create a new experiment file (e.g., configs/experiment/shapenet/my_model.yaml)
Specify the model config to use
Add any model-specific overrides
Keep tracker and other settings as needed

Running experiments¶

Basic execution:

To train a model with a specific configuration (from the recipes/aero_cfd/ directory):

uv run noether-train --hp configs/train_shapenet.yaml \
  +experiment/shapenet=transformer tracker=disabled trainer.max_epochs=10

uv run noether-train --hp configs/train_shapenet.yaml \
  +experiment/shapenet=ab_upt tracker=disabled trainer.max_epochs=10

To enable experiment tracking, simply remove the tracker=disabled override:

uv run noether-train --hp configs/train_shapenet.yaml \
  +experiment/shapenet=transformer

Important

All training commands must be run from inside the recipe folder (recipes/aero_cfd/).

Warning

Make sure to either set dataset_root in train_shapenet.yaml or add it to the command line via dataset_root="<path to dataset root>".

You’ll need to configure your W&B API key on first run and update configs/tracker/development_tracker.yaml with your project details.

Single parameter overrides:

uv run noether-train --hp configs/train_shapenet.yaml \
  +experiment/shapenet=transformer \
  trainer.max_epochs=100

Multiple parameter overrides:

To modify multiple related parameters (e.g., changing Transformer dimensions):

uv run noether-train --hp configs/train_shapenet.yaml \
  +experiment/shapenet=transformer \
  model.hidden_dim=256 \
  model.transformer_block_config.num_heads=4

Note: When changing hidden_dim, ensure num_heads divides it evenly (i.e., hidden_dim % num_heads == 0).

For more details on CLI-based training, see Training First Model (with Configs). To run experiments using Python code instead of YAML configs, see Training First Model (with Code).

For launching training jobs on a SLURM cluster, see How to launch a SLURM job from the command line.

Pydantic schemas for type safety¶

While Hydra handles configuration composition, Pydantic schemas provide runtime validation and type safety. Every class in the Noether Framework has a corresponding Pydantic schema that validates configuration: checks types, ranges, and constraints before training begins.

Schema hierarchy:

All schemas in the Noether Framework follow an inheritance pattern. For example, model schemas inherit from ModelBaseConfig:

class ModelBaseConfig(_RegistryBase):
    _registry: ClassVar[dict[str, type]] = {}
    _type_field: ClassVar[str] = "kind"

    kind: str | None = None
    """Kind of model to use, i.e. class path"""
    name: str
    """Name of the model. Needs to be unique"""
    optimizer_config: AnyOptimizerConfig | None = Field(None, discriminator="kind")
    """The optimizer configuration to use for training the model. When a model is used for inference only, this can be left as None."""
    initializers: list[Annotated[AnyInitializer, Field(discriminator=kind)]] | None = None
    """List of initializers configs to use for the model."""
    is_frozen: bool | None = False
    """Whether to freeze the model parameters (i.e., not trainable)."""
    forward_properties: list[str] | None = []
    """List of properties to be used as inputs for the forward pass of the model. Only relevant when the train_step of the BaseTrainer is used. When overridden in a class method, this property is ignored."""

    model_config = {"extra": "forbid"}

The extra: "forbid" setting ensures that typos in YAML files are caught immediately, preventing silent configuration errors.

Example: Transformer configuration schema¶

All models in Noether use schema composition and validation. The schema hierarchy for the Transformer models looks like:

ModelBaseConfig (base for all models)
    └── TransformerConfig (Transformer-specific config)
          └── TransformerBlockConfig (component config)

TransformerBlockConfig defines individual block parameters:

class TransformerBlockConfig(BaseModel):
    """Configuration for a transformer block."""

    hidden_dim: int = Field(..., ge=1)
    """Hidden Dimension of the transformer block."""

    num_heads: int = Field(..., ge=1)
    """Number of attention heads."""

    mlp_hidden_dim: int | None = Field(None)
    """Hidden dimension of the MLP layer. If set to None, the mlp_hidden dim is set to hidden_dim * mlp_expansion_factor in the TransformerConfig. If both are None, an error is raised."""

    mlp_expansion_factor: int | None = Field(None, ge=1)
    """Expansion factor for the MLP hidden dimension relative to the hidden dimension. If 'mlp_hidden_dim' is not set, this factor is used to compute it as hidden_dim * mlp_expansion_factor."""

    drop_path: float = Field(0.0, ge=0.0, le=1.0)
    """Probability to drop the attention or MLP module. Defaults to 0.0."""

    attention_constructor: Literal[
        "dot_product",
        "perceiver",
        "transolver",
        "transolver_plusplus",
    ] = "dot_product"
    """Constructor of the attention module. Defaults to 'dot_product'."""

    layerscale: float | None = Field(None, ge=0.0)
    """ Init scale value to scale layer activations. Defaults to None."""

    condition_dim: int | None = Field(None)
    """Dimension of the conditioning vector. If none, no conditioning is applied. If provided, the transformer block will turn into a Diffusion Transformer (DiT) block."""

    bias: bool = Field(True)
    """Whether to use biases in norm/projections. Defaults to True."""

    eps: float = Field(1e-6, gt=0.0)
    """Epsilon Value for the layer nornalization. Defaults to 1e-6."""

    init_weights: InitWeightsMode = Field("truncnormal002")
    """Initialization method for the weight matrices of the network. Defaults to "truncnormal002"""

    use_rope: bool = Field(False)
    """Whether to use Rotary Positional Embeddings (RoPE)."""

    max_wavelength: int | None = Field(10_000)
    """Theta parameter for the transformer sine/cosine embedding. Default: 10_000"""

    attention_arguments: dict = {}
    """Additional arguments for the attention module that are only needed for a specific attention implementation."""

    @model_validator(mode="after")
    def set_mlp_hidden_dim(self):
        # Validate hidden_dim is divisible by num_heads
        if self.hidden_dim % self.num_heads != 0:
            raise ValueError(f"hidden_dim ({self.hidden_dim}) must be divisible by num_heads ({self.num_heads}).")

        if self.mlp_hidden_dim is None:
            if self.mlp_expansion_factor is None:
                raise ValueError("Either 'mlp_hidden_dim' or 'mlp_expansion_factor' must be provided.")
            self.mlp_hidden_dim = self.hidden_dim * self.mlp_expansion_factor
        return self

    @model_validator(mode="after")
    def set_wavelength_for_rope(self):
        if self.use_rope and self.max_wavelength is None:
            raise ValueError("max_wavelength must be provided when use_rope is True.")
        return self

    @computed_field
    def linear_projection_config(self) -> "LinearProjectionConfig":
        return LinearProjectionConfig(
            input_dim=self.hidden_dim,
            output_dim=self.hidden_dim,
            bias=self.bias,
            init_weights=self.init_weights,
        )

    @computed_field
    def layerscale_config(self) -> "LayerScaleConfig":
        return LayerScaleConfig(
            hidden_dim=self.hidden_dim,
            init_values=self.layerscale,
        )

    @computed_field
    def drop_path_config(self) -> "UnquantizedDropPathConfig":
        return UnquantizedDropPathConfig(drop_prob=self.drop_path)

    @computed_field
    def modulation_linear_projection_config(self) -> "LinearProjectionConfig | None":
        if self.condition_dim is not None:
            return LinearProjectionConfig(
                input_dim=self.condition_dim,
                output_dim=self.hidden_dim * 6,
                init_weights="zeros",
            )
        return None

    @computed_field
    def up_act_down_mlp_config(self) -> "UpActDownMLPConfig":
        return UpActDownMLPConfig(
            input_dim=self.hidden_dim,
            hidden_dim=self.mlp_hidden_dim,
            bias=self.bias,
            init_weights=self.init_weights,
        )

TransformerConfig extends the block config:

class TransformerConfig(ModelBaseConfig, InjectSharedFieldFromParentMixin):
    """Configuration for a Transformer model."""

    model_config = ConfigDict(extra="forbid")

    hidden_dim: int = Field(..., ge=1)
    """Hidden dimension of the model. Used for all transformer blocks."""

    depth: int = Field(..., ge=1)
    """Number of transformer blocks in the model."""

    transformer_block_config: Annotated[TransformerBlockConfig, Shared]

Multiple inheritance means TransformerConfig inherits:

Model management from ModelBaseConfig (optimizer, freezing, etc.)
Block parameters from TransformerBlockConfig (attention, MLP, etc.)
Adds Transformer model parameters (depth)
Overrides defaults (sets mlp_expansion_factor = 4)

From schema to YAML¶

Understanding the schema tells you which YAML fields are required and optional. Here is the full Transformer model config:

kind: noether.modeling.models.AeroTransformer
name: transformer
hidden_dim: 192
transformer_block_config: 
  num_heads: 3
  mlp_expansion_factor: 4
  use_rope: true
depth: 12
optimizer_config: ${optimizer}
data_specs: ${data_specs}
forward_properties:
  - surface_position
  - volume_position
  - surface_features
  - volume_features

Configuration inheritance¶

UPT and AB-UPT models support automatic configuration injection from parent to submodules for shared parameters between parent and submodules.

When you set hidden_dim, num_heads, or mlp_expansion_factor at the top level of a UPT config (or just hidden_dim for AB-UPT), these values automatically propagate to submodules unless explicitly overridden. This reduces redundancy and keeps consistency across your model architecture.

For a deeper understanding of how configuration inheritance works, see Configuration Inheritance.

Example - UPT configuration:

kind: noether.modeling.models.AeroUPT
name: upt
hidden_dim: 192
approximator_depth: 12
num_heads: 3
mlp_expansion_factor: 4
use_rope: true
# use_bias_layers: true  TODO(markus): implement them for AeroUPT
data_specs: ${data_specs}
supernode_pooling_config:
  input_dim: ${data_specs.position_dim} 
  radius: 9
approximator_config:
  use_rope: ${model.use_rope}
decoder_config:
  depth: 12
  input_dim: ${data_specs.position_dim}  
  perceiver_block_config:
     use_rope: ${model.use_rope} 
optimizer_config: ${optimizer}
forward_properties:
  - surface_mask_query
  - surface_position_batch_idx
  - surface_position_supernode_idx
  - surface_position
  - surface_query_position
  - surface_query_features
  - volume_query_position
  - volume_query_features

Configuration schemas¶

In the Noether Framework, Pydantic schemas are used to validate configuration at runtime. Each component (model, trainer, dataset, etc.) has a corresponding config class that inherits from a base schema.

For example, the trainer config schema for this recipe is defined in trainers/aerodynamics_cfd.py:

class AerodynamicsCfdTrainerConfig(BaseTrainerConfig):
    surface_weight: float = 1.0
    """ Weight of the predicted values on the surface mesh. Defaults to 1.0.."""
    volume_weight: float = 1.0
    """Weight of the predicted values in the volume. Defaults to 1.0."""
    surface_pressure_weight: float = 1.0
    """Weight of the predicted values for the surface pressure. Defaults to 1.0."""
    surface_friction_weight: float = 0.0
    """Weight of the predicted values for the surface wall shear stress. Defaults to 0.0."""
    volume_velocity_weight: float = 1.0
    """Weight of the predicted values for the volume velocity. Defaults to 1.0."""
    volume_pressure_weight: float = 0.0
    """Weight of the predicted values for the volume total pressure coefficient. Defaults to 0.0."""
    volume_vorticity_weight: float = 0.0
    """Weight of the predicted values for the volume vorticity. Defaults to 0.0."""
    use_physics_features: bool = False

The kind field in most configs specifies the class path for instantiation. The Factory pattern uses this to dynamically import and instantiate the correct class with the validated configuration.