Training First Model (with Code)¶

Prerequisites¶

you cloned the Noether Framework
you have the recipes/aero_cfd/ folder in the repo
you prepared the ShapeNet-Car dataset

The fetching and preprocessing instructions are in the README.md located in the src/noether/data/datasets/cfd/shapenet_car/ folder. Review them first and proceed with the next steps when ready.

What we build in this tutorial¶

We will build a training run in Python code (no YAML). The code produces the same config object that the CLI would normally create.

You will learn:

how the config is structured (datasets, model, trainer, callbacks)
what dataset stats/specs and normalizers do
how to start training with HydraRunner().main()

Implementation¶

Overview¶

Sometimes you want to run training via code. We get it.

Previously we covered how to train using CLI and configs, now we will focus on making things to work via the Python code.

Relevant files for this can be found under src/noether/training/ folder. Let’s briefly go over it:

training/callbacks - callbacks executed during and post training
training/cli/ - the CLI definition that we use in the previous tutorial
training/runners/ - core Hydra runner logic that we will use to make the code to work
training/trainers/ - BaseTrainer that can be extended downstream, used directly in the Hydra runner

In this tutorial we skip yaml configs and build the run config in Python.

Note

The example code below uses typed configs and many small schema classes. If this looks overwhelming, don’t worry: it will make sense as you follow the steps.

Step 1: Create an entry point¶

Let’s create a new file recipes/aero_cfd/train_shapenet_upt.py and use it to run our pipeline. This file will live alongside the recipe and let us see the difference between “configs vs. code” workflows.

Step 2: Create necessary imports¶

Most schema classes live in noether.core.schemas. They help us keep typing consistent and validate inputs at runtime. These configs are Pydantic models, so if something is wrong you will get a clear validation error.

from __future__ import annotations

from pathlib import Path
from typing import Any, Literal, Sequence

import torch

from noether.core.schemas.callbacks import (
    BestCheckpointCallbackConfig,
    CheckpointCallbackConfig,
    EmaCallbackConfig,
    OfflineLossCallbackConfig,
)
from noether.core.schemas.dataset import ModelDataSpecs, StandardDatasetConfig, DatasetWrappers, RepeatWrapperConfig
from noether.core.schemas.modules import (
    DeepPerceiverDecoderConfig,
    SupernodePoolingConfig,
    PerceiverBlockConfig,
    TransformerBlockConfig,
)
from noether.core.schemas.normalizers import AnyNormalizer, MeanStdNormalizerConfig, PositionNormalizerConfig
from noether.core.schemas.optimizers import OptimizerConfig
from noether.core.schemas.schedules import LinearWarmupCosineDecayScheduleConfig
from noether.core.schemas.statistics import AeroStatsSchema
from noether.core.schemas.models.upt import UPTConfig
from noether.training.runners import HydraRunner
from recipes.aero_cfd.callbacks import (
    AeroMetricsCallbackConfig,
)
from recipes.aero_cfd.pipeline import AeroCFDPipelineConfig
from recipes.aero_cfd.trainers import AerodynamicsCfdTrainerConfig

Let’s go over each group to better understand the outline:

Data:

noether.core.configs - the main config classes, contains configuration for all other components
noether.core.schemas.dataset - dataset related configs and types for type-hinting
noether.core.schemas.modules - building blocks of our models, we will use UPT architecture
noether.core.schemas.normalizers - data normalization configs to be applied during the data loading
noether.core.schemas.statistics - data statistics aggregation, e.g. mean, std, etc.
etc.

Training:

noether.core.schemas.callbacks - relevant callbacks for our training
noether.core.schemas.optimizers - optimizer config
recipes.aero_cfd.trainers - trainer configuration
recipes.aero_cfd.models - configs for model initialization
etc.

Execution:

noether.training.runners - orchestrators responsible for pipeline execution

Note

HydraRunner from noether comes with a few public methods: run() is used by the CLI, and main() can be used to run the pipeline via Python. We use the latter to avoid YAML files in this tutorial.

Step 3: Declare main() function¶

def main() -> None:
    dataset_root = Path("/Users/user/datasets/shapenet_car")  # feel free to change this to match your structure
    output_path = dataset_root / "outputs"

    data_specs = build_specs()
    dataset_normalizer = build_dataset_normalizer()

    model_forward_properties = [
        "surface_mask_query",
        "surface_position_batch_idx",
        "surface_position_supernode_idx",
        "surface_position",
        "surface_query_position",
        "volume_query_position",
    ]

    upt_model_config = build_model_config(data_specs, model_forward_properties)
    aero_trainer_config = build_trainer_config(data_specs, model_forward_properties)

    HydraRunner().main(
        device=torch.device("mps"),
        config=ConfigSchema(...),  # note that '...' is a placeholder that we will populate later
    )

if __name__ == "__main__":
    main()

This is the core outline of the pipeline. If you will run this code you will get a lot of errors, but conceptually we are ready.

Worth noting that we have several functions that start with build_* - we will create them in a minute, but first let’s take a look at what we have:

We declared input and output directories for our training.
We will use python-based configs to define our pipeline. These configs will be created via build_* methods or directly in the HydraRunner().main(config=ConfigSchema(...)) declaration.
We declare model properties that will be sent to the forward pass.
The final execution will be handled by noether’s internal runner.

Step 4: Dataset configs¶

Now we will declare dataset constants and convenience build_ methods (you can place it right under the imports):

DATASET_STATS = {
    "raw_pos_min":  [-4.5],
    "raw_pos_max": [6.0],
    "surface_pressure_mean": [-36.4098],
    "surface_pressure_std": [48.6757],
    "volume_velocity_mean": [0.00293915, -0.0230546, 17.546032],
    "volume_velocity_std": [1.361689, 1.267649, 5.850353],
    "volume_sdf_mean": [3.74222e-01],
    "volume_sdf_std": [1.78948e-01],
}
DATA_SPECS = {
    "position_dim": 3,
    "surface_feature_dim": {
        "surface_sdf": 1,
        "surface_normals": 3,
    },
    "volume_feature_dim": {
        "volume_sdf": 1,
        "volume_normals": 3,
    },
    "surface_output_dims": {
        "pressure": 1,
    },
    "volume_output_dims":{
        "velocity": 3,
    },
}


def build_stats() -> AeroStatsSchema:
    return AeroStatsSchema(**DATASET_STATS)


def build_specs() -> ModelDataSpecs:
    return ModelDataSpecs(**DATA_SPECS)

def build_dataset_config(
    mode: Literal["train", "test"],
    dataset_root: str,
    data_specs: dict[str, Any] | ModelDataSpecs,
    dataset_statistics: dict[str, Sequence[float]],
    dataset_normalizer: dict[str, list[AnyNormalizer]],
    dataset_wrappers: list[DatasetWrappers] | None = None,
) -> StandardDatasetConfig:
    return StandardDatasetConfig(
        kind="noether.data.datasets.cfd.ShapeNetCarDataset",
        root=dataset_root,
        pipeline=AeroCFDPipelineConfig(
            kind="recipes.aero_cfd.pipeline.multistage_pipelines.aero_multistage.AeroMultistagePipeline",
            num_surface_points=3586, # max = 3586
            num_volume_points=4096,   # max = 28504
            num_surface_queries=3586,
            num_volume_queries=4096,
            num_supernodes=3586,
            sample_query_points=False,
            use_physics_features=False,
            dataset_statistics=AeroStatsSchema(**dataset_statistics),
            data_specs=data_specs if isinstance(data_specs, ModelDataSpecs) else ModelDataSpecs(**data_specs),
        ),
        split=mode,
        dataset_normalizers=dataset_normalizer,
        dataset_wrappers=dataset_wrappers,
        included_properties=None,
        excluded_properties={"surface_friction", "volume_pressure", "volume_vorticity"},
    )

This config defines our datasets (both train and test). Note the kind fields: they are strings that point to a Python class path inside the codebase. The factory uses them to build real objects, just like in the config-driven workflow.

Step 5: Trainer config¶

def build_trainer_config(model_forward_properties: list[str]) -> AerodynamicsCfdTrainerConfig:
    batch_size = 1
    loss_and_log_every_n_epochs = 1
    save_and_ema_every_n_epochs = 10

    return AerodynamicsCfdTrainerConfig(
        kind="recipes.aero_cfd.trainers.aerodynamics_cfd.AerodynamicsCFDTrainer",
        surface_weight=1.0,
        volume_weight=1.0,
        surface_pressure_weight=1.0,
        volume_velocity_weight=1.0,
        use_physics_features=False,
        precision="float32",
        max_epochs=500,
        effective_batch_size=batch_size,
        log_every_n_epochs=loss_and_log_every_n_epochs,
        callbacks=[
            CheckpointCallbackConfig(
                kind="noether.core.callbacks.CheckpointCallback",
                save_weights=True,
                save_latest_weights=True,
                save_latest_optim=False,
                every_n_epochs=save_and_ema_every_n_epochs,
            ),
            # validation loss
            OfflineLossCallbackConfig(
                kind="noether.training.callbacks.OfflineLossCallback",
                batch_size=batch_size,
                every_n_epochs=loss_and_log_every_n_epochs,
                dataset_key="test",
            ),
            BestCheckpointCallbackConfig(
                kind="noether.core.callbacks.BestCheckpointCallback",
                every_n_epochs=batch_size,
                metric_key="loss/test/total",
            ),
            # test loss
            AeroMetricsCallbackConfig(
                kind="recipes.aero_cfd.callbacks.aero_metrics.AeroMetricsCallback",
                batch_size=1,
                every_n_epochs=loss_and_log_every_n_epochs,
                dataset_key="test",
                forward_properties=model_forward_properties,
            ),
            AeroMetricsCallbackConfig(
                kind="recipes.aero_cfd.callbacks.aero_metrics.AeroMetricsCallback",
                batch_size=1,
                every_n_epochs=500,
                dataset_key="test_repeat",
                forward_properties=model_forward_properties,
            ),
            # ema
            EmaCallbackConfig(
                kind="noether.core.callbacks.EmaCallback",
                every_n_epochs=save_and_ema_every_n_epochs,
                save_weights=False,
                save_last_weights=False,
                save_latest_weights=True,
                target_factors={0.9999},
            ),
        ],
        forward_properties=model_forward_properties,
        target_properties=[
            "surface_pressure_target",
            "volume_velocity_target",
        ],
    )

Trainer configuration is at the core of the training pipeline, as you can see the callbacks is the crucial component of it.

Step 6: Filling the gaps¶

At last, we will populate the placeholder fields that we declared for the HydraRunner:

HydraRunner().main(
    device=torch.device("mps"),
    config=ConfigSchema(
        name=None,
        accelerator="mps",  # can be "cpu", "gpu", "mps"
        stage_name="train",
        dataset_kind="noether.data.datasets.cfd.ShapeNetCarDataset",
        dataset_root=dataset_root.as_posix(),
        resume_run_id=None,
        resume_stage_name=None,
        resume_checkpoint=None,
        seed=42,
        dataset_statistics=DATASET_STATS,
        dataset_normalizer=dataset_normalizer,
        tracker=None,
        run_id=None,
        devices=None,
        num_workers=None,
        datasets={
            "train": build_dataset_config(
                mode="train",
                dataset_root=dataset_root.as_posix(),
                data_specs=data_specs,
                dataset_statistics=DATASET_STATS,
                dataset_normalizer=dataset_normalizer,
                dataset_wrappers=None,
            ),
            "test": build_dataset_config(
                mode="test",
                dataset_root=dataset_root.as_posix(),
                data_specs=data_specs,
                dataset_statistics=DATASET_STATS,
                dataset_normalizer=dataset_normalizer,
                dataset_wrappers=None,
            ),
            "test_repeat": build_dataset_config(
                mode="test",
                dataset_root=dataset_root.as_posix(),
                data_specs=data_specs,
                dataset_statistics=DATASET_STATS,
                dataset_normalizer=dataset_normalizer,
                dataset_wrappers=[RepeatWrapperConfig(
                    kind="noether.data.base.wrappers.RepeatWrapper",
                    repetitions=10,
                )],
            ),
        },
        model=upt_model_config,
        trainer=aero_trainer_config,
        debug=False,
        store_code_in_output=False,
        output_path=output_path.as_posix(),
    ),
)

As you can see above, there are multiple arguments that were defined with None. They are present here to show available settings that you can modify to your needs. You can also freely remove them from the code to make a bit more lightweight. This won’t break the logic.

Step 7: Run training¶

You are ready to start the training! If you are using an IDE - simply run the file. Otherwise, in the repo root from your terminal:

uv run python -m recipes.aero_cfd.train_shapenet_upt

This makes Python add the repo root to sys.path. Alternatively, you can add repo root to PYTHONPATH:

PYTHONPATH=. uv run python recipes/aero_cfd/train_shapenet_upt.py

If everything is set up correctly, you should see the logs indicating successful initialization and training (use your task manager and/or activity monitor to see if the hardware is properly utilized).

The output directory will be populated with files like this:

shapenet_car/outputs/YYYY-MM-DD_<SHORT_ID>
└── train
    ├── basetracker
    │   └── config.yaml
    ├── hp_resolved.yaml
    └── log.txt

After the training progresses, check this folder again - you will find the checkpoints and other training artifacts there.