Training Your First Model (with Code)
Prerequisites

- you cloned the Noether Framework
- you have a tutorial/ folder in the repo root
- you prepared the ShapeNet-Car dataset

The fetching and preprocessing instructions are in the README.md located in the
src/noether/data/datasets/cfd/shapenet_car/ folder. Review them first and proceed with the next steps when ready.
What we build in this tutorial

We will build a training run in Python code (no YAML). The code produces the same config object that the CLI would normally create.

You will learn:

- how the config is structured (datasets, model, trainer, callbacks)
- what dataset stats/specs and normalizers do
- how to start training with HydraRunner().main()
Implementation

Overview

Sometimes you want to run training from code. We get it.
In the previous tutorial we covered how to train using the CLI and YAML configs; now we will make the same pipeline work from Python code.
The relevant files are in the src/noether/training/ folder. Let’s briefly go over it:

- training/callbacks/ - callbacks executed during and after training
- training/cli/ - the CLI definition that we used in the previous tutorial
- training/runners/ - the core Hydra runner logic that we use to run the pipeline from code
- training/trainers/ - BaseTrainer, which can be extended downstream and is used directly by the Hydra runner

In this tutorial we skip the YAML configs and build the run config in Python.
Note
The example code below uses typed configs and many small schema classes. If this looks overwhelming, don’t worry: it will make sense as you follow the steps.
Step 1: Create an entry point

Let’s create a new file, tutorial/train_shapenet_upt.py, and use it to run our pipeline. Why “tutorial”? Because that folder
already contains the components we need to get started (the pipeline, trainer, callbacks and schemas imported in Step 2), and here we want to see the difference between the “configs vs. code” workflows.
Step 2: Add the necessary imports
Most schema classes live in noether.core.schemas. They help us keep typing consistent and validate inputs at
runtime. These configs are Pydantic models, so if something is wrong you will get a clear validation error.
```python
from __future__ import annotations

from pathlib import Path
from typing import Any, Literal, Sequence

import torch

from noether.core.configs import StaticConfig
from noether.core.schemas.callbacks import (
    BestCheckpointCallbackConfig,
    CheckpointCallbackConfig,
    EmaCallbackConfig,
    OfflineLossCallbackConfig,
)
from noether.core.schemas.dataset import AeroDataSpecs, StandardDatasetConfig, DatasetWrappers, RepeatWrapperConfig
from noether.core.schemas.modules import (
    DeepPerceiverDecoderConfig,
    SupernodePoolingConfig,
    PerceiverBlockConfig,
    TransformerBlockConfig,
)
from noether.core.schemas.normalizers import AnyNormalizer, MeanStdNormalizerConfig, PositionNormalizerConfig
from noether.core.schemas.optimizers import OptimizerConfig
from noether.core.schemas.schedules import LinearWarmupCosineDecayScheduleConfig
from noether.core.schemas.schema import ConfigSchema, StaticConfigSchema
from noether.core.schemas.statistics import AeroStatsSchema
from noether.training.runners import HydraRunner

from tutorial.callbacks.surface_volume_evaluation_metrics import (
    SurfaceVolumeEvaluationMetricsCallbackConfig,
)
from tutorial.schemas.models.upt_config import UPTConfig
from tutorial.schemas.pipelines.aero_pipeline_config import AeroCFDPipelineConfig
from tutorial.schemas.trainers.automotive_aerodynamics_trainer_config import AutomotiveAerodynamicsCfdTrainerConfig
```
Let’s go over each group to better understand the outline:

Data:

- noether.core.configs - the main config classes, containing the configuration for all other components
- noether.core.schemas.dataset - dataset-related configs and types for type hinting
- noether.core.schemas.modules - building blocks of our models; we will use the UPT architecture
- noether.core.schemas.normalizers - data normalization configs applied during data loading
- noether.core.schemas.statistics - data statistics aggregation, e.g. mean, std, etc.
- etc.

Training:

- noether.core.schemas.callbacks - callbacks relevant to our training
- noether.core.schemas.optimizers - optimizer config
- tutorial.schemas.trainers.automotive_aerodynamics_trainer_config - trainer configuration
- tutorial.schemas.models - configs for model initialization
- etc.

Execution:

- noether.training.runners - orchestrators responsible for pipeline execution
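The normalizers and statistics groups work together: statistics such as the DATASET_STATS dictionary declared in Step 4 store a per-channel mean and std, and a mean/std normalizer uses them to rescale each channel. Below is a minimal stdlib sketch of that transform and its inverse, assuming the usual (x - mean) / std formula. It is not noether's implementation (which you configure via MeanStdNormalizerConfig), and the velocity sample is made up for illustration.

```python
from typing import Sequence


def normalize(values: Sequence[float], mean: Sequence[float], std: Sequence[float]) -> list[float]:
    """Channel-wise mean/std normalization: (x - mean) / std."""
    if not (len(values) == len(mean) == len(std)):
        raise ValueError("values, mean and std must have the same number of channels")
    return [(x - m) / s for x, m, s in zip(values, mean, std)]


def denormalize(values: Sequence[float], mean: Sequence[float], std: Sequence[float]) -> list[float]:
    """Inverse transform: x * std + mean, e.g. to turn predictions back into physical units."""
    return [x * s + m for x, m, s in zip(values, mean, std)]


# Statistics copied from DATASET_STATS (Step 4).
VOLUME_VELOCITY_MEAN = [0.00293915, -0.0230546, 17.546032]
VOLUME_VELOCITY_STD = [1.361689, 1.267649, 5.850353]

velocity = [0.5, -0.2, 20.0]  # hypothetical 3-component velocity sample
normalized = normalize(velocity, VOLUME_VELOCITY_MEAN, VOLUME_VELOCITY_STD)
restored = denormalize(normalized, VOLUME_VELOCITY_MEAN, VOLUME_VELOCITY_STD)
print([round(v, 4) for v in normalized])
```

Normalizing keeps channels with very different scales (here the third velocity component has a mean around 17.5) in a comparable range for the model, while the inverse transform recovers physical values.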
Note
HydraRunner from noether comes with a few public methods: run() is used by the CLI, and main()
can be used to run the pipeline via Python. We use the latter to avoid YAML files in this tutorial.
Step 3: Declare the main() function
```python
def main() -> None:
    dataset_root = Path("/Users/user/datasets/shapenet_car")  # feel free to change this to match your structure
    output_path = dataset_root / "outputs"

    data_specs = build_specs()
    dataset_normalizer = build_dataset_normalizer()
    model_forward_properties = [
        "surface_mask_query",
        "surface_position_batch_idx",
        "surface_position_supernode_idx",
        "surface_position",
        "surface_query_position",
        "volume_query_position",
    ]
    upt_model_config = build_model_config(data_specs, model_forward_properties)
    # build_trainer_config (Step 5) only takes the forward properties
    aero_trainer_config = build_trainer_config(model_forward_properties)

    HydraRunner().main(
        device=torch.device("mps"),  # Apple silicon GPU; change to match your hardware
        config=ConfigSchema(...),  # note that '...' is a placeholder that we will populate later
        output_path=output_path.as_posix(),
    )


if __name__ == "__main__":
    main()
```
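This is the core outline of the pipeline. If you run this code now, it will fail: the build_* helpers are not defined yet and ConfigSchema(...) is only a placeholder. Conceptually, however, we are ready.
Note that several functions start with build_*. We will create them shortly, but first
let’s take a look at what we have:

- We declared the input and output directories for our training.
- We use Python-based configs to define our pipeline. These configs are created by the build_* functions or directly in the HydraRunner().main(config=ConfigSchema(...)) call.
- We declared the model properties that are passed to the forward pass.
- The final execution is handled by noether’s internal runner.

The following steps define build_specs, build_dataset_config and build_trainer_config. build_dataset_normalizer and build_model_config are not defined in the steps below, so you need to provide them yourself, for example from the normalizer and UPT schemas imported in Step 2.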
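The model_forward_properties list decides which entries of each batch reach the model, while target_properties (Step 5) name the entries used as training targets. The sketch below illustrates that split with a plain dictionary. The selection logic is an assumption for illustration, not noether's trainer code, and strings stand in for tensors.

```python
from typing import Any, Mapping, Sequence


def select_properties(batch: Mapping[str, Any], keys: Sequence[str]) -> dict[str, Any]:
    """Pick the named entries from a collated batch, failing loudly if one is missing."""
    missing = [k for k in keys if k not in batch]
    if missing:
        raise KeyError(f"batch is missing properties: {missing}")
    return {k: batch[k] for k in keys}


# Names copied from the tutorial's main() and build_trainer_config().
FORWARD_PROPERTIES = [
    "surface_mask_query",
    "surface_position_batch_idx",
    "surface_position_supernode_idx",
    "surface_position",
    "surface_query_position",
    "volume_query_position",
]
TARGET_PROPERTIES = ["surface_pressure_target", "volume_velocity_target"]

# Hypothetical batch: placeholder strings instead of tensors, plus one extra key.
batch = {key: f"<{key}>" for key in FORWARD_PROPERTIES + TARGET_PROPERTIES}
batch["surface_friction"] = "<unused>"

model_inputs = select_properties(batch, FORWARD_PROPERTIES)
targets = select_properties(batch, TARGET_PROPERTIES)
# Assuming forward() accepts these names as keyword arguments, model(**model_inputs)
# would receive exactly the six forward properties and nothing else.
```

Keeping the two lists explicit means extra keys in a batch (like surface_friction here) never leak into the forward pass by accident.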
Step 4: Dataset configs

Now we will declare the dataset constants and convenience build_* functions (you can place them right under the imports):
```python
DATASET_STATS = {
    "raw_pos_min": [-4.5],
    "raw_pos_max": [6.0],
    "surface_pressure_mean": [-36.4098],
    "surface_pressure_std": [48.6757],
    "volume_velocity_mean": [0.00293915, -0.0230546, 17.546032],
    "volume_velocity_std": [1.361689, 1.267649, 5.850353],
    "volume_sdf_mean": [3.74222e-01],
    "volume_sdf_std": [1.78948e-01],
}

DATA_SPECS = {
    "position_dim": 3,
    "surface_feature_dim": {
        "surface_sdf": 1,
        "surface_normals": 3,
    },
    "volume_feature_dim": {
        "volume_sdf": 1,
        "volume_normals": 3,
    },
    "surface_output_dims": {
        "pressure": 1,
    },
    "volume_output_dims": {
        "velocity": 3,
    },
}


def build_stats() -> AeroStatsSchema:
    return AeroStatsSchema(**DATASET_STATS)


def build_specs() -> AeroDataSpecs:
    return AeroDataSpecs(**DATA_SPECS)


def build_dataset_config(
    mode: Literal["train", "test"],
    dataset_root: str,
    data_specs: dict[str, Any] | AeroDataSpecs,
    dataset_statistics: dict[str, Sequence[float]],
    dataset_normalizer: dict[str, list[AnyNormalizer]],
    dataset_wrappers: list[DatasetWrappers] | None = None,
) -> StandardDatasetConfig:
    return StandardDatasetConfig(
        kind="noether.data.datasets.cfd.ShapeNetCarDataset",
        root=dataset_root,
        pipeline=AeroCFDPipelineConfig(
            kind="tutorial.pipeline.AeroMultistagePipeline",
            num_surface_points=3586,  # max = 3586
            num_volume_points=4096,  # max = 28504
            num_surface_queries=3586,
            num_volume_queries=4096,
            num_supernodes=3586,
            sample_query_points=False,
            use_physics_features=False,
            dataset_statistics=AeroStatsSchema(**dataset_statistics),
            data_specs=data_specs if isinstance(data_specs, AeroDataSpecs) else AeroDataSpecs(**data_specs),
        ),
        split=mode,
        dataset_normalizers=dataset_normalizer,
        dataset_wrappers=dataset_wrappers,
        included_properties=None,
        excluded_properties={"surface_friction", "volume_pressure", "volume_vorticity"},
    )
```
This config defines our datasets (both train and test). Note the kind fields: they are strings that
point to a Python class path inside the codebase. The factory uses them to build real objects, just like in the
config-driven workflow.
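The factory step can be pictured as resolving the dotted path in kind to a class and calling it with the remaining fields. Below is a minimal sketch using importlib; it is an assumption about the general mechanism, not noether's actual factory, and standard-library classes stand in for noether classes so the example runs anywhere.

```python
import importlib
from typing import Any


def resolve_kind(kind: str) -> type:
    """Turn 'package.module.ClassName' into the class object it names."""
    module_path, _, class_name = kind.rpartition(".")
    if not module_path:
        raise ValueError(f"kind must be a dotted path, got {kind!r}")
    module = importlib.import_module(module_path)  # imports the module part
    return getattr(module, class_name)  # looks up the class attribute on it


def build_from_kind(kind: str, **kwargs: Any) -> Any:
    """Instantiate the class named by `kind` with keyword arguments."""
    cls = resolve_kind(kind)
    return cls(**kwargs)


# A string like "noether.data.datasets.cfd.ShapeNetCarDataset" would be resolved the
# same way; stdlib classes are used here so the sketch has no extra dependencies.
counter = build_from_kind("collections.Counter", a=2, b=1)
frac = build_from_kind("fractions.Fraction", numerator=3, denominator=6)
print(counter, frac)
```

This simple version only handles classes defined at module level (or re-exported by a package), which matches paths such as tutorial.pipeline.AeroMultistagePipeline.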
Step 5: Trainer config
```python
def build_trainer_config(model_forward_properties: list[str]) -> AutomotiveAerodynamicsCfdTrainerConfig:
    batch_size = 1
    loss_and_log_every_n_epochs = 1
    save_and_ema_every_n_epochs = 10
    return AutomotiveAerodynamicsCfdTrainerConfig(
        kind="tutorial.trainers.AutomotiveAerodynamicsCFDTrainer",
        surface_weight=1.0,
        volume_weight=1.0,
        surface_pressure_weight=1.0,
        volume_velocity_weight=1.0,
        use_physics_features=False,
        precision="float32",
        max_epochs=500,
        effective_batch_size=batch_size,
        log_every_n_epochs=loss_and_log_every_n_epochs,
        callbacks=[
            CheckpointCallbackConfig(
                kind="noether.core.callbacks.CheckpointCallback",
                save_weights=True,
                save_latest_weights=True,
                save_latest_optim=False,
                every_n_epochs=save_and_ema_every_n_epochs,
            ),
            # validation loss
            OfflineLossCallbackConfig(
                kind="noether.training.callbacks.OfflineLossCallback",
                batch_size=batch_size,
                every_n_epochs=loss_and_log_every_n_epochs,
                dataset_key="test",
            ),
            # keep the checkpoint with the best test loss; checked whenever the loss is computed
            BestCheckpointCallbackConfig(
                kind="noether.core.callbacks.BestCheckpointCallback",
                every_n_epochs=loss_and_log_every_n_epochs,
                metric_key="loss/test/total",
            ),
            # test metrics
            SurfaceVolumeEvaluationMetricsCallbackConfig(
                kind="tutorial.callbacks.SurfaceVolumeEvaluationMetricsCallback",
                batch_size=1,
                every_n_epochs=loss_and_log_every_n_epochs,
                dataset_key="test",
                forward_properties=model_forward_properties,
            ),
            SurfaceVolumeEvaluationMetricsCallbackConfig(
                kind="tutorial.callbacks.SurfaceVolumeEvaluationMetricsCallback",
                batch_size=1,
                every_n_epochs=500,
                dataset_key="test_repeat",
                forward_properties=model_forward_properties,
            ),
            # ema
            EmaCallbackConfig(
                kind="noether.core.callbacks.EmaCallback",
                every_n_epochs=save_and_ema_every_n_epochs,
                save_weights=False,
                save_last_weights=False,
                save_latest_weights=True,
                target_factors={0.9999},
            ),
        ],
        forward_properties=model_forward_properties,
        target_properties=[
            "surface_pressure_target",
            "volume_velocity_target",
        ],
    )
```
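The trainer configuration is at the core of the training pipeline. As you can see, callbacks are a crucial part of it: they save checkpoints, compute the validation loss and evaluation metrics, keep the best checkpoint, and maintain EMA weights.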
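To see how the every_n_epochs values above play out over max_epochs=500, here is a small stdlib sketch. It assumes that a callback with every_n_epochs=n runs at the end of every epoch whose 1-based index is divisible by n; noether's exact scheduling rule is not shown in this tutorial. The intervals are copied from build_trainer_config.

```python
MAX_EPOCHS = 500  # max_epochs in the trainer config

# every_n_epochs values from build_trainer_config
CALLBACK_INTERVALS = {
    "CheckpointCallback": 10,
    "OfflineLossCallback (test)": 1,
    "BestCheckpointCallback": 1,
    "SurfaceVolumeEvaluationMetricsCallback (test)": 1,
    "SurfaceVolumeEvaluationMetricsCallback (test_repeat)": 500,
    "EmaCallback": 10,  # assumed to control when EMA weights are saved
}


def firing_epochs(every_n_epochs: int, max_epochs: int) -> list[int]:
    """1-based epochs at which a callback with this interval would run."""
    if every_n_epochs < 1:
        raise ValueError("every_n_epochs must be a positive integer")
    return list(range(every_n_epochs, max_epochs + 1, every_n_epochs))


for name, interval in CALLBACK_INTERVALS.items():
    epochs = firing_epochs(interval, MAX_EPOCHS)
    print(f"{name:55s} runs {len(epochs):3d} times, first at epoch {epochs[0]}")
```

Under this assumption, the expensive test_repeat evaluation (10 repetitions of the test set) runs only once, at the final epoch, while the cheap test loss is tracked every epoch.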
Step 6: Filling the gaps

Finally, we populate the placeholder config that we declared for the HydraRunner in Step 3:
```python
# inside main(): replaces the placeholder call from Step 3
HydraRunner().main(
    device=torch.device("mps"),
    config=ConfigSchema(
        name=None,
        accelerator="mps",  # can be "cpu", "gpu", "mps"
        stage_name="train",
        dataset_kind="noether.data.datasets.cfd.ShapeNetCarDataset",
        dataset_root=dataset_root.as_posix(),
        resume_run_id=None,
        resume_stage_name=None,
        resume_checkpoint=None,
        seed=42,
        dataset_statistics=DATASET_STATS,
        dataset_normalizer=dataset_normalizer,
        output_path=output_path.as_posix(),  # pass output_path only once; a repeated keyword is a SyntaxError
        tracker=None,
        run_id=None,
        devices=None,
        num_workers=None,
        datasets={
            "train": build_dataset_config(
                mode="train",
                dataset_root=dataset_root.as_posix(),
                data_specs=data_specs,
                dataset_statistics=DATASET_STATS,
                dataset_normalizer=dataset_normalizer,
                dataset_wrappers=None,
            ),
            "test": build_dataset_config(
                mode="test",
                dataset_root=dataset_root.as_posix(),
                data_specs=data_specs,
                dataset_statistics=DATASET_STATS,
                dataset_normalizer=dataset_normalizer,
                dataset_wrappers=None,
            ),
            "test_repeat": build_dataset_config(
                mode="test",
                dataset_root=dataset_root.as_posix(),
                data_specs=data_specs,
                dataset_statistics=DATASET_STATS,
                dataset_normalizer=dataset_normalizer,
                dataset_wrappers=[
                    RepeatWrapperConfig(
                        kind="noether.data.base.wrappers.RepeatWrapper",
                        repetitions=10,
                    )
                ],
            ),
        },
        model=upt_model_config,
        trainer=aero_trainer_config,
        debug=False,
        store_code_in_output=False,
    ),
    output_path=output_path.as_posix(),
)
```
As you can see above, several arguments are set to None. They are included to show the
available settings that you can adjust to your needs. You can also remove them to make the code more
lightweight. This won’t break the logic.
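Both device=torch.device("mps") and accelerator="mps" assume an Apple silicon machine. If you train elsewhere, you could pick the value at startup instead of hard-coding it. The helper below is hypothetical (not part of noether): it maps availability flags to the accelerator strings listed in the config comment ("cpu", "gpu", "mps"), and it assumes that "gpu" corresponds to PyTorch's "cuda" device type.

```python
# Assumed mapping from the accelerator strings in ConfigSchema to torch device types.
TORCH_DEVICE_TYPE = {"gpu": "cuda", "mps": "mps", "cpu": "cpu"}


def pick_accelerator(mps_available: bool, cuda_available: bool) -> str:
    """Return "gpu", "mps" or "cpu" depending on what the machine supports."""
    if cuda_available:
        return "gpu"
    if mps_available:
        return "mps"
    return "cpu"


# With PyTorch installed, main() could do:
#   accelerator = pick_accelerator(torch.backends.mps.is_available(), torch.cuda.is_available())
#   device = torch.device(TORCH_DEVICE_TYPE[accelerator])
# and pass accelerator=accelerator to ConfigSchema and device=device to HydraRunner().main().
```

Taking plain booleans keeps the decision logic testable without a GPU; torch.backends.mps.is_available() and torch.cuda.is_available() supply the real flags.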
Step 7: Run training

You are ready to start training! If you are using an IDE, simply run the file. Otherwise, run this from the repo root in your terminal:

```shell
uv run python -m tutorial.train_shapenet_upt
```

Running the file as a module (-m) makes Python add the current working directory, here the repo root, to sys.path, so the from tutorial.* imports work. Alternatively, you can add the repo root
to PYTHONPATH:

```shell
PYTHONPATH=. uv run python tutorial/train_shapenet_upt.py
```
If everything is set up correctly, you should see logs indicating successful initialization and training progress. Use your task manager or activity monitor to check that the hardware is actually being utilized.
The output directory will be populated with files like this:
```text
shapenet_car/outputs/YYYY-MM-DD_<SHORT_ID>
└── train
    ├── basetracker
    │   └── config.yaml
    ├── hp_resolved.yaml
    └── log.txt
```
After the training progresses, check this folder again - you will find the checkpoints and other training artifacts there.
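To find the newest run folder without browsing manually, a small pathlib helper is enough. This is a hypothetical sketch: it assumes run folders follow the YYYY-MM-DD_<SHORT_ID> pattern shown above (the exact SHORT_ID alphabet is an assumption) and orders runs by date, falling back to folder modification time for same-day runs.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional

# Assumed naming scheme: date, underscore, short alphanumeric run id.
RUN_DIR_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}_\w+$")


def latest_run_dir(output_path: Path) -> Optional[Path]:
    """Return the newest run folder under output_path, or None if there is none."""
    if not output_path.is_dir():
        return None
    runs = [p for p in output_path.iterdir() if p.is_dir() and RUN_DIR_PATTERN.match(p.name)]
    if not runs:
        return None
    # The date prefix sorts lexicographically; same-day runs fall back to mtime.
    return max(runs, key=lambda p: (p.name[:10], p.stat().st_mtime))


# Demo on a throwaway directory that mimics the layout shown above.
with tempfile.TemporaryDirectory() as tmp:
    outputs = Path(tmp)
    (outputs / "2024-01-01_abc123" / "train").mkdir(parents=True)
    (outputs / "2024-03-05_xyz789" / "train").mkdir(parents=True)
    (outputs / "notes").mkdir()  # ignored: does not match the run-folder pattern
    latest = latest_run_dir(outputs)
    demo_latest_name = latest.name if latest is not None else None
    print(demo_latest_name)
```

For a real run you would call latest_run_dir(Path("/Users/user/datasets/shapenet_car/outputs")) and look inside its train/ subfolder for checkpoints and logs.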