Training First Model (with Code)
================================

Prerequisites
--------------

- you cloned the **Noether Framework**
- you have a ``tutorial/`` folder in the repo root
- you prepared the ``ShapeNet-Car`` dataset

The fetching and preprocessing instructions are in the ``README.md`` located in the
``src/noether/data/datasets/cfd/shapenet_car/`` folder. Review them first and proceed
with the next steps when ready.

What we build in this tutorial
------------------------------

We will build a training run in Python code (no YAML). The code produces the same
config object that the CLI would normally create. You will learn:

- how the config is structured (datasets, model, trainer, callbacks)
- what dataset stats/specs and normalizers do
- how to start training with ``HydraRunner().main()``

Implementation
--------------

Overview
~~~~~~~~

Sometimes you want to run training via code. We get it. Previously we covered
:doc:`how to train using CLI and configs `; now we will focus on making things work
via Python code.

The relevant files live under the ``src/noether/training/`` folder. Let's briefly go
over them:

- ``training/callbacks/`` - callbacks executed during and after training
- ``training/cli/`` - the CLI definition that we used in the :doc:`previous tutorial `
- ``training/runners/`` - the core ``Hydra`` runner logic that we will use to run the pipeline from code
- ``training/trainers/`` - the ``BaseTrainer`` that can be extended downstream, used directly by the ``Hydra`` runner

In this tutorial we skip ``yaml`` configs and build the run config in Python.

.. note::

    The example code below uses typed configs and many small schema classes. If this
    looks overwhelming, don't worry: it will make sense as you follow the steps.

Step 1: Create an entry point
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Let's create a new file ``tutorial/train_shapenet_upt.py`` and use it to run our
pipeline. Why ``tutorial/``? Because it already contains the components we need to get
started, and here we want to see the difference between the "configs vs. code"
workflows.

Step 2: Create necessary imports
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Most schema classes live in ``noether.core.schemas``. They help us keep typing
consistent and validate inputs at runtime. These configs are Pydantic models, so if
something is wrong you will get a clear validation error.
.. code-block:: python

    from __future__ import annotations

    from pathlib import Path
    from typing import Any, Literal, Sequence

    import torch

    from noether.core.configs import StaticConfig
    from noether.core.schemas.callbacks import (
        BestCheckpointCallbackConfig,
        CheckpointCallbackConfig,
        EmaCallbackConfig,
        OfflineLossCallbackConfig,
    )
    from noether.core.schemas.dataset import AeroDataSpecs, DatasetBaseConfig, DatasetWrappers, RepeatWrapperConfig
    from noether.core.schemas.modules import (
        DeepPerceiverDecoderConfig,
        SupernodePoolingConfig,
        PerceiverBlockConfig,
        TransformerBlockConfig,
    )
    from noether.core.schemas.normalizers import AnyNormalizer, MeanStdNormalizerConfig, PositionNormalizerConfig
    from noether.core.schemas.optimizers import OptimizerConfig
    from noether.core.schemas.schedules import LinearWarmupCosineDecayScheduleConfig
    from noether.core.schemas.schema import ConfigSchema, StaticConfigSchema
    from noether.core.schemas.statistics import AeroStatsSchema
    from noether.training.runners import HydraRunner

    from tutorial.callbacks.surface_volume_evaluation_metrics import (
        SurfaceVolumeEvaluationMetricsCallbackConfig,
    )
    from tutorial.schemas.models.upt_config import UPTConfig
    from tutorial.schemas.pipelines.aero_pipeline_config import AeroCFDPipelineConfig
    from tutorial.schemas.trainers.automotive_aerodynamics_trainer_config import AutomotiveAerodynamicsCfdTrainerConfig

Let's go over each group to better understand the outline:

Data:

- ``noether.core.configs`` - a code alternative to ``static_config.yaml`` from the previous tutorial, e.g. ``output_path``
- ``noether.core.schemas.dataset`` - dataset-related configs and types for type-hinting
- ``noether.core.schemas.modules`` - building blocks of our models; we will use the UPT architecture
- ``noether.core.schemas.normalizers`` - data normalization configs applied during data loading
- ``noether.core.schemas.statistics`` - data statistics aggregation, e.g. mean, std, etc.
- etc.

Training:

- ``noether.core.schemas.callbacks`` - relevant callbacks for our training
- ``noether.core.schemas.optimizers`` - optimizer config
- ``tutorial.schemas.trainers.automotive_aerodynamics_trainer_config`` - trainer configuration
- ``tutorial.schemas.models`` - configs for model initialization
- etc.

Execution:

- ``noether.training.runners`` - orchestrators responsible for pipeline execution

.. note::

    ``HydraRunner`` from ``noether`` comes with a few public methods: ``run()`` is used
    by the CLI, and ``main()`` can be used to run the pipeline via Python. We use the
    latter to avoid YAML files in this tutorial.
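If you want to see the runtime validation mentioned above in action, construct a schema
with a deliberately broken value. This is a minimal sketch; it assumes ``AeroStatsSchema``
declares numeric fields matching the ``DATASET_STATS`` dictionary we define in Step 4:

.. code-block:: python

    from pydantic import ValidationError

    from noether.core.schemas.statistics import AeroStatsSchema

    try:
        # Deliberately wrong: a string where a numeric statistic is expected.
        AeroStatsSchema(surface_pressure_mean="not-a-number")
    except ValidationError as err:
        # Pydantic lists every offending (or missing) field with the expected type.
        print(err)

This is why building the config in typed Python pays off: mistakes surface at
construction time, not halfway through a training run.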
Step 3: Declare main() function
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    def main() -> None:
        dataset_root = Path("/Users/user/datasets/shapenet_car")  # feel free to change this to match your structure
        output_path = dataset_root / "outputs"

        data_specs = build_specs()
        dataset_normalizer = build_dataset_normalizer()

        model_forward_properties = [
            "surface_mask_query",
            "surface_position_batch_idx",
            "surface_position_supernode_idx",
            "surface_position",
            "surface_query_position",
            "volume_query_position",
        ]

        upt_model_config = build_model_config(data_specs, model_forward_properties)
        aero_trainer_config = build_trainer_config(data_specs, model_forward_properties)

        HydraRunner().main(
            device=torch.device("mps"),
            config=ConfigSchema(...),  # note that '...' is a placeholder that we will populate later
            static_config=StaticConfig(
                config=StaticConfigSchema(
                    output_path=output_path.as_posix(),
                ),
            ),
        )


    if __name__ == "__main__":
        main()

This is the core outline of the pipeline. If you run this code now you will get a lot
of errors, but conceptually we are ready. Note that several functions start with
``build_*`` - we will create them in a minute (this tutorial walks through
``build_stats``, ``build_specs``, ``build_dataset_config``, and
``build_trainer_config``; ``build_dataset_normalizer`` and ``build_model_config``
follow the same pattern, using the imported normalizer and module configs). First,
let's take a look at what we have:

1. We declared input and output directories for our training.
2. We will use Python-based configs to define our pipeline. These configs will be created via ``build_*`` methods or directly in the ``HydraRunner().main(config=ConfigSchema(...))`` declaration.
3. We declare model properties that will be sent to the forward pass.
4. The final execution will be handled by ``noether``'s internal runner.

Step 4: Dataset configs
~~~~~~~~~~~~~~~~~~~~~~~

Now we will declare dataset constants and convenience ``build_*`` methods (you can
place them right under the imports):

.. code-block:: python

    DATASET_STATS = {
        "raw_pos_min": [-4.5],
        "raw_pos_max": [6.0],
        "surface_pressure_mean": [-36.4098],
        "surface_pressure_std": [48.6757],
        "volume_velocity_mean": [0.00293915, -0.0230546, 17.546032],
        "volume_velocity_std": [1.361689, 1.267649, 5.850353],
        "volume_sdf_mean": [3.74222e-01],
        "volume_sdf_std": [1.78948e-01],
    }

    DATA_SPECS = {
        "position_dim": 3,
        "surface_feature_dim": {
            "surface_sdf": 1,
            "surface_normals": 3,
        },
        "volume_feature_dim": {
            "volume_sdf": 1,
            "volume_normals": 3,
        },
        "surface_output_dims": {
            "pressure": 1,
        },
        "volume_output_dims": {
            "velocity": 3,
        },
    }


    def build_stats() -> AeroStatsSchema:
        return AeroStatsSchema(**DATASET_STATS)


    def build_specs() -> AeroDataSpecs:
        return AeroDataSpecs(**DATA_SPECS)

.. code-block:: python

    def build_dataset_config(
        mode: Literal["train", "test"],
        dataset_root: str,
        data_specs: dict[str, Any] | AeroDataSpecs,
        dataset_statistics: dict[str, Sequence[float]],
        dataset_normalizer: dict[str, list[AnyNormalizer]],
        dataset_wrappers: list[DatasetWrappers] | None = None,
    ) -> DatasetBaseConfig:
        return DatasetBaseConfig(
            kind="noether.data.datasets.cfd.ShapeNetCarDataset",
            root=dataset_root,
            pipeline=AeroCFDPipelineConfig(
                kind="tutorial.pipelines.AeroMultistagePipeline",
                num_surface_points=3586,  # max = 3586
                num_volume_points=4096,  # max = 28504
                num_surface_queries=3586,
                num_volume_queries=4096,
                num_supernodes=3586,
                sample_query_points=False,
                use_physics_features=False,
                dataset_statistics=AeroStatsSchema(**dataset_statistics),
                data_specs=data_specs if isinstance(data_specs, AeroDataSpecs) else AeroDataSpecs(**data_specs),
            ),
            split=mode,
            dataset_normalizers=dataset_normalizer,
            dataset_wrappers=dataset_wrappers,
            included_properties=None,
            excluded_properties={"surface_friction", "volume_pressure", "volume_vorticity"},
        )

This config defines our datasets (both ``train`` and ``test``). Note the ``kind``
fields: they are strings that point to a Python class path inside the codebase. The
factory uses them to build real objects, just like in the config-driven workflow.
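To make this less abstract, here is roughly how a factory can turn a ``kind`` string
into a class. This is an illustrative sketch only; Noether's actual factory may add
registries, validation, or caching on top:

.. code-block:: python

    import importlib


    def resolve_kind(kind: str) -> type:
        """Split 'package.module.ClassName' and import the class it points to."""
        module_path, class_name = kind.rsplit(".", 1)
        return getattr(importlib.import_module(module_path), class_name)


    # e.g. resolve_kind("noether.data.datasets.cfd.ShapeNetCarDataset") returns the
    # dataset class, which the factory then instantiates with the rest of the config.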
Step 5: Trainer config
~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    def build_trainer_config(
        data_specs: AeroDataSpecs,
        model_forward_properties: list[str],
    ) -> AutomotiveAerodynamicsCfdTrainerConfig:
        batch_size = 1
        loss_and_log_every_n_epochs = 1
        save_and_ema_every_n_epochs = 10

        return AutomotiveAerodynamicsCfdTrainerConfig(
            kind="tutorial.trainers.AutomotiveAerodynamicsCFDTrainer",
            surface_weight=1.0,
            volume_weight=1.0,
            surface_pressure_weight=1.0,
            volume_velocity_weight=1.0,
            use_physics_features=False,
            precision="float32",
            max_epochs=500,
            effective_batch_size=batch_size,
            log_every_n_epochs=loss_and_log_every_n_epochs,
            callbacks=[
                CheckpointCallbackConfig(
                    kind="noether.core.callbacks.CheckpointCallback",
                    save_weights=True,
                    save_latest_weights=True,
                    save_latest_optim=False,
                    every_n_epochs=save_and_ema_every_n_epochs,
                ),
                # validation loss
                OfflineLossCallbackConfig(
                    kind="noether.training.callbacks.OfflineLossCallback",
                    batch_size=batch_size,
                    every_n_epochs=loss_and_log_every_n_epochs,
                    dataset_key="test",
                ),
                BestCheckpointCallbackConfig(
                    kind="noether.core.callbacks.BestCheckpointCallback",
                    every_n_epochs=loss_and_log_every_n_epochs,
                    metric_key="loss/test/total",
                ),
                # test loss
                SurfaceVolumeEvaluationMetricsCallbackConfig(
                    kind="tutorial.callbacks.SurfaceVolumeEvaluationMetricsCallback",
                    batch_size=batch_size,
                    every_n_epochs=loss_and_log_every_n_epochs,
                    dataset_key="test",
                    forward_properties=model_forward_properties,
                ),
                SurfaceVolumeEvaluationMetricsCallbackConfig(
                    kind="tutorial.callbacks.SurfaceVolumeEvaluationMetricsCallback",
                    batch_size=batch_size,
                    every_n_epochs=500,
                    dataset_key="test_repeat",
                    forward_properties=model_forward_properties,
                ),
                # ema
                EmaCallbackConfig(
                    kind="noether.core.callbacks.EmaCallback",
                    every_n_epochs=save_and_ema_every_n_epochs,
                    save_weights=False,
                    save_last_weights=False,
                    save_latest_weights=True,
                    target_factors={0.9999},
                ),
            ],
            forward_properties=model_forward_properties,
            data_specs=data_specs,
            target_properties=[
                "surface_pressure_target",
                "volume_velocity_target",
            ],
        )

The trainer configuration is the core of the training pipeline; as you can see,
callbacks are a crucial part of it.
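The ``EmaCallback`` above maintains an exponential moving average of the model weights
with ``target_factors={0.9999}``. Conceptually, each update looks like the following
minimal sketch (not Noether's actual implementation, which lives behind the callback):

.. code-block:: python

    import torch


    @torch.no_grad()
    def ema_update(
        ema_params: list[torch.Tensor],
        model_params: list[torch.Tensor],
        factor: float = 0.9999,
    ) -> None:
        """One EMA step: ema <- factor * ema + (1 - factor) * current weights."""
        for ema_p, p in zip(ema_params, model_params):
            ema_p.mul_(factor).add_(p, alpha=1.0 - factor)

A factor close to 1.0 makes the averaged weights change slowly, which typically yields
a smoother set of weights for evaluation than the raw training checkpoint.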
Step 6: Filling the gaps
~~~~~~~~~~~~~~~~~~~~~~~~

At last, we will populate the placeholder fields that we declared for the
``HydraRunner``:

.. code-block:: python

    HydraRunner().main(
        device=torch.device("mps"),
        config=ConfigSchema(
            name=None,
            accelerator="mps",  # can be "cpu", "gpu", "mps"
            stage_name="train",
            dataset_kind="noether.data.datasets.cfd.ShapeNetCarDataset",
            dataset_root=dataset_root.as_posix(),
            resume_run_id=None,
            resume_stage_name=None,
            resume_checkpoint=None,
            seed=42,
            dataset_statistics=DATASET_STATS,
            dataset_normalizer=dataset_normalizer,
            static_config=StaticConfigSchema(output_path=output_path.as_posix()),
            tracker=None,
            run_id=None,
            devices=None,
            num_workers=None,
            datasets={
                "train": build_dataset_config(
                    mode="train",
                    dataset_root=dataset_root.as_posix(),
                    data_specs=data_specs,
                    dataset_statistics=DATASET_STATS,
                    dataset_normalizer=dataset_normalizer,
                    dataset_wrappers=None,
                ),
                "test": build_dataset_config(
                    mode="test",
                    dataset_root=dataset_root.as_posix(),
                    data_specs=data_specs,
                    dataset_statistics=DATASET_STATS,
                    dataset_normalizer=dataset_normalizer,
                    dataset_wrappers=None,
                ),
                "test_repeat": build_dataset_config(
                    mode="test",
                    dataset_root=dataset_root.as_posix(),
                    data_specs=data_specs,
                    dataset_statistics=DATASET_STATS,
                    dataset_normalizer=dataset_normalizer,
                    dataset_wrappers=[
                        RepeatWrapperConfig(
                            kind="noether.data.base.wrappers.RepeatWrapper",
                            repetitions=10,
                        ),
                    ],
                ),
            },
            model=upt_model_config,
            trainer=aero_trainer_config,
            debug=False,
            store_code_in_output=False,
            static_config_path=None,
        ),
        static_config=StaticConfig(
            config=StaticConfigSchema(
                output_path=output_path.as_posix(),
            ),
        ),
    )

As you can see, multiple arguments are set to ``None``. They are present here to show
the available settings that you can modify to your needs. You can also freely remove
them to make the code a bit more lightweight; this won't break the logic.

Step 7: Run training
~~~~~~~~~~~~~~~~~~~~

You are ready to start the training! If you are using an IDE, simply run the file.
Otherwise, run this in the repo root from your terminal:

.. code-block:: bash

    uv run python tutorial/train_shapenet_upt.py

If everything is set up correctly, you should see logs indicating successful
initialization and training (use your task manager and/or activity monitor to check
that the hardware is properly utilized). The output directory will be populated with
files like this:

.. code-block:: bash

    shapenet_car/outputs/YYYY-MM-DD_
    └── train
        ├── basetracker
        │   └── config.yaml
        ├── hp_resolved.yaml
        └── log.txt

As training progresses, check this folder again - you will find checkpoints and other
training artifacts there.

That's it for this tutorial, see you in the next one!
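If you want to peek inside a checkpoint once the ``CheckpointCallback`` has produced
one, plain PyTorch is enough. A minimal sketch; the exact file name and layout depend
on the callback settings, so the path below is hypothetical:

.. code-block:: python

    import torch

    # Hypothetical path: point this at a checkpoint file that actually appears
    # in your output folder after a few epochs.
    state = torch.load("shapenet_car/outputs/.../checkpoint.pt", map_location="cpu")
    print(sorted(state.keys()) if isinstance(state, dict) else type(state))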