Design Principles and Limitations
This document outlines the core design principles, architectural patterns, and known limitations of the Noether Framework. It serves as a guide for developers to understand the “why” behind the code structure.
Design Principles
The framework is built upon five major pillars aimed at robustness, reproducibility, and clear separation of concerns.
1. Configuration-Driven Development (CDD)
Principle: Behavior is strictly separated from implementation.
Almost every component in the system (Datasets, Models, Layers) is instantiated via a specific Configuration Object rather than raw arguments.
Implementation:
We rely heavily on Pydantic models (e.g., DatasetBaseConfig, LinearProjectionConfig) to define the expected
inputs. Components are rarely instantiated directly; instead, they are built by passing a config object
to a constructor or factory.
# Example: A TransformerBlock receives a full config object, not individual args.
block = TransformerBlock(config=my_block_config)
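The pattern can be sketched as follows. This is a minimal illustration, not framework code: ToyProjectionConfig, ToyProjection, and their fields are hypothetical names, and a stdlib dataclass stands in for the Pydantic model so the sketch has no dependencies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyProjectionConfig:
    """Hypothetical config; the real framework uses Pydantic models."""

    input_dim: int
    output_dim: int
    bias: bool = True


class ToyProjection:
    """Hypothetical component that takes one config object, not loose arguments."""

    def __init__(self, config: ToyProjectionConfig) -> None:
        self.config = config
        # Weights are a plain nested list to avoid a tensor dependency.
        self.weight = [[0.0] * config.input_dim for _ in range(config.output_dim)]
        self.bias = [0.0] * config.output_dim if config.bias else None

    def __call__(self, x: list[float]) -> list[float]:
        if len(x) != self.config.input_dim:
            raise ValueError(f"expected {self.config.input_dim} inputs, got {len(x)}")
        out = [sum(w * v for w, v in zip(row, x)) for row in self.weight]
        if self.bias is not None:
            out = [o + b for o, b in zip(out, self.bias)]
        return out


# Everything the component needs to know travels in the config object.
config = ToyProjectionConfig(input_dim=3, output_dim=2)
layer = ToyProjection(config=config)
result = layer([1.0, 2.0, 3.0])
```

Because the config is a single immutable value, it can be logged, serialized, or compared across runs independently of the component it describes.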
2. Factory Pattern & Dynamic Instantiation
Principle: Decoupling configuration from import logic.
The framework uses a centralized “Factory” mechanism to build objects. It relies on Class Paths (strings) to resolve classes at runtime, avoiding heavy import chains at the top level.
Implementation:
The Factory.instantiate method uses the kind field in configurations to locate the correct class constructor.
# Config:
kind = "noether.core.callbacks.checkpoint.CheckpointCallback"
# Factory logic:
class_constructor = class_constructor_from_class_path(kind)
instance = class_constructor(**kwargs)
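Class path resolution of this kind is usually a thin layer over importlib. The sketch below is a guess at what such a helper might look like, not the framework's actual implementation; the instantiate function and the error handling are assumptions, and a stdlib class stands in for a framework class.

```python
import importlib
from typing import Any


def class_constructor_from_class_path(class_path: str) -> type:
    """Resolve 'package.module.ClassName' to the class object at call time."""
    module_path, _, class_name = class_path.rpartition(".")
    if not module_path:
        raise ValueError(f"expected a dotted class path, got {class_path!r}")
    # The import happens here, not at module load time.
    module = importlib.import_module(module_path)
    try:
        cls = getattr(module, class_name)
    except AttributeError as exc:
        raise ImportError(f"{module_path!r} has no attribute {class_name!r}") from exc
    if not isinstance(cls, type):
        raise TypeError(f"{class_path!r} does not refer to a class")
    return cls


def instantiate(kind: str, **kwargs: Any) -> Any:
    """Build an object from its class path and keyword arguments."""
    return class_constructor_from_class_path(kind)(**kwargs)


# fractions.Fraction stands in for a framework class such as CheckpointCallback.
value = instantiate("fractions.Fraction", numerator=3, denominator=4)
```

Deferring the import to call time is what keeps top-level import chains short: a module is only loaded when a configuration actually names a class inside it.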
3. Strict Type Safety & Runtime Validation
Principle: Catch errors at initialization, not during training.
We aggressively use Python type hints (list[torch.Tensor], Literal) and Pydantic validators to ensure invalid
states are impossible to represent.
Implementation:
- Pydantic Validators: @model_validator(mode="after") ensures fields like ndim are valid (e.g., only 1, 2, or 3).
- Type Guards: Utility functions like validate_path allow strictly typed filesystem operations.
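The effect of validating at initialization can be sketched without Pydantic. In the example below, ToyGridConfig and its fields are hypothetical, and __post_init__ on a stdlib dataclass plays the role that an after-validator would play on a Pydantic model.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Literal, get_args

# The set of allowed values is declared once, as a type.
Ndim = Literal[1, 2, 3]


@dataclass(frozen=True)
class ToyGridConfig:
    """Hypothetical config that rejects invalid values at construction time."""

    ndim: int
    root: Path

    def __post_init__(self) -> None:
        if self.ndim not in get_args(Ndim):
            raise ValueError(f"ndim must be one of {get_args(Ndim)}, got {self.ndim}")
        if not isinstance(self.root, Path):
            raise TypeError(f"root must be a pathlib.Path, got {type(self.root).__name__}")


ok = ToyGridConfig(ndim=2, root=Path("data"))

# The invalid config fails here, long before any training loop would start.
try:
    ToyGridConfig(ndim=4, root=Path("data"))
except ValueError as exc:
    error_message = str(exc)
```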
4. Defensive Programming
Principle: Do not trust the input.
The code frequently asserts state validity and checks types explicitly at runtime, even when type hints are present, to prevent silent failures in complex pipelines.
Implementation:
- Explicit isinstance checks in factories.
- State guards (e.g., assert self._stats is not None) in stateful classes like RunningMoments.
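Both kinds of check can be sketched in one stateful class. ToyRunningMoments below is a hypothetical illustration built on Welford's update; it is not the framework's RunningMoments, and its method names are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class _Stats:
    count: int
    mean: float
    m2: float  # running sum of squared deviations from the mean


class ToyRunningMoments:
    """Hypothetical running mean/variance tracker using Welford's update."""

    def __init__(self) -> None:
        self._stats: Optional[_Stats] = None

    def update(self, value: float) -> None:
        # Explicit runtime type check, even though the signature is annotated.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"value must be a real number, got {type(value).__name__}")
        if self._stats is None:
            self._stats = _Stats(count=1, mean=float(value), m2=0.0)
            return
        s = self._stats
        s.count += 1
        delta = value - s.mean
        s.mean += delta / s.count
        s.m2 += delta * (value - s.mean)

    @property
    def mean(self) -> float:
        # State guard: fail loudly if read before any update.
        assert self._stats is not None, "call update() before reading mean"
        return self._stats.mean

    @property
    def variance(self) -> float:
        # Population variance (divides by count, not count - 1).
        assert self._stats is not None, "call update() before reading variance"
        return self._stats.m2 / self._stats.count


moments = ToyRunningMoments()
for x in (2.0, 4.0, 6.0):
    moments.update(x)
```

Note that assert statements are removed when Python runs with the -O flag, so a guard that must hold in production is safer written as an explicit raise.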
5. Composition over Inheritance
Principle: Complex behaviors are built by wrapping simple objects.
We avoid deep inheritance trees. Instead, we use wrappers and composition to add functionality like caching, normalization, or logging.
Implementation:
The DatasetFactory does not subclass datasets to add features; it wraps a base dataset with a list of
dataset_wrappers (e.g., normalizers).
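The wrapping approach can be sketched as follows. All class and function names here (ToyDataset, ShiftWrapper, ScaleWrapper, build_dataset) are hypothetical, and the real DatasetFactory may apply its wrappers differently.

```python
from typing import Callable, Sequence


class ToyDataset:
    """Hypothetical base dataset backed by a list of floats."""

    def __init__(self, values: Sequence[float]) -> None:
        self._values = list(values)

    def __len__(self) -> int:
        return len(self._values)

    def __getitem__(self, index: int) -> float:
        return self._values[index]


class ShiftWrapper:
    """Hypothetical wrapper that adds an offset to every sample."""

    def __init__(self, dataset, offset: float) -> None:
        self._dataset = dataset
        self._offset = offset

    def __len__(self) -> int:
        return len(self._dataset)

    def __getitem__(self, index: int) -> float:
        return self._dataset[index] + self._offset


class ScaleWrapper:
    """Hypothetical wrapper that multiplies every sample by a factor."""

    def __init__(self, dataset, scale: float) -> None:
        self._dataset = dataset
        self._scale = scale

    def __len__(self) -> int:
        return len(self._dataset)

    def __getitem__(self, index: int) -> float:
        return self._dataset[index] * self._scale


def build_dataset(base, dataset_wrappers: Sequence[Callable]):
    """Apply wrappers in order; each takes a dataset and returns a wrapped one."""
    dataset = base
    for wrap in dataset_wrappers:
        dataset = wrap(dataset)
    return dataset


# Normalization is added by wrapping; ToyDataset itself is never subclassed.
dataset = build_dataset(
    ToyDataset([1.0, 2.0, 3.0]),
    dataset_wrappers=[
        lambda ds: ShiftWrapper(ds, offset=-2.0),
        lambda ds: ScaleWrapper(ds, scale=0.5),
    ],
)
samples = [dataset[i] for i in range(len(dataset))]
```

Each wrapper only relies on __len__ and __getitem__, so wrappers can be reordered or dropped without touching the base dataset.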
Limitations & Trade-offs
While the architecture ensures robustness, it introduces specific trade-offs that developers must be aware of.
- Tight Coupling of Configs to Code:
Many classes accept a single config object in their __init__. This makes it difficult to use these modules “standalone” (e.g., in a notebook) without first constructing the specific Pydantic configuration object they expect.
- Circular Dependency Risks:
Configuration schemas need to know about classes (for validation), and classes need to know about schemas (for typing). This occasionally forces the use of string-based class resolution instead of direct imports to avoid cycles.
- “Stringly” Typed Architecture:
The reliance on string paths (e.g., kind="torch.optim.SGD") for instantiation means that automated refactoring tools (like “Rename Class” in IDEs) may miss references in YAML or Config files.
- Factory Indirection:
The Factory.instantiate method contains implicit logic (e.g., checking kwargs for “kind” if the config is None). This “magic” can obscure how an object is actually being created when debugging.
- Boilerplate Overhead:
Adding a simple new feature often requires modifying three distinct files: the Implementation Class, the Configuration Schema, and the Factory logic. This favors stability over rapid prototyping speed.
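The implicit fallback mentioned under Factory Indirection can be made concrete with a small sketch. ToyConfig, its params field, and this instantiate function are hypothetical; the sketch only assumes the behavior described above, where “kind” is read from kwargs when the config is None.

```python
import importlib
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ToyConfig:
    """Hypothetical config carrying a class path and constructor arguments."""

    kind: str
    params: dict[str, Any] = field(default_factory=dict)


def _resolve(kind: str) -> Any:
    """Import the module named in the class path and return the class."""
    module_path, _, class_name = kind.rpartition(".")
    return getattr(importlib.import_module(module_path), class_name)


def instantiate(config: Optional[ToyConfig] = None, **kwargs: Any) -> Any:
    """Hypothetical factory with two code paths that yield the same object."""
    if config is not None:
        # Explicit path: the config names the class and carries its arguments.
        return _resolve(config.kind)(**{**config.params, **kwargs})
    # Implicit path: with no config, "kind" is pulled out of kwargs.
    kind = kwargs.pop("kind", None)
    if kind is None:
        raise ValueError("either a config or a 'kind' keyword argument is required")
    return _resolve(kind)(**kwargs)


# Two different-looking calls reach the same constructor.
from_config = instantiate(
    ToyConfig(kind="fractions.Fraction", params={"numerator": 1, "denominator": 2})
)
from_kwargs = instantiate(kind="fractions.Fraction", numerator=1, denominator=2)
```

When stepping through a factory like this in a debugger, the first thing to establish is which branch was taken, since the call site alone does not say.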