noether.core.schemas.dataset

Attributes

DatasetWrappers

TPipelineConfig

Classes

DatasetWrapperConfig

RepeatWrapperConfig

ShuffleWrapperConfig

SubsetWrapperConfig

PipelineConfig

Internal base class for all registry-based configs.

DatasetBaseConfig

Internal base class for all registry-based configs.

StandardDatasetConfig

Base config for datasets with fixed splits.

DatasetSplitIDs

Base class for dataset split ID validation with overlap checking.

FieldDimSpec

A specification for a group of named data fields and their dimensions.

DomainDataSpec

Data specification for a single domain (e.g., surface, volume, wake).

ModelDataSpecs

Base data specification for models that operate on arbitrary named domains.

Module Contents

class noether.core.schemas.dataset.DatasetWrapperConfig(/, **data)

Bases: pydantic.BaseModel

Parameters:

data (Any)

kind: str
class noether.core.schemas.dataset.RepeatWrapperConfig(/, **data)

Bases: DatasetWrapperConfig

Parameters:

data (Any)

repetitions: int = None

The number of times to repeat the dataset.

class noether.core.schemas.dataset.ShuffleWrapperConfig(/, **data)

Bases: DatasetWrapperConfig

Parameters:

data (Any)

seed: int | None = None

Random seed for shuffling. If None, a random seed is used.

class noether.core.schemas.dataset.SubsetWrapperConfig(/, **data)

Bases: DatasetWrapperConfig

Parameters:

data (Any)

indices: collections.abc.Sequence | None = None
start_index: int | None = None
end_index: int | None = None
start_percent: float | None = None
end_percent: float | None = None
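The fields above suggest three alternative ways to select a subset: an explicit index sequence, an index range, or a percentage range. A plausible resolution order, sketched as a hypothetical helper (resolve_subset is illustrative only, and percent bounds are assumed to be fractions in [0, 1]):

```python
# Hypothetical helper illustrating plausible SubsetWrapperConfig
# semantics; not part of the noether API.
def resolve_subset(n, indices=None, start_index=None, end_index=None,
                   start_percent=None, end_percent=None):
    """Return the dataset indices selected by a subset configuration."""
    if indices is not None:
        # Explicit indices take precedence over range-based selection.
        return list(indices)
    if start_percent is not None or end_percent is not None:
        # Percent bounds are assumed to be fractions in [0, 1].
        start = int(n * (start_percent or 0.0))
        end = int(n * (1.0 if end_percent is None else end_percent))
        return list(range(start, end))
    start = start_index or 0
    end = n if end_index is None else end_index
    return list(range(start, end))
```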
class noether.core.schemas.dataset.PipelineConfig(/, **data)

Bases: noether.core.schemas.lib._RegistryBase

Internal base class for all registry-based configs.

Provides auto-registration via __init_subclass__. Not meant to be used directly - use specific config base classes instead.

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

kind: str
noether.core.schemas.dataset.DatasetWrappers
noether.core.schemas.dataset.TPipelineConfig
class noether.core.schemas.dataset.DatasetBaseConfig[TPipelineConfig: PipelineConfig](/, **data)

Bases: noether.core.schemas.lib._RegistryBase

Internal base class for all registry-based configs.

Provides auto-registration via __init_subclass__. Not meant to be used directly - use specific config base classes instead.

Parameters:

data (Any)

kind: str | None = None

Kind of dataset to use.

pipeline: Annotated[TPipelineConfig | None, Discriminated(PipelineConfig)] = None

Config of the pipeline to use for the dataset.

dataset_normalizers: dict[str, list[Annotated[Any, Discriminated(NormalizerConfig)]] | Annotated[Any, Discriminated(NormalizerConfig)]] | None = None

Normalizers to apply to the dataset, keyed by data source name. Each value may be a single normalizer config or a list of them.

dataset_wrappers: list[DatasetWrappers] | None = None
included_properties: set[str] | None = None

Set of properties (i.e., getitem_* methods that are called) of this dataset that will be loaded. If not set, all properties are loaded.

excluded_properties: set[str] | None = None

Set of properties of this dataset that will NOT be loaded, even if they are present in the included set.

model_config

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
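The included_properties and excluded_properties fields above act as a two-stage filter, with exclusion taking precedence. A minimal sketch of the likely semantics (effective_properties is a hypothetical helper for illustration, not part of the noether API):

```python
# Hypothetical sketch of how included/excluded property sets combine;
# exclusion wins even when a property is explicitly included.
def effective_properties(available, included=None, excluded=None):
    # If no include set is given, all available properties are candidates.
    props = set(available) if included is None else set(available) & included
    if excluded is not None:
        props -= excluded
    return props
```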

class noether.core.schemas.dataset.StandardDatasetConfig(/, **data)

Bases: DatasetBaseConfig, abc.ABC

Base config for datasets with fixed splits.

Parameters:

data (Any)

root: str

Root directory of the dataset.

split: Literal['train', 'val', 'test']

Which split of the dataset to use. Must be one of “train”, “val”, or “test”.

class noether.core.schemas.dataset.DatasetSplitIDs(/, **data)

Bases: pydantic.BaseModel, abc.ABC

Base class for dataset split ID validation with overlap checking.

This base class provides:

1. Automatic validation that train/val/test splits don’t have overlapping IDs
2. Optional size validation for datasets that have expected split sizes

Subclasses can optionally define class variables for size validation:

- EXPECTED_TRAIN_SIZE: Expected number of training samples
- EXPECTED_VAL_SIZE: Expected number of validation samples
- EXPECTED_TEST_SIZE: Expected number of test samples
- DATASET_NAME: Name of the dataset for error messages

If these are not defined, only overlap checking will be performed.

Parameters:

data (Any)

EXPECTED_TRAIN_SIZE: ClassVar[int | None] = None
EXPECTED_VAL_SIZE: ClassVar[int | None] = None
EXPECTED_TEST_SIZE: ClassVar[int | None] = None
EXPECTED_HIDDEN_TEST_SIZE: ClassVar[int | None] = None
DATASET_NAME: ClassVar[str | None] = None
train: list[int]
val: list[int]
test: list[int]
extrap: list[int] = []
interp: list[int] = []
train_subset: list[int] = []
validate_splits()

Validate splits and check for overlaps.
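The overlap check that validate_splits presumably performs can be sketched as a standalone function (check_split_overlaps is a hypothetical stand-in, not the actual validator):

```python
from itertools import combinations

# Hedged sketch of split-overlap validation: no sample ID may appear
# in more than one split.
def check_split_overlaps(**splits):
    for (name_a, ids_a), (name_b, ids_b) in combinations(splits.items(), 2):
        overlap = set(ids_a) & set(ids_b)
        if overlap:
            raise ValueError(
                f"Splits {name_a!r} and {name_b!r} share IDs: {sorted(overlap)}"
            )
```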

class noether.core.schemas.dataset.FieldDimSpec

Bases: pydantic.RootModel[collections.OrderedDict[str, int]]

A specification for a group of named data fields and their dimensions.

property field_slices: dict[str, slice]

Calculates slice indices for each field in concatenation order.

Return type:

dict[str, slice]

property total_dim: int

Calculates the total dimension of all fields combined.

Return type:

int

keys()
values()
items()
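Since FieldDimSpec wraps an ordered mapping of field names to dimensions, its field_slices and total_dim properties are likely derived as below (a stand-in sketch using plain functions, not the actual implementation):

```python
from collections import OrderedDict

# Stand-in sketch: derive per-field slices into the concatenated
# feature axis, in insertion (concatenation) order.
def field_slices(dims):
    slices, offset = {}, 0
    for name, dim in dims.items():
        slices[name] = slice(offset, offset + dim)
        offset += dim
    return slices

def total_dim(dims):
    # Total width of the concatenated feature axis.
    return sum(dims.values())
```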
class noether.core.schemas.dataset.DomainDataSpec(/, **data)

Bases: pydantic.BaseModel

Data specification for a single domain (e.g., surface, volume, wake).

Parameters:

data (Any)

output_dims: FieldDimSpec

Output fields and their dimensions for this domain, e.g. {“pressure”: 1, “velocity”: 3}.

feature_dim: FieldDimSpec | None = None

Input feature fields and their dimensions for this domain.

class noether.core.schemas.dataset.ModelDataSpecs(/, **data)

Bases: pydantic.BaseModel

Base data specification for models that operate on arbitrary named domains.

This is the minimal interface that model configs need from data specifications: position dimensions, available conditioning, and per-domain data descriptions.

Parameters:

data (Any)

position_dim: int = None

Dimension of the input position vectors.

conditioning_dims: FieldDimSpec | None = None

Available conditioning features and their dimensions.

domains: dict[str, DomainDataSpec] = None

Per-domain data specifications keyed by domain name.

use_physics_features: bool = False

Whether physics features are used as input.

property total_output_dim: int

Calculates the total output dimension across all domains.

Return type:

int

property all_targets: set[str]

Returns all target field names across all domains, prefixed by domain name.

Return type:

set[str]

property all_features: set[str]

Returns all feature field names across all domains.

Return type:

set[str]
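The aggregation behind total_output_dim and all_targets can be sketched with plain dicts standing in for the pydantic models (the "/" separator used to prefix target names by domain is an assumption for illustration):

```python
# Hypothetical sketch of per-domain aggregation, using
# {domain_name: {field_name: dim, ...}} in place of DomainDataSpec.
def total_output_dim(domains):
    # Sum the output dimensions of every field in every domain.
    return sum(sum(fields.values()) for fields in domains.values())

def all_targets(domains):
    # Target names prefixed by their domain; the "/" separator is assumed.
    return {f"{domain}/{field}"
            for domain, fields in domains.items()
            for field in fields}
```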