noether.core.schemas.dataset

Attributes

DatasetWrappers

Classes

DatasetWrapperConfig

RepeatWrapperConfig

ShuffleWrapperConfig

SubsetWrapperConfig

DatasetBaseConfig

DatasetSplitIDs

Base class for dataset split ID validation with overlap checking.

FieldDimSpec

A specification for a group of named data fields and their dimensions.

AeroDataSpecs

Defines the complete data specification for a surrogate model.

Module Contents

class noether.core.schemas.dataset.DatasetWrapperConfig(/, **data)

Bases: pydantic.BaseModel

Parameters:

data (Any)

kind: str
class noether.core.schemas.dataset.RepeatWrapperConfig(/, **data)

Bases: DatasetWrapperConfig

Parameters:

data (Any)

repetitions: int = None

The number of times to repeat the dataset.

class noether.core.schemas.dataset.ShuffleWrapperConfig(/, **data)

Bases: DatasetWrapperConfig

Parameters:

data (Any)

seed: int | None = None

Random seed for shuffling. If None, a random seed is used.

class noether.core.schemas.dataset.SubsetWrapperConfig(/, **data)

Bases: DatasetWrapperConfig

Parameters:

data (Any)

indices: collections.abc.Sequence | None = None
start_index: int | None = None
end_index: int | None = None
start_percent: float | None = None
end_percent: float | None = None
noether.core.schemas.dataset.DatasetWrappers
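
The wrapper configs above are plain pydantic models, so they can be constructed directly and combined into a list of DatasetWrappers. A minimal sketch, assuming the kind discriminator must be passed explicitly; the values "subset", "shuffle", and "repeat" are illustrative and not confirmed by this module:

    from noether.core.schemas.dataset import (
        RepeatWrapperConfig,
        ShuffleWrapperConfig,
        SubsetWrapperConfig,
    )

    # Keep the first 80% of the samples, shuffle them with a fixed seed,
    # then repeat the result three times.
    wrappers = [
        SubsetWrapperConfig(kind="subset", start_percent=0.0, end_percent=0.8),
        ShuffleWrapperConfig(kind="shuffle", seed=42),
        RepeatWrapperConfig(kind="repeat", repetitions=3),
    ]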
class noether.core.schemas.dataset.DatasetBaseConfig(/, **data)

Bases: pydantic.BaseModel

Parameters:

data (Any)

kind: str

Kind of dataset to use.

root: str | None = None

Root directory of the dataset. If None, data is not loaded from disk but is generated in memory instead.

pipeline: Any | None = None

Config of the pipeline to use for the dataset.

split: Literal['train', 'val', 'test']
dataset_normalizers: dict[str, list[noether.core.schemas.normalizers.AnyNormalizer]] | None = None

Normalizers to apply to the dataset, keyed by data source name; each value is a list of normalizers.

dataset_wrappers: list[DatasetWrappers] | None = None
included_properties: set[str] | None = None

Set of properties of this dataset that will be loaded. If not set, all properties are loaded.

excluded_properties: set[str] | None = None

Set of properties of this dataset that will NOT be loaded, even if they are present in included_properties.

model_config

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
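
A hedged sketch of a DatasetBaseConfig built from the fields listed above; the kind value, the root path, and the property names are illustrative placeholders rather than values defined by this module:

    from noether.core.schemas.dataset import DatasetBaseConfig, SubsetWrapperConfig

    config = DatasetBaseConfig(
        kind="my_dataset",        # hypothetical dataset kind
        root="/data/my_dataset",  # hypothetical path; None would mean in-memory data
        split="train",
        dataset_wrappers=[
            SubsetWrapperConfig(kind="subset", start_percent=0.0, end_percent=0.8),
        ],
        included_properties={"pressure", "wall_shear_stress"},  # illustrative names
    )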

class noether.core.schemas.dataset.DatasetSplitIDs(/, **data)

Bases: pydantic.BaseModel, abc.ABC

Base class for dataset split ID validation with overlap checking.

This base class provides:

1. Automatic validation that train/val/test splits do not have overlapping IDs
2. Optional size validation for datasets that have expected split sizes

Subclasses can optionally define class variables for size validation:

- EXPECTED_TRAIN_SIZE: Expected number of training samples
- EXPECTED_VAL_SIZE: Expected number of validation samples
- EXPECTED_TEST_SIZE: Expected number of test samples
- DATASET_NAME: Name of the dataset for error messages

If these are not defined, only overlap checking will be performed.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

EXPECTED_TRAIN_SIZE: ClassVar[int | None] = None
EXPECTED_VAL_SIZE: ClassVar[int | None] = None
EXPECTED_TEST_SIZE: ClassVar[int | None] = None
EXPECTED_HIDDEN_TEST_SIZE: ClassVar[int | None] = None
DATASET_NAME: ClassVar[str | None] = None
train: set[int]
val: set[int]
test: set[int]
extrap: set[int]
interp: set[int]
train_subset: set[int]
validate_splits()

Validate splits and check for overlaps.
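
A sketch of how a subclass might opt into size validation; the dataset name, split sizes, and IDs are invented for illustration, and every split field is passed explicitly since no defaults are shown above:

    from typing import ClassVar

    from noether.core.schemas.dataset import DatasetSplitIDs

    class ToySplitIDs(DatasetSplitIDs):
        EXPECTED_TRAIN_SIZE: ClassVar[int | None] = 3
        EXPECTED_VAL_SIZE: ClassVar[int | None] = 1
        EXPECTED_TEST_SIZE: ClassVar[int | None] = 1
        DATASET_NAME: ClassVar[str | None] = "toy"

    # Overlapping IDs between train/val/test would fail validation.
    splits = ToySplitIDs(
        train={0, 1, 2},
        val={3},
        test={4},
        extrap=set(),
        interp=set(),
        train_subset={0},
    )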

class noether.core.schemas.dataset.FieldDimSpec

Bases: pydantic.RootModel[dict[str, int]]

A specification for a group of named data fields and their dimensions.

property field_slices: dict[str, slice]

Calculates slice indices for each field in concatenation order.

Return type:

dict[str, slice]

property total_dim: int

Calculates the total dimension of all fields combined.

Return type:

int

keys()
values()
items()
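
Because FieldDimSpec is a RootModel over dict[str, int], it can be built from a plain dict. A small sketch of how total_dim and field_slices presumably behave, assuming slices follow dict insertion order as the description of field_slices suggests:

    from noether.core.schemas.dataset import FieldDimSpec

    spec = FieldDimSpec({"pressure": 1, "wall_shear_stress": 3})

    print(spec.total_dim)     # expected: 4
    print(spec.field_slices)  # expected: {'pressure': slice(0, 1), 'wall_shear_stress': slice(1, 4)}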
class noether.core.schemas.dataset.AeroDataSpecs(/, **data)

Bases: pydantic.BaseModel

Defines the complete data specification for a surrogate model.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

position_dim: int = None

Dimension of the input position vectors.

surface_feature_dim: FieldDimSpec | None = None
volume_feature_dim: FieldDimSpec | None = None
surface_output_dims: FieldDimSpec
volume_output_dims: FieldDimSpec | None = None
conditioning_dims: FieldDimSpec | None = None
use_physics_features: bool = False
property surface_feature_dim_total: int

Calculates the total surface feature dimension.

Return type:

int

property volume_feature_dim_total: int

Calculates the total volume feature dimension.

Return type:

int

property total_output_dim: int

Calculates the total output dimension by summing surface and volume output dimensions.

Return type:

int

property volume_targets: set[str]

Returns the set of volume target field names.

Return type:

set[str]

property surface_targets: set[str]

Returns the set of surface target field names.

Return type:

set[str]

property surface_features: set[str]

Returns the set of surface feature field names.

Return type:

set[str]

property volume_features: set[str]

Returns the set of volume feature field names.

Return type:

set[str]
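
Putting the pieces together, a hedged sketch of an AeroDataSpecs instance; the field names inside each FieldDimSpec are illustrative, and position_dim is passed explicitly since no usable default is shown above:

    from noether.core.schemas.dataset import AeroDataSpecs, FieldDimSpec

    specs = AeroDataSpecs(
        position_dim=3,
        surface_feature_dim=FieldDimSpec({"normals": 3}),
        surface_output_dims=FieldDimSpec({"pressure": 1, "wall_shear_stress": 3}),
        volume_output_dims=FieldDimSpec({"velocity": 3}),
    )

    print(specs.surface_feature_dim_total)  # expected: 3
    print(specs.total_output_dim)           # expected: 7 (surface + volume outputs)
    print(specs.surface_targets)            # expected: {'pressure', 'wall_shear_stress'}
    print(specs.volume_targets)             # expected: {'velocity'}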