noether.core.schemas.dataset

Attributes

DatasetWrappers

TPipelineConfig

Classes

DatasetWrapperConfig

RepeatWrapperConfig

ShuffleWrapperConfig

SubsetWrapperConfig

PipelineConfig

Internal base class for all registry-based configs.

DatasetBaseConfig

Internal base class for all registry-based configs.

StandardDatasetConfig

Base config for datasets with fixed splits.

DatasetSplitIDs

Base class for dataset split ID validation with overlap checking.

FieldDimSpec

A specification for a group of named data fields and their dimensions.

DomainDataSpec

Data specification for a single domain (e.g., surface, volume, wake).

ModelDataSpecs

Base data specification for models that operate on arbitrary named domains.

Module Contents

class noether.core.schemas.dataset.DatasetWrapperConfig(/, **data)

Bases: pydantic.BaseModel

Parameters:

data (Any)

kind: str
class noether.core.schemas.dataset.RepeatWrapperConfig(/, **data)

Bases: DatasetWrapperConfig

Parameters:

data (Any)

repetitions: int = None

The number of times to repeat the dataset.

class noether.core.schemas.dataset.ShuffleWrapperConfig(/, **data)

Bases: DatasetWrapperConfig

Parameters:

data (Any)

seed: int | None = None

Random seed for shuffling. If None, a random seed is used.

class noether.core.schemas.dataset.SubsetWrapperConfig(/, **data)

Bases: DatasetWrapperConfig

Parameters:

data (Any)

indices: collections.abc.Sequence | None = None
start_index: int | None = None
end_index: int | None = None
start_percent: float | None = None
end_percent: float | None = None
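The fields above suggest three alternative ways to select a subset: an explicit index sequence, an index range, or a percentage range. A plausible resolution order, sketched as a hypothetical helper (resolve_subset is illustrative only, and percent bounds are assumed to be fractions in [0, 1]):

```python
# Hypothetical helper illustrating plausible SubsetWrapperConfig
# semantics; not part of the noether API.
def resolve_subset(n, indices=None, start_index=None, end_index=None,
                   start_percent=None, end_percent=None):
    """Return the dataset indices selected by a subset configuration."""
    if indices is not None:
        # Explicit indices take precedence over range-based selection.
        return list(indices)
    if start_percent is not None or end_percent is not None:
        # Percent bounds are assumed to be fractions in [0, 1].
        start = int(n * (start_percent or 0.0))
        end = int(n * (1.0 if end_percent is None else end_percent))
        return list(range(start, end))
    start = start_index or 0
    end = n if end_index is None else end_index
    return list(range(start, end))
```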
class noether.core.schemas.dataset.PipelineConfig(/, **data)

Bases: noether.core.schemas.lib._RegistryBase

Internal base class for all registry-based configs.

Provides auto-registration via __init_subclass__. Not meant to be used directly - use specific config base classes instead.

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

kind: str
noether.core.schemas.dataset.DatasetWrappers
noether.core.schemas.dataset.TPipelineConfig
class noether.core.schemas.dataset.DatasetBaseConfig[TPipelineConfig: PipelineConfig](/, **data)

Bases: noether.core.schemas.lib._RegistryBase

Internal base class for all registry-based configs.

Provides auto-registration via __init_subclass__. Not meant to be used directly - use specific config base classes instead.

Parameters:

data (Any)

kind: str | None = None

Kind of dataset to use.

pipeline: Annotated[TPipelineConfig | None, Discriminated(PipelineConfig)] = None

Config of the pipeline to use for the dataset.

dataset_normalizers: dict[str, list[Annotated[Any, Discriminated(NormalizerConfig)]] | Annotated[Any, Discriminated(NormalizerConfig)]] | None = None

Normalizers to apply to the dataset, keyed by data source name. Each value may be a single normalizer config or a list of them.

dataset_wrappers: list[DatasetWrappers] | None = None
included_properties: set[str] | None = None

Set of properties (i.e., getitem_* methods that are called) of this dataset that will be loaded. If not set, all properties are loaded.

excluded_properties: set[str] | None = None

Set of properties of this dataset that will NOT be loaded, even if they are present in the included set.

model_config

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
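The included_properties and excluded_properties fields above act as a two-stage filter, with exclusion taking precedence. A minimal sketch of the likely semantics (effective_properties is a hypothetical helper for illustration, not part of the noether API):

```python
# Hypothetical sketch of how included/excluded property sets combine;
# exclusion wins even when a property is explicitly included.
def effective_properties(available, included=None, excluded=None):
    # If no include set is given, all available properties are candidates.
    props = set(available) if included is None else set(available) & included
    if excluded is not None:
        props -= excluded
    return props
```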

class noether.core.schemas.dataset.StandardDatasetConfig(/, **data)

Bases: DatasetBaseConfig, abc.ABC

Base config for datasets with fixed splits.

Parameters:

data (Any)

root: str

Root directory of the dataset.

split: Literal['train', 'val', 'test']

Which split of the dataset to use. Must be one of “train”, “val”, or “test”.

class noether.core.schemas.dataset.DatasetSplitIDs(/, **data)

Bases: pydantic.BaseModel, abc.ABC

Base class for dataset split ID validation with overlap checking.

This base class provides:

1. Automatic validation that train/val/test splits don’t have overlapping IDs
2. Optional size validation for datasets that have expected split sizes

Subclasses can optionally define class variables for size validation:

- EXPECTED_TRAIN_SIZE: Expected number of training samples
- EXPECTED_VAL_SIZE: Expected number of validation samples
- EXPECTED_TEST_SIZE: Expected number of test samples
- DATASET_NAME: Name of the dataset for error messages

If these are not defined, only overlap checking will be performed.

Parameters:

data (Any)

EXPECTED_TRAIN_SIZE: ClassVar[int | None] = None
EXPECTED_VAL_SIZE: ClassVar[int | None] = None
EXPECTED_TEST_SIZE: ClassVar[int | None] = None
EXPECTED_HIDDEN_TEST_SIZE: ClassVar[int | None] = None
DATASET_NAME: ClassVar[str | None] = None
train: list[int]
val: list[int]
test: list[int]
extrap: list[int] = []
interp: list[int] = []
train_subset: list[int] = []
validate_splits()

Validate splits and check for overlaps.
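The overlap check that validate_splits presumably performs can be sketched as a standalone function (check_split_overlaps is a hypothetical stand-in, not the actual validator):

```python
from itertools import combinations

# Hedged sketch of split-overlap validation: no sample ID may appear
# in more than one split.
def check_split_overlaps(**splits):
    for (name_a, ids_a), (name_b, ids_b) in combinations(splits.items(), 2):
        overlap = set(ids_a) & set(ids_b)
        if overlap:
            raise ValueError(
                f"Splits {name_a!r} and {name_b!r} share IDs: {sorted(overlap)}"
            )
```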

class noether.core.schemas.dataset.FieldDimSpec

Bases: pydantic.RootModel[collections.OrderedDict[str, int]]

A specification for a group of named data fields and their dimensions.

property field_slices: dict[str, slice]

Calculates slice indices for each field in concatenation order.

Return type:

dict[str, slice]

property total_dim: int

Calculates the total dimension of all fields combined.

Return type:

int

keys()
values()
items()
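Since FieldDimSpec wraps an ordered mapping of field names to dimensions, its field_slices and total_dim properties are likely derived as below (a stand-in sketch using plain functions, not the actual implementation):

```python
from collections import OrderedDict

# Stand-in sketch: derive per-field slices into the concatenated
# feature axis, in insertion (concatenation) order.
def field_slices(dims):
    slices, offset = {}, 0
    for name, dim in dims.items():
        slices[name] = slice(offset, offset + dim)
        offset += dim
    return slices

def total_dim(dims):
    # Total width of the concatenated feature axis.
    return sum(dims.values())
```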
class noether.core.schemas.dataset.DomainDataSpec(/, **data)

Bases: pydantic.BaseModel

Data specification for a single domain (e.g., surface, volume, wake).

Parameters:

data (Any)

output_dims: FieldDimSpec

Output fields and their dimensions for this domain, e.g. {“pressure”: 1, “velocity”: 3}.

feature_dim: FieldDimSpec | None = None

Input feature fields and their dimensions for this domain.

class noether.core.schemas.dataset.ModelDataSpecs(/, **data)

Bases: pydantic.BaseModel

Base data specification for models that operate on arbitrary named domains.

This is the minimal interface that model configs need from data specifications: position dimensions, available conditioning, and per-domain data descriptions.

Parameters:

data (Any)

position_dim: int = None

Dimension of the input position vectors.

conditioning_dims: FieldDimSpec | None = None

Available conditioning features and their dimensions.

domains: dict[str, DomainDataSpec] = None

Per-domain data specifications keyed by domain name.

use_physics_features: bool = False

Whether physics features are used as input.

property total_output_dim: int

Calculates the total output dimension across all domains.

Return type:

int

property all_targets: set[str]

Returns all target field names across all domains, prefixed by domain name.

Return type:

set[str]

property all_features: set[str]

Returns all feature field names across all domains.

Return type:

set[str]
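The aggregation behind total_output_dim and all_targets can be sketched with plain dicts standing in for the pydantic models (the "/" separator used to prefix target names by domain is an assumption for illustration):

```python
# Hypothetical sketch of per-domain aggregation, using
# {domain_name: {field_name: dim, ...}} in place of DomainDataSpec.
def total_output_dim(domains):
    # Sum the output dimensions of every field in every domain.
    return sum(sum(fields.values()) for fields in domains.values())

def all_targets(domains):
    # Target names prefixed by their domain; the "/" separator is assumed.
    return {f"{domain}/{field}"
            for domain, fields in domains.items()
            for field in fields}
```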