noether.data.datasets.cfd.zarr_aero_dataset

Zarr-backed AeroDataset that subsamples by reading chunks.

This is the read side of the chunked/sharded Zarr format. Instead of loading every field of a sample and discarding most points (the .pt + PointSamplingSampleProcessor path), the dataset reads only the random chunks it needs:

pre_getitem() selects random chunks per domain and fetches just those rows (byte-range reads against the sharded arrays), splitting the fused arrays back into per-field tensors.
the inherited getitem_* / with_normalizers machinery then serves those pre-read tensors, so normalization, key names and downstream collation are unchanged.

Set num_points to None per domain (the default) to read full samples — e.g. for evaluation — or to an integer to chunk-subsample at read time.
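As a rough illustration of chunk-based subsampling, the sketch below picks whole chunks in random order until they cover the requested number of rows, then trims to the target. It uses only the standard library. The function name `select_chunk_rows`, the fixed `chunk_size`, and the trim-to-target step are assumptions made for illustration, not the library's actual selection logic.

```python
import math
import random


def select_chunk_rows(num_rows, chunk_size, num_points=None, rng=None):
    """Return row indices covering `num_points` rows by picking whole chunks.

    Hypothetical helper: it illustrates the idea only, not noether's code.
    """
    if num_points is None or num_points >= num_rows:
        # Full read: every row, in storage order.
        return list(range(num_rows))
    rng = rng or random.Random()
    num_chunks = math.ceil(num_rows / chunk_size)
    order = list(range(num_chunks))
    rng.shuffle(order)  # random chunk order
    rows = []
    for chunk in order:
        start = chunk * chunk_size
        stop = min(start + chunk_size, num_rows)  # the last chunk may be short
        rows.extend(range(start, stop))
        if len(rows) >= num_points:
            break
    # Trim the overshoot so exactly `num_points` rows are returned.
    return rows[:num_points]


rows = select_chunk_rows(num_rows=10_000, chunk_size=512, num_points=4096,
                         rng=random.Random(0))
print(len(rows))  # 4096
```

Because rows inside a chunk are contiguous, each selected chunk maps to one contiguous read, which is what makes this cheaper than loading a full sample and discarding most points.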

Classes

`ZarrAeroDatasetConfig`	Config for Zarr-backed aerodynamic datasets with chunk-based subsampling.
`ZarrAeroDataset`	AeroDataset reading from a converted Zarr store with chunk-based subsampling.

Module Contents

class noether.data.datasets.cfd.zarr_aero_dataset.ZarrAeroDatasetConfig(/, **data)

Bases: noether.data.base.dataset.StandardDatasetConfig

Config for Zarr-backed aerodynamic datasets with chunk-based subsampling.

root points at the converted Zarr store. Leave num_surface_points and num_volume_points as None to read full samples (e.g. for evaluation); set them to integers to chunk-subsample at read time, in which case the pipeline’s PointSamplingSampleProcessor becomes a no-op.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters: data (Any)

num_surface_points: int | None = None

Surface points to chunk-sample per item (None = full surface).

num_volume_points: int | None = None

Volume points to chunk-sample per item (None = full volume).

num_geometry_points: int | None = None

If set, also emit geometry_position: an independent draw of this many surface points (AB-UPT).

sampling_seed: int | None = None

Seed for deterministic chunk selection (None = fresh subset each call).

read_concurrency: int = 1

Threads used to fetch a sample’s chunks in parallel (1 = serial; raise for S3).
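The effect of read_concurrency can be pictured with the standard library's `concurrent.futures.ThreadPoolExecutor`. This is a minimal sketch under stated assumptions: `fetch_chunk` is a stand-in that simulates one byte-range request with `time.sleep`, and both `fetch_chunk` and `fetch_chunks` are hypothetical names, not part of noether.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_chunk(chunk_id, latency=0.01):
    """Stand-in for one byte-range read; sleeps to mimic request latency."""
    time.sleep(latency)
    return chunk_id, f"rows-of-chunk-{chunk_id}"


def fetch_chunks(chunk_ids, read_concurrency=1):
    """Fetch chunks serially (1) or with a thread pool (>1), keeping input order."""
    if read_concurrency <= 1:
        return [fetch_chunk(c) for c in chunk_ids]
    with ThreadPoolExecutor(max_workers=read_concurrency) as pool:
        # map() yields results in input order even if requests finish out of order.
        return list(pool.map(fetch_chunk, chunk_ids))


serial = fetch_chunks(range(8), read_concurrency=1)
parallel = fetch_chunks(range(8), read_concurrency=4)
```

Threads help here because each request spends most of its time waiting on the network. On a local disk the per-request latency is small, so the extra threads mostly add overhead.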

class noether.data.datasets.cfd.zarr_aero_dataset.ZarrAeroDataset(dataset_config, filemap, num_points=None, sampling_seed=None, read_concurrency=1, num_geometry_points=None)

Bases: noether.data.datasets.cfd.dataset.AeroDataset

AeroDataset reading from a converted Zarr store with chunk-based subsampling.

Parameters:

dataset_config (noether.data.base.dataset.StandardDatasetConfig) – Standard dataset config; root points at the Zarr store root.
filemap (noether.data.schemas.FileMap) – Field-to-filename mapping (same one used for conversion).
num_points (dict[str, int | None] | None) – Per-domain target point counts, e.g. {"surface": 3586, "volume": 4096}. A None value for a domain reads that whole domain. Defaults to None, which reads every domain in full.
sampling_seed (int | None) – If set, chunk selection is deterministic per sample (seed sampling_seed + idx); otherwise a fresh subset is drawn each call.
read_concurrency (int) – Threads used to fetch a sample’s chunks in parallel. Keep at 1 for local stores; raise it to hide per-request latency on S3.
num_geometry_points (int | None) – If set, also emit geometry_position — an independent random draw of this many surface points (the AB-UPT shape-encoder input, distinct from the surface anchor points). None disables it.
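The seeding rule described for sampling_seed can be sketched with the standard library's `random.Random`. Only the `sampling_seed + idx` rule comes from the parameter description above; the helper name `rng_for_sample` is hypothetical, and the real dataset may use a different random generator.

```python
import random


def rng_for_sample(sampling_seed, idx):
    """Return a generator seeded with sampling_seed + idx, or unseeded if None."""
    if sampling_seed is None:
        # Seeded from OS entropy: a fresh subset on every call.
        return random.Random()
    return random.Random(sampling_seed + idx)


# Same seed and index: the same chunk choice on every call.
first = rng_for_sample(42, idx=7).sample(range(100), k=5)
second = rng_for_sample(42, idx=7).sample(range(100), k=5)
assert first == second

# No seed: each call draws independently, so results usually differ.
unseeded = rng_for_sample(None, idx=7).sample(range(100), k=5)
```

A fixed seed is useful for validation runs, where the same subset per sample keeps metrics comparable across epochs.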

store_root: str | pathlib.Path = ''

manifest

reader

num_points

sampling_seed = None

num_geometry_points = None

get_all_getitem_names()

Restrict to getitem_* for stored fields (+ computed/derived ones that apply).

Return type: list[str]

pre_getitem(idx)

Pre-read the sample so all getitem_* share one chunk read (used by __getitem__).

Parameters: idx (int)
Return type: dict[str, Any]

getitem_geometry_position(idx)

Random draw of surface positions used as the AB-UPT geometry/shape-encoder input.

Independent of the surface anchor points; only emitted when num_geometry_points is set. Normalized with the surface position normalizer.

Parameters: idx (int)
Return type: torch.Tensor
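To make "independent of the surface anchor points" concrete, the sketch below draws two separate index sets from the same surface. It is an illustration only: `draw_surface_rows` is a hypothetical helper, it draws individual row indices for simplicity, and whether the real geometry draw works per row or per chunk is not stated here.

```python
import random


def draw_surface_rows(num_surface_rows, num_points, rng):
    """Pick `num_points` distinct surface row indices (hypothetical helper)."""
    return sorted(rng.sample(range(num_surface_rows), k=num_points))


rng = random.Random(0)
anchor_rows = draw_surface_rows(1000, 64, rng)    # surface anchor points
geometry_rows = draw_surface_rows(1000, 32, rng)  # geometry_position input
```

The two draws may overlap or not; neither is a subset of the other by construction, and their sizes are set separately.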

post_getitem(idx, pre)

Drop the cached tensors for idx (used by __getitem__).

Parameters:

idx (int)
pre (dict[str, Any] | None)

Return type: None
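The pre_getitem / getitem_* / post_getitem cycle can be pictured with a toy class. Everything in it is a simplified assumption for illustration: the class name `ToyChunkedDataset`, the `_cache` dict, the two field names, and the way `__getitem__` collects fields. The real AeroDataset machinery, including normalizers, is not reproduced.

```python
class ToyChunkedDataset:
    """Toy stand-in: one read per sample, shared by all getitem_* methods."""

    def __init__(self, samples):
        self._samples = samples  # list of dicts: field name -> values
        self._cache = {}         # idx -> pre-read fields
        self.reads = 0           # counts simulated store reads

    def pre_getitem(self, idx):
        # One simulated read fetches every field of the sample at once.
        self.reads += 1
        self._cache[idx] = dict(self._samples[idx])
        return self._cache[idx]

    def getitem_surface_position(self, idx):
        return self._cache[idx]["surface_position"]

    def getitem_surface_pressure(self, idx):
        return self._cache[idx]["surface_pressure"]

    def post_getitem(self, idx, pre):
        # Drop the cached fields so memory does not grow with the dataset.
        self._cache.pop(idx, None)

    def __getitem__(self, idx):
        pre = self.pre_getitem(idx)
        try:
            return {
                "surface_position": self.getitem_surface_position(idx),
                "surface_pressure": self.getitem_surface_pressure(idx),
            }
        finally:
            # Always release the cache entry, even if a getter raises.
            self.post_getitem(idx, pre)


ds = ToyChunkedDataset([{"surface_position": [0.0, 1.0], "surface_pressure": [2.5]}])
item = ds[0]
```

After `ds[0]`, `ds.reads` is 1 although two getters ran, and the cache is empty again.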