noether.data.datasets.cfd.zarr_aero_dataset

Zarr-backed AeroDataset that subsamples by reading chunks.

This is the read side of the chunked/sharded Zarr format. Instead of loading every field of a sample and discarding most points (the .pt + PointSamplingSampleProcessor path), the dataset reads only the random chunks it needs:

  • pre_getitem() selects random chunks per domain and fetches just those rows (byte-range reads against the sharded arrays), splitting the fused arrays back into per-field tensors.

  • the inherited getitem_* / with_normalizers machinery then serves those pre-read tensors, so normalization, key names and downstream collation are unchanged.
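The step that splits the fused arrays back into per-field tensors can be pictured with a short sketch. It assumes a hypothetical fused surface array whose columns are laid out as position (3), pressure (1) and wall shear stress (3). The field names, widths and the helper split_fused are placeholders, not the converter's actual layout. NumPy is used for brevity, and torch.from_numpy would turn each part into a tensor.

```python
import numpy as np

# Hypothetical column layout of a fused surface array: (field name, width).
SURFACE_LAYOUT = [("surface_position", 3), ("surface_pressure", 1), ("surface_wallshearstress", 3)]


def split_fused(rows: np.ndarray, layout: list[tuple[str, int]]) -> dict[str, np.ndarray]:
    """Split a (num_rows, total_width) fused array into one array per field."""
    widths = [width for _, width in layout]
    if rows.ndim != 2 or rows.shape[1] != sum(widths):
        raise ValueError(f"expected shape (N, {sum(widths)}), got {rows.shape}")
    # Column offsets at which every field after the first begins, e.g. [3, 4].
    boundaries = np.cumsum(widths)[:-1]
    parts = np.split(rows, boundaries, axis=1)
    return {name: part for (name, _), part in zip(layout, parts)}


# Four rows read from the chunks of one sample, fused as 3 + 1 + 3 columns.
fused = np.arange(4 * 7, dtype=np.float32).reshape(4, 7)
fields = split_fused(fused, SURFACE_LAYOUT)
# fields["surface_position"].shape == (4, 3); fields["surface_pressure"].shape == (4, 1)
```

Storing fields side by side in one array means a single range read returns every field for the selected rows. The split afterwards only creates views and does not copy the data.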

Set num_points to None per domain (the default) to read full samples — e.g. for evaluation — or to an integer to chunk-subsample at read time.

Classes

ZarrAeroDatasetConfig

Config for Zarr-backed aerodynamic datasets with chunk-based subsampling.

ZarrAeroDataset

AeroDataset reading from a converted Zarr store with chunk-based subsampling.

Module Contents

class noether.data.datasets.cfd.zarr_aero_dataset.ZarrAeroDatasetConfig(/, **data)

Bases: noether.data.base.dataset.StandardDatasetConfig

Config for Zarr-backed aerodynamic datasets with chunk-based subsampling.

root points at the converted Zarr store. Leave the num_* fields None to read full samples (e.g. evaluation); set them to chunk-subsample at read time, in which case the pipeline’s PointSamplingSampleProcessor becomes a no-op.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

num_surface_points: int | None = None

Surface points to chunk-sample per item (None = full surface).

num_volume_points: int | None = None

Volume points to chunk-sample per item (None = full volume).

num_geometry_points: int | None = None

If set, also emit geometry_position — an independent draw of this many surface points (AB-UPT).

sampling_seed: int | None = None

Seed for deterministic chunk selection (None = fresh subset each call).

read_concurrency: int = 1

Threads used to fetch a sample’s chunks in parallel (1 = serial; raise for S3).
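read_concurrency trades threads for request latency. The sketch below shows that pattern with concurrent.futures. The helpers fetch_chunks and fake_remote_read are hypothetical stand-ins for one sample's chunk reads, and the 50 ms sleep only simulates a remote request.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

import numpy as np


def fetch_chunks(
    chunk_ids: Sequence[int],
    fetch_chunk: Callable[[int], np.ndarray],
    read_concurrency: int = 1,
) -> list[np.ndarray]:
    """Fetch chunks serially (read_concurrency=1) or on a thread pool, preserving order."""
    if read_concurrency <= 1:
        return [fetch_chunk(c) for c in chunk_ids]
    with ThreadPoolExecutor(max_workers=read_concurrency) as pool:
        # map() yields results in input order even if requests finish out of order.
        return list(pool.map(fetch_chunk, chunk_ids))


def fake_remote_read(chunk_id: int) -> np.ndarray:
    """Hypothetical chunk reader that simulates 50 ms of request latency."""
    time.sleep(0.05)
    return np.full((4, 3), chunk_id, dtype=np.float32)


chunks = fetch_chunks([3, 1, 2, 0], fake_remote_read, read_concurrency=4)
```

Threads suit this because the work is I/O-bound and blocking reads release the GIL, so several requests can wait at once. On a local store each read is typically fast already, so a pool mostly adds scheduling overhead. That is consistent with keeping the value at 1 there.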

class noether.data.datasets.cfd.zarr_aero_dataset.ZarrAeroDataset(dataset_config, filemap, num_points=None, sampling_seed=None, read_concurrency=1, num_geometry_points=None)

Bases: noether.data.datasets.cfd.dataset.AeroDataset

AeroDataset reading from a converted Zarr store with chunk-based subsampling.

Parameters:
  • dataset_config (noether.data.base.dataset.StandardDatasetConfig) – Standard dataset config; root points at the Zarr store root.

  • filemap (noether.data.schemas.FileMap) – Field-to-filename mapping (same one used for conversion).

  • num_points (dict[str, int | None] | None) – Per-domain target counts, e.g. {"surface": 3586, "volume": 4096}. A None value reads the whole domain. Defaults to full reads.

  • sampling_seed (int | None) – If set, chunk selection is deterministic per sample (seed sampling_seed + idx); otherwise a fresh subset is drawn each call.

  • read_concurrency (int) – Threads used to fetch a sample’s chunks in parallel. Keep at 1 for local stores; raise it to hide per-request latency on S3.

  • num_geometry_points (int | None) – If set, also emit geometry_position — an independent random draw of this many surface points (the AB-UPT shape-encoder input, distinct from the surface anchor points). None disables it.

store_root: str | pathlib.Path = ''
manifest
reader
num_points
sampling_seed = None
num_geometry_points = None
get_all_getitem_names()

Restrict the returned names to the getitem_* methods for stored fields, plus any computed/derived ones that apply.

Return type:

list[str]
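The filtering can be sketched under a few assumptions: getitem_<name> methods are discovered by name and kept only when <name> is a stored field or an applicable derived field. ToyDataset, stored_fields and derived_fields are hypothetical and do not reflect how AeroDataset actually tracks its fields.

```python
class ToyDataset:
    """Hypothetical dataset exposing getitem_* methods for more fields than it stores."""

    def __init__(self, stored_fields: set[str], derived_fields: set[str]) -> None:
        self.stored_fields = stored_fields
        self.derived_fields = derived_fields

    def getitem_surface_position(self, idx: int) -> str:
        return f"surface_position[{idx}]"

    def getitem_surface_pressure(self, idx: int) -> str:
        return f"surface_pressure[{idx}]"

    def getitem_volume_velocity(self, idx: int) -> str:
        return f"volume_velocity[{idx}]"

    def getitem_geometry_position(self, idx: int) -> str:
        return f"geometry_position[{idx}]"

    def get_all_getitem_names(self) -> list[str]:
        """Keep getitem_* methods whose field is stored or derivable for this store."""
        allowed = self.stored_fields | self.derived_fields
        names = [n for n in dir(self) if n.startswith("getitem_") and callable(getattr(self, n))]
        return sorted(n for n in names if n.removeprefix("getitem_") in allowed)


# geometry_position counts as derived here, mirroring the case where num_geometry_points is set.
ds = ToyDataset({"surface_position", "surface_pressure"}, {"geometry_position"})
names = ds.get_all_getitem_names()
```

In this toy, getitem_volume_velocity is dropped because the store holds no volume fields. That keeps callers from requesting data the Zarr store cannot provide.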

pre_getitem(idx)

Pre-read the sample so all getitem_* share one chunk read (used by __getitem__).

Parameters:

idx (int)

Return type:

dict[str, Any]

getitem_geometry_position(idx)

Random draw of surface positions used as the AB-UPT geometry/shape-encoder input.

Independent of the surface anchor points; only emitted when num_geometry_points is set. Normalized with the surface position normalizer.

Parameters:

idx (int)

Return type:

torch.Tensor
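The independent geometry draw can be sketched with a few assumptions. surface_position holds the candidate surface points as an (N, 3) array, and the surface position normalizer is a mean/std standardization. The helper geometry_position, the normalizer form and the statistics are placeholders.

```python
import numpy as np


def geometry_position(
    surface_position: np.ndarray,
    num_geometry_points: int,
    mean: np.ndarray,
    std: np.ndarray,
    rng: np.random.Generator,
) -> np.ndarray:
    """Draw num_geometry_points surface points independently of the anchors, then normalize."""
    n = surface_position.shape[0]
    # Sampling without replacement; rng.choice raises ValueError if n < num_geometry_points.
    idx = rng.choice(n, size=num_geometry_points, replace=False)
    # Same normalizer as the surface positions (assumed here to be (x - mean) / std).
    return (surface_position[idx] - mean) / std


rng = np.random.default_rng(0)
surface = rng.normal(size=(10_000, 3))
mean, std = surface.mean(axis=0), surface.std(axis=0)
anchor_idx = rng.choice(10_000, size=2_000, replace=False)  # surface anchor subset
geom = geometry_position(surface, 512, mean, std, rng)      # separate draw for the encoder
```

The geometry draw uses its own indices, so the shape-encoder input does not depend on which anchor points were selected. Sharing the surface normalizer keeps both sets of positions in the same coordinate frame.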

post_getitem(idx, pre)

Drop the cached tensors for idx (used by __getitem__).

Parameters:
  • idx (int)

  • pre – The dict returned by pre_getitem(idx).

Return type:

None
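Taken together, the pre_getitem / getitem_* / post_getitem split works like a per-index cache. One read fills it, each getitem_* serves from it, and post_getitem clears it. The toy version below is hypothetical. CachedReader, read_sample and the field values are placeholders, and the try/finally is a design choice of the sketch rather than a documented behavior of __getitem__.

```python
from typing import Any


class CachedReader:
    """Toy dataset: one read per index, shared by every getitem_* call."""

    def __init__(self) -> None:
        self.reads = 0
        self._cache: dict[int, dict[str, Any]] = {}

    def read_sample(self, idx: int) -> dict[str, Any]:
        # Stand-in for the chunked Zarr read of all fields of one sample.
        self.reads += 1
        return {"surface_position": [idx, idx], "surface_pressure": [idx * 10]}

    def pre_getitem(self, idx: int) -> dict[str, Any]:
        self._cache[idx] = self.read_sample(idx)
        return self._cache[idx]

    def getitem_surface_position(self, idx: int) -> Any:
        return self._cache[idx]["surface_position"]

    def getitem_surface_pressure(self, idx: int) -> Any:
        return self._cache[idx]["surface_pressure"]

    def post_getitem(self, idx: int, pre: dict[str, Any]) -> None:
        self._cache.pop(idx, None)

    def __getitem__(self, idx: int) -> dict[str, Any]:
        pre = self.pre_getitem(idx)
        try:
            return {
                "surface_position": self.getitem_surface_position(idx),
                "surface_pressure": self.getitem_surface_pressure(idx),
            }
        finally:
            # Drop the cached entry even if a getitem_* raises.
            self.post_getitem(idx, pre)


ds = CachedReader()
item = ds[3]  # one read serves both getitem_* calls; the cache is empty afterwards
```

Keying the cache by idx lets several samples be assembled at once. Dropping each entry in post_getitem keeps memory bounded to the samples currently in flight.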