noether.data.datasets.cfd.zarr_aero_dataset¶
Zarr-backed AeroDataset that subsamples by reading chunks.
This is the read side of the chunked/sharded Zarr format. Instead of loading every
field of a sample and discarding most points (the .pt + PointSamplingSampleProcessor
path), the dataset reads only the random chunks it needs:
- pre_getitem() selects random chunks per domain and fetches just those rows (byte-range reads against the sharded arrays), splitting the fused arrays back into per-field tensors.
- The inherited getitem_*/with_normalizers machinery then serves those pre-read tensors, so normalization, key names, and downstream collation are unchanged.
Set num_points to None per domain (the default) to read full samples — e.g. for
evaluation — or to an integer to chunk-subsample at read time.
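The chunk-level read described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the noether implementation: an in-memory NumPy array stands in for the sharded Zarr array, and the helper names (select_chunks, read_rows, split_fields), the chunk size, and the field layout are all hypothetical.

```python
import numpy as np

def select_chunks(num_rows, chunk_size, num_points, rng):
    """Pick enough random chunks to cover about num_points rows (None = all chunks)."""
    num_chunks = -(-num_rows // chunk_size)  # ceiling division
    if num_points is None:
        return np.arange(num_chunks)
    needed = min(num_chunks, -(-num_points // chunk_size))
    return np.sort(rng.choice(num_chunks, size=needed, replace=False))

def read_rows(fused, chunk_ids, chunk_size, num_points):
    """Gather only the rows of the selected chunks, then trim to num_points.

    If the short trailing chunk is selected, the result can hold fewer rows
    than requested; a real reader would have to handle that case.
    """
    parts = [fused[c * chunk_size:(c + 1) * chunk_size] for c in chunk_ids]
    rows = np.concatenate(parts, axis=0)
    return rows if num_points is None else rows[:num_points]

def split_fields(rows, layout):
    """Split a fused (N, C) array back into named per-field arrays."""
    out, start = {}, 0
    for name, width in layout.items():
        out[name] = rows[:, start:start + width]
        start += width
    return out

rng = np.random.default_rng(0)
# Toy fused "surface" array: 1000 points, 3 position columns + 1 pressure column.
fused = np.arange(1000 * 4, dtype=np.float32).reshape(1000, 4)
layout = {"surface_position": 3, "surface_pressure": 1}  # hypothetical layout

chunk_ids = select_chunks(len(fused), chunk_size=64, num_points=200, rng=rng)
fields = split_fields(read_rows(fused, chunk_ids, 64, 200), layout)

# num_points=None reads every chunk, i.e. the full sample.
all_ids = select_chunks(len(fused), chunk_size=64, num_points=None, rng=rng)
full = read_rows(fused, all_ids, 64, None)
```

Because whole chunks are drawn rather than individual points, the subset is spatially random only to the extent that points within a chunk are; the trade-off is that each selected chunk maps to one contiguous read.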
Classes¶
ZarrAeroDatasetConfig | Config for Zarr-backed aerodynamic datasets with chunk-based subsampling.
ZarrAeroDataset | AeroDataset reading from a converted Zarr store with chunk-based subsampling.
Module Contents¶
- class noether.data.datasets.cfd.zarr_aero_dataset.ZarrAeroDatasetConfig(/, **data)¶
Bases: noether.data.base.dataset.StandardDatasetConfig
Config for Zarr-backed aerodynamic datasets with chunk-based subsampling.
root points at the converted Zarr store. Leave the num_* fields None to read full samples (e.g. evaluation); set them to chunk-subsample at read time, in which case the pipeline’s PointSamplingSampleProcessor becomes a no-op.
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
data (Any)
- num_surface_points: int | None = None¶
Surface points to chunk-sample per item (None = full surface).
- num_geometry_points: int | None = None¶
If set, also emit geometry_position: an independent draw of this many surface points (AB-UPT).
- class noether.data.datasets.cfd.zarr_aero_dataset.ZarrAeroDataset(dataset_config, filemap, num_points=None, sampling_seed=None, read_concurrency=1, num_geometry_points=None)¶
Bases: noether.data.datasets.cfd.dataset.AeroDataset
AeroDataset reading from a converted Zarr store with chunk-based subsampling.
- Parameters:
dataset_config (noether.data.base.dataset.StandardDatasetConfig) – Standard dataset config; root points at the Zarr store root.
filemap (noether.data.schemas.FileMap) – Field-to-filename mapping (same one used for conversion).
num_points (dict[str, int | None] | None) – Per-domain target counts (e.g. {"surface": 3586, "volume": 4096}). A None value reads the whole domain. Defaults to full reads.
sampling_seed (int | None) – If set, chunk selection is deterministic per sample (seed sampling_seed + idx); otherwise a fresh subset is drawn each call.
read_concurrency (int) – Threads used to fetch a sample’s chunks in parallel. Keep at 1 for local stores; raise it to hide per-request latency on S3.
num_geometry_points (int | None) – If set, also emit geometry_position: an independent random draw of this many surface points (the AB-UPT shape-encoder input, distinct from the surface anchor points). None disables it.
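The sampling_seed and read_concurrency semantics can be illustrated with a short sketch. It assumes NumPy's default_rng and a standard-library thread pool; make_rng, fetch_chunks, and fake_fetch are hypothetical names introduced here, and the actual reader may seed and fetch differently.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def make_rng(sampling_seed, idx):
    """Per-sample generator: deterministic if a seed is given, fresh otherwise."""
    if sampling_seed is None:
        return np.random.default_rng()  # new entropy on every call
    return np.random.default_rng(sampling_seed + idx)

def fetch_chunks(fetch_one, chunk_ids, read_concurrency=1):
    """Fetch chunks in order, optionally overlapping requests across threads."""
    if read_concurrency <= 1:
        return [fetch_one(c) for c in chunk_ids]
    with ThreadPoolExecutor(max_workers=read_concurrency) as pool:
        return list(pool.map(fetch_one, chunk_ids))  # map preserves input order

def fake_fetch(c):
    """Toy stand-in for a remote chunk read: chunk c holds rows 4c .. 4c+3."""
    return np.arange(c * 4, (c + 1) * 4)

# Same seed and sample index give the same chunk selection.
a = make_rng(sampling_seed=7, idx=3).choice(16, size=4, replace=False)
b = make_rng(sampling_seed=7, idx=3).choice(16, size=4, replace=False)

# Threaded and serial fetches return the same chunks in the same order.
serial = fetch_chunks(fake_fetch, a, read_concurrency=1)
threaded = fetch_chunks(fake_fetch, a, read_concurrency=4)
```

Threads help here because remote chunk reads are I/O-bound; for a local store the extra workers mostly add overhead, which matches the advice to keep read_concurrency at 1.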
- store_root: str | pathlib.Path = ''¶
- manifest¶
- reader¶
- num_points¶
- sampling_seed = None¶
- num_geometry_points = None¶
- get_all_getitem_names()¶
Restrict to getitem_* for stored fields (+ computed/derived ones that apply).
- pre_getitem(idx)¶
Pre-read the sample so all getitem_* share one chunk read (used by __getitem__).
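The interplay between get_all_getitem_names() and pre_getitem() can be sketched with a toy class. Everything below is a hypothetical stand-in, not the AeroDataset base class: the cache attribute, the stored_fields set, and the field names are assumptions, and computed/derived getters are left out for brevity.

```python
class ToyDataset:
    """Hypothetical stand-in; not the noether AeroDataset implementation."""

    def __init__(self, samples, stored_fields):
        self.samples = samples
        self.stored_fields = set(stored_fields)
        self.reads = 0      # counts how many times the "store" is read
        self._cache = {}

    def get_all_getitem_names(self):
        # Keep only getitem_<field> methods whose field exists in the store.
        prefix = "getitem_"
        names = [n for n in dir(self) if n.startswith(prefix)]
        return [n for n in names if n[len(prefix):] in self.stored_fields]

    def pre_getitem(self, idx):
        # One read per sample; every getitem_* then serves from the cache.
        self.reads += 1
        self._cache = dict(self.samples[idx])

    def getitem_surface_position(self, idx):
        return self._cache["surface_position"]

    def getitem_surface_pressure(self, idx):
        return self._cache["surface_pressure"]

    def getitem_volume_velocity(self, idx):
        # Not stored in the toy data below, so it is filtered out.
        return self._cache["volume_velocity"]

    def __getitem__(self, idx):
        self.pre_getitem(idx)
        prefix = "getitem_"
        return {n[len(prefix):]: getattr(self, n)(idx)
                for n in self.get_all_getitem_names()}

samples = [{"surface_position": [0.0, 1.0, 2.0], "surface_pressure": [0.5]}]
ds = ToyDataset(samples, stored_fields=["surface_position", "surface_pressure"])
item = ds[0]
```

Filtering the getter names against the stored fields is what keeps a getter such as getitem_volume_velocity from raising on a store that was converted without volume data.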