noether.data.zarr_store.writer

Writer that converts per-sample CFD tensors into a sharded, pre-shuffled Zarr store.

Each sample becomes an independent Zarr group (<store_root>/<sample_id>.zarr) holding one array per field (surface/position, volume/velocity, …), so fields can be read independently. Points are shuffled once at write time with a deterministic, per-sample seed so that any contiguous chunk is already a uniform-random subset of the sample — this lets the dataloader turn “sample N random points” into “read a random chunk”. All arrays of a domain share the permutation and chunk grid, so chunk c is point-aligned across fields.
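The pre-shuffle idea can be illustrated with a minimal NumPy sketch. It is not the writer's implementation: the helper names `per_sample_seed` and `shuffle_domain` are hypothetical, and the way the seed is derived from the base seed and sample id (a SHA-256 digest here) is an assumption chosen only because it is stable across processes.

```python
import hashlib

import numpy as np


def per_sample_seed(base_seed: int, sample_id: str) -> int:
    """Hypothetical seed derivation: stable hash of base seed and sample id."""
    digest = hashlib.sha256(f"{base_seed}:{sample_id}".encode()).digest()
    return int.from_bytes(digest[:8], "little")


def shuffle_domain(fields: dict[str, np.ndarray], seed: int) -> dict[str, np.ndarray]:
    """Apply one shared permutation to every array of a domain."""
    n_points = next(iter(fields.values())).shape[0]
    perm = np.random.default_rng(seed).permutation(n_points)
    return {name: arr[perm] for name, arr in fields.items()}


n_points, chunk_points = 10_000, 1_000
position = np.arange(n_points * 3, dtype=np.float32).reshape(n_points, 3)
pressure = position[:, :1] * 2.0  # toy field tied to position so alignment is checkable

seed = per_sample_seed(0, "param1/example")
shuffled = shuffle_domain(
    {"surface_position": position, "surface_pressure": pressure}, seed
)

# Reading chunk c of each array yields the same random subset of points.
c = 3
chunk = slice(c * chunk_points, (c + 1) * chunk_points)
pos_chunk = shuffled["surface_position"][chunk]
prs_chunk = shuffled["surface_pressure"][chunk]
```

Because the permutation is shared, `prs_chunk` holds the pressure values for exactly the points in `pos_chunk`, so a loader can draw a random chunk index and read that chunk from every field it needs.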

Arrays are chunked along the point axis (chunk_points) with the channel axis left unchunked. By default each array is packed into a single whole-array shard whose chunks are compressed individually with blosc+zstd, so every sample stores exactly one object per field.

Classes

ZarrStoreWriter

Convert CFD samples into the chunked/sharded Zarr format and track the manifest.

Module Contents

class noether.data.zarr_store.writer.ZarrStoreWriter(store_root, filemap, dataset_name, shuffle_seed=0, chunk_points=4096, shard_points=None, coords_dtype='float32', values_dtype='float16', field_dtypes=None, compression_level=5)

Convert CFD samples into the chunked/sharded Zarr format and track the manifest.

Parameters:

store_root (str | pathlib.Path) – Output location for the Zarr store. A local path or an fsspec URL (s3://, gs://, memory://, …) for object storage.
filemap (noether.data.schemas.FileMap) – Field-to-filename mapping describing which fields exist.
dataset_name (str) – Human-readable dataset name recorded in the manifest.
shuffle_seed (int) – Base seed for the per-sample point shuffle.
chunk_points (int) – Chunk size along the point axis. Pick a value close to the training subsample size to minimise read amplification.
shard_points (int | None) – Cap on the shard size along the point axis (rounded down to a whole number of chunks, minimum one chunk). None (default) packs each array into a single whole-array shard. Set this when per-field arrays grow large: shard bytes ≈ shard_points × dim × dtype_size, so e.g. a ~128 MB cap on a float32×3 position array is shard_points ≈ 11_000_000. Smaller shards bound the writer’s per-shard RAM and the blast radius of a corrupt object, at the cost of more objects per array.
coords_dtype (str) – Dtype for the positions array (keep float32).
values_dtype (str) – Dtype for the physical fields array (float16 halves bytes).
field_dtypes (dict[str, str] | None) – Per-field dtype overrides keyed by canonical name, e.g. {"volume_vorticity": "float32"} for fields whose values exceed the values_dtype range (the largest finite float16 value is 65504, about 6.6e4). Overflowing casts are rejected at write time rather than silently stored as inf.
compression_level (int) – blosc/zstd compression level.
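The shard_points arithmetic described above can be worked through in a few lines. This is a sketch under stated assumptions: `shard_points_for_cap` is a hypothetical helper, not part of the library, and "128 MB" is read here as 128 MiB.

```python
import numpy as np


def shard_points_for_cap(cap_bytes: int, dim: int, dtype: str, chunk_points: int) -> int:
    """Largest shard size (in points) under a byte cap, in whole chunks.

    Follows shard bytes ~= shard_points * dim * dtype_size, rounds down to a
    whole number of chunks, and never returns less than one chunk.
    """
    bytes_per_point = dim * np.dtype(dtype).itemsize
    raw_points = cap_bytes // bytes_per_point
    whole_chunks = raw_points // chunk_points
    return max(1, whole_chunks) * chunk_points


# A float32 x 3 position array uses 12 bytes per point.
shard_points = shard_points_for_cap(128 * 1024**2, dim=3, dtype="float32", chunk_points=4096)

# Number of shard objects a 50M-point array would then occupy (ceiling division).
n_points = 50_000_000
n_shards = -(-n_points // shard_points)
```

The result lands near the 11_000_000 figure quoted for shard_points, and the ceiling division shows the trade-off: a smaller cap means more objects per array.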

store_root = ''

filemap

chunk_points = 4096

shard_points = None

coords_dtype = 'float32'

values_dtype = 'float16'

field_dtypes = None

compression_level = 5

layouts

manifest

write_group(sample_id, field_arrays)

Write one sample’s Zarr group and return its manifest entry (no manifest mutation).

Each sample is written to its own store, so this method is safe to call concurrently from multiple threads. The caller is responsible for recording the returned entry in the manifest.

Parameters:

sample_id (str) – Stable id used for the relative path and shuffle seed (e.g. "param1/<hash>").
field_arrays (dict[str, numpy.ndarray]) – Mapping canonical_field -> numpy array. Positions must be (N, 3); scalar fields may be (N,) or (N, 1).

Returns:

The SampleEntry describing the written group.

Raises:

ValueError – If a domain’s fields disagree on point count.

Return type:

noether.data.zarr_store.manifest.SampleEntry
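The input contract (shape rules, matching point counts per domain, and rejection of overflowing casts) can be sketched as standalone checks. None of this is the writer's actual code: the helper names are hypothetical, and deriving the domain from the prefix before the first underscore of the canonical name (e.g. "volume" from "volume_vorticity") is an assumption.

```python
import numpy as np


def as_2d(arr: np.ndarray) -> np.ndarray:
    """Promote (N,) scalar fields to (N, 1); accept (N, C) unchanged."""
    arr = np.asarray(arr)
    if arr.ndim == 1:
        return arr[:, None]
    if arr.ndim != 2:
        raise ValueError(f"expected (N,) or (N, C), got shape {arr.shape}")
    return arr


def check_point_counts(field_arrays: dict[str, np.ndarray]) -> None:
    """Raise ValueError if fields of one domain disagree on point count."""
    counts: dict[str, int] = {}
    for name, arr in field_arrays.items():
        domain = name.split("_", 1)[0]  # assumption: domain is the name prefix
        n_points = arr.shape[0]
        if counts.setdefault(domain, n_points) != n_points:
            raise ValueError(
                f"{name}: {n_points} points, expected {counts[domain]} for domain {domain!r}"
            )


def checked_cast(arr: np.ndarray, dtype: str, name: str) -> np.ndarray:
    """Cast to a float dtype, rejecting finite values that would become inf."""
    with np.errstate(over="ignore"):
        out = arr.astype(np.dtype(dtype))
    overflowed = np.isfinite(arr) & ~np.isfinite(out)
    if overflowed.any():
        raise ValueError(f"{name}: {int(overflowed.sum())} value(s) overflow {dtype}")
    return out


fields = {
    "volume_position": np.zeros((8, 3), dtype=np.float32),
    "volume_pressure": np.ones(8, dtype=np.float32),
}
prepared = {name: as_2d(arr) for name, arr in fields.items()}
check_point_counts(prepared)
stored = checked_cast(prepared["volume_pressure"], "float16", "volume_pressure")
```

A field with values beyond the float16 range, such as `np.array([1e5])`, would make `checked_cast` raise, which is the situation a field_dtypes override to "float32" is meant to avoid.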

write_sample(sample_id, field_arrays)

Write one sample and record it in the manifest (sequential convenience).

Parameters:

sample_id (str)
field_arrays (dict[str, numpy.ndarray])

Return type:

None
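Where write_sample is the sequential path, the concurrent pattern described for write_group (write in worker threads, record entries in the caller) might look like the sketch below. `StubWriter` is a hypothetical stand-in so the example runs without the library. Its dict-based manifest and dict entries are assumptions, not the real manifest or SampleEntry types.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


class StubWriter:
    """Hypothetical stand-in that mimics the write_group contract only."""

    def __init__(self) -> None:
        self.manifest: dict[str, dict] = {}  # assumption: plain dict manifest

    def write_group(self, sample_id: str, field_arrays: dict[str, np.ndarray]) -> dict:
        # A real writer would write the Zarr group here; the stub just
        # returns an entry and does not touch self.manifest.
        return {"sample_id": sample_id, "fields": sorted(field_arrays)}


samples = {
    f"param1/sample_{i}": {"surface_position": np.zeros((16, 3), dtype=np.float32)}
    for i in range(8)
}

writer = StubWriter()
with ThreadPoolExecutor(max_workers=4) as pool:
    # Worker threads only write and return entries.
    entries = list(pool.map(lambda item: writer.write_group(*item), samples.items()))

# The calling thread is the only one that mutates the manifest.
for entry in entries:
    writer.manifest[entry["sample_id"]] = entry
```

Keeping manifest updates in the calling thread avoids any need for locking around shared state.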

to_init_kwargs()

Constructor kwargs to rebuild an identical writer (e.g. in a worker process).

Return type:

dict[str, object]

save_manifest()

Persist the manifest to the store root (local path or fsspec URL).

Return type:

str