noether.core.schemas.schema

Classes

StaticConfigSchema

ConfigSchema

Main configuration schema for experiments.

Functions

master_port_from_env()

Gets the master port from the MASTER_PORT environment variable if available.

Module Contents

class noether.core.schemas.schema.StaticConfigSchema(/, **data)

Bases: pydantic.BaseModel

Parameters:

data (Any)

output_path: str

Path to store all outputs of the run, including logs, checkpoints, etc.

default_cudnn_benchmark: bool = True

Whether to enable cudnn benchmark mode by default.

default_cudnn_deterministic: bool | None = False

Whether to enable cudnn deterministic mode by default.

master_port: int | None = None

Port for the distributed master node. If None, it will be set from the MASTER_PORT environment variable if available.

model_config

Configuration for the model, should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

static from_uri(uri)

Load static config from a given URI (file path).

Parameters:

uri (str)

Return type:

StaticConfigSchema
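
Example (a minimal sketch; the file name and field values below are hypothetical, and the file format accepted by from_uri is not specified here):

from noether.core.schemas.schema import StaticConfigSchema

# Construct directly from keyword arguments (standard pydantic usage).
static_cfg = StaticConfigSchema(
    output_path="/tmp/noether_runs",   # where logs, checkpoints, etc. are stored
    default_cudnn_benchmark=True,
    default_cudnn_deterministic=False,
)

# Or load the static config from a file path.
static_cfg = StaticConfigSchema.from_uri("static_config.yaml")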

noether.core.schemas.schema.master_port_from_env()

Gets the master port from the MASTER_PORT environment variable if available.

Return type:

int
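
Example (assuming the behavior documented above, i.e. the port is read from the MASTER_PORT environment variable; the port number is illustrative):

import os

from noether.core.schemas.schema import master_port_from_env

os.environ["MASTER_PORT"] = "29500"  # illustrative port value
port = master_port_from_env()        # expected to return 29500 as an int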

class noether.core.schemas.schema.ConfigSchema(/, **data)

Bases: pydantic.BaseModel

Main configuration schema for experiments.

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError (pydantic_core.ValidationError) if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)
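
Example of constructing the schema from keyword arguments (a sketch; the field values are illustrative, and additional required fields may exist beyond those shown):

from pathlib import Path

import pydantic

from noether.core.schemas.schema import ConfigSchema

try:
    cfg = ConfigSchema(
        name="baseline",                       # experiment name
        accelerator="gpu",                     # one of 'cpu', 'gpu', 'mps' (or None)
        seed=0,
        output_path=Path("outputs/baseline"),  # output directory for this run
    )
except pydantic.ValidationError as err:
    # Raised when the input data cannot be validated into a valid model.
    print(err)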

name: str | None = None

Name of the experiment.

accelerator: Literal['cpu', 'gpu', 'mps'] | None = None

Type of accelerator to use. Default is None, which lets the system choose the best available accelerator, in the order GPU > MPS > CPU.

stage_name: str | None = None

Name of the current stage, e.g., train, finetune, or test.

dataset_kind: str | None = None

Kind of dataset to use, i.e., its class path.

dataset_root: str | None = None

Root directory of the dataset.

resume_run_id: str | None = None

Run ID to resume from. If None, start a new run. This can be used to resume training from the last checkpoint of a previous run when training was interrupted/failed.

resume_stage_name: str | None = None

Stage name to resume from. If None, resume from the default stage.

resume_checkpoint: str | None = None

Path to checkpoint to resume from. If None, start from scratch.
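
A sketch of how the resume fields fit together when continuing an interrupted run (the run ID, stage name, and checkpoint path are placeholders):

from pathlib import Path

from noether.core.schemas.schema import ConfigSchema

cfg = ConfigSchema(
    output_path=Path("outputs/baseline"),
    resume_run_id="abc123",                      # hypothetical ID of the interrupted run
    resume_stage_name="train",                   # stage to resume; None resumes the default stage
    resume_checkpoint="checkpoints/last.ckpt",   # hypothetical checkpoint path; None starts from scratch
)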

seed: int = None

Random seed for reproducibility.

dataset_statistics: dict[str, list[float | int]] | None = None

Pre-computed dataset statistics, e.g., mean and std for normalization. Since some tensors are multi-dimensional, the statistics are stored as lists.

dataset_normalizer: dict[str, list[noether.core.schemas.normalizers.AnyNormalizer]] | None = None

Normalizers to apply to the dataset, keyed by data source name; each value is the list of normalizers for that source.
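
An illustrative shape for dataset_statistics (the key names and values are hypothetical and depend entirely on the dataset; the structure of AnyNormalizer entries is not sketched here):

from pathlib import Path

from noether.core.schemas.schema import ConfigSchema

stats = {
    # one list entry per dimension for multi-dimensional tensors
    "velocity_mean": [0.0, 0.0, 0.0],
    "velocity_std": [1.0, 1.0, 1.0],
}
cfg = ConfigSchema(output_path=Path("outputs/baseline"), dataset_statistics=stats)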

tracker: noether.core.schemas.trackers.WandBTrackerSchema | None = None

Configuration for experiment tracking. If None, no tracking is used. If “disabled”, tracking is explicitly disabled. WandB is currently the only supported tracker.

run_id: str | None = None

Unique identifier for the run. If None, a new ID will be generated.

devices: str | None = None

Comma-separated list of device IDs to use. If None, all available devices will be used.

num_workers: int | None = None

Number of worker threads for data loading. If None, (#CPUs / #GPUs - 1) workers will be used.

cudnn_benchmark: bool = True

Whether to enable cudnn benchmark mode for this run.

cudnn_deterministic: bool = False

Whether to enable cudnn deterministic mode for this run.

datasets: dict[str, noether.core.schemas.dataset.DatasetBaseConfig] = None
model: noether.core.schemas.models.ModelBaseConfig = None
trainer: noether.core.schemas.trainers.BaseTrainerConfig = None
debug: bool = False

If True, enables debug mode: more verbose logging, no WandB logging, and output written to a debug directory.

store_code_in_output: bool = True

If True, store a copy of the current code in the output directory for reproducibility.

output_path: pathlib.Path

Path to output directory.

master_port: int = None

Port for the distributed master node. If None, it will be set from the MASTER_PORT environment variable if available.

classmethod empty_dict_is_none(v)

Pre-processes tracker input before validation.

Parameters:

v (Any)

Return type:

Any
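
A hedged sketch of the expected effect, judging only from the validator's name and the tracker field's description (i.e. an empty dict passed for tracker is presumably normalized to None before validation):

from pathlib import Path

from noether.core.schemas.schema import ConfigSchema

cfg = ConfigSchema(output_path=Path("outputs/baseline"), tracker={})
assert cfg.tracker is None  # assumed behavior, based on empty_dict_is_none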

classmethod validate_output_path(value)

Validates the output path.

Parameters:

value (pathlib.Path)

Return type:

pathlib.Path

serialize_output_path(value)

Parameters:

value (Any)

Return type:

Any

classmethod get_env_master_port(value)

Sets master_port from the MASTER_PORT environment variable if available.

Parameters:

value (Any)

Return type:

Any

property config_schema_kind: str

The fully qualified import path for the configuration class.

Return type:

str

set_accelerator_if_unset()

Sets the accelerator if it is not already set.

Return type:

ConfigSchema
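
A short usage sketch of the post-construction helpers documented above (whether set_accelerator_if_unset mutates the instance or returns a new one is not specified, so re-assigning the return value is the safe pattern):

from pathlib import Path

from noether.core.schemas.schema import ConfigSchema

cfg = ConfigSchema(output_path=Path("outputs/baseline"))

# Pick an accelerator (GPU > MPS > CPU) when none was configured.
cfg = cfg.set_accelerator_if_unset()

# Fully qualified import path of the configuration class.
print(cfg.config_schema_kind)  # e.g. "noether.core.schemas.schema.ConfigSchema"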