noether.core.schemas

Submodules

Attributes

Classes

Package Contents

class noether.core.schemas.BestCheckpointCallbackConfig(/, **data)

Bases: CallBackBaseConfig

Parameters:

data (Any)

name: Literal['BestCheckpointCallback'] = None
metric_key: str = None

The key of the metric used to determine the best model.

save_frozen_weights: bool = None

Whether to also save the frozen weights of the model.

tolerances: list[int] | None = None

If provided, this callback produces multiple best models that differ in the number of intervals during which the metric is allowed not to improve. For example, tolerances=[5] with every_n_epochs=1 stores a checkpoint where at most 5 epochs passed without the metric improving. Additionally, the best checkpoint over the whole training run is always stored (i.e., an infinite tolerance). By setting different tolerances, one can evaluate different early-stopping configurations with a single training run.

model_names: list[str] | None = None

Which models to save by name (e.g., if only the encoder of an autoencoder should be stored, one could pass model_names=[‘encoder’] here). This only applies when training a CompositeModel. If None, all models are saved.
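Putting the fields above together, a callback configuration might look like the following dict. This is an illustrative sketch; the metric key and model name are made-up example values, not noether defaults.

```python
# Hypothetical BestCheckpointCallback configuration (values are examples).
best_checkpoint_cfg = {
    "name": "BestCheckpointCallback",
    "every_n_epochs": 1,               # check for a new best once per epoch
    "metric_key": "loss/valid/total",  # metric that defines "best"
    "save_frozen_weights": False,
    "tolerances": [5, 10],             # also keep best checkpoints within 5/10 epochs of stagnation
    "model_names": ["encoder"],        # only store the encoder submodel
}
```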

class noether.core.schemas.BestMetricCallbackConfig(/, **data)

Bases: CallBackBaseConfig

Parameters:

data (Any)

name: Literal['BestMetricCallback'] = None
source_metric_key: str = None

The metric used to determine whether the current model obtained a new best (e.g., loss/valid/total).

target_metric_keys: list[str] | None = None

The metrics to keep track of (e.g., loss/test/total).

optional_target_metric_keys: list[str] | None = None

The metrics to keep track of only if they are present (useful when different model configurations log different evaluation metrics, to avoid reconfiguring the callback).

class noether.core.schemas.CallBackBaseConfig(/, **data)

Bases: pydantic.BaseModel

Parameters:

data (Any)

name: str
kind: str | None = None
id: str | None = None

Optional unique identifier for this callback instance. Required when multiple stateful callbacks of the same type exist (e.g., two BestCheckpointCallbacks tracking different metrics). Used as the key when saving/loading callback state dicts to ensure correct matching on resume.

every_n_epochs: int | None = None

Epoch-based interval. Invokes the callback after every n epochs. Mutually exclusive with other intervals.

every_n_updates: int | None = None

Update-based interval. Invokes the callback after every n updates. Mutually exclusive with other intervals.

every_n_samples: int | None = None

Sample-based interval. Invokes the callback after every n samples. Mutually exclusive with other intervals.

batch_size: int | None = None

Batch size to use for this callback. Default: None (use the same batch_size as for training).

model_config

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

validate_callback_frequency()

Ensures that exactly one frequency (‘every_n_*’) is specified and that ‘batch_size’ is present if ‘every_n_samples’ is used.

Return type:

CallBackBaseConfig

classmethod check_positive_values(v)

Ensures that all integer-based frequency and batch size fields are positive.

Parameters:

v (int | None)

Return type:

int | None

classmethod check_kind_is_not_empty(v)

Ensures the ‘kind’ field is a non-empty string.

Parameters:

v (str)

Return type:

str
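The interval rules above can be sketched as standalone code. This is an illustrative re-implementation of the checks described for validate_callback_frequency, not noether’s actual validator: exactly one ‘every_n_*’ field must be set, and ‘every_n_samples’ additionally requires ‘batch_size’.

```python
# Standalone sketch of CallBackBaseConfig's frequency validation (assumed logic).
def validate_callback_frequency(cfg: dict) -> dict:
    intervals = ("every_n_epochs", "every_n_updates", "every_n_samples")
    given = [k for k in intervals if cfg.get(k) is not None]
    if len(given) != 1:
        raise ValueError(f"exactly one interval must be set, got {given}")
    if cfg.get("every_n_samples") is not None and cfg.get("batch_size") is None:
        raise ValueError("'batch_size' is required when using 'every_n_samples'")
    return cfg
```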

noether.core.schemas.CallbacksConfig
class noether.core.schemas.CheckpointCallbackConfig(/, **data)

Bases: CallBackBaseConfig

Parameters:

data (Any)

name: Literal['CheckpointCallback'] = None
save_weights: bool = None

Whether to save the weights of the model every time this callback is invoked. The checkpoint name will contain the training iteration (e.g., epoch/update/sample) at which the checkpoint was saved.

save_optim: bool = None

Whether to save the optimizer state every time this callback is invoked. The checkpoint name will contain the training iteration (e.g., epoch/update/sample) at which the checkpoint was saved.

save_latest_weights: bool = None

Whether to save the latest weights of the model every time this callback is invoked. Note that the latest weights are always overwritten on the next invocation of this callback.

save_latest_optim: bool = None

Whether to save the latest optimizer state every time this callback is invoked. Note that the latest optimizer state is always overwritten on the next invocation of this callback.

model_names: list[str] | None = None

The name of the model to save. If None, all models are saved.

class noether.core.schemas.EmaCallbackConfig(/, **data)

Bases: CallBackBaseConfig

Parameters:

data (Any)

name: Literal['EmaCallback'] = None
target_factors: list[float] = None

The factors for the EMA.

model_paths: list[str | None] | None = None

The paths of the models to apply the EMA to (e.g., composite_model.encoder / composite_model.decoder, i.e., the paths of the PyTorch nn.Modules in the checkpoint). If None, the EMA is applied to the whole model. When training with a CompositeModel, the paths of the submodules (i.e., ‘encoder’, ‘decoder’, etc.) should be provided via this field; otherwise the EMA is applied to the CompositeModel as a whole, which cannot be restored later on.

save_weights: bool = None

Whether to save the EMA weights.

save_last_weights: bool = None

Save the weights of the model when training is over (i.e., at the end of training, save the EMA weights).

save_latest_weights: bool = None

Save the latest EMA weights. Note that the latest weights are always overwritten on the next invocation of this callback.
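For intuition, the standard exponential-moving-average update that each entry of target_factors would drive looks like the following. The exact update rule used by noether’s EmaCallback is an assumption here.

```python
# Minimal EMA update sketch: each target factor maintains one averaged
# copy of the weights, blending old EMA weights with current model weights.
def ema_update(ema_weights, model_weights, factor):
    return [factor * e + (1.0 - factor) * m
            for e, m in zip(ema_weights, model_weights)]
```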

class noether.core.schemas.FixedEarlyStopperConfig(/, **data)

Bases: pydantic.BaseModel

Parameters:

data (Any)

kind: str | None = None
name: Literal['FixedEarlyStopper'] = None
stop_at_sample: int | None = None
stop_at_update: int | None = None
stop_at_epoch: int | None = None
validate_callback_frequency()

Ensures that exactly one stopping point (‘stop_at_*’) is specified.

Return type:

FixedEarlyStopperConfig

class noether.core.schemas.MetricEarlyStopperConfig(/, **data)

Bases: CallBackBaseConfig

Parameters:

data (Any)

name: Literal['MetricEarlyStopper'] = None
metric_key: str

The key of the metric to monitor

tolerance: int

The number of times the metric can stagnate before stopping training

classmethod check_tolerance_positive(v)

Ensures that tolerance is at least 1.

Parameters:

v (int)

Return type:

int
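The stopping rule described by metric_key and tolerance can be sketched as follows. The semantics (lower metric values are better, stop after tolerance consecutive stagnations) are an assumed interpretation, not noether’s actual implementation.

```python
# Sketch of a metric-based early stopper: stop once the monitored metric
# has failed to improve 'tolerance' times in a row.
class EarlyStopper:
    def __init__(self, tolerance: int):
        if tolerance < 1:
            raise ValueError("tolerance must be at least 1")
        self.tolerance = tolerance
        self.best = float("inf")
        self.stagnant = 0

    def should_stop(self, metric: float) -> bool:
        if metric < self.best:
            self.best = metric      # new best: reset the stagnation counter
            self.stagnant = 0
        else:
            self.stagnant += 1      # no improvement this interval
        return self.stagnant >= self.tolerance
```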

class noether.core.schemas.OfflineLossCallbackConfig(/, **data)

Bases: PeriodicDataIteratorCallbackConfig

Parameters:

data (Any)

name: Literal['OfflineLossCallback'] = None
output_patterns_to_log: list[str] | None = None

Patterns matched against output keys to decide what to log. For instance, the output key ‘some_loss’ matches the pattern list [‘loss’]. **kwargs: additional arguments passed to the parent class.

class noether.core.schemas.OnlineLossCallbackConfig(/, **data)

Bases: CallBackBaseConfig

Parameters:

data (Any)

name: Literal['OnlineLossCallback'] = None
verbose: bool = None

Whether to also log to the (console) logger. If False, the loss will only be logged to the experiment tracker.

class noether.core.schemas.TrackAdditionalOutputsCallbackConfig(/, **data)

Bases: CallBackBaseConfig

Parameters:

data (Any)

name: Literal['TrackAdditionalOutputsCallback'] = None
keys: list[str] | None = None

List of keys to track in the additional_outputs of the TrainerResult returned by the trainer’s update step.

patterns: list[str] | None = None

List of patterns to track in the additional_outputs of the TrainerResult returned by the trainer’s update step. Matched if it is contained in one of the update_outputs keys.

verbose: bool = None

If True, uses the logger to print the tracked values; otherwise no logger is used.

reduce: Literal['mean', 'last'] = None

The reduction method to be applied to the tracked values to reduce to scalar. Currently supports ‘mean’ and ‘last’.

log_output: bool = None

Whether to log the tracked scalar values.

save_output: bool = None

Whether to save the tracked scalar values to disk.

class noether.core.schemas.DatasetBaseConfig(/, **data)

Bases: pydantic.BaseModel

Parameters:

data (Any)

kind: str

Kind of dataset to use.

pipeline: Any | None = None

Config of the pipeline to use for the dataset.

dataset_normalizers: dict[str, list[noether.core.schemas.normalizers.AnyNormalizer] | noether.core.schemas.normalizers.AnyNormalizer] | None = None

List of normalizers to apply to the dataset. The key is the data source name.

dataset_wrappers: list[DatasetWrappers] | None = None
included_properties: set[str] | None = None

Set of properties (i.e., getitem_* methods that are called) of this dataset that will be loaded. If not set, all properties are loaded.

excluded_properties: set[str] | None = None

Set of properties of this dataset that will NOT be loaded, even if they are present in the included list

model_config

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class noether.core.schemas.StandardDatasetConfig(/, **data)

Bases: DatasetBaseConfig

Base config for datasets with fixed splits.

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

root: str

Root directory of the dataset.

split: Literal['train', 'val', 'test']

Which split of the dataset to use. Must be one of “train”, “val”, or “test”.

noether.core.schemas.AnyInitializer
class noether.core.schemas.CheckpointInitializerConfig(/, **data)

Bases: InitializerConfig

Parameters:

data (Any)

kind: Literal['noether.core.initializers.CheckpointInitializer'] = None
load_optim: bool = None

Whether or not to load the optimizer state from the checkpoint. Default is True, as this is usually used to resume a training run.

pop_ckpt_kwargs_keys: list[str] | None = None

List of keys to pop (remove) from the checkpoint kwargs before loading.

class noether.core.schemas.InitializerConfig(/, **data)

Bases: pydantic.BaseModel

Parameters:

data (Any)

kind: str = None
kwargs: dict[str, Any] | None = None

Additional keyword arguments to pass to the initializer.

run_id: str

A unique identifier for the training stage. This is used to find the correct checkpoint.

stage_name: str | None = None

The name of the training stage, if defined. When training, the stage name is usually “train”.

model_name: str | None = None

The name of the model to load. This is the model_name used in CheckpointCallback.

checkpoint_tag: str | None | dict = None

Which checkpoint to load; usually “latest”, “best_loss”, or “E*_U*_S*”, depending on which checkpoint you want to load.

model_info: str | None = None

Optional string to provide additional info about the model weights in the checkpoint filename. E.g., the stored weights are the EMA, or in a different precision.

model_config

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class noether.core.schemas.PreviousRunInitializerConfig(/, **data)

Bases: CheckpointInitializerConfig

Parameters:

data (Any)

kind: Literal['noether.core.initializers.PreviousRunInitializer'] = None
load_optim: bool = None

Whether or not to load the optimizer state from the checkpoint. Default is True, as this is usually used to resume a training run.

keys_to_remove: list[str] | None = None

List of keys to remove from the checkpoint.

patterns_to_remove: list[str] | None = None

List of patterns to remove from the checkpoint.

patterns_to_rename: list[dict] | None = None

List of patterns to rename in the checkpoint.

patterns_to_instantiate: list[str] | None = None

List of patterns to instantiate in the checkpoint.

class noether.core.schemas.ResumeInitializerConfig(/, **data)

Bases: CheckpointInitializerConfig

Parameters:

data (Any)

kind: Literal['noether.core.initializers.ResumeInitializer'] = None
load_optim: bool = None

Whether or not to load the optimizer state from the checkpoint. Default is True, as this is usually used to resume a training run.

model_name: str = None

The name of the model to load. This is the model_name used in CheckpointCallback.

class noether.core.schemas.ModelBaseConfig(/, **data)

Bases: pydantic.BaseModel

Parameters:

data (Any)

kind: str

Kind of model to use, i.e. class path

name: str

Name of the model. Needs to be unique

optimizer_config: noether.core.schemas.optimizers.OptimizerConfig | None = None

The optimizer configuration to use for training the model. When a model is used for inference only, this can be left as None.

initializers: list[Annotated[noether.core.schemas.initializers.AnyInitializer, Field(discriminator='kind')]] | None = None

List of initializers configs to use for the model.

is_frozen: bool | None = False

Whether to freeze the model parameters (i.e., not trainable).

forward_properties: list[str] | None = []

List of properties to be used as inputs for the forward pass of the model. Only relevant when the train_step of the BaseTrainer is used. When overridden in a class method, this property is ignored.

model_config

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

property config_kind: str

The fully qualified import path for the configuration class.

Return type:

str

noether.core.schemas.AnyNormalizer
class noether.core.schemas.OptimizerConfig(/, **data)

Bases: pydantic.BaseModel

Parameters:

data (Any)

model_config

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

kind: str | None = None

The class path of the torch optimizer to use. E.g., ‘torch.optim.AdamW’.

lr: float | None = None

The learning rate for the optimizer.

weight_decay: float | None = None

The weight decay (L2 penalty) for the optimizer.

clip_grad_value: float | None = None

The maximum value for gradient clipping.

clip_grad_norm: float | None = None

The maximum norm for gradient clipping.

param_group_modifiers_config: list[ParamGroupModifierConfig] | None = None

List of parameter group modifiers to apply. These can modify the learning rate or weight decay for specific parameters.

exclude_bias_from_weight_decay: bool = True

If true, excludes the bias parameters (i.e., parameters that end with ‘.bias’) from the weight decay. Default true.

exclude_normalization_params_from_weight_decay: bool = True

If true, excludes the weights of normalization layers from the weight decay. This is implemented by excluding all 1D tensors from the weight decay. Default true.

weight_decay_schedule: noether.core.schemas.schedules.AnyScheduleConfig | None = None
schedule_config: noether.core.schemas.schedules.AnyScheduleConfig | None = None
return_optim_wrapper_args()
Return type:

dict
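The two weight-decay exclusion rules stated above (parameters ending in ‘.bias’, and 1D tensors such as normalization weights) can be sketched as a standalone predicate. The helper name is illustrative, not part of noether’s API.

```python
# Sketch of OptimizerConfig's weight-decay exclusion rules: biases and
# 1D tensors (normalization weights) are kept out of the decay group.
def uses_weight_decay(param_name: str, ndim: int,
                      exclude_bias: bool = True,
                      exclude_norm: bool = True) -> bool:
    if exclude_bias and param_name.endswith(".bias"):
        return False
    if exclude_norm and ndim <= 1:
        return False
    return True
```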

class noether.core.schemas.ParamGroupModifierConfig(/, **data)

Bases: pydantic.BaseModel

Configuration for a parameter group modifier, used for both the LrScaleByNameModifier and the WeightDecayByNameModifier.

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

kind: str | None = None

The class path of the parameter group modifier. Either noether.core.optimizer.param_group_modifiers.LrScaleByNameModifier or noether.core.optimizer.param_group_modifiers.WeightDecayByNameModifier.

scale: float | None = None

The scaling factor for the learning rate. Must be greater than 0.0. Only for the LrScaleByNameModifier.

value: float | None = None

The weight decay value. With 0.0 the parameter is excluded from the weight decay. Only for the WeightDecayByNameModifier.

name: str

The name of the parameter within the model. E.g., ‘backbone.cls_token’.

check_scale_or_value_exclusive()

Validates that either ‘scale’ or ‘value’ is provided, but not both. This is a model-level validator that runs after individual field validation.

Return type:

Self
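The model-level check described above can be sketched in isolation: a modifier config must set exactly one of ‘scale’ (LrScaleByNameModifier) or ‘value’ (WeightDecayByNameModifier), never both or neither. This is an illustrative re-implementation, not noether’s validator.

```python
# Standalone sketch of ParamGroupModifierConfig's exclusivity check.
def check_scale_or_value_exclusive(cfg: dict) -> dict:
    has_scale = cfg.get("scale") is not None
    has_value = cfg.get("value") is not None
    if has_scale == has_value:  # both set, or both missing
        raise ValueError("provide exactly one of 'scale' or 'value'")
    return cfg
```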

noether.core.schemas.AnyScheduleConfig
class noether.core.schemas.ConstantScheduleConfig(/, **data)

Bases: ScheduleBaseConfig

Parameters:

data (Any)

kind: Literal['noether.core.schedules.ConstantSchedule'] = 'noether.core.schedules.ConstantSchedule'

The fully qualified class name of the scheduler.

value: float

The constant value that will be returned for all steps. Value should be equal to the learning rate defined in the optimizer.

class noether.core.schemas.CustomScheduleConfig(/, **data)

Bases: ScheduleBaseConfig

Parameters:

data (Any)

kind: Literal['noether.core.schedules.CustomSchedule'] = 'noether.core.schedules.CustomSchedule'

The fully qualified class name of the scheduler.

values: list[float]

The list of values that will be returned for each step. The list should be as long as the number of steps.

class noether.core.schemas.DecreasingProgressScheduleConfig(/, **data)

Bases: ProgressScheduleConfig

Parameters:

data (Any)

kind: Literal['noether.core.schedules.DecreasingProgressSchedule'] = 'noether.core.schedules.DecreasingProgressSchedule'

The fully qualified class name of the scheduler.

max_value: float = None

Maximum (starting) value of the schedule.

end_value: float = None

Minimum (ending) value of the schedule.

class noether.core.schemas.IncreasingProgressScheduleConfig(/, **data)

Bases: ProgressScheduleConfig

Parameters:

data (Any)

kind: Literal['noether.core.schedules.IncreasingProgressSchedule'] = 'noether.core.schedules.IncreasingProgressSchedule'

The fully qualified class name of the scheduler.

start_value: float = None

Minimum (starting) value of the schedule.

max_value: float | None = None

Maximum (ending) value of the schedule.

class noether.core.schemas.LinearWarmupCosineDecayScheduleConfig(/, **data)

Bases: ScheduleBaseConfig

Parameters:

data (Any)

kind: Literal['noether.core.schedules.LinearWarmupCosineDecaySchedule'] = 'noether.core.schedules.LinearWarmupCosineDecaySchedule'

The fully qualified class name of the scheduler.

warmup_steps: int | None = None

The number of steps to linearly increase the value from start to max.

warmup_percent: float | None = None

The percentage of steps to linearly increase the value from start to max.

max_value: float = None

The maximum value of the scheduler, from which the cosine decay phase starts. This should be equal to the learning rate defined in the optimizer.

validate_warmup()

Ensures that exactly one of ‘warmup_steps’ or ‘warmup_percent’ is specified.

Return type:

LinearWarmupCosineDecayScheduleConfig
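The value curve implied by these fields can be sketched as follows. Warming up from 0 and decaying to 0 are simplifying assumptions for illustration, not necessarily noether’s exact behavior.

```python
import math

# Illustrative linear-warmup + cosine-decay curve: ramp linearly to
# max_value over warmup_steps, then decay along a half cosine.
def warmup_cosine(step: int, total_steps: int,
                  warmup_steps: int, max_value: float) -> float:
    if step < warmup_steps:
        return max_value * step / warmup_steps  # linear warmup phase
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return max_value * 0.5 * (1.0 + math.cos(math.pi * progress))
```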

class noether.core.schemas.PeriodicBoolScheduleConfig(/, **data)

Bases: ScheduleBaseConfig

Parameters:

data (Any)

kind: Literal['noether.core.schedules.PeriodicBoolSchedule'] = 'noether.core.schedules.PeriodicBoolSchedule'

The fully qualified class name of the scheduler.

initial_state: bool

The initial (boolean) state of the scheduler (on or off).

off_value: float = None

The value to return when the scheduler is in the off state.

on_value: float = None

The value to return when the scheduler is in the on state.

off_duration: int = None

The number of steps the scheduler is in the off state.

on_duration: int = None

The number of steps the scheduler is in the on state.

invert: bool = None

Whether to invert the scheduler, i.e. return off_value when on and vice versa.
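An alternating on/off schedule with these fields could look like the sketch below. The phase ordering (the initial state lasts for its full duration first) is an assumed interpretation.

```python
# Sketch of a periodic boolean schedule: cycle through on/off phases of
# the given durations, starting in 'initial_state'.
def periodic_bool_value(step, initial_state, on_value, off_value,
                        on_duration, off_duration, invert=False):
    period = on_duration + off_duration
    pos = step % period
    first_len = on_duration if initial_state else off_duration
    state = initial_state if pos < first_len else not initial_state
    if invert:
        state = not state
    return on_value if state else off_value
```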

class noether.core.schemas.PolynomialDecreasingScheduleConfig(/, **data)

Bases: DecreasingProgressScheduleConfig

Parameters:

data (Any)

kind: Literal['noether.core.schedules.PolynomialDecreasingSchedule'] = 'noether.core.schedules.PolynomialDecreasingSchedule'

The fully qualified class name of the scheduler.

power: float = None

The power of the polynomial function.

class noether.core.schemas.PolynomialIncreasingScheduleConfig(/, **data)

Bases: IncreasingProgressScheduleConfig

Parameters:

data (Any)

kind: Literal['noether.core.schedules.PolynomialIncreasingSchedule'] = 'noether.core.schedules.PolynomialIncreasingSchedule'

The fully qualified class name of the scheduler.

power: float = None

The power of the polynomial function.

class noether.core.schemas.ProgressScheduleConfig(/, **data)

Bases: ScheduleBaseConfig

Parameters:

data (Any)

kind: Literal['noether.core.schedules.ProgressSchedule'] = 'noether.core.schedules.ProgressSchedule'

The fully qualified class name of the scheduler.

exclude_first: bool = None

Whether to exclude the first value of the schedule.

exclude_last: bool = None

Whether to exclude the last value of the schedule.

class noether.core.schemas.ScheduleBaseConfig(/, **data)

Bases: pydantic.BaseModel

Parameters:

data (Any)

kind: str | None = None

The fully qualified class name of the scheduler.

overhang_percent: float | None = None

The percentage by which the schedule is artificially prolonged. Mutually exclusive with overhang_steps.

overhang_steps: int | None = None

The number of steps by which the schedule is artificially prolonged. Mutually exclusive with overhang_percent.

start_value: float = None
end_value: float = None
weight_decay: float | None = None
start_percent: float | None = None

The percentage of steps at which the schedule starts.

end_percent: float | None = None

The percentage of steps at which the schedule ends.

start_step: int | None = None

The step at which the schedule starts.

end_step: int | None = None

The step at which the schedule ends.

interval: Literal['update', 'epoch'] = None

Whether the schedule is based on updates or epochs; must be either “update” or “epoch”. Default is “update”. Under the hood, steps are always used; when “epoch” is selected, the step count is derived from epochs via the UpdateCounter.

check_mutual_exclusion()

Ensures that ‘overhang_percent’ and ‘overhang_steps’ are mutually exclusive.

Return type:

ScheduleBaseConfig

validate_start_end_steps()
Return type:

ScheduleBaseConfig

validate_start_end_percents()
Return type:

ScheduleBaseConfig

class noether.core.schemas.SchedulerConfig(/, **data)

Bases: ScheduleBaseConfig

Parameters:

data (Any)

kind: Literal['noether.core.schedules.scheduler.SchedulerConfig'] = 'noether.core.schedules.scheduler.SchedulerConfig'

The fully qualified class name of the scheduler.

warmup_percent: float = None
end_value: float = None
class noether.core.schemas.StepDecreasingScheduleConfig(/, **data)

Bases: DecreasingProgressScheduleConfig

Parameters:

data (Any)

kind: Literal['noether.core.schedules.StepDecreasingSchedule'] = 'noether.core.schedules.StepDecreasingSchedule'

The fully qualified class name of the scheduler.

factor: float = None

The factor by which the value decreases.

decreases_interval: float = None

The interval in the range (0, 1) at which the value decreases.

check_interval()

Ensures that ‘decreases_interval’ is a float in the range (0, 1).

Return type:

StepDecreasingScheduleConfig

class noether.core.schemas.StepFixedScheduleConfig(/, **data)

Bases: ScheduleBaseConfig

Parameters:

data (Any)

kind: Literal['noether.core.schedules.StepFixedSchedule'] = 'noether.core.schedules.StepFixedSchedule'

The fully qualified class name of the scheduler.

start_value: float = None

The initial value of the scheduler.

factor: float = None

The factor by which the value is multiplied after reaching the next step provided in steps.

steps: list[float] = None

The steps at which the value changes, must be a list of floats in the range (0, 1).

validate_steps()

Ensures that ‘steps’ is a non-empty list of floats in the range (0, 1).

Return type:

StepFixedScheduleConfig
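The semantics of start_value, factor, and steps can be sketched as follows; the interpretation (multiply by factor each time training progress passes one of the fractional steps) is assumed from the field descriptions.

```python
# Sketch of a step-fixed schedule: the value is multiplied by 'factor'
# once for every fractional milestone in 'steps' that progress has passed.
def step_fixed_value(progress: float, start_value: float,
                     factor: float, steps: list) -> float:
    n_passed = sum(1 for s in steps if progress >= s)
    return start_value * factor ** n_passed
```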

class noether.core.schemas.StepIntervalScheduleConfig(/, **data)

Bases: ScheduleBaseConfig

Parameters:

data (Any)

kind: Literal['noether.core.schedules.StepIntervalSchedule'] = 'noether.core.schedules.StepIntervalSchedule'

The fully qualified class name of the scheduler.

start_value: float = None

The initial value of the scheduler, i.e., the learning rate at step 0.

factor: float = None

The factor by which the value is multiplied after reaching the next interval.

update_interval: float = None

The interval in range (0, 1) at which the value changes.

check_update_interval(v)

Ensures that ‘update_interval’ is a float in the range (0, 1).

Parameters:

v (float)

Return type:

float

class noether.core.schemas.ConfigSchema(/, **data)

Bases: pydantic.BaseModel

Root configuration schema for all experiments in Noether.

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

name: str | None = None

Name of the experiment.

accelerator: ACCELERATOR_TYPES = None

Type of accelerator to use. By default, the system chooses the best available accelerator (GPU > MPS > CPU).

stage_name: str | None = None

Name of the current stage. I.e., train, finetune, test, etc. When None, the run_id directory is used as output directory. Otherwise, run_id/stage_name is used.

dataset_kind: str | None = None

Kind of dataset to use i.e., class path.

dataset_root: str | None = None

Root directory of the dataset.

resume_run_id: str | None = None

Run ID to resume from. If None, start a new run. This can be used to resume training from the last checkpoint of a previous run when training was interrupted/failed.

resume_stage_name: str | None = None

Stage name to resume from. If None, resume from the default stage.

resume_checkpoint: str | None = None

Path to checkpoint to resume from. If None, the ‘latest’ checkpoint will be used.

seed: int = None

Random seed for reproducibility.

dataset_statistics: dict[str, list[float | int]] | None = None

Pre-computed dataset statistics, e.g., mean and std for normalization. Since some tensors are multi-dimensional, the statistics are stored as lists.

dataset_normalizer: dict[str, list[noether.core.schemas.normalizers.AnyNormalizer]] | None = None

List of normalizers to apply to the dataset. The key is the data source name.

tracker: noether.core.schemas.trackers.AnyTracker | None = None

Configuration for experiment tracking. If None, no tracking is used. If “disabled”, tracking is explicitly disabled. WandB is currently the only supported tracker.

run_id: str | None = None

Unique identifier for the run. If None, a new ID will be generated.

devices: str | None = None

Comma-separated list of device IDs to use. If None, all available devices will be used.

num_workers: int | None = None

Number of worker threads for data loading. If None, will use (#CPUs / #GPUs - 1) workers.

cudnn_benchmark: bool = True

Whether to enable cudnn benchmark mode for this run.

cudnn_deterministic: bool = False

Whether to enable cudnn deterministic mode for this run.

datasets: dict[str, noether.core.schemas.dataset.DatasetBaseConfig] = None

Configuration for datasets. The key identifies the dataset and the value is its configuration (see DatasetBaseConfig for available options). The key “train” is reserved for the training dataset; if it is not provided, the first dataset is used as the training dataset by default. Other keys are arbitrary and can identify datasets for different stages (e.g., “train”, “val”, “test”) or different datasets for the same stage (e.g., “train_cfd”, “train_wind_turbine”).

model: noether.core.schemas.models.ModelBaseConfig = None

Configuration for the model. See ModelBaseConfig for available options.

trainer: noether.core.schemas.trainers.BaseTrainerConfig = None

Configuration for the trainer. See BaseTrainerConfig for available options.

debug: bool = False

If True, enables debug mode with more verbose logging, no WandB logging and output written to debug directory.

store_code_in_output: bool = True

If True, store a copy of the current code in the output directory for reproducibility.

output_path: pathlib.Path

Path to output directory.

master_port: int = None

Port for distributed master node. If None, will be set from environment variable MASTER_PORT if available.

slurm: noether.core.schemas.slurm.SlurmConfig | None = None

Configuration for SLURM job submission.

classmethod empty_dict_is_none(v)

Pre-processes tracker input before validation.

Parameters:

v (Any)

Return type:

Any

classmethod validate_output_path(value)

Validates that the output path is valid.

Parameters:

value (pathlib.Path)

Return type:

pathlib.Path

serialize_output_path(value)
Parameters:

value (Any)

Return type:

Any

classmethod get_env_master_port(value)

Sets master_port from environment variable if available.

Parameters:

value (Any)

Return type:

Any

property config_schema_kind: str

The fully qualified import path for the configuration class.

Return type:

str
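A minimal top-level experiment config using a subset of the fields documented above might look like the following dict. All values are made up for illustration; in a real config, fields such as datasets, model, and trainer must reference actual noether classes via their ‘kind’ paths.

```python
# Illustrative ConfigSchema-shaped dict (example values, not defaults).
experiment_cfg = {
    "name": "demo-run",
    "stage_name": "train",
    "seed": 42,
    "debug": False,
    "output_path": "outputs/demo-run",
    "num_workers": 4,
    "cudnn_benchmark": True,
}
```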

class noether.core.schemas.SlurmConfig(/, **data)

Bases: pydantic.BaseModel

Configuration for SLURM job submission via sbatch.

All fields are optional and default to None, meaning the cluster default will be used. This schema covers all sbatch directives available in SLURM.

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

model_config

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

job_name: str | None = None

Name of the job (--job-name).

partition: str | None = None

Partition to submit the job to (--partition). Multiple partitions can be comma-separated.

reservation: str | None = None

Reserve resources from a named reservation (--reservation).

nodes: int | str | None = None

Number of nodes to allocate (--nodes). Can be an integer or a range like ‘2-4’.

ntasks: int | None = None

Total number of tasks to launch (--ntasks).

ntasks_per_node: int | None = None

Number of tasks per allocated node (--ntasks-per-node).

cpus_per_task: int | None = None

Number of CPUs per task (--cpus-per-task).

mem: str | None = None

Memory per node (--mem), e.g. ‘4G’, ‘512M’, ‘0’ for all available memory.

gpus: str | int | None = None

Total GPUs for the job (--gpus). Can be a count or ‘type:count’, e.g. ‘v100:2’.

gpus_per_node: str | int | None = None

GPUs per node (--gpus-per-node). Can be a count or ‘type:count’, e.g. ‘a100:4’.

gres: str | None = None

Generic consumable resources (--gres), e.g. ‘gpu:2,shard:1’.

time: str | None = None

Wall clock time limit (--time). Formats: ‘minutes’, ‘MM:SS’, ‘HH:MM:SS’, ‘D-HH’, ‘D-HH:MM’, ‘D-HH:MM:SS’.

begin: str | None = None

Defer job start until the specified time (--begin), e.g. ‘2024-01-15T10:00:00’, ‘now+1hour’.

output: str | None = None

File path for stdout (--output). Supports replacement symbols: %j (job ID), %x (job name), %A (array master ID), %a (array task ID), %N (node name), %u (user name).

error: str | None = None

File path for stderr (--error). Same replacement symbols as output.

array: str | None = None

Job array specification (--array), e.g. ‘0-15’, ‘1,3,5,7’, ‘1-7%2’ (max 2 concurrent).

kill_on_invalid_dep: bool | None = None

Kill the job if any dependency is invalid (--kill-on-invalid-dep).

nice: int | None = None

Scheduling priority adjustment (--nice). Positive values lower the priority.

chdir: str | None = None

Working directory for the job (--chdir).

env_path: str | None = None

Shell command to source before running the job (e.g., for activating a virtual environment), used as ‘source env_path’.

to_srun_args()

Return a string of srun arguments for all non-None SLURM fields.

Fields that are not actual srun directives (experiment_file, source) are excluded. Boolean fields are rendered as bare flags when True and omitted when False.

Return type:

str

classmethod validate_time_format(value)

Validate SLURM time format.

Parameters:

value (str | None)

Return type:

str | None

classmethod validate_memory_format(value)

Validate SLURM memory format (number with optional K/M/G/T suffix).

Parameters:

value (str | None)

Return type:

str | None

classmethod validate_array_format(value)

Validate SLURM array specification.

Parameters:

value (str | None)

Return type:

str | None

classmethod validate_gpu_spec(value)

Validate GPU specification (count or type:count).

Parameters:

value (str | int | None)

Return type:

str | int | None

class noether.core.schemas.WandBTrackerSchema(/, **data)

Bases: pydantic.BaseModel

Parameters:

data (Any)

kind: Literal['noether.core.trackers.WandBTracker'] = None
entity: str | None = None

The entity name for the W&B project.

project: str | None = None

The project name for the W&B project.

mode: Literal['disabled', 'online', 'offline'] | None = None

The mode of W&B. Can be 'disabled', 'online', or 'offline'.

class noether.core.schemas.BaseTrainerConfig(/, **data)

Bases: pydantic.BaseModel

Parameters:

data (Any)

kind: str
max_epochs: int | None = None

The maximum number of epochs to train for. Mutually exclusive with max_updates and max_samples. If set to 0, training will be skipped and all callbacks will be invoked once (useful for evaluation-only runs).

max_updates: int | None = None

The maximum number of updates to train for. Mutually exclusive with max_epochs and max_samples. If set to 0, training will be skipped and all callbacks will be invoked once (useful for evaluation-only runs).

max_samples: int | None = None

The maximum number of samples to train for. Mutually exclusive with max_epochs and max_updates. If set to 0, training will be skipped and all callbacks will be invoked once (useful for evaluation-only runs).

start_at_epoch: int | None = None

The epoch to start training at. This means that the trainer will skip all epochs before this epoch. Learning rate and other schedulers will be stepped accordingly. Useful for resuming training from a specific epoch.

add_default_callbacks: bool | None = None

Whether to add default callbacks. Default callbacks log things like simple dataset statistics or the current value of the learning rate if it is scheduled.

add_trainer_callbacks: bool | None = None

Whether to add trainer specific callbacks (e.g., a callback to log the training accuracy for a classification task).

effective_batch_size: int = None

The effective batch size used for optimization. This is the number of samples that are processed before an update step is taken: the "global batch size". In multi-GPU setups, the batch size per device (the "local batch size") is effective_batch_size / number of devices. If gradient accumulation is used, the forward-pass batch size is derived by dividing by the number of gradient accumulation steps.

precision: Literal['float32', 'fp32', 'float16', 'fp16', 'bfloat16', 'bf16'] = None

The precision to use for training (e.g., “float32”). Mixed precision training (e.g., “float16” or “bfloat16”) can be used to speed up training and reduce memory usage on supported hardware (e.g., NVIDIA GPUs).

callbacks: list[noether.core.schemas.callbacks.CallbacksConfig] | None = None

The callbacks to use for training.

initializer: noether.core.schemas.initializers.InitializerConfig | None = None

The initializer to use for training. Mainly used for resuming training via ResumeInitializer.

log_every_n_epochs: int | None = None

Log every n epochs.

log_every_n_updates: int | None = None

Log every n updates.

log_every_n_samples: int | None = None

Log every n samples.

track_every_n_epochs: int | None = None

Track metrics every n epochs.

track_every_n_updates: int | None = None

Track metrics every n updates.

track_every_n_samples: int | None = None

Track metrics every n samples.

max_batch_size: int | None = None

The maximum batch size to use for model forward pass in training. If the effective_batch_size is larger than max_batch_size, gradient accumulation will be used to simulate the larger batch size. For example, if effective_batch_size=8 and max_batch_size=2, 4 gradient accumulation steps will be taken before each optimizer step.
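The batch-size arithmetic described above can be made concrete with a short sketch. This is not the trainer's code, only the relationship the documentation states: the effective batch size is split across devices, and gradient accumulation bridges the remaining gap to max_batch_size.

```python
def accumulation_plan(effective_batch_size: int, num_devices: int,
                      max_batch_size: int) -> tuple[int, int]:
    """Return (forward_batch_size_per_device, accumulation_steps)."""
    # local ("per-device") batch size
    local = effective_batch_size // num_devices
    # enough accumulation steps so each forward pass fits max_batch_size
    steps = -(-local // max_batch_size)  # ceiling division
    return local // steps, steps


# The example from the documentation: effective_batch_size=8, max_batch_size=2
# on a single device -> forward passes of 2, with 4 accumulation steps.
print(accumulation_plan(8, 1, 2))  # (2, 4)
```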

skip_nan_loss: bool = None

Whether to skip NaN losses. These can sometimes occur due to unlucky numerical coincidences. If true, NaN losses are skipped without terminating training, until skip_nan_loss_max_count (100 by default) NaN losses have occurred in a row.

skip_nan_loss_max_count: int = None

The number of consecutive NaN losses after which training is terminated even when skip_nan_loss is true.
disable_gradient_accumulation: bool = None

Whether to disable gradient accumulation. Gradient accumulation is sometimes used to simulate larger batch sizes, but can lead to worse generalization.

save_on_sigint: bool = None

Whether to save a checkpoint on SIGINT (Ctrl+C). SIGTERM always triggers a checkpoint save. When False (default), Ctrl+C will stop training immediately without saving.
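The SIGINT behavior can be sketched with a plain signal handler. This is an assumed illustration, not the trainer's implementation: a handler that first triggers a checkpoint save and then stops training by raising KeyboardInterrupt, as Ctrl+C normally would.

```python
import signal


def install_sigint_checkpoint(save_fn):
    """Install a SIGINT handler that saves a checkpoint before stopping."""
    def handler(signum, frame):
        save_fn()                 # save a checkpoint first
        raise KeyboardInterrupt   # then stop training as Ctrl+C normally would
    signal.signal(signal.SIGINT, handler)
    return handler
```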

use_torch_compile: bool = None

Whether to use torch.compile to compile the model for faster training.

find_unused_params: bool = None

Sets the find_unused_parameters flag of DistributedDataParallel.

static_graph: bool = None

Sets the static_graph flag of DistributedDataParallel.

forward_properties: list[str] | None = []

Properties (i.e., keys from the batch dict) from the input batch that are used as inputs to the model during the forward pass.

target_properties: list[str] | None = []

Properties (i.e., keys from the batch dict) from the input batch that are used as targets for the model during the forward pass.

model_config

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

validate_callback_frequency()

Ensures that exactly one frequency ('every_n_*') is specified and that 'batch_size' is present if 'every_n_samples' is used.

Return type:

BaseTrainerConfig
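The constraint validate_callback_frequency enforces can be sketched as follows. This is assumed logic based only on the description above, not the library's code; the parameter names mirror the documented fields.

```python
def check_frequency(every_n_epochs=None, every_n_updates=None,
                    every_n_samples=None, batch_size=None):
    """Raise ValueError unless exactly one 'every_n_*' frequency is set."""
    set_fields = [v for v in (every_n_epochs, every_n_updates, every_n_samples)
                  if v is not None]
    if len(set_fields) != 1:
        raise ValueError("exactly one 'every_n_*' frequency must be specified")
    # a sample-based frequency is only meaningful relative to a batch size
    if every_n_samples is not None and batch_size is None:
        raise ValueError("'every_n_samples' requires 'batch_size'")


check_frequency(every_n_epochs=1)                     # ok
check_frequency(every_n_samples=1024, batch_size=32)  # ok
```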