noether.core.schemas¶
Submodules¶
- noether.core.schemas.callbacks
- noether.core.schemas.dataset
- noether.core.schemas.filemap
- noether.core.schemas.initializers
- noether.core.schemas.mixins
- noether.core.schemas.models
- noether.core.schemas.modules
- noether.core.schemas.normalizers
- noether.core.schemas.optimizers
- noether.core.schemas.schedules
- noether.core.schemas.schema
- noether.core.schemas.slurm
- noether.core.schemas.statistics
- noether.core.schemas.trackers
- noether.core.schemas.trainers
Attributes¶
Classes¶
- StandardDatasetConfig: Base config for datasets with fixed splits.
- ParamGroupModifierConfig: Configuration for a parameter group modifier (used by both the LrScaleByNameModifier and the WeightDecayByNameModifier).
- ConfigSchema: Root configuration schema for all experiments in Noether.
- SlurmConfig: Configuration for SLURM job submission via sbatch.
Package Contents¶
- class noether.core.schemas.BestCheckpointCallbackConfig(/, **data)¶
Bases:
CallBackBaseConfig
- Parameters:
data (Any)
- name: Literal['BestCheckpointCallback'] = None¶
- tolerances: list[int] | None = None¶
If provided, this callback will produce multiple best models which differ in the number of intervals during which the metric is allowed not to improve. For example, tolerances=[5] with every_n_epochs=1 will store a checkpoint where at most 5 epochs passed without the metric improving. Additionally, the best checkpoint over the whole training is always stored (i.e., tolerance=infinite). By setting different tolerances, one can evaluate different early stopping configurations with a single training run.
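As an illustration, a configuration for this callback might combine an epoch interval with several tolerances (the values below are hypothetical):

```python
# Hypothetical values for a BestCheckpointCallbackConfig. With
# every_n_epochs=1, tolerance 5 keeps the best checkpoint reached with at
# most 5 non-improving epochs in a row; the overall best checkpoint
# (infinite tolerance) is stored in any case.
best_checkpoint_callback = {
    "name": "BestCheckpointCallback",
    "every_n_epochs": 1,
    "tolerances": [5, 10],  # evaluate two early-stopping policies in one run
}
```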
- class noether.core.schemas.BestMetricCallbackConfig(/, **data)¶
Bases:
CallBackBaseConfig
- Parameters:
data (Any)
- name: Literal['BestMetricCallback'] = None¶
The metric used to determine whether the current model obtained a new best (e.g., loss/valid/total)
- class noether.core.schemas.CallBackBaseConfig(/, **data)¶
Bases:
pydantic.BaseModel
- Parameters:
data (Any)
- id: str | None = None¶
Optional unique identifier for this callback instance. Required when multiple stateful callbacks of the same type exist (e.g., two BestCheckpointCallbacks tracking different metrics). Used as the key when saving/loading callback state dicts to ensure correct matching on resume.
- every_n_epochs: int | None = None¶
Epoch-based interval. Invokes the callback after every n epochs. Mutually exclusive with other intervals.
- every_n_updates: int | None = None¶
Update-based interval. Invokes the callback after every n updates. Mutually exclusive with other intervals.
- every_n_samples: int | None = None¶
Sample-based interval. Invokes the callback after every n samples. Mutually exclusive with other intervals.
- batch_size: int | None = None¶
Batch size to use for this callback. Default: None (use the same batch_size as for training).
- model_config¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- validate_callback_frequency()¶
Ensures that exactly one frequency (‘every_n_*’) is specified and that ‘batch_size’ is present if ‘every_n_samples’ is used.
- Return type:
- classmethod check_positive_values(v)¶
Ensures that all integer-based frequency and batch size fields are positive.
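The interval rule above can be sketched in plain Python (an illustrative re-statement of the validator's logic, not the library implementation; the function name is made up):

```python
# Sketch of the "exactly one 'every_n_*' interval" rule enforced by
# validate_callback_frequency, plus the batch_size requirement for
# sample-based intervals (illustrative only).
def check_frequency(every_n_epochs=None, every_n_updates=None,
                    every_n_samples=None, batch_size=None):
    intervals = [every_n_epochs, every_n_updates, every_n_samples]
    if sum(v is not None for v in intervals) != 1:
        raise ValueError("exactly one 'every_n_*' interval must be set")
    if every_n_samples is not None and batch_size is None:
        raise ValueError("'batch_size' is required with 'every_n_samples'")

check_frequency(every_n_updates=1000)  # valid: exactly one interval set
```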
- noether.core.schemas.CallbacksConfig¶
- class noether.core.schemas.CheckpointCallbackConfig(/, **data)¶
Bases:
CallBackBaseConfig
- Parameters:
data (Any)
- name: Literal['CheckpointCallback'] = None¶
- save_weights: bool = None¶
Whether to save the weights of the model every time this callback is invoked. The checkpoint name will contain the training iteration (e.g., epoch/update/sample) at which the checkpoint was saved.
- save_optim: bool = None¶
Whether to save the optimizer state every time this callback is invoked. The checkpoint name will contain the training iteration (e.g., epoch/update/sample) at which the checkpoint was saved.
- save_latest_weights: bool = None¶
Whether to save the latest weights of the model every time this callback is invoked. Note that the latest weights are always overwritten on the next invocation of this callback.
- class noether.core.schemas.EmaCallbackConfig(/, **data)¶
Bases:
CallBackBaseConfig
- Parameters:
data (Any)
- name: Literal['EmaCallback'] = None¶
- model_paths: list[str | None] | None = None¶
The paths to the models to apply the EMA to (e.g., composite_model.encoder, composite_model.decoder; the paths of the PyTorch nn.Modules in the checkpoint). If None, the EMA is applied to the whole model. When training with a CompositeModel, the paths of the submodules (i.e., ‘encoder’, ‘decoder’, etc.) should be provided via this field; otherwise the EMA is applied to the CompositeModel as a whole, which cannot be restored later on.
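For intuition, the exponential moving average itself follows the usual update rule; a minimal sketch over plain floats (the callback operates on nn.Module parameters, and the decay value here is made up):

```python
# Minimal EMA sketch: ema <- decay * ema + (1 - decay) * param.
def ema_update(ema_params, params, decay=0.9):
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, params)]

ema = [0.0]
for _ in range(3):
    ema = ema_update(ema, [1.0])  # ema drifts toward the current value 1.0
```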
- class noether.core.schemas.FixedEarlyStopperConfig(/, **data)¶
Bases:
pydantic.BaseModel
- Parameters:
data (Any)
- name: Literal['FixedEarlyStopper'] = None¶
- validate_callback_frequency()¶
Ensures that exactly one stop condition (‘stop_at_*’) is specified.
- Return type:
- class noether.core.schemas.MetricEarlyStopperConfig(/, **data)¶
Bases:
CallBackBaseConfig
- Parameters:
data (Any)
- name: Literal['MetricEarlyStopper'] = None¶
- class noether.core.schemas.OfflineLossCallbackConfig(/, **data)¶
Bases:
PeriodicDataIteratorCallbackConfig
- Parameters:
data (Any)
- name: Literal['OfflineLossCallback'] = None¶
- class noether.core.schemas.OnlineLossCallbackConfig(/, **data)¶
Bases:
CallBackBaseConfig
- Parameters:
data (Any)
- name: Literal['OnlineLossCallback'] = None¶
- class noether.core.schemas.TrackAdditionalOutputsCallbackConfig(/, **data)¶
Bases:
CallBackBaseConfig
- Parameters:
data (Any)
- name: Literal['TrackAdditionalOutputsCallback'] = None¶
- keys: list[str] | None = None¶
List of keys to track in the additional_outputs of the TrainerResult returned by the trainer’s update step.
- patterns: list[str] | None = None¶
List of patterns to track in the additional_outputs of the TrainerResult returned by the trainer’s update step. Matched if it is contained in one of the update_outputs keys.
- reduce: Literal['mean', 'last'] = None¶
The reduction method to be applied to the tracked values to reduce to scalar. Currently supports ‘mean’ and ‘last’.
- class noether.core.schemas.DatasetBaseConfig(/, **data)¶
Bases:
pydantic.BaseModel
- Parameters:
data (Any)
- dataset_normalizers: dict[str, list[noether.core.schemas.normalizers.AnyNormalizer] | noether.core.schemas.normalizers.AnyNormalizer] | None = None¶
List of normalizers to apply to the dataset. The key is the data source name.
- included_properties: set[str] | None = None¶
Set of properties (i.e., getitem_* methods that are called) of this dataset that will be loaded. If not set, all properties are loaded.
- excluded_properties: set[str] | None = None¶
Set of properties of this dataset that will NOT be loaded, even if they are present in the included list.
- model_config¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class noether.core.schemas.StandardDatasetConfig(/, **data)¶
Bases:
DatasetBaseConfig
Base config for datasets with fixed splits.
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
data (Any)
- split: Literal['train', 'val', 'test']¶
Which split of the dataset to use. Must be one of “train”, “val”, or “test”.
- noether.core.schemas.AnyInitializer¶
- class noether.core.schemas.CheckpointInitializerConfig(/, **data)¶
Bases:
InitializerConfig
- Parameters:
data (Any)
- kind: Literal['noether.core.initializers.CheckpointInitializer'] = None¶
- class noether.core.schemas.InitializerConfig(/, **data)¶
Bases:
pydantic.BaseModel
- Parameters:
data (Any)
- run_id: str¶
A unique identifier for the training stage. This is used to find the correct checkpoint.
- stage_name: str | None = None¶
The name of the training stage, if defined. When training, the stage name is usually “train”.
- model_name: str | None = None¶
The name of the model to load. This is the model_name used in CheckpointCallback.
- checkpoint_tag: str | None | dict = None¶
Which checkpoint to load. Usually “latest”, “best_loss”, or “E*_U*_S*”, depending on which checkpoint you want to load.
- model_info: str | None = None¶
Optional string to provide additional info about the model weights in the checkpoint filename. E.g., the stored weights are the EMA, or in a different precision.
- model_config¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class noether.core.schemas.PreviousRunInitializerConfig(/, **data)¶
Bases:
CheckpointInitializerConfig
- Parameters:
data (Any)
- kind: Literal['noether.core.initializers.PreviousRunInitializer'] = None¶
- class noether.core.schemas.ResumeInitializerConfig(/, **data)¶
Bases:
CheckpointInitializerConfig
- Parameters:
data (Any)
- kind: Literal['noether.core.initializers.ResumeInitializer'] = None¶
- class noether.core.schemas.ModelBaseConfig(/, **data)¶
Bases:
pydantic.BaseModel
- Parameters:
data (Any)
- optimizer_config: noether.core.schemas.optimizers.OptimizerConfig | None = None¶
The optimizer configuration to use for training the model. When a model is used for inference only, this can be left as None.
- initializers: list[Annotated[noether.core.schemas.initializers.AnyInitializer, Field(discriminator='kind')]] | None = None¶
List of initializer configs to use for the model.
- forward_properties: list[str] | None = []¶
List of properties to be used as inputs for the forward pass of the model. Only relevant when the train_step of the BaseTrainer is used. When overridden in a class method, this property is ignored.
- model_config¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- noether.core.schemas.AnyNormalizer¶
- class noether.core.schemas.OptimizerConfig(/, **data)¶
Bases:
pydantic.BaseModel
- Parameters:
data (Any)
- model_config¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- param_group_modifiers_config: list[ParamGroupModifierConfig] | None = None¶
List of parameter group modifiers to apply. These can modify the learning rate or weight decay for specific parameters.
- exclude_bias_from_weight_decay: bool = True¶
If true, excludes the bias parameters (i.e., parameters that end with ‘.bias’) from the weight decay. Default true.
- class noether.core.schemas.ParamGroupModifierConfig(/, **data)¶
Bases:
pydantic.BaseModel
Configuration for a parameter group modifier, used by both the LrScaleByNameModifier and the WeightDecayByNameModifier.
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
data (Any)
- kind: str | None = None¶
The class path of the parameter group modifier. Either noether.core.optimizer.param_group_modifiers.LrScaleByNameModifier or noether.core.optimizer.param_group_modifiers.WeightDecayByNameModifier.
- scale: float | None = None¶
The scaling factor for the learning rate. Must be greater than 0.0. Only for the LrScaleByNameModifier.
- value: float | None = None¶
The weight decay value. With 0.0 the parameter is excluded from the weight decay. Only for the WeightDecayByNameModifier.
- check_scale_or_value_exclusive()¶
Validates that either ‘scale’ or ‘value’ is provided, but not both. This is a model-level validator that runs after individual field validation.
- Return type:
Self
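The exclusivity rule can be sketched as follows (illustrative; the function name is made up, and this is not the library implementation):

```python
# Sketch of the 'scale'-xor-'value' check in ParamGroupModifierConfig:
# exactly one of the two fields must be provided, and 'scale' must be
# strictly positive.
def check_scale_or_value(scale=None, value=None):
    if (scale is None) == (value is None):
        raise ValueError("provide exactly one of 'scale' or 'value'")
    if scale is not None and scale <= 0.0:
        raise ValueError("'scale' must be greater than 0.0")

check_scale_or_value(scale=0.1)  # LrScaleByNameModifier-style
check_scale_or_value(value=0.0)  # WeightDecayByNameModifier-style
```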
- noether.core.schemas.AnyScheduleConfig¶
- class noether.core.schemas.ConstantScheduleConfig(/, **data)¶
Bases:
ScheduleBaseConfig
- Parameters:
data (Any)
- kind: Literal['noether.core.schedules.ConstantSchedule'] = 'noether.core.schedules.ConstantSchedule'¶
The fully qualified class name of the scheduler.
- class noether.core.schemas.CustomScheduleConfig(/, **data)¶
Bases:
ScheduleBaseConfig
- Parameters:
data (Any)
- kind: Literal['noether.core.schedules.CustomSchedule'] = 'noether.core.schedules.CustomSchedule'¶
The fully qualified class name of the scheduler.
- class noether.core.schemas.DecreasingProgressScheduleConfig(/, **data)¶
Bases:
ProgressScheduleConfig
- Parameters:
data (Any)
- kind: Literal['noether.core.schedules.DecreasingProgressSchedule'] = 'noether.core.schedules.DecreasingProgressSchedule'¶
The fully qualified class name of the scheduler.
- class noether.core.schemas.IncreasingProgressScheduleConfig(/, **data)¶
Bases:
ProgressScheduleConfig
- Parameters:
data (Any)
- kind: Literal['noether.core.schedules.IncreasingProgressSchedule'] = 'noether.core.schedules.IncreasingProgressSchedule'¶
The fully qualified class name of the scheduler.
- class noether.core.schemas.LinearWarmupCosineDecayScheduleConfig(/, **data)¶
Bases:
ScheduleBaseConfig
- Parameters:
data (Any)
- kind: Literal['noether.core.schedules.LinearWarmupCosineDecaySchedule'] = 'noether.core.schedules.LinearWarmupCosineDecaySchedule'¶
The fully qualified class name of the scheduler.
- warmup_steps: int | None = None¶
The number of steps to linearly increase the value from start to max.
- warmup_percent: float | None = None¶
The percentage of steps to linearly increase the value from start to max.
- max_value: float = None¶
The maximum value of the scheduler from which to start the cosine decay phase. This should equal the learning rate defined in the optimizer (i.e., max_value is the learning rate).
- validate_warmup()¶
Ensures that exactly one of ‘warmup_steps’ or ‘warmup_percent’ is specified.
- Return type:
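The shape of this schedule can be sketched with a common warmup-plus-cosine formulation (the library's exact parametrization, including start values and overhang, may differ):

```python
import math

# Linear warmup from 0 to max_value over warmup_steps, then cosine decay
# to 0 over the remaining steps (common formulation; illustrative only).
def warmup_cosine(step, total_steps, warmup_steps, max_value):
    if step < warmup_steps:
        return max_value * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return max_value * 0.5 * (1.0 + math.cos(math.pi * progress))
```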
- class noether.core.schemas.PeriodicBoolScheduleConfig(/, **data)¶
Bases:
ScheduleBaseConfig
- Parameters:
data (Any)
- kind: Literal['noether.core.schedules.PeriodicBoolSchedule'] = 'noether.core.schedules.PeriodicBoolSchedule'¶
The fully qualified class name of the scheduler.
- class noether.core.schemas.PolynomialDecreasingScheduleConfig(/, **data)¶
Bases:
DecreasingProgressScheduleConfig
- Parameters:
data (Any)
- kind: Literal['noether.core.schedules.PolynomialDecreasingSchedule'] = 'noether.core.schedules.PolynomialDecreasingSchedule'¶
The fully qualified class name of the scheduler.
- class noether.core.schemas.PolynomialIncreasingScheduleConfig(/, **data)¶
Bases:
IncreasingProgressScheduleConfig
- Parameters:
data (Any)
- kind: Literal['noether.core.schedules.PolynomialIncreasingSchedule'] = 'noether.core.schedules.PolynomialIncreasingSchedule'¶
The fully qualified class name of the scheduler.
- class noether.core.schemas.ProgressScheduleConfig(/, **data)¶
Bases:
ScheduleBaseConfig
- Parameters:
data (Any)
- kind: Literal['noether.core.schedules.ProgressSchedule'] = 'noether.core.schedules.ProgressSchedule'¶
The fully qualified class name of the scheduler.
- class noether.core.schemas.ScheduleBaseConfig(/, **data)¶
Bases:
pydantic.BaseModel
- Parameters:
data (Any)
- overhang_percent: float | None = None¶
The percentage by which the schedule is artificially prolonged. Mutually exclusive with overhang_steps.
- overhang_steps: int | None = None¶
The number of steps by which the schedule is artificially prolonged. Mutually exclusive with overhang_percent.
- interval: Literal['update', 'epoch'] = None¶
Whether the schedule is based on updates or epochs; must be either “update” or “epoch” (default “update”). Under the hood, steps are always used; when “epoch” is selected, the step count is derived from epochs via the UpdateCounter.
- check_mutual_exclusion()¶
Ensures that ‘overhang_percent’ and ‘overhang_steps’ are mutually exclusive.
- Return type:
- validate_start_end_steps()¶
- Return type:
- validate_start_end_percents()¶
- Return type:
- class noether.core.schemas.SchedulerConfig(/, **data)¶
Bases:
ScheduleBaseConfig
- Parameters:
data (Any)
- kind: Literal['noether.core.schedules.scheduler.SchedulerConfig'] = 'noether.core.schedules.scheduler.SchedulerConfig'¶
The fully qualified class name of the scheduler.
- class noether.core.schemas.StepDecreasingScheduleConfig(/, **data)¶
Bases:
DecreasingProgressScheduleConfig
- Parameters:
data (Any)
- kind: Literal['noether.core.schedules.StepDecreasingSchedule'] = 'noether.core.schedules.StepDecreasingSchedule'¶
The fully qualified class name of the scheduler.
- check_interval()¶
Ensures that ‘interval’ is a float in the range (0, 1).
- Return type:
- class noether.core.schemas.StepFixedScheduleConfig(/, **data)¶
Bases:
ScheduleBaseConfig
- Parameters:
data (Any)
- kind: Literal['noether.core.schedules.StepFixedSchedule'] = 'noether.core.schedules.StepFixedSchedule'¶
The fully qualified class name of the scheduler.
- factor: float = None¶
The factor by which the value is multiplied after reaching the next step provided in steps.
- steps: list[float] = None¶
The steps at which the value changes, must be a list of floats in the range (0, 1).
- validate_steps()¶
Ensures that ‘steps’ is a non-empty list of floats in the range (0, 1).
- Return type:
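The factor/steps semantics can be sketched as follows (illustrative; the actual schedule class may parametrize progress differently):

```python
# Sketch of a step schedule: the base value is multiplied by 'factor' once
# for every threshold in 'steps' that training progress has passed.
def step_fixed(progress, base_value, factor, steps):
    n_passed = sum(progress >= s for s in steps)
    return base_value * factor ** n_passed

step_fixed(0.6, 1e-3, 0.1, [0.5, 0.75])  # one threshold passed: 1e-3 * 0.1
```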
- class noether.core.schemas.StepIntervalScheduleConfig(/, **data)¶
Bases:
ScheduleBaseConfig
- Parameters:
data (Any)
- kind: Literal['noether.core.schedules.StepIntervalSchedule'] = 'noether.core.schedules.StepIntervalSchedule'¶
The fully qualified class name of the scheduler.
- class noether.core.schemas.ConfigSchema(/, **data)¶
Bases:
pydantic.BaseModel
Root configuration schema for all experiments in Noether.
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
data (Any)
- accelerator: ACCELERATOR_TYPES = None¶
Type of accelerator to use. By default, the system chooses the best available accelerator (GPU > MPS > CPU).
- stage_name: str | None = None¶
Name of the current stage. I.e., train, finetune, test, etc. When None, the run_id directory is used as output directory. Otherwise, run_id/stage_name is used.
- resume_run_id: str | None = None¶
Run ID to resume from. If None, start a new run. This can be used to resume training from the last checkpoint of a previous run when training was interrupted/failed.
- resume_stage_name: str | None = None¶
Stage name to resume from. If None, resume from the default stage.
- resume_checkpoint: str | None = None¶
Path to checkpoint to resume from. If None, the ‘latest’ checkpoint will be used.
- dataset_statistics: dict[str, list[float | int]] | None = None¶
Pre-computed dataset statistics, e.g., mean and std for normalization. Since some tensors are multi-dimensional, the statistics are stored as lists.
- dataset_normalizer: dict[str, list[noether.core.schemas.normalizers.AnyNormalizer]] | None = None¶
List of normalizers to apply to the dataset. The key is the data source name.
- tracker: noether.core.schemas.trackers.AnyTracker | None = None¶
Configuration for experiment tracking. If None, no tracking is used. If “disabled”, tracking is explicitly disabled. WandB is currently the only supported tracker.
- devices: str | None = None¶
Comma-separated list of device IDs to use. If None, all available devices will be used.
- num_workers: int | None = None¶
Number of worker threads for data loading. If None, (#CPUs / #GPUs - 1) workers will be used.
- datasets: dict[str, noether.core.schemas.dataset.DatasetBaseConfig] = None¶
Configuration for datasets. The key is the dataset name and the value is the configuration for that dataset. See DatasetBaseConfig for available options. The key “train” is reserved for the training dataset; if not provided, the first dataset is used as the training dataset by default. Other keys are arbitrary and can identify datasets for different stages (e.g., “train”, “val”, “test”) or different datasets for the same stage (e.g., “train_cfd”, “train_wind_turbine”).
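A hypothetical datasets mapping, written as a plain dict for illustration (the keys besides “train” and the split values are made up):

```python
# "train" is reserved for the training dataset; other keys are free-form
# identifiers for evaluation or additional datasets.
datasets = {
    "train": {"split": "train"},
    "val": {"split": "val"},
    "test_cfd": {"split": "test"},  # arbitrary key naming a second dataset
}
```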
- model: noether.core.schemas.models.ModelBaseConfig = None¶
Configuration for the model. See ModelBaseConfig for available options.
- trainer: noether.core.schemas.trainers.BaseTrainerConfig = None¶
Configuration for the trainer. See BaseTrainerConfig for available options.
- debug: bool = False¶
If True, enables debug mode with more verbose logging, no WandB logging and output written to debug directory.
- store_code_in_output: bool = True¶
If True, store a copy of the current code in the output directory for reproducibility.
- output_path: pathlib.Path¶
Path to output directory.
- master_port: int = None¶
Port for distributed master node. If None, will be set from environment variable MASTER_PORT if available.
- slurm: noether.core.schemas.slurm.SlurmConfig | None = None¶
Configuration for SLURM job submission.
- classmethod empty_dict_is_none(v)¶
Pre-processes tracker input before validation.
- Parameters:
v (Any)
- Return type:
Any
- classmethod validate_output_path(value)¶
Validates that the output path is valid.
- Parameters:
value (pathlib.Path)
- Return type:
- serialize_output_path(value)¶
- Parameters:
value (Any)
- Return type:
Any
- classmethod get_env_master_port(value)¶
Sets master_port from environment variable if available.
- Parameters:
value (Any)
- Return type:
Any
- class noether.core.schemas.SlurmConfig(/, **data)¶
Bases:
pydantic.BaseModel
Configuration for SLURM job submission via sbatch.
All fields are optional and default to None, meaning the cluster default will be used. This schema covers all sbatch directives available in SLURM.
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
data (Any)
- model_config¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- partition: str | None = None¶
Partition to submit the job to (--partition). Multiple partitions can be comma-separated.
- nodes: int | str | None = None¶
Number of nodes to allocate (--nodes). Can be an integer or a range like ‘2-4’.
- gpus: str | int | None = None¶
Total GPUs for the job (--gpus). Can be a count or ‘type:count’, e.g. ‘v100:2’.
- gpus_per_node: str | int | None = None¶
GPUs per node (--gpus-per-node). Can be a count or ‘type:count’, e.g. ‘a100:4’.
- time: str | None = None¶
Wall clock time limit (--time). Formats: ‘minutes’, ‘MM:SS’, ‘HH:MM:SS’, ‘D-HH’, ‘D-HH:MM’, ‘D-HH:MM:SS’.
- begin: str | None = None¶
Defer job start until the specified time (--begin), e.g. ‘2024-01-15T10:00:00’, ‘now+1hour’.
- output: str | None = None¶
File path for stdout (--output). Supports replacement symbols: %j (job ID), %x (job name), %A (array master ID), %a (array task ID), %N (node name), %u (user name).
- array: str | None = None¶
Job array specification (--array), e.g. ‘0-15’, ‘1,3,5,7’, ‘1-7%2’ (max 2 concurrent).
- kill_on_invalid_dep: bool | None = None¶
Kill the job if any dependency is invalid (--kill-on-invalid-dep).
- env_path: str | None = None¶
Shell command to source before running the job (e.g., for activating a virtual environment); it is invoked as ‘source env_path’.
- to_srun_args()¶
Return a string of srun arguments for all non-None SLURM fields.
Fields that are not actual srun directives (experiment_file, source) are excluded. Boolean fields are rendered as bare flags when True and omitted when False.
- Return type:
- classmethod validate_time_format(value)¶
Validate SLURM time format.
- classmethod validate_memory_format(value)¶
Validate SLURM memory format (number with optional K/M/G/T suffix).
- classmethod validate_array_format(value)¶
Validate SLURM array specification.
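The flag-rendering rule described for to_srun_args can be sketched as follows (illustrative; the real method covers many more fields and excludes the non-directive ones):

```python
# Render non-None fields into sbatch/srun-style flags: field names map to
# '--flag-name', booleans become bare flags when True and are omitted when
# False, and None values are skipped entirely.
def render_args(fields):
    parts = []
    for name, value in fields.items():
        if value is None:
            continue
        flag = "--" + name.replace("_", "-")
        if isinstance(value, bool):
            if value:
                parts.append(flag)
        else:
            parts.append(f"{flag}={value}")
    return " ".join(parts)

render_args({"partition": "gpu", "nodes": 2,
             "kill_on_invalid_dep": True, "time": None})
# → "--partition=gpu --nodes=2 --kill-on-invalid-dep"
```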
- class noether.core.schemas.WandBTrackerSchema(/, **data)¶
Bases:
pydantic.BaseModel
- Parameters:
data (Any)
- kind: Literal['noether.core.trackers.WandBTracker'] = None¶
- class noether.core.schemas.BaseTrainerConfig(/, **data)¶
Bases:
pydantic.BaseModel
- Parameters:
data (Any)
- max_epochs: int | None = None¶
The maximum number of epochs to train for. Mutually exclusive with max_updates and max_samples. If set to 0, training will be skipped and all callbacks will be invoked once (useful for evaluation-only runs).
- max_updates: int | None = None¶
The maximum number of updates to train for. Mutually exclusive with max_epochs and max_samples. If set to 0, training will be skipped and all callbacks will be invoked once (useful for evaluation-only runs).
- max_samples: int | None = None¶
The maximum number of samples to train for. Mutually exclusive with max_epochs and max_updates. If set to 0, training will be skipped and all callbacks will be invoked once (useful for evaluation-only runs).
- start_at_epoch: int | None = None¶
The epoch to start training at. This means that the trainer will skip all epochs before this epoch. Learning rate and other schedulers will be stepped accordingly. Useful for resuming training from a specific epoch.
- add_default_callbacks: bool | None = None¶
Whether to add default callbacks. Default callbacks log things like simple dataset statistics or the current value of the learning rate if it is scheduled.
- add_trainer_callbacks: bool | None = None¶
Whether to add trainer specific callbacks (e.g., a callback to log the training accuracy for a classification task).
- effective_batch_size: int = None¶
The effective batch size used for optimization: the number of samples processed before an update step is taken (the “global batch size”). In multi-GPU setups, the per-device (“local”) batch size is effective_batch_size / number of devices. If gradient accumulation is used, the forward-pass batch size is derived by further dividing by the number of gradient accumulation steps.
- precision: Literal['float32', 'fp32', 'float16', 'fp16', 'bfloat16', 'bf16'] = None¶
The precision to use for training (e.g., “float32”). Mixed precision training (e.g., “float16” or “bfloat16”) can be used to speed up training and reduce memory usage on supported hardware (e.g., NVIDIA GPUs).
- callbacks: list[noether.core.schemas.callbacks.CallbacksConfig] | None = None¶
The callbacks to use for training.
- initializer: noether.core.schemas.initializers.InitializerConfig | None = None¶
The initializer to use for training. Mainly used for resuming training via ResumeInitializer.
- track_every_n_epochs: int | None = None¶
The integer number of epochs to periodically track metrics at.
- track_every_n_updates: int | None = None¶
The integer number of updates to periodically track metrics at.
- track_every_n_samples: int | None = None¶
The integer number of samples to periodically track metrics at.
- max_batch_size: int | None = None¶
The maximum batch size to use for model forward pass in training. If the effective_batch_size is larger than max_batch_size, gradient accumulation will be used to simulate the larger batch size. For example, if effective_batch_size=8 and max_batch_size=2, 4 gradient accumulation steps will be taken before each optimizer step.
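The arithmetic from the example above, as a one-liner (rounding up is an assumption for non-divisible sizes; the library may require divisibility instead):

```python
import math

# effective_batch_size=8 with max_batch_size=2 gives 4 accumulation steps
# before each optimizer step.
def accumulation_steps(effective_batch_size, max_batch_size):
    return math.ceil(effective_batch_size / max_batch_size)

accumulation_steps(8, 2)  # → 4
```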
- skip_nan_loss: bool = None¶
Whether to skip NaN losses, which can sometimes occur due to unlucky coincidences. If true, NaN losses are skipped without terminating the training, unless 100 NaN losses occur in a row.
- disable_gradient_accumulation: bool = None¶
Whether to disable gradient accumulation. Gradient accumulation is sometimes used to simulate larger batch sizes, but can lead to worse generalization.
- save_on_sigint: bool = None¶
Whether to save a checkpoint on SIGINT (Ctrl+C). SIGTERM always triggers a checkpoint save. When False (default), Ctrl+C will stop training immediately without saving.
- use_torch_compile: bool = None¶
Whether to use torch.compile to compile the model for faster training.
- forward_properties: list[str] | None = []¶
Properties (i.e., keys from the batch dict) from the input batch that are used as inputs to the model during the forward pass.
- target_properties: list[str] | None = []¶
Properties (i.e., keys from the batch dict) from the input batch that are used as targets for the model during the forward pass.
- model_config¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- validate_callback_frequency()¶
Ensures that exactly one frequency (‘every_n_*’) is specified and that ‘batch_size’ is present if ‘every_n_samples’ is used.
- Return type: