noether.core.schemas.optimizers

Attributes

AnyOptimizerConfig

Classes

ParamGroupModifierConfig

Configuration for a parameter group modifier, used by both the LrScaleByNameModifier and the WeightDecayByNameModifier.

MuonSecondaryOptimizerConfig

Configuration of the secondary optimizer in MuonComposite.

OptimizerConfig

Base configuration for optimizers.

AdamOptimizerConfig

Configuration for Adam-family optimizers (AdamW, Lion).

SGDOptimizerConfig

Configuration for SGD.

MuonOptimizerConfig

Configuration for MuonComposite.

Module Contents

class noether.core.schemas.optimizers.ParamGroupModifierConfig(/, **data)

Bases: pydantic.BaseModel

Configuration for a parameter group modifier, used by both the LrScaleByNameModifier and the WeightDecayByNameModifier.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

kind: str | None = None

The class path of the parameter group modifier. Either noether.core.optimizer.param_group_modifiers.LrScaleByNameModifier or noether.core.optimizer.param_group_modifiers.WeightDecayByNameModifier.

scale: float | None = None

The scaling factor for the learning rate. Must be greater than 0.0. Only for the LrScaleByNameModifier.

value: float | None = None

The weight decay value. With 0.0 the parameter is excluded from the weight decay. Only for the WeightDecayByNameModifier.

name: str

The name of the parameter within the model. E.g., ‘backbone.cls_token’.

check_scale_or_value_exclusive()

Validates that either ‘scale’ or ‘value’ is provided, but not both. This is a model-level validator that runs after individual field validation.

Return type:

Self
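
A minimal sketch of constructing both modifier variants, assuming the class paths documented above; ‘backbone.cls_token’ comes from the field documentation, while ‘backbone.pos_embed’ is a hypothetical parameter name:

```python
from noether.core.schemas.optimizers import ParamGroupModifierConfig

# Scale the class token's learning rate. 'scale' is only meaningful for the
# LrScaleByNameModifier; passing both 'scale' and 'value' fails the
# check_scale_or_value_exclusive model validator.
cls_token_lr = ParamGroupModifierConfig(
    kind="noether.core.optimizer.param_group_modifiers.LrScaleByNameModifier",
    name="backbone.cls_token",
    scale=0.1,
)

# Exclude a parameter from weight decay by setting its value to 0.0.
# 'value' is only meaningful for the WeightDecayByNameModifier.
no_decay = ParamGroupModifierConfig(
    kind="noether.core.optimizer.param_group_modifiers.WeightDecayByNameModifier",
    name="backbone.pos_embed",
    value=0.0,
)
```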

class noether.core.schemas.optimizers.MuonSecondaryOptimizerConfig(/, **data)

Bases: pydantic.BaseModel

Configuration of the secondary optimizer in MuonComposite.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

model_config

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

kind: str | None = None

The class path of the torch optimizer to use. E.g., ‘torch.optim.AdamW’ or ‘noether.core.optimizer.Lion’.

lr: float | None = None

The learning rate for the optimizer. Falls back to the primary lr if not set.

weight_decay: float | None = None

The weight decay. Falls back to the primary weight_decay if not set.

momentum: float | None = None

Momentum factor for optimizers like SGD.

betas: tuple[float, float] | None = None

Beta coefficients for Adam-style optimizers.
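
A sketch of a secondary optimizer configuration; the choice of ‘torch.optim.AdamW’ and the hyperparameter values here are illustrative assumptions, not defaults:

```python
from noether.core.schemas.optimizers import MuonSecondaryOptimizerConfig

secondary = MuonSecondaryOptimizerConfig(
    kind="torch.optim.AdamW",
    lr=3e-4,            # omit to fall back to the primary lr
    betas=(0.9, 0.95),  # Adam-style betas; momentum would apply to SGD-like kinds
)
```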

class noether.core.schemas.optimizers.OptimizerConfig(/, **data)

Bases: pydantic.BaseModel

Base configuration for optimizers.

Holds fields common to all optimizers plus the wrapper-level options. Optimizer-specific fields live on the dedicated subclasses.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

model_config

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

kind: str | None = None

The class path of the torch optimizer to use. E.g., ‘torch.optim.AdamW’.

lr: float | None = None

The learning rate for the optimizer.

weight_decay: float | None = None

The weight decay.

clip_grad_value: float | None = None

The maximum value for gradient clipping.

clip_grad_norm: float | None = None

The maximum norm for gradient clipping.

param_group_modifiers_config: list[ParamGroupModifierConfig] | None = None

List of parameter group modifiers to apply. These can modify the learning rate or weight decay for specific parameters.

exclude_bias_from_weight_decay: bool = True

If true, excludes the bias parameters (i.e., parameters that end with ‘.bias’) from the weight decay. Default true.

exclude_normalization_params_from_weight_decay: bool = True

If true, excludes the weights of normalization layers from the weight decay. This is implemented by excluding all 1D tensors from the weight decay. Default true.

weight_decay_schedule: noether.core.schemas.schedules.AnyScheduleConfig | None = None

schedule_config: noether.core.schemas.schedules.AnyScheduleConfig | None = None

return_optim_wrapper_args()

Return type:

dict
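
A sketch of the wrapper-level options, assuming the base class can be instantiated directly; the field names are as documented above, but the exact contents of the returned dict are not specified here:

```python
from noether.core.schemas.optimizers import OptimizerConfig

config = OptimizerConfig(
    kind="torch.optim.AdamW",
    lr=1e-3,
    weight_decay=0.05,
    clip_grad_norm=1.0,  # clip by norm; clip_grad_value instead clips by value
    exclude_bias_from_weight_decay=True,
)

# Wrapper-level arguments (e.g., clipping and schedule options) are
# collected separately from the fields passed to the torch optimizer.
wrapper_args = config.return_optim_wrapper_args()
```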

class noether.core.schemas.optimizers.AdamOptimizerConfig(/, **data)

Bases: OptimizerConfig

Configuration for Adam-family optimizers (AdamW, Lion).

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

kind: Literal['torch.optim.AdamW', 'noether.core.optimizer.Lion'] = 'torch.optim.AdamW'

The class path of the torch optimizer to use. Either ‘torch.optim.AdamW’ or ‘noether.core.optimizer.Lion’.

betas: tuple[float, float] | None = None

Beta coefficients for Adam-style optimizers.
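
A sketch of an Adam-family configuration with illustrative hyperparameters; swapping kind to ‘noether.core.optimizer.Lion’ selects Lion instead:

```python
from noether.core.schemas.optimizers import AdamOptimizerConfig

adamw = AdamOptimizerConfig(
    kind="torch.optim.AdamW",  # the default; Lion is the only other allowed kind
    lr=1e-3,
    weight_decay=0.05,
    betas=(0.9, 0.999),
)
```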

class noether.core.schemas.optimizers.SGDOptimizerConfig(/, **data)

Bases: OptimizerConfig

Configuration for SGD.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

kind: Literal['torch.optim.SGD'] = 'torch.optim.SGD'

The class path of the torch optimizer to use. Fixed to ‘torch.optim.SGD’.

momentum: float | None = None

Momentum factor.
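
A sketch with illustrative hyperparameters; kind defaults to (and only accepts) ‘torch.optim.SGD’, so it can be omitted:

```python
from noether.core.schemas.optimizers import SGDOptimizerConfig

sgd = SGDOptimizerConfig(
    lr=0.1,
    momentum=0.9,
    weight_decay=1e-4,
)
```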

class noether.core.schemas.optimizers.MuonOptimizerConfig(/, **data)

Bases: OptimizerConfig

Configuration for MuonComposite.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

kind: Literal['noether.core.optimizer.MuonComposite'] = 'noether.core.optimizer.MuonComposite'

The class path of the optimizer to use. Fixed to ‘noether.core.optimizer.MuonComposite’.

momentum: float | None = None

Momentum factor for the Muon optimizer.

secondary: MuonSecondaryOptimizerConfig | None = None

Configuration of the secondary optimizer in MuonComposite.

nesterov: bool | None = None

Enable Nesterov momentum in Muon. None uses Muon’s default (True).

ns_steps: int | None = None

Number of Newton-Schulz iteration steps. None uses Muon’s default (5).

adjust_lr_fn: Literal['original', 'match_rms_adamw'] | None = None

Per-matrix LR adjustment strategy. None uses Muon’s default ("original").
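
A sketch of a full MuonComposite configuration; the hyperparameter values and the choice of ‘torch.optim.AdamW’ as the secondary kind are illustrative assumptions:

```python
from noether.core.schemas.optimizers import (
    MuonOptimizerConfig,
    MuonSecondaryOptimizerConfig,
)

muon = MuonOptimizerConfig(
    lr=0.02,
    momentum=0.95,
    nesterov=None,   # None keeps Muon's default (True)
    ns_steps=None,   # None keeps Muon's default (5 Newton-Schulz steps)
    adjust_lr_fn="match_rms_adamw",
    # Secondary optimizer for the parameters Muon does not update itself;
    # which parameters those are is not specified by this schema.
    secondary=MuonSecondaryOptimizerConfig(kind="torch.optim.AdamW", lr=3e-4),
)
```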

noether.core.schemas.optimizers.AnyOptimizerConfig
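
Presumably a type alias for the union of the concrete optimizer configurations above (AdamOptimizerConfig, SGDOptimizerConfig, MuonOptimizerConfig), discriminated by their Literal kind fields, mirroring AnyScheduleConfig in noether.core.schemas.schedules.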