noether.core.schemas.optimizers

Classes

ParamGroupModifierConfig

Configuration for a parameter group modifier, used for both the LrScaleByNameModifier and the WeightDecayByNameModifier.

OptimizerConfig

Module Contents

class noether.core.schemas.optimizers.ParamGroupModifierConfig(/, **data)

Bases: pydantic.BaseModel

Configuration for a parameter group modifier, used for both the LrScaleByNameModifier and the WeightDecayByNameModifier.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

kind: str | None = None

The class path of the parameter group modifier. Either noether.core.optimizer.param_group_modifiers.LrScaleByNameModifier or noether.core.optimizer.param_group_modifiers.WeightDecayByNameModifier.

scale: float | None = None

The scaling factor for the learning rate. Must be greater than 0.0. Only used by the LrScaleByNameModifier.

value: float | None = None

The weight decay value. A value of 0.0 excludes the parameter from weight decay. Only used by the WeightDecayByNameModifier.

name: str

The name of the parameter within the model. E.g., 'backbone.cls_token'.

check_scale_or_value_exclusive()

Validates that either 'scale' or 'value' is provided, but not both. This is a model-level validator that runs after individual field validation.

Return type:

Self
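
A minimal usage sketch based only on the fields documented above; the import path follows the module path of this page, and the parameter name 'backbone.pos_embed' is purely illustrative:

    from noether.core.schemas.optimizers import ParamGroupModifierConfig

    # Scale the learning rate of one parameter (LrScaleByNameModifier).
    lr_mod = ParamGroupModifierConfig(
        kind="noether.core.optimizer.param_group_modifiers.LrScaleByNameModifier",
        name="backbone.cls_token",
        scale=0.1,
    )

    # Exclude one parameter from weight decay (WeightDecayByNameModifier).
    wd_mod = ParamGroupModifierConfig(
        kind="noether.core.optimizer.param_group_modifiers.WeightDecayByNameModifier",
        name="backbone.pos_embed",  # illustrative parameter name
        value=0.0,
    )

    # Passing both `scale` and `value` (or neither) fails model validation:
    # check_scale_or_value_exclusive raises a pydantic ValidationError.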

class noether.core.schemas.optimizers.OptimizerConfig(/, **data)

Bases: pydantic.BaseModel

Parameters:

data (Any)

kind: str | None = None

The class path of the torch optimizer to use. E.g., 'torch.optim.AdamW'.

lr: float | None = None

The learning rate for the optimizer.

weight_decay: float | None = None

The weight decay (L2 penalty) for the optimizer.

clip_grad_value: float | None = None

The maximum value for gradient clipping.

clip_grad_norm: float | None = None

The maximum norm for gradient clipping.

param_group_modifiers_config: list[ParamGroupModifierConfig] | None = None

List of parameter group modifiers to apply. These can modify the learning rate or weight decay for specific parameters.

exclude_bias_from_weight_decay: bool = True

If true, excludes the bias parameters (i.e., parameters whose names end with '.bias') from the weight decay. Default true.

exclude_normalization_params_from_weight_decay: bool = True

If true, excludes the weights of normalization layers from the weight decay. This is implemented by excluding all 1D tensors from the weight decay. Default true.
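
An illustrative sketch of that grouping rule (not the library's actual implementation): biases and 1D tensors land in a parameter group with weight decay 0.0.

    import torch.nn as nn

    model = nn.Sequential(nn.Linear(4, 8), nn.LayerNorm(8), nn.Linear(8, 2))  # illustrative model

    decay, no_decay = [], []
    for name, param in model.named_parameters():
        # Biases and 1D tensors (e.g. LayerNorm weights) are kept out of weight decay.
        if name.endswith(".bias") or param.ndim == 1:
            no_decay.append(param)
        else:
            decay.append(param)

    param_groups = [
        {"params": decay, "weight_decay": 0.05},
        {"params": no_decay, "weight_decay": 0.0},
    ]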

weight_decay_schedule: noether.core.schemas.schedules.AnyScheduleConfig | None = None

schedule_config: noether.core.schemas.schedules.AnyScheduleConfig | None = None

return_optim_wrapper_args()

Return type:

dict
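
Putting the documented fields together, a minimal configuration sketch; the import path follows the module path of this page, and the exact keys returned by return_optim_wrapper_args() are not specified here:

    from noether.core.schemas.optimizers import OptimizerConfig, ParamGroupModifierConfig

    optimizer_config = OptimizerConfig(
        kind="torch.optim.AdamW",
        lr=1e-4,
        weight_decay=0.05,
        clip_grad_norm=1.0,
        exclude_bias_from_weight_decay=True,
        exclude_normalization_params_from_weight_decay=True,
        param_group_modifiers_config=[
            ParamGroupModifierConfig(
                kind="noether.core.optimizer.param_group_modifiers.LrScaleByNameModifier",
                name="backbone.cls_token",
                scale=0.1,
            ),
        ],
    )

    # Per the documented return type, this yields a plain dict of optimizer wrapper arguments.
    wrapper_args = optimizer_config.return_optim_wrapper_args()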