noether.core.optimizer.param_group_modifiers

Classes

ParamGroupModifierBase

Generic implementation to change properties of optimizer parameter groups.

LrScaleByNameModifier

Scales the learning rate of a single parameter, selected by name.

WeightDecayByNameModifier

Changes the weight decay value for a single parameter, selected by name. Use-cases include excluding specific parameters (e.g., class tokens or positional embeddings) from weight decay.

Package Contents

class noether.core.optimizer.param_group_modifiers.ParamGroupModifierBase

Generic implementation to change properties of optimizer parameter groups.

abstractmethod get_properties(model, name, param)

Returns the modified properties for a given model parameter. This method is called with all items of model.named_parameters() to compose the parameter groups for the whole model.

Parameters:
  • model (torch.nn.Module) – Model from which the parameter originates. Used to extract properties (e.g., the number of layers for a layer-wise learning rate decay).

  • name (str) – Name of the parameter as stored inside the model.

  • param (torch.Tensor) – The parameter tensor.

Return type:

dict[str, float]

abstractmethod was_applied_successfully()

Checks if the parameter group modifier was applied successfully.

Return type:

bool
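A concrete subclass implements both abstract methods. The following is a minimal sketch of that interface; since `noether` may not be importable here, a stand-in base class mirroring the documented signatures is defined inline, and the property key `"lr_scale"` as well as the `BiasLrHalver` class are illustrative assumptions, not part of the library.

```python
from abc import ABC, abstractmethod

# Stand-in mirroring the documented ParamGroupModifierBase interface.
class ParamGroupModifierBase(ABC):
    @abstractmethod
    def get_properties(self, model, name, param) -> dict[str, float]:
        """Called with every item of model.named_parameters()."""

    @abstractmethod
    def was_applied_successfully(self) -> bool:
        """True if the modifier matched at least one parameter."""

# Hypothetical modifier: halve the learning rate of all bias parameters.
class BiasLrHalver(ParamGroupModifierBase):
    def __init__(self):
        self.param_was_found = False

    def get_properties(self, model, name, param):
        if name.endswith(".bias"):
            self.param_was_found = True
            return {"lr_scale": 0.5}  # property key is an assumption
        return {}  # no change for other parameters

    def was_applied_successfully(self):
        return self.param_was_found
```

Returning an empty dict for non-matching parameters leaves their group properties untouched, so a single modifier can be applied across all of `model.named_parameters()` safely.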

class noether.core.optimizer.param_group_modifiers.LrScaleByNameModifier(param_group_modifier_config)

Bases: noether.core.optimizer.param_group_modifiers.base.ParamGroupModifierBase

Scales the learning rate of a single parameter, selected by name.

Parameters:

param_group_modifier_config (noether.core.schemas.optimizers.ParamGroupModifierConfig)

scale
name
param_was_found = False
get_properties(model, name, param)

This method is called with all items of model.named_parameters() to compose the parameter groups for the whole model. If the parameter name matches the configured name, it returns properties that scale that parameter's learning rate by the configured factor.

Parameters:
  • model (torch.nn.Module) – Model from which the parameter originates. Used to extract properties (e.g., the number of layers for a layer-wise learning rate decay).

  • name (str) – Name of the parameter as stored inside the model.

  • param (torch.Tensor) – The parameter tensor.

Return type:

dict[str, float]

was_applied_successfully()

Check if the parameter was found within the model.

Return type:

bool
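For illustration, a scale property of the kind this modifier returns is typically folded multiplicatively into the group's base learning rate. The `"lr_scale"` key and the folding helper below are assumptions sketched without torch so the example is self-contained:

```python
# Hypothetical: fold an "lr_scale" property (as returned for a matching
# parameter name) into the effective per-group learning rate.
def effective_lr(base_lr: float, props: dict[str, float]) -> float:
    # Parameters without a matching modifier get an empty props dict,
    # so they keep the base learning rate unchanged.
    return base_lr * props.get("lr_scale", 1.0)
```

For example, `effective_lr(3e-4, {"lr_scale": 0.1})` yields a tenth of the base rate, while `effective_lr(3e-4, {})` leaves it unchanged.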

class noether.core.optimizer.param_group_modifiers.WeightDecayByNameModifier(param_group_modifier_config)

Bases: noether.core.optimizer.param_group_modifiers.base.ParamGroupModifierBase

Changes the weight decay value for a single parameter. Use-cases:
  • ViT: excluding CLS token parameters
  • Transformer learned positional embeddings
  • Learnable query tokens for cross attention ("PerceiverPooling")

Parameters:

param_group_modifier_config (noether.core.schemas.optimizers.ParamGroupModifierConfig)

name
value
param_was_found = False
get_properties(model, name, param)

This method is called with all items of model.named_parameters() to compose the parameter groups for the whole model. If the parameter name matches the configured name, it returns properties that set the weight decay to the configured value.

Parameters:
  • model (torch.nn.Module) – Model from which the parameter originates. Used to extract properties (e.g., the number of layers for a layer-wise learning rate decay).

  • name (str) – Name of the parameter as stored inside the model.

  • param (torch.Tensor) – The parameter tensor.

Return type:

dict[str, float]

was_applied_successfully()

Check if the parameter was found within the model.

Return type:

bool
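Putting the pieces together, modifier outputs are composed per parameter into optimizer param groups, with was_applied_successfully() serving as a sanity check afterwards (a modifier that never matched usually means a typo in the configured name). The sketch below uses a plain-Python stand-in for a torch module and a hand-rolled modifier with the documented attributes; the `"weight_decay"` key, the constructor arguments, and `build_param_groups` are all assumptions, not library API.

```python
# Stand-in for a torch.nn.Module exposing named_parameters().
class FakeModel:
    def named_parameters(self):
        yield "cls_token", [0.0]          # parameter to exclude from decay
        yield "fc.weight", [1.0, 2.0]     # ordinary weight

# Illustrative re-implementation of the documented behaviour.
class WeightDecayByNameModifier:
    def __init__(self, name: str, value: float):
        self.name = name
        self.value = value
        self.param_was_found = False

    def get_properties(self, model, name, param) -> dict[str, float]:
        if name == self.name:
            self.param_was_found = True
            return {"weight_decay": self.value}
        return {}

    def was_applied_successfully(self) -> bool:
        return self.param_was_found

def build_param_groups(model, modifiers, base_lr=1e-3):
    """Compose one param group per parameter from all modifier outputs."""
    groups = []
    for name, param in model.named_parameters():
        props = {"lr": base_lr}
        for m in modifiers:
            props.update(m.get_properties(model, name, param))
        groups.append({"params": [param], "name": name, **props})
    # Fail loudly if any modifier never matched a parameter.
    for m in modifiers:
        assert m.was_applied_successfully(), f"no parameter matched {m.name!r}"
    return groups

mod = WeightDecayByNameModifier(name="cls_token", value=0.0)
groups = build_param_groups(FakeModel(), [mod])
```

The resulting list of group dicts has the shape expected by torch optimizers' per-parameter options, with `cls_token` carrying `weight_decay: 0.0` and other parameters left at the optimizer's defaults.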