noether.core.schemas.modules.blocks

Classes

TransformerBlockConfig

Configuration for a transformer block.

PerceiverBlockConfig

Configuration for the PerceiverBlock module.

Module Contents

class noether.core.schemas.modules.blocks.TransformerBlockConfig(/, **data)

Bases: pydantic.BaseModel

Configuration for a transformer block.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

hidden_dim: int = None

Hidden dimension of the transformer block.

num_heads: int = None

Number of attention heads.

mlp_hidden_dim: int | None = None

Hidden dimension of the MLP layer. If set to None, mlp_hidden_dim is set to hidden_dim * mlp_expansion_factor in the TransformerConfig. If both are None, an error is raised.

mlp_expansion_factor: int | None = None

Expansion factor for the MLP hidden dimension relative to the hidden dimension. If mlp_hidden_dim is not set, this factor is used to compute it as hidden_dim * mlp_expansion_factor.
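
For illustration, the resolution behaviour described above can be sketched in plain Python (a minimal standalone sketch, not the library's implementation; the function name resolve_mlp_hidden_dim is hypothetical):

def resolve_mlp_hidden_dim(
    hidden_dim: int,
    mlp_hidden_dim: int | None,
    mlp_expansion_factor: int | None,
) -> int:
    # Sketch of the behaviour described above, not the library's own code.
    if mlp_hidden_dim is not None:
        return mlp_hidden_dim
    if mlp_expansion_factor is not None:
        return hidden_dim * mlp_expansion_factor
    raise ValueError("Either mlp_hidden_dim or mlp_expansion_factor must be set.")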

drop_path: float = None

Probability of dropping the attention or MLP module. Defaults to 0.0.

normalization_constructor: type

Constructor for the normalization layer.

attention_constructor: Literal['dot_product', 'perceiver', 'transolver', 'transolver_plusplus'] = 'dot_product'

Constructor of the attention module. Defaults to 'dot_product'.

layerscale: float | None = None

Initial scale value used to scale layer activations (LayerScale). Defaults to None.

condition_dim: int | None = None

Dimension of the conditioning vector. If None, no conditioning is applied. If provided, the transformer block turns into a Diffusion Transformer (DiT) block.

bias: bool = None

Whether to use biases in norm/projections. Defaults to True.

eps: float = None

Epsilon value for the layer normalization. Defaults to 1e-6.

init_weights: noether.core.types.InitWeightsMode = None

Initialization method for the weight matrices of the network. Defaults to "truncnormal002".

use_rope: bool = None

Whether to use Rotary Positional Embeddings (RoPE).

attention_arguments: dict

Additional arguments for the attention module that are only needed for a specific attention implementation.

set_mlp_hidden_dim()

Set mlp_hidden_dim to hidden_dim * mlp_expansion_factor if it is not provided.

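A minimal construction sketch, assuming torch.nn.LayerNorm as the normalization constructor; all field values shown are hypothetical and the exact set of required fields may differ:

import torch.nn as nn

from noether.core.schemas.modules.blocks import TransformerBlockConfig

# Hypothetical example values; nn.LayerNorm as normalization_constructor is an assumption.
block_config = TransformerBlockConfig(
    hidden_dim=384,
    num_heads=6,
    mlp_expansion_factor=4,  # mlp_hidden_dim is expected to resolve to 384 * 4 = 1536
    normalization_constructor=nn.LayerNorm,
    attention_constructor="dot_product",
    attention_arguments={},
)
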
class noether.core.schemas.modules.blocks.PerceiverBlockConfig(/, **data)

Bases: TransformerBlockConfig

Configuration for the PerceiverBlock module.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

kv_dim: int | None = None

Dimensionality of the key and value representations. Defaults to None. If None, hidden_dim is used.

set_kv_dim()

Set kv_dim to hidden_dim if not provided.

Return type:

PerceiverBlockConfig
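
A similar construction sketch for the Perceiver variant (same assumptions as above; field values are hypothetical). Omitting kv_dim illustrates the fallback described for set_kv_dim():

import torch.nn as nn

from noether.core.schemas.modules.blocks import PerceiverBlockConfig

perceiver_config = PerceiverBlockConfig(
    hidden_dim=384,
    num_heads=6,
    mlp_expansion_factor=4,
    normalization_constructor=nn.LayerNorm,
    attention_constructor="perceiver",
    attention_arguments={},
    # kv_dim omitted: set_kv_dim() falls back to hidden_dim (384).
)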