noether.core.schemas.modules.blocks

Classes

TransformerBlockConfig

Configuration for a transformer block.

PerceiverBlockConfig

Configuration for the PerceiverBlock module.

Module Contents

class noether.core.schemas.modules.blocks.TransformerBlockConfig(/, **data)

Bases: pydantic.BaseModel

Configuration for a transformer block.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

hidden_dim: int = None

Hidden dimension of the transformer block.

num_heads: int = None

Number of attention heads.

mlp_hidden_dim: int | None = None

Hidden dimension of the MLP layer. If set to None, mlp_hidden_dim is set to hidden_dim * mlp_expansion_factor in the TransformerConfig. If both are None, an error is raised.

mlp_expansion_factor: int | None = None

Expansion factor for the MLP hidden dimension relative to the hidden dimension. If mlp_hidden_dim is not set, this factor is used to compute it as hidden_dim * mlp_expansion_factor.
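
For illustration, the resolution behaviour described above can be sketched in plain Python (a minimal standalone sketch, not the library's implementation; the function name resolve_mlp_hidden_dim is hypothetical):

def resolve_mlp_hidden_dim(
    hidden_dim: int,
    mlp_hidden_dim: int | None,
    mlp_expansion_factor: int | None,
) -> int:
    # Sketch of the behaviour described above, not the library's own code.
    if mlp_hidden_dim is not None:
        return mlp_hidden_dim
    if mlp_expansion_factor is not None:
        return hidden_dim * mlp_expansion_factor
    raise ValueError("Either mlp_hidden_dim or mlp_expansion_factor must be set.")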

drop_path: float = None

Probability of dropping the attention or MLP module. Defaults to 0.0.

normalization_constructor: type

Constructor for the normalization layer.

attention_constructor: Literal['dot_product', 'perceiver', 'transolver', 'transolver_plusplus'] = 'dot_product'

Constructor of the attention module. Defaults to 'dot_product'.

layerscale: float | None = None

Initial scale value used to scale layer activations (LayerScale). Defaults to None.

condition_dim: int | None = None

Dimension of the conditioning vector. If None, no conditioning is applied. If provided, the transformer block turns into a Diffusion Transformer (DiT) block.

bias: bool = None

Whether to use biases in norm/projections. Defaults to True.

eps: float = None

Epsilon value for the layer normalization. Defaults to 1e-6.

init_weights: noether.core.types.InitWeightsMode = None

Initialization method for the weight matrices of the network. Defaults to "truncnormal002".

use_rope: bool = None

Whether to use Rotary Positional Embeddings (RoPE).

attention_arguments: dict

Additional arguments for the attention module that are only needed for a specific attention implementation.

set_mlp_hidden_dim()

Set mlp_hidden_dim to hidden_dim * mlp_expansion_factor if it is not provided.

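A minimal construction sketch, assuming torch.nn.LayerNorm as the normalization constructor; all field values shown are hypothetical and the exact set of required fields may differ:

import torch.nn as nn

from noether.core.schemas.modules.blocks import TransformerBlockConfig

# Hypothetical example values; nn.LayerNorm as normalization_constructor is an assumption.
block_config = TransformerBlockConfig(
    hidden_dim=384,
    num_heads=6,
    mlp_expansion_factor=4,  # mlp_hidden_dim is expected to resolve to 384 * 4 = 1536
    normalization_constructor=nn.LayerNorm,
    attention_constructor="dot_product",
    attention_arguments={},
)
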
class noether.core.schemas.modules.blocks.PerceiverBlockConfig(/, **data)

Bases: TransformerBlockConfig

Configuration for the PerceiverBlock module.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

kv_dim: int | None = None

Dimensionality of the key and value representations. Defaults to None. If None, hidden_dim is used.

set_kv_dim()

Set kv_dim to hidden_dim if not provided.

Return type:

PerceiverBlockConfig
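
A similar construction sketch for the Perceiver variant (same assumptions as above; field values are hypothetical). Omitting kv_dim illustrates the fallback described for set_kv_dim():

import torch.nn as nn

from noether.core.schemas.modules.blocks import PerceiverBlockConfig

perceiver_config = PerceiverBlockConfig(
    hidden_dim=384,
    num_heads=6,
    mlp_expansion_factor=4,
    normalization_constructor=nn.LayerNorm,
    attention_constructor="perceiver",
    attention_arguments={},
    # kv_dim omitted: set_kv_dim() falls back to hidden_dim (384).
)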