noether.modeling.modules.blocks

Classes

PerceiverBlock

For a self-attention module, the input tensors for the query, key, and value are the same. The PerceiverBlock takes different input tensors for the query and the key/value.

PerceiverBlockConfig

Configuration for the PerceiverBlock module.

TransformerBlock

A transformer block with a single attention layer and a feedforward layer.

TransformerBlockConfig

Configuration for a transformer block.

Package Contents

class noether.modeling.modules.blocks.PerceiverBlock(config)

Bases: torch.nn.Module

For a self-attention module, the input tensors for the query, key, and value are the same. The PerceiverBlock takes different input tensors for the query and the key/value.

Parameters:
  • config (PerceiverBlockConfig) – Configuration of the PerceiverBlock. See PerceiverBlockConfig for available options.

norm1q
norm1kv
attn
ls1
drop_path1
norm2
mlp
ls2
drop_path2
forward(q, kv=None, condition=None, attn_kwargs=None)

Forward pass of the PerceiverBlock.

Parameters:
  • q (torch.Tensor) – Input tensor with shape (batch_size, seqlen/num_tokens, hidden_dim) for the query representations.

  • kv (torch.Tensor | None) – Input tensor with shape (batch_size, seqlen/num_tokens, hidden_dim) for the key and value representations. Can be None when a kv_cache is provided in attn_kwargs.

  • condition (torch.Tensor | None) – Conditioning vector. If provided, the attention and MLP will be scaled, shifted and gated feature-wise with predicted values from this vector.

  • attn_kwargs (dict[str, Any] | None) – Dict with arguments for the attention (such as the attention mask, rope frequencies, or kv_cache). Defaults to None.

Returns:

Tuple of (output_tensor, kv_cache). kv_cache contains cached K/V from the perceiver attention, or None when loading from cache.

Return type:

tuple[torch.Tensor, dict[str, torch.Tensor] | None]
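The interplay between kv, the kv_cache entry in attn_kwargs, and the returned cache can be sketched as follows. This is a dependency-free toy with scalar tokens, not the library's implementation: the function name cross_attention_stub, the cache keys "k" and "v", and the stand-in projections are all assumptions made for illustration.

```python
import math
from typing import Optional

Cache = dict[str, list[float]]


def cross_attention_stub(
    q: list[float],
    kv: Optional[list[float]] = None,
    kv_cache: Optional[Cache] = None,
) -> tuple[list[float], Optional[Cache]]:
    """Toy cross-attention over scalar tokens (hypothetical helper)."""
    if kv_cache is not None:
        # Loading from cache: kv may be None and no new cache is returned.
        keys, values = kv_cache["k"], kv_cache["v"]
        new_cache = None
    elif kv is not None:
        # Stand-ins for the learned key/value projections.
        keys = [0.5 * t for t in kv]
        values = [2.0 * t for t in kv]
        new_cache = {"k": keys, "v": values}
    else:
        raise ValueError("either kv or kv_cache must be provided")

    out = []
    for q_i in q:
        scores = [q_i * k_j for k_j in keys]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, values)) / total)
    return out, new_cache


q = [0.1, -0.3, 0.7]
kv = [1.0, 2.0, 3.0, 4.0]
out_first, cache = cross_attention_stub(q, kv=kv)                # computes and returns K/V
out_cached, no_cache = cross_attention_stub(q, kv_cache=cache)   # reuses the cached K/V
```

In this sketch the second call produces the same output as the first without touching kv, and returns None in place of a cache.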

class noether.modeling.modules.blocks.PerceiverBlockConfig(/, **data)

Bases: noether.modeling.modules.blocks.transformer.TransformerBlockConfig

Configuration for the PerceiverBlock module.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

kv_dim: int | None = None

Dimensionality of the key and value representations. Defaults to None. If None, hidden_dim is used.

set_kv_dim()

Set kv_dim to hidden_dim if not provided.

Return type:

PerceiverBlockConfig
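The kv_dim fallback described above amounts to a post-initialization default. The following is a minimal sketch using a standard-library dataclass rather than pydantic; the class name KvDimSketch is hypothetical and the real validator may differ in detail.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KvDimSketch:
    """Hypothetical stand-in for the kv_dim handling in PerceiverBlockConfig."""

    hidden_dim: int
    kv_dim: Optional[int] = None

    def __post_init__(self) -> None:
        # Mirror the documented behaviour: fall back to hidden_dim when unset.
        if self.kv_dim is None:
            self.kv_dim = self.hidden_dim


same_dim = KvDimSketch(hidden_dim=256)             # kv_dim falls back to 256
cross_dim = KvDimSketch(hidden_dim=256, kv_dim=64)  # explicit value is kept
```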

perceiver_attention_config()
Return type:

noether.modeling.modules.attention.PerceiverAttentionConfig

modulation_linear_projection_config()
Return type:

noether.modeling.modules.layers.LinearProjectionConfig | None

class noether.modeling.modules.blocks.TransformerBlock(config)

Bases: torch.nn.Module

A transformer block with a single attention layer and a feedforward layer.

Parameters:

config (TransformerBlockConfig) – Configuration for the transformer block. See TransformerBlockConfig for available options.

config
norm1
attention_block
ls1
drop_path1
norm2
mlp
ls2
drop_path2
forward(x, condition=None, attn_kwargs=None)

Forward pass of the transformer block.

Parameters:
  • x (torch.Tensor) – Input tensor with shape (batch_size, seqlen/num_tokens, hidden_dim).

  • condition (torch.Tensor | None) – Conditioning vector. If provided, the attention and MLP will be scaled, shifted and gated feature-wise with predicted values from this vector.

  • attn_kwargs (dict[str, Any] | None) – Dict with arguments for the attention (such as the attention mask or rope frequencies). Defaults to None.

Returns:

Tuple of (output_tensor, kv_cache). kv_cache is None when the attention module does not return a cache (e.g. standard DotProductAttention).

Return type:

tuple[torch.Tensor, dict[str, dict[str, torch.Tensor]] | None]
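The feature-wise scaling, shifting, and gating applied when a condition is given can be illustrated on a single token. The sketch below assumes the DiT-style form x + gate * branch(x * (1 + scale) + shift); the exact formula used by the library, the omitted normalization, and the helper names modulate, gated_residual, and conditioned_branch are assumptions.

```python
from typing import Callable

Vector = list[float]


def modulate(x: Vector, scale: Vector, shift: Vector) -> Vector:
    """Feature-wise scale and shift: x * (1 + scale) + shift."""
    return [x_i * (1.0 + s_i) + b_i for x_i, s_i, b_i in zip(x, scale, shift)]


def gated_residual(x: Vector, branch_out: Vector, gate: Vector) -> Vector:
    """Residual connection whose branch output is gated feature-wise."""
    return [x_i + g_i * o_i for x_i, g_i, o_i in zip(x, gate, branch_out)]


def conditioned_branch(
    x: Vector,
    branch: Callable[[Vector], Vector],
    scale: Vector,
    shift: Vector,
    gate: Vector,
) -> Vector:
    """Apply one modulated sub-layer (attention or MLP stand-in); normalization omitted."""
    return gated_residual(x, branch(modulate(x, scale, shift)), gate)


def double(v: Vector) -> Vector:
    """Stand-in for the attention or MLP branch."""
    return [2.0 * v_i for v_i in v]


x = [1.0, -2.0, 0.5]
zeros = [0.0, 0.0, 0.0]

# With a zero gate the branch contributes nothing and x passes through unchanged.
identity_out = conditioned_branch(x, double, zeros, zeros, zeros)

# Non-trivial scale, shift, and gate values predicted from a conditioning vector.
out = conditioned_branch(
    x, double, scale=[1.0, 0.0, 0.0], shift=[0.0, 1.0, 0.0], gate=[1.0, 1.0, 0.5]
)
```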

class noether.modeling.modules.blocks.TransformerBlockConfig(/, **data)

Bases: pydantic.BaseModel

Configuration for a transformer block.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

hidden_dim: int = None

Hidden dimension of the transformer block.

num_heads: int = None

Number of attention heads.

mlp_hidden_dim: int | None = None

Hidden dimension of the MLP layer. If set to None, mlp_hidden_dim is set to hidden_dim * mlp_expansion_factor in the TransformerConfig. If both mlp_hidden_dim and mlp_expansion_factor are None, an error is raised.

mlp_expansion_factor: int | None = None

Expansion factor for the MLP hidden dimension relative to the hidden dimension. If ‘mlp_hidden_dim’ is not set, this factor is used to compute it as hidden_dim * mlp_expansion_factor.
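The resolution rule for the MLP width described by mlp_hidden_dim and mlp_expansion_factor can be written out as a small function. This is a sketch of the documented behaviour only; the function name resolve_mlp_hidden_dim and the choice of ValueError are assumptions, and the precedence of an explicit mlp_hidden_dim when both are set is inferred from the field descriptions.

```python
from typing import Optional


def resolve_mlp_hidden_dim(
    hidden_dim: int,
    mlp_hidden_dim: Optional[int] = None,
    mlp_expansion_factor: Optional[int] = None,
) -> int:
    """Hypothetical helper mirroring the documented mlp_hidden_dim rule."""
    if mlp_hidden_dim is not None:
        # An explicit width is used as-is.
        return mlp_hidden_dim
    if mlp_expansion_factor is None:
        # Neither value given: the documentation states that an error is raised.
        raise ValueError("set either mlp_hidden_dim or mlp_expansion_factor")
    return hidden_dim * mlp_expansion_factor


from_factor = resolve_mlp_hidden_dim(hidden_dim=192, mlp_expansion_factor=4)
explicit = resolve_mlp_hidden_dim(hidden_dim=192, mlp_hidden_dim=512)
```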

drop_path: float = None

Probability to drop the attention or MLP module. Defaults to 0.0.
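Dropping a whole attention or MLP branch is commonly implemented as stochastic depth. The sketch below shows that idea for a single sample under stated assumptions: the branch is zeroed with probability drop_prob during training, surviving outputs are rescaled by 1 / (1 - drop_prob), and nothing is dropped at evaluation time. Whether the library rescales in this way is not stated in the source, and the function name drop_path_sketch is hypothetical.

```python
import random


def drop_path_sketch(
    branch_out: list[float],
    drop_prob: float,
    training: bool,
    rng: random.Random,
) -> list[float]:
    """Stochastic depth for one sample (hypothetical helper)."""
    if not training or drop_prob == 0.0:
        # Evaluation mode or disabled: the branch output passes through unchanged.
        return list(branch_out)
    if rng.random() < drop_prob:
        # Drop the whole branch; only the residual path remains for this sample.
        return [0.0] * len(branch_out)
    # Rescale survivors so the expected branch output is unchanged.
    keep_prob = 1.0 - drop_prob
    return [v / keep_prob for v in branch_out]


rng = random.Random(0)
branch_out = [1.0, 2.0]
eval_out = drop_path_sketch(branch_out, drop_prob=0.5, training=False, rng=rng)
train_outs = [
    drop_path_sketch(branch_out, drop_prob=0.5, training=True, rng=rng)
    for _ in range(100)
]
```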

attention_constructor: Literal['dot_product', 'perceiver', 'transolver', 'transolver_plusplus'] = 'dot_product'

Constructor of the attention module. Defaults to ‘dot_product’.

layerscale: float | None = None

Init scale value to scale layer activations. Defaults to None.

condition_dim: int | None = None

Dimension of the conditioning vector. If None, no conditioning is applied. If provided, the transformer block becomes a Diffusion Transformer (DiT) block.

bias: bool = None

Whether to use biases in norm/projections. Defaults to True.

eps: float = None

Epsilon value for the layer normalization. Defaults to 1e-6.

init_weights: noether.core.types.InitWeightsMode = None

Initialization method for the weight matrices of the network. Defaults to “truncnormal002”.

use_rope: bool = None

Whether to use Rotary Positional Embeddings (RoPE).

max_wavelength: int | None = None

Theta parameter for the transformer sine/cosine embedding. Defaults to 10_000.
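To show what the theta parameter controls, the sketch below computes rotary frequencies using the common RoPE convention max_wavelength ** (-2i / head_dim). That the library follows exactly this convention is an assumption, and the function names are hypothetical.

```python
def rope_inverse_frequencies(head_dim: int, max_wavelength: int = 10_000) -> list[float]:
    """Per-pair rotation frequencies under the common RoPE convention (assumed)."""
    if head_dim % 2 != 0:
        raise ValueError("head_dim must be even so features can be rotated in pairs")
    return [max_wavelength ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


def rope_angles(position: int, head_dim: int, max_wavelength: int = 10_000) -> list[float]:
    """Rotation angle for each feature pair at a given token position."""
    return [position * f for f in rope_inverse_frequencies(head_dim, max_wavelength)]


freqs = rope_inverse_frequencies(head_dim=4)
angles = rope_angles(position=3, head_dim=4)
```

A larger max_wavelength lowers the slowest frequencies, so the lowest-frequency feature pairs take more positions to complete a full rotation.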

attention_arguments: dict

Additional arguments for the attention module that are only needed for a specific attention implementation.

set_mlp_hidden_dim()
set_wavelength_for_rope()
linear_projection_config()
Return type:

noether.modeling.modules.layers.linear_projection.LinearProjectionConfig

layerscale_config()
Return type:

noether.modeling.modules.layers.layer_scale.LayerScaleConfig

drop_path_config()
Return type:

noether.modeling.modules.layers.drop_path.UnquantizedDropPathConfig

modulation_linear_projection_config()
Return type:

LinearProjectionConfig | None

up_act_down_mlp_config()
Return type:

noether.modeling.modules.mlp.upactdown_mlp.UpActDownMLPConfig