noether.modeling.modules.blocks

Classes

PerceiverBlock

For a self-attention module, the input tensors for the query, key, and value are the same. The PerceiverBlock takes different input tensors for the query and the key/value.

TransformerBlock

A transformer block with a single attention layer and a feedforward layer.

Package Contents

class noether.modeling.modules.blocks.PerceiverBlock(config)

Bases: torch.nn.Module

For a self-attention module, the input tensors for the query, key, and value are the same. The PerceiverBlock takes different input tensors for the query and for the key/value, i.e., it performs cross-attention.

Parameters:

config – Configuration for the perceiver block.

norm1q
norm1kv
attn
ls1
drop_path1
norm2
mlp
ls2
drop_path2
forward(q, kv, condition=None, attn_kwargs=None)

Forward pass of the PerceiverBlock.

Parameters:
  • q (torch.Tensor) – Input tensor with shape (batch_size, seqlen/num_tokens, hidden_dim) for the query representations.

  • kv (torch.Tensor) – Input tensor with shape (batch_size, seqlen/num_tokens, hidden_dim) for the key and value representations.

  • condition (torch.Tensor | None) – Conditioning vector. If provided, the attention and MLP branches are scaled, shifted, and gated feature-wise with values predicted from this vector. Defaults to None.

  • attn_kwargs (dict[str, Any] | None) – Dict with keyword arguments for the attention module (such as an attention mask or RoPE frequencies). Defaults to None.

Returns:

Tensor after the forward pass of the PerceiverBlock.

Return type:

torch.Tensor
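For orientation, the following is a minimal, self-contained sketch of the cross-attention pattern that PerceiverBlock implements, written in plain PyTorch. It is an illustration, not the noether implementation: the MinimalPerceiverBlock class, its constructor arguments, and the MLP expansion ratio are assumptions, and layer scale (ls1/ls2), drop path, and conditioning are omitted for brevity.

import torch
from torch import nn


class MinimalPerceiverBlock(nn.Module):
    # Illustrative sketch only; not the noether implementation.
    def __init__(self, hidden_dim: int, num_heads: int):
        super().__init__()
        self.norm1q = nn.LayerNorm(hidden_dim)   # pre-norm for the query stream
        self.norm1kv = nn.LayerNorm(hidden_dim)  # pre-norm for the key/value stream
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(hidden_dim)
        self.mlp = nn.Sequential(  # assumed 4x expansion ratio
            nn.Linear(hidden_dim, 4 * hidden_dim),
            nn.GELU(),
            nn.Linear(4 * hidden_dim, hidden_dim),
        )

    def forward(self, q: torch.Tensor, kv: torch.Tensor) -> torch.Tensor:
        # Cross-attention: queries come from q, keys and values from kv.
        kv_n = self.norm1kv(kv)
        attn_out, _ = self.attn(self.norm1q(q), kv_n, kv_n, need_weights=False)
        q = q + attn_out                 # residual on the query stream
        q = q + self.mlp(self.norm2(q))  # feedforward with residual
        return q


block = MinimalPerceiverBlock(hidden_dim=256, num_heads=8)
q = torch.randn(2, 32, 256)     # (batch_size, num_query_tokens, hidden_dim)
kv = torch.randn(2, 1024, 256)  # (batch_size, num_input_tokens, hidden_dim)
out = block(q, kv)
assert out.shape == q.shape     # the output keeps the query sequence length

Note that the output sequence length follows q, not kv; this is what lets a Perceiver-style model attend from a small set of latent tokens to a much longer input sequence.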

class noether.modeling.modules.blocks.TransformerBlock(config)

Bases: torch.nn.Module

A transformer block with a single attention layer and a feedforward layer.

Parameters:

config (noether.core.schemas.modules.blocks.TransformerBlockConfig) – Configuration for the transformer block. See TransformerBlockConfig for available options.

norm1
attention_block
ls1
drop_path1
norm2
mlp
ls2
drop_path2
forward(x, condition=None, attn_kwargs=None)

Forward pass of the transformer block.

Parameters:
  • x (torch.Tensor) – Input tensor with shape (batch_size, seqlen/num_tokens, hidden_dim).

  • condition (torch.Tensor | None) – Conditioning vector. If provided, the attention and MLP branches are scaled, shifted, and gated feature-wise with values predicted from this vector. Defaults to None.

  • attn_kwargs (dict[str, Any] | None) – Dict with keyword arguments for the attention module (such as an attention mask or RoPE frequencies). Defaults to None.

Returns:

Tensor after the forward pass of the transformer block.

Return type:

torch.Tensor
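The condition argument described above implies an adaLN-style modulation, in which per-feature shift, scale, and gate values for the attention and MLP branches are predicted from the conditioning vector. The sketch below shows that pattern in plain PyTorch. It is an illustration under assumptions, not the noether implementation: the MinimalTransformerBlock class, the 1 + scale form of the modulation, and the conditioning vector shape (batch_size, hidden_dim) are not taken from this page.

import torch
from torch import nn


class MinimalTransformerBlock(nn.Module):
    # Illustrative sketch only; not the noether implementation.
    def __init__(self, hidden_dim: int, num_heads: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(hidden_dim, elementwise_affine=False)
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(hidden_dim, elementwise_affine=False)
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim, 4 * hidden_dim),
            nn.GELU(),
            nn.Linear(4 * hidden_dim, hidden_dim),
        )
        # Predicts shift/scale/gate for the attention and MLP branches
        # from the conditioning vector (adaLN-style modulation).
        self.modulation = nn.Linear(hidden_dim, 6 * hidden_dim)

    def forward(self, x: torch.Tensor, condition: torch.Tensor | None = None) -> torch.Tensor:
        if condition is None:
            h = self.norm1(x)
            x = x + self.attn(h, h, h, need_weights=False)[0]
            return x + self.mlp(self.norm2(x))
        # (batch_size, hidden_dim) -> six (batch_size, 1, hidden_dim) tensors
        shift1, scale1, gate1, shift2, scale2, gate2 = (
            self.modulation(condition).unsqueeze(1).chunk(6, dim=-1)
        )
        h = self.norm1(x) * (1 + scale1) + shift1  # scale and shift the normed input
        x = x + gate1 * self.attn(h, h, h, need_weights=False)[0]  # gated residual
        h = self.norm2(x) * (1 + scale2) + shift2
        return x + gate2 * self.mlp(h)  # gated feedforward residual


block = MinimalTransformerBlock(hidden_dim=256, num_heads=8)
x = torch.randn(2, 128, 256)  # (batch_size, seqlen, hidden_dim)
cond = torch.randn(2, 256)    # one conditioning vector per sample
out = block(x, condition=cond)
assert out.shape == x.shape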