noether.modeling.modules.blocks.perceiver¶
Classes¶
PerceiverBlock — Cross-attention block that takes different input tensors for the query and the key/value.
Module Contents¶
- class noether.modeling.modules.blocks.perceiver.PerceiverBlock(config)¶
Bases: torch.nn.Module

For a self-attention module, the input tensors for the query, key, and value are the same. The PerceiverBlock takes different input tensors for the query and the key/value.
- Parameters:
config (noether.core.schemas.modules.blocks.PerceiverBlockConfig) – Configuration of the PerceiverBlock. See PerceiverBlockConfig for available options.
- norm1q¶
- norm1kv¶
- attn¶
- ls1¶
- drop_path1¶
- norm2¶
- mlp¶
- ls2¶
- drop_path2¶
- forward(q, kv=None, condition=None, attn_kwargs=None)¶
Forward pass of the PerceiverBlock.
- Parameters:
q (torch.Tensor) – Input tensor with shape (batch_size, seqlen/num_tokens, hidden_dim) for the query representations.
kv (torch.Tensor | None) – Input tensor with shape (batch_size, seqlen/num_tokens, hidden_dim) for the key and value representations. Can be None when a kv_cache is provided in attn_kwargs.
condition (torch.Tensor | None) – Conditioning vector. If provided, the attention and MLP outputs are scaled, shifted, and gated feature-wise with values predicted from this vector.
attn_kwargs (dict[str, Any] | None) – Dict with arguments for the attention (such as the attention mask, rope frequencies, or kv_cache). Defaults to None.
- Returns:
Tuple of (output_tensor, kv_cache).
kv_cache contains cached K/V from the perceiver attention, or None when loading from cache.
- Return type:
tuple[torch.Tensor, dict[str, torch.Tensor] | None]
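To illustrate the query/key-value asymmetry described above, here is a minimal sketch of a perceiver-style cross-attention block in plain PyTorch. This is a hypothetical stand-in, not noether's actual implementation: the class name TinyPerceiverBlock and its simplified forward signature are assumptions, and layer scale, drop path, conditioning, and kv_cache handling from the real PerceiverBlock are omitted.

```python
import torch
from torch import nn


class TinyPerceiverBlock(nn.Module):
    """Sketch of a perceiver-style block: queries and key/values can differ."""

    def __init__(self, hidden_dim: int, num_heads: int = 4):
        super().__init__()
        # Separate pre-norms for the query and key/value streams,
        # mirroring the norm1q / norm1kv attributes listed above.
        self.norm1q = nn.LayerNorm(hidden_dim)
        self.norm1kv = nn.LayerNorm(hidden_dim)
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(hidden_dim)
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim, 4 * hidden_dim),
            nn.GELU(),
            nn.Linear(4 * hidden_dim, hidden_dim),
        )

    def forward(self, q: torch.Tensor, kv: torch.Tensor) -> torch.Tensor:
        # Cross-attention: queries attend over a separate key/value sequence.
        kv_normed = self.norm1kv(kv)
        attn_out, _ = self.attn(self.norm1q(q), kv_normed, kv_normed)
        q = q + attn_out                  # residual around attention
        q = q + self.mlp(self.norm2(q))   # residual around MLP
        return q


block = TinyPerceiverBlock(hidden_dim=64)
latents = torch.randn(2, 16, 64)   # (batch_size, num_query_tokens, hidden_dim)
inputs = torch.randn(2, 128, 64)   # (batch_size, num_kv_tokens, hidden_dim)
out = block(latents, inputs)
print(out.shape)  # output keeps the query sequence length: (2, 16, 64)
```

Note that the output sequence length follows the query tensor, not the key/value tensor — this is what lets perceiver-style blocks compress a long input sequence into a fixed number of latent tokens.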