noether.modeling.modules.blocks.perceiver¶
Classes¶
PerceiverBlock — Cross-attention block that takes different input tensors for the query and the key/value.
Module Contents¶
- class noether.modeling.modules.blocks.perceiver.PerceiverBlock(config)¶
Bases: torch.nn.Module

For a self-attention module, the input tensors for the query, key, and value are the same. The PerceiverBlock takes different input tensors for the query and the key/value.
- Parameters:
config (noether.core.schemas.modules.blocks.PerceiverBlockConfig) – Configuration of the PerceiverBlock. See PerceiverBlockConfig for available options.
- norm1q¶
- norm1kv¶
- attn¶
- ls1¶
- drop_path1¶
- norm2¶
- mlp¶
- ls2¶
- drop_path2¶
- forward(q, kv=None, condition=None, attn_kwargs=None)¶
Forward pass of the PerceiverBlock.
- Parameters:
q (torch.Tensor) – Input tensor with shape (batch_size, seqlen/num_tokens, hidden_dim) for the query representations.
kv (torch.Tensor | None) – Input tensor with shape (batch_size, seqlen/num_tokens, hidden_dim) for the key and value representations. Can be None when a kv_cache is provided in attn_kwargs.
condition (torch.Tensor | None) – Conditioning vector. If provided, the attention and MLP outputs are scaled, shifted, and gated feature-wise with values predicted from this vector.
attn_kwargs (dict[str, Any] | None) – Dict with arguments for the attention (such as the attention mask, rope frequencies, or kv_cache). Defaults to None.
- Returns:
Tuple of (output_tensor, kv_cache).
kv_cache contains cached K/V from the perceiver attention, or None when loading from cache.
- Return type:
tuple[torch.Tensor, dict[str, torch.Tensor] | None]
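To illustrate the query/key-value asymmetry described above, here is a minimal sketch of a perceiver-style cross-attention block in plain PyTorch. This is a hypothetical stand-in, not noether's actual implementation: the class name TinyPerceiverBlock and its simplified forward signature are assumptions, and layer scale, drop path, conditioning, and kv_cache handling from the real PerceiverBlock are omitted.

```python
import torch
from torch import nn


class TinyPerceiverBlock(nn.Module):
    """Sketch of a perceiver-style block: queries and key/values can differ."""

    def __init__(self, hidden_dim: int, num_heads: int = 4):
        super().__init__()
        # Separate pre-norms for the query and key/value streams,
        # mirroring the norm1q / norm1kv attributes listed above.
        self.norm1q = nn.LayerNorm(hidden_dim)
        self.norm1kv = nn.LayerNorm(hidden_dim)
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(hidden_dim)
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim, 4 * hidden_dim),
            nn.GELU(),
            nn.Linear(4 * hidden_dim, hidden_dim),
        )

    def forward(self, q: torch.Tensor, kv: torch.Tensor) -> torch.Tensor:
        # Cross-attention: queries attend over a separate key/value sequence.
        kv_normed = self.norm1kv(kv)
        attn_out, _ = self.attn(self.norm1q(q), kv_normed, kv_normed)
        q = q + attn_out                  # residual around attention
        q = q + self.mlp(self.norm2(q))   # residual around MLP
        return q


block = TinyPerceiverBlock(hidden_dim=64)
latents = torch.randn(2, 16, 64)   # (batch_size, num_query_tokens, hidden_dim)
inputs = torch.randn(2, 128, 64)   # (batch_size, num_kv_tokens, hidden_dim)
out = block(latents, inputs)
print(out.shape)  # output keeps the query sequence length: (2, 16, 64)
```

Note that the output sequence length follows the query tensor, not the key/value tensor — this is what lets perceiver-style blocks compress a long input sequence into a fixed number of latent tokens.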