noether.modeling.modules.attention.dot_product
Classes
DotProductAttention | Scaled dot-product attention module.
Module Contents
- class noether.modeling.modules.attention.dot_product.DotProductAttention(config)
Bases: torch.nn.Module
Scaled dot-product attention module.
Initialize the DotProductAttention module.
- Parameters:
config (noether.core.schemas.modules.AttentionConfig) – configuration of the attention module.
- num_heads = None
- head_dim
- init_weights = None
- use_rope = None
- dropout = None
- proj_dropout
- qkv
- proj
- forward(x, attn_mask=None, freqs=None)
Forward function of the DotProductAttention module.
- Parameters:
x (torch.Tensor) – Tensor to apply self-attention over, shape (batch size, sequence length, hidden_dim).
attn_mask (torch.Tensor | None) – Attention mask. For causal attention (i.e., no attention over future tokens), an attention mask should be provided. Defaults to None.
freqs (torch.Tensor | None) – Frequencies for Rotary Positional Embedding (RoPE) applied to the queries and keys. Should be None if use_rope=False. Defaults to None.
- Returns:
The output of the attention module.
- Return type:
torch.Tensor