noether.modeling.modules.attention.dot_product

Classes

DotProductAttention – Scaled dot-product attention module.

class noether.modeling.modules.attention.dot_product.DotProductAttention(config)

Scaled dot-product attention module.

Parameters: config (noether.core.schemas.modules.AttentionConfig) – Configuration for the DotProductAttention module. See AttentionConfig for the available options.
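The docstring does not spell out the computation. As a rough illustration, the sketch below implements standard multi-head scaled dot-product self-attention, softmax(Q K^T / sqrt(d_head)) V, in plain PyTorch. The class name MinimalDotProductAttention and its hidden_dim and num_heads arguments are hypothetical stand-ins for whatever AttentionConfig actually provides; the real DotProductAttention may differ (bias terms, dropout, RoPE handling, mask convention, fused kernels).

```python
import math
from typing import Optional

import torch
import torch.nn as nn
import torch.nn.functional as F


class MinimalDotProductAttention(nn.Module):
    """Hypothetical multi-head scaled dot-product self-attention (not noether's class)."""

    def __init__(self, hidden_dim: int, num_heads: int) -> None:
        super().__init__()
        assert hidden_dim % num_heads == 0, "hidden_dim must be divisible by num_heads"
        self.num_heads = num_heads
        self.head_dim = hidden_dim // num_heads
        # Joint projection to queries, keys and values, followed by an output projection.
        self.qkv = nn.Linear(hidden_dim, 3 * hidden_dim)
        self.proj = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, x: torch.Tensor, attn_mask: Optional[torch.Tensor] = None) -> torch.Tensor:
        batch_size, seq_len, hidden_dim = x.shape
        # (B, N, 3*D) -> (B, N, 3, H, D/H) -> (3, B, H, N, D/H) -> q, k, v each (B, H, N, D/H)
        qkv = self.qkv(x).reshape(batch_size, seq_len, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4).unbind(0)
        # Attention scores scaled by 1/sqrt(head_dim): shape (B, H, N, N)
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.head_dim)
        if attn_mask is not None:
            # Assumed boolean convention: True = may attend, False = blocked.
            scores = scores.masked_fill(~attn_mask, float("-inf"))
        weights = scores.softmax(dim=-1)
        out = weights @ v  # (B, H, N, D/H)
        # Merge the heads back into the hidden dimension: (B, N, D)
        out = out.transpose(1, 2).reshape(batch_size, seq_len, hidden_dim)
        return self.proj(out)


torch.manual_seed(0)
attn = MinimalDotProductAttention(hidden_dim=64, num_heads=4)
x = torch.randn(2, 10, 64)  # (batch size, sequence length, hidden_dim)
y = attn(x)
print(y.shape)  # torch.Size([2, 10, 64])
```

Writing the softmax out explicitly makes the math visible; a production module would more likely call torch.nn.functional.scaled_dot_product_attention, which computes the same result with fused kernels.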

forward(x, attn_mask=None, freqs=None)

Forward pass of the DotProductAttention module.

Parameters:

x (torch.Tensor) – Input tensor to apply self-attention over, with shape (batch size, sequence length, hidden_dim).
attn_mask (torch.Tensor | None) – Optional attention mask. For causal attention (i.e., no token attends to future tokens), an attention mask should be provided. Defaults to None.
freqs (torch.Tensor | None) – Frequencies for the Rotary Positional Embedding (RoPE) applied to queries and keys. Should be None when use_rope=False. Defaults to None.
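The docstring does not state the mask's format. A minimal sketch of building a causal mask, assuming the boolean convention of torch.nn.functional.scaled_dot_product_attention (True = may attend) and a (sequence length, sequence length) shape that broadcasts over batch and heads. Whether DotProductAttention expects this convention, an additive float mask, or another shape is not stated here, so check the implementation before relying on it; make_causal_mask is a hypothetical helper.

```python
from typing import Optional

import torch
import torch.nn.functional as F


def make_causal_mask(seq_len: int, device: Optional[torch.device] = None) -> torch.Tensor:
    """Hypothetical helper: boolean (seq_len, seq_len) mask, entry [i, j] is True iff query i may attend to key j.

    Assumption: True means "may attend", as in torch.nn.functional.scaled_dot_product_attention.
    """
    # Lower triangle including the diagonal: token i sees tokens 0..i, never the future.
    return torch.ones(seq_len, seq_len, dtype=torch.bool, device=device).tril()


torch.manual_seed(0)
q = torch.randn(1, 2, 4, 8)  # (batch, heads, seq_len, head_dim)
k = torch.randn(1, 2, 4, 8)
v = torch.randn(1, 2, 4, 8)
mask = make_causal_mask(4)

# The explicit mask and PyTorch's built-in causal flag should give the same output.
out_masked = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
out_causal = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(torch.allclose(out_masked, out_causal, atol=1e-5))  # True
```

Because the first query may only attend to the first key, its output row equals the first value vector exactly; this is a quick sanity check that a mask is oriented correctly (rows = queries, columns = keys).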

Returns:

The output of the attention module.

Return type:

torch.Tensor