noether.modeling.modules.attention.anchor_attention

Classes

CrossAnchorAttention

Anchor attention across branches: each configured branch attends to the anchors of all other branches.

JointAnchorAttention

Anchor attention within and across branches: all tokens attend to anchors from all configured branches.

MultiBranchAnchorAttention

A base class for multi-branch anchor-based attention modules with shared parameters between branches.

SelfAnchorAttention

Anchor attention within branches: each configured branch attends to its own anchors independently.

Package Contents

class noether.modeling.modules.attention.anchor_attention.CrossAnchorAttention(config)

Bases: noether.modeling.modules.attention.anchor_attention.multi_branch.MultiBranchAnchorAttention

Anchor attention across branches: each configured branch attends to the anchors of all other branches.

For a list of branches (e.g., A, B, C), this creates one attention pattern per branch, where A attends to (B_anchors + C_anchors), B attends to (A_anchors + C_anchors), and so on. It requires all configured branches and their anchors to be present in the input.

Example: all surface tokens attend to volume_anchors and all volume tokens attend to surface_anchors. This is achieved via the following attention patterns:

AttentionPattern(query_tokens=["surface_anchors", "surface_queries"], key_value_tokens=["volume_anchors"])
AttentionPattern(query_tokens=["volume_anchors", "volume_queries"], key_value_tokens=["surface_anchors"])
Parameters:

config (noether.core.schemas.modules.attention.CrossAnchorAttentionConfig) – Configuration for the CrossAnchorAttention module. See CrossAnchorAttentionConfig for the available options.
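The cross-anchor pattern can be illustrated with a small mask-building sketch in plain PyTorch. The contiguous segment layout and the pattern_mask helper below are hypothetical, chosen only to show which positions end up attending to which; they are not part of the noether API.

```python
import torch

# Hypothetical, contiguous token layout: name -> (start, length).
segments = {
    "surface_anchors": (0, 2),
    "surface_queries": (2, 3),
    "volume_anchors": (5, 2),
    "volume_queries": (7, 3),
}
total = 10

def pattern_mask(query_tokens, key_value_tokens):
    """Boolean mask: True where a query position may attend to a key position."""
    mask = torch.zeros(total, total, dtype=torch.bool)
    for q in query_tokens:
        qs, ql = segments[q]
        for kv in key_value_tokens:
            ks, kl = segments[kv]
            mask[qs:qs + ql, ks:ks + kl] = True
    return mask

# Cross-anchor: each branch attends to the other branch's anchors.
cross = pattern_mask(["surface_anchors", "surface_queries"], ["volume_anchors"]) | \
        pattern_mask(["volume_anchors", "volume_queries"], ["surface_anchors"])

# A surface query (position 2) sees only the volume anchors (positions 5, 6).
print(cross[2].nonzero().flatten().tolist())  # -> [5, 6]
```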

class noether.modeling.modules.attention.anchor_attention.JointAnchorAttention(config)

Bases: noether.modeling.modules.attention.anchor_attention.multi_branch.MultiBranchAnchorAttention

Anchor attention within and across branches: all tokens attend to anchors from all configured branches.

For a list of branches (e.g., A, B, C), this creates a pattern where all tokens (A_anchors, A_queries, B_anchors, B_queries, C_anchors, C_queries) attend to (A_anchors + B_anchors + C_anchors). It requires at least one anchor token to be present in the input.

Example: all tokens attend to (surface_anchors, volume_anchors). This is achieved via the following attention pattern:

AttentionPattern(
    query_tokens=["surface_anchors", "surface_queries", "volume_anchors", "volume_queries"],
    key_value_tokens=["surface_anchors", "volume_anchors"],
)
Parameters:

config (noether.core.schemas.modules.attention.JointAnchorAttentionConfig) – Configuration for the JointAnchorAttention module.
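The joint pattern can be sketched the same way with a hypothetical token layout: every row of the mask is given the same set of anchor columns, regardless of branch. The segment positions below are illustrative, not part of the noether API.

```python
import torch

# Hypothetical, contiguous token layout: name -> (start, length).
segments = {
    "surface_anchors": (0, 2),
    "surface_queries": (2, 3),
    "volume_anchors": (5, 2),
    "volume_queries": (7, 3),
}
total = 10

# Joint pattern: every token attends to the anchors of all branches.
joint = torch.zeros(total, total, dtype=torch.bool)
for name in ("surface_anchors", "volume_anchors"):
    start, length = segments[name]
    joint[:, start:start + length] = True

# Any query position sees exactly the anchor positions 0, 1, 5, 6.
print(joint[8].nonzero().flatten().tolist())  # -> [0, 1, 5, 6]
```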

class noether.modeling.modules.attention.anchor_attention.MultiBranchAnchorAttention(config)

Bases: torch.nn.Module

A base class for multi-branch anchor-based attention modules with shared parameters between branches.

Anchor attention restricts self-attention to anchor tokens; all other tokens cross-attend to those anchors. Branches for different modalities share the same linear-projection parameters. This base class provides the common constructor, validation logic, and forward method implementation; subclasses only need to implement _create_attention_patterns to define their specific attention patterns.

Parameters:

config (noether.core.schemas.modules.attention.AttentionConfig)

mixed_attention
branches = None
anchor_suffix = None
forward(x, token_specs, freqs=None)

Apply attention using the patterns defined by the subclass.

Parameters:
Return type:

torch.Tensor
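A pattern mask like the ones the subclasses define can be applied with PyTorch's built-in scaled dot-product attention, where a boolean attn_mask marks the positions a query may attend to (True = attend). The shapes and the two-anchor layout below are purely illustrative and are not the module's actual internals.

```python
import torch
import torch.nn.functional as F

batch, heads, tokens, head_dim = 1, 2, 10, 8
x = torch.randn(batch, heads, tokens, head_dim)

# Illustrative anchor mask: every token attends only to the first two
# positions (the "anchors"). The anchors thereby self-attend, while all
# other tokens cross-attend to them.
mask = torch.zeros(tokens, tokens, dtype=torch.bool)
mask[:, :2] = True

out = F.scaled_dot_product_attention(x, x, x, attn_mask=mask)
print(out.shape)  # -> torch.Size([1, 2, 10, 8])
```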

class noether.modeling.modules.attention.anchor_attention.SelfAnchorAttention(config)

Bases: noether.modeling.modules.attention.anchor_attention.multi_branch.MultiBranchAnchorAttention

Anchor attention within branches: each configured branch attends to its own anchors independently.

For a list of branches (e.g., A, B, C), this creates a pattern where A tokens attend to A_anchors, B tokens attend to B_anchors, and C tokens attend to C_anchors. It requires all configured branches and their anchors to be present in the input.

Example: surface tokens attend to surface_anchors and volume tokens attend to volume_anchors.

This is achieved via the following attention patterns:

AttentionPattern(query_tokens=["surface_anchors", "surface_queries"], key_value_tokens=["surface_anchors"])
AttentionPattern(query_tokens=["volume_anchors", "volume_queries"], key_value_tokens=["volume_anchors"])
Parameters:

config (noether.core.schemas.modules.attention.AttentionConfig) – Configuration for the SelfAnchorAttention module.
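The self-anchor pattern can be sketched with the same hypothetical layout and mask helper: each branch's rows receive only that branch's own anchor columns. As above, the segment positions and helper are illustrative assumptions, not part of the noether API.

```python
import torch

# Hypothetical, contiguous token layout: name -> (start, length).
segments = {
    "surface_anchors": (0, 2),
    "surface_queries": (2, 3),
    "volume_anchors": (5, 2),
    "volume_queries": (7, 3),
}
total = 10

def pattern_mask(query_tokens, key_value_tokens):
    """Boolean mask: True where a query position may attend to a key position."""
    mask = torch.zeros(total, total, dtype=torch.bool)
    for q in query_tokens:
        qs, ql = segments[q]
        for kv in key_value_tokens:
            ks, kl = segments[kv]
            mask[qs:qs + ql, ks:ks + kl] = True
    return mask

# Self-anchor: each branch attends only to its own anchors.
self_mask = pattern_mask(["surface_anchors", "surface_queries"], ["surface_anchors"]) | \
            pattern_mask(["volume_anchors", "volume_queries"], ["volume_anchors"])

# A volume query (position 7) sees only the volume anchors (positions 5, 6).
print(self_mask[7].nonzero().flatten().tolist())  # -> [5, 6]
```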