noether.core.schemas.modules.attention

Base attention configs and backward-compatibility re-exports of attention configs that have moved.

The base configs (AttentionConfig, TokenSpec, AttentionPattern) have no matching module class, so they remain in this module. The concrete attention configs now live next to their matching classes in noether.modeling.modules.attention and are re-exported here for backward compatibility.

Classes

AttentionConfig

Configuration for an attention module.

TokenSpec

Specification for a token type in the attention mechanism.

AttentionPattern

Defines which tokens attend to which other tokens.

Module Contents

class noether.core.schemas.modules.attention.AttentionConfig(/, **data)

Bases: pydantic.BaseModel

Configuration for an attention module. Because there are many different attention implementations, the schema allows extra fields, so the same schema can be used for all attention modules.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

model_config

Pydantic model configuration. Extra fields are allowed, as described in the class description.

hidden_dim: int = None

Dimensionality of the hidden features.

num_heads: int = None

Number of attention heads.

use_rope: bool = None

Whether to use Rotary Positional Embeddings (RoPE).

dropout: float = None

Dropout rate for the attention weights and output projection.

init_weights: noether.core.types.InitWeightsMode = None

Weight initialization strategy.

bias: bool = None

Whether to use bias terms in linear layers.

head_dim: int | None = None

Dimensionality of each attention head.

qk_norm: bool = None

Whether to apply layer normalization to the query and key features before computing attention scores.
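As a rough illustration of what qk_norm controls, the sketch below normalizes a single query and key vector with a parameter-free layer normalization before taking their scaled dot product. It is a stdlib-only sketch of the general technique, not the module's implementation; the epsilon value and the absence of learnable scale and shift parameters are assumptions.

```python
import math


def layer_norm(x: list[float], eps: float = 1e-6) -> list[float]:
    """Parameter-free layer norm: zero mean, unit variance across features."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def attention_score(q: list[float], k: list[float], qk_norm: bool) -> float:
    """Scaled dot-product score for one query/key pair."""
    if qk_norm:
        # Normalize queries and keys before computing the score.
        q, k = layer_norm(q), layer_norm(k)
    return sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q))


q = [10.0, 20.0, 30.0, 40.0]
k = [1.0, 2.0, 3.0, 4.0]
print(attention_score(q, k, qk_norm=False))  # 150.0
print(attention_score(q, k, qk_norm=True))   # approximately 2.0
```

Without normalization the score grows with the magnitude of q and k; after normalization each vector has norm sqrt(d), so the scaled score is bounded by sqrt(d).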

validate_hidden_dim_and_num_heads()

Validates hidden_dim and num_heads together.

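The validator's body is not shown in this reference. The stdlib sketch below illustrates one plausible reading of its name and is entirely hypothetical: it assumes hidden_dim must be divisible by num_heads and that a missing head_dim defaults to hidden_dim // num_heads. Neither rule is confirmed here, and AttentionDims is an illustrative stand-in, not part of noether.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AttentionDims:
    """Hypothetical stand-in for the dimension fields of AttentionConfig."""

    hidden_dim: int
    num_heads: int
    head_dim: int | None = None

    def __post_init__(self) -> None:
        # ASSUMPTION: mirrors what validate_hidden_dim_and_num_heads might check.
        if self.hidden_dim % self.num_heads != 0:
            raise ValueError(
                f"hidden_dim ({self.hidden_dim}) must be divisible by "
                f"num_heads ({self.num_heads})"
            )
        if self.head_dim is None:
            # ASSUMPTION: the default head size splits hidden_dim evenly across heads.
            self.head_dim = self.hidden_dim // self.num_heads


dims = AttentionDims(hidden_dim=192, num_heads=3)
print(dims.head_dim)  # 64
```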
class noether.core.schemas.modules.attention.TokenSpec(/, **data)

Bases: pydantic.BaseModel

Specification for a token type in the attention mechanism.

When size is None, the token group is not present in the input tensor and its key/value representations will be loaded from a KV cache instead.
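To make the role of size concrete, the sketch below takes an ordered list of (name, size) pairs and works out which slice of the concatenated input sequence each group occupies, setting aside groups whose size is None because their keys and values come from the KV cache. The helper token_slices, the example group names, and the assumption that present groups are concatenated in list order are hypothetical, for illustration only.

```python
from __future__ import annotations


def token_slices(
    specs: list[tuple[str, int | None]],
) -> tuple[dict[str, slice], list[str]]:
    """Map each present token group to its slice of the input sequence.

    HYPOTHETICAL helper: assumes present groups are concatenated in list order.
    Groups with size None are not in the input and are returned separately,
    since their keys/values would be read from a KV cache.
    """
    slices: dict[str, slice] = {}
    cached: list[str] = []
    offset = 0
    for name, size in specs:
        if size is None:
            cached.append(name)  # loaded from the KV cache, not the input
            continue
        slices[name] = slice(offset, offset + size)
        offset += size
    return slices, cached


slices, cached = token_slices(
    [("surface_anchors", 4), ("volume_anchors", None), ("surface_queries", 2)]
)
print(slices)  # {'surface_anchors': slice(0, 4, None), 'surface_queries': slice(4, 6, None)}
print(cached)  # ['volume_anchors']
```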

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

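name: str

Name of the token group, formed from a domain and an attention type joined by an underscore (e.g., “surface_anchors”).

size: int | None = None

Number of tokens in the group, or None if the group is not present in the input tensor and its key/value representations are loaded from a KV cache.
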
classmethod from_dict(token_dict)

Create a TokenSpec from a dictionary containing a single key-value pair that maps the token name to its size.

Parameters:

token_dict (dict[str, int | None])

Return type:

TokenSpec

to_dict()

Convert the TokenSpec to a single-entry dictionary that maps the token name to its size.

Return type:

dict[str, int | None]
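The round trip between a TokenSpec and its dictionary form can be sketched with a stdlib stand-in. This assumes, based on the single key-value pair described above, that the key is name and the value is size; TokenSpecSketch is a hypothetical class, not the pydantic model documented here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TokenSpecSketch:
    """Hypothetical stand-in for TokenSpec (name plus optional size)."""

    name: str
    size: int | None = None

    @classmethod
    def from_dict(cls, token_dict: dict[str, int | None]) -> TokenSpecSketch:
        # Expect exactly one {name: size} entry.
        if len(token_dict) != 1:
            raise ValueError(f"expected a single key-value pair, got {token_dict!r}")
        ((name, size),) = token_dict.items()
        return cls(name=name, size=size)

    def to_dict(self) -> dict[str, int | None]:
        return {self.name: self.size}


spec = TokenSpecSketch.from_dict({"surface_anchors": 16})
print(spec)            # TokenSpecSketch(name='surface_anchors', size=16)
print(spec.to_dict())  # {'surface_anchors': 16}
```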

property domain: str

Extract token domain from the name (e.g., “surface” from “surface_anchors”).

Return type:

str

property attn_type: str

Extract attention type from the name (e.g., “anchors” from “surface_anchors”).

Return type:

str
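The domain and attn_type properties split name into two parts. The sketch below assumes the split happens at the first underscore and that a name without an underscore is an error; how the real properties handle names with several underscores, or none, is not specified here, so the function split_token_name and those choices are hypothetical.

```python
def split_token_name(name: str) -> tuple[str, str]:
    """Split '<domain>_<attn_type>' at the first underscore (ASSUMPTION)."""
    domain, sep, attn_type = name.partition("_")
    if not sep:
        # ASSUMPTION: treat names without an underscore as invalid.
        raise ValueError(f"token name {name!r} has no underscore")
    return domain, attn_type


domain, attn_type = split_token_name("surface_anchors")
print(domain)     # surface
print(attn_type)  # anchors
```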

class noether.core.schemas.modules.attention.AttentionPattern(/, **data)

Bases: pydantic.BaseModel

Defines which tokens attend to which other tokens: the token groups listed in query_tokens attend to the token groups listed in key_value_tokens.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

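query_tokens: collections.abc.Sequence[str]

Names of the token groups that act as queries.

key_value_tokens: collections.abc.Sequence[str]

Names of the token groups that provide the keys and values.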
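One way to read a list of AttentionPattern entries is as a block-structured attention mask over the concatenated token groups: position i may attend to position j when i's group appears in some pattern's query_tokens and j's group appears in that same pattern's key_value_tokens. The stdlib sketch below builds such a boolean mask. The function build_mask, the tuple-based inputs, and the example group names are hypothetical, and the real modules may apply patterns differently (for example, one attention call per pattern rather than a single mask).

```python
from __future__ import annotations

from collections.abc import Sequence


def build_mask(
    specs: Sequence[tuple[str, int]],
    patterns: Sequence[tuple[Sequence[str], Sequence[str]]],
) -> list[list[bool]]:
    """Boolean mask where mask[i][j] is True if token i may attend to token j.

    HYPOTHETICAL: each pattern is a (query_tokens, key_value_tokens) pair and
    token groups are laid out back to back in the order of `specs`.
    """
    # Group name of every position in the concatenated sequence.
    groups = [name for name, size in specs for _ in range(size)]
    # All (query group, key/value group) pairs allowed by any pattern.
    allowed = {
        (q, kv)
        for query_tokens, kv_tokens in patterns
        for q in query_tokens
        for kv in kv_tokens
    }
    return [[(gi, gj) in allowed for gj in groups] for gi in groups]


mask = build_mask(
    specs=[("surface_anchors", 2), ("surface_queries", 1)],
    patterns=[
        (["surface_anchors"], ["surface_anchors"]),  # anchors attend to anchors
        (["surface_queries"], ["surface_anchors"]),  # queries read from anchors
    ],
)
for row in mask:
    print(row)
```

In this example every position attends only to the two anchor positions, so no token attends to the query group.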