noether.modeling.modules.attention

Submodules

Attributes

ATTENTION_REGISTRY

Classes

`DotProductAttention`	Scaled dot-product attention module.
`DotProductAttentionConfig`	Configuration for the Dot Product attention module.
`PerceiverAttention`	Perceiver-style attention module, similar to a cross-attention module.
`PerceiverAttentionConfig`	Configuration for the Perceiver attention module.
`TransolverAttention`	Transolver attention module, adapted from https://github.com/thuml/Transolver/blob/main/Car-Design-ShapeNetCar/models/Transolver.py
`TransolverAttentionConfig`	Configuration for the Transolver attention module.
`TransolverPlusPlusAttention`	Transolver++ Attention module as implemented in https://github.com/thuml/Transolver_plus/blob/main/models/Transolver_plus.py
`TransolverPlusPlusAttentionConfig`	Configuration for the Transolver++ attention module.

Package Contents

class noether.modeling.modules.attention.DotProductAttention(config)

Bases: torch.nn.Module

Scaled dot-product attention module.

Parameters:

config (noether.core.schemas.modules.attention.AttentionConfig) – Configuration for the DotProductAttention module. See AttentionConfig for available options.

num_heads = None

head_dim

init_weights = None

use_rope = None

dropout = None

proj_dropout

q

k

v

proj

forward(x, attn_mask=None, freqs=None)

Forward function of the DotProductAttention module.

Parameters:

x (torch.Tensor) – Tensor to apply self-attention over, shape (batch size, sequence length, hidden_dim).
attn_mask (torch.Tensor | None) – Attention mask; required for causal attention (i.e., no attention to future tokens). Defaults to None.
freqs (torch.Tensor | None) – Frequencies for Rotary Positional Embedding (RoPE) of queries/keys. None if use_rope=False.

Returns:

Output of the attention module.

Return type:

torch.Tensor
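To make the computation concrete, here is a minimal sketch of multi-head scaled dot-product self-attention that reuses the documented attribute names q, k, v and proj. It is an illustration under assumptions, not the noether implementation: the class name MinimalDotProductAttention is hypothetical, RoPE (freqs) and init_weights are omitted, and the boolean attn_mask follows the convention of PyTorch's F.scaled_dot_product_attention (True means "may attend").

```python
from __future__ import annotations

import torch
import torch.nn.functional as F
from torch import nn


class MinimalDotProductAttention(nn.Module):
    """Hypothetical sketch of multi-head self-attention (no RoPE, default init)."""

    def __init__(self, hidden_dim: int, num_heads: int, dropout: float = 0.0):
        super().__init__()
        assert hidden_dim % num_heads == 0, "hidden_dim must be divisible by num_heads"
        self.num_heads = num_heads
        self.head_dim = hidden_dim // num_heads
        self.dropout = dropout
        self.q = nn.Linear(hidden_dim, hidden_dim)
        self.k = nn.Linear(hidden_dim, hidden_dim)
        self.v = nn.Linear(hidden_dim, hidden_dim)
        self.proj = nn.Linear(hidden_dim, hidden_dim)

    def _split_heads(self, t: torch.Tensor) -> torch.Tensor:
        # (batch, seq, hidden) -> (batch, heads, seq, head_dim)
        b, n, _ = t.shape
        return t.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)

    def forward(self, x: torch.Tensor, attn_mask: torch.Tensor | None = None) -> torch.Tensor:
        q = self._split_heads(self.q(x))
        k = self._split_heads(self.k(x))
        v = self._split_heads(self.v(x))
        # softmax(q k^T / sqrt(head_dim)) v, computed by PyTorch's fused kernel
        out = F.scaled_dot_product_attention(
            q, k, v, attn_mask=attn_mask, dropout_p=self.dropout if self.training else 0.0
        )
        # (batch, heads, seq, head_dim) -> (batch, seq, hidden)
        out = out.transpose(1, 2).reshape(x.shape[0], x.shape[1], -1)
        return self.proj(out)
```

For causal attention, a lower-triangular boolean mask such as `torch.ones(n, n, dtype=torch.bool).tril()` broadcasts over batch and heads, so each token only attends to itself and earlier tokens.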

class noether.modeling.modules.attention.DotProductAttentionConfig(/, **data)

Bases: noether.core.schemas.modules.attention.AttentionConfig

Configuration for the Dot Product attention module.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

class noether.modeling.modules.attention.PerceiverAttention(config)

Bases: torch.nn.Module

Perceiver-style attention module, similar to a cross-attention module: the queries q attend to a separate key/value input kv.

Supports KV caching: when kv_cache is provided, the projected K/V tensors (with RoPE already applied) are loaded from the cache instead of being recomputed from kv.

Parameters:

config (noether.core.schemas.modules.attention.AttentionConfig) – Configuration for the PerceiverAttention module. See AttentionConfig for available options.

num_heads = None

head_dim

init_weights = None

use_rope = None

k

v

q

proj

dropout = None

proj_dropout

forward(q, kv=None, attn_mask=None, q_freqs=None, k_freqs=None, kv_cache=None)

Forward function of the PerceiverAttention module.

Parameters:

q (torch.Tensor) – Query tensor, shape (batch size, number of points/tokens, hidden_dim).
kv (torch.Tensor | None) – Key/value tensor, shape (batch size, number of latent tokens, kv_dim). Can be None when kv_cache is provided.
attn_mask (torch.Tensor | None) – When applying causal attention, an attention mask is required. Defaults to None.
q_freqs (torch.Tensor | None) – Frequencies for Rotary Positional Embedding (RoPE) of queries. None if use_rope=False.
k_freqs (torch.Tensor | None) – Frequencies for Rotary Positional Embedding (RoPE) of keys. None if use_rope=False. Not needed when loading from kv_cache (RoPE was already applied).
kv_cache (dict[str, torch.Tensor] | None) – Cached K/V tensors from a previous forward pass. Structure: {"k": tensor, "v": tensor}. When provided, kv and k_freqs are ignored.

Returns:

Tuple of (output, new_kv_cache).

Return type:

tuple[torch.Tensor, dict[str, torch.Tensor] | None]
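The KV-caching behavior described above can be sketched as follows. This is a hypothetical, simplified cross-attention module (MinimalPerceiverAttention is not part of noether): RoPE is omitted, the cache uses the documented {"k": tensor, "v": tensor} structure, and storing the cached tensors already split into heads, with shape (batch, heads, tokens, head_dim), is an assumption. Unlike the documented return type, this sketch always returns the cache it used.

```python
from __future__ import annotations

import torch
import torch.nn.functional as F
from torch import nn


class MinimalPerceiverAttention(nn.Module):
    """Hypothetical cross-attention sketch with a {"k", "v"} cache (no RoPE)."""

    def __init__(self, hidden_dim: int, num_heads: int, kv_dim: int | None = None):
        super().__init__()
        kv_dim = hidden_dim if kv_dim is None else kv_dim  # mirrors the kv_dim default
        self.num_heads = num_heads
        self.head_dim = hidden_dim // num_heads
        self.q = nn.Linear(hidden_dim, hidden_dim)
        self.k = nn.Linear(kv_dim, hidden_dim)
        self.v = nn.Linear(kv_dim, hidden_dim)
        self.proj = nn.Linear(hidden_dim, hidden_dim)

    def _heads(self, t: torch.Tensor) -> torch.Tensor:
        # (batch, tokens, hidden) -> (batch, heads, tokens, head_dim)
        b, n, _ = t.shape
        return t.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)

    def forward(self, q, kv=None, attn_mask=None, kv_cache=None):
        if kv_cache is not None:
            # Reuse previously projected keys/values; kv is ignored
            k, v = kv_cache["k"], kv_cache["v"]
        else:
            if kv is None:
                raise ValueError("either kv or kv_cache must be provided")
            k, v = self._heads(self.k(kv)), self._heads(self.v(kv))
        out = F.scaled_dot_product_attention(self._heads(self.q(q)), k, v, attn_mask=attn_mask)
        out = out.transpose(1, 2).reshape(q.shape[0], q.shape[1], -1)
        return self.proj(out), {"k": k, "v": v}
```

Caching pays off when many query batches attend to the same latent tokens: the K/V projections are computed once and only the query projection and attention run per call.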

class noether.modeling.modules.attention.PerceiverAttentionConfig(/, **data)

Bases: noether.core.schemas.modules.attention.AttentionConfig

Configuration for the Perceiver attention module.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

kv_dim: int | None = None – Dimensionality of the key/value features. If None, hidden_dim is used.

set_kv_dim()
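The "if None, use hidden_dim" default could be implemented with a pydantic after-validator. The sketch below assumes that is roughly what set_kv_dim does; SketchPerceiverConfig is a hypothetical stand-in with only two fields, not the real config class.

```python
from typing import Optional

from pydantic import BaseModel, model_validator


class SketchPerceiverConfig(BaseModel):
    """Hypothetical stand-in for PerceiverAttentionConfig (two fields only)."""

    hidden_dim: int
    kv_dim: Optional[int] = None

    @model_validator(mode="after")
    def set_kv_dim(self):
        # Assumed behavior: fall back to hidden_dim when kv_dim is not given
        if self.kv_dim is None:
            self.kv_dim = self.hidden_dim
        return self
```

For example, `SketchPerceiverConfig(hidden_dim=64).kv_dim` resolves to 64, while an explicit `kv_dim=16` is kept.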

class noether.modeling.modules.attention.TransolverAttention(config)

Bases: torch.nn.Module

Transolver attention module, adapted from https://github.com/thuml/Transolver/blob/main/Car-Design-ShapeNetCar/models/Transolver.py with the following changes:

- Readable reshaping operations via einops.
- Merged qkv linear layer for higher GPU utilization.
- F.scaled_dot_product_attention instead of a slower manual PyTorch attention implementation.
- Support for masking tokens (required to process variable-sized inputs).

Parameters:

config (noether.core.schemas.modules.attention.AttentionConfig) – Configuration for the Transolver attention module. See AttentionConfig for available options.

num_heads = None

dropout = None

temperature

in_project_x

in_project_fx

in_project_slice

q

k

v

proj

proj_dropout

create_slices(x, num_input_points, attn_mask=None)

Project a set of points onto a fixed number of slices, using the slice weights computed per token.

Parameters:

x (torch.Tensor) – Input tensor with shape (batch_size, num_input_points, hidden_dim).
num_input_points (int) – Number of input points.
attn_mask (torch.Tensor | None) – Mask that excludes certain tokens from the attention. Defaults to None.

Returns:

Tuple of the projected slice tokens and the slice weights.
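The slicing step can be sketched as a standalone function: each point spreads itself over the slices with a temperature-scaled softmax, and each slice token is the weight-normalized sum of point features. This is a hedged illustration, not the noether code; the function name create_slices_sketch, the token_mask argument (True means keep), masking by zeroing the weights of padded points, and the 1e-5 epsilon are all assumptions. Inputs are assumed to be already split into heads.

```python
import torch


def create_slices_sketch(fx, slice_logits, temperature, token_mask=None):
    """Hypothetical Transolver-style slicing.

    fx:           (batch, heads, num_points, head_dim) per-point features
    slice_logits: (batch, heads, num_points, num_slices) projection logits
    temperature:  scalar or tensor broadcastable to slice_logits
    token_mask:   optional (batch, num_points) bool, True = real token
    """
    # Soft assignment of each point to the slices (softmax over num_slices)
    slice_weights = torch.softmax(slice_logits / temperature, dim=-1)
    if token_mask is not None:
        # Padded points contribute nothing to any slice
        slice_weights = slice_weights * token_mask[:, None, :, None]
    # Weighted sum of point features per slice, normalized by the total weight
    slice_norm = slice_weights.sum(dim=2)  # (batch, heads, num_slices)
    slice_tokens = torch.einsum("bhnc,bhng->bhgc", fx, slice_weights)
    slice_tokens = slice_tokens / (slice_norm[..., None] + 1e-5)
    return slice_tokens, slice_weights
```

Because padded points receive zero weight, their feature values cannot leak into the slice tokens, which is what makes variable-sized inputs in one batch possible.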

forward(x, attn_mask=None)

Forward pass of the Transolver attention module.

Parameters:

x (torch.Tensor) – Input tensor with shape (batch_size, seqlen, hidden_dim).
attn_mask (torch.Tensor | None) – Attention mask tensor with shape (batch_size). Defaults to None.

Returns:

Tensor after applying the Transolver attention mechanism.
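The overall structure of Transolver-style attention is slice, attend among the few slice tokens, then deslice back to the points with the same weights. The sketch below illustrates that flow under assumptions: MinimalSliceAttention is hypothetical, masking and dropout are omitted, and separate q, k, v layers acting on head_dim follow the attribute list above rather than the merged qkv layer mentioned in the class description. The temperature initialization of 0.5 per head follows the upstream Transolver repository.

```python
import torch
import torch.nn.functional as F
from torch import nn


class MinimalSliceAttention(nn.Module):
    """Hypothetical sketch of Transolver physics attention (no mask, no dropout)."""

    def __init__(self, hidden_dim: int, num_heads: int, num_slices: int):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = hidden_dim // num_heads
        self.in_project_x = nn.Linear(hidden_dim, hidden_dim)
        self.in_project_fx = nn.Linear(hidden_dim, hidden_dim)
        self.in_project_slice = nn.Linear(self.head_dim, num_slices)
        self.temperature = nn.Parameter(torch.full((1, num_heads, 1, 1), 0.5))
        self.q = nn.Linear(self.head_dim, self.head_dim, bias=False)
        self.k = nn.Linear(self.head_dim, self.head_dim, bias=False)
        self.v = nn.Linear(self.head_dim, self.head_dim, bias=False)
        self.proj = nn.Linear(hidden_dim, hidden_dim)

    def _heads(self, t: torch.Tensor) -> torch.Tensor:
        # (batch, points, hidden) -> (batch, heads, points, head_dim)
        b, n, _ = t.shape
        return t.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x_mid = self._heads(self.in_project_x(x))
        fx_mid = self._heads(self.in_project_fx(x))
        # 1) Slice: soft-assign each point to the learned slices
        w = torch.softmax(self.in_project_slice(x_mid) / self.temperature, dim=-1)
        tokens = torch.einsum("bhnc,bhng->bhgc", fx_mid, w) / (w.sum(2)[..., None] + 1e-5)
        # 2) Attend among the slice tokens only
        tokens = F.scaled_dot_product_attention(self.q(tokens), self.k(tokens), self.v(tokens))
        # 3) Deslice: broadcast slice tokens back to the points
        out = torch.einsum("bhgc,bhng->bhnc", tokens, w)
        out = out.transpose(1, 2).reshape(x.shape[0], x.shape[1], -1)
        return self.proj(out)
```

Since attention runs over num_slices tokens instead of all points, its cost grows with num_slices squared rather than with the square of the number of points, and the output is equivariant to permutations of the input points.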

class noether.modeling.modules.attention.TransolverAttentionConfig(/, **data)

Bases: noether.core.schemas.modules.attention.AttentionConfig

Configuration for the Transolver attention module.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

num_slices: int = None – Number of slices to project the input tokens to.

class noether.modeling.modules.attention.TransolverPlusPlusAttention(config)

Bases: torch.nn.Module

Transolver++ Attention module as implemented in https://github.com/thuml/Transolver_plus/blob/main/models/Transolver_plus.py

Parameters:

config (noether.core.schemas.modules.attention.AttentionConfig) – Configuration for the TransolverPlusPlusAttention module. See AttentionConfig for available options.

dim_head

num_heads = None

scale

softmax

dropout = None

bias

proj_temperature

in_project_x

in_project_slice

q

k

v

to_out

forward(x, attn_mask=None)

Forward pass of the Transolver++ attention module.

Parameters:

x (torch.Tensor) – Input tensor with shape (batch_size, seqlen, hidden_dim).
attn_mask (torch.Tensor | None) – Attention mask tensor with shape (batch_size). Defaults to None.

Returns:

Tensor after applying the Transolver++ attention mechanism.

class noether.modeling.modules.attention.TransolverPlusPlusAttentionConfig(/, **data)

Bases: noether.modeling.modules.attention.TransolverAttentionConfig

Configuration for the Transolver++ attention module.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

use_overparameterization: bool = None – Whether to use overparameterization for the slice projection.

use_adaptive_temperature: bool = None – Whether to use an adaptive temperature for the slice selection.

temperature_activation: Literal['sigmoid', 'softplus', 'exp'] | None = None – Activation function for the adaptive temperature.

use_gumbel_softmax: bool = None – Whether to use Gumbel-Softmax for the slice selection.
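To illustrate what the temperature and Gumbel-Softmax options control, here is a hedged sketch of a slice-weight head. It is not the noether implementation: SketchSliceWeights, the single linear layer predicting a per-point temperature, the 1e-2 floor, the fixed fallback temperature of 0.5, and applying Gumbel noise only in training mode are all assumptions. use_overparameterization is not modeled because its exact effect is not documented here.

```python
import torch
import torch.nn.functional as F
from torch import nn

# Assumed mapping for temperature_activation; each maps real numbers to values >= 0
_ACTIVATIONS = {"sigmoid": torch.sigmoid, "softplus": F.softplus, "exp": torch.exp}


class SketchSliceWeights(nn.Module):
    """Hypothetical slice-weight head illustrating the Transolver++ config flags."""

    def __init__(
        self,
        head_dim: int,
        num_slices: int,
        use_adaptive_temperature: bool = True,
        temperature_activation: str = "softplus",
        use_gumbel_softmax: bool = True,
        fixed_temperature: float = 0.5,
    ):
        super().__init__()
        self.in_project_slice = nn.Linear(head_dim, num_slices)
        self.use_gumbel_softmax = use_gumbel_softmax
        self.fixed_temperature = fixed_temperature
        if use_adaptive_temperature:
            # Per-point temperature predicted from the point features
            self.activation = _ACTIVATIONS[temperature_activation]
            self.proj_temperature = nn.Linear(head_dim, 1)
        else:
            self.activation = None

    def forward(self, x_mid: torch.Tensor) -> torch.Tensor:
        logits = self.in_project_slice(x_mid)
        if self.activation is not None:
            tau = self.activation(self.proj_temperature(x_mid)) + 1e-2  # strictly positive
        else:
            tau = self.fixed_temperature
        if self.use_gumbel_softmax and self.training:
            # Gumbel(0, 1) noise: -log(-log(U)) with U ~ Uniform(0, 1)
            u = torch.rand_like(logits).clamp(1e-8, 1.0 - 1e-8)
            logits = logits - torch.log(-torch.log(u))
        return torch.softmax(logits / tau, dim=-1)
```

A low per-point temperature pushes that point toward a single slice, while a high one spreads it across slices; the Gumbel noise makes the assignment stochastic during training.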

noether.modeling.modules.attention.ATTENTION_REGISTRY: dict[str, type[torch.nn.Module]]

Registry mapping names (strings) to attention module classes.
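Since every attention class above is constructed from a single config, a registry lookup would presumably resemble `ATTENTION_REGISTRY[name](config)`; the actual keys are not listed in this document. The sketch below shows the lookup pattern with a hypothetical stand-in registry, using plain torch.nn layers as placeholder entries, plus a clearer error for unknown names.

```python
import torch
from torch import nn

# Hypothetical registry with placeholder entries; the real ATTENTION_REGISTRY
# holds the noether attention classes under names not shown in this document.
SKETCH_REGISTRY: dict[str, type[nn.Module]] = {
    "identity": nn.Identity,
    "linear": nn.Linear,
}


def build_module(name: str, *args, **kwargs) -> nn.Module:
    """Instantiate a registered module class by name."""
    try:
        cls = SKETCH_REGISTRY[name]
    except KeyError:
        available = sorted(SKETCH_REGISTRY)
        raise KeyError(f"unknown module type {name!r}; available: {available}") from None
    return cls(*args, **kwargs)
```

Keeping construction behind a name lookup lets a configuration file select the attention variant as a string without importing the class directly.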