noether.modeling.modules

Submodules

Classes

Activation

Supported activation functions.

DotProductAttention

Scaled dot-product attention module.

PerceiverAttention

Perceiver style attention module. This module is similar to a cross-attention module.

TransolverAttention

Adapted from https://github.com/thuml/Transolver/blob/main/Car-Design-ShapeNetCar/models/Transolver.py

PerceiverBlock

For a self-attention module, the input tensors for the query, key, and value are the same. The PerceiverBlock takes different input tensors for the query and the key/value.

PerceiverBlockConfig

Configuration for the PerceiverBlock module.

TransformerBlock

A transformer block with a single attention layer and a feedforward layer.

TransformerBlockConfig

Configuration for a transformer block.

DeepPerceiverDecoder

A deep Perceiver decoder module. Can be configured with different numbers of layers and hidden dimensions.

SupernodePooling

Supernode pooling layer.

SupernodePoolingConfig

ContinuousSincosEmbed

Embedding layer for continuous coordinates using sine and cosine functions.

LayerScale

LayerScale module scales the input tensor by a learnable parameter gamma.

LinearProjection

LinearProjection is a linear projection layer that can be used for 1D, 2D, and 3D data.

UnquantizedDropPath

Unquantized drop path (Stochastic Depth, https://arxiv.org/abs/1603.09382) per sample. Unquantized means that dropped paths are still calculated.

MLP

Implements a Multi-Layer Perceptron (MLP) with a configurable number of layers, hidden dimension, activation functions and weight initialization methods.

UpActDownMlp

UpActDownMlp is a vanilla MLP with an up-projection followed by a GELU activation function and a down-projection to the original input dimension.

Package Contents

class noether.modeling.modules.Activation(*args, **kwds)

Bases: enum.Enum

Supported activation functions.

GELU
TANH
SIGMOID
RELU
LEAKY_RELU
SOFTPLUS
ELU
SILU
build()

Create a new instance of the activation module.

Return type:

torch.nn.Module
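The enum-with-factory pattern behind build() can be sketched in plain Python: each member maps to a factory that is called to produce the activation. This is a hypothetical stand-in (ActivationSketch and _FACTORIES are illustrative names), not noether's implementation; NumPy functions replace the torch.nn modules that the real build() returns, and only a subset of members is shown.

```python
from __future__ import annotations

import enum
import math
from typing import Callable

import numpy as np

ArrayFn = Callable[[np.ndarray], np.ndarray]

# Element-wise error function, needed for the exact (erf-based) GELU.
_erf = np.vectorize(math.erf)


class ActivationSketch(enum.Enum):
    """Hypothetical stand-in for noether's Activation enum (subset of members)."""

    GELU = "gelu"
    TANH = "tanh"
    SIGMOID = "sigmoid"
    RELU = "relu"
    SILU = "silu"

    def build(self) -> ArrayFn:
        # Look up the factory registered for this member and call it.
        return _FACTORIES[self]()


_FACTORIES: dict[ActivationSketch, Callable[[], ArrayFn]] = {
    ActivationSketch.GELU: lambda: (lambda x: 0.5 * x * (1.0 + _erf(x / math.sqrt(2.0)))),
    ActivationSketch.TANH: lambda: np.tanh,
    ActivationSketch.SIGMOID: lambda: (lambda x: 1.0 / (1.0 + np.exp(-x))),
    ActivationSketch.RELU: lambda: (lambda x: np.maximum(x, 0.0)),
    ActivationSketch.SILU: lambda: (lambda x: x / (1.0 + np.exp(-x))),
}

relu = ActivationSketch.RELU.build()
print(relu(np.array([-1.0, 2.0])))  # [0. 2.]
```

Keeping the mapping outside the enum body keeps the member values simple strings, which is convenient when activations are selected by name in a configuration.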

class noether.modeling.modules.DotProductAttention(config)

Bases: torch.nn.Module

Scaled dot-product attention module.

Parameters:

config (noether.core.schemas.modules.attention.AttentionConfig) – Configuration for the DotProductAttention module. See AttentionConfig for available options.

num_heads = None
head_dim
init_weights = None
use_rope = None
dropout = None
proj_dropout
q
k
v
proj
forward(x, attn_mask=None, freqs=None)

Forward function of the DotProductAttention module.

Parameters:
  • x (torch.Tensor) – Tensor to apply self-attention over, shape (batch size, sequence length, hidden_dim).

  • attn_mask (torch.Tensor | None) – For causal attention (i.e., no attention to future tokens), an attention mask should be provided. Defaults to None.

  • freqs (torch.Tensor | None) – Frequencies for Rotary Positional Embedding (RoPE) of queries/keys. None if use_rope=False.

Returns:

Returns the output of the attention module.

Return type:

torch.Tensor
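As a rough illustration of what the forward pass computes, the NumPy sketch below implements multi-head scaled dot-product self-attention with separate query, key, value and output projections (mirroring the q, k, v and proj attributes). It is not the module's code: RoPE and dropout are omitted, and the boolean-mask convention used here (True means "may attend", as in torch.nn.functional.scaled_dot_product_attention) is an assumption.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def dot_product_attention(x, w_q, w_k, w_v, w_proj, num_heads, attn_mask=None):
    """Multi-head self-attention sketch (no RoPE, no dropout).

    x: (batch, seq_len, dim); w_q/w_k/w_v/w_proj: (dim, dim);
    attn_mask: bool array broadcastable to (seq_len, seq_len), True = may attend.
    """
    b, n, d = x.shape
    head_dim = d // num_heads

    def split_heads(t):
        # (b, n, d) -> (b, heads, n, head_dim)
        return t.reshape(b, n, num_heads, head_dim).transpose(0, 2, 1, 3)

    q, k, v = split_heads(x @ w_q), split_heads(x @ w_k), split_heads(x @ w_v)
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(head_dim)  # (b, heads, n, n)
    if attn_mask is not None:
        scores = np.where(attn_mask, scores, -np.inf)  # block disallowed pairs
    out = softmax(scores) @ v  # (b, heads, n, head_dim)
    out = out.transpose(0, 2, 1, 3).reshape(b, n, d)  # merge heads
    return out @ w_proj


rng = np.random.default_rng(0)
x = rng.normal(size=(2, 5, 8))
w = [rng.normal(size=(8, 8)) for _ in range(4)]
causal = np.tril(np.ones((5, 5), dtype=bool))  # token i sees tokens 0..i
y = dot_product_attention(x, *w, num_heads=2, attn_mask=causal)
print(y.shape)  # (2, 5, 8)
```

With the causal mask, changing the last token leaves the outputs of all earlier positions untouched, which is the property the attn_mask parameter is meant to provide.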

class noether.modeling.modules.PerceiverAttention(config)

Bases: torch.nn.Module

Perceiver style attention module. This module is similar to a cross-attention module.

Supports KV caching: when kv_cache is provided, the projected K/V tensors (with RoPE already applied) are loaded from the cache instead of being recomputed from kv.

Parameters:

config (noether.core.schemas.modules.attention.AttentionConfig) – Configuration for the PerceiverAttention module. See AttentionConfig for available options.

num_heads = None
head_dim
init_weights = None
use_rope = None
k
v
q
proj
dropout = None
proj_dropout
forward(q, kv=None, attn_mask=None, q_freqs=None, k_freqs=None, kv_cache=None)

Forward function of the PerceiverAttention module.

Parameters:
  • q (torch.Tensor) – Query tensor, shape (batch size, number of points/tokens, hidden_dim).

  • kv (torch.Tensor | None) – Key/value tensor, shape (batch size, number of latent tokens, kv_dim). Can be None when kv_cache is provided.

  • attn_mask (torch.Tensor | None) – When applying causal attention, an attention mask is required. Defaults to None.

  • q_freqs (torch.Tensor | None) – Frequencies for Rotary Positional Embedding (RoPE) of queries. None if use_rope=False.

  • k_freqs (torch.Tensor | None) – Frequencies for Rotary Positional Embedding (RoPE) of keys. None if use_rope=False. Not needed when loading from kv_cache (RoPE was already applied).

  • kv_cache (dict[str, torch.Tensor] | None) – Cached K/V tensors from a previous forward pass. Structure: {"k": tensor, "v": tensor}. When provided, kv and k_freqs are ignored.

Returns:

Tuple of (output, new_kv_cache).

Return type:

tuple[torch.Tensor, dict[str, torch.Tensor] | None]
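The KV-cache contract described above can be sketched as follows. This single-head NumPy version is illustrative only: it omits RoPE, multiple heads and dropout, and it assumes (following the PerceiverBlock return description) that None is returned as the new cache when K/V are loaded from the cache.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def perceiver_attention(q, w_q, w_k, w_v, w_proj, kv=None, kv_cache=None):
    """Single-head cross-attention sketch returning (output, new_kv_cache).

    q: (batch, num_queries, dim); kv: (batch, num_latents, kv_dim).
    """
    if kv_cache is not None:
        # Reuse the projected K/V; kv is ignored and no new cache is produced.
        k, v = kv_cache["k"], kv_cache["v"]
        new_cache = None
    else:
        k, v = kv @ w_k, kv @ w_v  # project kv_dim -> dim
        new_cache = {"k": k, "v": v}
    qp = q @ w_q
    attn = softmax(qp @ k.transpose(0, 2, 1) / np.sqrt(qp.shape[-1]))
    return (attn @ v) @ w_proj, new_cache


rng = np.random.default_rng(0)
dim, kv_dim = 8, 6
w_q, w_proj = rng.normal(size=(dim, dim)), rng.normal(size=(dim, dim))
w_k, w_v = rng.normal(size=(kv_dim, dim)), rng.normal(size=(kv_dim, dim))
latents = rng.normal(size=(2, 4, kv_dim))
q1, q2 = rng.normal(size=(2, 3, dim)), rng.normal(size=(2, 7, dim))

out1, cache = perceiver_attention(q1, w_q, w_k, w_v, w_proj, kv=latents)
# A second set of queries attends to the same latents without re-projecting them.
out2, none_cache = perceiver_attention(q2, w_q, w_k, w_v, w_proj, kv_cache=cache)
```

Caching pays off when many query batches (for example, chunks of output positions) are decoded against the same latent tokens.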

class noether.modeling.modules.TransolverAttention(config)

Bases: torch.nn.Module

Adapted from https://github.com/thuml/Transolver/blob/main/Car-Design-ShapeNetCar/models/Transolver.py with the following changes:

  • Readable reshaping operations via einops.

  • Merged qkv linear layer for higher GPU utilization.

  • F.scaled_dot_product_attention instead of the slower plain PyTorch attention.

  • Possibility to mask tokens (required to process variable-sized inputs).

Parameters:

config (noether.core.schemas.modules.attention.AttentionConfig) – Configuration for the Transolver attention module. See AttentionConfig for available options.

num_heads = None
dropout = None
temperature
in_project_x
in_project_fx
in_project_slice
q
k
v
proj
proj_dropout
create_slices(x, num_input_points, attn_mask=None)

Given a set of points, project them to a fixed number of slices using the slice weights computed per token.

Parameters:
  • x (torch.Tensor) – Input tensor with shape (batch_size, num_input_points, hidden_dim).

  • num_input_points (int) – Number of input points.

  • attn_mask (torch.Tensor | None) – Mask to exclude certain tokens from the attention. Defaults to None.

Returns:

Tensors with the projected slice tokens and the slice weights.

forward(x, attn_mask=None)

Forward pass of the Transolver attention module.

Parameters:
  • x (torch.Tensor) – Input tensor with shape (batch_size, seqlen, hidden_dim).

  • attn_mask (torch.Tensor | None) – Attention mask tensor with shape (batch_size). Defaults to None.

Returns:

Tensor after applying the transolver attention mechanism.
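The slicing idea behind create_slices and forward can be sketched in NumPy. This follows the Physics-Attention scheme of the linked Transolver code in simplified form: a single head, no separate q/k/v projections among the slice tokens, and a fixed scalar temperature. The mask here is assumed to be a boolean (batch, num_points) array with True for real points; that layout is an illustration choice, not taken from the module.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def transolver_attention(x, w_x, w_fx, w_slice, temperature=1.0, mask=None, eps=1e-5):
    """Single-head Physics-Attention sketch.

    x: (batch, n, dim); w_x, w_fx: (dim, dim); w_slice: (dim, num_slices);
    mask: optional bool (batch, n), True = real point, False = padding.
    """
    d = x.shape[-1]
    # Slice weights: every point distributes itself softly over the slices.
    slice_w = softmax((x @ w_x) @ w_slice / temperature, axis=-1)  # (b, n, s)
    if mask is not None:
        slice_w = slice_w * mask[..., None]  # padded points contribute nothing
    fx = x @ w_fx  # features that get pooled into the slices
    norm = slice_w.sum(axis=1)[..., None]  # (b, s, 1) total weight per slice
    slices = slice_w.transpose(0, 2, 1) @ fx / (norm + eps)  # (b, s, dim)
    # Attention among the (few) slice tokens instead of among all n points.
    scores = slices @ slices.transpose(0, 2, 1) / np.sqrt(d)
    slices = softmax(scores) @ slices
    # "Deslice": map slice tokens back to points with the same weights.
    return slice_w @ slices  # (b, n, dim)


rng = np.random.default_rng(0)
x = rng.normal(size=(1, 10, 8))
w_x, w_fx, w_slice = rng.normal(size=(8, 8)), rng.normal(size=(8, 8)), rng.normal(size=(8, 4))
y = transolver_attention(x, w_x, w_fx, w_slice)
print(y.shape)  # (1, 10, 8)
```

Because attention runs over num_slices tokens rather than all points, the quadratic cost is in the number of slices, and masking padded points out of the slice weights lets variable-sized point clouds share a batch.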

class noether.modeling.modules.PerceiverBlock(config)

Bases: torch.nn.Module

For a self-attention module, the input tensors for the query, key, and value are the same. The PerceiverBlock takes different input tensors for the query and the key/value.

Parameters:
config (PerceiverBlockConfig) – Configuration of the PerceiverBlock. See PerceiverBlockConfig for available options.

norm1q
norm1kv
attn
ls1
drop_path1
norm2
mlp
ls2
drop_path2
forward(q, kv=None, condition=None, attn_kwargs=None)

Forward pass of the PerceiverBlock.

Parameters:
  • q (torch.Tensor) – Input tensor with shape (batch_size, seqlen/num_tokens, hidden_dim) for the query representations.

  • kv (torch.Tensor | None) – Input tensor with shape (batch_size, seqlen/num_tokens, hidden_dim) for the key and value representations. Can be None when a kv_cache is provided in attn_kwargs.

  • condition (torch.Tensor | None) – Conditioning vector. If provided, the attention and MLP will be scaled, shifted and gated feature-wise with predicted values from this vector.

  • attn_kwargs (dict[str, Any] | None) – Dict with arguments for the attention (such as the attention mask, rope frequencies, or kv_cache). Defaults to None.

Returns:

Tuple of (output_tensor, kv_cache). kv_cache contains cached K/V from the perceiver attention, or None when loading from cache.

Return type:

tuple[torch.Tensor, dict[str, torch.Tensor] | None]
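The attribute names (norm1q, norm1kv, attn, ls1, drop_path1, norm2, mlp, ls2, drop_path2) suggest a standard pre-norm residual layout. The sketch below assumes exactly that layout; it is a NumPy illustration, not the block's code, with single-head cross-attention, no conditioning, drop path treated as the identity (evaluation mode), LayerScale reduced to a scalar, and the key/value dimension equal to the query dimension.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def layer_norm(x, eps=1e-6):
    # Parameter-free LayerNorm over the feature axis.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def cross_attention(q, kv):
    scores = q @ kv.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    return softmax(scores) @ kv


def perceiver_block(q, kv, mlp, ls1=1.0, ls2=1.0):
    # Attention branch: queries and key/values are normalized separately.
    q = q + ls1 * cross_attention(layer_norm(q), layer_norm(kv))
    # Feed-forward branch on the updated queries.
    q = q + ls2 * mlp(layer_norm(q))
    return q


rng = np.random.default_rng(0)
w1, w2 = rng.normal(size=(8, 32)), rng.normal(size=(32, 8))


def mlp(h):
    # Up-projection, ReLU, down-projection.
    return np.maximum(h @ w1, 0.0) @ w2


queries = rng.normal(size=(2, 16, 8))
latents = rng.normal(size=(2, 4, 8))
out = perceiver_block(queries, latents, mlp)
print(out.shape)  # (2, 16, 8)
```

The output keeps the query shape regardless of how many key/value tokens there are, which is what lets a Perceiver-style block read from a fixed latent set at arbitrary query resolutions.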

class noether.modeling.modules.PerceiverBlockConfig(/, **data)

Bases: noether.modeling.modules.blocks.transformer.TransformerBlockConfig

Configuration for the PerceiverBlock module.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

kv_dim: int | None = None

Dimensionality of the key and value representations. Defaults to None. If None, hidden_dim is used.

set_kv_dim()

Set kv_dim to hidden_dim if not provided.

Return type:

PerceiverBlockConfig

perceiver_attention_config()
Return type:

noether.modeling.modules.attention.PerceiverAttentionConfig

modulation_linear_projection_config()
Return type:

noether.modeling.modules.layers.LinearProjectionConfig | None

class noether.modeling.modules.TransformerBlock(config)

Bases: torch.nn.Module

A transformer block with a single attention layer and a feedforward layer.

Parameters:

config (TransformerBlockConfig) – Configuration for the transformer block. See TransformerBlockConfig for available options.

config
norm1
attention_block
ls1
drop_path1
norm2
mlp
ls2
drop_path2
forward(x, condition=None, attn_kwargs=None)

Forward pass of the transformer block.

Parameters:
  • x (torch.Tensor) – Input tensor with shape (batch_size, seqlen/num_tokens, hidden_dim).

  • condition (torch.Tensor | None) – Conditioning vector. If provided, the attention and MLP will be scaled, shifted and gated feature-wise with predicted values from this vector.

  • attn_kwargs (dict[str, Any] | None) – Dict with arguments for the attention (such as the attention mask or rope frequencies). Defaults to None.

Returns:

Tuple of (output_tensor, kv_cache). kv_cache is None when the attention module does not return a cache (e.g. standard DotProductAttention).

Return type:

tuple[torch.Tensor, dict[str, dict[str, torch.Tensor]] | None]
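When condition_dim is set, the configuration notes that the block becomes a Diffusion Transformer (DiT) block. A minimal sketch of DiT-style adaptive-LayerNorm modulation follows. It assumes the conditioning vector is projected to six per-feature vectors (shift, scale and gate for each of the two branches) as in the DiT paper; the chunk order, the name w_mod and the single-head attention are illustrative assumptions, not noether's code.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def layer_norm(x, eps=1e-6):
    mu = x.mean(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(x.var(axis=-1, keepdims=True) + eps)


def self_attention(x):
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(x.shape[-1])
    return softmax(scores) @ x


def modulate(x, shift, scale):
    # Per-sample, per-feature affine modulation broadcast over tokens.
    return x * (1.0 + scale[:, None, :]) + shift[:, None, :]


def dit_block(x, condition, w_mod, mlp):
    """x: (batch, n, dim); condition: (batch, cond_dim); w_mod: (cond_dim, 6 * dim)."""
    shift1, scale1, gate1, shift2, scale2, gate2 = np.split(condition @ w_mod, 6, axis=-1)
    x = x + gate1[:, None, :] * self_attention(modulate(layer_norm(x), shift1, scale1))
    x = x + gate2[:, None, :] * mlp(modulate(layer_norm(x), shift2, scale2))
    return x


rng = np.random.default_rng(0)
x = rng.normal(size=(2, 5, 8))
cond = rng.normal(size=(2, 3))
w_mlp = rng.normal(size=(8, 8))


def mlp(h):
    return np.tanh(h @ w_mlp)


y = dit_block(x, cond, rng.normal(size=(3, 48)), mlp)
```

With a zero-initialized modulation projection every gate is zero, so the block starts out as the identity; this is the "adaLN-Zero" initialization described in the DiT paper.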

class noether.modeling.modules.TransformerBlockConfig(/, **data)

Bases: pydantic.BaseModel

Configuration for a transformer block.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

hidden_dim: int = None

Hidden dimension of the transformer block.

num_heads: int = None

Number of attention heads.

mlp_hidden_dim: int | None = None

Hidden dimension of the MLP layer. If set to None, mlp_hidden_dim is set to hidden_dim * mlp_expansion_factor. If both are None, an error is raised.

mlp_expansion_factor: int | None = None

Expansion factor for the MLP hidden dimension relative to the hidden dimension. If ‘mlp_hidden_dim’ is not set, this factor is used to compute it as hidden_dim * mlp_expansion_factor.
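The precedence between mlp_hidden_dim and mlp_expansion_factor described above can be written out as a small helper. This is a plain-Python sketch of the documented rule (resolve_mlp_hidden_dim is a hypothetical name); in the actual config the equivalent check presumably lives in set_mlp_hidden_dim().

```python
from __future__ import annotations


def resolve_mlp_hidden_dim(
    hidden_dim: int,
    mlp_hidden_dim: int | None = None,
    mlp_expansion_factor: int | None = None,
) -> int:
    # An explicit hidden dimension always wins.
    if mlp_hidden_dim is not None:
        return mlp_hidden_dim
    # Otherwise derive it from the expansion factor, if one was given.
    if mlp_expansion_factor is not None:
        return hidden_dim * mlp_expansion_factor
    raise ValueError("Set either mlp_hidden_dim or mlp_expansion_factor.")


print(resolve_mlp_hidden_dim(192, mlp_expansion_factor=4))  # 768
```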

drop_path: float = None

Probability to drop the attention or MLP module. Defaults to 0.0.

attention_constructor: Literal['dot_product', 'perceiver', 'transolver', 'transolver_plusplus'] = 'dot_product'

Constructor of the attention module. Defaults to ‘dot_product’.

layerscale: float | None = None

Initial value of the LayerScale parameter used to scale layer activations. Defaults to None.

condition_dim: int | None = None

Dimension of the conditioning vector. If None, no conditioning is applied. If provided, the transformer block turns into a Diffusion Transformer (DiT) block.

bias: bool = None

Whether to use biases in norm/projections. Defaults to True.

eps: float = None

Epsilon value for the layer normalization. Defaults to 1e-6.

init_weights: noether.core.types.InitWeightsMode = None

Initialization method for the weight matrices of the network. Defaults to “truncnormal002”.

use_rope: bool = None

Whether to use Rotary Positional Embeddings (RoPE).

max_wavelength: int | None = None

Theta parameter for the transformer sine/cosine embedding. Default: 10_000.

attention_arguments: dict

Additional arguments for the attention module that are only needed for a specific attention implementation.

set_mlp_hidden_dim()
set_wavelength_for_rope()
linear_projection_config()
Return type:

noether.modeling.modules.layers.linear_projection.LinearProjectionConfig

layerscale_config()
Return type:

noether.modeling.modules.layers.layer_scale.LayerScaleConfig

drop_path_config()
Return type:

noether.modeling.modules.layers.drop_path.UnquantizedDropPathConfig

modulation_linear_projection_config()
Return type:

LinearProjectionConfig | None

up_act_down_mlp_config()
Return type:

noether.modeling.modules.mlp.upactdown_mlp.UpActDownMLPConfig

class noether.modeling.modules.DeepPerceiverDecoder(config)

Bases: torch.nn.Module

A deep Perceiver decoder module. Can be configured with different numbers of layers and hidden dimensions. Note, however, that this module is not a full-fledged Perceiver, since it only has a cross-attention mechanism.

Parameters:

config (DeepPerceiverDecoderConfig) – Configuration for the DeepPerceiverDecoder module. See DeepPerceiverDecoderConfig for available options.

blocks
forward(kv, queries, attn_kwargs=None, condition=None)

Forward pass of the model.

Parameters:
  • kv (torch.Tensor) – The key-value tensor (batch_size, num_latent_tokens, dim).

  • queries (torch.Tensor) – The query tensor (batch_size, num_output_queries, dim).

  • attn_kwargs (dict[str, Any] | None) – Dict with arguments for the attention (such as the attention mask or rope frequencies). Defaults to None.

  • condition (torch.Tensor | None) – Optional conditioning tensor that can be used in the attention mechanism, e.g., to pass additional conditioning information.

Returns:

The predictions as a sparse tensor of shape (batch_size * num_output_pos, num_out_values).

Return type:

torch.Tensor
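Conceptually, the decoder repeatedly lets the output queries cross-attend to the same latent tokens and then flattens the batch into the sparse layout given under Returns. The NumPy sketch below assumes that structure; block and w_out (a final projection to num_out_values) are hypothetical stand-ins for the configured blocks and output layer.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def block(queries, kv):
    # Minimal residual cross-attention block (no norms, MLP or conditioning).
    scores = queries @ kv.transpose(0, 2, 1) / np.sqrt(queries.shape[-1])
    return queries + softmax(scores) @ kv


def deep_perceiver_decoder(kv, queries, num_blocks, w_out):
    x = queries
    for _ in range(num_blocks):
        x = block(x, kv)  # kv stays fixed; only the queries are refined
    out = x @ w_out  # (batch, num_queries, num_out_values)
    return out.reshape(-1, out.shape[-1])  # (batch * num_queries, num_out_values)


rng = np.random.default_rng(0)
kv = rng.normal(size=(2, 16, 8))  # latent tokens
queries = rng.normal(size=(2, 100, 8))  # e.g. embedded output positions
w_out = rng.normal(size=(8, 4))
pred = deep_perceiver_decoder(kv, queries, num_blocks=3, w_out=w_out)
print(pred.shape)  # (200, 4)
```

In the flattened output, the first num_queries rows belong to the first sample, the next num_queries rows to the second, and so on.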

class noether.modeling.modules.SupernodePooling(config)

Bases: torch.nn.Module

Supernode pooling layer.

The permutation of the supernodes is preserved through the message passing (contrary to the (GP-)UPT code). Additionally, radius is used instead of radius_graph, which is more efficient.

Initialize the SupernodePooling.

Parameters:

config (SupernodePoolingConfig) – Configuration for the SupernodePooling module. See SupernodePoolingConfig for available options.

radius
k
max_degree
spool_pos_mode
readd_supernode_pos
aggregation
input_features_dim
pos_embed
output_dim
compute_src_and_dst_indices(input_pos, supernode_idx, batch_idx=None)

Compute the source and destination indices for the message passing to the supernodes.

Parameters:
  • input_pos (torch.Tensor) – Sparse tensor with shape (batch_size * number of points, 3), representing the input geometries.

  • supernode_idx (torch.Tensor) – Indexes of the supernodes in the sparse tensor input_pos.

  • batch_idx (torch.Tensor | None) – 1D tensor, containing the batch index of each entry in input_pos. Default None.

Returns:

Tuple of (src_idx, dst_idx, local_dst_idx) where src_idx and dst_idx are absolute indices into input_pos and local_dst_idx is a 0-indexed position into supernode_idx (used for scatter_reduce_).

Return type:

tuple[torch.Tensor, torch.Tensor, torch.Tensor]
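A brute-force NumPy sketch of the neighbour search, assuming that every point within radius of a supernode (and in the same sample) sends a message, capped at max_degree per supernode. The real module uses a radius search (per the class description) and may choose which neighbours to keep differently; this version simply keeps the first max_degree in index order.

```python
import numpy as np


def compute_src_and_dst_indices(input_pos, supernode_idx, radius, max_degree, batch_idx=None):
    if batch_idx is None:
        batch_idx = np.zeros(len(input_pos), dtype=int)  # single sample
    src, dst, local_dst = [], [], []
    for local, sn in enumerate(supernode_idx):
        dist = np.linalg.norm(input_pos - input_pos[sn], axis=-1)
        # Neighbours: inside the radius and from the same sample as the supernode.
        nbrs = np.flatnonzero((dist <= radius) & (batch_idx == batch_idx[sn]))[:max_degree]
        src.extend(nbrs.tolist())
        dst.extend([int(sn)] * len(nbrs))  # absolute index into input_pos
        local_dst.extend([local] * len(nbrs))  # position within supernode_idx
    return (
        np.asarray(src, dtype=np.int64),
        np.asarray(dst, dtype=np.int64),
        np.asarray(local_dst, dtype=np.int64),
    )


pos = np.array([[0.0, 0.0], [0.5, 0.0], [3.0, 0.0], [3.2, 0.0]])
src, dst, local_dst = compute_src_and_dst_indices(pos, np.array([0, 2]), radius=1.0, max_degree=32)
print(src, dst, local_dst)  # [0 1 2 3] [0 0 2 2] [0 0 1 1]
```

local_dst is what a later scatter-reduce needs: it indexes rows of the per-supernode output directly, without looking up positions of dst inside supernode_idx.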

create_messages(input_pos, src_idx, dst_idx, supernode_idx, input_features=None)

Create messages for the message passing to the supernodes, based on different positional encoding representations.

Parameters:
  • input_pos (torch.Tensor) – Tensor of shape (batch_size * number_of_points_per_sample, {2,3}), representing the point cloud representation of the input geometry.

  • src_idx (torch.Tensor) – Index of the source nodes from input_pos.

  • dst_idx (torch.Tensor) – Indices of the destination nodes in the input_pos tensor. These indices should match the corresponding supernode indices.

  • supernode_idx (torch.Tensor) – Indexes of the node in input_pos that are considered supernodes.

  • input_features (torch.Tensor | None) – Optional per-point input features.

Raises:

NotImplementedError – Raised if the mode is not implemented. Either “abspos”, “relpos” or “absrelpos” are allowed.

Returns:

Tensor with messages for the message passing into the supernodes, and the embedded coordinates of the supernodes.

Return type:

tuple[torch.Tensor, torch.Tensor]

accumulate_messages(x, local_dst_idx, supernode_idx)

Method to accumulate the messages of neighbouring points into the supernodes.

Parameters:
  • x (torch.Tensor) – Tensor containing the message of each neighbour.

  • local_dst_idx (torch.Tensor) – 0-indexed position into supernode_idx for each message (no CUDA sync).

  • supernode_idx (torch.Tensor) – Indexes of the supernode in the input point cloud.

Returns:

Tensor with the aggregated messages for each supernode.

Return type:

torch.Tensor
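The accumulation step amounts to a scatter-reduce over local_dst_idx. A NumPy sketch, assuming the aggregation option from SupernodePoolingConfig ("mean" or "sum") is applied here; np.add.at stands in for torch's scatter_reduce_, and the aggregation argument is part of the sketch rather than the method's signature.

```python
import numpy as np


def accumulate_messages(x, local_dst_idx, supernode_idx, aggregation="mean"):
    """x: (num_messages, dim); local_dst_idx: (num_messages,) positions into supernode_idx."""
    out = np.zeros((len(supernode_idx), x.shape[-1]), dtype=x.dtype)
    np.add.at(out, local_dst_idx, x)  # unbuffered sum: repeated indices accumulate
    if aggregation == "mean":
        counts = np.bincount(local_dst_idx, minlength=len(supernode_idx))
        out /= np.maximum(counts, 1)[:, None]  # avoid dividing by zero for isolated supernodes
    elif aggregation != "sum":
        raise NotImplementedError(aggregation)
    return out


msgs = np.array([[1.0, 2.0], [3.0, 4.0], [10.0, 10.0]])
print(accumulate_messages(msgs, np.array([0, 0, 1]), np.array([0, 2])))
```

Plain fancy-index assignment (out[idx] += x) would keep only one message per repeated index, which is why an unbuffered scatter such as np.add.at is needed.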

forward(input_pos, supernode_idx, batch_idx=None, input_features=None)

Forward pass of the supernode pooling layer.

Parameters:
  • input_pos (torch.Tensor) – Sparse tensor with shape (batch_size * number_of_points_per_sample, 3), representing the point cloud representation of the input geometry.

  • supernode_idx (torch.Tensor) – indexes of the supernodes in the sparse tensor input_pos.

  • batch_idx (torch.Tensor | None) – 1D tensor, containing the batch index of each entry in input_pos. Default None.

  • input_features (torch.Tensor | None) – Sparse tensor with shape (batch_size * number_of_points_per_sample, number_of_features).

Returns:

Tensor with the aggregated messages for each supernode.

Return type:

torch.Tensor | dict[str, torch.Tensor]

class noether.modeling.modules.SupernodePoolingConfig(/, **data)

Bases: pydantic.BaseModel

Parameters:

data (Any)

hidden_dim: int = None

Hidden dimension for positional embeddings, messages and the resulting output vector.

input_dim: int = None

Number of positional dimensions (e.g., input_dim=2 for a 2D position, input_dim=3 for a 3D position).

radius: float | None = None

Radius around each supernode. From points within this radius, messages are passed to the supernode.

k: int | None = None

Number of neighbors for each supernode. From the k-NN points, messages are passed to the supernode.

max_degree: int = None

Maximum degree of the radius graph. Defaults to 32.

spool_pos_mode: Literal['abspos', 'relpos', 'absrelpos'] = None

Type of position embedding: absolute space (“abspos”), relative space (“relpos”) or both (“absrelpos”).

init_weights: noether.core.types.InitWeightsMode = None

Weight initialization of linear layers. Defaults to “truncnormal002”.

readd_supernode_pos: bool = None

If true, the absolute positional encoding of the supernode is concatenated to the supernode vector after message passing and linearly projected back to hidden_dim. Defaults to True.

aggregation: Literal['mean', 'sum'] = None

Aggregation for message passing (“mean” or “sum”).

message_mode: Literal['mlp', 'linear', 'identity'] = None

How messages are created. “mlp” (2 layer MLP), “linear” (nn.Linear), “identity” (nn.Identity). Defaults to “mlp”.

input_features_dim: int | None = None

Number of input features per point. Defaults to None, which means the module runs without input features.

bias: bool = None

Whether to use bias in the linear layers. Defaults to True.

validate_radius_and_k()
class noether.modeling.modules.ContinuousSincosEmbed(config)

Bases: torch.nn.Module

Embedding layer for continuous coordinates using sine and cosine functions. The original implementation from the Attention Is All You Need paper deals with discrete 1D coordinates (i.e., a sequence). However, this implementation can also handle 2D and 3D coordinate systems.

Two frequency schedules are supported via config.mode:

  • "wavelength" (default): geometric wavelengths from 1 to max_wavelength, matching the original Transformer encoding. Use this for integer / unnormalized coordinates.

  • "nerf": log-spaced frequencies from π to π * max_frequency. Use this for coordinates normalized to [-1, 1].
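The two frequency schedules can be sketched as follows. The NumPy code assumes an equal share of hidden_dim per coordinate axis, concatenates sines before cosines, and omits the padding the module applies when hidden_dim is not evenly divisible (see the padding attributes below). The default max_frequency value is purely illustrative.

```python
import numpy as np


def sincos_frequencies(num_freqs, mode="wavelength", max_wavelength=10_000, max_frequency=64.0):
    if mode == "wavelength":
        # Transformer-style: 1/omega grows geometrically from 1 towards max_wavelength.
        return 1.0 / max_wavelength ** (np.arange(num_freqs) / num_freqs)
    if mode == "nerf":
        # Log-spaced angular frequencies from pi to pi * max_frequency.
        return np.geomspace(np.pi, np.pi * max_frequency, num_freqs)
    raise NotImplementedError(mode)


def continuous_sincos_embed(coords, hidden_dim, mode="wavelength", **kwargs):
    """coords: (num_points, input_dim) or (batch, num_points, input_dim)."""
    input_dim = coords.shape[-1]
    assert hidden_dim % (2 * input_dim) == 0, "this sketch omits padding"
    omega = sincos_frequencies(hidden_dim // (2 * input_dim), mode, **kwargs)
    angles = coords[..., None] * omega  # (..., input_dim, num_freqs)
    emb = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return emb.reshape(*coords.shape[:-1], hidden_dim)


coords = np.random.default_rng(0).uniform(-1.0, 1.0, size=(100, 3))
sparse = continuous_sincos_embed(coords, 48, mode="nerf")
print(sparse.shape)  # (100, 48)
```

The "nerf" schedule starts at π, so a coordinate range of [-1, 1] spans exactly one period of the lowest frequency; with the "wavelength" schedule, normalized coordinates would barely move the low-frequency channels, which is why it suits unnormalized coordinates.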

Parameters:

config (ContinuousSincosEmbeddingConfig) – Configuration for the ContinuousSincosEmbed module. See ContinuousSincosEmbeddingConfig for the available options.

omega: torch.Tensor
padding_tensor: torch.Tensor
hidden_dim
input_dim
ndim_padding
sincos_padding
mode
max_wavelength
max_frequency
padding
forward(coords)

Forward method of the ContinuousSincosEmbed layer.

Parameters:

coords (torch.Tensor) – Tensor of coordinates. The shape of the tensor should be [batch size, number of points, coordinate dimension] or [number of points, coordinate dimension].

Raises:

NotImplementedError – Only supports sparse (i.e. [number of points, coordinate dimension]) or dense (i.e. [batch size, number of points, coordinate dimension]) coordinates systems.

Returns:

Tensor with embedded coordinates.

Return type:

torch.Tensor

class noether.modeling.modules.LayerScale(config)

Bases: torch.nn.Module

LayerScale module scales the input tensor by a learnable parameter gamma.

Initialize the LayerScale module.

Parameters:

config (LayerScaleConfig) – Configuration for the LayerScale module. See LayerScaleConfig for details.

forward(x)

Forward function of the LayerScale module.

Parameters:

x (torch.Tensor) – Input tensor to be scaled.

Returns:

Tensor scaled by the gamma parameter.

Return type:

torch.Tensor
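A minimal sketch of the operation, assuming gamma is a per-channel vector initialized to a constant. LayerScaleSketch is a hypothetical NumPy stand-in, and the init value is illustrative; see LayerScaleConfig for the real options.

```python
import numpy as np


class LayerScaleSketch:
    """Hypothetical NumPy stand-in: y = x * gamma, with one gamma per channel."""

    def __init__(self, dim: int, init_value: float = 1e-5):
        self.gamma = np.full(dim, init_value)  # a learnable parameter in the real module

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # gamma broadcasts over all leading (batch, token) axes.
        return x * self.gamma


ls = LayerScaleSketch(4, init_value=0.1)
print(ls(np.ones((2, 3, 4)))[0, 0])  # [0.1 0.1 0.1 0.1]
```

A small init value keeps each residual branch close to zero at the start of training, so a deep stack initially behaves almost like the identity.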

class noether.modeling.modules.LinearProjection(config)

Bases: torch.nn.Module

LinearProjection is a linear projection layer that can be used for 1D, 2D, and 3D data.

Parameters:

config (LinearProjectionConfig) – The configuration of the LinearProjection. See LinearProjectionConfig for available options.

Raises:

NotImplementedError – Raised if the number of dimensions of the input domain is greater than 4.

project: torch.nn.Linear | torch.nn.Conv1d | torch.nn.Conv2d | torch.nn.Conv3d | torch.nn.Identity
init_weights
reset_parameters()
Reset the parameters of the LinearProjection with a specific initialization. Options are “torch” (i.e., default) or “truncnormal002”.

Raises:

NotImplementedError – raised if the specified initialization is not implemented.

Return type:

None

forward(x)

Forward function of the LinearProjection.

Parameters:

x (torch.Tensor) – Input tensor to the LinearProjection.

Returns:

Output tensor from the LinearProjection.

Return type:

torch.Tensor
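The choice between nn.Linear and nn.Conv1d/2d/3d is mostly about memory layout. Assuming the convolutional variants use kernel size 1 on channels-first inputs (an assumption; the description above does not state the kernel size), they compute the same per-position linear map as nn.Linear on channels-last data, which the NumPy sketch below checks:

```python
import numpy as np


def pointwise_projection(x, weight, bias, channels_first):
    """weight: (out_dim, in_dim). Channels-first x: (batch, in_dim, *spatial)."""
    if channels_first:
        # Equivalent to a kernel-size-1 convolution over the spatial axes.
        y = np.einsum("oc,bc...->bo...", weight, x)
        return y + bias.reshape(1, -1, *([1] * (x.ndim - 2)))
    # Channels-last x: (..., in_dim), i.e. what nn.Linear expects.
    return x @ weight.T + bias


rng = np.random.default_rng(0)
w, b = rng.normal(size=(5, 3)), rng.normal(size=5)
x_cf = rng.normal(size=(2, 3, 4, 4))  # 2D data, channels first
y_cf = pointwise_projection(x_cf, w, b, channels_first=True)
y_cl = pointwise_projection(np.moveaxis(x_cf, 1, -1), w, b, channels_first=False)
print(np.allclose(y_cf, np.moveaxis(y_cl, -1, 1)))  # True
```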

class noether.modeling.modules.UnquantizedDropPath(config)

Bases: torch.nn.Module

Unquantized drop path (Stochastic Depth, https://arxiv.org/abs/1603.09382) per sample. Unquantized means that dropped paths are still calculated. The number of dropped paths is fully stochastic, i.e., it can happen that not a single path is dropped or that all paths are dropped. In a quantized drop path, the same number of paths is dropped in each forward pass, resulting in large speedups with high drop_prob values. See https://arxiv.org/abs/2212.04884 for more discussion. UnquantizedDropPath does not provide any speedup; consider using a quantized version if large drop_prob values are used.

Adapted from https://github.com/huggingface/pytorch-image-models/blob/main/timm/layers/drop.py#L150

Initialize the UnquantizedDropPath module.

Parameters:

config (UnquantizedDropPathConfig) – Configuration for the UnquantizedDropPath module. See UnquantizedDropPathConfig for the available options.

drop_prob
scale_by_keep
property keep_prob

Return the keep probability, i.e., the probability of keeping a path, which is 1 - drop_prob.

Returns:

Float value of the keep probability.

forward(x)

Forward function of the UnquantizedDropPath module.

Parameters:

x (torch.Tensor) – Tensor to apply the drop path. Shape: (batch_size, …).

Returns:

Tensor with drop path applied. Shape: (batch_size, …). If drop_prob is 0, the input tensor is returned. If drop_prob is 1, a tensor with zeros is returned.

Return type:

torch.Tensor

extra_repr()

Extra representation of the UnquantizedDropPath module.

Returns:

Return a string representation of the module.
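A NumPy sketch of per-sample, unquantized drop path in the style of the timm drop_path function the class is adapted from: one Bernoulli draw per sample decides whether that sample's whole residual branch is zeroed, and survivors are rescaled by 1 / keep_prob when scale_by_keep is set. The training flag and the rng argument belong to the sketch, not to the module's signature.

```python
import numpy as np


def unquantized_drop_path(x, drop_prob, rng, training=True, scale_by_keep=True):
    if drop_prob == 0.0 or not training:
        return x
    keep_prob = 1.0 - drop_prob
    # One draw per sample, broadcast over all remaining axes.
    mask_shape = (x.shape[0],) + (1,) * (x.ndim - 1)
    mask = (rng.random(mask_shape) < keep_prob).astype(x.dtype)
    if keep_prob > 0.0 and scale_by_keep:
        mask /= keep_prob  # keeps the expected value of the output equal to x
    # x (the branch output) was fully computed even for dropped samples: "unquantized".
    return x * mask


rng = np.random.default_rng(0)
out = unquantized_drop_path(np.ones((1000, 4)), drop_prob=0.25, rng=rng)
dropped = np.mean(out[:, 0] == 0.0)  # fraction of samples whose path was dropped
```

The number of dropped samples varies from call to call around drop_prob * batch_size, which is the "fully stochastic" behaviour described above.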

class noether.modeling.modules.MLP(config)

Bases: torch.nn.Module

Implements a Multi-Layer Perceptron (MLP) with a configurable number of layers, hidden dimension, activation functions and weight initialization methods. Only one hidden dimension is supported for simplicity, i.e., all hidden layers have the same dimension. The MLP always has one input layer and one output layer. When num_layers=0, the MLP is a two-layer network with one non-linearity in between. When num_layers>=1, the MLP has additional hidden layers.

Initialize the MLP.

Parameters:

config (MLPConfig) – Configuration object for the MLP. See MLPConfig for available options.

init_weights
mlp
reset_parameters()
Reset the parameters of the MLP with a specific initialization. Options are “torch” (i.e., default) or “truncnormal002”.

Raises:

NotImplementedError – raised if the specified initialization is not implemented.

Return type:

None

forward(x)

Forward function of the MLP.

Parameters:

x (torch.Tensor) – Input tensor to the MLP.

Returns:

Output tensor from the MLP.

Return type:

torch.Tensor
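The relation between num_layers and the number of linear layers stated in the class description can be made concrete. A NumPy sketch, assuming every extra hidden layer adds one hidden_dim-to-hidden_dim linear layer and that the activation sits between consecutive layers (biases and initialization omitted; mlp_layer_shapes and mlp_forward are hypothetical helpers):

```python
import numpy as np


def mlp_layer_shapes(input_dim, hidden_dim, output_dim, num_layers):
    # num_layers=0 -> input->hidden->output (two linear layers, one non-linearity).
    dims = [input_dim] + [hidden_dim] * (num_layers + 1) + [output_dim]
    return list(zip(dims[:-1], dims[1:]))


def mlp_forward(x, weights, act=np.tanh):
    for i, w in enumerate(weights):
        x = x @ w
        if i < len(weights) - 1:  # no activation after the output layer
            x = act(x)
    return x


print(mlp_layer_shapes(3, 64, 1, num_layers=0))  # [(3, 64), (64, 1)]
print(mlp_layer_shapes(3, 64, 1, num_layers=2))  # [(3, 64), (64, 64), (64, 64), (64, 1)]

rng = np.random.default_rng(0)
weights = [rng.normal(size=shape) for shape in mlp_layer_shapes(3, 64, 1, num_layers=2)]
y = mlp_forward(rng.normal(size=(10, 3)), weights)
```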

class noether.modeling.modules.UpActDownMlp(config)

Bases: torch.nn.Module

UpActDownMlp is a vanilla MLP with an up-projection followed by a GELU activation function and a down-projection to the original input dimension.

Initialize the UpActDownMlp.

Parameters:

config (UpActDownMLPConfig) – The configuration of the UpActDownMlp.

init_weights
fc1
act
fc2
reset_parameters()
Reset the parameters of the MLP with a specific initialization. Options are “torch” (i.e., default) or “truncnormal002”.

Raises:

NotImplementedError – raised if the specified initialization is not implemented.

Return type:

None

forward(x)

Forward function of the UpActDownMlp.

Parameters:

x (torch.Tensor) – Input tensor to the MLP.

Returns:

Output tensor from the MLP.

Return type:

torch.Tensor
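A NumPy sketch of the up-projection, GELU, down-projection sequence (fc1, act, fc2), using the exact erf-based GELU. The 4x expansion in the example is illustrative, not a documented default.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)


def gelu(x):
    # Exact GELU: x * Phi(x), with Phi the standard normal CDF.
    return 0.5 * x * (1.0 + _erf(x / math.sqrt(2.0)))


def up_act_down_mlp(x, w1, b1, w2, b2):
    h = gelu(x @ w1 + b1)  # fc1: dim -> hidden_dim, then GELU
    return h @ w2 + b2  # fc2: hidden_dim -> dim (back to the input dimension)


rng = np.random.default_rng(0)
dim, hidden = 8, 32
w1, b1 = rng.normal(size=(dim, hidden)), np.zeros(hidden)
w2, b2 = rng.normal(size=(hidden, dim)), np.zeros(dim)
x = rng.normal(size=(2, 5, dim))
y = up_act_down_mlp(x, w1, b1, w2, b2)
print(y.shape)  # (2, 5, 8)
```

Because the output dimension equals the input dimension, the block can be dropped into a residual branch (x + mlp(x)) without any extra projection.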