noether.modeling.modules¶
Submodules¶
Classes¶
Activation | Supported activation functions.
DotProductAttention | Scaled dot-product attention module.
PerceiverAttention | Perceiver style attention module. This module is similar to a cross-attention module.
TransolverAttention | Adapted from https://github.com/thuml/Transolver/blob/main/Car-Design-ShapeNetCar/models/Transolver.py
PerceiverBlock | For a self-attention module, the input tensor for the query, key, and value are the same. The PerceiverBlock takes different input tensors for the query and the key/value.
PerceiverBlockConfig | Configuration for the PerceiverBlock module.
TransformerBlock | A transformer block with a single attention layer and a feedforward layer.
TransformerBlockConfig | Configuration for a transformer block.
DeepPerceiverDecoder | A deep Perceiver decoder module. Can be configured with different number of layers and hidden dimensions.
SupernodePooling | Supernode pooling layer.
ContinuousSincosEmbed | Embedding layer for continuous coordinates using sine and cosine functions.
LayerScale | LayerScale module scales the input tensor by a learnable parameter gamma.
LinearProjection | LinearProjection is a linear projection layer that can be used for 1D, 2D, and 3D data.
UnquantizedDropPath | Unquantized drop path (Stochastic Depth, https://arxiv.org/abs/1603.09382) per sample. Unquantized means that dropped paths are still calculated.
MLP | Implements a Multi-Layer Perceptron (MLP) with a configurable number of layers, hidden dimension, activation function and weight initialization method.
UpActDownMlp | UpActDownMlp is a vanilla MLP with an up-projection followed by a GELU activation function and a down-projection to the original input dim.
Package Contents¶
- class noether.modeling.modules.Activation(*args, **kwds)¶
Bases: enum.Enum
Supported activation functions.
- GELU¶
- TANH¶
- SIGMOID¶
- RELU¶
- LEAKY_RELU¶
- SOFTPLUS¶
- ELU¶
- SILU¶
- build()¶
Create a new instance of the activation module.
- Return type:
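The enum-plus-factory pattern described above can be sketched as follows. This is a minimal illustration in plain Python, not the library's code: `ToyActivation` and its three members are hypothetical stand-ins, and the callables operate on single floats instead of tensors.

```python
import math
from enum import Enum
from typing import Callable


class ToyActivation(Enum):
    """Hypothetical stand-in for an activation enum with a build() factory."""

    RELU = "relu"
    TANH = "tanh"
    SIGMOID = "sigmoid"

    def build(self) -> Callable[[float], float]:
        # Each call returns a new callable for the selected member.
        if self is ToyActivation.RELU:
            return lambda x: max(0.0, x)
        if self is ToyActivation.TANH:
            return math.tanh
        return lambda x: 1.0 / (1.0 + math.exp(-x))


act = ToyActivation.RELU.build()
print(act(-2.0), act(3.0))
```

Keeping the factory on the enum member means a configuration file only needs to store the member name, and the module is constructed at build time.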
- class noether.modeling.modules.DotProductAttention(config)¶
Bases: torch.nn.Module
Scaled dot-product attention module.
- Parameters:
config (noether.core.schemas.modules.attention.AttentionConfig) – Configuration for the DotProductAttention module. See AttentionConfig for available options.
- num_heads = None¶
- head_dim¶
- init_weights = None¶
- use_rope = None¶
- dropout = None¶
- proj_dropout¶
- q¶
- k¶
- v¶
- proj¶
- forward(x, attn_mask=None, freqs=None)¶
Forward function of the DotProductAttention module.
- Parameters:
x (torch.Tensor) – Tensor to apply self-attention over, shape (batch size, sequence length, hidden_dim).
attn_mask (torch.Tensor | None) – For causal attention (i.e., no attention over future tokens), an attention mask should be provided. Defaults to None.
freqs (torch.Tensor | None) – Frequencies for Rotary Positional Embedding (RoPE) of queries/keys. None if use_rope=False.
- Returns:
Returns the output of the attention module.
- Return type:
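As a rough illustration of what scaled dot-product attention with a causal mask computes, here is a single-head sketch in plain Python. It is an assumption-laden toy: the function name `toy_attention` is hypothetical, there are no learned q/k/v projections, no RoPE, no dropout, and the mask is built from a `causal` flag instead of being passed as a tensor.

```python
import math


def softmax(row):
    # Numerically stable softmax over one row of scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def toy_attention(q, k, v, causal=False):
    """q, k, v: lists of token vectors (seq_len x head_dim) for one head."""
    head_dim = len(q[0])
    out = []
    for i, qi in enumerate(q):
        scores = []
        for j, kj in enumerate(k):
            if causal and j > i:
                # Masked position: token i must not attend to future token j.
                scores.append(float("-inf"))
            else:
                dot = sum(a * b for a, b in zip(qi, kj))
                scores.append(dot / math.sqrt(head_dim))
        weights = softmax(scores)
        out.append([sum(w * vj[d] for w, vj in zip(weights, v)) for d in range(len(v[0]))])
    return out


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = toy_attention(x, x, x, causal=True)  # self-attention: q, k and v share one input
```

With the causal mask, the first token can only attend to itself, so its output equals its own value vector.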
- class noether.modeling.modules.PerceiverAttention(config)¶
Bases: torch.nn.Module
Perceiver style attention module. This module is similar to a cross-attention module.
Supports KV caching: when kv_cache is provided, the projected K/V tensors (with RoPE already applied) are loaded from the cache instead of being recomputed from kv.
- Parameters:
config (noether.core.schemas.modules.attention.AttentionConfig) – Configuration for the PerceiverAttention module. See AttentionConfig for available options.
- num_heads = None¶
- head_dim¶
- init_weights = None¶
- use_rope = None¶
- k¶
- v¶
- q¶
- proj¶
- dropout = None¶
- proj_dropout¶
- forward(q, kv=None, attn_mask=None, q_freqs=None, k_freqs=None, kv_cache=None)¶
Forward function of the PerceiverAttention module.
- Parameters:
q (torch.Tensor) – Query tensor, shape (batch size, number of points/tokens, hidden_dim).
kv (torch.Tensor | None) – Key/value tensor, shape (batch size, number of latent tokens, kv_dim). Can be None when kv_cache is provided.
attn_mask (torch.Tensor | None) – When applying causal attention, an attention mask is required. Defaults to None.
q_freqs (torch.Tensor | None) – Frequencies for Rotary Positional Embedding (RoPE) of queries. None if use_rope=False.
k_freqs (torch.Tensor | None) – Frequencies for Rotary Positional Embedding (RoPE) of keys. None if use_rope=False. Not needed when loading from kv_cache (RoPE was already applied).
kv_cache (dict[str, torch.Tensor] | None) – Cached K/V tensors from a previous forward pass. Structure: {"k": tensor, "v": tensor}. When provided, kv and k_freqs are ignored.
- Returns:
Tuple of (output, new_kv_cache).
- Return type:
tuple[torch.Tensor, dict[str, torch.Tensor] | None]
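The caching contract described above (project K/V once, then reuse them for later query batches) can be sketched in plain Python. Everything here is illustrative: `toy_cross_attention`, `project` and the weight matrices are hypothetical, a single head is used, RoPE is omitted, and returning `None` as the new cache on a cache hit is an assumption borrowed from the PerceiverBlock description.

```python
import math


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def project(tokens, weight):
    """Apply an (in_dim x out_dim) weight matrix to every token vector."""
    return [
        [sum(t[i] * weight[i][j] for i in range(len(t))) for j in range(len(weight[0]))]
        for t in tokens
    ]


def toy_cross_attention(q, kv=None, kv_cache=None, *, w_k, w_v):
    """Hypothetical single-head cross-attention returning (output, new_kv_cache)."""
    if kv_cache is not None:
        # Cache hit: reuse the projected keys/values and ignore `kv`.
        k, v = kv_cache["k"], kv_cache["v"]
        new_cache = None
    else:
        if kv is None:
            raise ValueError("kv is required when no kv_cache is given")
        k, v = project(kv, w_k), project(kv, w_v)
        new_cache = {"k": k, "v": v}
    scale = 1.0 / math.sqrt(len(k[0]))
    out = []
    for qi in q:
        weights = softmax([scale * sum(a * b for a, b in zip(qi, kj)) for kj in k])
        out.append([sum(w * vj[d] for w, vj in zip(weights, v)) for d in range(len(v[0]))])
    return out, new_cache


w_k = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # kv_dim=3 -> head_dim=2
w_v = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
latents = [[1.0, 2.0, 3.0], [0.0, -1.0, 1.0]]
queries = [[0.5, 0.5], [1.0, -1.0]]

first, cache = toy_cross_attention(queries, latents, w_k=w_k, w_v=w_v)
second, no_cache = toy_cross_attention(queries, kv_cache=cache, w_k=w_k, w_v=w_v)
```

The second call never touches `latents`, which is the point of the cache: when many query batches decode from the same latent tokens, the K/V projections are paid for only once.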
- class noether.modeling.modules.TransolverAttention(config)¶
Bases: torch.nn.Module
Adapted from https://github.com/thuml/Transolver/blob/main/Car-Design-ShapeNetCar/models/Transolver.py, with the following changes:
- Readable reshaping operations via einops
- Merged qkv linear layer for higher GPU utilization
- F.scaled_dot_product_attention instead of slow PyTorch attention
- Possibility to mask tokens (required to process variable-sized inputs)
- Parameters:
config (noether.core.schemas.modules.attention.AttentionConfig) – Configuration for the Transolver attention module. See AttentionConfig for available options.
- num_heads = None¶
- dropout = None¶
- temperature¶
- in_project_x¶
- in_project_fx¶
- in_project_slice¶
- q¶
- k¶
- v¶
- proj¶
- proj_dropout¶
- create_slices(x, num_input_points, attn_mask=None)¶
Given a set of points, project them to a fixed number of slices using the computed slice weights per token.
- Parameters:
x (torch.Tensor) – Input tensor with shape (batch_size, num_input_points, hidden_dim).
num_input_points (int) – Number of input points.
attn_mask (torch.Tensor | None) – Mask that excludes certain tokens from the attention. Defaults to None.
- Returns:
Tensor with the projected slice tokens and the slice weights.
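One way to picture the slicing step is the following plain-Python sketch, which follows the general Transolver idea of softmax slice weights per token and weight-normalized slice tokens. It is not the module's code: `toy_create_slices` is a hypothetical name, the slice logits are passed in directly instead of coming from a learned projection, only one head is shown, and masking is left out.

```python
import math


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def toy_create_slices(features, slice_logits, temperature=1.0):
    """features: N x C, slice_logits: N x S.

    Returns (slice_tokens of shape S x C, slice_weights of shape N x S).
    """
    # Each token distributes itself over the slices with a softmax.
    weights = [softmax([l / temperature for l in row]) for row in slice_logits]
    num_slices = len(slice_logits[0])
    channels = len(features[0])
    tokens = []
    for s in range(num_slices):
        # Normalize by the total weight assigned to slice s (epsilon avoids 0/0).
        norm = sum(w[s] for w in weights) + 1e-5
        tokens.append(
            [sum(w[s] * f[c] for w, f in zip(weights, features)) / norm for c in range(channels)]
        )
    return tokens, weights


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
slice_logits = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0], [0.0, 0.0]]
slice_tokens, slice_weights = toy_create_slices(features, slice_logits)
```

Four input points are reduced to two slice tokens, so attention over the slices costs the same regardless of how many points the input contains.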
- forward(x, attn_mask=None)¶
Forward pass of the Transolver attention module.
- Parameters:
x (torch.Tensor) – Input tensor with shape (batch_size, seqlen, hidden_dim).
attn_mask (torch.Tensor | None) – Attention mask tensor with shape (batch_size). Defaults to None.
- Returns:
Tensor after applying the Transolver attention mechanism.
- class noether.modeling.modules.PerceiverBlock(config)¶
Bases: torch.nn.Module
For a self-attention module, the input tensor for the query, key, and value are the same. The PerceiverBlock takes different input tensors for the query and the key/value.
- Parameters:
config (PerceiverBlockConfig) – Configuration of the PerceiverBlock. See PerceiverBlockConfig for available options.
- norm1q¶
- norm1kv¶
- attn¶
- ls1¶
- drop_path1¶
- norm2¶
- mlp¶
- ls2¶
- drop_path2¶
- forward(q, kv=None, condition=None, attn_kwargs=None)¶
Forward pass of the PerceiverBlock.
- Parameters:
q (torch.Tensor) – Input tensor with shape (batch_size, seqlen/num_tokens, hidden_dim) for the query representations.
kv (torch.Tensor | None) – Input tensor with shape (batch_size, seqlen/num_tokens, hidden_dim) for the key and value representations. Can be None when a kv_cache is provided in attn_kwargs.
condition (torch.Tensor | None) – Conditioning vector. If provided, the attention and MLP will be scaled, shifted and gated feature-wise with predicted values from this vector.
attn_kwargs (dict[str, Any] | None) – Dict with arguments for the attention (such as the attention mask, rope frequencies, or kv_cache). Defaults to None.
- Returns:
Tuple of (output_tensor, kv_cache). kv_cache contains cached K/V from the perceiver attention, or None when loading from cache.
- Return type:
tuple[torch.Tensor, dict[str, torch.Tensor] | None]
- class noether.modeling.modules.PerceiverBlockConfig(/, **data)¶
Bases: noether.modeling.modules.blocks.transformer.TransformerBlockConfig
Configuration for the PerceiverBlock module.
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
data (Any)
- kv_dim: int | None = None¶
Dimensionality of the key and value representations. Defaults to None. If None, hidden_dim is used.
- set_kv_dim()¶
Set kv_dim to hidden_dim if not provided.
- Return type:
- perceiver_attention_config()¶
- modulation_linear_projection_config()¶
- Return type:
noether.modeling.modules.layers.LinearProjectionConfig | None
- class noether.modeling.modules.TransformerBlock(config)¶
Bases: torch.nn.Module
A transformer block with a single attention layer and a feedforward layer.
- Parameters:
config (TransformerBlockConfig) – Configuration for the transformer block. See TransformerBlockConfig for available options.
- config¶
- norm1¶
- attention_block¶
- ls1¶
- drop_path1¶
- norm2¶
- mlp¶
- ls2¶
- drop_path2¶
- forward(x, condition=None, attn_kwargs=None)¶
Forward pass of the transformer block.
- Parameters:
x (torch.Tensor) – Input tensor with shape (batch_size, seqlen/num_tokens, hidden_dim).
condition (torch.Tensor | None) – Conditioning vector. If provided, the attention and MLP will be scaled, shifted and gated feature-wise with predicted values from this vector.
attn_kwargs (dict[str, Any] | None) – Dict with arguments for the attention (such as the attention mask or rope frequencies). Defaults to None.
- Returns:
Tuple of (output_tensor, kv_cache). kv_cache is None when the attention module does not return a cache (e.g. standard DotProductAttention).
- Return type:
tuple[torch.Tensor, dict[str, dict[str, torch.Tensor]] | None]
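The feature-wise scale, shift and gate conditioning mentioned for the `condition` argument can be illustrated with the convention commonly used in Diffusion Transformer (DiT) blocks. The sketch below assumes that convention (`x * (1 + scale) + shift` before a branch, `x + gate * branch(x)` after it); the source does not state the exact formula used by the block, and `modulate` and `gated_residual` are hypothetical helper names operating on a single token vector.

```python
def modulate(x, shift, scale):
    """Feature-wise affine modulation applied before the attention or MLP branch."""
    return [xi * (1.0 + sc) + sh for xi, sh, sc in zip(x, shift, scale)]


def gated_residual(x, branch_out, gate):
    """Residual connection whose branch output is gated feature-wise."""
    return [xi + g * b for xi, b, g in zip(x, branch_out, gate)]


x = [1.0, -2.0]

# In a real block, shift/scale/gate would be predicted from the conditioning
# vector by a linear layer; here they are fixed numbers for illustration.
shift, scale, gate = [0.5, 0.5], [1.0, 0.0], [0.0, 2.0]

h = modulate(x, shift, scale)          # [2.5, -1.5]
branch = [10.0 * v for v in h]         # stand-in for attention or MLP output
y = gated_residual(x, branch, gate)    # gate 0.0 leaves feature 0 untouched
```

With zero shift, scale and gate, the block reduces to the identity, which is why zero initialization of the modulation layer is a common choice in DiT-style models.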
- class noether.modeling.modules.TransformerBlockConfig(/, **data)¶
Bases: pydantic.BaseModel
Configuration for a transformer block.
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
data (Any)
- hidden_dim¶
Hidden dimension of the transformer block.
- mlp_hidden_dim¶
Hidden dimension of the MLP layer. If set to None, mlp_hidden_dim is set to hidden_dim * mlp_expansion_factor in the TransformerConfig. If both are None, an error is raised.
- mlp_expansion_factor: int | None = None¶
Expansion factor for the MLP hidden dimension relative to the hidden dimension. If ‘mlp_hidden_dim’ is not set, this factor is used to compute it as hidden_dim * mlp_expansion_factor.
- attention_constructor: Literal['dot_product', 'perceiver', 'transolver', 'transolver_plusplus'] = 'dot_product'¶
Constructor of the attention module. Defaults to ‘dot_product’.
- condition_dim: int | None = None¶
Dimension of the conditioning vector. If none, no conditioning is applied. If provided, the transformer block will turn into a Diffusion Transformer (DiT) block.
- init_weights: noether.core.types.InitWeightsMode = None¶
Initialization method for the weight matrices of the network. Defaults to “truncnormal002”.
- max_wavelength: int | None = None¶
Theta parameter for the transformer sine/cosine embedding. Default: 10_000.
- attention_arguments: dict¶
Additional arguments for the attention module that are only needed for a specific attention implementation.
- set_wavelength_for_rope()¶
- linear_projection_config()¶
- layerscale_config()¶
- drop_path_config()¶
- modulation_linear_projection_config()¶
- Return type:
LinearProjectionConfig | None
- up_act_down_mlp_config()¶
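The two fallback rules documented for these configs (mlp_hidden_dim derived from hidden_dim * mlp_expansion_factor, and kv_dim falling back to hidden_dim) can be sketched with standard-library dataclasses. This is only an illustration of the rules as described: the `Toy...` classes are hypothetical, the real configs are pydantic models with validators, and the exception type raised there may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyTransformerBlockConfig:
    hidden_dim: int
    mlp_hidden_dim: Optional[int] = None
    mlp_expansion_factor: Optional[int] = None

    def __post_init__(self) -> None:
        if self.mlp_hidden_dim is None:
            if self.mlp_expansion_factor is None:
                raise ValueError("set mlp_hidden_dim or mlp_expansion_factor")
            # Derive the MLP width from the block width.
            self.mlp_hidden_dim = self.hidden_dim * self.mlp_expansion_factor


@dataclass
class ToyPerceiverBlockConfig(ToyTransformerBlockConfig):
    kv_dim: Optional[int] = None

    def __post_init__(self) -> None:
        super().__post_init__()
        if self.kv_dim is None:
            # Keys/values default to the same width as the queries.
            self.kv_dim = self.hidden_dim


cfg = ToyPerceiverBlockConfig(hidden_dim=64, mlp_expansion_factor=4)
print(cfg.mlp_hidden_dim, cfg.kv_dim)
```

Resolving the defaults at construction time means downstream modules can read `mlp_hidden_dim` and `kv_dim` without repeating the fallback logic.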
- class noether.modeling.modules.DeepPerceiverDecoder(config)¶
Bases: torch.nn.Module
A deep Perceiver decoder module. Can be configured with different number of layers and hidden dimensions. However, it should be noted that this layer is not a full-fledged Perceiver, since it only has a cross-attention mechanism.
- Parameters:
config (DeepPerceiverDecoderConfig) – Configuration for the DeepPerceiverDecoder module. See DeepPerceiverDecoderConfig for available options.
- blocks¶
- forward(kv, queries, attn_kwargs=None, condition=None)¶
Forward pass of the model.
- Parameters:
kv (torch.Tensor) – The key-value tensor (batch_size, num_latent_tokens, dim).
queries (torch.Tensor) – The query tensor (batch_size, num_output_queries, dim).
attn_kwargs (dict[str, Any] | None) – Dict with arguments for the attention (such as the attention mask or rope frequencies). Defaults to None.
condition (torch.Tensor | None) – Optional conditioning tensor that can be used in the attention mechanism. This can be used to pass additional conditioning information, etc.
- Returns:
The predictions as a sparse tensor (batch_size * num_output_pos, num_out_values).
- Return type:
- class noether.modeling.modules.SupernodePooling(config)¶
Bases: torch.nn.Module
Supernode pooling layer.
The permutation of the supernodes is preserved through the message passing (contrary to the (GP-)UPT code). Additionally, radius is used instead of radius_graph, which is more efficient.
Initialize the SupernodePooling.
- Parameters:
config (SupernodePoolingConfig) – Configuration for the SupernodePooling module. See SupernodePoolingConfig for available options.
- radius¶
- k¶
- max_degree¶
- spool_pos_mode¶
- readd_supernode_pos¶
- aggregation¶
- input_features_dim¶
- pos_embed¶
- output_dim¶
- compute_src_and_dst_indices(input_pos, supernode_idx, batch_idx=None)¶
Compute the source and destination indices for the message passing to the supernodes.
- Parameters:
input_pos (torch.Tensor) – Sparse tensor with shape (batch_size * number of points, 3), representing the input geometries.
supernode_idx (torch.Tensor) – Indexes of the supernodes in the sparse tensor input_pos.
batch_idx (torch.Tensor | None) – 1D tensor, containing the batch index of each entry in input_pos. Default None.
- Returns:
Tuple of (src_idx, dst_idx, local_dst_idx) where src_idx and dst_idx are absolute indices into input_pos and local_dst_idx is a 0-indexed position into supernode_idx (used for scatter_reduce_).
- Return type:
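To make the three returned index lists concrete, here is a brute-force sketch in plain Python for a single sample. It is an assumption-based toy: `toy_src_and_dst_indices` is a hypothetical name, the neighbor search is an O(N * S) loop instead of an accelerated radius query, and batching, k-NN mode and max_degree are ignored.

```python
import math


def toy_src_and_dst_indices(input_pos, supernode_idx, radius):
    """Return (src_idx, dst_idx, local_dst_idx) for points within `radius` of each supernode."""
    src_idx, dst_idx, local_dst_idx = [], [], []
    for local, s in enumerate(supernode_idx):
        for p, pos in enumerate(input_pos):
            if math.dist(pos, input_pos[s]) <= radius:
                src_idx.append(p)            # absolute index of the neighboring point
                dst_idx.append(s)            # absolute index of the supernode in input_pos
                local_dst_idx.append(local)  # position of the supernode in supernode_idx
    return src_idx, dst_idx, local_dst_idx


input_pos = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.0, 1.1), (5.0, 5.0)]
supernode_idx = [0, 2]
src, dst, local_dst = toy_src_and_dst_indices(input_pos, supernode_idx, radius=0.5)
```

Here `dst` holds absolute positions (0 and 2) while `local_dst` holds positions within `supernode_idx` (0 and 1); the latter is what an aggregation buffer with one row per supernode is indexed by. The isolated point at (5.0, 5.0) contributes no message.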
- create_messages(input_pos, src_idx, dst_idx, supernode_idx, input_features=None)¶
Create messages for the message passing to the supernodes, based on different positional encoding representations.
- Parameters:
input_pos (torch.Tensor) – Tensor of shape (batch_size * number_of_points_per_sample, {2,3}), representing the point cloud representation of the input geometry.
src_idx (torch.Tensor) – Index of the source nodes from input_pos.
dst_idx (torch.Tensor) – Index of the destination nodes in the input_pos tensor. These indices should match the supernode indices.
supernode_idx (torch.Tensor) – Indices of the nodes in input_pos that are considered supernodes.
input_features (torch.Tensor | None)
- Raises:
NotImplementedError – Raised if the mode is not implemented. Either “abspos”, “relpos” or “absrelpos” are allowed.
- Returns:
Tensor with messages for the message passing into the supernodes and the embedding coordinates of the supernodes.
- Return type:
- accumulate_messages(x, local_dst_idx, supernode_idx)¶
Method to accumulate the messages of neighbouring points into the supernodes.
- Parameters:
x (torch.Tensor) – Tensor containing the message representation of each neighbouring point.
local_dst_idx (torch.Tensor) – 0-indexed position into supernode_idx for each message (no CUDA sync).
supernode_idx (torch.Tensor) – Indexes of the supernode in the input point cloud.
- Returns:
Tensor with the aggregated messages for each supernode.
- Return type:
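The accumulation step amounts to a scatter-reduce keyed by the local destination index. The following plain-Python sketch shows the "mean" and "sum" variants under that reading; `toy_accumulate_messages` is a hypothetical name and the loops stand in for a tensor scatter operation.

```python
def toy_accumulate_messages(messages, local_dst_idx, num_supernodes, aggregation="mean"):
    """Reduce per-edge messages into one vector per supernode."""
    dim = len(messages[0])
    sums = [[0.0] * dim for _ in range(num_supernodes)]
    counts = [0] * num_supernodes
    for msg, dst in zip(messages, local_dst_idx):
        counts[dst] += 1
        for d in range(dim):
            sums[dst][d] += msg[d]
    if aggregation == "sum":
        return sums
    if aggregation == "mean":
        # max(c, 1) guards against supernodes that received no message.
        return [[v / max(c, 1) for v in row] for row, c in zip(sums, counts)]
    raise ValueError(f"unknown aggregation: {aggregation}")


messages = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0]]
local_dst_idx = [0, 0, 1]
pooled_mean = toy_accumulate_messages(messages, local_dst_idx, num_supernodes=2)
pooled_sum = toy_accumulate_messages(messages, local_dst_idx, 2, aggregation="sum")
```

Mean aggregation keeps the output scale independent of how many points fall near a supernode, whereas sum aggregation preserves information about local point density.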
- forward(input_pos, supernode_idx, batch_idx=None, input_features=None)¶
Forward pass of the supernode pooling layer.
- Parameters:
input_pos (torch.Tensor) – Sparse tensor with shape (batch_size * number_of_points_per_sample, 3), representing the point cloud representation of the input geometry.
supernode_idx (torch.Tensor) – indexes of the supernodes in the sparse tensor input_pos.
batch_idx (torch.Tensor | None) – 1D tensor, containing the batch index of each entry in input_pos. Default None.
input_features (torch.Tensor | None) – Sparse tensor with shape (batch_size * number_of_points_per_sample, number_of_features)
- Returns:
Tensor with the aggregated messages for each supernode.
- Return type:
- class noether.modeling.modules.SupernodePoolingConfig(/, **data)¶
Bases: pydantic.BaseModel
- Parameters:
data (Any)
- hidden_dim¶
Hidden dimension for positional embeddings, messages and the resulting output vector.
- input_dim: int = None¶
Number of positional dimensions (e.g., input_dim=2 for a 2D position, input_dim=3 for a 3D position).
- radius: float | None = None¶
Radius around each supernode. From points within this radius, messages are passed to the supernode.
- k: int | None = None¶
Number of neighbors for each supernode. From the k-NN points, messages are passed to the supernode.
- spool_pos_mode: Literal['abspos', 'relpos', 'absrelpos'] = None¶
Type of position embedding: absolute space (“abspos”), relative space (“relpos”) or both (“absrelpos”).
- init_weights: noether.core.types.InitWeightsMode = None¶
Weight initialization of linear layers. Defaults to “truncnormal002”.
- readd_supernode_pos: bool = None¶
If true, the absolute positional encoding of the supernode is concatenated to the supernode vector after message passing and linearly projected back to hidden_dim. Defaults to True.
- aggregation: Literal['mean', 'sum'] = None¶
Aggregation for message passing (“mean” or “sum”).
- message_mode: Literal['mlp', 'linear', 'identity'] = None¶
How messages are created. “mlp” (2 layer MLP), “linear” (nn.Linear), “identity” (nn.Identity). Defaults to “mlp”.
- input_features_dim: int | None = None¶
Number of input features per point. None will fall back to a version without features. Defaults to None, which means no input features.
- validate_radius_and_k()¶
- class noether.modeling.modules.ContinuousSincosEmbed(config)¶
Bases: torch.nn.Module
Embedding layer for continuous coordinates using sine and cosine functions. The original implementation from the Attention Is All You Need paper deals with discrete 1D coordinates (i.e., a sequence). However, this implementation is able to deal with 2D and 3D coordinate systems as well.
Two frequency schedules are supported via config.mode:
- "wavelength" (default): geometric wavelengths from 1 to max_wavelength, matching the original Transformer encoding. Use this for integer / unnormalized coordinates.
- "nerf": log-spaced frequencies from π to π * max_frequency. Use this for coordinates normalized to [-1, 1].
- Parameters:
config (ContinuousSincosEmbeddingConfig) – Configuration for the ContinuousSincosEmbed module. See ContinuousSincosEmbeddingConfig for the available options.
- omega: torch.Tensor¶
- padding_tensor: torch.Tensor¶
- input_dim¶
- ndim_padding¶
- sincos_padding¶
- mode¶
- max_wavelength¶
- max_frequency¶
- padding¶
- forward(coords)¶
Forward method of the ContinuousSincosEmbed layer.
- Parameters:
coords (torch.Tensor) – Tensor of coordinates. The shape of the tensor should be [batch size, number of points, coordinate dimension] or [number of points, coordinate dimension].
- Raises:
NotImplementedError – Only supports sparse (i.e. [number of points, coordinate dimension]) or dense (i.e. [batch size, number of points, coordinate dimension]) coordinate systems.
- Returns:
Tensor with embedded coordinates.
- Return type:
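The two frequency schedules can be sketched in plain Python as follows. The exact formulas are assumptions: the "wavelength" branch follows the schedule of the original Transformer encoding, the "nerf" branch spaces frequencies logarithmically between π and π * max_frequency, and the per-coordinate layout (all sines, then all cosines) as well as the default `max_frequency` value are illustrative choices. Padding to a target embedding size is not modeled, and `make_frequencies` and `sincos_embed` are hypothetical names.

```python
import math


def make_frequencies(num_freqs, mode, max_wavelength=10_000.0, max_frequency=64.0):
    if mode == "wavelength":
        # Geometric schedule as in the original Transformer encoding.
        return [1.0 / max_wavelength ** (i / num_freqs) for i in range(num_freqs)]
    if mode == "nerf":
        # Log-spaced from pi to pi * max_frequency, endpoints included.
        if num_freqs == 1:
            return [math.pi]
        return [math.pi * max_frequency ** (i / (num_freqs - 1)) for i in range(num_freqs)]
    raise ValueError(f"unknown mode: {mode}")


def sincos_embed(coord, freqs):
    """Embed one point (a tuple of floats) into a flat list of sin/cos features."""
    emb = []
    for c in coord:
        emb.extend(math.sin(c * w) for w in freqs)
        emb.extend(math.cos(c * w) for w in freqs)
    return emb


nerf_freqs = make_frequencies(4, "nerf", max_frequency=8.0)
origin = sincos_embed((0.0, 0.0, 0.0), nerf_freqs)  # 3 coords * 2 * 4 freqs = 24 values
```

Each coordinate axis is embedded independently and the results are concatenated, which is what lets the same layer handle 1D, 2D and 3D positions.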
- class noether.modeling.modules.LayerScale(config)¶
Bases: torch.nn.Module
LayerScale module scales the input tensor by a learnable parameter gamma.
Initialize the LayerScale module.
- Parameters:
config (LayerScaleConfig) – Configuration for the LayerScale module. See LayerScaleConfig for details.
- forward(x)¶
Forward function of the LayerScale module.
- Parameters:
x (torch.Tensor) – Input tensor to be scaled.
- Returns:
Tensor scaled by the gamma parameter.
- Return type:
- class noether.modeling.modules.LinearProjection(config)¶
Bases: torch.nn.Module
LinearProjection is a linear projection layer that can be used for 1D, 2D, and 3D data.
- Parameters:
config (LinearProjectionConfig) – The configuration of the LinearProjection. See LinearProjectionConfig for available options.
- Raises:
NotImplementedError – Raised if the number of dimensions of the input domain is bigger than 4.
- project: torch.nn.Linear | torch.nn.Conv1d | torch.nn.Conv2d | torch.nn.Conv3d | torch.nn.Identity¶
- init_weights¶
- reset_parameters()¶
Reset the parameters of the projection with a specific initialization. Options are “torch” (i.e., the default) or “truncnormal002”.
- Raises:
NotImplementedError – raised if the specified initialization is not implemented.
- Return type:
None
- forward(x)¶
Forward function of the LinearProjection.
- Parameters:
x (torch.Tensor) – Input tensor to the LinearProjection.
- Returns:
Output tensor from the LinearProjection.
- Return type:
- class noether.modeling.modules.UnquantizedDropPath(config)¶
Bases: torch.nn.Module
Unquantized drop path (Stochastic Depth, https://arxiv.org/abs/1603.09382) per sample. Unquantized means that dropped paths are still calculated. Number of dropped paths is fully stochastic, i.e., it can happen that not a single path is dropped or that all paths are dropped. In a quantized drop path, the same amount of paths are dropped in each forward pass, resulting in large speedups with high drop_prob values. See https://arxiv.org/abs/2212.04884 for more discussion. UnquantizedDropPath does not provide any speedup, consider using a quantized version if large drop_prob values are used.
Adapted from https://github.com/huggingface/pytorch-image-models/blob/main/timm/layers/drop.py#L150
Initialize the UnquantizedDropPath module.
- Parameters:
config (UnquantizedDropPathConfig) – Configuration for the UnquantizedDropPath module. See UnquantizedDropPathConfig for the available options.
- drop_prob¶
- scale_by_keep¶
- property keep_prob¶
Return the keep probability, i.e., the probability of keeping a path, which is 1 - drop_prob.
- Returns:
Float value of the keep probability.
- forward(x)¶
Forward function of the UnquantizedDropPath module.
- Parameters:
x (torch.Tensor) – Tensor to apply the drop path. Shape: (batch_size, …).
- Returns:
Tensor with drop path applied. Shape: (batch_size, …). If drop_prob is 0, the input tensor is returned. If drop_prob is 1, a tensor with zeros is returned.
- extra_repr()¶
Extra representation of the UnquantizedDropPath module.
- Returns:
Return a string representation of the module.
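The per-sample behavior can be illustrated with a small plain-Python sketch modeled on the general stochastic-depth recipe. It is a toy under stated assumptions: `toy_drop_path` is a hypothetical name, samples are lists of floats instead of tensors, the random generator is passed in explicitly, and rescaling kept samples by 1 / keep_prob mirrors what a scale_by_keep option typically does.

```python
import random


def toy_drop_path(batch, drop_prob, scale_by_keep=True, rng=None):
    """Zero out whole samples independently with probability drop_prob."""
    if drop_prob == 0.0:
        return batch
    rng = rng or random.Random()
    keep_prob = 1.0 - drop_prob
    out = []
    for sample in batch:
        # One Bernoulli draw per sample, so the number of drops varies per call.
        keep = 1.0 if rng.random() < keep_prob else 0.0
        if keep_prob > 0.0 and scale_by_keep:
            keep /= keep_prob  # keeps the expected value of the output unchanged
        out.append([v * keep for v in sample])
    return out


batch = [[1.0, 2.0], [3.0, 4.0]]
dropped = toy_drop_path(batch, 0.5, rng=random.Random(0))
```

Every sample is still computed before being masked, which is why this unquantized form gives no speedup: the saving in a quantized variant comes from skipping the dropped samples entirely.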
- class noether.modeling.modules.MLP(config)¶
Bases: torch.nn.Module
Implements a Multi-Layer Perceptron (MLP) with a configurable number of layers, hidden dimension, activation function and weight initialization method. Only one hidden dimension is supported for simplicity, i.e., all hidden layers have the same dimension. The MLP always has one input layer and one output layer. When num_layers=0, the MLP is a two-layer network with one non-linearity in between. When num_layers>=1, the MLP has additional hidden layers.
Initialize the MLP.
- Parameters:
config (MLPConfig) – Configuration object for the MLP. See MLPConfig for available options.
- init_weights¶
- mlp¶
- reset_parameters()¶
Reset the parameters of the MLP with a specific initialization. Options are “torch” (i.e., the default) or “truncnormal002”.
- Raises:
NotImplementedError – raised if the specified initialization is not implemented.
- Return type:
None
- forward(x)¶
Forward function of the MLP.
- Parameters:
x (torch.Tensor) – Input tensor to the MLP.
- Returns:
Output tensor from the MLP.
- Return type:
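The num_layers convention described for the MLP can be made explicit by listing the shapes of its linear layers. The helper below is a hypothetical sketch based on that description, assuming each additional layer maps hidden_dim to hidden_dim; it does not construct any torch modules.

```python
def mlp_layer_dims(input_dim, hidden_dim, output_dim, num_layers):
    """Return (in_features, out_features) for every linear layer, in order."""
    dims = [(input_dim, hidden_dim)]                  # input layer
    dims += [(hidden_dim, hidden_dim)] * num_layers   # extra hidden layers
    dims.append((hidden_dim, output_dim))             # output layer
    return dims


print(mlp_layer_dims(3, 16, 1, num_layers=0))
print(mlp_layer_dims(3, 16, 1, num_layers=2))
```

Under this reading, the number of linear layers is always num_layers + 2, with a non-linearity between each consecutive pair.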
- class noether.modeling.modules.UpActDownMlp(config)¶
Bases: torch.nn.Module
UpActDownMlp is a vanilla MLP with an up-projection followed by a GELU activation function and a down-projection to the original input dim.
Initialize the UpActDownMlp.
- Parameters:
config (UpActDownMLPConfig) – The configuration of the UpActDownMlp.
- init_weights¶
- fc1¶
- act¶
- fc2¶
- reset_parameters()¶
Reset the parameters of the MLP with a specific initialization. Options are “torch” (i.e., the default) or “truncnormal002”.
- Raises:
NotImplementedError – raised if the specified initialization is not implemented.
- Return type:
None
- forward(x)¶
Forward function of the UpActDownMlp.
- Parameters:
x (torch.Tensor) – Input tensor to the MLP.
- Returns:
Output tensor from the MLP.
- Return type:
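The up-projection, GELU, down-projection structure can be sketched for a single input vector in plain Python. The names `toy_up_act_down`, `linear` and `gelu` are hypothetical, the weights are fixed numbers chosen for illustration, and the exact (erf-based) GELU is assumed; the module itself may use a different GELU variant.

```python
import math


def gelu(x):
    # Exact GELU: x * Phi(x), with Phi the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear(x, weight, bias):
    """weight has shape (out_features x in_features)."""
    return [sum(xi * w for xi, w in zip(x, row)) + b for row, b in zip(weight, bias)]


def toy_up_act_down(x, w_up, b_up, w_down, b_down):
    hidden = [gelu(h) for h in linear(x, w_up, b_up)]  # up-projection + activation
    return linear(hidden, w_down, b_down)              # back to the input width


w_up = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 1.0]]   # 2 -> 4
b_up = [0.0, 0.0, 0.0, 0.0]
w_down = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]      # 4 -> 2
b_down = [0.0, 0.0]

y = toy_up_act_down([0.5, -0.5], w_up, b_up, w_down, b_down)
```

The output has the same width as the input, which is what allows the MLP to sit on a residual branch of a transformer block.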