noether.modeling.modules.layers

Submodules

Classes

ContinuousSincosEmbed

Embedding layer for continuous coordinates using sine and cosine functions.

ContinuousSincosEmbeddingConfig

Configuration for Continuous Sine-Cosine Embedding layer.

UnquantizedDropPath

Unquantized drop path (Stochastic Depth, https://arxiv.org/abs/1603.09382) per sample. Unquantized means that dropped paths are still calculated.

UnquantizedDropPathConfig

Configuration for the UnquantizedDropPath layer.

LayerScale

LayerScale module scales the input tensor by a learnable parameter gamma.

LayerScaleConfig

Configuration for Layer Scale module.

LinearProjection

LinearProjection is a linear projection layer that can be used for 1D, 2D, and 3D data.

LinearProjectionConfig

Configuration for a LinearProjection layer.

RopeFrequency

Creates frequencies for rotary embeddings (RoPE) from https://arxiv.org/abs/2104.09864 for variable positions.

RopeFrequencyConfig

Configuration for RoPE frequency settings.

ScalarsConditioner

Embeds num_scalars scalars into a single conditioning vector by first encoding every scalar with sine-cosine embeddings followed by an MLP (per scalar).

ScalarsConditionerConfig

TransformerBatchNorm

Wrapper around torch.nn.BatchNorm1d that considers all tokens of a single sample as the full batch.

VectorsConditioner

Embeds a set of named vectors into a single conditioning vector.

VectorsConditionerConfig

Configuration for VectorsConditioner.

AvgPool2DPatchify

Tokenize a 2D grid by average-pooling each patch_size × patch_size patch.

ConvOutputHead

Conv output head that decodes tokens to a spatial output.

FinalLayer

Final unpatchify projection with optional AdaLN modulation conditioned on a global vector c.

MaskPatchify

Downsample a boolean mask to patch resolution via max-pooling (True = at least one valid cell).

Package Contents

class noether.modeling.modules.layers.ContinuousSincosEmbed(config)

Bases: torch.nn.Module

Embedding layer for continuous coordinates using sine and cosine functions. The original implementation from the "Attention Is All You Need" paper deals with discrete 1D coordinates (i.e., a sequence). However, this implementation can also handle 2D and 3D coordinate systems.

Two frequency schedules are supported via config.mode:

  • "wavelength" (default): geometric wavelengths from 1 to max_wavelength, matching the original Transformer encoding. Use this for integer / unnormalized coordinates.

  • "nerf": log-spaced frequencies from π to π * max_frequency. Use this for coordinates normalized to [-1, 1].

Parameters:

config (ContinuousSincosEmbeddingConfig) – Configuration for the ContinuousSincosEmbed module. See ContinuousSincosEmbeddingConfig for the available options.

omega: torch.Tensor
padding_tensor: torch.Tensor
hidden_dim
input_dim
ndim_padding
sincos_padding
mode
max_wavelength
max_frequency
padding
forward(coords)

Forward method of the ContinuousSincosEmbed layer.

Parameters:

coords (torch.Tensor) – Tensor of coordinates. The shape of the tensor should be [batch size, number of points, coordinate dimension] or [number of points, coordinate dimension].

Raises:

NotImplementedError – Only sparse (i.e. [number of points, coordinate dimension]) or dense (i.e. [batch size, number of points, coordinate dimension]) coordinate systems are supported.

Returns:

Tensor with embedded coordinates.

Return type:

torch.Tensor
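The two frequency schedules selected by config.mode can be illustrated in plain PyTorch. The sketch below is not the library's implementation: the helper names make_frequencies and sincos_embed, the num_bands argument, and the exact exponent convention used in wavelength mode are assumptions made for illustration.

```python
import math

import torch


def make_frequencies(num_bands, mode="wavelength", max_wavelength=10000, max_frequency=None):
    """Return `num_bands` angular frequencies (hypothetical helper, not the library API)."""
    if mode == "wavelength":
        # Transformer-style: frequencies decay geometrically from 1 towards 1 / max_wavelength.
        exponents = torch.arange(num_bands, dtype=torch.float32) / num_bands
        return 1.0 / (max_wavelength**exponents)
    if mode == "nerf":
        if max_frequency is None:
            raise ValueError("max_frequency is required in nerf mode")
        # NeRF-style: log-spaced from pi to pi * max_frequency, both endpoints included.
        return math.pi * torch.logspace(0.0, math.log10(max_frequency), num_bands)
    raise ValueError(f"unknown mode: {mode}")


def sincos_embed(coords, omega):
    """Embed coords of shape (..., input_dim) into (..., input_dim * 2 * num_bands)."""
    angles = coords.unsqueeze(-1) * omega  # (..., input_dim, num_bands)
    emb = torch.cat([angles.sin(), angles.cos()], dim=-1)  # (..., input_dim, 2 * num_bands)
    return emb.flatten(start_dim=-2)


omega = make_frequencies(4, mode="nerf", max_frequency=8.0)
dense = sincos_embed(torch.rand(2, 5, 3) * 2 - 1, omega)  # dense: (batch, points, dim)
sparse = sincos_embed(torch.rand(5, 3) * 2 - 1, omega)  # sparse: (points, dim)
```

Both the dense and the sparse layouts work here because the embedding only acts on the last axis.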

class noether.modeling.modules.layers.ContinuousSincosEmbeddingConfig(/, **data)

Bases: pydantic.BaseModel

Configuration for Continuous Sine-Cosine Embedding layer.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

hidden_dim: int = None

Dimensionality of the output embedding.

input_dim: int = None

Dimensionality of the input coordinates.

mode: Literal['wavelength', 'nerf'] = None

Frequency schedule.

  • "wavelength" (default): transformer-style geometric wavelengths from 1 to max_wavelength. Suitable for integer / unnormalized coordinates.

  • "nerf": NeRF-style log-spaced frequencies from π to π * max_frequency. Suitable for coordinates normalized to [-1, 1]. The L available bands are distributed evenly in log-frequency across this range.

max_wavelength: int = None

Maximum wavelength. Only used when mode == "wavelength".

max_frequency: float | None = None

Highest frequency band for NeRF mode, in units of π. The L frequencies are log-spaced between π (wavelength 2, spans the [-1, 1] domain) and π * max_frequency (wavelength 2 / max_frequency). Required when mode == "nerf"; pick based on the smallest spatial scale you need to resolve in normalized coordinates (rough heuristic: 1 / typical_point_spacing).

class noether.modeling.modules.layers.UnquantizedDropPath(config)

Bases: torch.nn.Module

Unquantized drop path (Stochastic Depth, https://arxiv.org/abs/1603.09382) per sample. Unquantized means that dropped paths are still calculated. The number of dropped paths is fully stochastic, i.e., it can happen that not a single path is dropped or that all paths are dropped. In a quantized drop path, the same number of paths is dropped in each forward pass, resulting in large speedups with high drop_prob values. See https://arxiv.org/abs/2212.04884 for more discussion. UnquantizedDropPath does not provide any speedup; consider using a quantized version if large drop_prob values are used.

Adapted from https://github.com/huggingface/pytorch-image-models/blob/main/timm/layers/drop.py#L150

Initialize the UnquantizedDropPath module.

Parameters:

config (UnquantizedDropPathConfig) – Configuration for the UnquantizedDropPath module. See UnquantizedDropPathConfig for the available options.

drop_prob
scale_by_keep
property keep_prob

Return the keep probability, i.e., the probability of keeping a path, which is 1 - drop_prob.

Returns:

Float value of the keep probability.

forward(x)

Forward function of the UnquantizedDropPath module.

Parameters:

x (torch.Tensor) – Tensor to apply the drop path. Shape: (batch_size, …).

Returns:

Tensor with drop path applied. Shape: (batch_size, …). If drop_prob is 0, the input tensor is returned. If drop_prob is 1, a tensor with zeros is returned.

Return type:

torch.Tensor
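The per-sample behaviour described above can be sketched as a standalone function. This is a minimal illustration modelled on the timm drop_path function that the class is adapted from, not the module's actual code; the function name and the training argument are assumptions.

```python
import torch


def drop_path(x, drop_prob=0.0, training=True, scale_by_keep=True):
    """Per-sample stochastic depth (illustrative sketch, not the library code)."""
    if drop_prob == 0.0 or not training:
        return x
    keep_prob = 1.0 - drop_prob
    # One Bernoulli draw per sample, broadcast over all remaining dimensions.
    mask_shape = (x.shape[0],) + (1,) * (x.ndim - 1)
    mask = x.new_empty(mask_shape).bernoulli_(keep_prob)
    if keep_prob > 0.0 and scale_by_keep:
        # Rescale kept samples so the expected activation matches evaluation mode.
        mask.div_(keep_prob)
    # The full tensor is still computed and multiplied, hence no speedup.
    return x * mask


x = torch.ones(8, 3, 4)
out = drop_path(x, drop_prob=0.5)
```

With drop_prob=0.5 every sample in out is either all zeros (dropped) or all twos (kept and rescaled by 1 / keep_prob).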

extra_repr()

Extra representation of the UnquantizedDropPath module.

Returns:

Return a string representation of the module.

class noether.modeling.modules.layers.UnquantizedDropPathConfig(/, **data)

Bases: pydantic.BaseModel

Configuration for the UnquantizedDropPath layer.

Parameters:

data (Any)

drop_prob: float = None

Probability of dropping a path during training.

scale_by_keep: bool = None

Up-scales activations during training by 1 / (1 - drop_prob), i.e., divides by the keep probability, to avoid train-test mismatch. Defaults to True.

class noether.modeling.modules.layers.LayerScale(config)

Bases: torch.nn.Module

LayerScale module scales the input tensor by a learnable parameter gamma.

Initialize the LayerScale module.

Parameters:

config (LayerScaleConfig) – Configuration for the LayerScale module. See LayerScaleConfig for details.

forward(x)

Forward function of the LayerScale module.

Parameters:

x (torch.Tensor) – Input tensor to be scaled.

Returns:

Tensor scaled by the gamma parameter.

Return type:

torch.Tensor
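As a rough illustration of what scaling by a learnable gamma means, the following sketch implements a per-channel scale in plain PyTorch. The class name LayerScaleSketch and the per-channel parameter shape are assumptions; the actual module is configured through LayerScaleConfig.

```python
import torch
from torch import nn


class LayerScaleSketch(nn.Module):
    """Multiply the last dimension of the input by a learnable vector gamma."""

    def __init__(self, hidden_dim, init_values=1e-5):
        super().__init__()
        # One learnable scale per channel, initialised to a small constant.
        self.gamma = nn.Parameter(torch.full((hidden_dim,), init_values))

    def forward(self, x):
        # Broadcasts over all leading dimensions, e.g. (batch, tokens, hidden_dim).
        return x * self.gamma


layer = LayerScaleSketch(hidden_dim=4, init_values=0.5)
scaled = layer(torch.ones(2, 3, 4))
```

A small initial value keeps each residual branch close to the identity at the start of training.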

class noether.modeling.modules.layers.LayerScaleConfig(/, **data)

Bases: pydantic.BaseModel

Configuration for Layer Scale module.

Parameters:

data (Any)

hidden_dim: int = None

Number of dimensions of the input tensor to be scaled.

init_values: float | None = None

Initial gamma scale value. Defaults to 1e-5.

class noether.modeling.modules.layers.LinearProjection(config)

Bases: torch.nn.Module

LinearProjection is a linear projection layer that can be used for 1D, 2D, and 3D data.

Parameters:

config (LinearProjectionConfig) – The configuration of the LinearProjection. See LinearProjectionConfig for available options.

Raises:

NotImplementedError – Raised if the number of dimensions of the input domain is greater than 3.

project: torch.nn.Linear | torch.nn.Conv1d | torch.nn.Conv2d | torch.nn.Conv3d | torch.nn.Identity
init_weights
reset_parameters()

Reset the parameters of the MLP with a specific initialization. Options are “torch” (i.e., default) or “truncnormal002”.

Raises:

NotImplementedError – raised if the specified initialization is not implemented.

Return type:

None

forward(x)

Forward function of the LinearProjection.

Parameters:

x (torch.Tensor) – Input tensor to the LinearProjection.

Returns:

Output tensor from the LinearProjection.

Return type:

torch.Tensor
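The type annotation of project suggests that the layer is chosen from the ndim and optional settings. A minimal sketch of such a dispatch is shown below. The function make_projection, the kernel size of 1 for the convolutions, and the channels-first layout of the convolutional inputs are assumptions for illustration, not the library's confirmed behaviour.

```python
import torch
from torch import nn


def make_projection(input_dim, output_dim, ndim=None, bias=True, optional=False):
    """Pick a projection layer from the domain dimensionality (hypothetical helper)."""
    if optional and input_dim == output_dim:
        # No up/down projection needed, so fall back to the identity mapping.
        return nn.Identity()
    if ndim is None:
        return nn.Linear(input_dim, output_dim, bias=bias)
    conv_cls = {1: nn.Conv1d, 2: nn.Conv2d, 3: nn.Conv3d}.get(ndim)
    if conv_cls is None:
        raise NotImplementedError(f"ndim={ndim} is not supported")
    # Assumption: a pointwise (kernel size 1) convolution acts as a linear map per position.
    return conv_cls(input_dim, output_dim, kernel_size=1, bias=bias)


tokens = make_projection(8, 16)(torch.randn(2, 5, 8))  # (batch, tokens, channels)
image = make_projection(8, 16, ndim=2)(torch.randn(2, 8, 4, 4))  # (batch, channels, H, W)
```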

class noether.modeling.modules.layers.LinearProjectionConfig(/, **data)

Bases: pydantic.BaseModel

Configuration for a LinearProjection layer.

Parameters:

data (Any)

input_dim: int = None

Input dimension of the linear projection.

output_dim: int = None

Output dimension of the linear projection.

ndim: None | int = None

Number of dimensions of the input domain. Either None (Linear projection), 1D (sequence), 2D, or 3D. Defaults to None.

bias: bool = None

If true, use bias term in the linear projection. Defaults to True.

optional: bool = None

If true and input_dim==output_dim (i.e., there is no up/down projection), then the identity mapping is used. Defaults to False.

init_weights: noether.core.types.InitWeightsMode = None

Initialization method for the weights of the MLP. Options are ‘torch’ (i.e., the torch module's default initialization), ‘truncnormal002’, or ‘zero’. Defaults to ‘torch’.

validate_ndim()

Validate the ndim field to ensure it is either None, 1, 2, or 3.

Return type:

Self

class noether.modeling.modules.layers.RopeFrequency(config)

Bases: torch.nn.Module

Creates frequencies for rotary embeddings (RoPE) from https://arxiv.org/abs/2104.09864 for variable positions.

Parameters:

config (RopeFrequencyConfig) – Configuration for RoPE frequency settings. See RopeFrequencyConfig for available options.

omega: torch.Tensor
hidden_dim
input_dim
implementation
ndim_padding
sincos_padding
max_wavelength
padding
forward(coords)

Parameters:

coords (torch.Tensor) – coordinates to create RoPE frequencies for. Expected shape is (…, input_dim).

Return type:

torch.Tensor | tuple[torch.Tensor, Ellipsis]
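To make the "complex" implementation option more concrete, here is a hedged sketch of RoPE frequencies for continuous coordinates and their application by complex multiplication. The helpers rope_frequencies and apply_rope, the even split of rotation pairs across coordinate axes, and the requirement that hidden_dim be divisible by 2 * input_dim are assumptions. The real module exposes padding attributes that presumably handle the non-divisible case.

```python
import torch


def rope_frequencies(coords, hidden_dim, max_wavelength=10000.0):
    """Complex rotation factors for coords of shape (..., input_dim) (illustrative only)."""
    input_dim = coords.shape[-1]
    if hidden_dim % (2 * input_dim) != 0:
        raise ValueError("this sketch needs hidden_dim divisible by 2 * input_dim")
    pairs_per_axis = hidden_dim // 2 // input_dim
    exponents = torch.arange(pairs_per_axis, dtype=torch.float32) / pairs_per_axis
    omega = 1.0 / (max_wavelength**exponents)
    # One rotation angle per (axis, frequency) pair, flattened to (..., hidden_dim // 2).
    angles = (coords.unsqueeze(-1) * omega).flatten(start_dim=-2)
    return torch.polar(torch.ones_like(angles), angles)


def apply_rope(x, freqs):
    """Rotate consecutive channel pairs of x (..., hidden_dim) by the complex factors."""
    x_complex = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))
    return torch.view_as_real(x_complex * freqs).flatten(start_dim=-2)


x = torch.randn(2, 5, 12)
coords = torch.rand(2, 5, 3) * 10.0
rotated = apply_rope(x, rope_frequencies(coords, hidden_dim=12))
```

Because each channel pair is only rotated, the norm of every token is preserved, and coordinates of zero leave the input unchanged.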

class noether.modeling.modules.layers.RopeFrequencyConfig(/, **data)

Bases: pydantic.BaseModel

Configuration for RoPE frequency settings.

Parameters:

data (Any)

hidden_dim: int = None

Dimensionality of frequencies (in transformers this should be the head dimension).

input_dim: int = None

Dimensionality of the coordinates (e.g., 2 for 2D coordinates, 3 for 3D coordinates).

max_wavelength: int = None

Theta parameter for the transformer sine/cosine embedding. Default: 10000.0.

implementation: Literal['real', 'complex'] = None

“real” -> basic implementation using real coordinates (this is slow and only here for backward compatibility). “complex” -> fast implementation of rotation via complex multiplication. Default: “real”.

class noether.modeling.modules.layers.ScalarsConditioner(config)

Bases: torch.nn.Module

Embeds num_scalars scalars into a single conditioning vector by first encoding every scalar with sine-cosine embeddings followed by an MLP (per scalar). These vectors are then concatenated and projected down to condition_dim with an MLP.

Parameters:

config (ScalarsConditionerConfig) – configuration for the ScalarsConditioner. See ScalarsConditionerConfig for available options.

hidden_dim
num_scalars
condition_dim
embed
mlps
shared_mlp
forward(*args, **kwargs)

Embeds scalars into a single conditioning vector. Scalars can be passed as *args or as **kwargs. It is recommended to use kwargs to avoid bugs that originate from passing scalars in a different order at two locations in the code. Recommended usage: condition = conditioner(geometry_angle=75.3, friction_angle=24.6)

Parameters:
  • *args – Scalars in tensor representation (batch_size,) or (batch_size, 1).

  • **kwargs – Scalars in tensor representation (batch_size,) or (batch_size, 1).

Returns:

Conditioning vector with shape (batch_size, condition_dim)

Return type:

torch.Tensor

Example

conditioner = ScalarsConditioner(
    ScalarsConditionerConfig(
        hidden_dim=64,
        num_scalars=2,
        condition_dim=128,
        init_weights="truncnormal002",
    )
)
geometry_angle = torch.tensor([75.3, 80.1])  # shape (batch_size,)
friction_angle = torch.tensor([24.6, 30.2])  # shape (batch_size,)
condition = conditioner(
    geometry_angle=geometry_angle, friction_angle=friction_angle
)  # shape (batch_size, condition_dim)

class noether.modeling.modules.layers.ScalarsConditionerConfig(/, **data)

Bases: pydantic.BaseModel

Parameters:

data (Any)

hidden_dim: int = None

Dimension for embedding the scalars and the per-scalar MLP.

num_scalars: int = None

How many scalars are embedded.

condition_dim: int | None = None

Dimension of the final conditioning vector. Defaults to 4 * hidden_dim if condition_dim is None.

init_weights: noether.core.types.InitWeightsMode = 'truncnormal002'

Weight initialization for MLPs.

class noether.modeling.modules.layers.TransformerBatchNorm(num_features, eps=1e-05, elementwise_affine=True, bias=True)

Bases: torch.nn.Module

Wrapper around torch.nn.BatchNorm1d that considers all tokens of a single sample as the full batch. Additionally remaps affine to elementwise_affine and supports disabling bias to comply with the torch.nn.LayerNorm interface. Does not use any nn.BatchNorm1d modules to avoid errors with nn.SyncBatchNorm.

Parameters:
num_features
eps = 1e-05
elementwise_affine = True
forward(x)

BatchNorm1d where all tokens of a single sample correspond to a full batch.

Parameters:

x (torch.Tensor) – Tensor of shape (batch_size, seqlen, dim).

Returns:

Normalized x of shape (batch_size, seqlen, dim).

Return type:

torch.Tensor
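Treating the tokens of one sample as the batch amounts to normalizing each feature over the sequence axis, separately for every sample. The functional sketch below illustrates this idea. The name token_batch_norm and the optional weight and bias arguments are assumptions, and the actual module may differ in details.

```python
import torch


def token_batch_norm(x, weight=None, bias=None, eps=1e-5):
    """Normalize x of shape (batch_size, seqlen, dim) over seqlen, per sample and feature."""
    mean = x.mean(dim=1, keepdim=True)
    var = x.var(dim=1, unbiased=False, keepdim=True)
    x = (x - mean) / torch.sqrt(var + eps)
    # Optional affine transform, mirroring elementwise_affine / bias of LayerNorm.
    if weight is not None:
        x = x * weight
    if bias is not None:
        x = x + bias
    return x


x = torch.randn(2, 16, 8) * 3.0 + 5.0
y = token_batch_norm(x)
```

Since the statistics never cross sample boundaries, no running averages or cross-device synchronization are needed in this formulation.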

class noether.modeling.modules.layers.VectorsConditioner(config)

Bases: torch.nn.Module

Embeds a set of named vectors into a single conditioning vector.

Each input vector named in config.conditioning_spec is encoded with a NeRF-mode ContinuousSincosEmbed followed by a per-vector MLP. The resulting per-vector embeddings are concatenated and projected to condition_dim by a shared MLP.

Note

All input vectors must be normalized to [-1, 1]. The underlying sine-cosine embedding uses NeRF-style frequencies tuned for that range; values outside it will alias and produce uninformative embeddings.

Parameters:

config (VectorsConditionerConfig) – configuration for the VectorsConditioner. See VectorsConditionerConfig for available options.

hidden_dim
condition_dim
conditioning_spec
embedder
shared_mlp
forward(**conditioning_inputs)

Embed a set of named vectors into a single conditioning vector.

All vectors declared in config.conditioning_spec must be supplied as keyword arguments matching the spec names. Inputs must be normalized to [-1, 1].

Parameters:

**conditioning_inputs (torch.Tensor) – Vectors with shape (batch_size, num_features), keyed by the names declared in config.conditioning_spec. The num_features of each vector must match the dimension declared in the spec. All inputs must share the same batch_size.

Returns:

Conditioning vector with shape (batch_size, condition_dim).

Raises:

ValueError – If the supplied inputs don’t match the spec (wrong number of vectors, missing key, wrong rank, or wrong feature dimension).

Return type:

torch.Tensor

Example

conditioner = VectorsConditioner(
    VectorsConditionerConfig(
        hidden_dim=64,
        conditioning_spec={"angle": 1, "shape_params": 3},
        condition_dim=128,
        max_frequency=1024,
    )
)
# Inputs normalized to [-1, 1].
angle = torch.tensor([[0.5], [-0.2]])  # shape (batch_size, 1)
shape_params = torch.tensor([[0.1, -0.3, 0.7], [-0.5, 0.2, -0.8]])  # shape (batch_size, 3)
condition = conditioner(angle=angle, shape_params=shape_params)
# condition.shape == (2, 128)
class noether.modeling.modules.layers.VectorsConditionerConfig(/, **data)

Bases: pydantic.BaseModel

Configuration for VectorsConditioner.

All conditioning inputs are expected to be normalized to [-1, 1]; the underlying sine-cosine embedding runs in NeRF mode.

Parameters:

data (Any)

hidden_dim: int = None

Dimension of the per-vector embedding and per-vector MLP.

conditioning_spec: noether.data.schemas.FieldDimSpec

Mapping from input vector name to its feature dimension, e.g. {"angle": 1, "shape_params": 3}.

condition_dim: int | None = None

Dimension of the final conditioning vector. Defaults to hidden_dim if None.

max_frequency: float = None

Highest frequency band, in units of π, for the NeRF-mode sine-cosine embedding. Pick based on the smallest spatial scale you need to resolve in normalized coordinates (rough heuristic: 1 / typical_input_spacing).

init_weights: noether.core.types.InitWeightsMode = 'truncnormal002'

Weight initialization for MLPs.

class noether.modeling.modules.layers.AvgPool2DPatchify(patch_size=16)

Bases: torch.nn.Module

Tokenize a 2D grid by average-pooling each patch_size × patch_size patch.

Parameters:

patch_size (int)

patch_size = 16
patch
forward(x)

Pool spatial features into patches.

Parameters:

x (torch.Tensor) – Input grid with shape (B, H, W, C).

Returns:

Pooled patch grid of shape (B, H // patch_size, W // patch_size, C).

Return type:

torch.Tensor
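The shape contract above, (B, H, W, C) to (B, H // patch_size, W // patch_size, C), can be reproduced with torch.nn.functional.avg_pool2d. The function avg_pool_patchify below is a hypothetical stand-in written for illustration, assuming non-overlapping patches.

```python
import torch
import torch.nn.functional as F


def avg_pool_patchify(x, patch_size=16):
    """Average-pool a channels-last grid (B, H, W, C) into non-overlapping patches."""
    x = x.permute(0, 3, 1, 2)  # avg_pool2d expects channels-first: (B, C, H, W)
    x = F.avg_pool2d(x, kernel_size=patch_size)  # stride defaults to the kernel size
    return x.permute(0, 2, 3, 1)  # back to channels-last


grid = torch.arange(16, dtype=torch.float32).reshape(1, 4, 4, 1)
patches = avg_pool_patchify(grid, patch_size=2)
```

For the 4 × 4 grid holding the values 0 to 15, the four patch means are 2.5, 4.5, 10.5, and 12.5.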

class noether.modeling.modules.layers.ConvOutputHead(hidden_dim, out_channels, patch_size, mid_channels=64)

Bases: torch.nn.Module

Conv output head that decodes tokens to a spatial output.

Parameters:
  • hidden_dim (int)

  • out_channels (int)

  • patch_size (int)

  • mid_channels (int)

patch_size
out_channels
stages
forward(x, grid_h, grid_w)

Decode tokens to spatial output via cascaded PixelShuffle stages.

Parameters:
  • x (torch.Tensor) – Flattened tokens of shape (B, grid_h * grid_w, hidden_dim).

  • grid_h (int) – Patch grid height (H // patch_size).

  • grid_w (int) – Patch grid width (W // patch_size).

Returns:

Spatial tensor of shape (B, H, W, out_channels) after upsampling.

Return type:

torch.Tensor
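One plausible way to realise cascaded PixelShuffle stages is sketched below. The class PixelShuffleHeadSketch, the 2× upsampling per stage, the 3 × 3 convolutions, the GELU activations, and the restriction to power-of-two patch sizes are all assumptions. Only the input and output shapes are taken from the documentation above.

```python
import math

import torch
from torch import nn


class PixelShuffleHeadSketch(nn.Module):
    """Decode (B, grid_h * grid_w, hidden_dim) tokens to (B, H, W, out_channels)."""

    def __init__(self, hidden_dim, out_channels, patch_size, mid_channels=64):
        super().__init__()
        num_stages = int(math.log2(patch_size))
        if num_stages == 0 or 2**num_stages != patch_size:
            raise ValueError("this sketch needs patch_size to be a power of two >= 2")
        layers, in_ch = [], hidden_dim
        for i in range(num_stages):
            last = i == num_stages - 1
            out_ch = out_channels if last else mid_channels
            # Each stage quadruples the channels, then trades them for 2x resolution.
            layers += [nn.Conv2d(in_ch, out_ch * 4, kernel_size=3, padding=1), nn.PixelShuffle(2)]
            if not last:
                layers.append(nn.GELU())
            in_ch = out_ch
        self.stages = nn.Sequential(*layers)

    def forward(self, x, grid_h, grid_w):
        b, _, c = x.shape
        x = x.transpose(1, 2).reshape(b, c, grid_h, grid_w)  # tokens -> (B, C, grid_h, grid_w)
        return self.stages(x).permute(0, 2, 3, 1)  # (B, H, W, out_channels)


head = PixelShuffleHeadSketch(hidden_dim=32, out_channels=3, patch_size=4)
decoded = head(torch.randn(2, 6, 32), grid_h=2, grid_w=3)
```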

class noether.modeling.modules.layers.FinalLayer(hidden_size, patch_size, out_channels, use_modulation=True)

Bases: torch.nn.Module

Final unpatchify projection with optional AdaLN modulation conditioned on a global vector c.

Parameters:
  • hidden_size (int)

  • patch_size (int)

  • out_channels (int)

  • use_modulation (bool)

norm_final
linear
adaLN_modulation: torch.nn.Linear | None
forward(x, c=None)

Apply (optionally AdaLN-modulated) norm then linear projection.

Parameters:
  • x (torch.Tensor) – Tokens of shape (B, L, hidden_size).

  • c (torch.Tensor | None) – Conditioning vector of shape (B, hidden_size) when use_modulation=True; must be None when use_modulation=False. The caller is responsible for any upstream activation (e.g. SiLU) — this layer applies the AdaLN linear directly.

Returns:

Tensor of shape (B, L, patch_size**2 * out_channels).

Return type:

torch.Tensor
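The following sketch shows how an optionally AdaLN-modulated final projection could look. It follows the common DiT convention of predicting a shift and a scale from c and applying x * (1 + scale) + shift. That convention, the non-affine LayerNorm, and the class name FinalLayerSketch are assumptions rather than confirmed details of this layer.

```python
import torch
from torch import nn


class FinalLayerSketch(nn.Module):
    """Norm, optional AdaLN modulation, then linear projection to patch pixels."""

    def __init__(self, hidden_size, patch_size, out_channels, use_modulation=True):
        super().__init__()
        self.norm_final = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)
        self.linear = nn.Linear(hidden_size, patch_size**2 * out_channels)
        # A single linear layer predicts both shift and scale; no activation is applied here.
        self.adaLN_modulation = nn.Linear(hidden_size, 2 * hidden_size) if use_modulation else None

    def forward(self, x, c=None):
        x = self.norm_final(x)
        if self.adaLN_modulation is not None:
            shift, scale = self.adaLN_modulation(c).chunk(2, dim=-1)
            # Broadcast the per-sample modulation over the token axis.
            x = x * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)
        return self.linear(x)


tokens = torch.randn(2, 10, 32)
modulated = FinalLayerSketch(32, patch_size=2, out_channels=3)(tokens, torch.randn(2, 32))
plain = FinalLayerSketch(32, patch_size=2, out_channels=3, use_modulation=False)(tokens)
```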

class noether.modeling.modules.layers.MaskPatchify(patch_size)

Bases: torch.nn.Module

Downsample a boolean mask to patch resolution via max-pooling (True = at least one valid cell).

Parameters:

patch_size (int)

patch_size
forward(mask)

Downsample boolean mask to patch resolution.

Parameters:

mask (torch.Tensor) – Boolean mask of shape (B, H, W).

Returns:

Flat boolean mask of shape (B, (H // patch_size) * (W // patch_size)).

Return type:

torch.Tensor
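Max-pooling a boolean mask can be sketched with torch.nn.functional.max_pool2d by going through a float tensor. The function mask_patchify below is an illustrative stand-in that assumes non-overlapping patches and row-major flattening of the patch grid.

```python
import torch
import torch.nn.functional as F


def mask_patchify(mask, patch_size):
    """Downsample a boolean mask (B, H, W) to a flat patch mask (B, num_patches)."""
    # max_pool2d needs a float tensor with a channel axis: (B, 1, H, W).
    pooled = F.max_pool2d(mask.float().unsqueeze(1), kernel_size=patch_size)
    # A patch is True as soon as any of its cells is True.
    return pooled.flatten(start_dim=1) > 0


mask = torch.zeros(1, 4, 4, dtype=torch.bool)
mask[0, 0, 1] = True  # a single valid cell in the top-left patch
patch_mask = mask_patchify(mask, patch_size=2)
```

Only the first of the four patches contains a valid cell, so patch_mask is True at index 0 and False elsewhere.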