noether.modeling.modules.layers¶

Submodules¶

Classes¶

`ContinuousSincosEmbed`	Embedding layer for continuous coordinates using sine and cosine functions.
`ContinuousSincosEmbeddingConfig`	Configuration for Continuous Sine-Cosine Embedding layer.
`UnquantizedDropPath`	Unquantized drop path (Stochastic Depth, https://arxiv.org/abs/1603.09382) per sample. Unquantized means that dropped paths are still calculated.
`UnquantizedDropPathConfig`	Configuration for the UnquantizedDropPath layer.
`LayerScale`	LayerScale module scales the input tensor by a learnable parameter gamma.
`LayerScaleConfig`	Configuration for Layer Scale module.
`LinearProjection`	LinearProjection is a linear projection layer that can be used for 1D, 2D, and 3D data.
`LinearProjectionConfig`	Configuration for a LinearProjection layer.
`RopeFrequency`	Creates frequencies for rotary embeddings (RoPE) from https://arxiv.org/abs/2104.09864 for variable positions.
`RopeFrequencyConfig`	Configuration for RoPE frequency settings.
`ScalarsConditioner`	Embeds num_scalars scalars into a single conditioning vector via first encoding every scalar with sine-cosine embeddings followed by an MLP (per scalar).
`ScalarsConditionerConfig`
`TransformerBatchNorm`	Wrapper around torch.nn.BatchNorm1d that considers all tokens of a single sample as the full batch.
`VectorsConditioner`	Embeds a set of named vectors into a single conditioning vector.
`VectorsConditionerConfig`	Configuration for `VectorsConditioner`.
`AvgPool2DPatchify`	Tokenize a 2D grid by average-pooling each patch_size × patch_size patch.
`ConvOutputHead`	Conv output head that decodes tokens to spatial output.
`FinalLayer`	Final unpatchify projection with optional AdaLN modulation conditioned on a global vector `c`.
`MaskPatchify`	Downsample a boolean mask to patch resolution via max-pooling (`True` = at least one valid cell).

Package Contents¶

class noether.modeling.modules.layers.ContinuousSincosEmbed(config)¶

Bases: torch.nn.Module

Embedding layer for continuous coordinates using sine and cosine functions. The original implementation from the Attention Is All You Need paper deals with discrete 1D coordinates (i.e., a sequence). However, this implementation is able to deal with 2D and 3D coordinate systems as well.

Two frequency schedules are supported via config.mode:

"wavelength" (default): geometric wavelengths from 1 to max_wavelength, matching the original Transformer encoding. Use this for integer / unnormalized coordinates.
"nerf": log-spaced frequencies from π to π * max_frequency. Use this for coordinates normalized to [-1, 1].
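The two schedules can be sketched in plain Python. This is a minimal illustration rather than the library's implementation: the helper names (`wavelength_frequencies`, `nerf_frequencies`, `sincos_embed`) are hypothetical, and the exact exponent convention of the "wavelength" schedule is an assumption based on the original Transformer encoding.

```python
import math


def wavelength_frequencies(num_bands: int, max_wavelength: float = 10000.0) -> list[float]:
    """Transformer-style schedule (assumed form): omega_i = max_wavelength ** (-i / num_bands)."""
    return [max_wavelength ** (-i / num_bands) for i in range(num_bands)]


def nerf_frequencies(num_bands: int, max_frequency: float) -> list[float]:
    """Log-spaced frequencies from pi to pi * max_frequency (both ends included)."""
    if num_bands == 1:
        return [math.pi]
    return [math.pi * max_frequency ** (i / (num_bands - 1)) for i in range(num_bands)]


def sincos_embed(coord: float, freqs: list[float]) -> list[float]:
    """Embed one scalar coordinate as all sine values followed by all cosine values."""
    return [math.sin(w * coord) for w in freqs] + [math.cos(w * coord) for w in freqs]


# Four NeRF-style bands for a coordinate normalized to [-1, 1].
freqs = nerf_frequencies(num_bands=4, max_frequency=8.0)
embedding = sincos_embed(0.5, freqs)  # 4 sine values, then 4 cosine values
```

In this sketch consecutive NeRF frequencies differ by a constant factor, which is what "log-spaced" means here.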

Parameters:: config (ContinuousSincosEmbeddingConfig) – Configuration for the ContinuousSincosEmbed module. See ContinuousSincosEmbeddingConfig for the available options.

omega: torch.Tensor¶

padding_tensor: torch.Tensor¶

hidden_dim¶

input_dim¶

ndim_padding¶

sincos_padding¶

mode¶

max_wavelength¶

max_frequency¶

padding¶

forward(coords)¶

Forward method of the ContinuousSincosEmbed layer.

Parameters:: coords (torch.Tensor) – Tensor of coordinates. The shape of the tensor should be [batch size, number of points, coordinate dimension] or [number of points, coordinate dimension].
Raises:: NotImplementedError – Only sparse (i.e. [number of points, coordinate dimension]) or dense (i.e. [batch size, number of points, coordinate dimension]) coordinate systems are supported.
Returns:: Tensor with embedded coordinates.
Return type:: torch.Tensor

class noether.modeling.modules.layers.ContinuousSincosEmbeddingConfig(/, **data)¶

Bases: pydantic.BaseModel

Configuration for Continuous Sine-Cosine Embedding layer.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:: data (Any)

hidden_dim: int = None¶: Dimensionality of the output embedding.

input_dim: int = None¶: Dimensionality of the input coordinates.

mode: Literal['wavelength', 'nerf'] = None¶

Frequency schedule.

"wavelength" (default): transformer-style geometric wavelengths from 1 to max_wavelength. Suitable for integer / unnormalized coordinates.
"nerf": NeRF-style log-spaced frequencies from π to π * max_frequency. Suitable for coordinates normalized to [-1, 1]. The L available bands are distributed evenly in log-frequency across this range.

max_wavelength: int = None¶: Maximum wavelength. Only used when mode == "wavelength".

max_frequency: float | None = None¶: Highest frequency band for NeRF mode, in units of π. The L frequencies are log-spaced between π (wavelength 2, spans the [-1, 1] domain) and π * max_frequency (wavelength 2 / max_frequency). Required when mode == "nerf"; pick based on the smallest spatial scale you need to resolve in normalized coordinates (rough heuristic: 1 / typical_point_spacing).
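As a rough worked example of the heuristic above, the following sketch turns a typical point spacing into a max_frequency value and reports the wavelength of the resulting highest band. The function names are hypothetical and the spacing of 0.01 is an illustrative value, not a recommendation.

```python
def suggest_max_frequency(typical_point_spacing: float) -> float:
    """Heuristic from the field description: 1 / typical_point_spacing (normalized coordinates)."""
    if typical_point_spacing <= 0:
        raise ValueError("typical_point_spacing must be positive")
    return 1.0 / typical_point_spacing


def shortest_wavelength(max_frequency: float) -> float:
    """sin(pi * max_frequency * x) has period 2 / max_frequency."""
    return 2.0 / max_frequency


# Points that are roughly 0.01 apart after normalization to [-1, 1].
max_frequency = suggest_max_frequency(0.01)
finest_scale = shortest_wavelength(max_frequency)
```

With these numbers the highest band completes one period over two point spacings.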

class noether.modeling.modules.layers.UnquantizedDropPath(config)¶

Bases: torch.nn.Module

Unquantized drop path (Stochastic Depth, https://arxiv.org/abs/1603.09382) per sample. Unquantized means that dropped paths are still calculated. The number of dropped paths is fully stochastic, i.e., it can happen that not a single path is dropped or that all paths are dropped. In a quantized drop path, the same number of paths is dropped in each forward pass, resulting in large speedups with high drop_prob values. See https://arxiv.org/abs/2212.04884 for more discussion. UnquantizedDropPath does not provide any speedup; consider using a quantized version if large drop_prob values are used.

Adapted from https://github.com/huggingface/pytorch-image-models/blob/main/timm/layers/drop.py#L150

Initialize the UnquantizedDropPath module.

Parameters:: config (UnquantizedDropPathConfig) – Configuration for the UnquantizedDropPath module. See UnquantizedDropPathConfig for the available options.

drop_prob¶

scale_by_keep¶

property keep_prob¶

Return the keep probability, i.e., the probability of keeping a path, which is 1 - drop_prob.

Returns:: Float value of the keep probability.

forward(x)¶

Forward function of the UnquantizedDropPath module.

Parameters:: x (torch.Tensor) – Tensor to apply the drop path. Shape: (batch_size, …).
Returns:: Tensor with drop path applied. Shape: (batch_size, …). If drop_prob is 0, the input tensor is returned. If drop_prob is 1, a tensor with zeros is returned.
Return type:: torch.Tensor
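The behaviour described above can be sketched as a standalone function. This is a minimal sketch modelled on the timm `drop_path` function that the class is adapted from, not the module's actual code; the function name and the `training` flag are assumptions made for the example.

```python
import torch


def unquantized_drop_path(
    x: torch.Tensor,
    drop_prob: float,
    training: bool = True,
    scale_by_keep: bool = True,
) -> torch.Tensor:
    """Per-sample stochastic depth: every path is computed first, then masked."""
    if drop_prob == 0.0 or not training:
        return x
    keep_prob = 1.0 - drop_prob
    # One Bernoulli draw per sample, broadcast over all remaining dimensions.
    mask_shape = (x.shape[0],) + (1,) * (x.ndim - 1)
    mask = torch.bernoulli(torch.full(mask_shape, keep_prob, dtype=x.dtype, device=x.device))
    if scale_by_keep and keep_prob > 0.0:
        # Dividing by keep_prob keeps the expected activation unchanged.
        mask = mask / keep_prob
    return x * mask


x = torch.ones(4, 3, 8)
out = unquantized_drop_path(x, drop_prob=0.25)
```

Because the mask is drawn independently per sample, the number of dropped samples varies from call to call, and no computation is saved.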

extra_repr()¶

Extra representation of the UnquantizedDropPath module.

Returns:: Return a string representation of the module.

class noether.modeling.modules.layers.UnquantizedDropPathConfig(/, **data)¶

Bases: pydantic.BaseModel

Configuration for the UnquantizedDropPath layer.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:: data (Any)

drop_prob: float = None¶: Probability of dropping a path during training.

scale_by_keep: bool = None¶: Up-scales the kept activations during training by 1 / (1 - drop_prob) to avoid train-test mismatch. Defaults to True.

class noether.modeling.modules.layers.LayerScale(config)¶

Bases: torch.nn.Module

LayerScale module scales the input tensor by a learnable parameter gamma.

Initialize the LayerScale module.

Parameters:: config (LayerScaleConfig) – Configuration for the LayerScale module. See LayerScaleConfig for details.

forward(x)¶

Forward function of the LayerScale module.

Parameters:: x (torch.Tensor) – Input tensor to be scaled.
Returns:: Tensor scaled by the gamma parameter.
Return type:: torch.Tensor
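A minimal sketch of the scaling, assuming gamma is a learnable vector of length hidden_dim that is broadcast over the leading dimensions. The class name `LayerScaleSketch` is hypothetical, and the constructor takes plain arguments instead of a LayerScaleConfig to keep the example self-contained.

```python
import torch
from torch import nn


class LayerScaleSketch(nn.Module):
    """Scale the last dimension of the input by a learnable vector gamma."""

    def __init__(self, hidden_dim: int, init_values: float = 1e-5) -> None:
        super().__init__()
        # Every entry of gamma starts at init_values and is trained with the model.
        self.gamma = nn.Parameter(torch.full((hidden_dim,), init_values))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.gamma


layer = LayerScaleSketch(hidden_dim=8)
y = layer(torch.ones(2, 5, 8))  # shape (batch_size, seqlen, hidden_dim)
```

A small initial value makes each scaled residual branch start close to zero, so a deep stack initially behaves almost like the identity.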

class noether.modeling.modules.layers.LayerScaleConfig(/, **data)¶

Bases: pydantic.BaseModel

Configuration for Layer Scale module.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:: data (Any)

hidden_dim: int = None¶: Number of dimensions of the input tensor to be scaled.

init_values: float | None = None¶: Initial gamma scale value. Defaults to 1e-5.

class noether.modeling.modules.layers.LinearProjection(config)¶

Bases: torch.nn.Module

LinearProjection is a linear projection layer that can be used for 1D, 2D, and 3D data.

Parameters:: config (LinearProjectionConfig) – The configuration of the LinearProjection. See LinearProjectionConfig for available options.
Raises:: NotImplementedError – raises not implemented error if the number of dimensions of the input domain is bigger than 4.

project: torch.nn.Linear | torch.nn.Conv1d | torch.nn.Conv2d | torch.nn.Conv3d | torch.nn.Identity¶

init_weights¶

reset_parameters()¶

Reset the parameters of the MLP with a specific initialization. Options are “torch” (i.e., the default) or “truncnormal002”.

Raises:: NotImplementedError – raised if the specified initialization is not implemented.
Return type:: None

forward(x)¶

Forward function of the LinearProjection.

Parameters:: x (torch.Tensor) – Input tensor to the LinearProjection.
Returns:: Output tensor from the LinearProjection.
Return type:: torch.Tensor

class noether.modeling.modules.layers.LinearProjectionConfig(/, **data)¶

Bases: pydantic.BaseModel

Configuration for a LinearProjection layer.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:: data (Any)

input_dim: int = None¶: Input dimension of the linear projection.

output_dim: int = None¶: Output dimension of the linear projection.

ndim: None | int = None¶: Number of dimensions of the input domain. Either None (Linear projection), 1D (sequence), 2D, or 3D. Defaults to None.

bias: bool = None¶: If true, use bias term in the linear projection. Defaults to True.

optional: bool = None¶: If true and input_dim==output_dim (i.e., there is no up/down projection), then the identity mapping is used. Defaults to False.

init_weights: noether.core.types.InitWeightsMode = None¶: Initialization method of the weights of the MLP. Options are ‘torch’ (i.e., the default initialization of the torch module), ‘truncnormal002’, or ‘zero’. Defaults to ‘torch’.

validate_ndim()¶

Validate the ndim field to ensure it is either None, 1, 2, or 3.

Return type:: Self

class noether.modeling.modules.layers.RopeFrequency(config)¶

Bases: torch.nn.Module

Creates frequencies for rotary embeddings (RoPE) from https://arxiv.org/abs/2104.09864 for variable positions.

Parameters:: config (RopeFrequencyConfig) – Configuration for RoPE frequency settings. See RopeFrequencyConfig for available options.

omega: torch.Tensor¶

hidden_dim¶

input_dim¶

implementation¶

ndim_padding¶

sincos_padding¶

max_wavelength¶

padding¶

forward(coords)¶

Parameters:: coords (torch.Tensor) – coordinates to create RoPE frequencies for. Expected shape is (…, input_dim).
Return type:: torch.Tensor | tuple[torch.Tensor, Ellipsis]

class noether.modeling.modules.layers.RopeFrequencyConfig(/, **data)¶

Bases: pydantic.BaseModel

Configuration for RoPE frequency settings.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:: data (Any)

hidden_dim: int = None¶: Dimensionality of frequencies (in transformers this should be the head dimension).

input_dim: int = None¶: Dimensionality of the coordinates (e.g., 2 for 2D coordinates, 3 for 3D coordinates).

max_wavelength: int = None¶: Theta parameter for the transformer sine/cosine embedding. Default: 10000.0.

implementation: Literal['real', 'complex'] = None¶: “real” -> basic implementation using real coordinates (this is slow and only here for backward compatibility). “complex” -> fast implementation of rotation via complex multiplication. Default: “real”.
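The relationship between the two implementations can be illustrated for a single feature pair using only the standard library. This is a sketch of the general RoPE idea, not the module's code: the function names, the frequency value, and the sample vectors are all hypothetical.

```python
import cmath
import math


def rotate_complex(pair: complex, angle: float) -> complex:
    """Rotate a feature pair stored as a complex number by one multiplication."""
    return pair * cmath.exp(1j * angle)


def rotate_real(x0: float, x1: float, angle: float) -> tuple[float, float]:
    """The same rotation written out with an explicit 2x2 rotation matrix."""
    c, s = math.cos(angle), math.sin(angle)
    return x0 * c - x1 * s, x0 * s + x1 * c


def score(q: complex, k: complex) -> float:
    """Dot product of the two underlying 2D vectors."""
    return (q * k.conjugate()).real


omega = 0.3  # hypothetical frequency for this feature pair
q, k = complex(1.0, 2.0), complex(-0.5, 0.7)

# Positions (3, 1) and (12, 10) share the same offset of 2.
near = score(rotate_complex(q, 3.0 * omega), rotate_complex(k, 1.0 * omega))
far = score(rotate_complex(q, 12.0 * omega), rotate_complex(k, 10.0 * omega))
```

Both rotations give the same result, and `near` equals `far` up to rounding because the score depends only on the position difference.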

class noether.modeling.modules.layers.ScalarsConditioner(config)¶

Bases: torch.nn.Module

Embeds num_scalars scalars into a single conditioning vector via first encoding every scalar with sine-cosine embeddings followed by an MLP (per scalar). These vectors are then concatenated and projected down to condition_dim with an MLP.

Parameters:: config (ScalarsConditionerConfig) – configuration for the ScalarsConditioner. See ScalarsConditionerConfig for available options.

hidden_dim¶

num_scalars¶

condition_dim¶

embed¶

mlps¶

shared_mlp¶

forward(*args, **kwargs)¶

Embeds scalars into a single conditioning vector. Scalars can be passed as *args or as **kwargs. It is recommended to use kwargs to avoid bugs that originate from passing scalars in a different order at two locations in the code. Recommended usage: condition = conditioner(geometry_angle=75.3, friction_angle=24.6)

Parameters:

*args (torch.Tensor) – Scalars in tensor representation (batch_size,) or (batch_size, 1).
**kwargs (torch.Tensor) – Scalars in tensor representation (batch_size,) or (batch_size, 1).

Returns:

Conditioning vector with shape (batch_size, condition_dim)

Return type:

torch.Tensor

Example

conditioner = ScalarsConditioner(
    ScalarsConditionerConfig(
        hidden_dim=64,
        num_scalars=2,
        condition_dim=128,
        init_weights="truncnormal002",
    )
)
geometry_angle = torch.tensor([75.3, 80.1])  # shape (batch_size,)
friction_angle = torch.tensor([24.6, 30.2])  # shape (batch_size,)
condition = conditioner(
    geometry_angle=geometry_angle, friction_angle=friction_angle
)  # shape (batch_size, condition_dim)

class noether.modeling.modules.layers.ScalarsConditionerConfig(/, **data)¶

Bases: pydantic.BaseModel

Parameters:: data (Any)

hidden_dim: int = None¶: Dimension for embedding the scalars and the per-scalar MLP.

num_scalars: int = None¶: How many scalars are embedded.

condition_dim: int | None = None¶: Dimension of the final conditioning vector. Defaults to 4 * hidden_dim if condition_dim is None.

init_weights: noether.core.types.InitWeightsMode = 'truncnormal002'¶: Weight initialization for MLPs.

class noether.modeling.modules.layers.TransformerBatchNorm(num_features, eps=1e-05, elementwise_affine=True, bias=True)¶

Bases: torch.nn.Module

Wrapper around torch.nn.BatchNorm1d that considers all tokens of a single sample as the full batch. Additionally remaps affine to elementwise_affine and supports disabling bias to comply with the torch.nn.LayerNorm interface. Does not use any nn.BatchNorm1d modules to avoid errors with nn.SyncBatchNorm.

Parameters:

num_features (int)
eps (float)
elementwise_affine (bool)
bias (bool)

num_features¶

eps = 1e-05¶

elementwise_affine = True¶

forward(x)¶

BatchNorm1d where all tokens of a single sample correspond to a full batch.

Parameters:: x (torch.Tensor) – Tensor of shape (batch_size, seqlen, dim).
Returns:: Normalized x of shape (batch_size, seqlen, dim).
Return type:: torch.Tensor
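The normalization can be sketched as follows, assuming statistics are computed per sample and per feature over the token dimension only. The function name is hypothetical, and the learnable scale and bias that elementwise_affine and bias would add are left out of this sketch.

```python
import torch


def per_sample_batchnorm(x: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Normalize each feature over the tokens of each sample separately.

    x has shape (batch_size, seqlen, dim); mean and variance are taken over seqlen.
    """
    mean = x.mean(dim=1, keepdim=True)
    var = x.var(dim=1, unbiased=False, keepdim=True)
    return (x - mean) / torch.sqrt(var + eps)


torch.manual_seed(0)
x = torch.randn(2, 16, 4) * 3.0 + 5.0  # shifted and scaled on purpose
y = per_sample_batchnorm(x)
```

Unlike a regular batch norm, the result for one sample does not depend on the other samples in the batch.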

class noether.modeling.modules.layers.VectorsConditioner(config)¶

Bases: torch.nn.Module

Embeds a set of named vectors into a single conditioning vector.

Each input vector named in config.conditioning_spec is encoded with a NeRF-mode ContinuousSincosEmbed followed by a per-vector MLP. The resulting per-vector embeddings are concatenated and projected to condition_dim by a shared MLP.

Note

All input vectors must be normalized to [-1, 1]. The underlying sine-cosine embedding uses NeRF-style frequencies tuned for that range; values outside it will alias and produce uninformative embeddings.
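One simple way to meet this requirement is a min-max rescaling with bounds known for each quantity. The helper below is a hypothetical sketch, not part of the library, and the bounds of 0 and 90 are illustrative values.

```python
def normalize_to_unit_range(value: float, lower: float, upper: float) -> float:
    """Map a value from [lower, upper] linearly onto [-1, 1]."""
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    return 2.0 * (value - lower) / (upper - lower) - 1.0


# Hypothetical example: an angle known to lie between 0 and 90 degrees.
angle_normalized = normalize_to_unit_range(45.0, lower=0.0, upper=90.0)
```

The same bounds have to be reused at inference time, otherwise inputs can fall outside [-1, 1].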

Parameters:: config (VectorsConditionerConfig) – configuration for the VectorsConditioner. See VectorsConditionerConfig for available options.

hidden_dim¶

condition_dim¶

conditioning_spec¶

embedder¶

shared_mlp¶

forward(**conditioning_inputs)¶

Embed a set of named vectors into a single conditioning vector.

All vectors declared in config.conditioning_spec must be supplied as keyword arguments matching the spec names. Inputs must be normalized to [-1, 1].

Parameters:: **conditioning_inputs (torch.Tensor) – Vectors with shape (batch_size, num_features), keyed by the names declared in config.conditioning_spec. The num_features of each vector must match the dimension declared in the spec. All inputs must share the same batch_size.
Returns:: Conditioning vector with shape (batch_size, condition_dim).
Raises:: ValueError – If the supplied inputs don’t match the spec (wrong number of vectors, missing key, wrong rank, or wrong feature dimension).
Return type:: torch.Tensor

Example

conditioner = VectorsConditioner(
    VectorsConditionerConfig(
        hidden_dim=64,
        conditioning_spec={"angle": 1, "shape_params": 3},
        condition_dim=128,
        max_frequency=1024,
    )
)
# Inputs normalized to [-1, 1].
angle = torch.tensor([[0.5], [-0.2]])  # shape (batch_size, 1)
shape_params = torch.tensor([[0.1, -0.3, 0.7], [-0.5, 0.2, -0.8]])  # shape (batch_size, 3)
condition = conditioner(angle=angle, shape_params=shape_params)
# condition.shape == (2, 128)

class noether.modeling.modules.layers.VectorsConditionerConfig(/, **data)¶

Bases: pydantic.BaseModel

Configuration for VectorsConditioner.

All conditioning inputs are expected to be normalized to [-1, 1]; the underlying sine-cosine embedding runs in NeRF mode.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:: data (Any)

hidden_dim: int = None¶: Dimension of the per-vector embedding and per-vector MLP.

conditioning_spec: noether.data.schemas.FieldDimSpec¶: Mapping from input vector name to its feature dimension, e.g. {"angle": 1, "shape_params": 3}.

condition_dim: int | None = None¶: Dimension of the final conditioning vector. Defaults to hidden_dim if None.

max_frequency: float = None¶: Highest frequency band, in units of π, for the NeRF-mode sine-cosine embedding. Pick based on the smallest spatial scale you need to resolve in normalized coordinates (rough heuristic: 1 / typical_input_spacing).

init_weights: noether.core.types.InitWeightsMode = 'truncnormal002'¶: Weight initialization for MLPs.

class noether.modeling.modules.layers.AvgPool2DPatchify(patch_size=16)¶

Bases: torch.nn.Module

Tokenize a 2D grid by average-pooling each patch_size × patch_size patch.

Parameters:: patch_size (int)

patch_size = 16¶

patch¶

forward(x)¶

Pool spatial features into patches.

Parameters:: x (torch.Tensor) – Input grid with shape (B, H, W, C).
Returns:: Pooled patch grid of shape (B, H // patch_size, W // patch_size, C).
Return type:: torch.Tensor
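The pooling can be sketched with `torch.nn.functional.avg_pool2d`, assuming the channels-last input is temporarily permuted to the channels-first layout that PyTorch pooling expects. The function name is hypothetical and this is not necessarily how the module is implemented.

```python
import torch
import torch.nn.functional as F


def avg_pool_patchify(x: torch.Tensor, patch_size: int = 16) -> torch.Tensor:
    """Average each patch_size x patch_size patch of a (B, H, W, C) grid."""
    x = x.permute(0, 3, 1, 2)  # (B, H, W, C) -> (B, C, H, W)
    x = F.avg_pool2d(x, kernel_size=patch_size)  # stride defaults to kernel_size
    return x.permute(0, 2, 3, 1)  # back to (B, H // p, W // p, C)


# A 4x4 grid holding 0..15 in row-major order, pooled with 2x2 patches.
grid = torch.arange(16, dtype=torch.float32).reshape(1, 4, 4, 1)
tokens = avg_pool_patchify(grid, patch_size=2)
```

For this input the top-left patch holds 0, 1, 4, and 5, so its token value is 2.5.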

class noether.modeling.modules.layers.ConvOutputHead(hidden_dim, out_channels, patch_size, mid_channels=64)¶

Bases: torch.nn.Module

Conv output head that decodes tokens to spatial output.

Parameters:

hidden_dim (int)
out_channels (int)
patch_size (int)
mid_channels (int)

patch_size¶

out_channels¶

stages¶

forward(x, grid_h, grid_w)¶

Decode tokens to spatial output via cascaded PixelShuffle stages.

Parameters:

x (torch.Tensor) – Flattened tokens of shape (B, grid_h * grid_w, hidden_dim).
grid_h (int) – Patch grid height (H // patch_size).
grid_w (int) – Patch grid width (W // patch_size).

Returns:

Spatial tensor of shape (B, H, W, out_channels) after upsampling.

Return type:

torch.Tensor
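The shape bookkeeping can be illustrated with a single PixelShuffle stage. This sketch is a simplification: the actual head uses cascaded stages and a mid_channels width that are not reproduced here, and the class name, the 1x1 convolution, and the row-major token order are assumptions.

```python
import torch
from torch import nn


class PixelShuffleHeadSketch(nn.Module):
    """Decode (B, grid_h * grid_w, hidden_dim) tokens to (B, H, W, out_channels)."""

    def __init__(self, hidden_dim: int, out_channels: int, patch_size: int) -> None:
        super().__init__()
        # Each token must produce patch_size**2 output pixels per channel.
        self.proj = nn.Conv2d(hidden_dim, out_channels * patch_size**2, kernel_size=1)
        self.shuffle = nn.PixelShuffle(patch_size)

    def forward(self, x: torch.Tensor, grid_h: int, grid_w: int) -> torch.Tensor:
        b, _, d = x.shape
        # Assumes tokens are ordered row by row over the patch grid.
        x = x.transpose(1, 2).reshape(b, d, grid_h, grid_w)
        x = self.shuffle(self.proj(x))  # (B, out_channels, H, W)
        return x.permute(0, 2, 3, 1)  # channels-last output


head = PixelShuffleHeadSketch(hidden_dim=32, out_channels=3, patch_size=4)
out = head(torch.randn(2, 6 * 5, 32), grid_h=6, grid_w=5)
```

Here H = 6 * 4 = 24 and W = 5 * 4 = 20, so the output has shape (2, 24, 20, 3).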

class noether.modeling.modules.layers.FinalLayer(hidden_size, patch_size, out_channels, use_modulation=True)¶

Bases: torch.nn.Module

Final unpatchify projection with optional AdaLN modulation conditioned on a global vector c.

Parameters:

hidden_size (int)
patch_size (int)
out_channels (int)
use_modulation (bool)

norm_final¶

linear¶

adaLN_modulation: torch.nn.Linear | None¶

forward(x, c=None)¶

Apply (optionally AdaLN-modulated) norm then linear projection.

Parameters:

x (torch.Tensor) – Tokens of shape (B, L, hidden_size).
c (torch.Tensor | None) – Conditioning vector of shape (B, hidden_size) when use_modulation=True; must be None when use_modulation=False. The caller is responsible for any upstream activation (e.g. SiLU) — this layer applies the AdaLN linear directly.

Returns:

Tensor of shape (B, L, patch_size**2 * out_channels).

Return type:

torch.Tensor

class noether.modeling.modules.layers.MaskPatchify(patch_size)¶

Bases: torch.nn.Module

Downsample a boolean mask to patch resolution via max-pooling (True = at least one valid cell).

Parameters:: patch_size (int)

patch_size¶

forward(mask)¶

Downsample boolean mask to patch resolution.

Parameters:: mask (torch.Tensor) – Boolean mask of shape (B, H, W).
Returns:: Flat boolean mask of shape (B, (H // patch_size) * (W // patch_size)).
Return type:: torch.Tensor
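The downsampling can be sketched with `torch.nn.functional.max_pool2d`, assuming the boolean mask is cast to float for pooling and thresholded afterwards. The function name is hypothetical and the module may be implemented differently.

```python
import torch
import torch.nn.functional as F


def mask_patchify(mask: torch.Tensor, patch_size: int) -> torch.Tensor:
    """Reduce a (B, H, W) boolean mask to one flag per patch, flattened per sample."""
    # max_pool2d needs a float tensor with a channel dimension.
    pooled = F.max_pool2d(mask.float().unsqueeze(1), kernel_size=patch_size)
    # A patch is True if at least one of its cells was True.
    return pooled.flatten(1) > 0.5


mask = torch.zeros(1, 4, 4, dtype=torch.bool)
mask[0, 0, 1] = True  # falls into the top-left 2x2 patch
mask[0, 3, 3] = True  # falls into the bottom-right 2x2 patch
patch_mask = mask_patchify(mask, patch_size=2)
```

With 2x2 patches on this 4x4 mask, only the first and last of the four patch flags are True.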