noether.modeling.models

Classes

AnchorBranchedUPTConfig

Configuration for the Anchored Branched UPT (AB-UPT) model.

AnchoredBranchedUPT

Implementation of the Anchored Branched UPT (AB-UPT) model.

AeroABUPT

Aerodynamic Anchored-Branched UPT wrapper.

AeroTransformer

Aerodynamic Transformer wrapper.

AeroTransformerConfig

Transformer config extended with aerodynamic data specifications.

AeroTransolver

Aerodynamic Transolver wrapper.

AeroTransolverConfig

Transolver config extended with aerodynamic data specifications.

AeroUPT

Aerodynamic UPT wrapper.

Transformer

Implementation of a Transformer model.

TransformerConfig

Configuration for a Transformer model.

Transolver

Implementation of the Transolver model.

TransolverConfig

Configuration for a Transolver model.

TransolverPlusPlusConfig

Configuration for a Transolver++ model.

UPT

Implementation of the UPT (Universal Physics Transformer) model.

UPTConfig

Configuration for a UPT model.

ViT

Vision Transformer for spatial regression on continuous-coordinate grids.

ViTConfig

Configuration for the ViT model.

Package Contents

class noether.modeling.models.AnchorBranchedUPTConfig(/, **data)

Bases: noether.core.models.base.ModelBaseConfig, noether.core.schemas.mixins.InjectSharedFieldFromParentMixin

Configuration for the Anchored Branched UPT (AB-UPT) model.

AB-UPT is built from three configurable stages:

  1. Geometry encoder (optional): a SupernodePooling encoder followed by geometry_depth standard transformer blocks. Only instantiated when at least one perceiver / perceiver_untied block is present in physics_blocks and supernode_pooling_config is provided.

  2. Physics trunk: a stack of blocks listed in physics_blocks operating on per-domain anchor (and optionally query) tokens. The block string controls the attention pattern and weight sharing — see physics_blocks below.

  3. Per-domain decoder (optional): num_domain_decoder_blocks[name] self-attention blocks with untied weights per domain, followed by a linear projection to that domain’s output fields.

hidden_dim is a shared field — it is auto-injected into transformer_block_config and supernode_pooling_config via InjectSharedFieldFromParentMixin, so it only needs to be set once at the top level. See Configuration Inheritance.
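The shared-field behaviour can be pictured with a minimal sketch. It uses plain dictionaries rather than the real pydantic mixin, and `inject_shared_field` and the `num_heads` value are hypothetical names chosen for illustration, not part of noether.

```python
from copy import deepcopy


def inject_shared_field(config: dict, field: str, children: list) -> dict:
    """Copy a top-level field into each named child config that omits it.

    Illustrative only: the real InjectSharedFieldFromParentMixin may behave
    differently (for example, in how it treats conflicting values).
    """
    resolved = deepcopy(config)  # leave the caller's dict untouched
    for child in children:
        sub = resolved.get(child)
        if sub is None:  # optional child that was not provided
            continue
        sub.setdefault(field, resolved[field])
    return resolved


raw = {
    "hidden_dim": 192,
    "transformer_block_config": {"num_heads": 3},  # no hidden_dim here
    "supernode_pooling_config": None,              # optional, left unset
}
resolved = inject_shared_field(
    raw, "hidden_dim", ["transformer_block_config", "supernode_pooling_config"]
)
print(resolved["transformer_block_config"])
```

The point of the design is that hidden_dim is written once at the top level and the child configs pick it up, which removes one common source of mismatched dimensions.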

Configuration guide

See Configuring AB-UPT for a step-by-step walkthrough of how to compose physics blocks, choose between tied and _untied variants, and wire up the per-domain decoder.

Concrete examples (YAML):

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

kind: str | None = 'noether.core.schemas.models.AnchorBranchedUPTConfig'

Kind of model to use, i.e. class path

model_config

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

supernode_pooling_config: Annotated[noether.modeling.modules.encoders.supernode_pooling.SupernodePoolingConfig, noether.core.schemas.mixins.Shared] | None = None
transformer_block_config: Annotated[noether.modeling.modules.blocks.transformer.TransformerBlockConfig, noether.core.schemas.mixins.Shared]
geometry_depth: int = None

Number of transformer blocks in the geometry encoder.

hidden_dim: int = None

Hidden dimension of the model.

condition_dim: int | None = None
physics_blocks: list[Literal['self', 'shared', 'cross', 'joint', 'perceiver', 'self_untied', 'cross_untied', 'joint_untied', 'perceiver_untied']]

Types of physics blocks to use in the model.

  • self / shared: Self-attention within a branch (domain). Weights are shared between all domains.

  • cross: Cross-attention between domains. Each domain attends to all other domains’ anchors; weights are shared.

  • joint: Joint attention over all domain points, i.e. full self-attention over all points; weights are shared.

  • perceiver: Perceiver-style cross-attention to the geometry encoding.

  • self_untied: Self-attention within a branch with untied weights for each domain.

  • cross_untied: Cross-attention between domains with untied weights for each domain.

  • joint_untied: Joint attention over all domain points with untied weights for each domain.

  • perceiver_untied: Perceiver cross-attention to the geometry encoding with untied weights per domain.

Note: “shared” is a deprecated alias for “self” and will be removed in a future release.
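As a rough sketch of how a block list might be interpreted, the snippet below maps the deprecated alias to its replacement and applies the geometry-encoder condition from stage 1 above. Both helper names are hypothetical, and the real config performs these steps inside its own validators.

```python
PERCEIVER_BLOCKS = {"perceiver", "perceiver_untied"}


def normalize_physics_blocks(blocks):
    """Replace the deprecated 'shared' alias with 'self'."""
    return ["self" if block == "shared" else block for block in blocks]


def needs_geometry_encoder(blocks, has_supernode_pooling_config):
    """Stage 1 rule: a perceiver-type block AND a supernode pooling config."""
    has_perceiver = any(block in PERCEIVER_BLOCKS for block in blocks)
    return has_perceiver and has_supernode_pooling_config


blocks = normalize_physics_blocks(["perceiver", "shared", "cross", "self_untied"])
print(blocks)
print(needs_geometry_encoder(blocks, has_supernode_pooling_config=True))
```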

num_domain_decoder_blocks: dict[str, int] = None

Number of final domain-specific decoder blocks with self-attention and no weight sharing, e.g. {"surface": 2, "volume": 2}.

init_weights: noether.core.types.InitWeightsMode = None

Weight initialization of linear layers. Defaults to “truncnormal002”.

drop_path_rate: float = None

Drop path rate for stochastic depth. Defaults to 0.0 (no drop path).

geometry_conditioning_dims: noether.data.schemas.FieldDimSpec | None = None

Per-named-field conditioning spec for geometry transformer blocks. When left unset, defaults to data_specs.conditioning_dims so the geometry branch sees the same conditioning as the rest of the model. An explicit empty FieldDimSpec (total_dim == 0) opts out — useful for diffusion, where timestep modulation should touch physics + per-domain decoders but not the geometry branch (geometry is invariant to noise level).
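The three cases for geometry_conditioning_dims (unset, explicitly empty, explicitly set) can be sketched with plain dictionaries standing in for FieldDimSpec. The function name and the "timestep" field are assumptions made for this example only.

```python
def resolve_geometry_conditioning(geometry_dims, model_dims):
    """Pick the conditioning dims seen by the geometry branch.

    None -> fall back to the model-wide conditioning dims.
    {}   -> explicit opt-out: total dim 0, geometry stays unconditioned.
    """
    if geometry_dims is None:
        return dict(model_dims)
    return dict(geometry_dims)


def total_dim(dims):
    return sum(dims.values())


model_dims = {"timestep": 256}  # hypothetical conditioning field

inherited = resolve_geometry_conditioning(None, model_dims)
opted_out = resolve_geometry_conditioning({}, model_dims)
print(total_dim(inherited), total_dim(opted_out))
```

Note that the check uses `is None` rather than truthiness, so that an empty spec is kept as an opt-out instead of being replaced by the default.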

data_specs: noether.data.schemas.ModelDataSpecs

Data specifications for the model.

migrate_shared_to_self()

Migrate deprecated ‘shared’ block type to ‘self’.

Return type:

AnchorBranchedUPTConfig

rope_frequency_config()
Return type:

noether.modeling.modules.layers.rope_frequency.RopeFrequencyConfig

pos_embed_config()
Return type:

noether.modeling.modules.layers.continuous_sincos_embed.ContinuousSincosEmbeddingConfig

bias_mlp_config()
Return type:

noether.modeling.modules.mlp.MLPConfig

perceiver_block_config()
Return type:

noether.modeling.modules.blocks.perceiver.PerceiverBlockConfig

domain_decoder_configs()

Per-domain decoder projection configs, keyed by domain name.

Return type:

dict[str, noether.modeling.modules.layers.linear_projection.LinearProjectionConfig]

conditioner_config()

Configuration for the scalar conditioner module.

Return type:

noether.modeling.modules.layers.vectors_conditioner.VectorsConditionerConfig

set_condition_dim()

Set condition_dim in transformer_block_config based on data_specs.

Return type:

AnchorBranchedUPTConfig

geometry_transformer_block_config()

Transformer block config for geometry encoder, with condition_dim set to geometry_conditioning_dims.

Return type:

noether.modeling.modules.blocks.transformer.TransformerBlockConfig

geometry_conditioner_config()

Configuration for the scalar conditioner module.

Return type:

noether.modeling.modules.layers.vectors_conditioner.VectorsConditionerConfig

validate_parameters()

Validate parameters across the model and its submodules.

Ensures that hidden_dim is consistent across parent and all submodules. Note: transformer_block_config validates hidden_dim % num_heads == 0 in its own validator.

Return type:

AnchorBranchedUPTConfig

Parameters:

data (Any)

class noether.modeling.models.AnchoredBranchedUPT(config)

Bases: torch.nn.Module

Implementation of the Anchored Branched UPT (AB-UPT) model.

This is an off-the-shelf model — it includes input embedding and output projection, so it can be used directly by providing the appropriate input tensors. See forward() for the expected inputs.

The architecture is fully driven by AnchorBranchedUPTConfig: the geometry encoder depth, the ordering and type of physics blocks, and the per-domain decoder depths are all configured there. For a walkthrough of how to assemble a config (and concrete YAML examples from the aero_cfd and heat_transfer recipes), see Configuring AB-UPT.

Parameters:

config (AnchorBranchedUPTConfig) – Configuration for the AB-UPT model. See AnchorBranchedUPTConfig for details.

data_specs
rope
pos_embed
domain_names: list[str]
domain_biases
hidden_dim
physics_blocks
use_geometry_branch = False
domain_feature_projs: torch.nn.ModuleDict | None = None
domain_decoder_blocks
domain_decoder_projections
geometry_branch_forward(geometry_position, geometry_supernode_idx, geometry_batch_idx, condition, geometry_attn_kwargs)

Forward pass through the geometry branch of the model.

Return type:

torch.Tensor

build_physics_input(domain_anchor_positions=None, domain_query_positions=None, domain_anchor_features=None, domain_query_features=None)

Build the physics-block input tensor and combined per-domain positions.

Each per-domain segment is [anchors | queries] with positional biases plus projected features (when data_specs.domains[name].feature_dim was set on the config). Domains are concatenated in self.domain_names order.

Returns:

Tuple of (x_physics, physics_positions). x_physics has shape (B, total_tokens, hidden_dim). physics_positions maps each domain name to its concatenated [anchors | queries] positions and can be passed directly to create_rope_frequencies().

Return type:

tuple[torch.Tensor, dict[str, torch.Tensor]]
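The [anchors | queries] layout described above can be illustrated by computing token index ranges. This is a sketch under the stated ordering (domains concatenated in domain_names order); `physics_token_layout` is a hypothetical helper and the point counts are made up.

```python
def physics_token_layout(domain_names, num_anchors, num_queries=None):
    """Return per-domain (start, end) token ranges and the total token count."""
    num_queries = num_queries or {}
    layout, offset = {}, 0
    for name in domain_names:
        n_anchor = num_anchors[name]
        n_query = num_queries.get(name, 0)  # queries are optional per domain
        layout[name] = {
            "anchors": (offset, offset + n_anchor),
            "queries": (offset + n_anchor, offset + n_anchor + n_query),
        }
        offset += n_anchor + n_query
    return layout, offset


layout, total_tokens = physics_token_layout(
    ["surface", "volume"],
    num_anchors={"surface": 4, "volume": 6},
    num_queries={"surface": 2},
)
print(layout, total_tokens)
```

With these counts, x_physics would have total_tokens positions along its second axis, and a domain without queries simply gets an empty query range.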

physics_blocks_forward(x_physics, geometry_encoding, physics_token_specs, physics_attn_kwargs, physics_perceiver_attn_kwargs, condition, physics_blocks_cache=None)

Run the physics-block stack on a pre-built input tensor.

Perceiver blocks always re-project K/V from geometry_encoding and contribute None to the returned cache; only transformer blocks cache their anchor self-attention K/V.

Return type:

tuple[torch.Tensor, list[LayerCache | None]]
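The shape of the returned cache list follows from the rule above: one entry per physics block, with None for perceiver-type blocks. A minimal sketch, using strings as stand-ins for the real LayerCache objects (the helper name is hypothetical):

```python
def collect_block_caches(physics_blocks, anchor_kv_per_block):
    """Build the per-block cache list: None for perceiver blocks, K/V otherwise."""
    caches = []
    for block_type, anchor_kv in zip(physics_blocks, anchor_kv_per_block):
        if block_type in ("perceiver", "perceiver_untied"):
            # K/V are re-projected from geometry_encoding on every call.
            caches.append(None)
        else:
            caches.append(anchor_kv)
    return caches


physics_blocks = ["perceiver", "self", "cross", "perceiver_untied"]
stand_in_kv = ["kv0", "kv1", "kv2", "kv3"]  # placeholders for real K/V tensors
caches = collect_block_caches(physics_blocks, stand_in_kv)
print(caches)
```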

decoder_blocks_forward(x_physics, physics_token_specs, per_domain_token_specs, decoder_attn_kwargs, condition, decoders_cache=None)

Forward pass through the per-domain decoder blocks.

Returns:

Tuple of (domain_predictions, new_domain_caches).

Return type:

tuple[dict[str, torch.Tensor], dict[str, list[LayerCache]]]

create_rope_frequencies(physics_positions, geometry_position=None, geometry_supernode_idx=None, geometry_rope=None)

Create RoPE frequencies for all relevant positions.

Parameters:
  • physics_positions (dict[str, torch.Tensor]) – Per-domain combined [anchors | queries] positions, as returned by build_physics_input().

  • geometry_position (torch.Tensor | None) – Geometry mesh coordinates (optional).

  • geometry_supernode_idx (torch.Tensor | None) – Geometry supernode indices (optional).

  • geometry_rope (torch.Tensor | None) – Precomputed geometry-supernode RoPE. When provided, bypasses geometry_position / geometry_supernode_idx for the perceiver k_freqs (needed in queries-only mode where geometry inputs aren’t available).

Returns:

Tuple of (geometry_attn_kwargs, decoder_attn_kwargs, physics_perceiver_attn_kwargs, physics_attn_kwargs, geometry_rope). geometry_rope is the rope tensor used / computed (or None when there’s no geometry branch).

Return type:

tuple[dict[str, Any], dict[str, dict[str, Any]], dict[str, Any], dict[str, Any], torch.Tensor | None]

forward(geometry_position=None, geometry_supernode_idx=None, geometry_batch_idx=None, domain_anchor_positions=None, domain_query_positions=None, domain_anchor_features=None, domain_query_features=None, conditioning_inputs=None, geometry_conditioning_inputs=None, kv_cache=None)

Forward pass of the AB-UPT model.

Example:

model(
    geometry_position=...,
    geometry_supernode_idx=...,
    geometry_batch_idx=...,
    domain_anchor_positions={"surface": surface_pos, "volume": volume_pos},
    domain_query_positions={"surface": query_pos},
    domain_anchor_features={"surface": surface_features, "volume": volume_features},
    domain_query_features={"surface": query_features},
    conditioning_inputs={"geometry_design_parameters": design_params},
)
Parameters:
  • geometry_position (torch.Tensor | None) – Coordinates of the geometry mesh. Tensor of shape (B * N_geometry, D_pos).

  • geometry_supernode_idx (torch.Tensor | None) – Supernode indices for the geometry points.

  • geometry_batch_idx (torch.Tensor | None) – Batch indices for the geometry points.

  • domain_anchor_positions (dict[str, torch.Tensor] | None) – Per-domain anchor positions, e.g. {"surface": (B, N, D), "volume": (B, M, D)}.

  • domain_query_positions (dict[str, torch.Tensor] | None) – Per-domain query positions (optional).

  • domain_anchor_features (dict[str, torch.Tensor] | None) – Per-domain anchor input features (optional), matching the shape of domain_anchor_positions.

  • domain_query_features (dict[str, torch.Tensor] | None) – Per-domain query input features (optional), matching the shape of domain_query_positions.

  • conditioning_inputs (dict[str, torch.Tensor] | None) – Conditioning tensors for physics + decoder blocks, e.g. {"geometry_design_parameters": (B, D)}.

  • geometry_conditioning_inputs (dict[str, torch.Tensor] | None) – Conditioning tensors for the geometry branch. When None and conditioning_inputs is set, the geometry branch automatically reuses conditioning_inputs if the configured geometry_conditioning_dims matches data_specs.conditioning_dims (the common case). Pass an explicit dict to feed a different conditioning to geometry, or leave it None when the geometry branch is unconditioned.

  • kv_cache (ModelKVCache | None) – KV cache from a previous forward call.

Returns:

Tuple of (predictions, kv_cache).

Return type:

tuple[dict[str, torch.Tensor], ModelKVCache]
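The fallback rule for geometry_conditioning_inputs in the parameter list above can be sketched as a small selection function. The name `select_geometry_conditioning` is hypothetical, dims are represented as plain dicts, and returning None when the dims differ is an assumption about the unconditioned case.

```python
def select_geometry_conditioning(
    conditioning_inputs,
    geometry_conditioning_inputs,
    geometry_dims,
    model_dims,
):
    """Choose the conditioning passed to the geometry branch."""
    if geometry_conditioning_inputs is not None:
        return geometry_conditioning_inputs  # explicit dict always wins
    if conditioning_inputs is not None and geometry_dims == model_dims:
        return conditioning_inputs           # common case: reuse
    return None                              # geometry branch unconditioned


dims = {"geometry_design_parameters": 8}
shared = {"geometry_design_parameters": "design_params"}  # stand-in for a tensor

reused = select_geometry_conditioning(shared, None, dims, dims)
unconditioned = select_geometry_conditioning(shared, None, {}, dims)
explicit = select_geometry_conditioning(shared, {"other": "t"}, {"other": 4}, dims)
print(reused is shared, unconditioned, explicit)
```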

class noether.modeling.models.AeroABUPT(model_config, **kwargs)

Bases: noether.core.models.model.Model

Aerodynamic Anchored-Branched UPT wrapper.

Bridges the factory’s (config, **kwargs) instantiation pattern to the core model. Converts flat kwargs (surface_anchor_position, volume_anchor_position, …) into the domain-dict format expected by AnchoredBranchedUPT.

Base class for single models, i.e. one model with one optimizer as opposed to CompositeModel.

Parameters:
  • model_config (noether.modeling.models.ab_upt.AnchorBranchedUPTConfig) – Model configuration. See ModelBaseConfig for available options.

  • update_counter – The UpdateCounter provided to the optimizer.

  • is_frozen – If true, will set requires_grad of all parameters to false. Will also put the model into eval mode (e.g., to put Dropout or BatchNorm into eval mode).

  • path_provider – PathProvider used by the initializer to store or retrieve checkpoints.

  • data_container – DataContainer which includes the data and dataloader. Currently unused, but helpful for quick prototyping, evaluating forward in debug mode, etc.

backbone
forward(**kwargs)
Return type:

dict[str, torch.Tensor]
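The flat-kwargs conversion performed by this wrapper might look roughly like the sketch below. It groups keys by a `<domain>_` prefix. The helper name is hypothetical, the values are string stand-ins for tensors, and the exact target key names used by the real wrapper (e.g. domain_anchor_positions) are not reproduced here.

```python
def group_domain_kwargs(flat_kwargs, domain_names):
    """Split flat kwargs into {suffix: {domain: value}} plus untouched kwargs."""
    grouped, passthrough = {}, {}
    for key, value in flat_kwargs.items():
        for name in domain_names:
            prefix = name + "_"
            if key.startswith(prefix):
                suffix = key[len(prefix):]
                grouped.setdefault(suffix, {})[name] = value
                break
        else:
            # Keys without a domain prefix (e.g. geometry inputs) pass through.
            passthrough[key] = value
    return grouped, passthrough


flat = {
    "surface_anchor_position": "S",
    "volume_anchor_position": "V",
    "surface_query_position": "Q",
    "geometry_position": "G",
}
grouped, passthrough = group_domain_kwargs(flat, ["surface", "volume"])
print(grouped)
print(passthrough)
```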

class noether.modeling.models.AeroTransformer(model_config, **kwargs)

Bases: noether.core.models.model.Model

Aerodynamic Transformer wrapper.

End-to-end forward for aero CFD: positional encoding, optional RoPE, optional physics features, surface/volume bias, Transformer backbone, output projection, and output gathering.

Base class for single models, i.e. one model with one optimizer as opposed to CompositeModel.

Parameters:
  • model_config (AeroTransformerConfig) – Model configuration. See ModelBaseConfig for available options.

  • update_counter – The UpdateCounter provided to the optimizer.

  • is_frozen – If true, will set requires_grad of all parameters to false. Will also put the model into eval mode (e.g., to put Dropout or BatchNorm into eval mode).

  • path_provider – PathProvider used by the initializer to store or retrieve checkpoints.

  • data_container – DataContainer which includes the data and dataloader. Currently unused, but helpful for quick prototyping, evaluating forward in debug mode, etc.

data_specs
use_rope
pos_embed
surface_bias
volume_bias
use_physics_features
backbone
norm
out
forward(surface_position, volume_position, surface_features=None, volume_features=None)
Return type:

dict[str, torch.Tensor]

class noether.modeling.models.AeroTransformerConfig(/, **data)

Bases: noether.modeling.models.transformer.TransformerConfig

Transformer config extended with aerodynamic data specifications.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

data_specs: noether.data.schemas.ModelDataSpecs
class noether.modeling.models.AeroTransolver(model_config, **kwargs)

Bases: noether.core.models.model.Model

Aerodynamic Transolver wrapper.

Like AeroTransformer but adds the Transolver-specific learnable placeholder parameter.

Base class for single models, i.e. one model with one optimizer as opposed to CompositeModel.

Parameters:
  • model_config (AeroTransolverConfig) – Model configuration. See ModelBaseConfig for available options.

  • update_counter – The UpdateCounter provided to the optimizer.

  • is_frozen – If true, will set requires_grad of all parameters to false. Will also put the model into eval mode (e.g., to put Dropout or BatchNorm into eval mode).

  • path_provider – PathProvider used by the initializer to store or retrieve checkpoints.

  • data_container – DataContainer which includes the data and dataloader. Currently unused, but helpful for quick prototyping, evaluating forward in debug mode, etc.

data_specs
pos_embed
surface_bias
volume_bias
use_physics_features
placeholder
backbone
norm
out
forward(surface_position, volume_position, surface_features=None, volume_features=None)
Return type:

dict[str, torch.Tensor]

class noether.modeling.models.AeroTransolverConfig(/, **data)

Bases: noether.modeling.models.transolver.TransolverConfig

Transolver config extended with aerodynamic data specifications.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

data_specs: noether.data.schemas.ModelDataSpecs
class noether.modeling.models.AeroUPT(model_config, **kwargs)

Bases: noether.core.models.model.Model

Aerodynamic UPT wrapper.

Combines separate surface/volume query positions into the single query_position that the core UPT expects, and splits outputs using ModelDataSpecs. Supports optional surface/volume bias layers and physics feature projection on queries.

Base class for single models, i.e. one model with one optimizer as opposed to CompositeModel.

Parameters:
  • model_config (noether.modeling.models.upt.UPTConfig) – Model configuration. See ModelBaseConfig for available options.

  • update_counter – The UpdateCounter provided to the optimizer.

  • is_frozen – If true, will set requires_grad of all parameters to false. Will also put the model into eval mode (e.g., to put Dropout or BatchNorm into eval mode).

  • path_provider – PathProvider used by the initializer to store or retrieve checkpoints.

  • data_container – DataContainer which includes the data and dataloader. Currently unused, but helpful for quick prototyping, evaluating forward in debug mode, etc.

backbone
data_specs
use_bias_layers
use_physics_features
forward(surface_position_batch_idx, surface_position_supernode_idx, surface_position, surface_query_position, volume_query_position, surface_query_features=None, volume_query_features=None)
Return type:

dict[str, torch.Tensor]
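The combine-then-split pattern described for this wrapper can be sketched with nested lists in place of tensors. Everything here is an assumption for illustration: the helper names, the field names and widths, and in particular the channel layout (surface channels first, then volume channels), which the real ModelDataSpecs-based split may not share.

```python
def combine_queries(surface_queries, volume_queries):
    """Concatenate query points; remember where the surface part ends."""
    return surface_queries + volume_queries, len(surface_queries)


def slice_fields(rows, field_dims, channel_offset, prefix):
    """Cut each row's channels into named fields of the given widths."""
    out, start = {}, channel_offset
    for name, width in field_dims.items():
        out[prefix + name] = [row[start:start + width] for row in rows]
        start += width
    return out


def split_predictions(pred_rows, num_surface, surface_dims, volume_dims):
    surface_rows, volume_rows = pred_rows[:num_surface], pred_rows[num_surface:]
    out = slice_fields(surface_rows, surface_dims, 0, "surface_")
    offset = sum(surface_dims.values())  # volume channels follow surface ones
    out.update(slice_fields(volume_rows, volume_dims, offset, "volume_"))
    return out


queries, num_surface = combine_queries([(0, 0, 0), (1, 0, 0)], [(0, 1, 0)])
fake_predictions = [[10, 1, 2, 3], [20, 4, 5, 6], [30, 7, 8, 9]]  # one row per query
outputs = split_predictions(
    fake_predictions, num_surface, {"pressure": 1}, {"velocity": 3}
)
print(outputs)
```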

class noether.modeling.models.Transformer(config)

Bases: torch.nn.Module

Implementation of a Transformer model.

Parameters:

config (TransformerConfig) – Configuration of the Transformer model.

blocks
forward(x, attn_kwargs, condition=None)

Forward pass of the Transformer model.

Parameters:
  • x (torch.Tensor) – Input tensor of shape (batch_size, seq_len, hidden_dim).

  • attn_kwargs (dict[str, torch.Tensor]) – Additional arguments for the attention mechanism.

  • condition (torch.Tensor | None) – Optional conditioning vector of shape (batch_size, condition_dim) consumed by each block’s AdaLN-Zero modulation. None (default) for unconditioned models.

Returns:

Output tensor after processing through the Transformer model.

Return type:

torch.Tensor

class noether.modeling.models.TransformerConfig(/, **data)

Bases: noether.core.models.base.ModelBaseConfig, noether.core.schemas.mixins.InjectSharedFieldFromParentMixin

Configuration for a Transformer model.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

model_config

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

hidden_dim: int = None

Hidden dimension of the model. Used for all transformer blocks.

depth: int = None

Number of transformer blocks in the model.

transformer_block_config: Annotated[noether.modeling.modules.blocks.transformer.TransformerBlockConfig, noether.core.schemas.mixins.Shared]
class noether.modeling.models.Transolver(config)

Bases: noether.modeling.models.transformer.Transformer

Implementation of the Transolver model.

Reference code: https://github.com/thuml/Transolver/

Paper: https://arxiv.org/abs/2402.02366

Transolver is a Transformer with a special physics attention mechanism. Hence, we extend the Transformer class and configure it accordingly.

Parameters:

config (TransolverConfig) – Configuration of the Transolver model.

class noether.modeling.models.TransolverConfig(/, **data)

Bases: noether.modeling.models.transformer.TransformerConfig, noether.core.models.base.ModelBaseConfig

Configuration for a Transolver model.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

model_config

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

attention_arguments: dict
set_attention_constructor()

Set attention_constructor in transformer_block_config based on data_specs.

Return type:

TransolverConfig

class noether.modeling.models.TransolverPlusPlusConfig(/, **data)

Bases: TransolverConfig

Configuration for a Transolver++ model.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

set_attention_constructor()

Set attention_constructor in transformer_block_config based on data_specs.

Return type:

TransolverPlusPlusConfig

class noether.modeling.models.UPT(config)

Bases: torch.nn.Module

Implementation of the UPT (Universal Physics Transformer) model.

Parameters:

config (UPTConfig) – Configuration for the UPT model. See UPTConfig for details.

use_rope
encoder
pos_embed
approximator_blocks
decoder
norm
prediction_layer
compute_rope_args(geometry_batch_idx, geometry_position, geometry_supernode_idx, query_position)

Compute the RoPE frequency arguments for the geometry and query positions. If RoPE is not used, return empty dicts.

Return type:

tuple[dict[str, torch.Tensor], dict[str, torch.Tensor]]

forward(geometry_batch_idx, geometry_supernode_idx, geometry_position, query_position)

Forward pass of the UPT model.

Parameters:
  • geometry_batch_idx (torch.Tensor) – Batch indices for the geometry positions.

  • geometry_supernode_idx (torch.Tensor) – Supernode indices for the geometry positions.

  • geometry_position (torch.Tensor) – Input coordinates of the geometry mesh points.

  • query_position (torch.Tensor) – Input coordinates of the query points.

Returns:

Output tensor containing the predictions at query positions.

Return type:

torch.Tensor

class noether.modeling.models.UPTConfig(/, **data)

Bases: noether.core.models.base.ModelBaseConfig, noether.core.schemas.mixins.InjectSharedFieldFromParentMixin

Configuration for a UPT model.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

model_config

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

num_heads: int = None

Number of attention heads in the model.

hidden_dim: int = None

Hidden dimension of the model.

mlp_expansion_factor: int = None

Expansion factor for the MLP of the FF layers.

approximator_depth: int = None

Number of approximator layers.

use_rope: bool = None
bias: bool = None

Whether to use bias terms in the model’s linear layers.

supernode_pooling_config: Annotated[noether.modeling.modules.encoders.supernode_pooling.SupernodePoolingConfig, noether.core.schemas.mixins.Shared]
approximator_config: Annotated[noether.modeling.modules.blocks.transformer.TransformerBlockConfig, noether.core.schemas.mixins.Shared]
decoder_config: Annotated[noether.modeling.modules.decoders.deep_perceiver.DeepPerceiverDecoderConfig, noether.core.schemas.mixins.Shared]
bias_layers: bool = None
data_specs: noether.data.schemas.ModelDataSpecs
linear_output_projection_config()
Return type:

noether.modeling.modules.layers.linear_projection.LinearProjectionConfig

rope_frequency_config()
Return type:

noether.modeling.modules.layers.rope_frequency.RopeFrequencyConfig

validate_rope_usage()

Ensure that if use_rope is True in the main config, it is also True in the approximator_config.

Return type:

UPTConfig

pos_embedding_config()
Return type:

noether.modeling.modules.layers.continuous_sincos_embed.ContinuousSincosEmbeddingConfig

validate_parameters()

Validate parameters across the model and its submodules.

Ensures that:

  1. hidden_dim is divisible by num_heads in the parent and in all submodules that define num_heads.

  2. hidden_dim is consistent across the parent and all submodules.

Return type:

UPTConfig
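The two checks performed by validate_parameters() can be sketched over plain dictionaries. This is not the real pydantic validator: `collect_dim_errors` is a hypothetical helper that returns a list of problems instead of raising, and the numeric values are made up.

```python
def collect_dim_errors(config):
    """Return human-readable problems; an empty list means the config is valid."""
    hidden_dim = config["hidden_dim"]
    # Check the top level plus every nested sub-config (dict-valued field).
    scopes = [("top level", config)]
    scopes += [(key, value) for key, value in config.items() if isinstance(value, dict)]
    errors = []
    for name, scope in scopes:
        scope_hidden = scope.get("hidden_dim", hidden_dim)
        if scope_hidden != hidden_dim:
            errors.append(f"{name}: hidden_dim {scope_hidden} != {hidden_dim}")
        num_heads = scope.get("num_heads")
        if num_heads is not None and scope_hidden % num_heads != 0:
            errors.append(
                f"{name}: hidden_dim {scope_hidden} not divisible by num_heads {num_heads}"
            )
    return errors


good = {
    "hidden_dim": 192,
    "num_heads": 3,
    "approximator_config": {"hidden_dim": 192, "num_heads": 3},
    "decoder_config": {"hidden_dim": 192, "num_heads": 3},
}
bad = {
    "hidden_dim": 192,
    "num_heads": 5,                                        # 192 % 5 != 0
    "decoder_config": {"hidden_dim": 128, "num_heads": 4},  # mismatched hidden_dim
}
print(collect_dim_errors(good))
print(collect_dim_errors(bad))
```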

class noether.modeling.models.ViT(config)

Bases: torch.nn.Module

Vision Transformer for spatial regression on continuous-coordinate grids.

Based on the ViT paper (https://arxiv.org/pdf/2010.11929) with several modifications, such as:

  • Continuous coordinate inputs with sincos positional embedding and RoPE (vs. learned 1D position embeddings).

  • Optional AdaLN-Zero conditioning, à la DiT (https://arxiv.org/abs/2212.09748).

  • RMSNorm and QK-norm in attention (vs. LayerNorm only).

Parameters:

config (ViTConfig) – Configuration for the ViT model. See ViTConfig for available options.

coord_dim
out_channels
patch_size
hidden_dim
num_heads
token_dropout
use_conditioning
pool_patch
mask_patchify
pos_embedding
rope
backbone
use_conv_output_head
initialize_weights()

Initialize backbone weights

Return type:

None

unpatchify(x, grid_h, grid_w)

Linear unpatchify: (B, L, p²·C_out) → (B, H, W, C_out).

Return type:

torch.Tensor
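To make the shape change concrete, here is a pure-Python sketch of a linear unpatchify on nested lists. It assumes patches are ordered row-major over the (grid_h, grid_w) grid and that each token stores its patch as (row, column, channel); the actual method operates on tensors and may order elements differently.

```python
def unpatchify(tokens, grid_h, grid_w, patch_size, out_channels):
    """Map (B, L, p*p*C_out) nested lists to (B, H, W, C_out).

    H = grid_h * patch_size, W = grid_w * patch_size, L = grid_h * grid_w.
    """
    p = patch_size
    height, width = grid_h * p, grid_w * p
    images = []
    for sample in tokens:
        if len(sample) != grid_h * grid_w:
            raise ValueError("token count does not match grid_h * grid_w")
        image = [[None] * width for _ in range(height)]
        for index, token in enumerate(sample):
            patch_row, patch_col = divmod(index, grid_w)
            for i in range(p):
                for j in range(p):
                    start = (i * p + j) * out_channels
                    cell = token[start:start + out_channels]
                    image[patch_row * p + i][patch_col * p + j] = cell
        images.append(image)
    return images


# One sample, a 1 x 2 patch grid, 2 x 2 patches, one output channel.
tokens = [[[0, 1, 2, 3], [4, 5, 6, 7]]]
image = unpatchify(tokens, grid_h=1, grid_w=2, patch_size=2, out_channels=1)
print(image[0])
```

Each token of length p²·C_out fills one p × p block of the output grid, so two tokens of length 4 become a 2 × 4 image with one channel.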

forward(x, coords, mask=None, cond=None, return_tokens=False)

Run the standard ViT.

Parameters:
  • x (torch.Tensor | None) – Optional pre-computed patch embeddings of shape (B, L, hidden_dim). When None, tokens come purely from positional encoding.

  • coords (torch.Tensor) – Per-cell coordinates of shape (B, H, W, coord_dim).

  • mask (torch.Tensor | None) – Optional per-cell fluid mask of shape (B, H, W).

  • cond (torch.Tensor | None) – AdaLN conditioning vector of shape (B, hidden_dim). Required when the ViT was built with use_conditioning=True (the default); must be None otherwise.

  • return_tokens (bool) – If True, return raw post-FinalLayer tokens plus (grid_h, grid_w) instead of the decoded spatial output.

Returns:

Either (B, H, W, out_channels) or (tokens, (grid_h, grid_w)) if return_tokens.

Return type:

torch.Tensor | tuple[torch.Tensor, tuple[int, int]]

class noether.modeling.models.ViTConfig(/, **data)

Bases: noether.core.models.base.ModelBaseConfig

Configuration for the ViT model.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

model_config

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

coord_dim: int = None

Coordinate dimensionality of the input grid (2 for 2D, 3 for 3D).

out_channels: int = None

Number of output channels emitted per spatial cell.

patch_size: int = None

Patch side length in cells. The grid resolution must be divisible by this value.

hidden_dim: int = None

Token hidden dimension throughout the transformer stack.

num_heads: int = None

Number of attention heads in each transformer block.

depth: int = None

Number of stacked transformer blocks.

mlp_ratio: int = None

FFN expansion factor inside each transformer block.

use_conditioning: bool = True

If True, enable AdaLN-Zero conditioning (forward requires cond); if False, plain ViT (cond must be None).

token_dropout: float = None

Per-patch token dropout probability used during training.

attn_drop: float = None

Dropout probability inside attention.

use_conv_output_head: bool = True

If True, decode via a cascaded PixelShuffle conv head; if False, decode via a linear unpatchify.

property transformer_block_config: noether.modeling.modules.blocks.transformer.TransformerBlockConfig
Return type:

noether.modeling.modules.blocks.transformer.TransformerBlockConfig