noether.modeling.models.ab_upt¶

Attributes¶

`KVPair`
`LayerCache`

Classes¶

`AnchorBranchedUPTConfig`	Configuration for the Anchored Branched UPT (AB-UPT) model.
`ModelKVCache`	Caches that let a forward call skip work it has already done.
`ReadoutLayer`	The final readout layer for AB-UPT, which applies an AdaLN-style modulation followed by a linear projection to the desired output dimension.
`AnchoredBranchedUPT`	Implementation of the Anchored Branched UPT (AB-UPT) model.

Module Contents¶

class noether.modeling.models.ab_upt.AnchorBranchedUPTConfig(/, **data)¶

Bases: noether.core.models.base.ModelBaseConfig, noether.core.schemas.mixins.InjectSharedFieldFromParentMixin

Configuration for the Anchored Branched UPT (AB-UPT) model.

AB-UPT is built from three configurable stages:

Geometry encoder (optional): a SupernodePooling encoder followed by geometry_depth standard transformer blocks. Only instantiated when at least one perceiver / perceiver_untied block is present in physics_blocks and supernode_pooling_config is provided.
Physics trunk: a stack of blocks listed in physics_blocks operating on per-domain anchor (and optionally query) tokens. The block string controls the attention pattern and weight sharing — see physics_blocks below.
Per-domain decoder (optional): num_domain_decoder_blocks[name] self-attention blocks with untied weights per domain, followed by a linear projection to that domain’s output fields.

hidden_dim is a shared field — it is auto-injected into transformer_block_config and supernode_pooling_config via InjectSharedFieldFromParentMixin, so it only needs to be set once at the top level. See Configuration Inheritance.
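The shared-field behaviour described above can be pictured with plain dictionaries. This is a minimal sketch, not the InjectSharedFieldFromParentMixin implementation: `inject_shared_field` and the dict layout are hypothetical, the numeric values are illustrative, and it assumes that a value set explicitly on a child is left untouched.

```python
from copy import deepcopy


def inject_shared_field(parent: dict, field: str, children: list[str]) -> dict:
    """Copy parent[field] into each child config that does not set it (hypothetical helper)."""
    resolved = deepcopy(parent)
    for child in children:
        child_cfg = resolved.get(child)
        if child_cfg is None:
            continue  # optional sub-config that was not provided
        child_cfg.setdefault(field, resolved[field])
    return resolved


config = {
    "hidden_dim": 192,  # set once at the top level
    "transformer_block_config": {"num_heads": 3},
    "supernode_pooling_config": None,  # optional, left unset here
}
resolved = inject_shared_field(
    config, "hidden_dim", ["transformer_block_config", "supernode_pooling_config"]
)
print(resolved["transformer_block_config"])
```

The input dictionary is not mutated, so the same top-level config can be resolved more than once.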

Configuration guide¶

See Configuring AB-UPT for a step-by-step walkthrough of how to compose physics blocks, choose between tied and _untied variants, and wire up the per-domain decoder.

Concrete examples (YAML):

Aerodynamics (multi-domain, surface + volume): recipes/aero_cfd/configs/model/ab_upt.yaml
Heat transfer (single-domain, volume only with parameter conditioning): recipes/heat_transfer/configs/model/ab_upt.yaml

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

kind: str | None = 'noether.core.schemas.models.AnchorBranchedUPTConfig'¶: Kind of model to use, i.e. class path

model_config¶: Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

supernode_pooling_config: Annotated[noether.modeling.modules.encoders.supernode_pooling.SupernodePoolingConfig, noether.core.schemas.mixins.Shared] | None = None¶

transformer_block_config: Annotated[noether.modeling.modules.blocks.transformer.TransformerBlockConfig, noether.core.schemas.mixins.Shared]¶

geometry_depth: int = None¶: Number of transformer blocks in the geometry encoder.

hidden_dim: int = None¶: Hidden dimension of the model.

condition_dim: int | None = None¶

physics_blocks: list[Literal['self', 'shared', 'cross', 'joint', 'perceiver', 'self_untied', 'cross_untied', 'joint_untied', 'perceiver_untied']]¶

Types of physics blocks to use in the model.

self / shared: Self-attention within a branch/domain. Weights are shared between all domains.
cross: Cross-attention between domains. Each domain attends to all other domains’ anchors; weights are shared.
joint: Joint attention over all domain points. Full self-attention over all points; weights are shared.
perceiver: Perceiver-style cross-attention to the geometry encoding.
self_untied: Self-attention within a branch with untied weights for each domain.
cross_untied: Cross-attention between domains with untied weights for each domain.
joint_untied: Joint attention over all domain points with untied weights for each domain.
perceiver_untied: Perceiver cross-attention to the geometry encoding with untied weights per domain.

Note: “shared” is a deprecated alias for “self” and will be removed in a future release.
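The naming scheme of the block types can be decoded mechanically. The sketch below is illustrative only: `parse_block` and `needs_geometry_encoder` are hypothetical helpers, not part of the library. The second function encodes the rule stated for the geometry encoder, which is only built when a perceiver-type block is present and a supernode pooling config is provided.

```python
def parse_block(name: str) -> tuple[str, bool]:
    """Split a physics block name into (attention pattern, untied flag)."""
    if name == "shared":  # deprecated alias for "self"
        name = "self"
    untied = name.endswith("_untied")
    pattern = name.removesuffix("_untied")
    return pattern, untied


def needs_geometry_encoder(physics_blocks: list[str], has_supernode_pooling: bool) -> bool:
    """True when a perceiver-type block exists and supernode pooling is configured."""
    has_perceiver = any(parse_block(b)[0] == "perceiver" for b in physics_blocks)
    return has_perceiver and has_supernode_pooling


blocks = ["perceiver", "self", "cross_untied", "shared"]
print([parse_block(b) for b in blocks])
print(needs_geometry_encoder(blocks, has_supernode_pooling=True))
```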

num_domain_decoder_blocks: dict[str, int] = None¶: Number of final domain-specific decoder blocks with self-attention and no weight sharing, e.g. {"surface": 2, "volume": 2}.

init_weights: noether.core.types.InitWeightsMode = None¶: Weight initialization of linear layers. Defaults to “truncnormal002”.

drop_path_rate: float = None¶: Drop path rate for stochastic depth. Defaults to 0.0 (no drop path).

geometry_conditioning_dims: noether.data.schemas.FieldDimSpec | None = None¶: Per-named-field conditioning spec for geometry transformer blocks. When left unset, defaults to data_specs.conditioning_dims so the geometry branch sees the same conditioning as the rest of the model. An explicit empty FieldDimSpec (total_dim == 0) opts out — useful for diffusion, where timestep modulation should touch physics + per-domain decoders but not the geometry branch (geometry is invariant to noise level).
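The three cases for geometry_conditioning_dims (unset, explicit, explicitly empty) can be sketched as follows. This assumes a plain `dict` mapping field names to dimensions as a stand-in for FieldDimSpec; `DimSpec`, `total_dim`, `resolve_geometry_conditioning_dims` and the `timestep` field name are hypothetical.

```python
from typing import Optional

DimSpec = dict[str, int]  # stand-in for FieldDimSpec: field name -> dimension


def total_dim(spec: DimSpec) -> int:
    """Sum of all per-field conditioning dimensions."""
    return sum(spec.values())


def resolve_geometry_conditioning_dims(
    geometry_conditioning_dims: Optional[DimSpec],
    model_conditioning_dims: DimSpec,
) -> DimSpec:
    """Unset (None) inherits the model-wide spec; any explicit spec, even empty, wins."""
    if geometry_conditioning_dims is None:
        return dict(model_conditioning_dims)
    return dict(geometry_conditioning_dims)


model_dims = {"timestep": 1, "geometry_design_parameters": 4}
inherited = resolve_geometry_conditioning_dims(None, model_dims)
opted_out = resolve_geometry_conditioning_dims({}, model_dims)
print(total_dim(inherited), total_dim(opted_out))
```

The important distinction is between `None` (inherit) and an empty spec (opt out), which is why the check uses `is None` rather than truthiness.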

data_specs: noether.data.schemas.ModelDataSpecs¶: Data specifications for the model.

migrate_shared_to_self()¶

Migrate deprecated ‘shared’ block type to ‘self’.

Return type:: AnchorBranchedUPTConfig

rope_frequency_config()¶

Return type:: noether.modeling.modules.layers.rope_frequency.RopeFrequencyConfig

pos_embed_config()¶

Return type:: noether.modeling.modules.layers.continuous_sincos_embed.ContinuousSincosEmbeddingConfig

bias_mlp_config()¶

Return type:: noether.modeling.modules.mlp.MLPConfig

perceiver_block_config()¶

Return type:: noether.modeling.modules.blocks.perceiver.PerceiverBlockConfig

domain_decoder_configs()¶

Per-domain decoder projection configs, keyed by domain name.

Return type:: dict[str, noether.modeling.modules.layers.linear_projection.LinearProjectionConfig]

conditioner_config()¶

Configuration for the scalar conditioner module.

Return type:: noether.modeling.modules.layers.vectors_conditioner.VectorsConditionerConfig

set_condition_dim()¶

Set condition_dim in transformer_block_config based on data_specs.

Return type:: AnchorBranchedUPTConfig

geometry_transformer_block_config()¶

Transformer block config for geometry encoder, with condition_dim set to geometry_conditioning_dims.

Return type:: noether.modeling.modules.blocks.transformer.TransformerBlockConfig

geometry_conditioner_config()¶

Configuration for the scalar conditioner module of the geometry branch.

Return type:: noether.modeling.modules.layers.vectors_conditioner.VectorsConditionerConfig

validate_parameters()¶

Validate parameters across the model and its submodules.

Ensures that hidden_dim is consistent across parent and all submodules. Note: transformer_block_config validates hidden_dim % num_heads == 0 in its own validator.

Return type:: AnchorBranchedUPTConfig
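The two checks mentioned above can be sketched on plain dictionaries. This is an illustration under assumptions, not the validator itself: `validate_hidden_dim` is a hypothetical helper, and it folds the divisibility check into the same function for brevity, whereas the documentation says that check lives in the validator of transformer_block_config.

```python
def validate_hidden_dim(parent_hidden_dim: int, sub_configs: dict) -> None:
    """Raise ValueError if a sub-config disagrees with the parent hidden_dim."""
    for name, cfg in sub_configs.items():
        if cfg is None:
            continue  # optional sub-config not provided
        if cfg["hidden_dim"] != parent_hidden_dim:
            raise ValueError(
                f"{name}.hidden_dim={cfg['hidden_dim']} does not match "
                f"parent hidden_dim={parent_hidden_dim}"
            )
        num_heads = cfg.get("num_heads")
        if num_heads is not None and parent_hidden_dim % num_heads != 0:
            raise ValueError(
                f"hidden_dim={parent_hidden_dim} is not divisible by num_heads={num_heads}"
            )


# Consistent configuration: passes silently.
validate_hidden_dim(
    192,
    {
        "transformer_block_config": {"hidden_dim": 192, "num_heads": 3},
        "supernode_pooling_config": None,
    },
)
```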

Parameters:: data (Any)

noether.modeling.models.ab_upt.KVPair¶

noether.modeling.models.ab_upt.LayerCache¶

class noether.modeling.models.ab_upt.ModelKVCache¶

Bases: TypedDict

Caches that let a forward call skip work it has already done.

All keys are independent — pass any subset. geometry_encoding and geometry_rope travel together (both depend only on the geometry mesh) and are what diffusion sampling reuses across Euler steps.

geometry_encoding — output of AnchoredBranchedUPT.geometry_branch_forward(). When present, the geometry encoder + transformer stack are skipped.
geometry_rope — RoPE of the geometry supernode positions, used by perceiver blocks as k_freqs. Cached alongside geometry_encoding so subsequent calls don’t need geometry_position.
physics_blocks — per-block anchor self-attention K/V (None for perceiver blocks, which always re-project from geometry_encoding). When present, anchors are not re-encoded; the call is decode-only.
decoders — per-domain anchor K/V for each decoder block. Symmetric with physics_blocks for the per-domain decoder stage.
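The reuse pattern across Euler steps can be illustrated with a toy stand-in for the model. Everything here is hypothetical: `SketchKVCache` only mirrors two of the keys, `total=False` is an assumption drawn from "pass any subset", and `toy_forward` replaces the real forward pass with scalar arithmetic so that the number of geometry-encoder runs can be counted.

```python
from typing import Any, Optional, TypedDict


class SketchKVCache(TypedDict, total=False):
    """Stand-in for ModelKVCache; total=False lets callers pass any subset of keys."""

    geometry_encoding: Any
    geometry_rope: Any


geometry_encoder_calls = 0


def toy_forward(x: float, kv_cache: Optional[SketchKVCache] = None) -> tuple[float, SketchKVCache]:
    """Toy forward pass that skips the geometry stage when its outputs are cached."""
    global geometry_encoder_calls
    cache: SketchKVCache = dict(kv_cache or {})
    if "geometry_encoding" not in cache:
        geometry_encoder_calls += 1  # expensive stage runs only on a cache miss
        cache["geometry_encoding"] = 0.5  # pretend encoder output
        cache["geometry_rope"] = 1.0  # pretend RoPE of the supernode positions
    velocity = -x * cache["geometry_encoding"]
    return velocity, cache


# Four Euler steps; the returned cache is fed back into the next call.
x, cache, dt = 1.0, None, 0.25
for _ in range(4):
    velocity, cache = toy_forward(x, kv_cache=cache)
    x = x + dt * velocity

print(geometry_encoder_calls)
```

Only the first step pays for the geometry stage, because the geometry mesh does not change between steps.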


geometry_encoding: torch.Tensor¶

geometry_rope: torch.Tensor¶

physics_blocks: list[LayerCache | None]¶

decoders: dict[str, list[LayerCache]]¶

class noether.modeling.models.ab_upt.ReadoutLayer(decoder_config, hidden_dim, condition_dim=None)¶

Bases: torch.nn.Module

The final readout layer for AB-UPT, which applies an AdaLN-style modulation followed by a linear projection to the desired output dimension.

Parameters:

decoder_config (noether.modeling.modules.layers.linear_projection.LinearProjectionConfig)
hidden_dim (int)
condition_dim (int | None)

norm_final¶

linear¶

modulation = None¶

forward(x, condition)¶

Parameters:

x (torch.Tensor)
condition (torch.Tensor | None)

Return type:

torch.Tensor
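A dependency-free sketch of what an AdaLN-style readout computes for a single token is given below. The exact form used by ReadoutLayer is not specified here, so the sketch assumes the common variant: layer norm without learned affine, a linear map from the condition to a (shift, scale) pair, modulation as `h * (1 + scale) + shift`, then the output projection. All function names and values are hypothetical.

```python
import math


def layer_norm(x, eps=1e-6):
    """Normalize a feature vector to zero mean and unit variance (no learned affine)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def linear(weight, bias, vec):
    """Dense layer; weight holds one row per output feature."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weight, bias)]


def readout(x, condition, mod_weight, mod_bias, out_weight, out_bias):
    """Norm, then optional (shift, scale) modulation from the condition, then projection."""
    h = layer_norm(x)
    if condition is not None:
        shift_scale = linear(mod_weight, mod_bias, condition)
        shift, scale = shift_scale[: len(x)], shift_scale[len(x):]
        h = [v * (1.0 + s) + t for v, s, t in zip(h, scale, shift)]
    return linear(out_weight, out_bias, h)


hidden_dim, condition_dim, output_dim = 4, 2, 3
x = [0.5, -1.0, 2.0, 0.0]
condition = [1.0, -1.0]
out_weight = [[0.1 * (i + j) for j in range(hidden_dim)] for i in range(output_dim)]
out_bias = [0.0] * output_dim
# Zero modulation parameters: the condition has no effect on the output.
zero_mod_weight = [[0.0] * condition_dim for _ in range(2 * hidden_dim)]
zero_mod_bias = [0.0] * (2 * hidden_dim)

plain = readout(x, None, zero_mod_weight, zero_mod_bias, out_weight, out_bias)
modulated = readout(x, condition, zero_mod_weight, zero_mod_bias, out_weight, out_bias)
print(plain == modulated)
```

Passing `condition=None` corresponds to the case where no modulation module exists (`modulation = None`), in which the layer reduces to norm plus projection.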

class noether.modeling.models.ab_upt.AnchoredBranchedUPT(config)¶

Bases: torch.nn.Module

Implementation of the Anchored Branched UPT (AB-UPT) model.

This is an off-the-shelf model — it includes input embedding and output projection, so it can be used directly by providing the appropriate input tensors. See forward() for the expected inputs.

The architecture is fully driven by AnchorBranchedUPTConfig: the geometry encoder depth, the ordering and type of physics blocks, and the per-domain decoder depths are all configured there. For a walkthrough of how to assemble a config (and concrete YAML examples from the aero_cfd and heat_transfer recipes), see Configuring AB-UPT.

Parameters:: config (AnchorBranchedUPTConfig) – Configuration for the AB-UPT model. See AnchorBranchedUPTConfig for details.

data_specs¶

rope¶

pos_embed¶

domain_names: list[str]¶

domain_biases¶

hidden_dim¶

physics_blocks¶

use_geometry_branch = False¶

domain_feature_projs: torch.nn.ModuleDict | None = None¶

domain_decoder_blocks¶

domain_decoder_projections¶

geometry_branch_forward(geometry_position, geometry_supernode_idx, geometry_batch_idx, condition, geometry_attn_kwargs)¶

Forward pass through the geometry branch of the model.

Parameters:

geometry_position (torch.Tensor)
geometry_supernode_idx (torch.Tensor)
geometry_batch_idx (torch.Tensor)
condition (torch.Tensor | None)
geometry_attn_kwargs (dict[str, torch.Tensor])

Return type:

torch.Tensor

build_physics_input(domain_anchor_positions=None, domain_query_positions=None, domain_anchor_features=None, domain_query_features=None)¶

Build the physics-block input tensor and combined per-domain positions.

Each per-domain segment is [anchors | queries] with positional biases plus projected features (when data_specs.domains[name].feature_dim is set on the config). Domains are concatenated in self.domain_names order.

Returns:

Tuple of (x_physics, physics_positions). x_physics has shape (B, total_tokens, hidden_dim). physics_positions maps each domain name to its concatenated [anchors | queries] positions and can be passed directly to create_rope_frequencies().

Parameters:

domain_anchor_positions (dict[str, torch.Tensor] | None)
domain_query_positions (dict[str, torch.Tensor] | None)
domain_anchor_features (dict[str, torch.Tensor] | None)
domain_query_features (dict[str, torch.Tensor] | None)

Return type:

tuple[torch.Tensor, dict[str, torch.Tensor]]
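The token layout described above, with one [anchors | queries] segment per domain concatenated in domain order, can be made concrete by computing the index ranges along the token axis. `segment_layout` is a hypothetical helper and the token counts are illustrative.

```python
def segment_layout(domain_names, num_anchors, num_queries):
    """Return per-domain (start, stop) token ranges and the total token count."""
    layout = {}
    offset = 0
    for name in domain_names:
        n_anchor = num_anchors[name]
        n_query = num_queries.get(name, 0)  # queries are optional per domain
        layout[name] = {
            "anchors": (offset, offset + n_anchor),
            "queries": (offset + n_anchor, offset + n_anchor + n_query),
        }
        offset += n_anchor + n_query
    return layout, offset


layout, total_tokens = segment_layout(
    ["surface", "volume"],
    num_anchors={"surface": 4, "volume": 6},
    num_queries={"surface": 2},
)
print(layout["surface"], layout["volume"], total_tokens)
```

With these counts, x_physics would have 12 tokens per batch element, and the volume segment has an empty query range.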

physics_blocks_forward(x_physics, geometry_encoding, physics_token_specs, physics_attn_kwargs, physics_perceiver_attn_kwargs, condition, physics_blocks_cache=None)¶

Run the physics-block stack on a pre-built input tensor.

Perceiver blocks always re-project K/V from geometry_encoding and contribute None to the returned cache; only transformer blocks cache their anchor self-attention K/V.

Parameters:

x_physics (torch.Tensor)
geometry_encoding (torch.Tensor | None)
physics_token_specs (list[noether.core.schemas.modules.attention.TokenSpec])
physics_attn_kwargs (dict[str, Any])
physics_perceiver_attn_kwargs (dict[str, Any])
condition (torch.Tensor | None)
physics_blocks_cache (list[LayerCache | None] | None)

Return type:

tuple[torch.Tensor, list[LayerCache | None]]
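The shape of the returned cache list, one entry per block with None at perceiver positions, can be sketched with a toy stack. `toy_physics_stack` is hypothetical and stores placeholder strings where the real model would store anchor K/V tensors.

```python
def toy_physics_stack(block_types, cache=None):
    """Return (number of anchor K/V entries computed, cache aligned with block_types)."""
    computed = 0
    new_cache = []
    for index, block_type in enumerate(block_types):
        if block_type.startswith("perceiver"):
            # Perceiver blocks always re-project K/V from the geometry encoding.
            new_cache.append(None)
            continue
        entry = cache[index] if cache is not None else None
        if entry is None:
            entry = {"block": index, "kv": f"anchor-kv-{index}"}  # placeholder K/V
            computed += 1
        new_cache.append(entry)
    return computed, new_cache


blocks = ["self", "perceiver", "cross"]
first_computed, first_cache = toy_physics_stack(blocks)
second_computed, second_cache = toy_physics_stack(blocks, cache=first_cache)
print(first_computed, second_computed)
```

Keeping None placeholders means cache index `i` always refers to block `i`, regardless of how many perceiver blocks precede it.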

decoder_blocks_forward(x_physics, physics_token_specs, per_domain_token_specs, decoder_attn_kwargs, condition, decoders_cache=None)¶

Forward pass through the per-domain decoder blocks.

Returns:

Tuple of (domain_predictions, new_domain_caches).

Parameters:

x_physics (torch.Tensor)
physics_token_specs (list[noether.core.schemas.modules.attention.TokenSpec])
per_domain_token_specs (dict[str, list[noether.core.schemas.modules.attention.TokenSpec]])
decoder_attn_kwargs (dict[str, dict[str, Any]])
condition (torch.Tensor | None)
decoders_cache (dict[str, list[LayerCache]] | None)

Return type:

tuple[dict[str, torch.Tensor], dict[str, list[LayerCache]]]

create_rope_frequencies(physics_positions, geometry_position=None, geometry_supernode_idx=None, geometry_rope=None)¶

Create RoPE frequencies for all relevant positions.

Parameters:

physics_positions (dict[str, torch.Tensor]) – Per-domain combined [anchors | queries] positions, as returned by build_physics_input().
geometry_position (torch.Tensor | None) – Geometry mesh coordinates (optional).
geometry_supernode_idx (torch.Tensor | None) – Geometry supernode indices (optional).
geometry_rope (torch.Tensor | None) – Precomputed geometry-supernode RoPE. When provided, bypasses geometry_position / geometry_supernode_idx for the perceiver k_freqs (needed in queries-only mode where geometry inputs aren’t available).

Returns:

Tuple of (geometry_attn_kwargs, decoder_attn_kwargs, physics_perceiver_attn_kwargs, physics_attn_kwargs, geometry_rope). geometry_rope is the RoPE tensor that was used or computed (or None when there is no geometry branch).

Return type:

tuple[dict[str, Any], dict[str, dict[str, Any]], dict[str, Any], dict[str, Any], torch.Tensor | None]
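The precedence between a precomputed geometry_rope and the raw geometry inputs can be sketched as below. This is an assumption-laden illustration: `resolve_geometry_rope` and `toy_rope` are hypothetical, positions are plain tuples, and the sketch assumes the RoPE is computed from the positions gathered at the supernode indices.

```python
def toy_rope(positions):
    """Placeholder for a RoPE computation; tags each position instead of rotating."""
    return [("rope", p) for p in positions]


def resolve_geometry_rope(
    compute_rope,
    geometry_position=None,
    geometry_supernode_idx=None,
    geometry_rope=None,
):
    """Prefer a precomputed RoPE; otherwise compute it from the supernode positions."""
    if geometry_rope is not None:
        return geometry_rope  # queries-only mode: geometry inputs are not needed
    if geometry_position is None or geometry_supernode_idx is None:
        return None  # nothing to compute from
    supernode_positions = [geometry_position[i] for i in geometry_supernode_idx]
    return compute_rope(supernode_positions)


mesh = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
computed = resolve_geometry_rope(toy_rope, geometry_position=mesh, geometry_supernode_idx=[0, 2])
reused = resolve_geometry_rope(toy_rope, geometry_rope=computed)
print(computed, reused is computed)
```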

forward(geometry_position=None, geometry_supernode_idx=None, geometry_batch_idx=None, domain_anchor_positions=None, domain_query_positions=None, domain_anchor_features=None, domain_query_features=None, conditioning_inputs=None, geometry_conditioning_inputs=None, kv_cache=None)¶

Forward pass of the AB-UPT model.

Example:

predictions, kv_cache = model(
    geometry_position=...,
    geometry_supernode_idx=...,
    geometry_batch_idx=...,
    domain_anchor_positions={"surface": surface_pos, "volume": volume_pos},
    domain_query_positions={"surface": query_pos},
    domain_anchor_features={"surface": surface_features, "volume": volume_features},
    domain_query_features={"surface": query_features},
    conditioning_inputs={"geometry_design_parameters": design_params},
)

Parameters:

geometry_position (torch.Tensor | None) – Coordinates of the geometry mesh. Tensor of shape (B * N_geometry, D_pos).
geometry_supernode_idx (torch.Tensor | None) – Supernode indices for the geometry points.
geometry_batch_idx (torch.Tensor | None) – Batch indices for the geometry points.
domain_anchor_positions (dict[str, torch.Tensor] | None) – Per-domain anchor positions, e.g. {"surface": (B, N, D), "volume": (B, M, D)}.
domain_query_positions (dict[str, torch.Tensor] | None) – Per-domain query positions (optional).
domain_anchor_features (dict[str, torch.Tensor] | None) – Per-domain anchor input features (optional), matching the shape of domain_anchor_positions.
domain_query_features (dict[str, torch.Tensor] | None) – Per-domain query input features (optional), matching the shape of domain_query_positions.
conditioning_inputs (dict[str, torch.Tensor] | None) – Conditioning tensors for physics + decoder blocks, e.g. {"geometry_design_parameters": (B, D)}.
geometry_conditioning_inputs (dict[str, torch.Tensor] | None) – Conditioning tensors for the geometry branch. When None and conditioning_inputs is set, the geometry branch automatically reuses conditioning_inputs if the configured geometry_conditioning_dims matches data_specs.conditioning_dims (the common case). Pass an explicit dict to feed a different conditioning to geometry, or leave it None when the geometry branch is unconditioned.
kv_cache (ModelKVCache | None) – KV cache from a previous forward call.

Returns:

Tuple of (predictions, kv_cache).

Return type:

tuple[dict[str, torch.Tensor], ModelKVCache]
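The fallback rule for geometry_conditioning_inputs can be written out as a small decision function. This is a sketch of the documented behaviour, not the library code: `resolve_geometry_conditioning_inputs` is hypothetical, dimension specs are plain dicts, and the string values stand in for tensors.

```python
def resolve_geometry_conditioning_inputs(
    conditioning_inputs,
    geometry_conditioning_inputs,
    geometry_conditioning_dims,
    model_conditioning_dims,
):
    """Pick the conditioning dict that the geometry branch should receive."""
    if geometry_conditioning_inputs is not None:
        return geometry_conditioning_inputs  # an explicit dict always wins
    if conditioning_inputs is not None and geometry_conditioning_dims == model_conditioning_dims:
        return conditioning_inputs  # common case: reuse the model-wide conditioning
    return None  # geometry branch is unconditioned


dims = {"geometry_design_parameters": 4}
inputs = {"geometry_design_parameters": "design_params_tensor"}

reused = resolve_geometry_conditioning_inputs(inputs, None, dims, dims)
unconditioned = resolve_geometry_conditioning_inputs(inputs, None, {}, dims)
explicit = resolve_geometry_conditioning_inputs(inputs, {"other": "tensor"}, {"other": 2}, dims)
print(reused is inputs, unconditioned, explicit)
```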