noether.modeling.models.transformer

Classes

TransformerConfig

Configuration for a Transformer model.

Transformer

Implementation of a Transformer model.

Module Contents

class noether.modeling.models.transformer.TransformerConfig(/, **data)

Bases: noether.core.models.base.ModelBaseConfig, noether.core.schemas.mixins.InjectSharedFieldFromParentMixin

Configuration for a Transformer model.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)
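
The validation behaviour described above can be illustrated with a small stand-in. This is only a sketch: `ExampleConfig` is a hypothetical pydantic model that borrows the `hidden_dim` and `depth` field names. It is not `TransformerConfig` itself, which also requires a `transformer_block_config` whose fields are not documented here.

```python
from pydantic import BaseModel, ValidationError


class ExampleConfig(BaseModel):
    """Hypothetical stand-in for illustration; not noether's TransformerConfig."""

    hidden_dim: int
    depth: int


# Valid keyword arguments are parsed into typed fields.
config = ExampleConfig(hidden_dim=256, depth=4)

# Invalid input raises ValidationError instead of producing a half-built object.
try:
    ExampleConfig(hidden_dim="not a number", depth=4)
except ValidationError as exc:
    # Each error entry records the location of the offending field.
    error_fields = [err["loc"][0] for err in exc.errors()]
```

Catching the error at construction time means a misconfigured model fails before any weights are allocated.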

model_config

Configuration for the model. Should be a dictionary conforming to pydantic.config.ConfigDict.

hidden_dim: int = None

Hidden dimension of the model. Used for all transformer blocks.

depth: int = None

Number of transformer blocks in the model.

transformer_block_config: Annotated[noether.modeling.modules.blocks.transformer.TransformerBlockConfig, noether.core.schemas.mixins.Shared]

class noether.modeling.models.transformer.Transformer(config)

Bases: torch.nn.Module

Implementation of a Transformer model.

Parameters:

config (TransformerConfig) – Configuration of the Transformer model.

blocks

forward(x, attn_kwargs, condition=None)

Forward pass of the Transformer model.

Parameters:
  • x (torch.Tensor) – Input tensor of shape (batch_size, seq_len, hidden_dim).

  • attn_kwargs (dict[str, torch.Tensor]) – Additional arguments for the attention mechanism.

  • condition (torch.Tensor | None) – Optional conditioning vector of shape (batch_size, condition_dim), consumed by each block’s AdaLN-Zero modulation. Pass None (the default) for unconditioned models.

Returns:

Output tensor after processing through the Transformer model.

Return type:

torch.Tensor
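
The shape contract of `forward` can be sketched with a self-contained stand-in. Everything below is an assumption made for illustration: `ToyBlock` and `ToyTransformer` are hypothetical classes built on `torch.nn.MultiheadAttention`, not noether's implementation. Forwarding `attn_kwargs` as keyword arguments to the attention call (here `key_padding_mask`) is also a guess at how such a dictionary might be consumed. The modulation layer is zero-initialised, as AdaLN-Zero commonly is, so each conditioned block starts out as the identity.

```python
import torch
from torch import nn


class ToyBlock(nn.Module):
    """Hypothetical conditioned attention block; not noether's TransformerBlock."""

    def __init__(self, hidden_dim, num_heads, condition_dim=None):
        super().__init__()
        self.norm = nn.LayerNorm(hidden_dim, elementwise_affine=False)
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.modulation = None
        if condition_dim is not None:
            # Produces scale, shift and gate; zero init gives gate == 0 at start.
            self.modulation = nn.Linear(condition_dim, 3 * hidden_dim)
            nn.init.zeros_(self.modulation.weight)
            nn.init.zeros_(self.modulation.bias)

    def forward(self, x, attn_kwargs, condition=None):
        h = self.norm(x)
        gate = 1.0
        if self.modulation is not None and condition is not None:
            # (batch, condition_dim) -> three tensors of shape (batch, 1, hidden_dim)
            params = self.modulation(condition).unsqueeze(1)
            scale, shift, gate = params.chunk(3, dim=-1)
            h = h * (1 + scale) + shift
        attn_out, _ = self.attn(h, h, h, need_weights=False, **attn_kwargs)
        return x + gate * attn_out


class ToyTransformer(nn.Module):
    """Hypothetical stack of `depth` blocks sharing one hidden_dim."""

    def __init__(self, hidden_dim, depth, num_heads, condition_dim=None):
        super().__init__()
        self.blocks = nn.ModuleList(
            [ToyBlock(hidden_dim, num_heads, condition_dim) for _ in range(depth)]
        )

    def forward(self, x, attn_kwargs, condition=None):
        # The same attn_kwargs and condition are passed to every block.
        for block in self.blocks:
            x = block(x, attn_kwargs, condition=condition)
        return x


torch.manual_seed(0)
x = torch.randn(2, 5, 32)          # (batch_size, seq_len, hidden_dim)
condition = torch.randn(2, 8)      # (batch_size, condition_dim)
padding_mask = torch.zeros(2, 5, dtype=torch.bool)
padding_mask[:, -1] = True         # ignore the last position of each sequence

conditioned = ToyTransformer(hidden_dim=32, depth=2, num_heads=4, condition_dim=8)
out = conditioned(x, {"key_padding_mask": padding_mask}, condition=condition)

unconditioned = ToyTransformer(hidden_dim=32, depth=2, num_heads=4)
out_plain = unconditioned(x, {})   # condition defaults to None
```

In both cases the output keeps the input shape (batch_size, seq_len, hidden_dim), which matches the documented return value.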