noether.data.pipeline

Submodules

Classes

BatchProcessor

Collator

Base object that uses torch.utils.data.default_collate in its __call__ function. Derived classes can override the __call__ implementation.

MultiStagePipeline

A Collator that processes the list of samples into a batch in multiple stages:

SampleProcessor

Package Contents

class noether.data.pipeline.BatchProcessor
abstractmethod denormalize(key, value)

Inverts the normalization from the __call__ method of a single item in the batch. If nothing needs to be done for the denormalization, this method should simply return the passed key/value.

Parameters:
  • key (str) – The name of the item.

  • value (torch.Tensor) – The value of the item.

Returns:

The (potentially) back-mapped name and the (potentially) denormalized value.

Return type:

(key, value)
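For illustration, the normalize/denormalize contract can be sketched with a hypothetical NormalizeBatchProcessor (a sketch, not the library's implementation; plain Python floats stand in for torch.Tensors, and the mean/std attributes are assumed for the example):

```python
# Hypothetical sketch: a batch processor that normalizes an item in
# __call__ and undoes that normalization in denormalize.
class NormalizeBatchProcessor:
    def __init__(self, mean: float, std: float):
        self.mean = mean
        self.std = std

    def __call__(self, key: str, value: float) -> tuple[str, float]:
        # Normalize a single item of the batch.
        return key, (value - self.mean) / self.std

    def denormalize(self, key: str, value: float) -> tuple[str, float]:
        # Invert the normalization applied in __call__.
        return key, value * self.std + self.mean


processor = NormalizeBatchProcessor(mean=2.0, std=4.0)
key, normalized = processor("data", 10.0)               # (10 - 2) / 4 = 2.0
key, restored = processor.denormalize(key, normalized)  # back to 10.0
```

Note the round-trip property: denormalize(__call__(x)) returns the original value, which is what downstream consumers rely on.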

class noether.data.pipeline.Collator

Base object that uses torch.utils.data.default_collate in its __call__ function. Derived classes can override the __call__ implementation to implement a custom collate function. The collator can be passed to torch.utils.data.DataLoader via the collate_fn argument (DataLoader(dataset, batch_size=2, collate_fn=Collator())).

Example

>>> import torch
>>> collator = Collator()
>>> num_samples = 2
>>> samples = [{"data": torch.randn(3, 256, 256)} for _ in range(num_samples)]
>>> batch = collator(samples)
>>> batch["data"].shape  # torch.Size([2, 3, 256, 256])
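A derived collator might override __call__ along these lines (an illustrative sketch only: a list-stacking stand-in replaces torch.utils.data.default_collate so the example stays dependency-free, and the class does not subclass the real Collator):

```python
class ListCollator:
    """Illustrative custom collator: stacks per-sample dict values into
    lists instead of tensors (stand-in for default_collate)."""

    def __call__(self, samples: list[dict]) -> dict:
        # samples: list of dicts that all share the same keys.
        keys = samples[0].keys()
        return {key: [sample[key] for sample in samples] for key in keys}


collator = ListCollator()
samples = [{"data": [1, 2, 3]}, {"data": [4, 5, 6]}]
batch = collator(samples)
# batch["data"] == [[1, 2, 3], [4, 5, 6]]
```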
class noether.data.pipeline.MultiStagePipeline(collators=None, sample_processors=None, batch_processors=None)

Bases: noether.data.pipeline.collator.Collator

A Collator that processes the list of samples into a batch in multiple stages:
  • sample_processors: Processing the data before collation on a per-sample level.

  • collators: Conversion from a list of samples into a batch (dict of usually tensors).

  • batch_processors: Processing after collation on a batch-level.

Most of the work is usually done by the sample_processors; one or two collators and batch processors are often not needed, though this depends on the use case.

Example

>>> sample_processors = [MySampleProcessor1(), MySampleProcessor2()]
>>> collators = [MyCollator1(), MyCollator2()]
>>> batch_processors = [MyBatchProcessor1(), MyBatchProcessor2()]
>>> multistage_pipeline = MultiStagePipeline(
...     sample_processors=sample_processors,
...     collators=collators,
...     batch_processors=batch_processors,
... )
>>> batch = multistage_pipeline(samples)
Parameters:
  • sample_processors (dict[str, SampleProcessorType] | list[SampleProcessorType] | None) – A list of callables that will be applied sequentially to pre-process on a per-sample level (e.g., subsample a pointcloud).

  • collators (dict[str, noether.data.pipeline.collator.CollatorType] | list[noether.data.pipeline.collator.CollatorType] | None) – A list of callables that will be applied sequentially to convert the list of individual samples into a batched format. If None, the default PyTorch collator will be used.

  • batch_processors (dict[str, BatchProcessorType] | list[BatchProcessorType] | None) – A list of callables that will be applied sequentially to process on a per-batch level.
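The three stages can be sketched as a plain function (hypothetical helper callables; the real MultiStagePipeline additionally accepts dict-valued processor arguments and falls back to the default PyTorch collator):

```python
def run_pipeline(samples, sample_processors, collators, batch_processors):
    # Stage 1: per-sample pre-processing.
    for processor in sample_processors:
        samples = [processor(sample) for sample in samples]
    # Stage 2: list of samples -> batch (here a dict of lists).
    batch = samples
    for collator in collators:
        batch = collator(batch)
    # Stage 3: batch-level post-processing.
    for processor in batch_processors:
        batch = processor(batch)
    return batch


# Hypothetical stage implementations for the sketch:
double = lambda sample: {k: v * 2 for k, v in sample.items()}
collate = lambda samples: {k: [s[k] for s in samples] for k in samples[0]}
count = lambda batch: {**batch, "batch_size": len(batch["x"])}

batch = run_pipeline([{"x": 1}, {"x": 2}], [double], [collate], [count])
# batch == {"x": [2, 4], "batch_size": 2}
```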

sample_processors = []
batch_processors = []
get_sample_processor(predicate)

Retrieves a sample processor by a predicate function.

Examples:
  • Search by type (assumes the sample processor type occurs only once in the list of sample processors):

pipeline.get_sample_processor(lambda p: isinstance(p, MySampleProcessorType))

  • Search by type and member:

pipeline.get_sample_processor(lambda p: isinstance(p, PointSamplingSampleProcessor) and "input_pos" in p.items)

Parameters:

predicate (collections.abc.Callable[[Any], bool]) – A function that is called for each processor and selects if this is the right one.

Returns:

The matching sample processor.

Return type:

Any

Raises:

ValueError – If no matching sample processor is found, if multiple matching sample processors are found, or if there are no sample processors.
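The documented lookup behaviour (exactly one match, otherwise ValueError) can be sketched as follows; this is an illustrative stand-in, not the library's actual implementation:

```python
def get_by_predicate(processors, predicate):
    # Mirrors the documented contract: exactly one processor must match.
    if not processors:
        raise ValueError("There are no sample processors.")
    matches = [p for p in processors if predicate(p)]
    if len(matches) != 1:
        raise ValueError(f"Expected exactly one match, found {len(matches)}.")
    return matches[0]


# Hypothetical processor list: mixed types stand in for processor objects.
processors = [1, "a", 2.5]
result = get_by_predicate(processors, lambda p: isinstance(p, str))
# result == "a"
```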

class noether.data.pipeline.SampleProcessor
abstractmethod inverse(key, value)

Inverts the transformation from the __call__ method of a single item in the batch. Should only be implemented if the SampleProcessor is invertible or if the identity function is valid.

Parameters:
  • key (str) – The name of the item.

  • value (torch.Tensor) – The value of the item.

Returns:

The (potentially) back-mapped name and the (potentially) inverted value.

Return type:

(key, value)
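An invertible SampleProcessor could look like this sketch (hypothetical ScaleSampleProcessor; plain Python floats stand in for torch.Tensors, and the class does not subclass the real SampleProcessor):

```python
# Hypothetical sketch: a sample processor whose transformation can be undone.
class ScaleSampleProcessor:
    def __init__(self, factor: float):
        self.factor = factor

    def __call__(self, key: str, value: float) -> tuple[str, float]:
        # Forward transformation: scale the value.
        return key, value * self.factor

    def inverse(self, key: str, value: float) -> tuple[str, float]:
        # Undo the scaling applied in __call__.
        return key, value / self.factor


processor = ScaleSampleProcessor(factor=0.5)
key, scaled = processor("pos", 8.0)             # 4.0
key, original = processor.inverse(key, scaled)  # 8.0
```

A processor that cannot be inverted (e.g., random subsampling) would instead leave inverse unimplemented, as the docstring above suggests.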

static save_copy(obj)

Make a deep copy of an object to avoid modifying the original object.

Parameters:

obj (T) – Any object that should be copied.

Returns:

A deep copy of the input object.

Return type:

T
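The safety save_copy provides shows up with mutable, nested samples. A minimal sketch, assuming deep-copy semantics (copy.deepcopy is a plausible stand-in; the library's exact implementation is not shown here):

```python
import copy


def save_copy(obj):
    # Deep-copy so later mutations don't leak back into the original object.
    return copy.deepcopy(obj)


sample = {"points": [[0.0, 0.0], [1.0, 1.0]]}
copied = save_copy(sample)
copied["points"].append([2.0, 2.0])
# The original sample is unchanged: a shallow copy would have shared the
# inner list and leaked the mutation back into `sample`.
```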