noether.data.pipeline

Submodules

Classes

BatchProcessor

Collator

Base object that uses torch.utils.data.default_collate in its __call__ function. Derived classes can override the __call__ implementation.

MultiStagePipeline

A Collator that processes the list of samples into a batch in multiple stages:

SampleProcessor

Package Contents

class noether.data.pipeline.BatchProcessor
abstractmethod denormalize(key, value)

Inverts the normalization from the __call__ method of a single item in the batch. If nothing needs to be done for the denormalization, this method should simply return the passed key/value.

Parameters:
  • key (str) – The name of the item.

  • value (torch.Tensor) – The value of the item.

Returns:

The (potentially) back-mapped name and the (potentially) denormalized value.

Return type:

(key, value)
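For illustration, the normalize/denormalize contract can be sketched with a hypothetical NormalizeBatchProcessor (a sketch, not the library's implementation; plain Python floats stand in for torch.Tensors, and the mean/std attributes are assumed for the example):

```python
# Hypothetical sketch: a batch processor that normalizes an item in
# __call__ and undoes that normalization in denormalize.
class NormalizeBatchProcessor:
    def __init__(self, mean: float, std: float):
        self.mean = mean
        self.std = std

    def __call__(self, key: str, value: float) -> tuple[str, float]:
        # Normalize a single item of the batch.
        return key, (value - self.mean) / self.std

    def denormalize(self, key: str, value: float) -> tuple[str, float]:
        # Invert the normalization applied in __call__.
        return key, value * self.std + self.mean


processor = NormalizeBatchProcessor(mean=2.0, std=4.0)
key, normalized = processor("data", 10.0)               # (10 - 2) / 4 = 2.0
key, restored = processor.denormalize(key, normalized)  # back to 10.0
```

Note the round-trip property: denormalize(__call__(x)) returns the original value, which is what downstream consumers rely on.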

class noether.data.pipeline.Collator

Base object that uses torch.utils.data.default_collate in its __call__ function. Derived classes can override the __call__ implementation to implement a custom collate function. The collator can be passed to torch.utils.data.DataLoader via the collate_fn argument (DataLoader(dataset, batch_size=2, collate_fn=Collator())).

Example

>>> import torch
>>> collator = Collator()
>>> num_samples = 2
>>> samples = [{"data": torch.randn(3, 256, 256)} for _ in range(num_samples)]
>>> batch = collator(samples)
>>> batch["data"].shape  # torch.Size([2, 3, 256, 256])
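A derived collator might override __call__ along these lines (an illustrative sketch only: a list-stacking stand-in replaces torch.utils.data.default_collate so the example stays dependency-free, and the class does not subclass the real Collator):

```python
class ListCollator:
    """Illustrative custom collator: stacks per-sample dict values into
    lists instead of tensors (stand-in for default_collate)."""

    def __call__(self, samples: list[dict]) -> dict:
        # samples: list of dicts that all share the same keys.
        keys = samples[0].keys()
        return {key: [sample[key] for sample in samples] for key in keys}


collator = ListCollator()
samples = [{"data": [1, 2, 3]}, {"data": [4, 5, 6]}]
batch = collator(samples)
# batch["data"] == [[1, 2, 3], [4, 5, 6]]
```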
class noether.data.pipeline.MultiStagePipeline(collators=None, sample_processors=None, batch_processors=None)

Bases: noether.data.pipeline.collator.Collator

A Collator that processes the list of samples into a batch in multiple stages:
  • sample_processors: Processing the data before collation on a per-sample level.

  • collators: Conversion from a list of samples into a batch (dict of usually tensors).

  • batch_processors: Processing after collation on a batch-level.

Most of the work is usually done by the sample_processors; one or two collators and batch processors are often not needed, though this depends on the use case.

Example

>>> sample_processors = [MySampleProcessor1(), MySampleProcessor2()]
>>> collators = [MyCollator1(), MyCollator2()]
>>> batch_processors = [MyBatchProcessor1(), MyBatchProcessor2()]
>>> multistage_pipeline = MultiStagePipeline(
...     sample_processors=sample_processors,
...     collators=collators,
...     batch_processors=batch_processors,
... )
>>> batch = multistage_pipeline(samples)
Parameters:
  • sample_processors (dict[str, SampleProcessorType] | list[SampleProcessorType] | None) – A list of callables that will be applied sequentially to pre-process on a per-sample level (e.g., subsample a pointcloud).

  • collators (dict[str, noether.data.pipeline.collator.CollatorType] | list[noether.data.pipeline.collator.CollatorType] | None) – A list of callables that will be applied sequentially to convert the list of individual samples into a batched format. If None, the default PyTorch collator will be used.

  • batch_processors (dict[str, BatchProcessorType] | list[BatchProcessorType] | None) – A list of callables that will be applied sequentially to process on a per-batch level.
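The three stages can be sketched as a plain function (hypothetical helper callables; the real MultiStagePipeline additionally accepts dict-valued processor arguments and falls back to the default PyTorch collator):

```python
def run_pipeline(samples, sample_processors, collators, batch_processors):
    # Stage 1: per-sample pre-processing.
    for processor in sample_processors:
        samples = [processor(sample) for sample in samples]
    # Stage 2: list of samples -> batch (here a dict of lists).
    batch = samples
    for collator in collators:
        batch = collator(batch)
    # Stage 3: batch-level post-processing.
    for processor in batch_processors:
        batch = processor(batch)
    return batch


# Hypothetical stage implementations for the sketch:
double = lambda sample: {k: v * 2 for k, v in sample.items()}
collate = lambda samples: {k: [s[k] for s in samples] for k in samples[0]}
count = lambda batch: {**batch, "batch_size": len(batch["x"])}

batch = run_pipeline([{"x": 1}, {"x": 2}], [double], [collate], [count])
# batch == {"x": [2, 4], "batch_size": 2}
```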

sample_processors = []
batch_processors = []
get_sample_processor(predicate)

Retrieves a sample processor by a predicate function.

Examples:
  • Search by type (assumes the sample processor type occurs only once in the list of sample processors):

pipeline.get_sample_processor(lambda p: isinstance(p, MySampleProcessorType))

  • Search by type and member:

pipeline.get_sample_processor(lambda p: isinstance(p, PointSamplingSampleProcessor) and "input_pos" in p.items)

Parameters:

predicate (collections.abc.Callable[[Any], bool]) – A function that is called for each processor and selects if this is the right one.

Returns:

The matching sample processor.

Return type:

Any

Raises:

ValueError – If no matching sample processor is found, if multiple matching sample processors are found, or if there are no sample processors.
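The documented lookup behaviour (exactly one match, otherwise ValueError) can be sketched as follows; this is an illustrative stand-in, not the library's actual implementation:

```python
def get_by_predicate(processors, predicate):
    # Mirrors the documented contract: exactly one processor must match.
    if not processors:
        raise ValueError("There are no sample processors.")
    matches = [p for p in processors if predicate(p)]
    if len(matches) != 1:
        raise ValueError(f"Expected exactly one match, found {len(matches)}.")
    return matches[0]


# Hypothetical processor list: mixed types stand in for processor objects.
processors = [1, "a", 2.5]
result = get_by_predicate(processors, lambda p: isinstance(p, str))
# result == "a"
```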

class noether.data.pipeline.SampleProcessor
abstractmethod inverse(key, value)

Inverts the transformation from the __call__ method of a single item in the batch. Should only be implemented if the SampleProcessor is invertible or if the identity function is valid.

Parameters:
  • key (str) – The name of the item.

  • value (torch.Tensor) – The value of the item.

Returns:

The (potentially) back-mapped name and the (potentially) inverted value.

Return type:

(key, value)
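An invertible SampleProcessor could look like this sketch (hypothetical ScaleSampleProcessor; plain Python floats stand in for torch.Tensors, and the class does not subclass the real SampleProcessor):

```python
# Hypothetical sketch: a sample processor whose transformation can be undone.
class ScaleSampleProcessor:
    def __init__(self, factor: float):
        self.factor = factor

    def __call__(self, key: str, value: float) -> tuple[str, float]:
        # Forward transformation: scale the value.
        return key, value * self.factor

    def inverse(self, key: str, value: float) -> tuple[str, float]:
        # Undo the scaling applied in __call__.
        return key, value / self.factor


processor = ScaleSampleProcessor(factor=0.5)
key, scaled = processor("pos", 8.0)             # 4.0
key, original = processor.inverse(key, scaled)  # 8.0
```

A processor that cannot be inverted (e.g., random subsampling) would instead leave inverse unimplemented, as the docstring above suggests.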

static save_copy(obj)

Make a deep copy of an object to avoid modifying the original object.

Parameters:

obj (T) – Any object that should be copied.

Returns:

A deep copy of the input object.

Return type:

T
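The safety save_copy provides shows up with mutable, nested samples. A minimal sketch, assuming deep-copy semantics (copy.deepcopy is a plausible stand-in; the library's exact implementation is not shown here):

```python
import copy


def save_copy(obj):
    # Deep-copy so later mutations don't leak back into the original object.
    return copy.deepcopy(obj)


sample = {"points": [[0.0, 0.0], [1.0, 1.0]]}
copied = save_copy(sample)
copied["points"].append([2.0, 2.0])
# The original sample is unchanged: a shallow copy would have shared the
# inner list and leaked the mutation back into `sample`.
```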