noether.data.base.wrappers

Submodules

Attributes

Classes

PropertySubsetWrapper

Wrapper around arbitrary noether.data.Dataset instances to make __getitem__ load only the properties that are defined in the properties attribute of this wrapper.

RepeatWrapper

Repeats the wrapped dataset repetitions times.

ShuffleWrapper

Shuffles the dataset, optionally with seed.

SubsetWrapper

Wraps the dataset with a noether.data.Subset using indices generated by the properties from the constructor.

TimingWrapper

Wrapper that times __getitem__ calls and returns both the item and the time taken.

Package Contents

class noether.data.base.wrappers.PropertySubsetWrapper(dataset, properties)

Bases: noether.data.base.DatasetWrapper

Wrapper around arbitrary noether.data.Dataset instances to make __getitem__ load only the properties that are defined in the properties attribute of this wrapper. For example, if a dataset contains three kinds of items, “x”, “y”, and “z” (i.e., the dataset implements the getitem_x, getitem_y, and getitem_z methods), we can create a PropertySubsetWrapper around that dataset with properties={"x", "y"} to load only “x” and “y” when __getitem__ is called.

This is useful to avoid loading unnecessary data from disk. For example, you might need different items from the same dataset during training and validation: only “x” and “y” during training, but “x”, “y”, and “z” during validation. With a PropertySubsetWrapper, you can create two datasets, one for training and one for validation, that each load only the necessary items.

Example:

import torch

from noether.data import PropertySubsetWrapper, Dataset


class DummyDataset(Dataset):
    def __init__(self):
        self.data = torch.arange(10)

    def getitem_x(self, idx):
        return self.data[idx] * 2

    def getitem_y(self, idx):
        return self.data[idx] + 3

    def getitem_z(self, idx):
        return self.data[idx] - 5

    def __len__(self):
        return len(self.data)


dataset = DummyDataset()
wrapper = PropertySubsetWrapper(dataset=dataset, properties={"x", "y"})
sample = wrapper[4]  # calls dataset.getitem_x(4) and dataset.getitem_y(4), getitem_z is not called
sample  # {"x": 8, "y": 7}
wrapper.properties  # {"x", "y"}
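
Parameters:
  • dataset (noether.data.base.Dataset) – Base dataset to be wrapped.

  • properties (set[str]) – Properties to load when __getitem__ is called.
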
Raises:
  • TypeError – If properties is not a set.

  • ValueError – If properties is empty or if any property does not correspond to a getitem_* method of the dataset.

properties
classmethod from_included_excluded(dataset, included_properties, excluded_properties)

Creates a PropertySubsetWrapper from included and excluded properties.

Parameters:
  • dataset (noether.data.base.Dataset) – Base dataset to be wrapped.

  • included_properties (set[str] | None) – If defined, only these properties are included.

  • excluded_properties (set[str] | None) – If defined, these properties are excluded.

Returns:

The created PropertySubsetWrapper.

Return type:

PropertySubsetWrapper
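
The include/exclude semantics described above can be sketched in plain Python. This is only an illustration under stated assumptions: resolve_properties is a hypothetical helper (not part of noether), and it assumes that an undefined included_properties means "start from all available properties" and that exclusions are applied afterwards. The library's actual resolution order may differ.

```python
def resolve_properties(available, included=None, excluded=None):
    """Hypothetical helper: combine include/exclude sets into one property set."""
    # Assumption: with no explicit include set, every available property is kept.
    selected = set(available) if included is None else set(included)
    # Assumption: exclusions are applied after inclusions.
    if excluded is not None:
        selected -= set(excluded)
    return selected


# Properties a dataset with getitem_x, getitem_y, and getitem_z would expose.
available = {"x", "y", "z"}

print(sorted(resolve_properties(available, excluded={"z"})))  # ['x', 'y']
print(sorted(resolve_properties(available, included={"x", "z"}, excluded={"z"})))  # ['x']
```

The resulting set is what one would pass as properties to the wrapper constructor.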

class noether.data.base.wrappers.RepeatWrapper(config, dataset)

Bases: noether.data.base.Subset

Repeats the wrapped dataset repetitions times.

Example:

from noether.data import Dataset as ListDataset
from noether.data.base.wrappers import RepeatWrapper

dataset = ListDataset([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
len(dataset)  # 10
repeat_dataset = RepeatWrapper(dataset, repetitions=3)
len(repeat_dataset)  # 30
Parameters:
Raises:

ValueError – If repetitions is less than 2 or if the dataset is empty. You don’t need to use this wrapper with repetitions < 2.

repetitions
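
Since RepeatWrapper is based on a Subset, repetition can be thought of as an index list that cycles through the original indices. The following is a minimal sketch of that idea in plain Python, not the library's implementation: repeated_indices is a hypothetical helper, and it assumes that whole copies of the dataset are concatenated one after another.

```python
def repeated_indices(dataset_len, repetitions):
    """Hypothetical helper: indices that walk through the dataset `repetitions` times."""
    # Mirrors the documented error conditions of the wrapper.
    if repetitions < 2:
        raise ValueError("repetitions must be at least 2")
    if dataset_len == 0:
        raise ValueError("dataset must not be empty")
    # Assumption: full copies are concatenated, i.e. 0..n-1, 0..n-1, ...
    return list(range(dataset_len)) * repetitions


data = [10, 20, 30]
indices = repeated_indices(len(data), repetitions=2)
print(indices)  # [0, 1, 2, 0, 1, 2]
print([data[i] for i in indices])  # [10, 20, 30, 10, 20, 30]
```
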
class noether.data.base.wrappers.ShuffleWrapper(config, dataset)

Bases: noether.data.base.Subset

Shuffles the dataset, optionally with seed.

Parameters:
Raises:

ValueError – If the dataset is not an instance of noether.data.Dataset or DatasetWrapper, or if the seed is not an integer or None.

seed
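
Seeded shuffling over a Subset amounts to a reproducible permutation of the indices. The sketch below illustrates this with the standard library only; shuffled_indices is a hypothetical helper, and the choice of random.Random as the generator is an assumption (the library may use a different random number generator).

```python
import random


def shuffled_indices(dataset_len, seed=None):
    """Hypothetical helper: a permutation of range(dataset_len), reproducible if seeded."""
    # Mirrors the documented constraint that the seed is an integer or None.
    if seed is not None and not isinstance(seed, int):
        raise ValueError("seed must be an integer or None")
    indices = list(range(dataset_len))
    # A dedicated generator keeps the global random state untouched.
    random.Random(seed).shuffle(indices)
    return indices


first = shuffled_indices(5, seed=0)
second = shuffled_indices(5, seed=0)
print(first == second)  # True: the same seed yields the same order
print(sorted(first))    # [0, 1, 2, 3, 4]: every index appears exactly once
```
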
class noether.data.base.wrappers.SubsetWrapper(config, dataset)

Bases: noether.data.base.Subset

Wraps the dataset with a noether.data.Subset using indices generated by the properties from the constructor.

Parameters:
Raises:
noether.data.base.wrappers.META_GETITEM_TIME = '__meta_time_getitem'
class noether.data.base.wrappers.TimingWrapper(dataset)

Bases: noether.data.base.DatasetWrapper

Wrapper that times __getitem__ calls and returns both the item and the time taken.

Parameters:

dataset (noether.data.base.Dataset | noether.data.base.DatasetWrapper) – The dataset to wrap.
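
The timing idea can be sketched without noether as follows. TimedList is a hypothetical stand-in for the wrapper, and the return layout is an assumption: this page does not state how the item and the elapsed time are combined, so the sketch stores the time in a dict under the META_GETITEM_TIME key defined in this module.

```python
import time

# Value copied from the module attribute noether.data.base.wrappers.META_GETITEM_TIME.
META_GETITEM_TIME = "__meta_time_getitem"


class TimedList:
    """Hypothetical stand-in: times each __getitem__ call on a plain list."""

    def __init__(self, items):
        self.items = items

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        start = time.perf_counter()
        item = self.items[idx]
        elapsed = time.perf_counter() - start
        # Assumption: the item and the elapsed seconds are returned together in a dict.
        return {"item": item, META_GETITEM_TIME: elapsed}


sample = TimedList([1, 2, 3])[1]
print(sample["item"])  # 2
print(sample[META_GETITEM_TIME] >= 0)  # True
```

Per-sample timings like this are typically aggregated to find out whether data loading is a bottleneck.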