noether.data.base.wrappers
Submodules
Attributes
- META_GETITEM_TIME
Classes
- PropertySubsetWrapper – Wrapper around arbitrary noether.data.Dataset instances that makes __getitem__ load only the properties defined in the properties attribute of the wrapper.
- RepeatWrapper – Repeats the wrapped dataset repetitions times.
- ShuffleWrapper – Shuffles the dataset, optionally with a seed.
- SubsetWrapper – Wraps the dataset with a noether.data.Subset using indices generated from the configuration passed to the constructor.
- TimingWrapper – Wrapper that times __getitem__ calls and returns both the item and the time taken.
Package Contents
- class noether.data.base.wrappers.PropertySubsetWrapper(dataset, properties)
Bases:
noether.data.base.DatasetWrapper
Wrapper around arbitrary noether.data.Dataset instances that makes __getitem__ load only the properties listed in the properties attribute of this wrapper.

For example, if a dataset provides three kinds of items, "x", "y", and "z" (i.e., it implements getitem_x, getitem_y, and getitem_z), wrapping it in a PropertySubsetWrapper with properties={"x", "y"} makes __getitem__ load only "x" and "y".

This avoids loading unnecessary data from disk. Training and validation, for instance, may need different items from the same dataset: training might only need "x" and "y", while validation needs "x", "y", and "z". Two PropertySubsetWrapper instances over the same dataset let each split load only the items it needs.
Example:

```python
import torch

from noether.data import PropertySubsetWrapper, Dataset


class DummyDataset(Dataset):
    def __init__(self):
        self.data = torch.arange(10)

    def getitem_x(self, idx):
        return self.data[idx] * 2

    def getitem_y(self, idx):
        return self.data[idx] + 3

    def getitem_z(self, idx):
        return self.data[idx] - 5

    def __len__(self):
        return len(self.data)


dataset = DummyDataset()
wrapper = PropertySubsetWrapper(dataset=dataset, properties={"x", "y"})
# Calls dataset.getitem_x(4) and dataset.getitem_y(4); getitem_z is never called.
sample = wrapper[4]
sample  # {"x": tensor(8), "y": tensor(7)}
wrapper.properties  # {"x", "y"}
```
- Parameters:
dataset (noether.data.base.Dataset | noether.data.base.DatasetWrapper) – Base dataset to be wrapped. Can be a dataset or another dataset wrapper.
properties (set[str]) – Which properties to load from the wrapped dataset when __getitem__ is called.
- Raises:
TypeError – If properties is not a set.
ValueError – If properties is empty or if any property does not correspond to a getitem_<property> method of the wrapped dataset.
- properties
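To make the dispatch concrete, here is a minimal, self-contained sketch that does not use noether at all. ToyDataset and ToyPropertySubset are hypothetical names, and the sketch assumes that each property name maps to a getitem_<name> method and that __getitem__ returns a dict keyed by property name, as the example above suggests. The real wrapper may validate or delegate differently (for instance through nested wrappers).

```python
from __future__ import annotations

from typing import Any


class ToyDataset:
    """Hypothetical stand-in for a noether Dataset exposing getitem_* methods."""

    def __init__(self) -> None:
        self.data = list(range(10))

    def getitem_x(self, idx: int) -> int:
        return self.data[idx] * 2

    def getitem_y(self, idx: int) -> int:
        return self.data[idx] + 3

    def getitem_z(self, idx: int) -> int:
        return self.data[idx] - 5

    def __len__(self) -> int:
        return len(self.data)


class ToyPropertySubset:
    """Hypothetical sketch: loads only the selected properties via getitem_<name>."""

    def __init__(self, dataset: Any, properties: set[str]) -> None:
        # Mirror the documented TypeError / ValueError checks.
        if not isinstance(properties, set):
            raise TypeError("properties must be a set")
        if not properties:
            raise ValueError("properties must not be empty")
        missing = {p for p in properties if not hasattr(dataset, f"getitem_{p}")}
        if missing:
            raise ValueError(f"no getitem_* method for: {sorted(missing)}")
        self.dataset = dataset
        self.properties = properties

    def __getitem__(self, idx: int) -> dict[str, Any]:
        # Only the getitem_* methods of the selected properties are called.
        return {p: getattr(self.dataset, f"getitem_{p}")(idx) for p in self.properties}

    def __len__(self) -> int:
        return len(self.dataset)
```

With ToyPropertySubset(ToyDataset(), {"x", "y"}), indexing position 4 returns {"x": 8, "y": 7} and getitem_z is never invoked.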
- classmethod from_included_excluded(dataset, included_properties, excluded_properties)
Creates a PropertySubsetWrapper from included and excluded properties.
- Parameters:
dataset (noether.data.base.Dataset) – Base dataset to be wrapped.
included_properties (set[str] | None) – If defined, only these properties are included.
excluded_properties (set[str] | None) – If defined, these properties are excluded.
- Returns:
The created PropertySubsetWrapper.
- Return type:
PropertySubsetWrapper
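One plausible reading of the included/excluded semantics is sketched below: start from every property the dataset can provide (its getitem_<name> methods), narrow to included_properties if given, then drop excluded_properties if given. available_properties and resolve_properties are hypothetical helpers; whether the real classmethod allows both arguments at once, or validates included names this way, is an assumption.

```python
from __future__ import annotations


def available_properties(dataset: object) -> set[str]:
    """Hypothetical helper: collect <name> from callable getitem_<name> attributes."""
    prefix = "getitem_"
    return {
        name[len(prefix):]
        for name in dir(dataset)
        if name.startswith(prefix) and callable(getattr(dataset, name))
    }


def resolve_properties(
    dataset: object,
    included_properties: set[str] | None,
    excluded_properties: set[str] | None,
) -> set[str]:
    """Assumed semantics: include-filter first, then remove excluded names."""
    properties = available_properties(dataset)
    if included_properties is not None:
        unknown = included_properties - properties
        if unknown:
            raise ValueError(f"unknown properties: {sorted(unknown)}")
        properties = set(included_properties)
    if excluded_properties is not None:
        properties -= excluded_properties
    return properties
```

The resulting set is what would then be passed as properties to the wrapper's constructor.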
- class noether.data.base.wrappers.RepeatWrapper(config, dataset)
Bases:
noether.data.base.Subset
Repeats the wrapped dataset repetitions times.
Example:

```python
>>> from noether.data import Dataset as ListDataset
>>> from noether.data.base.wrappers import RepeatWrapper
>>> dataset = ListDataset([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> len(dataset)
10
>>> repeat_dataset = RepeatWrapper(dataset, repetitions=3)
>>> len(repeat_dataset)
30
```
- Parameters:
config (noether.core.schemas.dataset.RepeatWrapperConfig) – Configuration for the RepeatWrapper. See RepeatWrapperConfig for available options.
dataset (noether.data.base.Dataset) – The dataset to repeat.
- Raises:
ValueError – If repetitions is less than 2 or if the dataset is empty. With fewer than 2 repetitions the wrapper would serve no purpose, so it should not be used in that case.
- repetitions
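Since RepeatWrapper derives from noether.data.base.Subset, one plausible realization is a subset whose index list cycles through the dataset repetitions times. The sketch below assumes exactly that; repeat_indices and ToySubset are hypothetical names, and the library's actual index construction may differ.

```python
from __future__ import annotations

from typing import Any, Sequence


def repeat_indices(dataset_len: int, repetitions: int) -> list[int]:
    """Index list that visits every item `repetitions` times, in order."""
    # Mirrors the documented ValueError conditions.
    if repetitions < 2:
        raise ValueError("repetitions must be at least 2")
    if dataset_len == 0:
        raise ValueError("cannot repeat an empty dataset")
    # e.g. dataset_len=3, repetitions=2 -> [0, 1, 2, 0, 1, 2]
    return list(range(dataset_len)) * repetitions


class ToySubset:
    """Hypothetical subset view: position i maps to dataset[indices[i]]."""

    def __init__(self, dataset: Sequence[Any], indices: list[int]) -> None:
        self.dataset = dataset
        self.indices = indices

    def __len__(self) -> int:
        return len(self.indices)

    def __getitem__(self, idx: int) -> Any:
        return self.dataset[self.indices[idx]]
```

For a 10-item dataset and repetitions=3, ToySubset(data, repeat_indices(10, 3)) has length 30, matching the example above.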
- class noether.data.base.wrappers.ShuffleWrapper(config, dataset)
Bases:
noether.data.base.Subset
Shuffles the dataset, optionally with a fixed seed so that the order is reproducible.
- Parameters:
config (noether.core.schemas.dataset.ShuffleWrapperConfig) – Configuration for the ShuffleWrapper. See ShuffleWrapperConfig for available options.
dataset (noether.data.base.Dataset | noether.data.base.DatasetWrapper) – The dataset to shuffle. Can be a base dataset or an already wrapped dataset.
- Raises:
ValueError – If the dataset is not an instance of noether.data.Dataset or DatasetWrapper, or if the seed is not an integer or None.
- seed
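A seeded shuffle amounts to applying a reproducible random permutation of the indices. The sketch below uses Python's random.Random for that; it is a stand-in chosen for illustration, and noether may use a different RNG (so the concrete orders will not match the library's). shuffled_indices and shuffled_view are hypothetical names.

```python
from __future__ import annotations

import random
from typing import Any, Sequence


def shuffled_indices(dataset_len: int, seed: int | None = None) -> list[int]:
    """Random permutation of range(dataset_len); the same seed gives the same order."""
    # Mirrors the documented check that seed is an int or None.
    if seed is not None and not isinstance(seed, int):
        raise ValueError("seed must be an int or None")
    # seed=None seeds from OS randomness (or the current time), so the order varies.
    rng = random.Random(seed)
    indices = list(range(dataset_len))
    rng.shuffle(indices)
    return indices


def shuffled_view(dataset: Sequence[Any], seed: int | None = None) -> list[Any]:
    """Materializes the shuffled order; a Subset-based wrapper would index lazily."""
    return [dataset[i] for i in shuffled_indices(len(dataset), seed)]
```

Keeping only the permuted index list (rather than copying items) is what lets a Subset-style wrapper shuffle large on-disk datasets cheaply.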
- class noether.data.base.wrappers.SubsetWrapper(config, dataset)
Bases:
noether.data.base.Subset
Wraps the dataset with a noether.data.Subset whose indices are generated from the options in the config passed to the constructor.
- Parameters:
config (noether.core.schemas.dataset.SubsetWrapperConfig) – The configuration to use. See SubsetWrapperConfig for available options.
dataset (noether.data.base.Dataset | noether.data.base.wrapper.DatasetWrapper) – The dataset to wrap.
- Raises:
ValueError – If the input parameters are invalid.
RuntimeError – If no valid indices are provided.
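The options of SubsetWrapperConfig are not listed on this page, so the sketch below invents a hypothetical config (explicit indices, or a start/end range) purely to show the pattern: turn config options into an index list, raise ValueError for invalid input, and raise RuntimeError when no indices remain, mirroring the documented exceptions. HypotheticalSubsetConfig and subset_indices are not noether APIs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class HypotheticalSubsetConfig:
    """Hypothetical options; the real SubsetWrapperConfig may look different."""

    indices: list[int] | None = None
    start_index: int | None = None
    end_index: int | None = None


def subset_indices(config: HypotheticalSubsetConfig, dataset_len: int) -> list[int]:
    """Turns the (hypothetical) config into the index list of the subset."""
    uses_range = config.start_index is not None or config.end_index is not None
    if config.indices is not None and uses_range:
        raise ValueError("specify either indices or a start/end range, not both")
    if config.indices is not None:
        if any(not 0 <= i < dataset_len for i in config.indices):
            raise ValueError("index out of range")
        indices = list(config.indices)
    else:
        start = 0 if config.start_index is None else config.start_index
        end = dataset_len if config.end_index is None else config.end_index
        if not 0 <= start <= end <= dataset_len:
            raise ValueError(f"invalid range [{start}, {end})")
        indices = list(range(start, end))
    # An empty selection is reported separately, like the documented RuntimeError.
    if not indices:
        raise RuntimeError("no valid indices")
    return indices
```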
- noether.data.base.wrappers.META_GETITEM_TIME = '__meta_time_getitem'
- class noether.data.base.wrappers.TimingWrapper(dataset)
Bases:
noether.data.base.DatasetWrapper
Wrapper that times __getitem__ calls and returns both the item and the time taken.
- Parameters:
dataset (noether.data.base.Dataset | noether.data.base.DatasetWrapper) – The dataset to wrap.
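Timing a __getitem__ call boils down to reading a monotonic clock before and after delegating to the wrapped dataset. This minimal sketch assumes the item and the elapsed seconds are returned as a tuple; ToyTimingWrapper is a hypothetical name, and the real TimingWrapper may package the time differently.

```python
from __future__ import annotations

import time
from typing import Any, Sequence


class ToyTimingWrapper:
    """Hypothetical sketch: returns (item, seconds spent in the wrapped __getitem__)."""

    def __init__(self, dataset: Sequence[Any]) -> None:
        self.dataset = dataset

    def __len__(self) -> int:
        return len(self.dataset)

    def __getitem__(self, idx: int) -> tuple[Any, float]:
        # perf_counter is monotonic and high-resolution, suited to short intervals.
        start = time.perf_counter()
        item = self.dataset[idx]
        elapsed = time.perf_counter() - start
        return item, elapsed
```

Wrapping a dataset this way makes it easy to spot slow samples (for example, expensive disk reads) without modifying the dataset itself.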