noether.data.samplers.interleaved_sampler¶
Classes¶
- SamplerIntervalConfig — Configuration dataclass for setting up the dataloading pipeline, which is structured to load data from a "main" dataset interleaved with other datasets at regular intervals.
- InterleavedSampler — Sampler to allow efficient dataloading by using a single large dataset containing train/test/… datasets all at once.
Module Contents¶
- class noether.data.samplers.interleaved_sampler.SamplerIntervalConfig¶
Configuration dataclass for setting up the dataloading pipeline, which is structured to load data from a “main” dataset (i.e., the dataset used for training), which is interleaved by iterations over other datasets (e.g., a test dataset to calculate a metric in a callback) in regular intervals.
- Parameters:
sampler (SizedIterable) – Any sampler that would be used in torch.utils.data.DataLoader(sampler=…). Examples: RandomSampler for a training dataset or SequentialSampler for evaluation.
every_n_epochs (int | None) – Epoch-based interval. Invokes the callback after every n epochs. Mutually exclusive with other intervals.
every_n_updates (int | None) – Update-based interval. Invokes the callback after every n updates. Mutually exclusive with other intervals.
every_n_samples (int | None) – Sample-based interval. Invokes the callback after every n samples. Mutually exclusive with other intervals.
pipeline (Optional[callable]) – Any function that would be used in torch.utils.data.DataLoader(collate_fn=…).
batch_size (int | None) – Batch size to use for this callback. Default: None (which will use the same batch_size as used for the “main” sampler, i.e., the one used for training).
- pipeline: collections.abc.Callable | None¶
- validate_frequency()¶
Ensures that exactly one frequency (‘every_n_*’) is specified and that ‘batch_size’ is present if ‘every_n_samples’ is used.
- Return type:
- class noether.data.samplers.interleaved_sampler.InterleavedSamplerConfig(/, **data)¶
Bases:
pydantic.BaseModel
- Parameters:
data (Any)
- max_epochs: int | None = None¶
How many epochs to sample at most from the main_sampler. Whatever limit is reached first (epochs/updates/samples) will stop the sampling.
- max_updates: int | None = None¶
How many updates to sample at most from the main_sampler. Whatever limit is reached first (epochs/updates/samples) will stop the sampling.
- max_samples: int | None = None¶
How many samples to sample at most from the main_sampler. Whatever limit is reached first (epochs/updates/samples) will stop the sampling.
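The "whichever limit is reached first" rule can be sketched as a small helper. This is only an illustration of the stopping behavior, not the library's implementation:

```python
def reached_stop(epochs, updates, samples,
                 max_epochs=None, max_updates=None, max_samples=None):
    """Return True once any configured limit is reached.

    Illustrative sketch of the stopping rule: sampling stops as soon as the
    first of max_epochs / max_updates / max_samples is hit; unset limits
    (None) are ignored.
    """
    progress_and_limits = (
        (epochs, max_epochs),
        (updates, max_updates),
        (samples, max_samples),
    )
    return any(limit is not None and progress >= limit
               for progress, limit in progress_and_limits)


# With max_updates=1000, sampling stops at update 1000 even though
# max_samples has not been reached yet.
print(reached_stop(epochs=3, updates=1000, samples=64000,
                   max_updates=1000, max_samples=100000))
```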
- start_epoch: int | None = None¶
At which epoch to start (used for resuming training). Mutually exclusive with start_update and start_sample.
- start_update: int | None = None¶
At which update to start (used for resuming training). Mutually exclusive with start_epoch and start_sample.
- start_sample: int | None = None¶
At which sample to start (used for resuming training). Mutually exclusive with start_epoch and start_update.
- evaluation: bool = False¶
If True, the sampler is used for evaluation and will only iterate over the interleaved samplers once without iterating over the main sampler.
- classmethod check_positive_values(v)¶
Ensures that all integer-based frequency and batch size fields are positive.
- validate_stop()¶
Ensures that at least one stopping criterion (‘max_*’) is specified.
- Return type:
- validate_start()¶
Ensures that at most one start point (‘start_*’) is specified, since start_epoch, start_update, and start_sample are mutually exclusive.
- Return type:
- class noether.data.samplers.interleaved_sampler.InterleavedSampler(train_sampler, config, train_collator=None, callback_samplers=None)¶
Sampler to allow efficient dataloading by using a single large dataset containing train/test/… datasets all at once. The sampler will sample from different regions in the dataset according to its specification. For example, consider a training dataset of length 100 and a test dataset of length 10. If the sampler is configured with a RandomSampler of the training dataset indices as main_sampler, it will repeatedly iterate over the training dataset. If the test dataset is configured with a sequential sampler that should be invoked after every epoch, the sampler will first return indices for the 100 training samples (randomly sampled) and then indices for the 10 test samples (in sequential order).
- Parameters:
train_sampler (noether.core.utils.common.SizedIterable) – Sampler that is invoked by default (e.g., randomly sample from the trainset)
config (InterleavedSamplerConfig) – Configuration for the InterleavedSampler.
train_collator (collections.abc.Callable | None) – Collator used to collate samples from indices sampled from the train sampler.
callback_samplers (list[SamplerIntervalConfig] | None) – Configurations when the train_sampler should be paused and indices from other samplers (e.g., from a testset) should be returned. Also configures the interval and optionally a different batch_size to use for the interleaved batches.
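The example from the class docstring (100 training samples, 10 test samples, test sampler invoked after every epoch) can be sketched in plain Python. The index_offset argument stands in for how the interleaved datasets are laid out in one concatenated dataset; this is an illustration of the behavior, not the library's implementation:

```python
import random


def interleave(main_indices, callback_indices, index_offset, epochs):
    """Yield shuffled main-dataset indices, then offset callback indices
    after each epoch, mimicking the InterleavedSampler example."""
    for _ in range(epochs):
        epoch_indices = list(main_indices)
        random.shuffle(epoch_indices)  # like a RandomSampler over the train set
        yield from epoch_indices
        # Like a SequentialSampler over the test set, shifted to its region
        # in the concatenated dataset.
        yield from (index_offset + i for i in callback_indices)


indices = list(interleave(range(100), range(10), index_offset=100, epochs=1))
# The first 100 indices cover the train region 0..99 (shuffled); the last
# 10 cover the test region 100..109 in sequential order.
print(indices[100:])
```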
- config¶
- main_sampler¶
- extra_samplers = []¶
- index_offsets = []¶
- dataset¶
- collator¶
- batch_sampler¶
- batch_size¶
- static calculate_start(config, sampler_len)¶
- Parameters:
config (InterleavedSamplerConfig)
sampler_len (int)
- get_data_loader(num_workers=0, pin_memory=False)¶
Creates the DataLoader that uses the InterleavedSampler with the accordingly configured dataset.
- Parameters:
num_workers (int) – Number of worker processes passed to the DataLoader. Default: 0.
pin_memory (bool) – Whether the DataLoader should pin host memory. Default: False.
- Returns:
DataLoader that uses the InterleavedSampler with the accordingly configured dataset.
- Return type: