noether.data.pipeline.sample_processors

Classes

ConcatTensorSampleProcessor

Concatenates multiple tensors into a single tensor.

DefaultTensorSampleProcessor

Creates a tensor of a specified size, filled with a fixed default value.

DropOutliersSampleProcessor

Drops all outliers from a key in the input sample.

DuplicateKeysSampleProcessor

Utility processor that simply duplicates the dictionary keys in a batch.

MomentNormalizationSampleProcessor

Normalizes a value with its mean and standard deviation (i.e., its moments).

PointSamplingSampleProcessor

Randomly subsamples points from a tensor.

PositionNormalizationSampleProcessor

Pre-processes data on a sample-level to normalize positions.

RenameKeysSampleProcessor

Sample processor that simply renames the dictionary keys in a batch.

ReplaceKeySampleProcessor

Sample processor that replaces a key with one or multiple other keys.

SupernodeSamplingSampleProcessor

Randomly samples supernodes from a pointcloud.

Package Contents

class noether.data.pipeline.sample_processors.ConcatTensorSampleProcessor(items, target_key, dim=0)

Bases: noether.data.pipeline.SampleProcessor

Concatenates multiple tensors into a single tensor.

# dummy example
import torch
processor = ConcatTensorSampleProcessor(items=["image_part1", "image_part2"], target_key="full_image", dim=0)
input_sample = {
    "image_part1": torch.randn(3, 224, 224),
    "image_part2": torch.randn(3, 224, 224),
}
output_sample = processor(input_sample)
# output_sample['full_image'] will be a tensor of shape (6, 224, 224)
Parameters:
  • items (list[str]) – A list of keys in the input_sample dict whose tensors should be concatenated.

  • target_key (str) – The key in the sample dict where the concatenated tensor will be stored.

  • dim (int) – The dimension along which to concatenate the tensors. Defaults to 0.

items
target_key
dim = 0
class noether.data.pipeline.sample_processors.DefaultTensorSampleProcessor(item_key_name, feature_dim, size=None, matching_item_key=None, default_value=0.0)

Bases: noether.data.pipeline.SampleProcessor

Creates a tensor of a specified size, filled with a fixed default value.

# dummy example
processor = DefaultTensorSampleProcessor(
    item_key_name="default_tensor",
    feature_dim=128,
    size=10,
    default_value=0.5,
)
input_sample = {}
output_sample = processor(input_sample)
# output_sample['default_tensor'] will be a tensor of shape (10, 128) filled with 0.5
Parameters:
  • item_key_name (str) – key of the created default tensor in the output sample dict.

  • default_value (float) – value to fill the created default tensor with.

  • feature_dim (int) – size of the feature dimension of the created default tensor.

  • size (int | None) – size of the first dimension of the created default tensor.

  • matching_item_key (str | None) – key of an existing tensor in the input sample dict to match the size of the first dimension.

item_key_name
feature_dim
size = None
matching_item_key = None
default_value = 0.0
class noether.data.pipeline.sample_processors.DropOutliersSampleProcessor(item, affected_items=None, min_value=None, max_value=None, min_quantile=None, max_quantile=None)

Bases: noether.data.pipeline.sample_processor.SampleProcessor

Drops all outliers from a key in the input sample.

# dummy example
import torch
processor = DropOutliersSampleProcessor(
    item="measurement",
    affected_items={"related_measurement1", "related_measurement2"},
    min_value=0.0,
    max_value=100.0,
)

input_sample = {
    "measurement": torch.tensor([[10.0], [200.0], [-5.0], [50.0]]),
    "related_measurement1": torch.tensor([[1.0], [2.0], [3.0], [4.0]]),
    "related_measurement2": torch.tensor([[5.0], [6.0], [7.0], [8.0]]),
}
output_sample = processor(input_sample)
# output_sample['measurement'] will be tensor([[10.0], [50.0]])
# output_sample['related_measurement1'] will be tensor([[1.0], [4.0]])
# output_sample['related_measurement2'] will be tensor([[5.0], [8.0]])
Parameters:
  • item (str) – The item to drop outliers from.

  • affected_items (set[str] | None) – Set of item keys that are also affected by the outlier removal (rows are dropped at the same indices). Defaults to None.

  • min_value (float | None) – Drop outliers below min_value. Defaults to None.

  • max_value (float | None) – Drop outliers above max_value. Defaults to None.

  • min_quantile (float | None) – Drop outliers in/below min_quantile. Defaults to None.

  • max_quantile (float | None) – Drop outliers in/above max_quantile. Defaults to None.

item
affected_items = None
min_value = None
max_value = None
min_quantile = None
max_quantile = None
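The example above uses the value-based bounds; the quantile-based variant can be sketched in pure Python under the assumed semantics that rows whose value falls in/below min_quantile or in/above max_quantile are dropped. The helper names below are illustrative and not part of the library:

```python
# Sketch of quantile-based outlier dropping (assumed semantics, not the
# library's implementation): keep only rows whose value lies strictly
# between the min_quantile and max_quantile of the value distribution.

def quantile(sorted_vals, q):
    # Linear-interpolation quantile over pre-sorted values.
    idx = q * (len(sorted_vals) - 1)
    lo, hi = int(idx), min(int(idx) + 1, len(sorted_vals) - 1)
    frac = idx - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

def drop_quantile_outliers(values, min_quantile=None, max_quantile=None):
    s = sorted(values)
    lo = quantile(s, min_quantile) if min_quantile is not None else float("-inf")
    hi = quantile(s, max_quantile) if max_quantile is not None else float("inf")
    return [v for v in values if lo < v < hi]

values = [10.0, 200.0, -5.0, 50.0, 20.0, 30.0]
kept = drop_quantile_outliers(values, min_quantile=0.1, max_quantile=0.9)
# -5.0 and 200.0 fall outside the 10th/90th percentile band and are dropped
```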
class noether.data.pipeline.sample_processors.DuplicateKeysSampleProcessor(key_map)

Bases: noether.data.pipeline.sample_processor.SampleProcessor

Utility processor that simply duplicates the dictionary keys in a batch.

Duplicates keys in the batch if they are in the key_map. Creates a new dictionary whose keys are duplicated but uses references to the values of the old dict. This avoids copying the data and at the same time does not modify this function’s input.

# dummy example
processor = DuplicateKeysSampleProcessor(key_map={"original_key": "duplicated_key"})

input_sample = {
    "original_key": tensor_data,
}

output_sample = processor(input_sample)
# output_sample['original_key'] will be tensor_data
# output_sample['duplicated_key'] will also be tensor_data
Parameters:

key_map (dict[str, str]) – Dict with source keys as keys and target keys as values. The source keys are duplicated in the samples and the target keys are created. The values of the source keys are used for the target keys.

key_map
class noether.data.pipeline.sample_processors.MomentNormalizationSampleProcessor(item, mean=None, std=None, logmean=None, logstd=None, logscale=False)

Bases: noether.data.pipeline.sample_processor.SampleProcessor

Normalizes a value with its mean and standard deviation (i.e., its moments).

# dummy example
import torch
processor = MomentNormalizationSampleProcessor(
    item="measurement",
    mean=[10.0],
    std=[2.0],
    logscale=False,
)
input_sample = {
    "measurement": torch.tensor([[12.0], [14.0], [8.0]]),
    "other_item": torch.tensor([[1.0], [2.0], [3.0]]),
}
output_sample = processor(input_sample)
# output_sample['measurement'] will be tensor([[1.0], [2.0], [-1.0]])
# output_sample['other_item'] will be unchanged.
Parameters:
  • item (str) – The item (i.e., key in the input sample dictionary) to normalize.

  • mean (collections.abc.Sequence[float] | None) – The mean of the value. Mandatory if logscale=False.

  • std (collections.abc.Sequence[float] | None) – The standard deviation of the value. Mandatory if logscale=False.

  • logmean (collections.abc.Sequence[float] | None) – The mean of the value in logscale. Mandatory if logscale=True.

  • logstd (collections.abc.Sequence[float] | None) – The standard deviation of the value in logscale. Mandatory if logscale=True.

  • logscale (bool) – Whether to convert the value to logscale before normalization.

item
mean_tensor = None
std_tensor = None
logmean_tensor = None
logstd_tensor = None
logscale = False
inverse(key, value)

Inverts the normalization from the __call__ method of a single item in the batch.

Parameters:
  • key (str) – The name of the item.

  • value (torch.Tensor) – The value of the item.

Returns:

The same name and the denormalized value.

Return type:

(key, value)
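Assuming the normalization applied in __call__ is the usual affine transform, the round trip through inverse can be sketched in plain Python. The function names are illustrative; the library may differ in detail, e.g., in how multi-channel features are handled:

```python
import math

# Sketch of moment normalization and its inverse (assumed affine
# semantics): normalize subtracts the mean and divides by the standard
# deviation; the inverse multiplies by std and adds the mean back.

def normalize(x, mean, std):
    return (x - mean) / std

def denormalize(x, mean, std):
    return x * std + mean

mean, std = 10.0, 2.0
raw = [12.0, 14.0, 8.0]
normed = [normalize(v, mean, std) for v in raw]
restored = [denormalize(v, mean, std) for v in normed]

# logscale variant: the value is normalized in log space, so the inverse
# applies the affine inverse first and then exponentiates.
def log_denormalize(x, logmean, logstd):
    return math.exp(x * logstd + logmean)
```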

class noether.data.pipeline.sample_processors.PointSamplingSampleProcessor(items, num_points, seed=None)

Bases: noether.data.pipeline.sample_processor.SampleProcessor

Randomly subsamples points from a tensor.

# dummy example
import torch
processor = PointSamplingSampleProcessor(
    items={"input_position", "output_position"},
    num_points=1024,
    seed=42,
)
input_sample = {
    "input_position": torch.randn(5000, 3),
    "output_position": torch.randn(5000, 3),
    "input_features": torch.randn(5000, 6),
}
output_sample = processor(input_sample)
# output_sample['input_position'] will be a tensor of shape (1024, 3)
# output_sample['output_position'] will be a tensor of shape (1024, 3)
# output_sample['input_features'] will be unchanged.
# If input_features is also added to items, it will be of shape (1024, 6)
Parameters:
  • items (set[str]) – Which pointcloud items should be subsampled (e.g., input_position, output_position, …). If multiple items are present, the subsampling will use identical indices for all items (e.g., to downsample output_position and output_pressure with the same subsampling).

  • num_points (int) – Number of points to sample.

  • seed (int | None) – Random seed for deterministic sampling for evaluation. Default None (i.e., no seed). If not None, requires sample index to be present in batch.

items
num_points
seed = None
class noether.data.pipeline.sample_processors.PositionNormalizationSampleProcessor(items, raw_pos_min, raw_pos_max, scale=1000)

Bases: noether.data.pipeline.sample_processor.SampleProcessor

Pre-processes data on a sample-level to normalize positions.

Should only be used when multiple items should be normalized with the same normalization. If only one item should be normalized, consider using the preprocessor PositionNormalizer instead.

Parameters:
  • items (set[str]) – The position items to normalize. I.e., keys of the input_sample dictionary that should be normalized.

  • raw_pos_min (collections.abc.Sequence[float]) – The minimum position in the source domain.

  • raw_pos_max (collections.abc.Sequence[float]) – The maximum position in the source domain.

  • scale (int | float) – The maximum value of the normalized positions. Defaults to 1000.

items
scale = 1000
raw_pos_min_tensor
raw_pos_max_tensor
raw_size
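Since this class has no usage example above, here is a pure-Python sketch under the assumption that positions are min-max mapped per axis from [raw_pos_min, raw_pos_max] onto [0, scale]; the exact target range used by the library may differ:

```python
# Sketch of per-axis position normalization (assumed min-max mapping;
# illustrative only, not the library's implementation). Each coordinate d
# is mapped from [raw_pos_min[d], raw_pos_max[d]] onto [0, scale].

def normalize_positions(points, raw_pos_min, raw_pos_max, scale=1000):
    return [
        [
            (p[d] - raw_pos_min[d]) / (raw_pos_max[d] - raw_pos_min[d]) * scale
            for d in range(len(p))
        ]
        for p in points
    ]

points = [[0.0, -1.0, 2.0], [1.0, 1.0, 4.0]]
out = normalize_positions(
    points, raw_pos_min=[0.0, -1.0, 2.0], raw_pos_max=[1.0, 1.0, 4.0]
)
# the domain corners map to 0 and scale on every axis
```

Applying the same bounds to every item in items is what keeps multiple pointclouds (e.g., input and output positions) in a shared coordinate frame.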
class noether.data.pipeline.sample_processors.RenameKeysSampleProcessor(key_map)

Bases: noether.data.pipeline.sample_processor.SampleProcessor

Sample processor that simply renames the dictionary keys in a batch.

Rename keys in the batch if they are in the key_map and keep old keys otherwise. Creates a new dictionary whose keys are renamed but uses references to the values of the old dict. This avoids copying the data and at the same time does not modify this function’s input.

# dummy example
processor = RenameKeysSampleProcessor(key_map={"old_key1": "new_key1", "old_key2": "new_key2"})
input_sample = {
    "old_key1": some_tensor1,
    "old_key2": some_tensor2,
    "unchanged_key": some_tensor3,
}

output_sample = processor(input_sample)
# output_sample will be: {
#     'new_key1': some_tensor1,
#     'new_key2': some_tensor2,
#     'unchanged_key': some_tensor3,
# }
Parameters:

key_map (dict[str, str]) – Dict with source keys as keys and target keys as values. The source keys are renamed to the target keys.

key_map
class noether.data.pipeline.sample_processors.ReplaceKeySampleProcessor(source_key, target_keys)

Bases: noether.data.pipeline.sample_processor.SampleProcessor

Sample processor that replaces a key with one or multiple other keys.

Replaces a key in the batch with one or multiple other keys. Creates a new dictionary whose keys are duplicated but uses references to the values of the old dict. This avoids copying the data and at the same time does not modify this function’s input.

# dummy example
processor = ReplaceKeySampleProcessor(source_key="source", target_keys={"target1", "target2"})
input_sample = {
    "source": some_tensor,
    "unchanged_key": some_other_tensor,
}
output_sample = processor(input_sample)
# output_sample will be: {
#     'target1': some_tensor,
#     'target2': some_tensor,
#     'unchanged_key': some_other_tensor,
# }
Parameters:
  • source_key (str) – Key in the input_sample to be replaced.

  • target_keys (set[str]) – Set of keys that replace source_key; each target key receives the value of source_key.

source_key
target_keys
class noether.data.pipeline.sample_processors.SupernodeSamplingSampleProcessor(item, num_supernodes, supernode_idx_key='supernode_idx', items_at_supernodes=None, seed=None)

Bases: noether.data.pipeline.sample_processor.SampleProcessor

Randomly samples supernodes from a pointcloud.

Parameters:
  • item (str) – Which key in the input_sample (i.e., pointcloud item) is used to sample supernodes.

  • num_supernodes (int) – How many supernodes to sample.

  • items_at_supernodes (set[str] | None) – Selects items at the supernodes (e.g., pressure at supernodes). Defaults to None. These items are sampled accordingly and added to the output supernodes.

  • seed (int | None) – Random seed for deterministic sampling for evaluation. Default None (i.e., no seed). If not None, requires sample index to be present in batch.

  • supernode_idx_key (str)

item
num_supernodes
supernode_idx_key = 'supernode_idx'
items_at_supernodes = None
seed = None
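A pure-Python sketch of the assumed sampling semantics: supernode indices are drawn without replacement from the points of item, and any items_at_supernodes are gathered at the same indices. The supernode_idx key matches the documented default, while the *_at_supernodes output naming below is purely illustrative:

```python
import random

# Sketch of supernode sampling (illustrative, not the library's
# implementation): draw num_supernodes point indices without replacement
# and gather co-indexed items at those indices.

def sample_supernodes(sample, item, num_supernodes,
                      items_at_supernodes=(), supernode_idx_key="supernode_idx",
                      seed=None):
    rng = random.Random(seed)
    idx = rng.sample(range(len(sample[item])), num_supernodes)
    out = dict(sample)  # shallow copy: the input sample is not modified
    out[supernode_idx_key] = idx
    for key in items_at_supernodes:
        # hypothetical output naming for items selected at the supernodes
        out[key + "_at_supernodes"] = [sample[key][i] for i in idx]
    return out

sample = {
    "input_position": [[float(i), 0.0, 0.0] for i in range(100)],
    "pressure": [[float(i)] for i in range(100)],
}
out = sample_supernodes(sample, "input_position", num_supernodes=8,
                        items_at_supernodes={"pressure"}, seed=0)
```

Passing a fixed seed, as in the documented seed parameter, makes the drawn indices reproducible across evaluation runs.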