noether.data.tools.calculate_statistics

Functions

parse_args()

Parse command line arguments for dataset statistics calculation.

parse_dataset_args(args)

Parse additional arguments for dataset constructor.

get_dataset_attributes(dataset)

Extract the names of all dataset attributes for which a getitem method is defined.

calculate_statistics(dataset, dataset_attributes, ...)

Calculate statistics for all dataset attributes.

print_statistics(running_stats, log_scale)

Print calculated statistics for each attribute.

save_statistics_to_json(running_stats, output_path, ...)

Save calculated statistics to a JSON file.

main(dataset_kind, log_scale, exclude_attributes[, ...])

Main function to calculate and display dataset statistics.

Module Contents

noether.data.tools.calculate_statistics.parse_args()

Parse command line arguments for dataset statistics calculation.

Returns:

Dictionary containing all parsed arguments

Return type:

dict[str, Any]

noether.data.tools.calculate_statistics.parse_dataset_args(args)

Parse additional arguments for dataset constructor.

Parameters:

args (list) – List of unparsed command-line arguments

Returns:

Dictionary of parsed dataset constructor arguments

Return type:

Dict[str, Any]
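As a rough illustration of what turning unparsed command-line tokens into constructor keyword arguments can look like, here is a minimal sketch. The function name `parse_extra_args`, the accepted `--key value` / `--key=value` / bare-flag forms, and the use of `ast.literal_eval` for type coercion are all assumptions made for this example; the actual parsing rules of `parse_dataset_args` may differ.

```python
import ast
from typing import Any


def parse_extra_args(args: list[str]) -> dict[str, Any]:
    """Hypothetical stand-in: turn leftover CLI tokens into constructor kwargs."""
    parsed: dict[str, Any] = {}
    i = 0
    while i < len(args):
        token = args[i]
        if not token.startswith("--"):
            raise ValueError(f"expected an option starting with '--', got {token!r}")
        if "=" in token:
            # Form: --key=value
            key, raw = token[2:].split("=", 1)
            i += 1
        elif i + 1 < len(args) and not args[i + 1].startswith("--"):
            # Form: --key value
            key, raw = token[2:], args[i + 1]
            i += 2
        else:
            # A bare flag with no value is treated as a boolean switch.
            key, raw = token[2:], "True"
            i += 1
        try:
            # Interpret numbers, booleans, lists, etc. as Python literals.
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Anything that is not a valid literal stays a plain string.
            value = raw
        parsed[key.replace("-", "_")] = value
    return parsed


kwargs = parse_extra_args(["--root", "/data/shapes", "--num-points=1024", "--normalize"])
print(kwargs)
```

Under these assumptions, the resulting dictionary could be forwarded to a dataset constructor as `**kwargs`.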

noether.data.tools.calculate_statistics.get_dataset_attributes(dataset)

Extract the names of all dataset attributes for which a getitem method is defined.

Parameters:

dataset – The dataset object

Returns:

Set of attribute names

Return type:

Set[str]
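One plausible way to discover such attributes is to scan the dataset object for methods that follow a naming convention. The sketch below assumes a `getitem_<name>` prefix; that prefix, the `ToyDataset` class, and the helper `find_getitem_attributes` are hypothetical and only illustrate the idea.

```python
class ToyDataset:
    """Hypothetical dataset exposing one getitem method per attribute."""

    def __len__(self) -> int:
        return 3

    def getitem_position(self, idx: int) -> list[float]:
        return [float(idx), 0.0]

    def getitem_pressure(self, idx: int) -> list[float]:
        return [idx * 2.0]

    def helper(self) -> None:
        """Not a getitem method, so it should be ignored."""


def find_getitem_attributes(dataset: object, prefix: str = "getitem_") -> set[str]:
    """Collect attribute names from callables whose name starts with `prefix`."""
    return {
        name[len(prefix):]
        for name in dir(dataset)
        if name.startswith(prefix) and callable(getattr(dataset, name))
    }


attributes = find_getitem_attributes(ToyDataset())
print(sorted(attributes))
```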

noether.data.tools.calculate_statistics.calculate_statistics(dataset, dataset_attributes, log_scale, num_workers=0)

Calculate statistics for all dataset attributes.

Parameters:
  • dataset – The dataset object

  • dataset_attributes (set[str]) – Set of attribute names to process

  • log_scale (set[str]) – Set of attributes to process in log scale

  • num_workers (int) – Number of workers for data loading

Returns:

Dictionary mapping attribute names to their statistics

Return type:

Dict[str, RunningMoments]
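The general idea of accumulating per-attribute statistics in a single pass, with selected attributes transformed to log scale first, can be sketched as follows. `SimpleRunningMoments` is a hypothetical stand-in written for this example and is not the library's `RunningMoments` class; the use of Welford's update, the population (not sample) standard deviation, and the natural logarithm are likewise assumptions.

```python
import math


class SimpleRunningMoments:
    """Hypothetical accumulator using Welford's single-pass update."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean
        self.min = math.inf
        self.max = -math.inf

    def update(self, values: list[float]) -> None:
        for x in values:
            self.count += 1
            delta = x - self.mean
            self.mean += delta / self.count
            self._m2 += delta * (x - self.mean)
            self.min = min(self.min, x)
            self.max = max(self.max, x)

    @property
    def std(self) -> float:
        # Population standard deviation; NaN if nothing was accumulated.
        return math.sqrt(self._m2 / self.count) if self.count else float("nan")


def accumulate(samples, attributes: set[str], log_scale: set[str]):
    """Update one accumulator per attribute, applying log to selected ones."""
    stats = {name: SimpleRunningMoments() for name in attributes}
    for sample in samples:
        for name in attributes:
            values = sample[name]
            if name in log_scale:
                values = [math.log(v) for v in values]
            stats[name].update(values)
    return stats


samples = [
    {"pressure": [1.0, 2.0], "viscosity": [1.0, 10.0]},
    {"pressure": [3.0, 4.0], "viscosity": [100.0, 1000.0]},
]
stats = accumulate(samples, {"pressure", "viscosity"}, log_scale={"viscosity"})
for name in sorted(stats):
    print(name, stats[name].mean, stats[name].std)
```

A streaming update like this avoids holding the whole dataset in memory, which is presumably why a running-moments structure is returned rather than raw arrays.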

noether.data.tools.calculate_statistics.print_statistics(running_stats, log_scale)

Print calculated statistics for each attribute.

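Parameters:
  • running_stats – Dictionary mapping attribute names to their statistics

  • log_scale – Set of attributes processed in log scale

Return type: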

None

noether.data.tools.calculate_statistics.save_statistics_to_json(running_stats, output_path, log_scale)

Save calculated statistics to a JSON file.

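Parameters:
  • running_stats – Dictionary mapping attribute names to their statistics

  • output_path – Path of the JSON file to write

  • log_scale – Set of attributes processed in log scale

Return type: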

None
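For orientation, writing per-attribute statistics to a JSON file might look like the sketch below. The helper `write_stats`, the plain-dictionary input, and the JSON layout (one entry per attribute with a `log_scale` marker) are assumptions for illustration only; the file format actually produced by `save_statistics_to_json` is not documented here.

```python
import json
import tempfile
from pathlib import Path


def write_stats(stats: dict[str, dict[str, float]], output_path, log_scale: set[str]) -> Path:
    """Hypothetical writer: one JSON object per attribute, flagged if log-scaled."""
    payload = {
        name: {**values, "log_scale": name in log_scale}
        for name, values in sorted(stats.items())
    }
    path = Path(output_path)
    # Create missing parent directories so the write does not fail.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(payload, indent=2))
    return path


stats = {
    "pressure": {"mean": 2.5, "std": 1.118},
    "viscosity": {"mean": 3.45, "std": 2.57},
}
with tempfile.TemporaryDirectory() as tmp:
    written = write_stats(stats, Path(tmp) / "stats" / "dataset.json", log_scale={"viscosity"})
    loaded = json.loads(written.read_text())

print(loaded)
```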

noether.data.tools.calculate_statistics.main(dataset_kind, log_scale, exclude_attributes, output_json=None, num_workers=0, **dataset_constructor_args)

Main function to calculate and display dataset statistics.

Parameters:
  • dataset_kind (str) – Class path of the dataset

  • log_scale (set[str]) – Set of attributes to process in log scale

  • exclude_attributes (set[str]) – Set of attributes to exclude from calculation

  • output_json (str | None) – Optional path to save statistics as JSON

  • num_workers (int) – Number of workers for data loading

  • dataset_constructor_args – Additional arguments for dataset constructor

Return type:

None
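Since `dataset_kind` is given as a class path and the remaining keyword arguments go to the dataset constructor, the instantiation step can be pictured roughly as below. This is a sketch assuming the path is a dotted `package.module.ClassName` string; the helper `resolve_class` is hypothetical, and a standard-library class is used in place of a real dataset so the example runs on its own.

```python
import importlib
from typing import Any


def resolve_class(class_path: str) -> type:
    """Hypothetical helper: import the module part and fetch the class by name."""
    module_name, _, class_name = class_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.ClassName', got {class_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Stand-in for a dataset class path and its constructor arguments.
constructor_args: dict[str, Any] = {"numerator": 3, "denominator": 4}
cls = resolve_class("fractions.Fraction")
instance = cls(**constructor_args)
print(cls.__name__, instance)
```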