How to Implement a Custom Sample Processor

Inside the Multistage Pipeline, data samples can be processed using custom Sample Processors. A Sample Processor is a callable class that takes as input a single data sample (a dictionary) and returns a processed data sample (also a dictionary). To create your own Sample Processor, you need to extend the base SampleProcessor class and implement the __call__() method. Sample processors do not receive a configuration object, but can accept arbitrary keyword arguments in their constructor.

from typing import Any

from noether.data.pipeline.sample_processor import SampleProcessor

class CustomSampleProcessor(SampleProcessor):
   """Skeleton processor: replace the body of __call__ with your own per-sample processing."""

   def __init__(self, **kwargs) -> None:
      """
      Args:
         **kwargs: Arbitrary keyword arguments. Sample processors do not receive a config object.
      """

   def __call__(self, input_sample: dict[str, Any]) -> dict[str, Any]:
      """
      Args:
         input_sample: Input sample dictionary.
      Returns:
         Processed sample dictionary.
      """
      # Start from a shallow copy so the input sample is not mutated.
      output_sample = dict(input_sample)

      # do any form of processing here

      return output_sample
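
As a more concrete illustration, the sketch below shows a processor that takes its settings as keyword arguments and rescales one numeric field of the sample. The class name ScaleFieldSampleProcessor, the parameters key and factor, and the "pressure" field are all hypothetical and chosen only for this example. So that the snippet runs without the library installed, it defines a small stand-in SampleProcessor base class; in a real project you would import the base class from noether.data.pipeline.sample_processor instead.

```python
from typing import Any


class SampleProcessor:
    """Stand-in base class (assumption) so this sketch runs without noether installed."""

    def __call__(self, input_sample: dict[str, Any]) -> dict[str, Any]:
        raise NotImplementedError


class ScaleFieldSampleProcessor(SampleProcessor):
    """Hypothetical processor that multiplies one numeric field by a constant factor."""

    def __init__(self, key: str, factor: float = 1.0) -> None:
        # Settings arrive as plain keyword arguments, not as a config object.
        self.key = key
        self.factor = factor

    def __call__(self, input_sample: dict[str, Any]) -> dict[str, Any]:
        # Shallow-copy the sample so the caller's dictionary is left untouched.
        output_sample = dict(input_sample)
        output_sample[self.key] = input_sample[self.key] * self.factor
        return output_sample


processor = ScaleFieldSampleProcessor(key="pressure", factor=0.5)
sample = {"pressure": 4.0, "label": "a"}
result = processor(sample)
```

Returning a new dictionary rather than modifying input_sample in place keeps the processor free of side effects, which can make it easier to reason about when several processors are applied one after another.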