Change HEPAugFoldYielder to callback? #73
Labels
disruptive
Something which will likely cause large or breaking changes
improvement
Something which would improve current status, but not add anything new
investigation
Something which might require a careful study
medium priority
Not urgent but should be dealt with sooner rather than later
Current status
HEPAugFoldYielder applies train-time and test-time data augmentations to HEP data (phi rotations, transverse & longitudinal flips). This is performed when loading the data since, originally, this was the last point at which the feature names for the data were known to the model. Later changes to LUMIN mean that the model now has a list of named features and how they map to the input features. This means that the data augmentation could instead be performed by a callback during training (similar to the suggestion of issue #68).
Discussion
It seems a bit strange that the choice of whether or not to augment the data is made by changing how the data is loaded from file. Specifying the choice as a callback makes a bit more sense (to me). This also avoids complications once additional forms of augmentation are added, which may otherwise require their own FoldYielder classes, and we would then have to account for all possible combinations of different types of augmentation.
Depending on the choices made in issue #50, this may reduce the efficiency of augmentation, but it's possible that augmenting the data in place on device may actually be more efficient, since it could be done multithreaded. This would perhaps avoid the need to augment as a pandas.DataFrame, and maybe pre-cached rotation matrices could be used, in some part, to speed things up. Since the data is already on device, this would actually be quicker than loading from disc, augmenting, and then loading to device, a sequence which is known to cause a particular slow-down when working on GPU.
Possible change
The callback would need to mimic the behaviour of HEPAugFoldYielder, i.e. provide random augmentation during training, and a choice of either set transformations or random ones during testing. It would need to be passed as a callback during both training and prediction.
Additionally, tests should be done to compare the speed and memory usage of the callback to those of HEPAugFoldYielder.
If successful, this would deprecate HEPAugFoldYielder.
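To make the proposal a bit more concrete, here is a minimal sketch of what batch-level augmentation could look like. It is not LUMIN code: the class name HEPAugSketch, the method names on_train_batch and on_test_batch, the index arguments, and the way aug_idx is decoded into a set transformation are all assumptions made for illustration, and NumPy arrays stand in for on-device tensors. It assumes the vector features are stored in Cartesian form, so a phi rotation mixes px and py, a transverse flip negates py, and a longitudinal flip negates pz.

```python
import numpy as np


class HEPAugSketch:
    """Hypothetical callback-style augmenter (illustration only, not part of LUMIN)."""

    def __init__(self, px_idx, py_idx, pz_idx, seed=None):
        # Column indices of the px, py and pz components in the input array.
        # In the real callback these would come from the model's named features.
        self.px_idx, self.py_idx, self.pz_idx = list(px_idx), list(py_idx), list(pz_idx)
        self.rng = np.random.default_rng(seed)

    def _apply(self, x, phi, flip_y, flip_z):
        """Rotate every (px, py) pair by a per-event angle, then apply the flips."""
        x = x.copy()  # leave the caller's batch untouched
        cos, sin = np.cos(phi)[:, None], np.sin(phi)[:, None]
        px, py = x[:, self.px_idx], x[:, self.py_idx]  # fancy indexing returns copies
        x[:, self.px_idx] = px * cos - py * sin
        x[:, self.py_idx] = px * sin + py * cos
        # Transverse flip negates py, longitudinal flip negates pz
        x[:, self.py_idx] *= np.where(flip_y, -1.0, 1.0)[:, None]
        x[:, self.pz_idx] *= np.where(flip_z, -1.0, 1.0)[:, None]
        return x

    def on_train_batch(self, x):
        """Train time: an independent random rotation and random flips per event."""
        n = len(x)
        phi = self.rng.uniform(0.0, 2.0 * np.pi, size=n)
        flip_y = self.rng.integers(0, 2, size=n).astype(bool)
        flip_z = self.rng.integers(0, 2, size=n).astype(bool)
        return self._apply(x, phi, flip_y, flip_z)

    def on_test_batch(self, x, aug_idx, n_rot=4):
        """Test time: one of n_rot * 4 set transformations, chosen by aug_idx.

        The decoding below (rotation step, then the two flips) is an assumed
        convention, not necessarily the one HEPAugFoldYielder uses.
        """
        n = len(x)
        rot, flip_y, flip_z = aug_idx // 4, (aug_idx // 2) % 2, aug_idx % 2
        phi = np.full(n, 2.0 * np.pi * rot / n_rot)
        return self._apply(x, phi, np.full(n, bool(flip_y)), np.full(n, bool(flip_z)))


if __name__ == "__main__":
    # Two vectors per event, stored as (px, py, pz, px, py, pz)
    batch = np.random.default_rng(0).normal(size=(5, 6))
    aug = HEPAugSketch(px_idx=[0, 3], py_idx=[1, 4], pz_idx=[2, 5], seed=1)
    train_batch = aug.on_train_batch(batch)
    # Average predictions over all set transformations at test time
    test_batches = [aug.on_test_batch(batch, i) for i in range(4 * 4)]
    print(train_batch.shape, len(test_batches))
```

Since the transformation works on whole batches with array operations, the same logic should carry over to tensors that are already on device, which is where the speed and memory comparison against HEPAugFoldYielder would need to be made.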