Implement sampler-based dataloading logic #1095

lrzpellegrini · 2022-07-15T14:28:09Z

This PR changes how dataloaders are created in data_loader.py.

The new mechanism:

- Supports the PyTorch distributed sampler, which requires calling set_epoch. This will be useful in the near future for merging distributed training support into the main branch.
- Creates the DataLoader (and Sampler) objects in the __iter__ method, which is needed to manage stateful samplers. The overall length of the final dataloader is computed in the constructor.
- Takes default collate functions from the newly created collate_functions.py. Existing collate functions have been moved there, and aliases have been created in data_loader.py (those aliases will probably be deprecated in the future).
- Adds functions to collate_functions.py that better manage the broadcast of elements during distributed training. They will be useful later!
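
For readers unfamiliar with the pattern, here is a minimal sketch of the idea described above: the length is computed in the constructor, while the sampler and DataLoader are built inside __iter__ so that set_epoch can be called on a fresh sampler at every pass. The class name EpochAwareLoader and its parameters are hypothetical and are not the classes in data_loader.py; num_replicas and rank are passed explicitly so the example runs without initializing a process group.

```python
import math

import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


class EpochAwareLoader:
    """Hypothetical wrapper (not Avalanche's implementation)."""

    def __init__(self, dataset, batch_size, num_replicas=1, rank=0):
        self.dataset = dataset
        self.batch_size = batch_size
        self.num_replicas = num_replicas
        self.rank = rank
        self.epoch = 0
        # The length is computed once, in the constructor.
        # DistributedSampler (drop_last=False) pads so that every replica
        # gets ceil(N / num_replicas) samples; those are then batched.
        per_replica = math.ceil(len(dataset) / num_replicas)
        self._length = math.ceil(per_replica / batch_size)

    def __len__(self):
        return self._length

    def __iter__(self):
        # The sampler is stateful, so it is created here rather than in
        # __init__, and told which epoch it is in before iteration starts.
        sampler = DistributedSampler(
            self.dataset,
            num_replicas=self.num_replicas,
            rank=self.rank,
            shuffle=True,
            seed=0,
        )
        sampler.set_epoch(self.epoch)
        self.epoch += 1
        loader = DataLoader(
            self.dataset, batch_size=self.batch_size, sampler=sampler
        )
        yield from loader


# Usage: 10 samples split across 2 replicas, seen from rank 0.
dataset = TensorDataset(torch.arange(10))
loader = EpochAwareLoader(dataset, batch_size=2, num_replicas=2, rank=0)
batches = list(loader)
```

Because the epoch counter advances on each pass, the shuffling order generally differs between epochs while remaining consistent across replicas that share the same seed.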

I will remove the related changes from the already opened distributed support PR #996, from which they were taken.

coveralls · 2022-07-15T14:37:02Z

Pull Request Test Coverage Report for Build 2677446091

99 of 134 (73.88%) changed or added relevant lines in 2 files are covered.
2 unchanged lines in 1 file lost coverage.
Overall coverage decreased (-0.005%) to 72.241%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
| --- | --- | --- | --- |
| avalanche/benchmarks/utils/collate_functions.py | 14 | 23 | 60.87% |
| avalanche/benchmarks/utils/data_loader.py | 85 | 111 | 76.58% |

| Files with Coverage Reduction | New Missed Lines | % |
| --- | --- | --- |
| avalanche/benchmarks/utils/data_loader.py | 2 | 81.28% |

Implement sampler-based dataloading logic.

712b8d2

lrzpellegrini requested a review from AntonioCarta July 15, 2022 14:28

AntonioCarta merged commit 06caf31 into ContinualAI:master Jul 18, 2022