
Add base elements to support distributed comms. Add supports_distributed plugin flag. #1370

Merged

Conversation

@lrzpellegrini (Collaborator) commented on May 10, 2023:

This PR ports elements from PR #996 by adding the following:

  • Elements such as DistributedHelper that, without changing the current behavior of Avalanche components, serve as the starting point for wiring in support for distributed training.
  • The supports_distributed flag in BasePlugin (see the sketch at the end of this description). The mechanism used to inherit this flag in child classes differs from the one in Distributed support (rework) #996.
  • A change in the data type used to store masks in multi-task modules.

Unrelated to distributed training:

  • Adds the _obtain_common_dataloader_parameters method to the strategy templates.
  • Minor fixes to the multi-task dynamic modules, which did not handle certain corner cases regarding evaluation before training.
  • Fixes for the LVISDataset.
  • Removes a few unused imports, adds missing type hints, fixes typos, and more.

Part of the effort described here: #1315
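
As a rough illustration of how such a capability flag can be declared and consumed: the names BasePlugin and supports_distributed come from this PR, but the plain class-attribute mechanism and the checking helper below are assumptions for illustration only, not the PR's actual implementation (which handles flag inheritance differently from #996).

```python
# Minimal sketch, assuming a plain class attribute; the actual
# inheritance mechanism in this PR may be more involved.
class BasePlugin:
    # Plugins that are safe to use under distributed (multi-process)
    # training opt in by setting this flag to True.
    supports_distributed: bool = False


class MyDistributedAwarePlugin(BasePlugin):
    # Hypothetical plugin declaring distributed support.
    supports_distributed = True


def check_plugins_compatibility(plugins):
    # Hypothetical helper: a strategy running in a distributed setting
    # could refuse plugins that do not declare support.
    for plugin in plugins:
        if not plugin.supports_distributed:
            raise RuntimeError(
                f"{type(plugin).__name__} does not support "
                f"distributed training"
            )
```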

@coveralls commented on May 10, 2023:

Pull Request Test Coverage Report for Build 4993189103

  • 258 of 822 (31.39%) changed or added relevant lines in 27 files are covered.
  • 80 unchanged lines in 5 files lost coverage.
  • Overall coverage decreased (-1.9%) to 71.983%

Changes Missing Coverage:

| File | Covered Lines | Changed/Added Lines | % |
|------|---------------|---------------------|---|
| avalanche/logging/base_logger.py | 2 | 3 | 66.67% |
| avalanche/models/utils.py | 5 | 6 | 83.33% |
| avalanche/training/plugins/evaluation.py | 11 | 12 | 91.67% |
| avalanche/training/supervised/l2p.py | 2 | 3 | 66.67% |
| avalanche/training/supervised/strategy_wrappers.py | 3 | 4 | 75.0% |
| avalanche/training/templates/base_sgd.py | 19 | 20 | 95.0% |
| avalanche/benchmarks/utils/data_loader.py | 4 | 6 | 66.67% |
| avalanche/models/dynamic_modules.py | 10 | 12 | 83.33% |
| tests/distributed/distributed_test_utils.py | 11 | 20 | 55.0% |
| avalanche/training/templates/base.py | 10 | 22 | 45.45% |
Files with Coverage Reduction:

| File | New Missed Lines | % |
|------|------------------|---|
| avalanche/benchmarks/utils/data_loader.py | 1 | 80.39% |
| avalanche/training/templates/base.py | 1 | 83.75% |
| tests/test_high_level_generators.py | 1 | 99.71% |
| avalanche/benchmarks/generators/benchmark_generators.py | 29 | 86.51% |
| avalanche/evaluation/metrics/cumulative_accuracies.py | 48 | 42.19% |
Totals:

  • Change from base Build 4937044567: -1.9%
  • Covered Lines: 15585
  • Relevant Lines: 21651

💛 - Coveralls

Comment on lines +17 to +18:

    def hash_benchmark(benchmark: 'DatasetScenario', *,
                       hash_engine=None, num_workers=0) -> str:
Collaborator:

Shouldn't this be a class method (`hash`)? The same applies to the other classes in this file, except the classes defined outside of Avalanche.

Collaborator (author):

Yes, I think I can move those elements to the appropriate classes.

Collaborator (author):

It seems that the only Avalanche-specific hash function in that file is hash_benchmark. Do you think it is still appropriate to move it to CLScenario?

Collaborator:

I think it's better to reuse the class __hash__ method if possible so that child classes can safely override its behavior if needed. Also, hash_dataset should work only on AvalancheDataset since we don't really support any other dataset.

Collaborator (author):

Alas, __hash__ must return an int: it is designed to provide a coarse mechanism for populating hash maps. I think we can just move those methods to the CLScenario and AvalancheDataset classes for the moment.
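
A minimal sketch of the distinction being made here, with purely illustrative names (not code from this PR): Python's __hash__ must return an int and is used by dicts and sets for coarse bucketing, while a content fingerprint such as the one returned by hash_benchmark is a string digest.

```python
import hashlib


class Example:
    def __init__(self, payload: bytes):
        self.payload = payload

    def __hash__(self) -> int:
        # Must return an int; dicts and sets use it only for bucketing,
        # so collisions are acceptable.
        return hash(self.payload)

    def content_fingerprint(self) -> str:
        # A reproducible string digest of the full content, e.g. to
        # verify that all processes see the same benchmark/dataset.
        return hashlib.sha256(self.payload).hexdigest()
```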

Collaborator:

Ok, if it's different we can keep it as is. Maybe it will become clearer to me once I see how you use it for distributed training.

Additional resolved review threads:

  • avalanche/distributed/distributed_helper.py (2 threads)
  • avalanche/logging/base_logger.py
  • avalanche/models/dynamic_modules.py (outdated)
  • avalanche/core.py (outdated)
  • avalanche/training/templates/base_sgd.py (3 threads, outdated)
  • avalanche/training/plugins/gdumb.py (outdated)
@AntonioCarta (Collaborator):

Thanks @lrzpellegrini, I added a bunch of comments. I think some parts can hopefully be simplified a bit.

@AntonioCarta (Collaborator):
Ok, I think the model_device and __hash__ are the only remaining issues.

@AntonioCarta AntonioCarta merged commit d60eb90 into ContinualAI:master May 17, 2023
BenCrulis pushed a commit to BenCrulis/avalanche that referenced this pull request on Jul 24, 2024 (…buted_training_pt2):

Add base elements to support distributed comms. Add supports_distributed plugin flag.