
Add base elements to support distributed comms. Add supports_distributed plugin flag. #1370

Merged

Conversation

@lrzpellegrini (Collaborator) commented on May 10, 2023:

This PR ports elements from PR #996 by adding the following:

  • Elements such as DistributedHelper that, without changing the current behavior of Avalanche components, serve as the starting point for wiring in support for distributed training.
  • The supports_distributed flag in BasePlugin (see the sketch at the end of this description). The mechanism used to inherit this flag in child classes differs from the one in Distributed support (rework) #996.
  • A change in the data type used to store masks in multi-task modules.

Unrelated to distributed training:

  • Adds the _obtain_common_dataloader_parameters method to the strategy templates.
  • Minor fixes to the multi-task dynamic modules, which did not handle certain corner cases regarding evaluation before training.
  • Fixes for the LVISDataset.
  • Removes a few unused imports, adds missing type hints, fixes typos, and more.

Part of the effort described here: #1315
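
As a rough illustration of how such a capability flag can be declared and consumed: the names BasePlugin and supports_distributed come from this PR, but the plain class-attribute mechanism and the checking helper below are assumptions for illustration only, not the PR's actual implementation (which handles flag inheritance differently from #996).

```python
# Minimal sketch, assuming a plain class attribute; the actual
# inheritance mechanism in this PR may be more involved.
class BasePlugin:
    # Plugins that are safe to use under distributed (multi-process)
    # training opt in by setting this flag to True.
    supports_distributed: bool = False


class MyDistributedAwarePlugin(BasePlugin):
    # Hypothetical plugin declaring distributed support.
    supports_distributed = True


def check_plugins_compatibility(plugins):
    # Hypothetical helper: a strategy running in a distributed setting
    # could refuse plugins that do not declare support.
    for plugin in plugins:
        if not plugin.supports_distributed:
            raise RuntimeError(
                f"{type(plugin).__name__} does not support "
                f"distributed training"
            )
```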

@coveralls commented on May 10, 2023:

Pull Request Test Coverage Report for Build 4993189103

  • 258 of 822 (31.39%) changed or added relevant lines in 27 files are covered.
  • 80 unchanged lines in 5 files lost coverage.
  • Overall coverage decreased (-1.9%) to 71.983%

Changes Missing Coverage:

| File | Covered Lines | Changed/Added Lines | % |
|------|---------------|---------------------|---|
| avalanche/logging/base_logger.py | 2 | 3 | 66.67% |
| avalanche/models/utils.py | 5 | 6 | 83.33% |
| avalanche/training/plugins/evaluation.py | 11 | 12 | 91.67% |
| avalanche/training/supervised/l2p.py | 2 | 3 | 66.67% |
| avalanche/training/supervised/strategy_wrappers.py | 3 | 4 | 75.0% |
| avalanche/training/templates/base_sgd.py | 19 | 20 | 95.0% |
| avalanche/benchmarks/utils/data_loader.py | 4 | 6 | 66.67% |
| avalanche/models/dynamic_modules.py | 10 | 12 | 83.33% |
| tests/distributed/distributed_test_utils.py | 11 | 20 | 55.0% |
| avalanche/training/templates/base.py | 10 | 22 | 45.45% |
Files with Coverage Reduction:

| File | New Missed Lines | % |
|------|------------------|---|
| avalanche/benchmarks/utils/data_loader.py | 1 | 80.39% |
| avalanche/training/templates/base.py | 1 | 83.75% |
| tests/test_high_level_generators.py | 1 | 99.71% |
| avalanche/benchmarks/generators/benchmark_generators.py | 29 | 86.51% |
| avalanche/evaluation/metrics/cumulative_accuracies.py | 48 | 42.19% |
Totals:

  • Change from base Build 4937044567: -1.9%
  • Covered Lines: 15585
  • Relevant Lines: 21651

💛 - Coveralls

Comment on lines +17 to +18:

    def hash_benchmark(benchmark: 'DatasetScenario', *,
                       hash_engine=None, num_workers=0) -> str:
Collaborator:

Shouldn't this be a class method (`hash`)? The same applies to the other classes in this file, except the classes defined outside of Avalanche.

Collaborator (author):

Yes, I think I can move those elements to the appropriate classes.

Collaborator (author):

It seems that the only Avalanche-specific hash function in that file is hash_benchmark. Do you think it is still appropriate to move it to CLScenario?

Collaborator:

I think it's better to reuse the class __hash__ method if possible so that child classes can safely override its behavior if needed. Also, hash_dataset should work only on AvalancheDataset since we don't really support any other dataset.

Collaborator (author):

Alas, __hash__ must return an int: it is designed to provide a coarse mechanism for populating hash maps. I think we can just move those methods to the CLScenario and AvalancheDataset classes for the moment.
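
A minimal sketch of the distinction being made here, with purely illustrative names (not code from this PR): Python's __hash__ must return an int and is used by dicts and sets for coarse bucketing, while a content fingerprint such as the one returned by hash_benchmark is a string digest.

```python
import hashlib


class Example:
    def __init__(self, payload: bytes):
        self.payload = payload

    def __hash__(self) -> int:
        # Must return an int; dicts and sets use it only for bucketing,
        # so collisions are acceptable.
        return hash(self.payload)

    def content_fingerprint(self) -> str:
        # A reproducible string digest of the full content, e.g. to
        # verify that all processes see the same benchmark/dataset.
        return hashlib.sha256(self.payload).hexdigest()
```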

Collaborator:

Ok, if it's different we can keep it as is. Maybe it will become clearer to me once I see how you use it for distributed training.

Additional resolved review threads:

  • avalanche/distributed/distributed_helper.py (2 threads)
  • avalanche/logging/base_logger.py
  • avalanche/models/dynamic_modules.py (outdated)
  • avalanche/core.py (outdated)
  • avalanche/training/templates/base_sgd.py (3 threads, outdated)
  • avalanche/training/plugins/gdumb.py (outdated)
@AntonioCarta (Collaborator):

Thanks @lrzpellegrini, I added a bunch of comments. I think some parts can hopefully be simplified a bit.

@AntonioCarta (Collaborator):
Ok, I think the model_device and __hash__ are the only remaining issues.

@AntonioCarta AntonioCarta merged commit d60eb90 into ContinualAI:master May 17, 2023
BenCrulis pushed a commit to BenCrulis/avalanche that referenced this pull request on Jul 24, 2024 (…buted_training_pt2):

Add base elements to support distributed comms. Add supports_distributed plugin flag.