
Major BiC fix/reimplementation #1550

Merged · 6 commits into ContinualAI:master · Jan 25, 2024
Conversation

@lrzpellegrini (Collaborator) commented on Dec 19, 2023

This PR is a major re-implementation/global fix for BiC (paper)

The implementation found in Avalanche (and in other CL libraries as well) has various issues, mainly stemming from a misunderstanding of how the algorithm works. It's an honest mistake that I made too when reading the paper.

The official implementation of BiC can be found here (TF 1.x :/ ): https://github.com/wuyuebupt/LargeScaleIncrementalLearning/tree/master

BiC works like this:

  1. Distillation phase: classic training loop with distillation loss (on logits of old classes) + classification loss (logits of old and new classes)
  2. Bias correction phase: learn alpha and beta correction parameters for new classes
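
To make the two-phase structure concrete, here is a minimal NumPy sketch of the phase-1 loss from the paper (the function names, the temperature, and the shapes are illustrative assumptions, not Avalanche's API): the distillation term is computed only on the old-class logits, the classification term on all logits, and the two are mixed with the paper's weighting lambda = n_old / (n_old + n_new).

```python
import numpy as np

def softmax(z, temp=1.0):
    z = z / temp
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def bic_distillation_loss(new_logits, old_logits, targets, n_old, n_new, temp=2.0):
    """Phase-1 loss sketch: distillation on old classes + classification on all.

    new_logits: (batch, n_old + n_new) logits of the current model
    old_logits: (batch, n_old) logits of the frozen old model
    """
    # Distillation term: match the old model's soft targets on old classes only.
    soft_targets = softmax(old_logits[:, :n_old], temp)
    log_probs = np.log(softmax(new_logits[:, :n_old], temp))
    l_distill = -(soft_targets * log_probs).sum(axis=1).mean()

    # Classification term: standard cross-entropy over old + new classes.
    log_p = np.log(softmax(new_logits))
    l_ce = -log_p[np.arange(len(targets)), targets].mean()

    # Paper's weighting: lambda = n_old / (n_old + n_new).
    lam = n_old / (n_old + n_new)
    return lam * l_distill + (1.0 - lam) * l_ce
```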

The main issue in Avalanche implementation is that BiC does not actually keep a separate bias correction layer for each experience, it only keeps single alpha and beta parameters (a single bias correction layer) which is then applied only to the "last" experience.

In practice:

  1. Distillation phase: distill on the logits (after the FC layer) by applying the bias correction to the activations of the old model (used for distillation), but only for the classes found in experience current_experience-1. Bias correction is not applied to the other classes (such as those in current_experience-t for t>=2, nor those in current_experience), and it is not applied to the activations of the current model. The usual classification loss is applied to the logits of the current model, without bias correction.
  2. Discard the previous bias correction layer and learn new alpha and beta values for the classes in current_experience
  3. (eval) During the test phase, apply the bias correction only to the logits of classes from current_experience
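
As a rough illustration of steps 2-3 (a sketch of the described procedure, not the Avalanche code), the bias correction phase reduces to fitting the two scalars alpha and beta on held-out logits while everything else stays frozen, and eval then applies them only to the current experience's classes. All names below are assumptions made up for the example.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fit_bias_params(val_logits, val_targets, new_slice, lr=0.05, steps=300):
    """Learn alpha/beta by gradient descent on cross-entropy.

    The model is frozen, so pre-computed logits suffice; any previously
    learned alpha/beta are discarded (we restart from the identity 1, 0).
    """
    alpha, beta = 1.0, 0.0
    n = len(val_targets)
    for _ in range(steps):
        corrected = val_logits.copy()
        corrected[:, new_slice] = alpha * corrected[:, new_slice] + beta
        # Gradient of mean cross-entropy w.r.t. the corrected logits.
        grad = softmax(corrected)
        grad[np.arange(n), val_targets] -= 1.0
        grad /= n
        g_new = grad[:, new_slice]
        # corrected_new = alpha * raw + beta, hence the chain rule below.
        alpha -= lr * (g_new * val_logits[:, new_slice]).sum()
        beta -= lr * g_new.sum()
    return alpha, beta

def correct_logits(logits, new_slice, alpha, beta):
    """Eval: bias correction applied only to the newest classes' logits."""
    out = logits.copy()
    out[:, new_slice] = alpha * out[:, new_slice] + beta
    return out
```

Since the cross-entropy is convex in (alpha, beta) (the corrected logits are affine in them), plain gradient descent with a small step is enough here.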

What Avalanche was doing (and I suspect FACIL too) is to keep a separate (permanent) bias correction layer for each experience. Those layers were then used in all phases (computing the distillation loss, the classification loss, the bias correction for new classes, and the test phase).

I noticed these problems when fixing other accidental bugs related to the freezing of the old model and bias layers, which are of course fixed in this version :).

To recap:

  • Fix the distillation loss computation procedure (proper freezing of the old model, bias correction only on classes of current_experience-1, ...)
  • Fix the loss function (see: https://colab.research.google.com/drive/1eLQbfxe6i0V7tu5spMxLSopebuP7Wc9l?usp=sharing)
  • Fix the bias correction procedure (freezing of the current model, post-training bias layer freezing, ...)
  • Fix the BiasLayer implementation (the forward function was bugged: the beta-derived tensor was initialized to 1s instead of 0s)
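
For reference, a minimal NumPy stand-in for the fixed BiasLayer semantics (not the actual Avalanche module): with alpha initialized to 1 and beta to 0, a freshly created layer is the identity, and the correction touches only the classes it is responsible for.

```python
import numpy as np

class BiasLayer:
    """Scales/shifts only the logits of the classes it is responsible for."""

    def __init__(self, clss):
        self.clss = np.asarray(sorted(clss))
        self.alpha = 1.0
        self.beta = 0.0  # the bug was initializing the beta tensor to 1s

    def forward(self, logits, class_ids):
        out = logits.copy()
        mask = np.isin(class_ids, self.clss)
        out[:, mask] = self.alpha * out[:, mask] + self.beta
        return out
```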

@coveralls commented on Dec 19, 2023

Pull Request Test Coverage Report for Build 7652723163

Warning: This coverage report may be inaccurate.

We've detected an issue with your CI configuration that might affect the accuracy of this pull request's coverage report.
To ensure accuracy in future PRs, please see these guidelines.
A quick fix for this PR: rebase it; your next report should be accurate.

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.05%) to 73.887%

Totals:

  • Change from base Build 7126448314: +0.05%
  • Covered Lines: 18884
  • Relevant Lines: 25558

💛 - Coveralls

@AntonioCarta (Collaborator)

Thanks, the code looks ok. I see that we are missing a BiC reproducibility example in continual-learning-baselines. If you have time maybe you can add something there too.

@lrzpellegrini (Collaborator, Author)

I'm trying to reproduce a CIFAR experiment. Alas, I also found a bug in _ParametricSingleBuffer (used to apply the iCaRL herding). I'll try to include a fix for that, too.

@lrzpellegrini lrzpellegrini marked this pull request as ready for review January 24, 2024 10:39
@lrzpellegrini (Collaborator, Author)

The results are far better than those obtained with the previous version.

Once tests are finished, I recommend merging.

@lrzpellegrini (Collaborator, Author)

@AntonioCarta In the code I used _concat_taskaware_classification_datasets. Is there a better way to do this without using a private function?

@AntonioCarta (Collaborator) commented on Jan 24, 2024

> @AntonioCarta In the code I used _concat_taskaware_classification_datasets. Is there a better way to do this without using a private function?

The AvalancheDataset.concat should work without issues. _concat_taskaware_classification_datasets is the old method, kept for compatibility because it also handles targets, transform, and collate.

@lrzpellegrini (Collaborator, Author)

I switched to concat_datasets. Let me know if it makes sense to you.

Apart from the Python 3.11 tests (which, if I'm not mistaken, are still being fixed), everything works fine.

@AntonioCarta (Collaborator)

Everything is ok. Just a minor comment about the test change.

@AntonioCarta AntonioCarta merged commit 40119a0 into ContinualAI:master Jan 25, 2024
11 of 12 checks passed
@lrzpellegrini lrzpellegrini deleted the bic_fix branch January 25, 2024 11:25