A-GEM implementation #448
-
Hello, thanks for all your work. Could you explain why the A-GEM results are so poor? Thanks a lot in advance. Best regards from Zurich
-
Moving this issue to "Discussions". @JonasFrey96, please mark it as "answered" once your question has been resolved! :)
-
Hi @JonasFrey96 😄 In our experiments, GEM works quite well with single-head models, while A-GEM performs poorly. With multi-head models, A-GEM recovers part of the performance lost to catastrophic forgetting. The same holds for Synaptic Intelligence: the original paper uses multi-head models, so the strategy performs poorly with a single head.
If you have managed to make A-GEM work with a single-head model, please let us know, since that could indicate a bug in our implementation. So far, however, we have seen no sign pointing in that direction.
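For readers less familiar with the two strategies: GEM constrains each update with one inequality per previous task (solved as a quadratic program), while A-GEM relaxes this to a single constraint on the gradient of a batch drawn from the whole episodic memory. When that constraint is violated, A-GEM projects the current gradient with a closed-form rule. The sketch below illustrates only that projection step, using plain Python lists as stand-ins for flattened parameter gradients. It is not the project's implementation, and the names `agem_project` and `dot` are hypothetical. In a real training loop, `g` would come from backpropagating the current batch and `g_ref` from backpropagating a batch sampled from memory.

```python
from typing import List

Vector = List[float]  # stand-in for a flattened gradient (hypothetical simplification)


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two flattened gradient vectors."""
    return sum(x * y for x, y in zip(a, b))


def agem_project(g: Vector, g_ref: Vector) -> Vector:
    """Hypothetical helper sketching the A-GEM gradient projection.

    g     -- gradient of the loss on the current-task batch
    g_ref -- gradient of the loss on a batch sampled from episodic memory

    If g does not increase the memory loss (g . g_ref >= 0), it is used as-is.
    Otherwise the component of g along g_ref is removed:
        g_tilde = g - (g . g_ref) / (g_ref . g_ref) * g_ref
    so that the projected gradient satisfies g_tilde . g_ref == 0.
    """
    proj = dot(g, g_ref)
    if proj >= 0:
        # No conflict with the memory gradient (this also covers g_ref == 0).
        return list(g)
    scale = proj / dot(g_ref, g_ref)
    return [gi - scale * ri for gi, ri in zip(g, g_ref)]


if __name__ == "__main__":
    # Conflicting case: g points against g_ref, so it gets projected.
    print(agem_project([1.0, -2.0], [0.0, 1.0]))  # [1.0, 0.0]
    # Non-conflicting case: g is returned unchanged.
    print(agem_project([1.0, 1.0], [0.0, 1.0]))  # [1.0, 1.0]
```

Because A-GEM checks one averaged memory gradient rather than one gradient per past task, each step is much cheaper than GEM's quadratic program. The trade-off is that an update can still increase the loss on an individual past task, as long as the loss on the averaged memory batch does not increase.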