Currently, PatrickStar can train the largest pretrained models with the lowest hardware requirements, i.e. GPU and CPU memory. However, this comes at a price: the design is specialized to plain BERT and GPT structures and to the Adam optimizer. This makes it hard for users to apply PatrickStar to their latest research projects, and hard for us to tweak edge cases for compatibility with popular NLP repos.
Therefore, we have decided to refactor PatrickStar to make it simple and flexible. After all, compared to breaking records, we would rather build a handy tool for the NLP community :)
Here are some of the changes we are making now:
Stop reusing the parameter chunks for gradients: this will increase memory usage, but will let PatrickStar support more network structures.
No longer managing the optimizer state with chunks: this should allow us to use PyTorch native or third-party optimizers directly.
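To make the two changes concrete, here is a toy sketch of the idea (the names are illustrative only, not PatrickStar's real API): gradients get their own buffer instead of overwriting the parameter chunk, so the parameters remain ordinary tensors that a stock PyTorch optimizer can consume directly.

```python
import torch

# A toy "chunk": a flat buffer backing some parameters.
param_chunk = torch.nn.Parameter(torch.randn(8))

# Old design (conceptually): the gradient reused param_chunk's storage,
# clobbering parameter values and tying the layout to specific models.
# New design: gradients live in their own buffer, allocated separately.
grad_chunk = torch.zeros_like(param_chunk)  # extra memory, more flexibility

loss = (param_chunk ** 2).sum()
loss.backward()                      # autograd fills param_chunk.grad
grad_chunk.copy_(param_chunk.grad)   # gradients kept in their own chunk

# Because parameters stay intact and optimizer state is no longer
# chunk-managed, a native optimizer works out of the box.
optimizer = torch.optim.Adam([param_chunk], lr=0.1)
optimizer.step()
```

This is only a sketch of the memory/flexibility trade-off, not the refactored implementation itself.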
We will try to make the new design as performant and efficient as the old one. However, if what you need is the extreme performance described in the paper, please refer to release v0.4.6.