This repository has been archived by the owner on Aug 10, 2023. It is now read-only.
v0.3.3
Pre-release
Improve decoding efficiency by moving the decoding cache from attention inputs to attention hidden states.
Support shared-vocabulary pruning of trained models.
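
The decoding-cache change in the first item can be illustrated with a small sketch. It is not taken from this project's code: it assumes that "attention hiddens" means the already-projected keys (and, by the same logic, values), and every name in it (`project`, `InputCache`, `HiddenCache`, `W_K`) is hypothetical. Under that assumption, caching raw attention inputs forces every past position to be re-projected at each decoding step, while caching the projected hidden states needs only one projection per step.

```python
# Illustrative sketch only: all names here are hypothetical, not the project's API.

def project(weight, vector):
    """Multiply a weight matrix (a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weight]


class InputCache:
    """Caches raw attention inputs; keys are re-projected at every step."""

    def __init__(self, weight):
        self.weight = weight
        self.inputs = []        # cached attention inputs
        self.projections = 0    # number of projections performed so far

    def step(self, x):
        self.inputs.append(x)
        keys = []
        for past in self.inputs:  # recompute keys for all positions
            keys.append(project(self.weight, past))
            self.projections += 1
        return keys


class HiddenCache:
    """Caches projected keys; only the newest position is projected."""

    def __init__(self, weight):
        self.weight = weight
        self.keys = []          # cached projected keys
        self.projections = 0

    def step(self, x):
        self.keys.append(project(self.weight, x))
        self.projections += 1
        return list(self.keys)


W_K = [[1.0, 0.0], [0.5, 2.0]]   # toy key projection matrix
steps = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0], [2.0, 2.0]]

slow = InputCache(W_K)
fast = HiddenCache(W_K)
for x in steps:
    slow_keys = slow.step(x)
    fast_keys = fast.step(x)

# Both caches yield the same keys; only the amount of work differs.
print(slow_keys == fast_keys)
print(slow.projections, fast.projections)
```

With four decoding steps, the input cache performs 1 + 2 + 3 + 4 = 10 projections and the hidden cache performs 4, so the per-sequence cost drops from quadratic to linear in the number of steps while the attention keys stay identical.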