v0.0.16: T5 export and inference, general training fixes

@michaelbenayoun michaelbenayoun released this 19 Dec 13:29
· 187 commits to main since this release

What's Changed

Training

A few fixes related to precompilation and checkpointing. These fixes remove friction when training LLMs on AWS Trainium instances.

  • Skip model saving during precompilation and provide option to skip cache push (#365)
  • Fixes checkpoint saving and consolidation for tensor parallelism (TP) (#378)
  • A torch_xla compatible version of safetensors.torch.save_file is now used in the NeuronTrainer (#329)
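
As a rough illustration of the first item, skipping model saving during precompilation amounts to guarding the save call behind a check for a precompilation run. The sketch below is not the code from #365. It assumes that a precompilation run is signalled by the `NEURON_PARALLEL_COMPILE=1` environment variable, and `maybe_save_checkpoint` is a hypothetical helper introduced only for this example.

```python
import os
from typing import Callable


def is_precompilation() -> bool:
    # Assumption: a precompilation run is signalled by the
    # NEURON_PARALLEL_COMPILE=1 environment variable.
    return os.environ.get("NEURON_PARALLEL_COMPILE") == "1"


def maybe_save_checkpoint(save_fn: Callable[[str], None], output_dir: str) -> bool:
    """Hypothetical helper: call save_fn(output_dir) unless precompiling.

    Returns True if the checkpoint was saved, False if it was skipped.
    """
    if is_precompilation():
        # Weights produced while only compiling graphs are not meaningful,
        # so writing them to disk would waste time and storage.
        return False
    save_fn(output_dir)
    return True
```

With this guard, a trainer can call `maybe_save_checkpoint` unconditionally at each save step and let the environment decide whether anything is written.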

Inference

  • Support for the export and inference of T5 (#267)
  • New documentation for Stable Diffusion XL Turbo (#374)