
Releases: huggingface/optimum-neuron

v0.0.25: SFT Trainer, Llama 3.1-3.2, ControlNet, AWS Neuron SDK 2.20

01 Oct 09:49

What's Changed

Inference

Training

Full Changelog: v0.0.24...v0.0.25

v0.0.24: PEFT training support, ControlNet, InstructPix2Pix, Audio models, TGI benchmarks

12 Aug 11:41

What's Changed

Training

Inference

TGI

Other changes

New Contributors

Full Changelog: v0.0.23...v0.0.24

v0.0.23: Bump transformers and optimum version

31 May 10:09

What's Changed

  • Bump required package versions: transformers==4.41.1, accelerate==0.29.2, optimum==1.20.*
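
When upgrading, the pinned versions can be installed together, e.g. pip install "transformers==4.41.1" "accelerate==0.29.2" "optimum==1.20.*".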

Inference

TGI

  • Fix excessive CPU memory consumption on TGI startup by @dacorvo in #595
  • Avoid clearing all pending requests on early user cancellations by @dacorvo in #609
  • Include tokenizer during export and simplify deployment by @dacorvo in #610

Training

  • Performance improvements, plus fixes for neuron_parallel_compile and gradient checkpointing, by @michaelbenayoun in #602

New Contributors

Full Changelog: v0.0.22...v0.0.23

v0.0.22: Mixtral support, pipeline for sentence transformers, compatibility with Compel

07 May 16:51

What's Changed

Training

Inference

TGI

  • Set up TGI environment values with the ones used to build the model by @oOraph in #529
  • TGI benchmark with llmperf by @dacorvo in #564
  • Improve the TGI env wrapper for Neuron by @oOraph in #589

Caveat

Models traced with inline_weights_to_neff=False currently show higher than expected latency during inference, because their weights are not automatically moved to the Neuron devices. This will be fixed in #584; in the meantime, avoid setting inline_weights_to_neff=False in this release.
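
As a workaround, keep the default inline_weights_to_neff=True when tracing. A minimal sketch, assuming the export keyword arguments below are forwarded as shown (the model id and input shapes are illustrative):

```python
from optimum.neuron import NeuronModelForSequenceClassification

# Keep the default inline_weights_to_neff=True in this release; passing False
# triggers the slow path described above.
model = NeuronModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english",  # illustrative model id
    export=True,
    inline_weights_to_neff=True,  # the default value
    batch_size=1,
    sequence_length=128,
)
model.save_pretrained("distilbert_neuron/")
```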

Other changes

New Contributors

Full Changelog: v0.0.21...v0.0.22

v0.0.21: Expand caching support for inference, GQA training support, TGI improved performance

09 Apr 08:46

What's Changed

Training

  • Add GQA optimization for Tensor Parallel training, to support the case where tp_size > num_key_value_heads, by @michaelbenayoun in #498
  • Mixed-precision training with either torch_xla or torch.autocast by @michaelbenayoun in #523 (see the sketch after this list)
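
A minimal sketch of the torch.autocast flavour on an XLA device (plain torch_xla rather than the NeuronTrainer internals; the toy model and shapes are illustrative):

```python
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
model = torch.nn.Linear(16, 2).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

inputs = torch.randn(8, 16).to(device)
labels = torch.randint(0, 2, (8,)).to(device)

optimizer.zero_grad()
# The forward pass runs in bf16 under autocast; the backward pass is taken
# outside the context, as usual.
with torch.autocast("xla", dtype=torch.bfloat16):
    loss = torch.nn.functional.cross_entropy(model(inputs), labels)
loss.backward()
xm.optimizer_step(optimizer)  # applies the update and marks the XLA step
```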

Inference

  • Add caching support for traced TorchScript models (e.g. encoders, stable diffusion models) by @JingyaHuang in #510
  • Support phi models on the feature-extraction, text-classification and token-classification tasks by @JingyaHuang in #509 (see the example after this list)
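
A hedged example of exporting a phi checkpoint for feature extraction (the model id and input shapes are illustrative):

```python
from transformers import AutoTokenizer
from optimum.neuron import NeuronModelForFeatureExtraction

model_id = "microsoft/phi-1_5"  # illustrative checkpoint
model = NeuronModelForFeatureExtraction.from_pretrained(
    model_id, export=True, batch_size=1, sequence_length=128
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
# Pad to the static sequence length the model was compiled for.
inputs = tokenizer(
    "Hello, Neuron!", return_tensors="pt", padding="max_length", max_length=128
)
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
```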

TGI

Caveat

AWS Neuron SDK 2.18 doesn't support compiling SDXL's UNet with weights / neff separation, so inline_weights_to_neff=True is forced through:

  • Disable weights / neff separation of SDXL's UNET for neuron sdk 2.18 by @JingyaHuang in #554

Other changes

New Contributors

Full Changelog: v0.0.20...v0.0.21

v0.0.20: Multi-node training, SD Lora, sentence transformers clip, TGI improvements

07 Mar 10:14

What's Changed

Training

TGI

  • Optimize continuous batching and improve export (#506)

Inference

Doc

Bug fixes

  • Inference cache: omit irrelevant config parameters in lookup by @dacorvo (#494)
  • Optimize disk usage when fetching model checkpoints by @dacorvo (#505)

Full Changelog: v0.0.19...v0.0.20

v0.0.19: AWS Neuron SDK 2.17.0, training cache system, TGI improved batching

19 Feb 15:48

What's Changed

Training

TGI

  • Support higher batch sizes using transformers-neuronx continuous batching by @dacorvo in #488 (see the sketch after this list)
  • Lift the max-concurrent-request limitation using TGI 1.4.1 by @dacorvo in #488
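
A hedged, client-side illustration of what continuous batching enables: requests issued concurrently to a running NeuronX TGI endpoint are batched dynamically by the server (the endpoint URL and prompts are illustrative):

```python
import asyncio

from huggingface_hub import AsyncInferenceClient

client = AsyncInferenceClient("http://localhost:8080")  # illustrative endpoint

async def main():
    prompts = [f"Tell me fact #{i} about AWS Trainium." for i in range(8)]
    # With continuous batching, these concurrent requests are scheduled
    # together by the server instead of being processed one by one.
    results = await asyncio.gather(
        *(client.text_generation(p, max_new_tokens=64) for p in prompts)
    )
    for result in results:
        print(result)

asyncio.run(main())
```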

AMI

Major bugfixes

Other changes

New Contributors

Full Changelog: v0.0.18...v0.0.19

v0.0.18: AWS Neuron SDK 2.16.1, NeuronX TGI improvements, PP for Training

01 Feb 10:18

What's Changed

AWS SDK

  • Use AWS Neuron SDK 2.16.1 (#449)

Inference

  • Preliminary support for neff/weights decoupling by @JingyaHuang (#402)
  • Allow exporting decoder models using optimum-cli by @dacorvo (#422)
  • Add Neuron X cache registry by @dacorvo (#442)
  • Add StoppingCriteria to generate() of NeuronModelForCausalLM by @dacorvo (#454) (see the sketch after this list)
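
A hedged sketch of the new StoppingCriteria support (the model id, shapes and stop condition are illustrative):

```python
from transformers import AutoTokenizer, StoppingCriteria, StoppingCriteriaList
from optimum.neuron import NeuronModelForCausalLM

class StopOnToken(StoppingCriteria):
    """Stop generation as soon as a given token id is produced."""
    def __init__(self, token_id: int):
        self.token_id = token_id

    def __call__(self, input_ids, scores, **kwargs) -> bool:
        return input_ids[0, -1].item() == self.token_id

model_id = "gpt2"  # illustrative model id
model = NeuronModelForCausalLM.from_pretrained(
    model_id, export=True, batch_size=1, sequence_length=512
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("Hello,", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=32,
    # Stop at the first period (illustrative stop condition).
    stopping_criteria=StoppingCriteriaList([StopOnToken(tokenizer.encode(".")[0])]),
)
print(tokenizer.decode(outputs[0]))
```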

Training

TGI

  • TGI: support vanilla transformer models whose configuration is cached by @dacorvo (#445)

Tutorials and doc improvement

Major bugfixes

  • TGI: correctly identify special tokens during generation by @dacorvo (#438)
  • TGI: do not include the input_text in generated text by @dacorvo (#454)

Other changes

New Contributors

Full Changelog: v0.0.17...v0.0.18

v0.0.17: AWS Neuron SDK 2.16, Mistral, sentence transformers, inference cache

19 Jan 07:19

What's Changed

AWS SDK

  • Use AWS Neuron SDK 2.16 (#398)
  • Use the official serialization API for transformers_neuronx models instead of the beta one by @aws-yishanm (#387, #393)

Inference

  • Improve the support of sentence transformers by @JingyaHuang (#408)
  • Add Neuronx compile cache Hub proxy and use it for LLM decoder models by @dacorvo (#410)
  • Add support for Mistral models by @dacorvo (#411) (see the example after this list)
  • Do not upload Neuron LLM weights when they can be fetched from the hub by @dacorvo (#413)
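
A hedged example of loading a Mistral checkpoint as a Neuron causal LM (shapes and compiler options are illustrative):

```python
from optimum.neuron import NeuronModelForCausalLM

# Export compiles the model for fixed shapes; adjust them to the target instance.
model = NeuronModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    export=True,
    batch_size=1,
    sequence_length=2048,
    num_cores=2,            # number of Neuron cores to shard across
    auto_cast_type="bf16",  # illustrative precision choice
)
model.save_pretrained("mistral_neuron/")
```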

Training

Tutorials and doc improvement

Major bugfixes

Other changes

New Contributors

Full Changelog: v0.0.16...v0.0.17

v0.0.16: T5 export and inference, general training fixes

19 Dec 13:29

What's Changed

Training

A few fixes related to precompilation and checkpointing. These fixes enable training LLMs on AWS Trainium instances without friction.

  • Skip model saving during precompilation and provide option to skip cache push (#365)
  • Fixes checkpoint saving and consolidation for TP (#378)
  • A torch_xla compatible version of safetensors.torch.save_file is now used in the NeuronTrainer (#329)

Inference

  • Support for the export and inference of T5 (#267) (see the sketch after this list)
  • New documentation for Stable Diffusion XL Turbo (#374)
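
A hedged sketch of T5 export and generation (the model id, shapes and beam count are illustrative):

```python
from transformers import AutoTokenizer
from optimum.neuron import NeuronModelForSeq2SeqLM

model_id = "t5-small"  # illustrative checkpoint
# The beam count is fixed at compile time for Neuron seq2seq models.
model = NeuronModelForSeq2SeqLM.from_pretrained(
    model_id, export=True, batch_size=1, sequence_length=64, num_beams=4
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer(
    "translate English to German: Hello, how are you?", return_tensors="pt"
)
outputs = model.generate(**inputs, num_beams=4, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```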