Releases: IntelLabs/fastRAG
v3.1.1
What's Changed
- Relax dependencies, add Streaming Callback by @dnoliver in #71
- OpenVINO Serialization fix by @danielfleischer in #73
New Contributors
Full Changelog: v3.1.0...v3.1.1
v3.1.0
What's Changed
- Update llava.py by @mosheber in #54
- Remove indexing function by @mosheber in #55
- IPEX benchmarking fix by @peteriz in #58
- Removing Handlers with Phi3.5 Support by @mosheber in #59
- replaced list[str] with List[str] by @mosheber in #67
- Adding files for multi modal pipeline by @mosheber in #68
- Lazy initialization of OVModel by @danielfleischer in #66
- update protobuf version to 5.28.3 by @mosheber in #70
- Update one link in nutrition_data.json by @bilgeyucel in #72
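
The "Lazy initialization of OVModel" entry above refers to a common pattern: defer an expensive model load until the model is first needed. The following is a minimal, generic sketch of that pattern, not fastRAG's actual implementation; `LazyModel`, `fake_loader`, and `load_count` are hypothetical names introduced purely for illustration.

```python
from typing import Callable, Optional


class LazyModel:
    """Defers an expensive model load until the model is first used."""

    def __init__(self, loader: Callable[[], object]) -> None:
        self._loader = loader  # hypothetical zero-argument load function
        self._model: Optional[object] = None
        self.load_count = 0  # counts how many times the loader actually ran

    @property
    def model(self) -> object:
        # Load on first access only; later accesses reuse the cached object.
        if self._model is None:
            self._model = self._loader()
            self.load_count += 1
        return self._model


def fake_loader() -> dict:
    # Stand-in for an expensive step such as reading and compiling model weights.
    return {"name": "toy-model"}


lazy = LazyModel(fake_loader)  # nothing is loaded at construction time
```

Constructing `LazyModel` is cheap; the loader runs once, on the first read of `lazy.model`, which keeps pipeline construction fast when a component ends up unused.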
New Contributors
- @bilgeyucel made their first contribution in #72
Full Changelog: v3.0.2...v3.1.0
v3.0.2
v3.0.1
v3.0.0
Compatibility with Haystack v2
- ⚡ All our classes are now compatible with 🤖 Haystack v2, including the example notebooks and yaml pipeline configurations.
- 💻 We based our demos on the Chainlit UI library; examples include RAG chat with multi-modality! 🖼️
❤️ Feel free to report any issue, bug or question!
v2.0.0
fastRAG 2.0: Let's do RAG Efficiently 🔥
fastRAG 2.0 includes new highly-anticipated efficiency-oriented components, an updated chat-like demo experience with multi-modality and improvements to existing components.
The library now builds on Intel optimizations from Intel Extension for PyTorch (IPEX), 🤗 Optimum Intel and 🤗 Optimum-Habana to run as efficiently as possible on Intel® Xeon® Processors and Intel® Gaudi® AI accelerators.
🚀 Intel Habana Gaudi 1 and Gaudi 2 Support
fastRAG is the first RAG framework to support Habana Gaudi accelerators for running LLMs efficiently; more details here.
🌀 Running LLMs with the ONNX Runtime and LlamaCPP Backends
Added support for running quantized LLMs with the ONNX Runtime and LlamaCPP backends, for higher efficiency and speed across your RAG pipelines.
⚡ CPU Efficient Embedders
We added support for running bi-encoder embedders and cross-encoder rankers as efficiently as possible on Intel CPUs using Intel-optimized software.
We integrated the optimized embedders into the following two components:
- QuantizedBiEncoderRanker: bi-encoder ranker; encodes the documents provided in the input and re-orders them according to query similarity.
- QuantizedBiEncoderRetriever: bi-encoder retriever; encodes documents into vectors for a given vector store engine.
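
To illustrate what a bi-encoder ranker does conceptually, here is a minimal sketch of re-ordering documents by similarity to a query. It does not use fastRAG's API: the embeddings are hand-written toy vectors standing in for encoder output, cosine similarity is an assumed scoring choice, and `cosine_similarity` and `rerank` are hypothetical helper names.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerank(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    docs: Sequence[str],
) -> List[Tuple[str, float]]:
    """Score each document against the query and sort by descending score."""
    scored = [
        (doc, cosine_similarity(query_vec, vec))
        for doc, vec in zip(docs, doc_vecs)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy embeddings (assumption): a real bi-encoder would produce these vectors.
docs = ["doc about cats", "doc about cars", "doc about kittens"]
doc_vecs = [[0.9, 0.1], [0.0, 1.0], [0.8, 0.3]]
query_vec = [1.0, 0.0]

ranked = rerank(query_vec, doc_vecs, docs)
```

Because the query and the documents are encoded independently, document vectors can be computed once and reused across queries, which is what makes the bi-encoder approach a good fit for both ranking and vector-store retrieval.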
⏳ REPLUG
An implementation of REPLUG, a technique for ensemble prompting over retrieved documents: each document is processed in parallel and the resulting next-token predictions are combined for better results.
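
The combination step can be sketched as a weighted average of per-document next-token distributions. This is a simplified illustration under stated assumptions, not fastRAG's implementation: the weights are taken as a softmax over retrieval scores, the probabilities are toy values rather than LLM output, and `softmax` and `ensemble_next_token` are hypothetical helper names.

```python
import math
from typing import Dict, List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_next_token(
    retrieval_scores: List[float],
    per_doc_probs: List[Dict[str, float]],
) -> Dict[str, float]:
    """Mix per-document next-token distributions, weighted by retrieval score."""
    weights = softmax(retrieval_scores)
    combined: Dict[str, float] = {}
    for weight, probs in zip(weights, per_doc_probs):
        for token, prob in probs.items():
            combined[token] = combined.get(token, 0.0) + weight * prob
    return combined


# Toy inputs (assumption): one distribution per retrieved document, each as if
# produced by prompting the LLM with that document prepended to the query.
retrieval_scores = [2.0, 1.0]
per_doc_probs = [
    {"Paris": 0.7, "London": 0.3},
    {"Paris": 0.4, "London": 0.6},
]

combined = ensemble_next_token(retrieval_scores, per_doc_probs)
best_token = max(combined, key=combined.get)
```

Since each document is scored in its own prompt, the per-document forward passes are independent and can run in parallel; only the final mixing step needs all of them.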
🏆 New Demos
We updated our demos (and demo page) to include two new demos that showcase a chat-like experience and multi-modal RAG.
🐠 Enhancements
- Added documentation for most models and components, containing examples and notebooks ready to run!
- Support for the Fusion-in-Decoder (FiD) model using a dedicated invocation layer.
- Various bug fixes and compatibility updates supporting the Haystack framework.
Full Changelog: v1.3.0...v2.0
v1.3.0
v1.2.1
v1.2.0: New: Retrieval Augmented Generation with LLM
Retrieval Augmented Generation with LLM Demo (#16)
- Added a new RAG + prompt + LLM UI (demo).
- Added an example config and notebook.
- Updated main README with "updates" sub-section.
- Updated `run_demo.py` to include all the options to run a demo (UI, UI + service, UI + <user_defined_service>).