Skip to content

OpenVINO™ Model Server 2024.2

Compare
Choose a tag to compare
@dkalinowski dkalinowski released this 17 Jun 13:37
· 2 commits to releases/2024/2 since this release
31ad50a

The major new functionality in 2024.2 is a preview feature of OpenAI compatible API for text generation along with state of the art techniques like continuous batching and paged attention for improving efficiency of generative workloads.

Changes and improvements

  • Updated OpenVINO Runtime backend to 2024.2

  • OpenVINO Model Server can be now used for text generation use cases using OpenAI compatible API

  • Added support for continuous batching and PagedAttention algorithms for text generation with fast and efficient in high concurrency load especially on Intel Xeon processors. Learn more about it.

  • Added LLM text generation OpenAI API demo.

  • Added notebook showcasing RAG algorithm with online scope changes delegated to the model server. Link

  • Enabled python 3.12 for python clients, samples and demos.

  • Updated RedHat UBI base image to 8.10

Breaking changes

No breaking changes.

You can use an OpenVINO Model Server public Docker images based on Ubuntu via the following command:
docker pull openvino/model_server:2024.2 - CPU device support with the image based on Ubuntu 22.04
docker pull openvino/model_server:2024.2-gpu - GPU and CPU device support with the image based on Ubuntu 22.04
or use provided binary packages.
The prebuilt image is available also on RedHat Ecosystem Catalog