Release OpenVINO™ Model Server 2024.2 · openvinotoolkit/model_server

The major new functionality in 2024.2 is a preview feature of OpenAI compatible API for text generation along with state of the art techniques like continuous batching and paged attention for improving efficiency of generative workloads.

Changes and improvements

Updated OpenVINO Runtime backend to 2024.2
OpenVINO Model Server can be now used for text generation use cases using OpenAI compatible API
Added support for continuous batching and PagedAttention algorithms for text generation with fast and efficient in high concurrency load especially on Intel Xeon processors. Learn more about it.
Added LLM text generation OpenAI API demo.
Added notebook showcasing RAG algorithm with online scope changes delegated to the model server. Link
Enabled python 3.12 for python clients, samples and demos.
Updated RedHat UBI base image to 8.10

Breaking changes

No breaking changes.

You can use an OpenVINO Model Server public Docker images based on Ubuntu via the following command:
docker pull openvino/model_server:2024.2 - CPU device support with the image based on Ubuntu 22.04
docker pull openvino/model_server:2024.2-gpu - GPU and CPU device support with the image based on Ubuntu 22.04
or use provided binary packages.
The prebuilt image is available also on RedHat Ecosystem Catalog

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

OpenVINO™ Model Server 2024.2

Changes and improvements

Breaking changes