TorchServe v0.11.1 Release Notes

This is the release of TorchServe v0.11.1.

Highlights Include

Security Updates
- Token Authorization: TorchServe now enforces token authorization by default, so every call to an HTTP/S or gRPC API must supply the correct token. This protects a running TorchServe instance from unauthorized API calls. With the feature enabled, TorchServe creates a key file containing the tokens to use for API calls. Users can disable the feature so that API calls no longer require a token. For more details, refer to the token authorization documentation: https://github.com/pytorch/serve/blob/master/docs/token_authorization_api.md
- Model API Control: By default, TorchServe no longer allows models to be registered or deleted through HTTP/S or gRPC API calls once it is running. This guards against two cases: a user uploading malicious code to the model server in the form of a model, and a user deleting a model that is in use. Model API control can be enabled to allow users to register and delete models with the TorchServe model load and delete APIs. For more details, refer to the model API control documentation: https://github.com/pytorch/serve/blob/master/docs/model_api_control.md
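
As a rough illustration of what token authorization means for a client, the sketch below reads a token from key-file contents and attaches it to a request. The key-file layout (top-level `management`, `inference` and `API` entries, each with a `key` field) and the `Authorization: Bearer <token>` header are assumptions based on the linked token authorization documentation; the sample tokens, the model name `my_model` and the helper names are hypothetical. Check the documentation for the exact format in your version.

```python
import json
import urllib.request


def load_token(key_file_text: str, api: str) -> str:
    """Return the token for one API family, e.g. "management" or "inference".

    Assumes the key file is JSON with one entry per API family,
    each holding the token under "key".
    """
    keys = json.loads(key_file_text)
    return keys[api]["key"]


def authorized_request(url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request that carries the token as a Bearer header."""
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


# Hypothetical key file contents; real tokens are generated by TorchServe.
SAMPLE_KEY_FILE = """
{
  "management": {"key": "mgmt-token", "expiration time": "2024-01-01T00:00:00Z"},
  "inference": {"key": "infer-token", "expiration time": "2024-01-01T00:00:00Z"},
  "API": {"key": "api-token"}
}
"""

token = load_token(SAMPLE_KEY_FILE, "inference")
request = authorized_request("http://localhost:8080/predictions/my_model", token)
print(request.get_header("Authorization"))
```

In practice the key file text would be read from the file TorchServe writes at startup, and the request would be sent with `urllib.request.urlopen(request)`. The sketch stops before sending so that it runs without a live server.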
PyTorch 2.x updates
- Standardized torch.compile configuration
- Added examples for TensorRT and HPU backends
GenAI updates
- Support continuous batching in sequence batch streaming
- Asynchronous backend worker communication for continuous batching
- No-code LLM deployment
Support for Intel GPUs

Security Updates

Adding model-control-mode by @udaij12 in #3165
Enable Token Authorization by default by @udaij12 in #3163
Updating night CIs to account for model control and token auth by @udaij12 in #3188
Adding token auth and model api to workflow and https by @udaij12 in #3234
Enable token authorization and model control for gRPC by @namannandan in #3238

PyTorch 2.x Updates

torch compile config standardization update by @agunapal in #3166
Token Authorization fixes by @udaij12 in #3192
Changing mar file for Bert torch compile by @udaij12 in #3175
Fixing torch compile benchmark by @udaij12 in #3179
Add support for hpu_backend and Resnet50 compile example by @wozna in #3182
Update image_classifier/densenet-161 to include torch.compile by @lzcemma in #3200
TensorRT example with torch.compile by @agunapal in #3203
Update documentation for vgg16 to use torch.compile by @ijkilchenko in #3211
BERT with torch.compile by @agunapal in #3201
T5 Translation with torch.compile & TensorRT backend by @agunapal in #3223
Adjust Resnet50 hpu example by @wozna in #3219

GenAI

Support continuous batching in sequence batch streaming case by @lxning in #3160
GPT-FAST-MIXTRAL-MOE integration by @alex-kharlamov in #3151
clean a jobGroup immediately when it finished by @lxning in #3222
Asynchronous worker communication and vllm integration by @mreso in #3146
Add single command LLM deployment by @mreso in #3209
TensorRT-LLM Engine integration by @agunapal in #3228
Adds torch.compile documentation to alexnet example readme by @crmdias in #3227

Support for Intel GPUs

Torchserve support for Intel GPUs by @krish-navulla in #3132
Torchserve Metrics support for Intel GPUs enabled by @krish-navulla in #3141

Documentation

Update supported TS version in security documentation by @namannandan in #3144
Update performance documentation by @agunapal in #3159
model archiver example to multi-line by @GeeCastro in #3155
fix broken llm deployment link by @msaroufim in #3214
Security documentation update by @udaij12 in #3183

Improvements and Bug Fixing

workaround for compile example failure by @agunapal in #3190
Fix Inf2 benchmark by @namannandan in #3177
Make a copy of the torchtext utils to remove dependency by @agunapal in #3076
Pinning setuptools version by @udaij12 in #3152
Fixing Regression test CI GPU and CPU by @udaij12 in #3147
Fixing docker CI by @udaij12 in #3194
Replace pkg_resources.packaging by @udaij12 in #3187
Kserve ci fix by @udaij12 in #3196
Benchmark numpy fix by @udaij12 in #3197
Add workflow dispatch trigger to nightly builds by @agunapal in #3250
Bug fix for kserve build issue and fixing nightly tests by @agunapal in #3251
Remove vllm dependency to not bloat docker image size by @agunapal in #3245
Kserve fix ray & setuptools dependency issue by @udaij12 in #3205
Updating examples for security tags by @udaij12 in #3224
Fix/llm launcher disable token by @mreso in #3230
Example update by @udaij12 in #3231
Updating docker cuda and github branch by @udaij12 in #3233
Reduce severity of xpu-smi logging by @namannandan in #3239
Upgrade kserve dependencies by @agunapal in #3246
Fix/vllm dependency by @mreso in #3249
Copy remote branch entrypoint to compile and production image stages by @lanxih in #3213
Fix Condition Checking for Intel GPUs Enabling by @Kanya-Mo in #3220

New Contributors

@alex-kharlamov made their first contribution in #3151
@lzcemma made their first contribution in #3200
@wozna made their first contribution in #3182
@krish-navulla made their first contribution in #3132
@ijkilchenko made their first contribution in #3211
@lanxih made their first contribution in #3213
@Kanya-Mo made their first contribution in #3220
@crmdias made their first contribution in #3227

Platform Support

Ubuntu 20.04, macOS 10.14+, Windows 10 Pro, Windows Server 2019, Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04). TorchServe requires Python >= 3.8 and JDK 17.

GPU Support Matrix

TorchServe version	PyTorch version	Python	Stable CUDA	Experimental CUDA
0.11.1	2.3.0	>=3.8, <=3.11	CUDA 11.8, CUDNN 8.7.0.84	CUDA 12.1, CUDNN 8.9.2.26
0.11.0	2.3.0	>=3.8, <=3.11	CUDA 11.8, CUDNN 8.7.0.84	CUDA 12.1, CUDNN 8.9.2.26
0.10.0	2.2.1	>=3.8, <=3.11	CUDA 11.8, CUDNN 8.7.0.84	CUDA 12.1, CUDNN 8.9.2.26
0.9.0	2.1	>=3.8, <=3.11	CUDA 11.8, CUDNN 8.7.0.84	CUDA 12.1, CUDNN 8.9.2.26
0.8.0	2.0	>=3.8, <=3.11	CUDA 11.7, CUDNN 8.5.0.96	CUDA 11.8, CUDNN 8.7.0.84
0.7.0	1.13	>=3.7, <=3.10	CUDA 11.6, CUDNN 8.3.2.44	CUDA 11.7, CUDNN 8.5.0.96

Inferentia2 Support Matrix

TorchServe version	PyTorch version	Python	Neuron SDK
0.11.1	2.1	>=3.8, <=3.11	2.18.2+
0.11.0	2.1	>=3.8, <=3.11	2.18.2+
0.10.0	1.13	>=3.8, <=3.11	2.16+
0.9.0	1.13	>=3.8, <=3.11	2.13.2+
