TorchServe v0.11.1 Release Notes
This is the release of TorchServe v0.11.1.
Highlights Include
- Security Updates
- Token Authorization: TorchServe now enforces token authorization by default, so every HTTP/S or gRPC API call must supply the correct token. This guards against unauthorized users calling the APIs of a running TorchServe instance. With the feature enabled, TorchServe creates a key file containing the tokens to use for API calls. Users can disable the feature so that API calls no longer require a token. For more details, refer to the token authorization documentation: https://github.com/pytorch/serve/blob/master/docs/token_authorization_api.md
- Model API Control: By default, TorchServe disables registering and deleting models through HTTP/S or gRPC API calls once it is running. This guards against a user uploading malicious code to the model server in the form of a model, or deleting a model that is in use. Model API control can be enabled to allow users to register and delete models through the TorchServe model load and delete APIs. For more details, refer to the model API control documentation: https://github.com/pytorch/serve/blob/master/docs/model_api_control.md
- PyTorch 2.x updates
- Standardized torch.compile configuration
- Added examples for tensorrt & hpu backends
- GenAI updates
- Support continuous batching in sequence batch streaming
- Asynchronous backend worker communication for continuous batching
- No code LLM deployment
- Support for Intel GPUs
Security Updates
- Adding model-control-mode by @udaij12 in #3165
- Enable Token Authorization by default by @udaij12 in #3163
- Updating night CIs to account for model control and token auth by @udaij12 in #3188
- Adding token auth and model api to workflow and https by @udaij12 in #3234
- Enable token authorization and model control for gRPC by @namannandan in #3238
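With token authorization on by default, clients need to read a token from the generated key file and attach it to each call. The following is a minimal sketch, not TorchServe code. It assumes the key file is JSON with one entry per API type (for example `management` and `inference`), each holding a `key` field, and that the token is sent as a `Bearer` value in the `Authorization` header; the helper names `load_token` and `authorized_request` are hypothetical. Check the token authorization documentation for the exact file layout.

```python
import json
from pathlib import Path
from urllib.request import Request


def load_token(key_file, kind):
    """Return the token of the given kind from a TorchServe key file.

    Assumed layout: {"management": {"key": "..."}, "inference": {"key": "..."}}.
    """
    data = json.loads(Path(key_file).read_text())
    return data[kind]["key"]


def authorized_request(url, token, data=None):
    """Build a request that carries the token as a Bearer authorization header."""
    request = Request(url, data=data)
    request.add_header("Authorization", f"Bearer {token}")
    return request
```

A management call could then be issued with `urllib.request.urlopen(authorized_request("http://localhost:8081/models", load_token("key_file.json", "management")))`, where the file name `key_file.json` and the location of the file are assumptions about the local setup.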
PyTorch 2.x Updates
- torch compile config standardization update by @agunapal in #3166
- Token Authorization fixes by @udaij12 in #3192
- Changing mar file for Bert torch compile by @udaij12 in #3175
- Fixing torch compile benchmark by @udaij12 in #3179
- Add support for hpu_backend and Resnet50 compile example by @wozna in #3182
- Update image_classifier/densenet-161 to include torch.compile by @lzcemma in #3200
- TensorRT example with torch.compile by @agunapal in #3203
- Update documentation for vgg16 to use torch.compile by @ijkilchenko in #3211
- BERT with torch.compile by @agunapal in #3201
- T5 Translation with torch.compile & TensorRT backend by @agunapal in #3223
- Adjust Resnet50 hpu example by @wozna in #3219
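The standardized configuration groups the `torch.compile` options in one place in the model config. The sketch below shows how a handler might turn such a config into keyword arguments. It is an illustration, not TorchServe's implementation: the key names `pt2`, `compile`, and `enable` are assumed from the torch.compile examples, and `compile_kwargs` is a hypothetical helper.

```python
def compile_kwargs(model_config):
    """Extract torch.compile keyword arguments from a parsed model config.

    Returns None when the compile section is absent or disabled.
    """
    compile_cfg = model_config.get("pt2", {}).get("compile", {})
    if not compile_cfg.get("enable", False):
        return None
    # Everything except the on/off switch is forwarded to torch.compile.
    return {k: v for k, v in compile_cfg.items() if k != "enable"}


# Parsed form of an assumed model config with compilation enabled.
config = {
    "pt2": {
        "compile": {"enable": True, "backend": "inductor", "mode": "reduce-overhead"}
    }
}
kwargs = compile_kwargs(config)
# A handler could then call: model = torch.compile(model, **kwargs)
```

Keeping the switch separate from the forwarded options lets the same config turn compilation off without deleting the backend settings.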
GenAI
- Support continuous batching in sequence batch streaming case by @lxning in #3160
- GPT-FAST-MIXTRAL-MOE integration by @alex-kharlamov in #3151
- clean a jobGroup immediately when it finished by @lxning in #3222
- Asynchronous worker communication and vllm integration by @mreso in #3146
- Add single command LLM deployment by @mreso in #3209
- TensorRT-LLM Engine integration by @agunapal in #3228
- Adds torch.compile documentation to alexnet example readme by @crmdias in #3227
Support for Intel GPUs
- Torchserve support for Intel GPUs by @krish-navulla in #3132
- Torchserve Metrics support for Intel GPUs enabled by @krish-navulla in #3141
Documentation
- Update supported TS version in security documentation by @namannandan in #3144
- Update performance documentation by @agunapal in #3159
- model archiver example to multi-line by @GeeCastro in #3155
- fix broken llm deployment link by @msaroufim in #3214
- Security documentation update by @udaij12 in #3183
Improvements and Bug Fixing
- workaround for compile example failure by @agunapal in #3190
- Fix Inf2 benchmark by @namannandan in #3177
- Make a copy of the torchtext utils to remove dependency by @agunapal in #3076
- Pinning setuptools version by @udaij12 in #3152
- Fixing Regression test CI GPU and CPU by @udaij12 in #3147
- Fixing docker CI by @udaij12 in #3194
- Replace pkg_resources.packaging by @udaij12 in #3187
- Kserve ci fix by @udaij12 in #3196
- Benchmark numpy fix by @udaij12 in #3197
- Add workflow dispatch trigger to nightly builds by @agunapal in #3250
- Bug fix for kserve build issue and fixing nightly tests by @agunapal in #3251
- Remove vllm dependency to not bloat docker image size by @agunapal in #3245
- Kserve fix ray & setuptools dependency issue by @udaij12 in #3205
- Updating examples for security tags by @udaij12 in #3224
- Fix/llm launcher disable token by @mreso in #3230
- Example update by @udaij12 in #3231
- Updating docker cuda and github branch by @udaij12 in #3233
- Reduce severity of xpu-smi logging by @namannandan in #3239
- Upgrade kserve dependencies by @agunapal in #3246
- Fix/vllm dependency by @mreso in #3249
- Copy remote branch entrypoint to compile and production image stages by @lanxih in #3213
- Fix Condition Checking for Intel GPUs Enabling by @Kanya-Mo in #3220
New Contributors
- @alex-kharlamov made their first contribution in #3151
- @lzcemma made their first contribution in #3200
- @wozna made their first contribution in #3182
- @krish-navulla made their first contribution in #3132
- @ijkilchenko made their first contribution in #3211
- @lanxih made their first contribution in #3213
- @Kanya-Mo made their first contribution in #3220
- @crmdias made their first contribution in #3227
Platform Support
Ubuntu 20.04, MacOS 10.14+, Windows 10 Pro, Windows Server 2019, Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04). TorchServe requires Python >= 3.8 and JDK 17.
GPU Support Matrix
TorchServe version | PyTorch version | Python | Stable CUDA | Experimental CUDA |
---|---|---|---|---|
0.11.1 | 2.3.0 | >=3.8, <=3.11 | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 |
0.11.0 | 2.3.0 | >=3.8, <=3.11 | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 |
0.10.0 | 2.2.1 | >=3.8, <=3.11 | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 |
0.9.0 | 2.1 | >=3.8, <=3.11 | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 |
0.8.0 | 2.0 | >=3.8, <=3.11 | CUDA 11.7, CUDNN 8.5.0.96 | CUDA 11.8, CUDNN 8.7.0.84 |
0.7.0 | 1.13 | >=3.7, <=3.10 | CUDA 11.6, CUDNN 8.3.2.44 | CUDA 11.7, CUDNN 8.5.0.96 |
Inferentia2 Support Matrix
TorchServe version | PyTorch version | Python | Neuron SDK |
---|---|---|---|
0.11.1 | 2.1 | >=3.8, <=3.11 | 2.18.2+ |
0.11.0 | 2.1 | >=3.8, <=3.11 | 2.18.2+ |
0.10.0 | 1.13 | >=3.8, <=3.11 | 2.16+ |
0.9.0 | 1.13 | >=3.8, <=3.11 | 2.13.2+ |