
Update torchvision to 0.20.1 #42

Open
wants to merge 1 commit into master

Conversation

pyup-bot (Collaborator)

This PR updates torchvision from 0.5.0 to 0.20.1.

Changelog

0.20.0

Highlights

Encoding / Decoding images

Torchvision is further extending its encoding/decoding capabilities. For this version, **we added a WEBP decoder**, and a **batch JPEG decoder on CUDA GPUs**, which can lead to 10X speed-ups over CPU decoding.

We have also improved the UX of our decoding APIs to be more user-friendly. The main entry point is now `torchvision.io.decode_image()`, and it can take as input either a path (as str or `pathlib.Path`), or a tensor containing the raw encoded data.

[Read more on the docs!](https://pytorch.org/vision/stable/io.html)
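
A minimal sketch of the unified entry point (the file name is hypothetical):

```python
from torchvision.io import decode_image, read_file

# Decode directly from a path (str or pathlib.Path)...
img = decode_image("sample.webp")

# ...or from a uint8 tensor holding the raw encoded bytes.
raw = read_file("sample.webp")
img = decode_image(raw)
print(img.shape, img.dtype)  # CHW layout, torch.uint8
```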

We also added support for HEIC and AVIF decoding, but these are currently only available when building from source. We are working on making those available directly in the upcoming releases. Stay tuned!


Detailed changes


Bug Fixes

[datasets] Update URL of SBDataset train_noval (8551)
[datasets] EuroSAT: fix SSL certificate issues (8563)
[io] Check average_rate availability in video reader (8548)


New Features

[io] Add batch JPEG GPU decoding (`decode_jpeg()`) (8496)
[io] Add WEBP image decoder: `decode_image()`, `decode_webp()` (8527, 8612, 8610)
[io] Add HEIC and AVIF decoders, only available when building from source (8597, 8596, 8647, 8613, 8621)


Improvements

[io] Add support for decoding 16-bit PNGs (8524)
[io] Allow decoding functions to accept the mode parameter as a string (8627)
[io] Allow `decode_image()` to support paths (8624)
[io] Automatically send video to CPU in io.write_video (8537)
[datasets] Better progress bar for file downloading (8556)
[datasets] Add Path type annotation for ImageFolder (8526)
[ops] Register nms and roi_align Autocast policy for PyTorch Intel GPU backend (8541)
[transforms] Use Sequence for parameters type checking in `transforms.RandomErase` (8615)
[transforms] Support `v2.functional.gaussian_blur` backprop (8486)
[transforms] Expose `transforms.v2` utils for writing custom transforms. (8670)
[utils] Fix f-string in color error message (8639)
[packaging] Revamped and improved debuggability of setup.py build (8535, 8581, 8581, 8582, 8590, 8533, 8528, 8659)
[Documentation] Various documentation improvements (8605, 8611, 8506, 8507, 8539, 8512, 8513, 8583, 8633)
[tests] Various tests improvements (8580, 8553, 8523, 8617, 8518, 8579, 8558, 8617, 8641)
[code quality] Various code quality improvements (8552, 8555, 8516, 8526, 8602, 8615, 8639, 8532)
[ci] Various CI improvements (8562, 8644, 8592, 8542, 8594, 8530, 8656)


Contributors

We're grateful for our community, which helps us improve torchvision by submitting issues and PRs, and providing feedback and suggestions. The following persons have contributed patches for this release:


Adam J. Stewart, AJS Payne, Andreas Floros, Andrey Talman, Bhavay Malhotra, Brizar, deekay42, Ehsan, Feng Yuan, Joseph Macaranas, Martin, Masahiro Hiramori, Nicolas Hug, Nikita Shulga, Sergii Dymchenko, Stefan Baumann, venkatram-dev, Wang, Chuanqi

0.19.1

This is a patch release, which is compatible with [PyTorch 2.4.1](https://github.com/pytorch/pytorch/releases/tag/v2.4.1). There are no new features added.

0.19.0

Highlights

Encoding / Decoding images

Torchvision is extending its encoding/decoding capabilities. For this version, **we added a GIF decoder** which is available as `torchvision.io.decode_gif(raw_tensor)`, `torchvision.io.decode_image(raw_tensor)`, and `torchvision.io.read_image(path_to_image)`.

We also **added support for jpeg GPU encoding** in `torchvision.io.encode_jpeg()`. This is 10X faster than the existing CPU jpeg encoder.

[Read more on the docs!](https://pytorch.org/vision/stable/io.html)
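
A hedged sketch of the two highlights above (file names are hypothetical, and GPU encoding needs a CUDA device):

```python
import torch
from torchvision.io import decode_gif, encode_jpeg, read_file, read_image

# GIF decoding, either from raw bytes or straight from a path.
raw = read_file("animation.gif")
frames = decode_gif(raw)            # animated GIFs decode to a batch of frames
img = read_image("cat.jpg")

# jpeg GPU encoding: pass a tensor that lives on a CUDA device.
if torch.cuda.is_available():
    encoded = encode_jpeg(img.cuda())
```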

Stay tuned for more improvements coming in the next versions. We plan to improve jpeg GPU decoding, and add more image decoders (webp in particular).


Resizing according to the longest edge of an image

It is now possible to resize images by setting `torchvision.transforms.v2.Resize(max_size=N)`: this will resize the longest edge of the image exactly to `max_size`, making sure the image dimensions don't exceed this value. [Read more on the docs!](https://pytorch.org/vision/stable/generated/torchvision.transforms.v2.Resize.html#torchvision.transforms.v2.Resize)
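
For instance, a sketch of resizing by the longest edge (sizes here are arbitrary):

```python
import torch
from torchvision.transforms import v2

img = torch.randint(0, 256, (3, 300, 500), dtype=torch.uint8)

# size=None with max_size resizes the longest edge (500) exactly to 224,
# scaling the other edge to preserve the aspect ratio.
out = v2.Resize(size=None, max_size=224)(img)
print(out.shape)  # roughly torch.Size([3, 134, 224])
```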

Detailed changes

Bug Fixes

[datasets] `SBDataset`: Only download noval file when image_set='train_noval' (8475)
[datasets] Update the download url in class `EMNIST` (8350)
[io] Fix compilation error when there is no `libjpeg` (8342)
[reference scripts] Fix use of `cutmix_alpha` in classification training references (8448)
[utils] Allow `K=1` in `draw_keypoints` (8439)


New Features

[io] Add decoder for GIF images (`decode_gif()`, `decode_image()`, `read_image()`) (8406, 8419)
[transforms] Add `GaussianNoise` transform (8381)

Improvements

[transforms] Allow v2 `Resize` to resize longer edge exactly to `max_size` (8459)
[transforms] Add `min_area` parameter to `SanitizeBoundingBox` (7735)
[transforms] Make `adjust_hue()` work with `numpy 2.0` (8463)
[transforms] Enable one-hot-encoded labels in `MixUp` and `CutMix` (8427)
[transforms] Create kernel on-device for `transforms.functional.gaussian_blur` (8426)
[io] Adding GPU acceleration to `encode_jpeg` (10X faster than CPU encoder) (8391)
[io] `read_video`: accept `BytesIO` objects on `pyav` backend (8442)
[io] Add compatibility with FFMPEG 7.0 (8408)
[datasets] Add extra to install `gdown` (8430)
[datasets] Support encoded `RLE` format for `COCO` segmentations (8387)
[datasets] Added binary cat vs dog classification target type to Oxford pet dataset (8388)
[datasets] Return labels for `FER2013` if possible (8452)
[ops] Force use of `torch.compile` on deterministic `roi_align` implementation (8436)
[utils] add float support to `utils.draw_bounding_boxes()` (8328)
[feature_extraction] Add concrete_args to feature extraction tracing. (8393)
[Docs] Various documentation improvements (8429, 8467, 8469, 8332, 8262, 8341, 8392, 8386, 8385, 8411).
[Tests] Various testing improvements (8454, 8418, 8480, 8455)
[Code quality] Various code quality improvements (8404, 8402, 8345, 8335, 8481, 8334, 8384, 8451, 8470, 8413, 8414, 8416, 8412)



Contributors

We're grateful for our community, which helps us improve torchvision by submitting issues and PRs, and providing feedback and suggestions. The following persons have contributed patches for this release:

Adam J. Stewart, ahmadsharif1, AJS Payne, Andrew Lingg, Andrey Talman, Anner, Antoine Broyelle, cdzhan, deekay42, drhead, Edward Z. Yang, Emin Orhan, Fangjun Kuang, G, haarisr, Huy Do, Jack Newsom, JavaZero, Mahdi Lamb, Mantas, Nicolas Hug, nihui, Richard Barnes, Richard Zou, Richie Bendall, Robert-André Mauchin, Ross Wightman, Siddarth Ijju, vfdev

0.18.1

This is a patch release, which is compatible with [PyTorch 2.3.1](https://github.com/pytorch/pytorch/releases/tag/v2.3.1). There are no new features added.

0.18.0

BC-Breaking changes

[datasets] [`gdown`](https://github.com/wkentaro/gdown) is now a required dependency for downloading datasets that are on Google Drive. This change was actually introduced in `0.17.1` (repeated here for visibility) (#8237)
[datasets] The `StanfordCars` dataset isn’t available for download anymore. Please follow [these instructions](https://github.com/pytorch/vision/issues/7545#issuecomment-1631441616)  to manually download it (8309, 8324)
[transforms] `to_grayscale` and corresponding transform now always return 3 channels when `num_output_channels=3` (8229)

Bug Fixes 
[datasets] Fix download URL of `EMNIST` dataset (8350)
[datasets] Fix root path expansion in `Kitti` dataset  (8164)
[models] Fix default momentum value of `BatchNorm2d` in `MaxViT` from 0.99 to 0.01 (8312)
[reference scripts] Fix CutMix and MixUp arguments (8287)
[MPS, build] Link essential libraries in cmake (8230)
[build] Fix build with ffmpeg 6.0 (8096)

New Features

[transforms] New GrayscaleToRgb transform (8247)
[transforms] New JPEG augmentation transform (8316)

Improvements

[datasets, io] Added `pathlib.Path` support to datasets and io utilities. (8196, 8200, 8314, 8321)
[datasets] Added `allow_empty` parameter to `ImageFolder` and related utils to support empty classes during image discovery (8311)
[datasets] Raise proper error in `CocoDetection` when a slice is passed (8227)
[io] Added support for EXIF orientation in JPEG and PNG decoders  (8303, 8279, 8342, 8302)
[io] Avoiding unnecessary copies on `io.VideoReader` with `pyav` backend (8173)
[transforms] Allow `SanitizeBoundingBoxes` to sanitize more than just labels (8319)
[transforms] Add `sanitize_bounding_boxes` kernel/functional (8308)
[transforms] Make `perspective` more numerically stable (8249)
[transforms] Allow 2D numpy arrays as inputs for `to_image` (8256)
[transforms] Speed-up `rotate` for 90, 180, 270 degrees (8295)
[transforms] Enabled torch compile on `affine` transform (8218)
[transforms] Avoid some graph breaks in transforms (8171)
[utils] Add float support to `draw_keypoints` (8276)
[utils] Add `visibility` parameter to `draw_keypoints` (8225)
[utils] Add float support to `draw_segmentation_masks` (8150)
[utils] Better show overlap section of masks in  `draw_segmentation_masks` (8213)
[Docs] Various documentation improvements (8341, 8332, 8198, 8318, 8202, 8246, 8208, 8231, 8300, 8197)
[code quality] Various code quality improvements (8273, 8335, 8234, 8345, 8334, 8119, 8251, 8329, 8217, 8180, 8105, 8280, 8161, 8313)


Contributors

We're grateful for our community, which helps us improve torchvision by submitting issues and PRs, and providing feedback and suggestions. The following persons have contributed patches for this release:


Adam Dangoor, Ahmad Sharif, ahmadsharif1, Andrey Talman, Anner, anthony-cabacungan, Arun Sathiya, Brizar, cdzhan, Danylo Baibak, Huy Do, Ivan Magazinnik, JavaZero, Johan Edstedt, Li-Huai (Allan) Lin, Mantas, Mark Harfouche, Mithra, Nicolas Hug, nihui, Philip Meier, RazaProdigy, Richard Barnes, Riza Velioglu, sam-watts, Santiago Castro, Sergii Dymchenko, Syed Raza, talcs, Thien Tran, TilmannR, Tobias Fischer, vfdev, Zhu Lin Ch'ng, Zoltán Böszörményi.

0.17.2

This is a patch release, which is compatible with [PyTorch 2.2.2](https://github.com/pytorch/pytorch/releases/tag/v2.2.2). There are no new features added.

0.17.1

This is a patch release, which is compatible with [PyTorch 2.2.1](https://github.com/pytorch/pytorch/releases/tag/v2.2.1).

Bug Fixes

* Add `gdown` dependency to support downloading datasets from Google Drive (https://github.com/pytorch/vision/pull/8237)
* Fix silent correctness with `convert_bounding_box_format`  when passing string parameters (https://github.com/pytorch/vision/issues/8258)

0.17.0

Highlights

The V2 transforms are now stable!

The `torchvision.transforms.v2` namespace was still in BETA stage until now. It is now stable! Whether you’re new to Torchvision transforms, or you’re already experienced with them, we encourage you to start with [Getting started with transforms v2](https://pytorch.org/vision/stable/auto_examples/transforms/plot_transforms_getting_started.html#sphx-glr-auto-examples-transforms-plot-transforms-getting-started-py) in order to learn more about what can be done with the new v2 transforms.

Browse our [main docs](https://pytorch.org/vision/stable/transforms.html#) for general information and performance tips. The available transforms and functionals are listed in the [API reference](https://pytorch.org/vision/stable/transforms.html#v2-api-ref). Additional information and tutorials can also be found in our [example gallery](https://pytorch.org/vision/stable/auto_examples/index.html#gallery), e.g. [Transforms v2: End-to-end object detection/segmentation example](https://pytorch.org/vision/stable/auto_examples/transforms/plot_transforms_e2e.html#sphx-glr-auto-examples-transforms-plot-transforms-e2e-py) or [How to write your own v2 transforms](https://pytorch.org/vision/stable/auto_examples/transforms/plot_custom_transforms.html#sphx-glr-auto-examples-transforms-plot-custom-transforms-py).
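
As a quick, hedged illustration of what the stable v2 API looks like for detection-style inputs:

```python
import torch
from torchvision import tv_tensors
from torchvision.transforms import v2

transforms = v2.Compose([
    v2.RandomHorizontalFlip(p=0.5),
    v2.ToDtype(torch.float32, scale=True),
])

img = tv_tensors.Image(torch.randint(0, 256, (3, 224, 224), dtype=torch.uint8))
boxes = tv_tensors.BoundingBoxes(
    torch.tensor([[10, 10, 80, 80]]), format="XYXY", canvas_size=(224, 224)
)
# Images and bounding boxes are transformed consistently in one call.
img, boxes = transforms(img, boxes)
```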

Towards `torch.compile()` support

We are progressively adding support for `torch.compile()` to torchvision interfaces, reducing graph breaks and allowing dynamic shapes.

The torchvision ops (`nms`, `[ps_]roi_align`, `[ps_]roi_pool` and `deform_conv_2d`) are now compatible with `torch.compile` and dynamic shapes.

On the transforms side, the majority of [low-level kernels](https://github.com/pytorch/vision/blob/main/torchvision/transforms/v2/functional/__init__.py) (like `resize_image()` or `crop_image()`) should compile properly without graph breaks and with dynamic shapes. We are still addressing the remaining edge-cases, moving up towards full functional support and classes, and you should expect more progress on that front with the next release.
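
A minimal sketch of what this enables, assuming a PyTorch version with `torch.compile` available:

```python
import torch
import torchvision

@torch.compile(dynamic=True)
def keep_best_boxes(boxes, scores):
    # nms is among the ops that now compile with dynamic shapes.
    keep = torchvision.ops.nms(boxes, scores, iou_threshold=0.5)
    return boxes[keep]

boxes = torch.tensor([[0.0, 0.0, 10.0, 10.0], [1.0, 1.0, 11.0, 11.0]])
scores = torch.tensor([0.9, 0.8])
print(keep_best_boxes(boxes, scores))
```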

---------


Detailed Changes

Breaking changes / Finalizing deprecations

- [transforms] We changed the default of the  `antialias` parameter from None to True, in all transforms that perform resizing. This change of default has been communicated in previous versions, and should drastically reduce the amount of bugs/surprises as it aligns the tensor backend with the PIL backend. Simply put: **from now on, antialias is always applied when resizing (with bilinear or bicubic modes), whether you're using tensors or PIL images**. This change only affects the tensor backend, as PIL always applies antialias anyway. See the sketch after this list. (7949)
- [transforms] We removed the `torchvision.transforms.functional_tensor.py` and `torchvision.transforms.functional_pil.py` modules, as these had been deprecated for a while. Use the public functionals from `torchvision.transforms.v2.functional` instead. (7953)
- [video] Remove deprecated path parameter to VideoReader and made src mandatory (8125)
- [transforms] `to_pil_image` now provides the same output for equivalent numpy arrays and tensor inputs (8097)
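
A small sketch of the new `antialias` default in practice (shapes are arbitrary):

```python
import torch
from torchvision.transforms import v2

img = torch.randint(0, 256, (3, 256, 256), dtype=torch.uint8)

# antialias now defaults to True for bilinear/bicubic resizing on tensors,
# matching the PIL backend; opting out remains possible.
out = v2.Resize(size=128)(img)
out_no_aa = v2.Resize(size=128, antialias=False)(img)
```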


Bug Fixes

[datasets] Fix root path expansion in datasets.Kitti (8165)
[transforms] allow sequence fill for v2 AA scripted (7919)
[reference scripts] Fix quantized references (8073)
[reference scripts] Fix IoUs reported in segmentation references (7916)


New Features

[datasets] add Imagenette dataset (8139)

Improvements

[transforms] The v2 transforms are now officially stable and out of BETA stage (8111)
[ops] The ops (`[ps_]roi_align`, `[ps_]roi_pool`, `deform_conv_2d`) are now compatible with `torch.compile` and dynamic shapes (8061, 8049, 8062, 8063, 7942, 7944)
[models] Allow custom `atrous_rates` for deeplabv3_mobilenet_v3_large (8019)
[transforms] allow float fill for integer images in F.pad (7950)
[transforms] allow len 1 sequences for fill with PIL (7928)
[transforms] allow size to be generic Sequence in Resize (7999)
[transforms] Making root parameter optional for Vision Dataset (8124)
[transforms] Added support for tv tensors in torch compile for func ops (8110)
[transforms] Reduced number of graphs for compiled resize (8108)
[misc] Various fixes for S390x support (8149)
[Docs] Various Documentation enhancements (8007, 8014, 7940, 7989, 7993, 8114, 8117, 8121, 7978, 8002, 7957, 7907, 8000, 7963)
[Tests] Various test enhancements (8032, 7927, 7933, 7934, 7935, 7939, 7946, 7943, 7968, 7967, 8033, 7975, 7954, 8001, 7962, 8003, 8011, 8012, 8013, 8023, 7973, 7970, 7976, 8037, 8052, 7982, 8145, 8148, 8144, 8058, 8057, 7961, 8132, 8133, 8160)
[Code Quality] Various code quality improvements (8077, 8070, 8004, 8113)

Contributors

We're grateful for our community, which helps us improve torchvision by submitting issues and PRs, and providing feedback and suggestions. The following persons have contributed patches for this release:

Aleksei Nikiforov, Alex Wei, Andrey Talman, Chunyuan WU, CptCaptain, Edward Z. Yang, Gu Wang, Haochen Yu, Huy Do, Jeff Daily, Josh Levy-Kramer, moto, Nicolas Hug, NVS Abhilash, Omkar Salpekar, Philip Meier, Sergii Dymchenko, Siddharth Singh, Thiago Crepaldi, Thomas Fritz, TilmannR, vfdev-5, Zeeshan Khan Suri.

0.16.2

This is a patch release, which is compatible with [PyTorch 2.1.2](https://github.com/pytorch/pytorch/releases/tag/v2.1.2). There are no new features added.

0.16.1

This is a minor release that only contains bug-fixes

Bug Fixes

* [models] Fix download of efficientnet weights (8036)
* [transforms] Fix v2 transforms in spawn multi-processing context (8067)

0.16.0

Highlights

[BETA] Transforms and augmentations

![sphx_glr_plot_transforms_getting_started_004](https://github.com/pytorch/vision/assets/1190450/fc42eabe-d3fe-40c1-8365-2177e389521b)


Major speedups

The new transforms in `torchvision.transforms.v2` support image classification, segmentation, detection, and video tasks. They are now [10%-40% faster](https://github.com/pytorch/vision/issues/7497#issuecomment-1557478635) than before! This is mostly achieved thanks to 2X-4X improvements made to `v2.Resize()`, which now supports native `uint8` tensors for bilinear and bicubic modes. Output results are also now closer to PIL's! Check out our [performance recommendations](https://pytorch.org/vision/stable/transforms.html#performance-considerations) to learn more.

Additionally, `torchvision` now ships with `libjpeg-turbo` instead of `libjpeg`, which should significantly speed-up the jpeg decoding utilities ([`read_image`](https://pytorch.org/vision/stable/generated/torchvision.io.read_image.html#torchvision.io.read_image), [`decode_jpeg`](https://pytorch.org/vision/stable/generated/torchvision.io.read_image.html#torchvision.io.decode_jpeg)), and avoid compatibility issues with PIL.

CutMix and MixUp

Long-awaited support for the `CutMix` and `MixUp` augmentations is now here! Check [our tutorial](https://pytorch.org/vision/stable/auto_examples/transforms/plot_cutmix_mixup.html#sphx-glr-auto-examples-transforms-plot-cutmix-mixup-py) to learn how to use them.
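
A short sketch in the spirit of that tutorial (batch contents are random placeholders):

```python
import torch
from torchvision.transforms import v2

NUM_CLASSES = 10
cutmix_or_mixup = v2.RandomChoice(
    [v2.CutMix(num_classes=NUM_CLASSES), v2.MixUp(num_classes=NUM_CLASSES)]
)

images = torch.rand(4, 3, 224, 224)
labels = torch.randint(0, NUM_CLASSES, (4,))
images, labels = cutmix_or_mixup(images, labels)
print(labels.shape)  # (4, NUM_CLASSES): hard labels become mixed soft labels
```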

Towards stable V2 transforms

In the [previous release 0.15](https://github.com/pytorch/vision/releases/tag/v0.15.1) we BETA-released a new set of transforms in `torchvision.transforms.v2` with native support for tasks like segmentation, detection, or videos. We have now stabilized the design decisions of these transforms and made further improvements in terms of speedups, usability, new transforms support, etc.

We're keeping the `torchvision.transforms.v2` and `torchvision.tv_tensors` namespaces as BETA until 0.17 out of precaution, but we do not expect disruptive API changes in the future.

Whether you’re new to Torchvision transforms, or you’re already experienced with them, we encourage you to start with [Getting started with transforms v2](https://pytorch.org/vision/stable/auto_examples/transforms/plot_transforms_getting_started.html#sphx-glr-auto-examples-transforms-plot-transforms-getting-started-py) in order to learn more about what can be done with the new v2 transforms.

Browse our [main docs](https://pytorch.org/vision/stable/transforms.html#) for general information and performance tips. The available transforms and functionals are listed in the [API reference](https://pytorch.org/vision/stable/transforms.html#v2-api-ref). Additional information and tutorials can also be found in our [example gallery](https://pytorch.org/vision/stable/auto_examples/index.html#gallery), e.g. [Transforms v2: End-to-end object detection/segmentation example](https://pytorch.org/vision/stable/auto_examples/transforms/plot_transforms_e2e.html#sphx-glr-auto-examples-transforms-plot-transforms-e2e-py) or [How to write your own v2 transforms](https://pytorch.org/vision/stable/auto_examples/transforms/plot_custom_transforms.html#sphx-glr-auto-examples-transforms-plot-custom-transforms-py).

[BETA] MPS support

The `nms` and roi-align kernels (`roi_align`, `roi_pool`, `ps_roi_align`, `ps_roi_pool`) now support MPS. Thanks to [Li-Huai (Allan) Lin](https://github.com/qqaatw) for this contribution!
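
For example, a hedged sketch of running `nms` on an MPS device (falling back to CPU where MPS is unavailable):

```python
import torch
from torchvision.ops import nms

device = "mps" if torch.backends.mps.is_available() else "cpu"
boxes = torch.tensor([[0.0, 0.0, 10.0, 10.0], [1.0, 1.0, 11.0, 11.0]], device=device)
scores = torch.tensor([0.9, 0.8], device=device)
keep = nms(boxes, scores, iou_threshold=0.5)
print(keep)
```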


---------


Detailed Changes

Deprecations / Breaking changes

All changes below happened in the `transforms.v2` and `datapoints` namespaces, which were BETA and protected with a warning. **We do not expect other disruptive changes to these APIs moving forward!**

[transforms.v2] `to_grayscale()` is not deprecated anymore (7707)
[transforms.v2] Renaming: `torchvision.datapoints.Datapoint` ->  `torchvision.tv_tensors.TVTensor` (7904, 7894)
[transforms.v2] Renaming: `BoundingBox` -> `BoundingBoxes` (7778)
[transforms.v2] Renaming: `BoundingBoxes.spatial_size` -> `BoundingBoxes.canvas_size` (7734)
[transforms.v2] All public methods on `TVTensor` classes (previously: `Datapoint` classes) were removed
[transforms.v2] `transforms.v2.utils` is now private. (7863)
[transforms.v2] Remove `wrap_like` class method and add `tv_tensors.wrap()` function (7832)

New Features

[transforms.v2] Add support for `MixUp` and `CutMix` (7731, 7784)
[transforms.v2] Add `PermuteChannels` transform (7624)
[transforms.v2] Add `ToPureTensor` transform (7823)
[ops] Add MPS kernels for `nms` and `roi` ops (7643)

Improvements

[io] Added support for CMYK images in `decode_jpeg` (7741)
[io] Package torchvision with  `libjpeg-turbo` instead of `libjpeg` (7672, 7840)
[models] Downloaded weights are now sha256-validated (7219)
[transforms.v2] Massive `Resize` speed-up by adding native `uint8` support for bilinear and bicubic modes (7557, 7668)
[transforms.v2] Enforce pickleability for v2 transforms and wrapped datasets (7860)
[transforms.v2] Allow catch-all "others" key in `fill` dicts. (7779)
[transforms.v2] Allow passthrough for `Resize` (7521)
[transforms.v2] Add `scale` option to `ToDtype`. Remove `ConvertDtype`. (7759, 7862)
[transforms.v2] Improve UX for `Compose` (7758)
[transforms.v2] Allow users to choose whether to return `TVTensor` subclasses or pure `Tensor` (7825)
[transforms.v2] Remove import-time warning for v2 namespaces (7853, 7897)
[transforms.v2] Speedup `hsv2rgb` (7754)
[models] Add `filter` parameters to `list_models()` (7718)
[models] Assert `RAFT` input resolution is 128 x 128 or higher (7339)
[ops] Replaced `gpuAtomicAdd` by `fastAtomicAdd` (7596)
[utils] Add GPU support for `draw_segmentation_masks` (7684)
[ops] Add deterministic, pure-Python `roi_align` implementation (7587)
[tv_tensors] Make `TVTensors` deepcopyable (7701)
[datasets] Only return small set of targets by default from dataset wrapper (7488)
[references] Added support for v2 transforms and `tensors` / `tv_tensors` backends (7732, 7511, 7869, 7665, 7629, 7743, 7724, 7742)
[doc] A lot of documentation improvements (7503, 7843, 7845, 7836, 7830, 7826, 7484, 7795, 7480, 7772, 7847, 7695, 7655, 7906, 7889, 7883, 7881, 7867, 7755, 7870, 7849, 7854, 7858, 7621, 7857, 7864, 7487, 7859, 7877, 7536, 7886, 7679, 7793, 7514, 7789, 7688, 7576, 7600, 7580, 7567, 7459, 7516, 7851, 7730, 7565, 7777)

Bug Fixes

[datasets] Fix `split=None` in `MovingMNIST` (7449)
[io] Fix heap buffer overflow in `decode_png` (7691)
[io] Fix blurry screen in video decoder (7552)
[models] Fix weight download URLs for some models (7898)
[models] Fix `ShuffleNet` ONNX export (7686)
[models] Fix detection models with pytorch 2.0 (7592, 7448)
[ops] Fix segfault in `DeformConv2d` when `mask` is None (7632)
[transforms.v2] Stricter `SanitizeBoundingBoxes` `labels_getter` heuristic (7880)
[transforms.v2] Make sure `RandomPhotometricDistort` transforms all images the same (7442)
[transforms.v2] Fix `v2.Lambda`’s transformed types (7566)
[transforms.v2] Don't call `round()` on float images for `Resize` (7669)
[transforms.v2] Let `SanitizeBoundingBoxes` preserve output type (7446)
[transforms.v2] Fixed int type support for sigma in `GaussianBlur` (7887)
[transforms.v2] Fixed issue with jitted `AutoAugment` transforms (7839)
[transforms] Fix `Resize` pass-through logic (7519)
[utils] Fix color in `draw_segmentation_masks` (7520)



Others

[tests] Various test improvements / fixes (7693, 7816, 7477, 7783, 7716, 7355, 7879, 7874, 7882, 7447, 7856, 7892, 7902, 7884, 7562, 7713, 7708, 7712, 7703, 7641, 7855, 7842, 7717, 7905, 7553, 7678, 7908, 7812, 7646, 7841, 7768, 7828, 7820, 7550, 7546, 7833, 7583, 7810, 7625, 7651)
[CI] Various CI improvements (7485, 7417, 7526, 7834, 7622, 7611, 7872, 7628, 7499, 7616, 7475, 7639, 7498, 7467, 7466, 7441, 7524, 7648, 7640, 7551, 7479, 7634, 7645, 7578, 7572, 7571, 7591, 7470, 7574, 7569, 7435, 7635, 7590, 7589, 7582, 7656, 7900, 7815, 7555, 7694, 7558, 7533, 7547, 7505, 7502, 7540, 7573)
[Code Quality]  Various code quality improvements (7559, 7673, 7677, 7771, 7770, 7710, 7709, 7687, 7454, 7464, 7527, 7462, 7662, 7593, 7797, 7805, 7786, 7831, 7829, 7846, 7806, 7814, 7606, 7613, 7608, 7597, 7792, 7781, 7685, 7702, 7500, 7804, 7747, 7835, 7726, 7796)

Contributors

We're grateful for our community, which helps us improve torchvision by submitting issues and PRs, and providing feedback and suggestions. The following persons have contributed patches for this release:
Adam J. Stewart, Aditya Oke, Andrey Talman, Camilo De La Torre, Christoph Reich, Danylo Baibak, David Chiu, David Garcia, Dennis M. Pöpperl, Dhuige, Duc Mguyen, Edward Z. Yang, Eric Sauser, Fansure Grin, Huy Do, Illia Vysochyn, Johannes, Kai Wana, Kobrin Eli, kurtamohler, Li-Huai (Allan) Lin, Liron Ilouz, Masahiro Hiramori, Mateusz Guzek, Max Chuprov, Minh-Long Luu (刘明龙), Minliang Lin, mpearce25, Nicolas Granger, Nicolas Hug, Nikita Shulga, Omkar Salpekar, Paul Mulders, Philip Meier, ptrblck, puhuk, Radek Bartoň, Richard Barnes, Riza Velioglu, Sahil Goyal, Shu, Sim Sun, SvenDS9, Tommaso Bianconcini, Vadim Zubov, vfdev-5

0.15.2

This is a minor release, which is compatible with [PyTorch 2.0.1](https://github.com/pytorch/pytorch/releases/tag/v2.0.1) and contains some minor bug fixes.

Highlights

Bug Fixes
- Move parameter sampling of v2.RandomPhotometricDistort into _get_params https://github.com/pytorch/vision/pull/7442
- Fix split parameter for MovingMNIST https://github.com/pytorch/vision/pull/7449
- Prevent unwrapping in v2.SanitizeBoundingBoxes https://github.com/pytorch/vision/pull/7446

0.15.1

Highlights
[[BETA](https://pytorch.org/blog/pytorch-feature-classification-changes/#beta)]  New transforms API
TorchVision is extending its Transforms API! Here is what’s new:
- You can use them not only for Image Classification but also for Object Detection, Instance & Semantic Segmentation and Video Classification.
- You can use new functional transforms for transforming Videos, Bounding Boxes and Segmentation Masks.

The API is **completely backward compatible** with the previous one, and remains the same to assist the migration and adoption. We are now releasing this new API as Beta in the `torchvision.transforms.v2` namespace, and we would love to get early feedback from you to improve its functionality. Please [reach out to us](https://github.com/pytorch/vision/issues/6753) if you have any questions or suggestions.

```python
import torchvision.transforms.v2 as transforms

# Exactly the same interface as V1:
trans = transforms.Compose([
    transforms.ColorJitter(contrast=0.5),
    transforms.RandomRotation(30),
    transforms.CenterCrop(480),
])
imgs, bboxes, masks, labels = trans(imgs, bboxes, masks, labels)
```


You can read more about these new transforms in our [docs](https://pytorch.org/vision/main/transforms.html), and you can also check out our examples:

- [End-to-end object detection example](https://pytorch.org/vision/stable/auto_examples/plot_transforms_v2_e2e.html#sphx-glr-auto-examples-plot-transforms-v2-e2e-py)
- [Getting started with transforms v2](https://pytorch.org/vision/stable/auto_examples/plot_transforms_v2.html#sphx-glr-auto-examples-plot-transforms-v2-py)

Note that this API is still Beta. **While we do not expect major breaking changes, some APIs may still change according to user feedback**. Please submit any feedback you may have in  https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes.

[[BETA](https://pytorch.org/blog/pytorch-feature-classification-changes/#beta)]  New Video Swin Transformer

We added a Video SwinTransformer model based on the [Video Swin Transformer](https://arxiv.org/abs/2106.13230) paper.

```python
import torch
from torchvision.models.video import swin3d_t

video = torch.rand(1, 3, 32, 800, 600)

# or swin3d_b, swin3d_s
model = swin3d_t(weights="DEFAULT")
model.eval()
with torch.inference_mode():
    prediction = model(video)
print(prediction)
```


The model has the following accuracies on the Kinetics-400 dataset:

| Model | Acc1 | Acc5 |
| --- | ----------- | --------- |

0.14.1

This is a minor release, which is compatible with [PyTorch 1.13.1](https://github.com/pytorch/pytorch/releases/tag/v1.13.1). There are no new features added.

0.14.0

**Highlights**


[[BETA](https://pytorch.org/blog/pytorch-feature-classification-changes/#beta)] New Model Registration API

Following up on the [multi-weight support API](https://pytorch.org/blog/introducing-torchvision-new-multi-weight-support-api/) that was released on the previous version, we have added a new [model registration API](https://pytorch.org/blog/easily-list-and-initialize-models-with-new-apis-in-torchvision/) to help users retrieve models and weights. There are now 4 new methods under the `torchvision.models` module: `get_model`, `get_model_weights`, `get_weight`, and `list_models`. Here are examples of how we can use them:


```python
import torchvision
from torchvision.models import get_model, get_model_weights, list_models

max_params = 5000000

tiny_models = []
for model_name in list_models(module=torchvision.models):
    weights_enum = get_model_weights(model_name)
    if len([w for w in weights_enum if w.meta["num_params"] <= max_params]) > 0:
        tiny_models.append(model_name)

print(tiny_models)
# ['mnasnet0_5', 'mnasnet0_75', 'mnasnet1_0', 'mobilenet_v2', ...]

model = get_model(tiny_models[0], weights="DEFAULT")
print(sum(x.numel() for x in model.state_dict().values()))
# 2239188
```



As of now, this API is still [beta](https://pytorch.org/blog/pytorch-feature-classification-changes/#beta) and there might be changes in the future in order to improve its usability based on your [feedback](https://github.com/pytorch/vision/issues/6365).


New Architecture and Model Variants


Classification Models

We’ve added the Swin Transformer V2 architecture along with pre-trained weights for its tiny/small/base variants. In addition, we have added support for the MaxViT transformer. Here is an example of how to use the models:


```python
import torch
from torchvision.models import *

image = torch.rand(1, 3, 224, 224)
model = swin_v2_t(weights="DEFAULT").eval()
model = maxvit_t(weights="DEFAULT").eval()
prediction = model(image)
```



Here is the table showing the accuracy of the models tested on the ImageNet-1K dataset.


| Model | Acc1 | Acc1 change over V1 | Acc5 | Acc5 change over V1 |
| --- | --- | --- | --- | --- |
| swin_v2_t | | | | |

0.13.1

This minor release bumps the pinned PyTorch version to v1.12.1 and contains some minor bug fixes.

Highlights

Bug Fixes
- Small Patch SwinTransformer for FX compatibility https://github.com/pytorch/vision/pull/6252
- Indicate strings can be used to specify weights parameter  https://github.com/pytorch/vision/pull/6314
- Fix d/c IoU for different batch sizes https://github.com/pytorch/vision/pull/6338

0.13.0

Highlights

Models

Multi-weight support API

0.12.0

Highlights

New Models

Four new model families have been released in the latest version along with pre-trained weights for their variants: FCOS, RAFT, Vision Transformer (ViT) and ConvNeXt.

Object Detection

[FCOS](https://arxiv.org/pdf/1904.01355.pdf) is a popular, fully convolutional, anchor-free model for object detection. In this release we include a community-contributed model implementation as well as pre-trained weights. The model was trained on  COCO train2017 and can be used as follows:

```python
import torch
from torchvision import models

x = [torch.rand(3, 224, 224)]
fcos = models.detection.fcos_resnet50_fpn(pretrained=True).eval()
predictions = fcos(x)
```


The box AP of the pre-trained model on COCO val2017 is 39.2 (see [4961](https://github.com/pytorch/vision/pull/4961) for more details).

We would like to thank [Hu Ye](https://github.com/xiaohu2015) and [Zhiqiang Wang](https://github.com/zhiqwang) for contributing to the model implementation and initial training. This was the first community-contributed model in a long while, and given its success, we decided to use the learnings from this process and create new [model contribution guidelines](https://github.com/pytorch/vision/blob/main/CONTRIBUTING_MODELS.md).

Optical Flow support and RAFT model

Torchvision now supports optical flow! Optical flow models try to predict movement in a video: given two consecutive frames, the model predicts where each pixel of the first frame ends up in the second frame. Check out our [new tutorial on Optical Flow](https://pytorch.org/vision/0.12/auto_examples/plot_optical_flow.html#sphx-glr-auto-examples-plot-optical-flow-py)!

We implemented a torchscript-compatible [RAFT](https://arxiv.org/abs/2003.12039) model with pre-trained weights (both normal and “small” versions), and added support for [training and evaluating](https://github.com/pytorch/vision/tree/main/references/optical_flow) optical flow models. Our training scripts support distributed training across processes and nodes, leading to much faster training time than the original implementation. We also added 5 new [optical flow datasets](https://pytorch.org/vision/0.12/datasets.html#optical-flow): Flying Chairs, Flying Things, Sintel, Kitti, and HD1K.
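
A sketch of running the model, assuming the 0.12-era `pretrained=` API (input sizes must be divisible by 8):

```python
import torch
from torchvision.models.optical_flow import raft_small

model = raft_small(pretrained=True).eval()
frame1 = torch.rand(1, 3, 224, 224)  # two consecutive frames
frame2 = torch.rand(1, 3, 224, 224)
with torch.no_grad():
    flows = model(frame1, frame2)    # list of iterative flow estimates
flow = flows[-1]                     # (1, 2, 224, 224): per-pixel (dx, dy)
```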

![raft](https://github.com/pytorch/vision/releases/download/v0.12.0/raft.png "image_tooltip")

Image Classification

[Vision Transformer](https://arxiv.org/abs/2010.11929) (ViT) and [ConvNeXt](https://arxiv.org/abs/2201.03545) are two popular architectures which can be used as image classifiers or as backbones for downstream vision tasks. In this release we include 8 pre-trained weights for their classification variants. The models were trained on ImageNet and can be used as follows:

```python
import torch
from torchvision import models

x = torch.rand(1, 3, 224, 224)
vit = models.vit_b_16(pretrained=True).eval()
convnext = models.convnext_tiny(pretrained=True).eval()
predictions1 = vit(x)
predictions2 = convnext(x)
```


The accuracies of the pre-trained models obtained on ImageNet val are seen below:

|Model	|Acc1	|Acc5	|
|---	|---	|---	|
|vit_b_16|81.072|95.318|
|vit_b_32|75.912|92.466|
|vit_l_16|79.662|94.638|
|vit_l_32|76.972|93.07|
|convnext_tiny|82.52|96.146|
|convnext_small|83.616|96.65|
|convnext_base|84.062|96.87|
|convnext_large|84.414|96.976|

The above models have been trained using an adjusted version of our new [training recipe](https://pytorch.org/blog/how-to-train-state-of-the-art-models-using-torchvision-latest-primitives/), which allows us to offer models with accuracies significantly higher than the ones reported in the original papers.

GPU Video Decoding

In this release, we add support for GPU video decoding in the video reading API. To use hardware-accelerated decoding, we just need to pass a cuda device to the video reading API as shown below:

```python
import torchvision

reader = torchvision.io.VideoReader(file_name, device='cuda:0')
for frame in reader:
    print(frame)
```


We also support seeking to any frame or a keyframe in the video before reading, as shown below:

```python
reader.seek(seek_time)
```


New Datasets

We have implemented 14 new [classification datasets](https://pytorch.org/vision/0.12/datasets.html#image-classification): CLEVR, GTSRB, FER2013, SUN397, Country211, Flowers102, fvgc_aircraft, OxfordIIITPet, DTD, Food 101, Rendered SST2, Stanford cars, PCAM, and EuroSAT.

As part of our work on Optical Flow support (see above for more details), we also added 5 new [optical flow datasets](https://pytorch.org/vision/0.12/datasets.html#optical-flow): Flying Chairs, Flying Things, Sintel, Kitti, and HD1K.
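
All of these follow the usual dataset interface; a hedged sketch with one of the new datasets (the `root` path is hypothetical):

```python
from torchvision import datasets

ds = datasets.EuroSAT(root="data", download=True)
img, label = ds[0]  # PIL image and integer class label
print(len(ds), label)
```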

Documentation

New documentation layout

We have updated our documentation pages to be more compact and easier to browse. Each function / class is now documented in a separate page, clearing up some space in the per-module pages, and easing the discovery of the proposed APIs. Compare e.g. our [previous docs](https://pytorch.org/vision/0.11/transforms.html) vs the [new ones](https://pytorch.org/vision/0.12/transforms.html). Please let us know if you have any feedback!

Model contribution guidelines

New [model contribution guidelines](https://github.com/pytorch/vision/blob/main/CONTRIBUTING_MODELS.md) have been published  following the success of the [FCOS](https://www.google.com/url?q=https://github.com/pytorch/vision/pull/4961&sa=D&source=docs&ust=1645630832795238&usg=AOvVaw3IyBB6Eso_MWxSS_R0QZMk) model which was contributed by the community. These guidelines aim to be an overview of the model contribution process for anyone who would like to suggest, implement and train a new model.

Upcoming Prototype APIs

We are currently working on a prototype API which adds Multi-weight support on all of our model builder methods. This will enable us to offer multiple pre-trained weights, associated with their meta-data and inference transforms. The API is still under review and thus was not included in the release but you can read more about it on our [blogpost](https://pytorch.org/blog/introducing-torchvision-new-multi-weight-support-api/) and provide your feedback on the dedicated [Github issue](https://github.com/pytorch/vision/issues/5088).

Changes in our deprecation policy

Up until now, torchvision would almost never remove deprecated APIs. In order to be more [aligned and consistent with pytorch core](https://github.com/pytorch/pytorch/wiki/PyTorch's-Python-Frontend-Backward-and-Forward-Compatibility-Policy), we are updating our deprecation policy. We are now following a 2-release deprecation cycle: deprecated APIs will raise a warning for 2 versions, and will be removed after that. To reflect these changes and to smooth the transition, we have decided to:

* Remove all APIs that had been deprecated before or on v0.8, released 1.5 years ago.
* Update the removal timeline of all other deprecated APIs to v0.14, to reflect the new 2-cycle policy starting now in v0.12.

Backward-incompatible changes

[models.quantization] Removed the Quantized shufflenet_v2_x1_5 and shufflenet_v2_x2_0 model builders which had no associated weights, rendering them useless. Additionally we added pre-trained weights for the shufflenet_v2_x0_5 quantized variant. ([4854](https://github.com/pytorch/vision/pull/4854))
[ops] Change to stable sort in nms implementations - this change can lead to different behavior in rare cases therefore it has been flagged as backwards-incompatible  ([4767](https://github.com/pytorch/vision/pull/4767))
[transforms] Changed the center and the parametrization of shear X/Y in Auto Augment transforms to align with the original papers ([5285](https://github.com/pytorch/vision/pull/5285)) ([#5384](https://github.com/pytorch/vision/pull/5384))

Deprecations

Note: in order to be more aligned with pytorch core, we are updating our deprecation policy. Please read more above in the “Highlights” section.

[ops] The `ops.poolers.MultiScaleRoIAlign` public methods `setup_scales`, `convert_to_roi_format`, and `infer_scale` have been deprecated and will be removed in 0.14 (4951) (4810)

New Features

[datasets] New optical flow datasets added: FlyingChairs,  Kitti, Sintel, FlyingThings3D, and HD1K (4860) (4845) (4858) (4890) (5004) (4889) (4888) (4870)
[datasets] New classification datasets support for FLAVA: CLEVR, GTSRB, FER2013, SUN397, Country211, Flowers102, fvgc_aircraft, OxfordIIITPet, DTD, Food 101, Rendered SST2, Stanford cars, PCAM, and EuroSAT (5120) (5130) (5117) (5132) (5138) (5177) (5178) (5116) (5115) (5119) (5220) (5166) (5203) (5114) (5164) (5280)
[models] Add VisionTransformer model (5173) (5210) (5172) (5085) (5226) (5025) (5086) (5159)
[models] Add ConvNeXt model (5330) (5253)
[models] Add RAFT models and support for optical flow model training (5022) (5070) (5174) (5381) (5078) (5076) (5081) (5079) (5026) (5027) (5082) (5060)  (4868)  (4657) (4732)
[models] Add FCOS model (4961) (5267)
[utils] Add utility to convert optical flow to an image (5134) (5308)
[utils] Add utility to draw keypoints (4216)
[video] Add video GPU decoder (5019) (5191) (5215) (5256) (4474) (3179) (4878) (5328) (5327) (5183) (4947) (5192)

Improvements

[datasets] Migrate mnist dataset from np.frombuffer (4598)
[io, tests] Switch from np.frombuffer to torch.frombuffer (4578)
[models] Update ResNet-50 accuracy with Repeated Augmentation (5201)
[models] Add regnet_y_128gf factory function, and several regnet model weights (5176) (4530)
[models] Adding min_size to classification and video models  (5223)
[models] Remove in-place mutation in DefaultBoxGenerator (5279)
[models] Added Dropout parameter to Models Constructors (4580)
[models] Allow to use custom norm_layer (4621)
[models] Add IntermediateLayerGetter on segmentation (5298)
[models] Use FX feature extractor for segm model (4563)
[models, ops, io] Add model, ops and io usage logging (4956) (4735) (4736) (4737) (5044) (4799) (5095) (5038)
[models.quantization] Implement is_qat in TorchVision (5299)
[models.quantization] Cleanup Quantized ShuffleNet (4854)
[models.quantization] Adding new Quantized models (4969)
[ops] [FBcode->GH] Fix missing kernel guards (4620) (4743)
[ops] Expose misc ops at package level (4812)
[ops] Fix giou naming bug (5270)
[ops] Change batched NMS threshold to choose for-loop version (4990)
[ops] Add bias parameter to ConvNormActivation (5012)
[ops] Feature extraction default arguments -  ops (4810)
[ops] Change to stable sort in nms implementations (4767)
[reference scripts] Support amp training (4923) (4933) (4994) (4547) (4570)
[reference scripts] Add types and improve descriptions to ArgumentParser parameters (4724)
[reference scripts] Replaced all 'no_grad()' instances with 'inference_mode()' (4629)
[reference scripts] Adding Repeated Augment Sampler  (5051)
[reference scripts] Reduce variance of classification references evaluation (4609)
[reference scripts] Avoid inplace modification of target boxes in detection references (5289)
[reference scripts] Allow variable number of repetitions for RA (5084)
[reference scripts, classification] Adding gradient clipping (4824)
[reference scripts, models.quantization] Add --prototype flag to quantization scripts. (5334)
[reference scripts, ops] Additional SOTA ingredients on Classification Recipe (4493)
[transforms] Added center arg to F.affine and RandomAffine ops (5208)
[transforms] Explicitly copying array in pil_to_tensor (4566)
[transforms] Update functional_tensor.py (4852)
[transforms] Add api usage log to transforms (5007)
[utils] Support random colors by default for draw_bounding_boxes (5127)
[utils] Add API usage calls to utils (5077)
Various documentation improvements (4913) (4892) (5305) (5273) (5089) (4653) (5302) (4647) (4922) (5124) (4972) (5165) (4843) (5238) (4846) (4823) (5316) (5195) (5153) (4783) (4798) (4797) (5368) (5037) (4830) (4681) (4579) (4520) (4586) (4536) (4574) (4565) (4822) (5315) (4546) (4522) (5312) (5372) (4833)
[tests] Set seed on several tests to reduce flakiness (4911) (4764) (4762) (4759) (4766) (4763) (4758) (4761)
[tests] Other test improvements (4756) (4775) (4867) (4929) (4632) (5029) (4597)
Added script to sync fbcode changes with main branch (4769)
[ci] Various CI improvements (4662) (4669) (4791) (4626) (5021) (4739) (3973)(4618) (4788) (4946) (5112) (5099) (5288) (5152) (4696) (5122) (4793) (4998) (4498)
[build] Various build improvements (5261) (5190) (4945) (4920) (5024) (4571) (4742) (4944) (4989) (5179) (4516) (4661) (4695) (4939) (4954)
[io] decode_* returns contiguous tensors (4898)
[io] Revert "decode_* returns contiguous tensors (4898)" (4901)

Bug Fixes

[datasets] fix Caltech datasets (4556)
[datasets] fix UCF101 on Windows (5129)
[datasets] remove extracted archive if flag was set (5055)
[datasets] Reverted folder.py back to using complete path to file for make_dataset and is_valid_file rather than just the filename (4885)
[datasets] fix `fromfile` on windows (4980)
[datasets] fix WIDERFace download links (4649)
[datasets] fix target_type selection for Caltech101 (4637)
[io] Skip jpeg comparison tests with PIL (5169)
[io] [Windows] Workaround for loading bundled DLLs (4893)
[models] Adding missing named param check on ViT (5196)
[models] Modifying keypoint_rcnn.py for keypoint_predictor issue (5180)
[models] Fixing bug on SSD backbone freezing (4590)
[models] [FBcode->GH] Removed type annotations from rcnn (4883)
[models.quantization] Amend the weights only if quantize=True (4966)
[models.quantization] fix mobilenetv3 quantization state dict loading (4997)
[ops] Adding masks_to_boxes to `__all__` in ops (4779)
[ops] Update the error message on DeformConv2d (4908)
[ops, onnx] RoiAlign aligned=True (4692)
[reference scripts] Fix reduce_across_processes inconsistent return type (4733)
[reference scripts] Fix bug on EMA n_averaged estimation (4544)
[reference scripts] support random seed for RA sampler (5053)
[reference scripts] fix bug in training model by amp (4874)
[reference scripts, transforms] Fix a bug on RandomZoomOut (5278)
[tests] Skip expected checks for quantized resnet50 due to flakiness (4686)
[transforms] Fix bug on autocontrast when `min==max` (4999)
[transforms] Fix augmentation space to be uint8 compatible (4806)
[utils] Fix `draw_bounding_boxes` and `draw_keypoints` for tensors on GPU (5101) (5102)
[build] fix formatting CIRCLECI_TAG when building docs (4693)
[build] Fix nvjpeg packaging into the wheel (4752)
[build] Switch Android app to pytorch_android stable (4926)
[ci] Add libtinfo5 dependency (4931)
[ci] Revert vit_h_14 as it breaks our CI (5259)
[ci] Remove pager on git diff (4800)
[ci] Fix failing CI job for android (4912)
[ci] Add numpy as explicit dependency to build_cmake.sh (4987)

Code Quality

Various typing improvements (4603) (4172) (4173) (4631) (4619) (4583) (4602) (5182)
Add ufmt (usort + black) as code formatter (4384)
Fix formatting issues (4535) (4747)
Add pre-commit hook to fix line endings (5021)
Various imports cleanups/improvements (4533) (4879)
Use f-strings almost everywhere, and other cleanups by applying pyupgrade (4585)
Update code to Python 3.7 compliance and remove Python 3.6 references (5125) (5161)
Consolidate `__repr__` methods throughout the repo (5392)
Set allow_redefinition = True for mypy (4531)
Use `is` to compare type of objects (4605)
Various typos fixed (5031) (5092)
Fix annotations for Python >= 3.8 (5301)
Revamp log api usage method (5072)
[deprecation] Update deprecation messages stating APIs will be removed in 0.14 and remove APIs that were deprecated before 0.8 (5387) (5386)
[build] Updated setup.py to use TorchVersion object for version comparison (4307)
[ops] remove debugging asserts (5332)
[c++frontend] Fix missing Torch includes (5118)
[ci] Cleanup and removing unnecessary references and parameters (4983) (4930) (5042)
[datasets] [FBcode->GH] remove unused requests functionality (5014)
[datasets] allow single extension as str in make_dataset (5229)
[datasets] use helper function to extract archive in CelebA (4557)
[datasets] simplify QMNIST download logic (4562)
[documentation] fix `make html-noplot` docs build command (5389)
[models] Move all weight initializations from private methods to constructors (5331)
[models] simplify model builders (5001)
[models] Replace asserts with ValueErrors (5275)
[models] Use enumerate to get index of ModuleList (4534)
[models] Simplify efficientnet code by removing _efficientnet_conf (4690)
[models] Refactor Segmentation models (4646)
[models] Pass indexing param to meshgrid to avoid warning in detection models (4645)
[models] Refactor the backbone builders of detection (4656)
[models.quantization] Switch torch.quantization to torch.ao.quantization (5296) (4554)
[ops] Fixed unused variables in ops (4666)
[ops] Refactor poolers (4951)
[reference scripts] Simplify the gradient clipping code (4896)
[reference scripts] only set random generator if shuffle=true (5135)
[tests] Refactor BoxOps tests to use parameterize (5380)
[tests] rename TestWeights to appease pytest (5054)
[tests] fix and add test for sequence_to_str (5213)
[tests] remove get_bool_env_var (5222)
[models, tests] remove custom code for model output comparison (4971)
[utils, documentation] Fix annotation of draw_segmentation_masks (4527)
[video] Fix error message in demuxer (5293)

Contributors

We're grateful for our community, which helps us improve torchvision by submitting issues and PRs, and providing feedback and suggestions. The following persons have contributed patches for this release:

Abhijit Deo, Aditya Oke, Alexander Soare, Alexander Unnervik, Allen Goodman, Andrey Talman, Brian Johnson, Bruno Korbar, buckage, Carlosbogo, Chungman Lee, Daniel Falbel, David Fan, Dmytro, Eli Uriegas, Ethan White, Eugene Yurtsev, F-G Fernandez, Fedor, Francisco Massa, Guo, Harish Kulkarni, HeungwooLee, Hu Ye, Jane (Yuan) Xu, Jirka Borovec, Jithun Nair, Joao Gomes, Jopo, Kai Zhang, kbozas, Kevin Tse, Khushi Agrawal, Konstantinos Bozas, Kotchin, Kushashwa Ravi Shrimali, KyleCZH, Mark Harfouche, Marko Kohtala, Masahiro Masuda, Matti Picus, Mengwei Liu, Mohammad (Moe) Rezaalipour, Mriganka Nath, Muhammed Abdullah, Nicolas Granger, Nicolas Hug, Nikita Shulga, peterbell10, Philip Meier, Piyush Singh, Prabhat Roy, ProGamerGov, puhuk, Richard Barnes, rvandeghen, Sai Krishna, Santiago Castro, Saswat Das, Sepehr Sameni, Sergii Khomenko, Stephen Matthews, Sumanth Ratna, Sumukh Aithal, Tal Ben-Nun, Vasilis Vryniotis, vfdev, Xiaolin Wang, Yi Zhang, Yiwen Song, Yoshitomo Matsubara, Yuchen Huang, Yuxin Wu, zhiqiang, and Zhiqiang Wang.

0.11.3

This is a minor release, which is compatible with [PyTorch 1.10.2](https://github.com/pytorch/pytorch/releases/tag/v1.10.2) and contains a minor bug fix.

Highlights

Bug Fixes
- [CI] Skip jpeg comparison tests with PIL (5232)

0.11.2

This minor release bumps the pinned PyTorch version to v1.10.1 and contains some minor bug fixes.

Highlights

Bug Fixes
- [CI] Fix clang_format issue (5061)
- [CI, MOBILE] Fix binary_libtorchvision_ops_android job (5062)
- [CI] Add numpy as explicit dependency to build_cmake.sh (5065)
- [MODELS] Amend the weights only if quantize=True. (5066)
- [TRANSFORMS] Fix augmentation space to be uint8 compatible (5067)
- [DATASETS] Fix WIDERFace download links (5068)
- [BUILD, WINDOWS] Workaround for loading bundled DLLs (5094)

0.11.1

Users were reporting issues installing torchvision on PyPI; this release contains an update to the dependencies for wheels to point directly to torch==1.10.0

0.11.0

This release introduces the RegNet and EfficientNet architectures, a new FX-based utility to perform Feature Extraction, new data augmentation techniques such as RandAugment and TrivialAugment, updated training recipes that support EMA, Label Smoothing, Learning-Rate Warmup, Mixup and Cutmix, and many more.

Highlights

New Models

[RegNet](https://arxiv.org/abs/2003.13678) and [EfficientNet](https://arxiv.org/abs/1905.11946) are two popular architectures that can be scaled to different computational budgets. In this release we include 22 pre-trained weights for their classification variants. The models were trained on ImageNet and can be used as follows:

```python
import torch
from torchvision import models

x = torch.rand(1, 3, 224, 224)

regnet = models.regnet_y_400mf(pretrained=True)
regnet.eval()
predictions = regnet(x)

efficientnet = models.efficientnet_b0(pretrained=True)
efficientnet.eval()
predictions = efficientnet(x)
```


The accuracies of the pre-trained models obtained on ImageNet val are seen below (see [4403](https://github.com/pytorch/vision/pull/4403#issuecomment-930381524), [4530](https://github.com/pytorch/vision/pull/4530#issuecomment-933213238) and [4293](https://github.com/pytorch/vision/pull/4293) for more details)

|Model	|Acc1	|Acc5	|
|---	|---	|---	|
|regnet_x_400mf	|72.834	|90.95	|
|regnet_x_800mf	|75.212	|92.348	|
|regnet_x_1_6gf	|77.04	|93.44	|
|regnet_x_3_2gf	|78.364	|93.992	|
|regnet_x_8gf	|79.344	|94.686	|
|regnet_x_16gf	|80.058	|94.944	|
|regnet_x_32gf	|80.622	|95.248	|
|regnet_y_400mf	|74.046	|91.716	|
|regnet_y_800mf	|76.42	|93.136	|
|regnet_y_1_6gf	|77.95	|93.966	|
|regnet_y_3_2gf	|78.948	|94.576	|
|regnet_y_8gf	|80.032	|95.048	|
|regnet_y_16gf	|80.424	|95.24	|
|regnet_y_32gf	|80.878	|95.34	|
|EfficientNet-B0	|77.692	|93.532	|
|EfficientNet-B1	|78.642	|94.186	|
|EfficientNet-B2	|80.608	|95.31	|
|EfficientNet-B3	|82.008	|96.054	|
|EfficientNet-B4	|83.384	|96.594	|
|EfficientNet-B5	|83.444	|96.628	|
|EfficientNet-B6	|84.008	|96.916	|
|EfficientNet-B7	|84.122	|96.908	|

We would like to thank Ross Wightman and Luke Melas-Kyriazi for contributing the weights of the EfficientNet variants.

FX-based Feature Extraction

A new Feature Extraction method has been added to our utilities. It uses PyTorch FX and enables us to retrieve the outputs of intermediate layers of a network which is useful for feature extraction and visualization. Here is an example of how to use the new utility:

```python
import torch
from torchvision.models import resnet50
from torchvision.models.feature_extraction import create_feature_extractor

x = torch.rand(1, 3, 224, 224)

model = resnet50()

return_nodes = {
    "layer4.2.relu_2": "layer4"
}
model2 = create_feature_extractor(model, return_nodes=return_nodes)
intermediate_outputs = model2(x)

print(intermediate_outputs['layer4'].shape)
```



We would like to thank Alexander Soare for developing this utility.

New Data Augmentations

Two new Automatic Augmentation techniques were added: [Rand Augment](https://arxiv.org/abs/1909.13719) and [Trivial Augment](https://arxiv.org/abs/2103.10158). Both methods can be used as drop-in replacements for the AutoAugment technique as seen below:

```python
from torchvision import transforms

t = transforms.RandAugment()
# t = transforms.TrivialAugmentWide()
transformed = t(image)

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.RandAugment(),  # or transforms.TrivialAugmentWide()
    transforms.ToTensor(),
])
```


We would like to thank Samuel G. Müller for contributing Trivial Augment and for his help on refactoring the AA package.

Updated Training Recipes

We have updated our training reference scripts to add support of Exponential Moving Average, Label Smoothing, Learning-Rate Warmup, [Mixup](https://arxiv.org/abs/1710.09412), [Cutmix](https://arxiv.org/abs/1905.04899) and other [SOTA primitives](https://github.com/pytorch/vision/issues/3911). The above enabled us to improve the classification Acc1 of some pre-trained models by [over 4 points](https://github.com/pytorch/vision/issues/3995). A major update of the existing pre-trained weights is expected on the next release.

Backward-incompatible changes

[models] Use torch instead of scipy for random initialization of inception and googlenet weights (4256)

Deprecations

[models] Deprecate the C++ vision::models namespace (4375)

New Features

[datasets] Add iNaturalist dataset (4123)
[datasets] Download and Kinetics 400/600/700 Datasets (3680)
[datasets] Added LFW Dataset (4255)
[models] Add FX feature extraction as an alternative to intermediate_layer_getter (4302) (4418)
[models] Add RegNet Architecture in TorchVision (4403) (4530) (4550)
[ops] Add new masks_to_boxes op (4290) (4469)
[ops] Add StochasticDepth implementation (4301)
[reference scripts] Adding Mixup and Cutmix (4379)
[transforms] Integration of TrivialAugment with the current AutoAugment Code (4221)
[transforms] Adding RandAugment implementation (4348)
[models] Add EfficientNet Architecture in TorchVision (4293)

Improvements

Various documentation improvements (4239) (4251) (4275) (4342) (3894) (4159) (4133) (4138) (4089) (3944) (4349) (3754) (4308) (4352) (4318) (4244) (4362) (3863) (4382) (4484) (4503) (4376) (4457) (4505) (4363) (4361) (4337) (4546) (4553) (4565) (4567) (4574) (4575) (4383) (4390)  (3409)  (4451)  (4340) (3967)  (4072)  (4028) (4132)
[build] Add CUDA-11.3 builds to torchvision (4248)
[ci, tests] Skip some CPU-only tests on CircleCI machines with GPU (4002) (4025) (4062)
[ci] New issue templates (4299)
[ci] Various CI improvements, in particular putting back GPU testing on windows (4421) (4014) (4053) (4482) (4475) (3998) (4388) (4179) (4394) (4162) (4065) (3928) (4081) (4203) (4011) (4055) (4074) (4419) (4067) (4201) (4200) (4202) (4496) (3925)
[ci] ping maintainers in case a PR was not properly labeled (3993) (4012) (4021) (4501)
[datasets] Add bzip2 file compression support to datasets (4097)
[datasets] Faster dataset indexing (3939)
[datasets] Enable logging of internal dataset instantiations (4319) (4090)
[datasets] Removed copy=False in torch.from_numpy in MNIST to avoid warning (4184)
[io] Add warning for files with corrupt containers (3961)
[models, tests] Add test to check that classification models are FX-compatible (3662)
[tests] Speedup various tests (3929) (3933)  (3936)
[models] Allow custom activation in SqueezeExcitation of EfficientNet (4448)
[models] Allow gradient backpropagation through GeneralizedRCNNTransform to inputs (4327)
[ops, tests] Add JIT tests (4472)
[ops] Make StochasticDepth FX-compatible (4373)
[ops] Added backward pass on CPU and CUDA for interpolation with anti-alias option (4208) (4211)
[ops] Small refactoring to support opt mode for torchvision ops (fb internal specific) (4080)  (4095)
[reference scripts] Added Exponential Moving Average support to classification reference script (4381) (4406) (4407)
[reference scripts] Adding label smoothing on classification reference (4335)
[reference scripts] Further enhance Classification Reference (4444)
[reference scripts] Replaced to_tensor() with pil_to_tensor() + convert_image_dtype() (4452)
[reference scripts] Update the metrics output on reference scripts (4408)
[reference scripts] Warmup schedulers in References (4411)
[tests] Add check for fx compatibility on segmentation and video models (4131)
[tests] Mock redirection logic for tests (4197)
[tests] Replace set_deterministic with non-deprecated spelling (4212)
[tests] Skip building torchvision with ffmpeg when python==3.9 (4417)
[tests] [jit] Make operation call accept Stack& instead of Stack* (63414) (4380)
[tests] make tests that involve GDrive more robust (4454)
[tests] remove dependency for dtype getters (4291)
[transforms] Replaced example usage of ToTensor() by PILToTensor() + ConvertImageDtype() (4494)
[transforms] Explicitly copying array in pil_to_tensor (4566) (4573)
[transforms] Make get_image_size and get_image_num_channels public. (4321)
[transforms] Add grayscale image support for adjust_contrast and adjust_saturation (4477) (4480)
[utils] Support single color in utils.draw_bounding_boxes (4075)
[video, documentation] Port the video_api.ipynb notebook to the example gallery (4241)
[video, io, tests] Added check for invalid input file (3932)
[video, io] remove deprecated function call (3861) (3989)
[video, tests] Removed test_audio_video_sync as it doesn't work as expected (4050)
[video] Build torchvision with ffmpeg only on Linux and ignore ffmpeg on other platforms (4413, 4410, 4041)

Bug Fixes

[build] Conda: Add numpy dependency (4442)
[build] Explicitly exclude PIL 8.3.0 from compatible dependencies (4148)
[build] More robust version check (4285)
[ci] Fix broken clang format test. (4320)
[ci] Remove mentions of conda-forge (4082)
[ci] fixup '*' -> '/.*/' for CI filter (4059)
[datasets] Fix download from google drive which was downloading empty files in some cases (4109)
[datasets] Fix splitting CelebA dataset (4377)
[datasets] Add support for files with periods in name (4099)
[io, tests] Don't check transparency channel for pil >= 8.3 in test_decode_png (4167)
[io] Fix size_t issues across JPEG versions and platforms (4439)
[io] Raise proper error when decoding 16-bits jpegs (4101)
[io] Unpinned the libjpeg version and fixed jpeg_mem_dest's size type Wind… (4288)
[io] Fix deinterlacing of PNG images with read_image (4268)
[io] More robust ffmpeg version query in setup.py (4254)
[io] Fixed read_image bug (3948)
[models] Don't download backbone weights if pretrained=True (4283)
[onnx, tests] Do not disable profiling executor in ONNX tests (4324)
[ops, tests] Fix DeformConvTester::test_backward_cuda by setting threads per block to 512 (3942)
[ops] Fix typing issue to make DeformConv2d scriptable (4079)
[ops] Fixes deform_conv issue with large input/output (4351)
[ops] Resolving tracing problem on StochasticDepth iterator. (4372)
[ops] Port quantize_val and dequantize_val into torchvision to avoid at::native and android xplat incompatibility (4311)
[reference scripts] Fix bug on EMA n_averaged estimation. (4544) (4545)
[tests] Avoid cmyk in nvjpeg tests (4246)
[tests] Catch ValueError due to recent change to torch.testing.assert_close (4165)
[tests] Fix failing tests by catching the proper exception from torch.testing (4121)
[tests] Skip test if connection issues on fate (4284)
[transforms] Fix RandAugment and TrivialAugment bugs (4370)
[transforms] [FBcode->GH] [JIT] Add reference semantics to TorchScript classes (44324) (4166)
[utils] Handle grayscale images on draw_bounding_boxes (4043)  (4049)
[video, io] Fixed missing audio with video_reader and pyav backend (3934, 4064)

Code Quality

Various typing improvements (4369) (4168) (4169) (4170) (4171) (4224) (4227) (4395) (4409) (4232) (4234) (4236) (4226) (4416)
Renamed the “master” branch to “main” (4306) (4365)
[ci] (fb-internal only) Allow all torchvision test rules to run with RE (4073)
[ci] add pre-commit hooks for convenient formatting checks (4387)
[ci] Import hipify_python only when needed (4031)
[io] Fixed a couple of typos and removed unnecessary bracket (4345)
[io] use from_blob to avoid memcpy (4118)
[models, ops] Moving common layers to ops (4504)
[models, ops] Replace MobileNetV3's SqueezeExcitation with EfficientNet's one (4487)
[models] Explicitly store a distance value that is reused (4341)
[models] Use torch instead of scipy for random initialization of inception and googlenet weights (4256)
[onnx, tests] Use test images from repo rather than internet for ONNX tests (4176)
[onnx] Import ONNX utils from symbolic_opset11 module (4230)
[ops] Fix clang formatting in deform_conv2d_kernel.cu (3943)
[ops] Update gpu atomics include path (4478) (reverted)
[reference scripts] Cleaned-up coco evaluation code (4453)
[reference scripts] remove unused package in coco_eval.py (4404)
[tests] Ported all tests to pytest (3962) (3996) (3950) (3964) (3957) (3959) (3981) (3952) (3977) (3974) (3976) (3983) (3971) (3988) (3990) (3985) (3984) (4030) (3955) (4008) (4010) (4023) (3954) (4026) (3953) (4047) (4185) (3947) (4045) (4036) (4034) (3978) (4046) (3991) (3930) (4038) (4037) (4215) (3972) (3966) (4114) (4177) (4280) (3946) (4233) (4258) (4035) (4040) (4000) (4196) (3922) (4032)
[tests] Prevent tests from leaking their respective RNG (4497) (3926) (4250)
[tests] Remove TestCase dependency for test_models_detection_anchor_utils.py (4207)
[tests] Removed tests executing deprecated F_t.center/five/ten_crop methods (4479)
[tests] Remove torchvision/test/fakedata_generation.py (4130)
[transforms, reference scripts] Added PILToTensor and ConvertImageDtype classes in reference scripts and used them to replace ToTensor (4495, 4481)
[transforms] Refactor AutoAugment to support more augmentations. (4338)
[transforms] Replace deprecated torch.lstsq with torch.linalg.lstsq 
