YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS and has the highest accuracy 56.8% AP among all known real-time object detectors with 30 FPS or higher on GPU V100. YOLOv7-E6 object detector (56 FPS V100, 55.9% AP) outperforms both transformer-based detector SWIN-L Cascade-Mask R-CNN (9.2 FPS A100, 53.9% AP) by 509% in speed and 2% in accuracy, and convolutional-based detector ConvNeXt-XL Cascade-Mask R-CNN (8.6 FPS A100, 55.2% AP) by 551% in speed and 0.7% AP in accuracy, as well as YOLOv7 outperforms: YOLOR, YOLOX, Scaled-YOLOv4, YOLOv5, DETR, Deformable DETR, DINO-5scale-R50, ViT-Adapter-B and many other object detectors in speed and accuracy. Moreover, we train YOLOv7 only on MS COCO dataset from scratch without using any other datasets or pre-trained weights.
performance tested on Ascend 910(8p) with graph mode
Name | Scale | BatchSize | ImageSize | Dataset | Box mAP (%) | Params | Recipe | Download |
---|---|---|---|---|---|---|---|---|
YOLOv7 | Tiny | 16 * 8 | 640 | MS COCO 2017 | 37.5 | 6.2M | yaml | weights |
YOLOv7 | L | 16 * 8 | 640 | MS COCO 2017 | 50.8 | 36.9M | yaml | weights |
YOLOv7 | X | 12 * 8 | 640 | MS COCO 2017 | 52.4 | 71.3M | yaml | weights |
performance tested on Ascend 910*(8p)
Name | Scale | BatchSize | ImageSize | Dataset | Box mAP (%) | ms/step | Params | Recipe | Download |
---|---|---|---|---|---|---|---|---|---|
YOLOv7 | Tiny | 16 * 8 | 640 | MS COCO 2017 | 37.5 | 496.21 | 6.2M | yaml | weights |
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
- Box mAP: Accuracy reported on the validation set.
- We refer to the official YOLOV7 to reproduce the P5 series model, and the differences are as follows:
- We use 8x NPU(Ascend910) for training, and the single-NPU batch size for tiny/l/x is 16/16/12. This is different from the official code.
Please refer to the GETTING_STARTED in MindYOLO for details.
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
# distributed training on multiple GPU/Ascend devices
msrun --worker_num=8 --local_worker_num=8 --bind_core=True --log_dir=./yolov7_log python train.py --config ./configs/yolov7/yolov7.yaml --device_target Ascend --is_parallel True
Similarly, you can train the model on multiple GPU devices with the above msrun command. Note: For more information about msrun configuration, please refer to here.
For detailed illustration of all hyper-parameters, please refer to config.py.
Note: As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
If you want to train or finetune the model on a smaller dataset without distributed training, please run:
# standalone training on a CPU/GPU/Ascend device
python train.py --config ./configs/yolov7/yolov7.yaml --device_target Ascend
To validate the accuracy of the trained model, you can use test.py
and parse the checkpoint path with --weight
.
python test.py --config ./configs/yolov7/yolov7.yaml --device_target Ascend --weight /PATH/TO/WEIGHT.ckpt
See here.
[1] Chien-Yao Wang, Alexey Bochkovskiy, and HongYuan Mark Liao. Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv preprint arXiv:2207.02696, 2022.