By Heng Zhang, Elisa FROMONT, Sébastien LEFEVRE, Bruno AVIGNON
Most deep learning object detectors are based on the anchor mechanism and resort to the Intersection over Union (IoU) between predefined anchor boxes and ground truth boxes to evaluate the matching quality between anchors and objects. In this paper, we question this use of IoU and propose a new anchor matching criterion guided, during the training phase, by the optimization of both the localization and the classification tasks: the predictions related to one task are used to dynamically assign sample anchors and improve the model on the other task, and vice versa.
This is the PyTorch implementation of Mutual Guidance detectors. For more details, please refer to our ACCV paper.
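To make the contrast concrete, the sketch below compares static matching (IoU between anchors and ground truth boxes) with one direction of the dynamic idea, where the boxes predicted by the localization branch are used to pick the samples for classification. It is a simplified illustration in plain Python, not the code used in this repository: the function names, the 0.5 threshold and the example boxes are all assumptions, and the reverse direction (classification guiding localization) is left out.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match(boxes, gt_boxes, pos_thresh=0.5):
    """Return, for each box, the index of its best ground truth box,
    or -1 when the best IoU is below pos_thresh (assumed threshold)."""
    if not gt_boxes:
        return [-1] * len(boxes)
    assignments = []
    for box in boxes:
        overlaps = [iou(box, gt) for gt in gt_boxes]
        best = max(range(len(gt_boxes)), key=overlaps.__getitem__)
        assignments.append(best if overlaps[best] >= pos_thresh else -1)
    return assignments


# Hypothetical example: one object, two anchors.
gt = [(10, 10, 50, 50)]
anchors = [(0, 0, 40, 40), (30, 30, 70, 70)]
# Hypothetical output of the localization branch for the same two anchors:
# the first anchor has been regressed close to the object.
predicted = [(8, 9, 49, 52), (30, 30, 70, 70)]

static_labels = match(anchors, gt)     # matching on predefined anchors
dynamic_labels = match(predicted, gt)  # matching on predicted boxes

print(static_labels, dynamic_labels)
```

In this toy case neither anchor overlaps the object enough to be a positive under static matching, while the first anchor becomes a positive once its predicted box is used instead.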
Detector | Resolution | mAP | AP50 | AP75 | Trained model |
---|---|---|---|---|---|
FSSD (VGG16) | 320x320 | 54.1 | 80.1 | 58.3 | uploading |
FSSD (VGG16) + MG | 320x320 | 56.2 | 80.4 | 61.4 | uploading |
RetinaNet (VGG16) | 320x320 | 55.2 | 80.2 | 59.6 | uploading |
RetinaNet (VGG16) + MG | 320x320 | 57.7 | 81.1 | 62.9 | uploading |
RFBNet (VGG16) | 320x320 | 55.6 | 80.9 | 59.6 | uploading |
RFBNet (VGG16) + MG | 320x320 | 57.9 | 81.5 | 62.6 | uploading |
RetinaNet (VGG16) + PAFPN | 320x320 | 58.1 | 81.7 | 63.3 | uploading |
RetinaNet (VGG16) + PAFPN + MG | 320x320 | 59.5 | 82.3 | 64.2 | uploading |

Detector | Resolution | mAP | AP50 | AP75 | FPS (V100) | Trained model |
---|---|---|---|---|---|---|
FSSD (VGG16) | 320x320 | 31.1 | 48.9 | 32.7 | 365 | uploading |
FSSD (VGG16) + MG | 320x320 | 32.0 | 49.3 | 33.9 | 365 | uploading |
RetinaNet (VGG16) | 320x320 | 32.3 | 50.3 | 34.0 | 270 | uploading |
RetinaNet (VGG16) + MG | 320x320 | 33.6 | 50.8 | 35.7 | 270 | uploading |
RFBNet (VGG16) | 320x320 | 33.4 | 51.6 | 35.1 | 115 | uploading |
RFBNet (VGG16) + MG | 320x320 | 34.6 | 52.0 | 36.8 | 115 | uploading |
RetinaNet (VGG16) + PAFPN | 320x320 | 33.9 | 51.9 | 35.7 | 220 | Google Drive |
RetinaNet (VGG16) + PAFPN + MG | 320x320 | 35.3 | 52.4 | 37.3 | 220 | Google Drive |
RetinaNet (VGG16) | 512x512 | 37.1 | 56.5 | 39.5 | 250 | uploading |
RetinaNet (VGG16) + MG | 512x512 | 38.2 | 56.6 | 41.0 | 250 | uploading |
RetinaNet (VGG16) + PAFPN | 512x512 | running | running | running | 195 | uploading |
RetinaNet (VGG16) + PAFPN + MG | 512x512 | 39.4 | 57.5 | 42.3 | 195 | Google Drive |
First, download the VOC and COCO datasets; the scripts in data/scripts/ may be useful.
Then create a folder named datasets and link the downloaded datasets inside:
$ mkdir datasets
$ ln -s /path_to_your_voc_dataset datasets/VOCdevkit
$ ln -s /path_to_your_coco_dataset datasets/coco2017
Finally prepare folders to save evaluation results:
$ mkdir eval
$ mkdir eval/COCO
$ mkdir eval/VOC
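The same preparation can be scripted so that it is safe to run more than once (plain `mkdir` fails if the folder already exists). The helper below is a hypothetical convenience, not part of this repository; only the folder and link names (datasets/VOCdevkit, datasets/coco2017, eval/COCO, eval/VOC) come from the commands above.

```python
import os
from pathlib import Path


def prepare_workspace(root, voc_src=None, coco_src=None):
    """Create datasets/ and eval/ under root and link the datasets.

    voc_src / coco_src are the paths of the downloaded datasets;
    a link is skipped when its source is None or the link already exists.
    """
    root = Path(root)
    datasets = root / "datasets"
    datasets.mkdir(parents=True, exist_ok=True)

    for src, name in ((voc_src, "VOCdevkit"), (coco_src, "coco2017")):
        link = datasets / name
        # is_symlink() also catches a dangling link, which exists() misses.
        if src is not None and not link.exists() and not link.is_symlink():
            os.symlink(os.path.abspath(src), link, target_is_directory=True)

    # Folders where the evaluation results are saved.
    for sub in ("COCO", "VOC"):
        (root / "eval" / sub).mkdir(parents=True, exist_ok=True)
    return root


if __name__ == "__main__":
    # Without dataset paths, only the folders are created.
    prepare_workspace(".")
```

To create the links as well, call it from the repository root with your own paths, for example `prepare_workspace(".", "/path_to_your_voc_dataset", "/path_to_your_coco_dataset")`.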
For training with Mutual Guide:
$ python3 main.py --version fssd --backbone vgg16 --dataset voc --size 320 --mutual_guide
The other available values are: --version retinanet, rfbnet or pafpn; --backbone resnet18; --dataset coco; --size 512.
Remarks:
- For training without Mutual Guide, just remove the --mutual_guide flag;
- By default, trained models are saved in weights/.
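To launch several trainings in a row, the command line can be generated from the option values listed above. This is a minimal sketch: the helper name is hypothetical, and it assumes every combination of version, backbone, dataset and size is accepted by main.py, which is not stated here, so filter the list to the configurations you actually need.

```python
from itertools import product

# Option values taken from the training command shown above.
VERSIONS = ("fssd", "retinanet", "rfbnet", "pafpn")
BACKBONES = ("vgg16", "resnet18")
DATASETS = ("voc", "coco")
SIZES = (320, 512)


def training_commands(mutual_guide=True):
    """Build one argument list per (version, backbone, dataset, size)."""
    commands = []
    for version, backbone, dataset, size in product(
        VERSIONS, BACKBONES, DATASETS, SIZES
    ):
        cmd = [
            "python3", "main.py",
            "--version", version,
            "--backbone", backbone,
            "--dataset", dataset,
            "--size", str(size),
        ]
        if mutual_guide:
            cmd.append("--mutual_guide")
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them.
    for cmd in training_commands():
        print(" ".join(cmd))
```

Each argument list can be passed as is to `subprocess.run(cmd, check=True)` to start the corresponding training.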
To evaluate a trained network:
$ python3 main.py --version fssd --backbone vgg16 --dataset voc --size 320 --trained_model path_to_saved_weights
The other available values are: --version retinanet, pafpn or rfbnet; --backbone resnet18; --dataset coco; --size 512.
It will directly print the mAP, AP50 and AP75 results on VOC2007 Test or COCO2017 Val.