
Reproducing your results #1

Closed
ludovic-carre opened this issue Aug 27, 2018 · 11 comments
Labels
Stale: stale and scheduled for closing soon

Comments


ludovic-carre commented Aug 27, 2018

Hi,

I am working on a similar project, xview and yolo, and I would like to reproduce your results. I have a few questions:

  1. Why do you use 30 anchors? Did you run some analysis, or is it intuition?
  2. I noticed that you have multiple cfg files, and that the symmetric 30-anchor one is the default in train, so I figure it is the one that gave you the best results? What input size should I use with your cfg file?
  3. I know you started from the eriklindernoren repo. Can you mention the major changes you made to the training process (or elsewhere) so I know what to pay attention to?
  4. I haven't looked at everything yet, but from a first look you don't seem to use a loss that emphasizes mistakes on under-represented xview classes. How do you deal with the fact that buildings and small cars make up a huge part of the dataset? My network only learns to predict these two classes.
  5. I haven't noticed any data augmentation; do you use any?

Finally, if you can mention/explain anything else that you think could help someone reproduce your results, it would be really helpful!


glenn-jocher commented Aug 29, 2018

Hi @PiggyGenius, good questions, here are your answers!

  1. I used 30 anchors (compared to 9 in YOLOv3) because I've read that, in general, higher anchor counts correlate with higher mAPs. Here is an example from http://machinethink.net/blog/object-detection/
    [figure from the linked post showing the effect of the number of centroids]

  2. I actually achieved the best results with c60_a30.cfg, but I reasoned that since this is overhead imagery, if you had infinite examples of each class the anchor boxes should be vertical-horizontal symmetric, so to force this idea I duplicated the data, transposing the boxes in the duplicated set, before running k-means for the 'symmetric' cfg files (a rough sketch of this step appears at the end of this comment). In the end it's probably much ado about nothing, as the two cfg files (symmetric and non-symmetric) ended up very similar to each other. I doubt your results will change materially depending on which you use, but if you want to duplicate my results use c60_a30.cfg.

  3. Yes, the eriklindernoren repo was great for inference but did not train correctly, so I modified the cost functions in models.py and build_targets() in detect.py. I also essentially rewrote much of datasets.py, switching it from PIL to OpenCV and adding augmentation, which of course is necessary for training but not for inference.

  4. I use a weighted loss for the classification loss term, so buildings and cars, for example, are much less important. The weight is the inverse of the class frequency. In utils.py you'll find the weights as a lookup table. The numbers here are the number of occurrences of each class in the dataset; the weights are their inverses, normalized to sum to 1 (a minimal sketch of applying such weights follows this list).

def xview_class_weights(indices):  # weights of each class in the training set, normalized to sum to 1
    weights = 1 / torch.FloatTensor(
        [74, 364, 713, 71, 2925, 209767, 6925, 1101, 3612, 12134, 5871, 3640, 860, 4062, 895, 149, 174, 17, 1624, 1846, 125, 122, 124, 662, 1452, 697, 222, 190, 786, 200, 450, 295, 79, 205, 156, 181, 70, 64, 337, 1352, 336, 78, 628, 841, 287, 83, 702, 1177, 313865, 195, 1081, 882, 1059, 4175, 123, 1700, 2317, 1579, 368, 85])
    weights /= weights.sum()
    return weights[indices]
  5. Yes, there is significant data augmentation in datasets.py: both spatial augmentation (translation, rotation, skew, zoom, flipping) and lighting augmentation (variation of the S and V channels after projecting the RGB image to HSV). Bounding boxes are automatically augmented along with the image. Note, however, that the lighting augmentation actually hurt the results on xview: mAP dropped from 0.16 to 0.12 when I used it. Also note that rotating bounding boxes can get a bit dicey at rotation angles around 45 degrees, as the box may become much larger around the object than desirable.
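
For illustration, here is a minimal sketch of how inverse-frequency class weights like the ones from xview_class_weights() can be plugged into a classification loss term in PyTorch. This is not the repo's exact loss code; the counts and tensor names below are made up.

import torch
import torch.nn.functional as F

# Illustration only: applying per-class weights to a cross-entropy classification term.
num_classes = 60
class_counts = torch.randint(50, 5000, (num_classes,)).float()  # stand-in for the real frequency table
class_weights = 1.0 / class_counts
class_weights /= class_weights.sum()

cls_logits = torch.randn(8, num_classes)            # predicted class scores for 8 matched boxes
cls_targets = torch.randint(0, num_classes, (8,))   # their ground-truth class indices
cls_loss = F.cross_entropy(cls_logits, cls_targets, weight=class_weights)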

To reproduce the results, you should just be able to start training. You should notice right away after a few epochs if the results are similar, as the results posted to results.txt should match the image on the repo home page. You can use plotResults() in utils.py to plot your results.
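
For reference, here is a minimal sketch of the duplicate-and-transpose k-means step from answer 2, assuming wh is an N x 2 array of ground-truth box widths and heights in pixels. It uses scikit-learn and is only an illustration, not the exact script used to generate the cfg files.

import numpy as np
from sklearn.cluster import KMeans

def symmetric_anchors(wh, n_anchors=30):
    # wh: N x 2 array of box widths and heights in pixels.
    # Append a transposed copy of every box so the centroids come out
    # roughly symmetric under a width/height swap.
    wh_sym = np.concatenate([wh, wh[:, ::-1]], axis=0)
    km = KMeans(n_clusters=n_anchors, n_init=10, random_state=0).fit(wh_sym)
    anchors = km.cluster_centers_
    return anchors[np.argsort(anchors.prod(axis=1))]  # sort anchors by area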

@abidmalikwaterloo

@glenn-jocher I am also trying to reproduce the results. I have the following graphs for precision and recall.
[attached figure: precision and recall curves over training]

The behavior over the last 100 epochs is not in line with the behavior you show on the web. I am getting mAP = 0.20 on the training set, compared to the 0.30 you report. Any comments?

I see you turned off the CUDA flag in detect.py. Any special reason for this? It is pretty slow on CPU.

I am trying to reduce the classes for my experiments. You have 61 classes and labels for them; I want to reduce it to 10 classes. I see that you have

def xview_classes2indices(classes):  # remap xview classes 11-94 to 0-61
    indices = [-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 0, 1, 2, -1, 3, -1, 4, 5, 6, 7, 8, -1, 9, 10, 11, 12, 13, 14,
               15, -1, -1, 16, 17, 18, 19, 20, 21, 22, -1, 23, 24, 25, -1, 26, 27, -1, 28, -1, 29, 30, 31, 32, 33, 34,
               35, 36, 37, -1, 38, 39, 40, 41, 42, 43, 44, 45, -1, -1, -1, -1, 46, 47, 48, 49, -1, 50, 51, -1, 52, -1,
               -1, -1, 53, 54, -1, 55, -1, -1, 56, -1, 57, -1, 58, 59]
    return [indices[int(c)] for c in classes]

My understanding is that I should change the indices of unnecessary classes to -1 and they will be filtered out. Am I on the right track, or do I have to do more?

@glenn-jocher

@abidmalikwaterloo you're free to set the CUDA flag as you like. The graphs look good; your specific results may vary, as I was making changes to the repository after uploading those results to try to optimize it.

Yes, if you want to use custom classes and data you will need to redefine the relevant sections of the code, like the one you highlighted. There are many ways to do this. The purpose of the function you see there is to handle arbitrary class numbers; you do not need it if your classes are numbered simply, such as 0, 1, 2, 3, etc. In xview the classes skip numbers, e.g. 5, 6, 17, 20, etc. A sketch of one way to build a reduced mapping is below.
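
For illustration, one minimal way to remap a chosen subset of raw xview class IDs to 0..9 and drop everything else, along the lines of xview_classes2indices(). The ten IDs and the helper names below are placeholders, not a recommended subset.

# Hypothetical 10-class subset of raw xview class IDs; substitute your own.
KEEP = [11, 12, 13, 15, 17, 18, 19, 20, 21, 23]
REMAP = {xview_id: i for i, xview_id in enumerate(KEEP)}  # raw ID -> 0..9

def subset_classes2indices(classes):
    # Returns -1 for classes outside the subset, matching the existing convention.
    return [REMAP.get(int(c), -1) for c in classes]

def filter_labels(labels):
    # labels: iterable of (class_id, x, y, w, h) tuples with raw xview class IDs;
    # keep only rows whose class survives the remap.
    out = []
    for cls, *box in labels:
        idx = REMAP.get(int(cls), -1)
        if idx != -1:
            out.append((idx, *box))
    return out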


abidmalikwaterloo commented Dec 4, 2018

@glenn-jocher I am playing with the parameters but am unable to get mAP = 0.16 on the validation set (images not included in training). I am using 791 images for training and 85 for validation. The max mAP I get is 0.09. Do you have any specific parameter values I can use to get the mAP close to 0.16?

@abidmalikwaterloo

@PiggyGenius Were you able to get mAP = 0.16? What parameters did you use for your architecture?

@glenn-jocher

Be advised that the https://github.com/ultralytics/xview-yolov3 repository is not under active development anymore. We recommend you use https://github.com/ultralytics/yolov3 instead, our main YOLOv3 repository.

@sawhney-medha

@abidmalikwaterloo I am also trying to train on a subset of the data for around 9-10 classes. Could you please tell me if you were successful in doing it and how?

@glenn-jocher

@sawhney-medha please be advised that the https://github.com/ultralytics/xview-yolov3 repository is not under active development anymore. We recommend you use https://github.com/ultralytics/yolov3 instead, our main YOLOv3 repository.

glenn-jocher pinned this issue Aug 9, 2019
@github-actions

This issue is stale because it has been open 30 days with no activity. Remove the Stale label or comment, or this will be closed in 5 days.

github-actions bot added the Stale label Jun 10, 2020
@im-tanyasuri

Hi, I want to use resized xview images, i.e. decrease their resolution first and then use your model on them. I think you are cropping patches from the original images, which are around 3k x 3k. I want to do the same with 1000 x 1000 sized images. Please help. Thanks.

@glenn-jocher

@im-tanyasuri you can achieve this by resizing the images with any image processing library, such as OpenCV or PIL, before feeding them into the model. Use cv2.resize() in OpenCV or Image.resize() in PIL to bring the images to the desired dimensions, then proceed with the resized images as inputs to the model.
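
For example, a minimal OpenCV sketch (the file names, target size, and box format are placeholders); if your labels are in absolute pixel coordinates you would scale them by the same factors, while normalized labels need no change:

import cv2

# Minimal example, not code from the repo: shrink one chip to 1000 x 1000
# and scale pixel-coordinate boxes by the same factors.
img = cv2.imread('xview_chip.tif')
h0, w0 = img.shape[:2]
new_w, new_h = 1000, 1000
resized = cv2.resize(img, (new_w, new_h), interpolation=cv2.INTER_AREA)

sx, sy = new_w / w0, new_h / h0
boxes = [(120, 340, 180, 410)]  # (xmin, ymin, xmax, ymax) in original pixel coordinates
scaled = [(x1 * sx, y1 * sy, x2 * sx, y2 * sy) for x1, y1, x2, y2 in boxes]
cv2.imwrite('xview_chip_1000.png', resized)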
