
Question: Guidance for low accuracy after QAT #18

Open
josht000 opened this issue Dec 5, 2024 · 7 comments

Comments

@josht000 commented Dec 5, 2024

I'm getting about 50% lower mAP scores on a custom dataset. You've done a great job with this repo, but one thing it lacks is guidance on how to improve low accuracy.

How do I change the number of epochs? Any suggestions for tuning the LR, and so on?

Thanks in advance,
Josh

@levipereira (Owner) commented

You can adjust calibrate_model to use more representative data (we recommend at least 10% of your main dataset) by modifying the num_batch parameter (together with your dataloader's batch size) here:

```python
def calibrate_model(model: torch.nn.Module, dataloader, device, num_batch=25):
```
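For reference, here is a minimal sketch of the statistics-collection half of such a calibration pass, following the pattern from the pytorch-quantization toolkit docs (the name `collect_stats` and the exact dataloader unpacking are illustrative; this repo's `calibrate_model` may differ in detail):

```python
import torch
from pytorch_quantization import nn as quant_nn

def collect_stats(model, dataloader, device, num_batch=25):
    """Feed num_batch batches through the model while the calibrators gather
    activation statistics; quantization itself stays disabled meanwhile."""
    model.eval()
    for module in model.modules():
        if isinstance(module, quant_nn.TensorQuantizer):
            if module._calibrator is not None:
                module.disable_quant()
                module.enable_calib()
            else:
                module.disable()

    with torch.no_grad():
        for i, (images, *_) in enumerate(dataloader):
            model(images.to(device))
            if i + 1 >= num_batch:
                break

    # Restore quantization using the freshly collected ranges
    for module in model.modules():
        if isinstance(module, quant_nn.TensorQuantizer):
            if module._calibrator is not None:
                module.enable_quant()
                module.disable_calib()
            else:
                module.enable()
```

Raising `num_batch` (or the dataloader's batch size) is what gets you closer to covering ~10% of your dataset during calibration.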

Experiment with different calibration methods, as this could be the main factor affecting your results. You can test various calibration approaches without regenerating histograms to identify which yields the best accuracy (see the sweep sketch below):

- Try different percentile values.
- Consider modifying the calibration settings here by enabling percentile calibration and disabling MSE:

```python
# compute_amax(model, method="percentile", percentile=99.99, strict=True)
# strict=False avoids an exception when some quantizers are never used
```
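Because the histogram calibrator keeps its collected statistics, you can re-derive the quantization ranges with different methods without re-running calibration. A sketch of such a sweep (`evaluate_map` is a placeholder for your own evaluation routine; `compute_amax` is the helper referenced above):

```python
import torch

with torch.no_grad():
    # Re-compute amax from the same histograms with different methods
    for method in ("mse", "entropy"):
        compute_amax(model, method=method)
        evaluate_map(model)  # placeholder: run your mAP evaluation

    # strict=False avoids an exception when some quantizers were never used
    for pct in (99.9, 99.99, 99.999):
        compute_amax(model, method="percentile", percentile=pct, strict=False)
        evaluate_map(model)
```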

For fine-tuning optimization, you can adjust the learning rate and other hyperparameters to improve the quantization results:

```python
def finetune(
```
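As a rough, illustrative starting point (these values are assumptions, not the repo's defaults): QAT fine-tuning usually uses a learning rate one to two orders of magnitude below the original training LR, for only a few epochs:

```python
import torch

# Illustrative QAT fine-tuning loop; model, train_loader and compute_loss
# are placeholders for your own objects. Tune lr and epochs per dataset.
epochs = 10
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

for epoch in range(epochs):
    for images, targets in train_loader:
        loss = compute_loss(model(images.to("cuda")), targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```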

@josht000 (Author) commented Dec 5, 2024

OK, I was going to go down the road of adjusting the hyperparameters (LR, epochs, and so on) as if it were a standard training run. I didn't see anything about calibrate_model in your docs.

Our model responds well to standard settings, as if it were a typical COCO dataset.
Question: with that in mind, what exact settings did you use to get your awesome results on the COCO dataset?

@levipereira (Owner) commented

This was a research project that I executed, and although some parameters are documented, many low-level ones are not.
All the parameters I used for COCO are in the codebase. You can look at the pytorch-quantization toolkit documentation to customize the project to your needs:
https://docs.nvidia.com/deeplearning/tensorrt/pytorch-quantization-toolkit/docs/index.html
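
For example, the toolkit lets you pick the input calibrator globally before the model is built (a minimal sketch based on the toolkit docs; whether this repo uses `quant_modules.initialize()` or replaces layers manually, check the codebase):

```python
from pytorch_quantization import quant_modules
from pytorch_quantization import nn as quant_nn
from pytorch_quantization.tensor_quant import QuantDescriptor

# Histogram calibration enables the percentile/entropy/mse methods above;
# the default "max" calibrator supports only max calibration.
quant_desc_input = QuantDescriptor(calib_method="histogram")
quant_nn.QuantConv2d.set_default_quant_desc_input(quant_desc_input)
quant_nn.QuantLinear.set_default_quant_desc_input(quant_desc_input)

# Monkey-patch common torch.nn layers with their quantized counterparts
quant_modules.initialize()
```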

@josht000 (Author) commented Dec 9, 2024

@levipereira

I uncommented the line you suggested and am looking at the results again. I now see that the QAT model is essentially on par with the "origin" model, BUT the scores of both are substantially lower than the FP32 model. Is this due to reparameterization?

Here are the results after QAT:

```
Class     Images  Instances          P          R      mAP50   mAP50-95: 100%|██████████| 707/707 08:38
  all      42371      38473      0.246      0.266      0.147     0.0493
    0      42371       4614      0.248      0.254      0.163     0.0576
    1      42371       1051      0.155      0.133     0.0367    0.00804
    2      42371      19887      0.233      0.351      0.157     0.0447
    3      42371       9666      0.464      0.337      0.292      0.109
    4      42371       3255      0.132      0.251     0.0847     0.0278

QAT: Epoch-10, weights saved as yolov9s_dual_img640_hexablu_v7.2_QAT/percentile_amax/weights/qat_ep_10_ap_0.0493_converted.pt (31.1 MB)
```

```
Eval Model | AP       | AP50     | Precision  | Recall
-------------------------------------------------------
Origin     | 0.048    | 0.143    | 0.247      | 0.237
PTQ        | 0.048    | 0.143    | 0.233      | 0.244
QAT - Best | 0.050    | 0.148    | 0.249      | 0.267
```

However, the original model actually has a mAP50:95 of 0.10112. This is still about a 50% reduction in mAP50:95 from the real original (un-reparameterized) model.

So it appears that the majority of the loss is in the reparameterized model. Is this what you've seen as well? If so... how do I gain the accuracy back? What's going on in reparameterization?

Thanks for your help.

@josht000 (Author) commented Dec 9, 2024

Looking into this issue to see if it fixes my accuracy loss: WongKinYiu/yolov9#198.

It turns out there were no discrepancies with my settings. I'm now trying a converted model without model.half() and a gelan-s.yaml model (see the sketch below).
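
For anyone following along, the idea is to keep the converted weights in FP32 instead of casting to half during conversion; a rough sketch, assuming a YOLOv9-style conversion script (checkpoint keys and the reparameterization step itself are elided and may differ):

```python
import torch

# Load the trained checkpoint and keep FP32 through the conversion;
# calling .half() here is where precision (and possibly accuracy) is lost.
ckpt = torch.load("best.pt", map_location="cpu")
model = ckpt["model"].float()  # instead of ckpt["model"].half()
# ... transfer / reparameterize weights here ...
torch.save({"model": model}, "converted_fp32.pt")
```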

@levipereira (Owner) commented

> original model actually has a mAP50:95 of 0.10112

Here is the problem: your model has very poor baseline performance (mAP50:95 of 0.10112), so any quantization or modification will result in a huge performance drop.
Try to improve the training to reach at least 50% mAP. Is your dataset complex? If your dataset is not complex, then you have a serious problem with the dataset itself.

Complex datasets are those with very similar classes or very small object sizes.

Try increasing the network resolution. But you definitely have a serious issue with this model performing at 10% mAP.
Some recommendations:

- Evaluate whether your dataset has inherent complexity (similar classes, small objects).
- If it is not complex, review your dataset quality and labeling.
- Try increasing the input resolution.
- Focus on improving the base model's performance before attempting optimizations like quantization.
- Aim for at least 50% mAP as a baseline.

The current performance (10% mAP) indicates fundamental issues that need to be addressed before considering model optimization techniques.

@josht000 (Author) commented

@levipereira OK, thanks Levi.
Yes, it's a very complex dataset. The classes are very similar and the average object size is around 30x30 pixels. I was going for a quick QAT experiment. There are several things I know I can do to increase the accuracy; I'll try those, report back, and hopefully bring this issue to closure. I don't know if I've ever gotten above 50% mAP50:95, though.
