
Question: Guidance for low accuracy after QAT #18

Open
josht000 opened this issue Dec 5, 2024 · 7 comments

Comments

@josht000 commented Dec 5, 2024

I'm getting about 50% lower mAP scores on a custom dataset. You've done a great job with this repo, but one thing it lacks is guidance on how to improve low accuracy.

How do I change the number of epochs? Any suggestions for tuning the LR, and so on?

Thanks in advance,
Josh

@levipereira (Owner) commented

You can adjust calibrate_model to use more representative data (we recommend at least 10% of your main dataset) by modifying the num_batch parameter (together with your dataloader's batch size) here:

```python
def calibrate_model(model: torch.nn.Module, dataloader, device, num_batch=25):
```
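For reference, here is a minimal sketch of the statistics-collection half of such a calibration pass, following the pattern from the pytorch-quantization toolkit docs (the name `collect_stats` and the exact dataloader unpacking are illustrative; this repo's `calibrate_model` may differ in detail):

```python
import torch
from pytorch_quantization import nn as quant_nn

def collect_stats(model, dataloader, device, num_batch=25):
    """Feed num_batch batches through the model while the calibrators gather
    activation statistics; quantization itself stays disabled meanwhile."""
    model.eval()
    for module in model.modules():
        if isinstance(module, quant_nn.TensorQuantizer):
            if module._calibrator is not None:
                module.disable_quant()
                module.enable_calib()
            else:
                module.disable()

    with torch.no_grad():
        for i, (images, *_) in enumerate(dataloader):
            model(images.to(device))
            if i + 1 >= num_batch:
                break

    # Restore quantization using the freshly collected ranges
    for module in model.modules():
        if isinstance(module, quant_nn.TensorQuantizer):
            if module._calibrator is not None:
                module.enable_quant()
                module.disable_calib()
            else:
                module.enable()
```

Raising `num_batch` (or the dataloader's batch size) is what gets you closer to covering ~10% of your dataset during calibration.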

Experiment with different calibration methods, as this could be the main factor affecting your results. You can test various calibration approaches without regenerating histograms to identify which yields the best accuracy (see the sweep sketch below):

- Try different percentile values.
- Consider modifying the calibration settings here by enabling percentile calibration and disabling MSE:

```python
# compute_amax(model, method="percentile", percentile=99.99, strict=True)
# strict=False avoids an exception when some quantizers are never used
```
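Because the histogram calibrator keeps its collected statistics, you can re-derive the quantization ranges with different methods without re-running calibration. A sketch of such a sweep (`evaluate_map` is a placeholder for your own evaluation routine; `compute_amax` is the helper referenced above):

```python
import torch

with torch.no_grad():
    # Re-compute amax from the same histograms with different methods
    for method in ("mse", "entropy"):
        compute_amax(model, method=method)
        evaluate_map(model)  # placeholder: run your mAP evaluation

    # strict=False avoids an exception when some quantizers were never used
    for pct in (99.9, 99.99, 99.999):
        compute_amax(model, method="percentile", percentile=pct, strict=False)
        evaluate_map(model)
```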

For fine-tuning optimization, you can adjust the learning rate and other hyperparameters to improve the quantization results:

```python
def finetune(
```
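As a rough, illustrative starting point (these values are assumptions, not the repo's defaults): QAT fine-tuning usually uses a learning rate one to two orders of magnitude below the original training LR, for only a few epochs:

```python
import torch

# Illustrative QAT fine-tuning loop; model, train_loader and compute_loss
# are placeholders for your own objects. Tune lr and epochs per dataset.
epochs = 10
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

for epoch in range(epochs):
    for images, targets in train_loader:
        loss = compute_loss(model(images.to("cuda")), targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```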

@josht000 (Author) commented Dec 5, 2024

OK, I was going to go down the road of adjusting the hyperparameters (LR, epochs, and so on) as if it were a standard training run. I didn't see anything about calibrate_model in your docs.

Our model responds well to standard settings, as if it were a typical COCO dataset.
Question: with that in mind, what exact settings did you use to get your awesome results on the COCO dataset?

@levipereira (Owner) commented

This was a research project that I executed, and although some parameters are documented, many low-level ones are not.
All the parameters I used for COCO are in the codebase. You can look at the pytorch-quantization toolkit documentation to customize the project to your needs:
https://docs.nvidia.com/deeplearning/tensorrt/pytorch-quantization-toolkit/docs/index.html
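
For example, the toolkit lets you pick the input calibrator globally before the model is built (a minimal sketch based on the toolkit docs; whether this repo uses `quant_modules.initialize()` or replaces layers manually, check the codebase):

```python
from pytorch_quantization import quant_modules
from pytorch_quantization import nn as quant_nn
from pytorch_quantization.tensor_quant import QuantDescriptor

# Histogram calibration enables the percentile/entropy/mse methods above;
# the default "max" calibrator supports only max calibration.
quant_desc_input = QuantDescriptor(calib_method="histogram")
quant_nn.QuantConv2d.set_default_quant_desc_input(quant_desc_input)
quant_nn.QuantLinear.set_default_quant_desc_input(quant_desc_input)

# Monkey-patch common torch.nn layers with their quantized counterparts
quant_modules.initialize()
```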

@josht000 (Author) commented Dec 9, 2024

@levipereira

I uncommented the line you suggested and am looking at the results again. I now see that the QAT model is essentially on par with the "origin" model, BUT the scores of both are substantially lower than the FP32 model. Is this due to reparameterization?

Here are the results after QAT:

```
Class     Images  Instances          P          R      mAP50   mAP50-95: 100%|██████████| 707/707 08:38
  all      42371      38473      0.246      0.266      0.147     0.0493
    0      42371       4614      0.248      0.254      0.163     0.0576
    1      42371       1051      0.155      0.133     0.0367    0.00804
    2      42371      19887      0.233      0.351      0.157     0.0447
    3      42371       9666      0.464      0.337      0.292      0.109
    4      42371       3255      0.132      0.251     0.0847     0.0278

QAT: Epoch-10, weights saved as yolov9s_dual_img640_hexablu_v7.2_QAT/percentile_amax/weights/qat_ep_10_ap_0.0493_converted.pt (31.1 MB)
```

```
Eval Model | AP       | AP50     | Precision  | Recall
-------------------------------------------------------
Origin     | 0.048    | 0.143    | 0.247      | 0.237
PTQ        | 0.048    | 0.143    | 0.233      | 0.244
QAT - Best | 0.050    | 0.148    | 0.249      | 0.267
```

However, the original model actually has a mAP50:95 of 0.10112. This is still about a 50% reduction in mAP50:95 from the real original (un-reparameterized) model.

So it appears that the majority of the loss is in the reparameterized model. Is this what you've seen as well? If so... how do I gain the accuracy back? What's going on in reparameterization?

Thanks for your help.

@josht000 (Author) commented Dec 9, 2024

Looking into this issue to see if it fixes my accuracy loss: WongKinYiu/yolov9#198.

It turns out there were no discrepancies with my settings. I'm now trying a converted model without model.half() and a gelan-s.yaml model (see the sketch below).
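
For anyone following along, the idea is to keep the converted weights in FP32 instead of casting to half during conversion; a rough sketch, assuming a YOLOv9-style conversion script (checkpoint keys and the reparameterization step itself are elided and may differ):

```python
import torch

# Load the trained checkpoint and keep FP32 through the conversion;
# calling .half() here is where precision (and possibly accuracy) is lost.
ckpt = torch.load("best.pt", map_location="cpu")
model = ckpt["model"].float()  # instead of ckpt["model"].half()
# ... transfer / reparameterize weights here ...
torch.save({"model": model}, "converted_fp32.pt")
```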

@levipereira (Owner) commented

> original model actually has a mAP50:95 of 0.10112

Here is the problem: your model has very poor baseline performance (mAP50:95 of 0.10112), so any quantization or modification will result in a huge performance drop.
Try to improve the training to reach at least 50% mAP. Is your dataset complex? If your dataset is not complex, then you have a serious problem with the dataset itself.

Complex datasets are those with very similar classes or very small object sizes.

Try increasing the network resolution. But you definitely have a serious issue with this model performing at 10% mAP.
Some recommendations:

- Evaluate whether your dataset has inherent complexity (similar classes, small objects).
- If it is not complex, review your dataset quality and labeling.
- Try increasing the input resolution.
- Focus on improving the base model's performance before attempting optimizations like quantization.
- Aim for at least 50% mAP as a baseline.

The current performance (10% mAP) indicates fundamental issues that need to be addressed before considering model optimization techniques.

@josht000 (Author) commented

@levipereira OK, thanks Levi.
Yes, it's a very complex dataset. The classes are very similar and the average object size is around 30x30 pixels. I was going for a quick QAT experiment. There are several things I know I can do to increase the accuracy; I'll try those, report back, and hopefully bring this issue to closure. I don't know if I've ever gotten above 50% mAP50:95, though.
