[Bug]: EfficientAd is slower than other models in anomalib #2150
Comments
Hi, when benchmarking this implementation of EfficientAD, which claims to reach the timing reported in the paper, I get the same speed of 64 ms per image on my GPU. This makes me think that EfficientAD in anomalib isn't slower than it should be. The authors of EfficientAD state: "So you should be sure to set padding=False and use half-precision. Especially half-precision matters for some kinds of GPUs."
I was curious and ran some more experiments. Half precision really matters, for example on a T4 GPU.
What I take from these results (and this isn't big news):
- Half precision matters, especially for convolution-heavy models.
- Image size matters.
- The choice of GPU matters.
- The EfficientAD authors might not have made a fair comparison between the models; I have the feeling they didn't use half precision for all the other models they compare their inference speed with.
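For anyone who wants to repeat this kind of comparison, here is a minimal timing sketch in plain PyTorch. It is not the exact script behind the numbers above; the ResNet-18 model and the 512x512 input are just placeholders for whatever model you want to measure:

```python
import time

import torch
import torchvision

# Placeholder model and input: swap in the anomalib model you actually want to time.
model = torchvision.models.resnet18().cuda().eval()
x = torch.randn(1, 3, 512, 512, device="cuda")

def benchmark(model, x, n=100, warmup=10):
    # Warm-up so kernel compilation and memory allocation don't skew the timing.
    for _ in range(warmup):
        model(x)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n):
        model(x)
    torch.cuda.synchronize()  # wait for the GPU before stopping the clock
    return (time.perf_counter() - start) / n * 1000  # ms per forward pass

with torch.inference_mode():
    fp32_ms = benchmark(model, x)
    # Half precision: cast weights and input to FP16 (torch.autocast is an alternative).
    fp16_ms = benchmark(model.half(), x.half())

print(f"FP32: {fp32_ms:.1f} ms/image, FP16: {fp16_ms:.1f} ms/image")
```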
@alexriedel1 Thanks for your response.
@alexriedel1 When I look at your testing results, it is clear that regardless of the image size, even with half precision, the EfficientAd model is slower than the full-precision Fastflow model. That is quite a surprise ... I have exported my models to ONNX and then converted them to TensorRT on Nvidia. How did you turn half-precision mode on and off?
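For context on the ONNX → TensorRT route: half precision is normally chosen when the engine is built, not at inference time. Below is a minimal sketch using TensorRT's Python API (assuming TensorRT 8.x; the file names are placeholders, and `trtexec --onnx=model.onnx --fp16` does the same from the command line):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)

# Parse the ONNX file exported from anomalib (placeholder path).
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parsing failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # build the engine with half-precision kernels

engine_bytes = builder.build_serialized_network(network, config)
with open("model_fp16.engine", "wb") as f:
    f.write(engine_bytes)
```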
I think that half precision makes quite a big difference due to tensor cores that operate with FP16. I'm not sure if keeping the data in fp32 even guarantees that it's not converted to fp16 behind the scenes (either by PyTorch or CUDA) for the tensor cores. However, it really seems like the speed greatly depends on the image size, and with compute-heavy models the GPU plays quite a big role as well (the H100, for example, has significantly faster tensor cores that work with FP16).
I'm not sure how exactly they did that, but I think every model they used can be set to FP16, BUT some really don't benefit much from it (probably due to the tensor cores mentioned above, which mostly do MMA operations). To answer the other two questions: for the FastFlow model, you can specify the ResNet backbone either via the config file or by passing the corresponding argument to the model (see anomalib/src/anomalib/models/image/fastflow/lightning_model.py, lines 38 to 40 in d1f824a).
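For illustration, a minimal sketch of setting the backbone from Python. This assumes the Fastflow constructor exposes `backbone` and `pre_trained` arguments, which is what the linked lines suggest; check the signature in your anomalib version:

```python
from anomalib.models import Fastflow

# Hypothetical choice of feature extractor for the flow model.
# A heavier backbone (e.g. "wide_resnet50_2") trades speed for accuracy.
model = Fastflow(backbone="resnet18", pre_trained=True)
```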
You could use the
@samet-akcay |
Preferably both during training and inference.
@alexriedel1 @blaz-r @samet-akcay There is a loss of precision when converting EfficientAD to TensorRT. If you have time, can you help me find out what is different from the Python version? I really can't find it.
Unfortunately I don't have experience with TensorRT and can't help in this regard.
Without any further information it's hard to help. Did you follow this example exactly: https://github.com/wang-xinyu/tensorrtx/tree/master/efficient_ad ? How large is your accuracy drop?
The model was trained on my own dataset. The Python test reaches an accuracy of 80%, while the C++ test with the above code only reaches 62.5%.
Have you checked whether all parameters (model weights, normalization values, threshold values) are equal between your TensorRT and PyTorch models?
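As a sketch of the kind of cross-check meant here, you can first compare the PyTorch model against the exported ONNX model with ONNX Runtime, before TensorRT enters the picture at all. The file names are placeholders, and this assumes the forward pass returns a single tensor:

```python
import numpy as np
import onnxruntime as ort
import torch

# Hypothetical files: a fully saved PyTorch module and the ONNX model exported from it.
torch_model = torch.load("efficient_ad.pt", map_location="cpu").eval()
session = ort.InferenceSession("efficient_ad.onnx")

x = torch.randn(1, 3, 256, 256)
with torch.no_grad():
    torch_out = torch_model(x).numpy()

input_name = session.get_inputs()[0].name
onnx_out = session.run(None, {input_name: x.numpy()})[0]

# A large difference here points at the export itself rather than the TensorRT build.
print("max abs diff:", np.abs(torch_out - onnx_out).max())
print("allclose (atol=1e-4):", np.allclose(torch_out, onnx_out, atol=1e-4))
```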
After comparing the code: inference does not include Dropout, because that layer is only active during training and does not affect inference. I have no way to check the model weights directly. For now I feed in a single image and compare the TensorRT output of each layer against the Python output. The biggest difference shows up in map_combined = 0.5 * map_st + 0.5 * map_stae, while no big changes were found in map_st and map_stae individually. Compared outputs: Python anomaly_map vs. C++ anomaly_map; normalized map_stae (shape torch.Size([1, 1, 256, 256])); normalized map_st (tensor(-0.0656, device='cuda:0')); map_combined = 0.5 * map_st + 0.5 * map_stae.
The values in your map_st are not equal, and that difference would explain your accuracy drop, I think.
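To make that concrete, a small sketch for quantifying the discrepancy once both map_st tensors have been dumped to disk (the file names are hypothetical):

```python
import torch

# Hypothetical dumps: map_st from the Python pipeline and from the C++/TensorRT run.
map_st_python = torch.load("map_st_python.pt")
map_st_trt = torch.load("map_st_trt.pt")

diff = (map_st_python - map_st_trt).abs()
print("max abs diff :", diff.max().item())
print("mean abs diff:", diff.mean().item())

# Whether a given difference matters depends on how close the scores sit to the threshold.
print("allclose (atol=1e-3):", torch.allclose(map_st_python, map_st_trt, atol=1e-3))
```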
Describe the bug
I had the impression that the EfficientAd model would be among the fastest in anomalib in terms of prediction times. To verify that I have trained three models, Padim, Fastflow, and EfficientAd, all with the same training data and an image dimension of 512x512 pixels. Then I have written a small script that loads these models, warms up the GPU, and then runs prediction on 100 images. I measure only the model forward time, no image loading or any pre- or post-processing.
With the models exported to ONNX I get these results (avg. model forward times on 100 images):
So in other words: the EfficientAd model is the slowest of the three, and Padim the fastest - I thought it would be the other way round. Am I missing something, or is this a bug in anomalib?
Dataset
Other (please specify in the text field below)
Model
Other (please specify in the field below)
Steps to reproduce the behavior
I trained three models on the same dataset, then predicted 100 images with each of them and measured the avg. model forward / inference time, without pre- or post-processing (a sketch of such a measurement loop is shown below).
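A sketch of such a measurement loop for the ONNX-exported models, using ONNX Runtime with the CUDA provider (file names and input size are placeholders, not the author's exact script):

```python
import time

import numpy as np
import onnxruntime as ort

# Placeholder: one of the exported models (Padim, Fastflow, or EfficientAd).
session = ort.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])
input_name = session.get_inputs()[0].name
x = np.random.rand(1, 3, 512, 512).astype(np.float32)

# Warm-up runs so one-time initialization is excluded from the measurement.
for _ in range(10):
    session.run(None, {input_name: x})

n = 100
start = time.perf_counter()
for _ in range(n):
    session.run(None, {input_name: x})
print(f"avg forward time: {(time.perf_counter() - start) / n * 1000:.1f} ms")
```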
OS information
OS information:
Expected behavior
I would expect the EfficientAd net to be considerably faster than the other models.
Screenshots
No response
Pip/GitHub
pip
What version/branch did you use?
No response
Configuration YAML
Logs
Code of Conduct