Tracking Benchmarks #29
Comments
I've done a bit of research on benchmarking and found it hard to come up with a good strategy. What do you propose?
Have a look at https://github.com/python/pyperformance, which is used for benchmarking Python itself. It lets you create a suite of tests to run regularly. The biggest challenge is having a properly configured benchmark machine, since many factors that have nothing to do with the actual code changes can shift the results. But I still think it's worthwhile to check for unintended regressions that are big enough to be caught by noisy benchmarking. As long as you use the same hardware and follow some best practices for benchmarking, I think you'll be fine. Having some numbers is better than having none at all. We can always improve the benchmark setup once the noise starts to dominate.
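To make the "noisy but still useful" idea concrete, here is a minimal sketch using only the standard library: time a routine several times, keep the fastest round as the least noisy estimate, and flag a regression only when the slowdown exceeds a generous tolerance. The helper names, the `cross` stand-in workload, and the 25% tolerance are assumptions for illustration; none of them come from pyperformance or pylinalg.

```python
import timeit


def best_time(func, repeat=5, number=1000):
    """Return the fastest per-call time in seconds over `repeat` rounds."""
    totals = timeit.repeat(func, repeat=repeat, number=number)
    # The minimum is the round least disturbed by other activity on the machine.
    return min(totals) / number


def is_regression(baseline, current, tolerance=0.25):
    """True if `current` is slower than `baseline` by more than `tolerance`."""
    return current > baseline * (1.0 + tolerance)


def cross(a=(1.0, 0.0, 0.0), b=(0.0, 1.0, 0.0)):
    """Hypothetical stand-in workload: cross product of two 3-vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


if __name__ == "__main__":
    current = best_time(cross)
    # Assumption: in practice the baseline is loaded from a stored earlier run.
    baseline = current
    print(f"{current * 1e6:.2f} us per call")
    print("regression:", is_regression(baseline, current))
```

A wide tolerance like this only catches large regressions, which matches the goal of starting with rough numbers and tightening the setup later.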
Scikit-Image also maintains a benchmark suite. At the time I was contributing it was still being finalized, so I can't say how much it is actually enforced, but it could serve as inspiration. Docs: https://scikit-image.org/docs/stable/contribute.html#benchmarks
I really like the airspeed velocity (asv) tool that scikit-image uses! As Ivo indicates, to me the core issue has always been comparing results from one benchmark run to another, between machines and even on the same machine over time. The concept that asv brings to the table is a pretty clever solution: just run the benchmark for older commits here and now, on the same machine, and compare.
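For reference, a minimal sketch of what an asv benchmark file could look like. asv times methods whose names start with `time_` and calls `setup` beforehand, so data creation stays out of the measurement. The `normalize_rows` function is a hypothetical pure-Python stand-in, not an actual pylinalg routine, and the class name is arbitrary.

```python
import math
import random


def normalize_rows(rows):
    """Hypothetical stand-in for a library routine: normalize each 3-vector."""
    out = []
    for x, y, z in rows:
        length = math.sqrt(x * x + y * y + z * z)
        out.append((x / length, y / length, z / length))
    return out


class TimeNormalize:
    """Benchmark class; asv times each method whose name starts with `time_`."""

    def setup(self):
        # Called before timing, so building the input is not measured.
        rng = random.Random(0)  # fixed seed keeps the input identical across runs
        self.rows = [
            (rng.uniform(1, 2), rng.uniform(1, 2), rng.uniform(1, 2))
            for _ in range(1000)
        ]

    def time_normalize_rows(self):
        normalize_rows(self.rows)
```

With a file like this in the benchmark directory, asv can run the same benchmark against two commits on the same machine and report the relative difference, which sidesteps the problem of comparing numbers across machines.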
I've been working on a continuous benchmarking tool called Bencher that supports both use cases, either tracking benchmarks over time for comparison or using relative benchmarking (similar to
Since we constantly talk about improving the performance of the routines provided by pylinalg, I was wondering if there are any plans on formally tracking benchmarks of various routines to make sure that we don't regress in terms of speed.