Tracking Benchmarks #29

FirefoxMetzger · 2022-12-24T08:30:09Z

Since we constantly talk about improving the performance of the routines provided by pylinalg, I was wondering if there are any plans on formally tracking benchmarks of various routines to make sure that we don't regress in terms of speed.

Korijn · 2022-12-24T10:07:31Z

I've done a bit of research on benchmarking and found it hard to come up with a good strategy. What do you propose?

ivoflipse · 2022-12-24T10:31:33Z

Have a look at https://github.com/python/pyperformance which is used for benchmarking Python itself. It allows you to create a suite of tests to run regularly.

The biggest challenge comes from having a properly configured benchmark machine, as a lot of factors can contribute to different benchmark results that have nothing to do with the actual code changes. But I still think it's worthwhile to check for unintended regressions that are big enough to be caught by noisy benchmarking.

As long as you use the same hardware and follow some best practices for benchmarking, I think you'll be fine. Having some numbers may be better than none at all. We can always improve the benchmark setup once the noise starts to dominate.
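To make the "big enough to be caught by noisy benchmarking" idea concrete, here is a minimal sketch using only the standard library. The helper names `measure` and `is_regression` and the 25% tolerance are assumptions chosen for illustration; they are not part of pyperformance or pylinalg.

```python
import timeit


def measure(func, repeat=5, number=100):
    """Return per-call timings in seconds for `func`, one value per repeat."""
    totals = timeit.repeat(func, repeat=repeat, number=number)
    return [total / number for total in totals]


def is_regression(baseline, candidate, tolerance=0.25):
    """Flag a slowdown only when it exceeds the noise tolerance.

    Compares the fastest run of each sample, because the minimum is the
    measurement least disturbed by other activity on the machine.
    The default tolerance of 25% is an arbitrary, hypothetical choice.
    """
    return min(candidate) > min(baseline) * (1.0 + tolerance)
```

Both samples would need to be collected on the same hardware for the comparison to mean anything, and the tolerance can be tightened once the setup becomes less noisy.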

FirefoxMetzger · 2022-12-24T11:00:38Z

Scikit-Image also maintains a benchmark suite. At the time I was contributing it was still being finalized, so I can't say how much it is actually enforced, but it could serve as inspiration:

Docs: https://scikit-image.org/docs/stable/contribute.html#benchmarks
CI: https://github.com/scikit-image/scikit-image/blob/main/.github/workflows/benchmarks.yml

Korijn · 2022-12-27T08:34:26Z

I really like the airspeed velocity (asv) tool that sk-image uses! Like Ivo indicates, to me the core issue has always been how to compare results from one benchmark run to another, between machines and even on the same machine over time. The concept that asv brings to the table - just run the benchmark for older commits here and now on the same machine, and compare - is a pretty clever solution!
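For reference, an asv benchmark file is plain Python: asv collects classes from the benchmark directory and times every method whose name starts with `time_`. Below is a rough sketch of what such a file could look like. The function `vector_normalize` is a hypothetical pure-Python stand-in so that the example is self-contained; a real suite would import the routine under test from pylinalg instead.

```python
import math
import random


def vector_normalize(vectors):
    """Hypothetical stand-in for the library routine being benchmarked."""
    result = []
    for x, y, z in vectors:
        norm = math.sqrt(x * x + y * y + z * z)
        result.append((x / norm, y / norm, z / norm))
    return result


class TimeVectorNormalize:
    """asv times every method whose name starts with `time_`."""

    def setup(self):
        # Runs before the benchmark; its cost is not included in the timing.
        # A fixed seed keeps the input identical across commits.
        rng = random.Random(0)
        self.vectors = [
            (rng.uniform(1, 2), rng.uniform(1, 2), rng.uniform(1, 2))
            for _ in range(1000)
        ]

    def time_normalize(self):
        vector_normalize(self.vectors)
```

Because the file itself never changes between runs, asv can check out older commits, run the same benchmark against each of them on the current machine, and compare the resulting timings.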

epompeii · 2023-04-17T13:18:43Z

I've been working on a continuous benchmarking tool called Bencher that supports both use cases, either tracking benchmarks over time for comparison or using relative benchmarking (similar to asv): https://github.com/bencherdev/bencher
The idea is for it to be like a pyperformance for your application code. Would that be helpful here?
