
CI for benchmarks to track performance #13

Open
vogler opened this issue May 28, 2021 · 8 comments

Comments

@vogler

vogler commented May 28, 2021

GitHub Actions is fine for running the regression tests, but we also want something to track performance (and precision) on long-running benchmarks.


Originally posted by @michael-schwarz in goblint/analyzer#234 (comment):

or, better, having some server with a job queue checking every commit (https://github.com/goblint/analyzer/settings/hooks).

Something along these lines was supposed to be the outcome. Basing it on the BenchExec framework has the advantage that it is the same setup as for SV-COMP, so all those tests work out of the box and our own tests can be integrated without too many issues. Also, the table-generator tool would in theory give us a nice diff of what changed between runs (or configurations), which could simply be served at some URL so one can look at the results without having to ssh into the machine.

One probably wants some glue code so that this is not all shell scripts but somewhat more robust. But the idea was exactly this.

checking every commit

This is a bit optimistic, given that one of these runs will likely take >12h (at least for SV-COMP) even on the new hardware.

@sim642
Member

sim642 commented May 28, 2021

Now that this issue exists, I'll write down one thought: maybe we could just use a GitHub Actions self-hosted runner for this: https://docs.github.com/en/actions/hosting-your-own-runners/about-self-hosted-runners. I haven't looked into it in detail, but it appears to already have a built-in job queue system, so it would avoid a lot of reinventing the wheel.

Each workflow run is limited to 72 hours.

This limit should be sufficiently high that we can run big jobs which the free GitHub-hosted runners probably don't allow.

GitHub Actions can also schedule jobs à la cron instead of trying to run them on each push: https://docs.github.com/en/actions/reference/events-that-trigger-workflows#schedule. There even appears to be a way to trigger jobs manually.

And the integration effort would be minimal: no need to build a properly authenticated HTTPS webhook server to translate GitHub hooks into the testing framework.
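Sketched as a scheduled workflow, such a setup might look roughly like the following. This is an illustrative sketch, assuming a registered self-hosted runner; the file path, cron time, action versions, and benchmark script are hypothetical, not the actual Goblint configuration:

```yaml
# .github/workflows/benchmarks.yml  (hypothetical sketch)
name: nightly-benchmarks

on:
  schedule:
    - cron: '0 2 * * *'    # nightly at 02:00 UTC instead of on every push
  workflow_dispatch:        # also allow manual triggering from the Actions tab

jobs:
  benchmark:
    runs-on: self-hosted    # picked up by the registered self-hosted runner
    timeout-minutes: 4200   # 70h, safely below the 72h workflow run limit
    steps:
      - uses: actions/checkout@v3
      - name: Run benchmark suite
        run: ./scripts/run-benchmarks.sh   # hypothetical benchmark driver
```

With `schedule` plus `workflow_dispatch`, the runner only ever sees one nightly job plus any manually requested ones, and GitHub's own queue serializes them.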

@vogler
Author

vogler commented May 28, 2021

Does it make sense to look at something like https://www.jenkins.io/ or do we make our own?

A simple implementation would probably be some Node.js server acting as an endpoint that reacts to the GitHub commit webhook.
There are libraries for job queues with priorities and web interfaces: https://github.com/Automattic/kue, https://github.com/OptimalBits/bull

@vogler
Author

vogler commented May 28, 2021

Maybe we could just use a GitHub Actions self-hosted runner for this […] it would avoid a lot of reinventing of the wheel.

Ok, that looks like an easy option.
Just need to make sure the limits are fine for the selected benchmarks:

Each job for self-hosted runners can be queued for a maximum of 24 hours. If a self-hosted runner does not start executing the job within this limit, the job is terminated and fails to complete.

@vogler
Author

vogler commented May 28, 2021

If we do our own, there'd be no limits, and one could think about more sophisticated prioritization strategies.
What's the GitHub behavior? Start if nothing is running, ignore subsequent commits until the run is done, and then start accepting again?
Ideally you'd have the same, but then start bisecting when idle if some change exceeds a certain threshold.

@sim642 sim642 added the testing label May 28, 2021
@sim642
Member

sim642 commented May 28, 2021

If we do our own, there'd be no limits and one could think about more sophisticated prioritization strategies.

I would be very cautious about trying to roll something decent from scratch. If we really need something beyond those limits, it might still be worth looking at Jenkins or another existing, mature system; for example, Jenkins even seems to have a plugin for bisecting. I'm not sure how necessary such functionality would be, though: if we already run nightly benchmarks, there's probably not that much to bisect. And even if there is, one can bisect a single benchmark (or a handful) locally by hand instead of having to bisect with an entire 12h suite.

@vogler
Author

vogler commented May 28, 2021

Yeah, just some greenfield thinking, but the devil is likely in the details 😄
Bisecting is also good for looking back to see which changes had a big (unexpected) impact.

@michael-schwarz michael-schwarz transferred this issue from goblint/analyzer Jul 14, 2022
@michael-schwarz
Member

Moved it over here, as it seems more appropriate.

@sim642 sim642 added benchmark and removed testing labels Nov 15, 2022
@michael-schwarz
Member

We now have a minimum working version of this running on server01 and reporting to Zulip.
