CI for benchmarks to track performance #13
Now that this issue exists, I'll write down one thought. Maybe we could just use a GitHub Actions self-hosted runner for this: https://docs.github.com/en/actions/hosting-your-own-runners/about-self-hosted-runners. I haven't looked into it in depth, but it appears to already have a built-in job queue system etc., so it would avoid a lot of reinventing the wheel.
The job time limit for self-hosted runners should be sufficiently high that we can run big jobs that the free GitHub-hosted runners probably don't allow. Also, GitHub Actions can schedule jobs à la cron instead of trying to run them on each push: https://docs.github.com/en/actions/reference/events-that-trigger-workflows#schedule. There even looks to be a way to trigger jobs manually. And of course the integration would be minimal: no need to build some properly authenticated HTTPS webhook server to handle GitHub hooks into testing-framework or whatever.
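Sketched concretely, a scheduled workflow on a self-hosted runner might look like the fragment below. The file name, cron time, timeout, and benchmark script are all illustrative assumptions, not anything that exists in this repo; only `schedule`, `workflow_dispatch`, and `runs-on: self-hosted` are the standard GitHub Actions mechanisms mentioned above.

```yaml
# .github/workflows/benchmarks.yml (hypothetical file name)
name: nightly-benchmarks
on:
  schedule:
    - cron: '0 2 * * *'    # every night at 02:00 UTC
  workflow_dispatch: {}    # allows manual triggering from the Actions tab
jobs:
  benchmarks:
    runs-on: self-hosted   # picked up by the machine's registered runner
    timeout-minutes: 1440  # well beyond the 6 h limit of hosted runners
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/run-benchmarks.sh  # hypothetical benchmark entry point
```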
Does it make sense to look at something like https://www.jenkins.io/, or do we make our own? A simple implementation would probably be some Node.js server acting as an endpoint that reacts to the GitHub commit hook.
Ok, that looks like an easy option.
If we do our own, there'd be no limits, and one could think about more sophisticated prioritization strategies.
I would be very cautious about trying to roll something decent from scratch. If we really need something beyond those limits, then it still might be worth looking at Jenkins or something else existing and mature. For example, Jenkins even seems to have a plugin for bisecting, although I'm not sure how necessary such functionality would be. If we already do nightly benchmarks, then there's probably not that much to bisect. And even if there is a need, one can bisect a single benchmark or a handful of benchmarks locally by hand, instead of having to bisect with an entire 12 h suite or whatever.
Yea, just some greenfield thinking, but likely the devil is in the details 😄
Moved it over here, as it seems more appropriate here.
We now have a minimum working version of this running on |
GitHub Actions are fine for running the regression tests, but we also want something to track performance (and precision) for long-running benchmarks.
Originally posted by @michael-schwarz in goblint/analyzer#234 (comment):
Something along these lines was supposed to be the outcome. Basing it on this benchexec framework has the advantage that it is the same setup as for SV-Comp, so all those tests work out of the box and our own tests can be integrated without too many issues. Also, this `tablegen` tool would in theory give us a nice diff of what changed between runs (or configurations) that could simply be served at some URL, so one can look at the results without having to ssh to the machine. One probably wants some glue code so that this is not all shell scripts but a bit more robust. But the idea was exactly this.
This is a bit optimistic, given that one of these runs will likely take >12 h (at least for SV-Comp), even on the new hardware.