[CI] Idea: Merge gating #103180
Pinging @elastic/kibana-operations (Team:Operations)
Nice, I'm excited to see how this turns out! BTW, this is, essentially, the main feature of Zuul CI as well. Perhaps there is some further inspiration or implementation detail that will be useful to you here: https://zuul-ci.org/docs/zuul/discussion/gating.html Your description already sounds pretty similar, though.
@brianseeders Thanks for rebooting this! @nkammah @kseniia-kolpakova @cachedout @alpar-t Maybe we can do this more generally?
Heads up, one of the reasons we enabled Mergify was for this kind of merge automation, if it helps.
@v1v the main problem with mergify for us is that it doesn't seem to support parallelization or batches. Our CI just takes too long to support a serial one-at-a-time queue.
We have been playing around with the idea in Cloud too. The description of this issue would, I think, work well for Cloud as well; the only additional feature I see us needing is the ability to manually override the queue or move a PR to the front (maybe both). The reason is that we may have infrastructure PRs that we need to merge very quickly, as we might be responding to an incident in ESS.

For us, an added major benefit would be that we don't need to come up with tools and process to triage and route failures on the master branch, as these should become extremely rare and would likely have more to do with CI infrastructure than with the actual change. In fact, once we gain enough confidence, it doesn't really make sense to run builds on master any more.

Since we have build avoidance with Gradle, assuming that we still run the PR checks as they are and run that (or a potentially extended) set of tests on merge, I'm not sure to what extent it would be worth investing in a complex solution that tries to optimistically parallelize the merges. Most of the checks would already have been run on the PR (especially if master is not different in meaningful ways), so it won't take long to assess whether a PR can be merged, and we don't really have that many PRs all merging at once, nor urgency around it most of the time. It would be nice for sure, but I'm not sure it's something we absolutely need in a first implementation.

The challenges we identified are flaky tests and dependencies on external resources on the internet, both of which can cause a lot of back and forth, reducing trust in the process. Sometimes these dependencies disappear from the internet and would result in a failure popping up unjustly (e.g. a PR failing that changes a completely different part), which the PR author is not best suited to handle. We are addressing these separately, but I think they need to be addressed before the approach outlined here can be successful.

I'm curious about the advantages of the proposed implementation that uses a new branch. I was thinking we could also automate it by having the bot "click" the merge button when it knows things are OK to merge, giving more visibility into the process and keeping things more familiar for folks, merging master in as many times as it needs to. That would, I think, keep GitHub as a single UI for interacting with the process, as we could also link the individual builds to the individual commits on the PR.
If we do it, we're planning to have a way to tag a PR as "emergency", which would immediately bump it to the front of the queue and let it run in a batch by itself, isolated from other PRs. Pushing straight to master is also an option, which would cause all of the PRs in the queue to restart testing in our current proposal.
A branch is simply a place to put the commits that need to be tested. If PRs A, B, and C are going to be tested in a batch together, we need a commit that contains all of their changes combined on top of the current master, and that commit has to live on some branch.
We considered this option as well. The main downside is that the commits tested by the gating system will be different from the commits that ultimately end up in master. If we simply fast-forward master to the latest tested batch, the PRs will still show up as merged, and master will contain exactly the commits that were tested. Doing it this way will simplify some of the implementation details, as commits will get created at the beginning and then essentially flow through the system to the end. The big downside is that we are going to be squashing commits inside the PR before we test, since we won't be able to do a real squash merge.
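For illustration only, here is a rough sketch of that "batch branch + fast-forward" flow, assuming a bot with a local clone and push access. None of the names below (buildBatch, promoteBatch, gating/batch-1) come from Kibana's tooling; they are made up for this sketch.

```ts
// Hypothetical sketch of assembling and promoting a tested batch.
import { execSync } from 'child_process';

const sh = (cmd: string) => execSync(cmd, { stdio: 'inherit' });

// Assemble a batch on a staging branch: start from the tip of master and merge
// each queued PR branch in order, so the resulting commit contains the combined
// changes of PRs A, B, and C.
function buildBatch(prBranches: string[], stagingBranch = 'gating/batch-1') {
  sh(`git fetch origin`);
  sh(`git checkout -B ${stagingBranch} origin/master`);
  for (const branch of prBranches) {
    sh(`git merge --no-ff origin/${branch} -m "Batch: merge ${branch}"`);
  }
  sh(`git push --force origin ${stagingBranch}`);
}

// If CI passes on the staging branch, master is fast-forwarded to it, so master
// ends up containing exactly the commits that were tested.
function promoteBatch(stagingBranch = 'gating/batch-1') {
  sh(`git checkout master`);
  sh(`git merge --ff-only origin/${stagingBranch}`);
  sh(`git push origin master`);
}

// Example: buildBatch(['pr-a', 'pr-b', 'pr-c']); ...run CI...; promoteBatch();
```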
I believe they do support batches, but they call them Speculative Checks
I came across another service today that's trying to fit this niche as well. Could be worth evaluating: https://mergequeue.com/ |
Closing this issue, thanks for everyone's input. Seeing as the GitHub solution does not support squash merges, we are going to continue with this. The next phase will be generating the project specification, and we will ping folks here once that's available.
@tylersmalley given how many teams have expressed interest in such a feature (Cloud Delivery cc @alpar-t @jeredding, O11y cc @v1v @KseniaElastic), would you be open to approaching this collaboratively? If so, let's sync up offline on the best collaboration model.
@nkammah FYI: you meant to ping @kseniia-kolpakova instead. :) We'd probably be interested in some exploratory discussions on this topic, but it's not a top priority ATM. |
https://github.blog/2023-07-12-github-merge-queue-is-generally-available/ Squash and merge does appear to be an option now. |
@jbudz thanks for the heads up here. We have an item on our roadmap to look at this, but I was waiting for that GA announcement. Aside from that, when the time comes to look into this, we should test it in a smaller repo, because a couple of months ago when we were looking into it there were still some bugs that would prevent us from using it for Kibana.
We just hit yet another example of where a merge queue would have saved us from two PRs landing in the default branch that each passed CI on its own but broke it when combined.
After researching, debating, and trialing what would be needed on our side to enable a merge queue, we decided not to go forward with it this time. The main reason is that we concluded the cost of implementing one exceeds its benefits: a merge queue would fix a relatively uncommon failure mode, at the cost of added complexity in the most common workflow and a significant increase in Buildkite minutes, which we are currently trying to reduce.
Overview
The Not Rocket Science Rule Of Software Engineering:
"Automatically maintain a repository of code that always passes all the tests."
The Kibana Operations team has been brainstorming around the idea of implementing this rule for the Kibana repository. The high-level idea is that CI for master is always green, and that this is achieved by merging code using automated processes (instead of humans clicking the Merge button). Nothing gets merged unless it's guaranteed to work, assuming no network/dependency issues, etc.
The high-level process
If CI fails:
We have discussed many more implementation details. This is just a high-level overview.
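As one illustration of the kind of loop being described (not the actual proposal; every name here is hypothetical, and the "shrink the batch on failure" strategy is just one simple option), a sketch in TypeScript:

```ts
// Hypothetical merge-gating loop: approved PRs wait in a queue, batches are
// tested together, and failures shrink the batch until the offending PR is
// found and removed. All names are illustrative.
interface QueuedPr {
  number: number;
  branch: string;
  emergency?: boolean; // emergency PRs jump the queue and run in a batch by themselves
}

async function gatingLoop(
  queue: QueuedPr[],
  runCi: (branches: string[]) => Promise<boolean> // builds the batch commit and runs CI against it
) {
  const maxBatchSize = 5; // illustrative default
  let batchSize = maxBatchSize;

  while (queue.length > 0) {
    const emergency = queue.find((pr) => pr.emergency);
    const batch = emergency ? [emergency] : queue.slice(0, batchSize);

    const passed = await runCi(batch.map((pr) => pr.branch));

    if (passed) {
      // The tested batch would be fast-forwarded into master (not shown);
      // drop its PRs from the queue and go back to full-size batches.
      for (const pr of batch) queue.splice(queue.indexOf(pr), 1);
      batchSize = maxBatchSize;
    } else if (batch.length === 1) {
      // A single failing PR is kicked out of the queue so its author can fix it.
      queue.splice(queue.indexOf(batch[0]), 1);
    } else {
      // The batch failed: test a smaller batch next, to home in on the offending PR(s).
      batchSize = Math.ceil(batch.length / 2);
    }
  }
}
```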
Pros
- (In theory) CI for master is always green
- No more needing to merge upstream because of an unrelated failure

Cons
- The bot force-pushes PR branches, so PRs end up with entries like "mergebot force-pushed the pr branch from aabbccdd to aabbccee 1 day ago"
Prior Art / Links