-
Notifications
You must be signed in to change notification settings - Fork 208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use needles from correct ref of NEEDLES_DIR #5175
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think creating another worktree for every rev we might encounter scales very well. It would make more sense to use git show …
to obtain a specific needle for a specific rev.
Note that we've been pondering about this problem ourselves and coming up with an approach that scales well enough for being usable on our production instances is hard (e.g. see https://progress.opensuse.org/issues/56789. https://progress.opensuse.org/issues/91905 and other related tickets).
I appreciate the feedback!
I agree that it's unlikely to scale well, using |
By the way, I think the most important point of my review was that this should be put behind a configuration option and be disabled by default - at least until we are confident this works well in all kinds of instances (and currently we are likely pretty far from that). |
Yes, definitely, this patch is very much a hack that was put together to get a quick and nasty solution to the problem for a specific use case, I realise there's an awful lot of change required for it to be anything like production-ready. |
562ad9e
to
30a6b8e
Compare
I've changed this to make use of It now also has a config option to prevent the feature being used by default. The hacky clean up has been removed for now, so another method of cleaning up will be needed. |
30a6b8e
to
0953592
Compare
7756787
to
6c60bba
Compare
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
6c60bba
to
fe86c76
Compare
fe86c76
to
bbe3787
Compare
f2d4db8
to
e6992d7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error handling should be improved. I commented on the most important places.
Also, see my other inline-comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error handling should be improved as mentioned in my last review.
The cleanup code also needs further improvement, see https://github.com/os-autoinst/openQA/pull/5175/files#r1480067530.
@sdclarke You're not actively working on this, right? I would take over as we'd actually also like to have this feature ourselves (see https://progress.opensuse.org/issues/154783). I would keep your commits as-is (except for rebasing) and add my own commit on top of them. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #5175 +/- ##
==========================================
- Coverage 98.96% 98.86% -0.11%
==========================================
Files 396 398 +2
Lines 39381 39503 +122
==========================================
+ Hits 38975 39054 +79
- Misses 406 449 +43 ☔ View full report in Codecov by Sentry. |
See https://progress.opensuse.org/issues/154783#note-10 for the most recent update on this. |
This pull request is now in conflicts. Could you fix it? 🙏 |
Is this currently being worked on, or is there anything that could be done to help get it merged? |
There are at least three points necessary:
|
Also see https://progress.opensuse.org/issues/154783#note-10 and my subsequent comments in that ticket for what's left. |
The minimum effort we need to put into this PR to be mergable is:
If anyone wants to work on it (e.g. @sdclarke or someone from our team), drop a comment here for better coordination. We now also have https://progress.opensuse.org/issues/162035 for tracking this. |
07e3a1f
to
9567c32
Compare
@sdclarke I saw that you pushed an update. Can you state what changes we should expect? |
This push was just fixing the conflicts, I'm trying to look for the correct places to improve the test coverage but I'm having some trouble working out where the changes are needed. Any pointers in that regard would be appreciated. Assuming I can deal with the test coverage issues, would that be enough to get this merged for now? From my perspective it would be good to get this over the line as it works well enough for our purposes, and it is disabled by default in the config. We can potentially revisit it later to address any outstanding points. |
As you are also adding new modules you could create new test modules like
Yes. I already asked the team to support for which I created ticket https://progress.opensuse.org/issues/162035 but unfortunately no one picked it up yet. |
@sdclarke As stated in #5175 (comment) the answer is no - nothing else needs to be done except further problems are found in PR review. However, I'd strongly recommend looking into point 2 of this note because in my opinion it makes not much sense to enable it on a production instance without it. |
This pull request is now in conflicts. Could you fix it? 🙏 |
}; | ||
chomp $needles_ref; | ||
return undef unless $needles_ref; | ||
return undef unless run_cmd_with_log ['git', '-C', $needle_dirs, 'fetch', '--depth', 1, 'origin', $needles_ref]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sdclarke We have not forgotten about this.
We are currently working on some git features and fixes, and I just wanted to note that we have a git_auto_clone
feature now that will fetch any requested ref on the webui (branch or sha) in a minion job before the job starts running. (Note that it only works if it is the same origin and not some fork.)
That would make this fetch
here not necessary anymore and also avoid the need for a lockfile.
The sha should always be in NEEDLES_GIT_HASH
so the rev-parse
also seems unnecessary.
Would you consider to use the git_auto_clone
feature in order to not have to implement that part here?
I'm not sure if there are any cases which git_auto_clone
wouldn't cover. @Martchus said there might be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that git_auto_clone
only helps if a Git-URL is specified as NEEDLES_DIR
(or CASEDIR
which is relevant if needles are part of the tests).
Even then it is not guaranteed to fetch the refs the test actually runs on. The test might specify a branch name (or no branch which means "default branch") and when the test is finally executed on the worker host the ref that branch points to might differ from the ref checked out by the web UI at the time the test was scheduled.
Additionally, we probably want to have some cleanup for the fetching done by the git_auto_clone
feature. So the ref might have also already been cleaned up.
So I think we need that fetch in any case.
That would make this fetch here not necessary anymore and also avoid the need for a lockfile.
I don't think we need a lockfile here. We can probably arrange it so that it would not matter if any other commands are executed in the middle. (The use of FETCH_HEAD
definitely needs to go way but it can probably be replaced with whatever we specify on the fetch
command.)
The sha should always be in NEEDLES_GIT_HASH so the rev-parse also seems unnecessary.
That's true. So we could (and probably should) get rid of it. If one specifies a branch name via NEEDLES_DIR
the NEEDLES_GIT_HASH
still has the concrete sha.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that
git_auto_clone
only helps if a Git-URL is specified asNEEDLES_DIR
(orCASEDIR
which is relevant if needles are part of the tests).
- The PR description says that it is about CASEDIR/NEEDLES_DIR being set.
- And if it is not set, there is no certain ref to fetch, just the default remote branch
- In the near future,
git_auto_clone
will also automatically update clones if CASEDIR/NEEDLES_DIR is empty. It will just update the checkout branch in the checked out directory.
And this is all we need, no?
What else needs to be fetched that git_auto_clone does not fetch?
The only thing could be forks, but as far as I can see this PR also doesn't handle forks.
Even then it is not guaranteed to fetch the refs the test actually runs on. The test might specify a branch name (or no branch which means "default branch") and when the test is finally executed on the worker host the ref that branch points to might differ from the ref checked out by the web UI at the time the test was scheduled.
What's checked out on the webui does not matter. The important thing is that it has been fetched at some point, so we can do a git show
on it, right?
Additionally, we probably want to have some cleanup for the fetching done by the
git_auto_clone
feature. So the ref might have also already been cleaned up.
Yes. I'm not sure what git already offers there and how to know what we can cleanup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's checked out on the webui does not matter. The important thing is that it has been fetched at some point, so we can do a git show on it, right?
I meant "fetched by the web UI". And I meant the following sequence of events:
- A test is scheduled pointing to a branch, e.g. just the default branch.
- The web UI fetches the latest ref of that branch because
git_auto_clone = 1
is configured. - Someone pushes additional commits to that branch or even force-pushes there.
- The test is finally assigned to a worker and that worker now fetches a different ref than the web UI in step 2 so
NEEDLES_GIT_HASH
points to a ref that is not yet checked out (despitegit_auto_clone = 1
). - Someone wants to display the candidate needles for that job. We need to fetch the missing ref.
And this is not very contrived. That someone specifies a branch where many commits happen (e.g. the default branch of some repo with many contributors) is probably a common use case. That it can take between the scheduling of a job and the time it actually gets executed is also something we need to expect.
Just to clarify a few things, I'm currently trying to understand the feature.
The title is talking about NEEDLES_DIR, the code mentions NEEDLES_GIT_SHA, but the PR description talks about CASEDIR and TEST_GIT_SHA. I was also wondering: Even if you don't specify CASEDIR/NEEDLES_DIR, the feature is about showing the correct ref from the time the test was running, even if there was a newer version in git meanwhile, right? Because we have this ticket, but I can't find a clear description of the story, so no acceptance criteria, so I have to figure it out from the code. |
@sdclarke Hey! 👋 Did you have a chance to look at the ticket description and the acceptance criteria? To be sure we're on the same page now on what's being implemented here. |
Hi all, sorry for the lack of updates, I haven't had time to follow up on this lately.
Looking at the ticket, the acceptance criteria look to match what I was trying to achieve.
I think the title became out of sync with the description after an initial round of reviews, when I first wrote it the workflow I was using had the needles directory within the CASEDIR so I didn't initially consider them separately. I subsequently used the NEEDLES_DIR in this PR but must have missed the description.
The original problem we were having that inspired this was that the web UI might not have any version of a particular needle, as we had a workflow which involved testing new needles in branches of the tests/needles repository before merging them to the main branch. But ultimately, yes your description of the problem this PR is attempting to solve is correct, I based it on CASEDIR/NEEDLES_DIR as that was how we used it, and it provided the path of least resistance to doing what we needed. As mentioned I don't have time to work on this for now, once again sorry for my delayed response. |
514c06a
to
b31f643
Compare
If the NEEDLES_DIR (or CASEDIR as a backup) variable is a git repo URL with a fragment specifying the ref of the repository, we attempt to use it as the source of needles when viewing needle diffs in test results. If the NEEDLES_GIT_SHA variable is set, we use that instead as it should be the most accurate way to determine exactly which commit was used
There is now a minion task which will clean up alternate needle files created by the WebUI. The minimum retention time of the needle files can be set in the config, and defaults to 30 minutes
* Improve error handling * Simplify code * Split code into smaller functions * Move needle-related code into its own module so utilities don't become too big * Avoid using a shell to run Git commands; this breaks when needle paths contain characters that needed escaping * Avoid invoking Git commands if temporary needle files are still present * Use the usual directory the web UI stores cache files in (instead of hard-coding `/tmp/…`) * Output temporary files directly instead of using intermediate buffer * Keep using the real needle path as before for populating last seen/match because using the relative path breaks compatibility with existing databases and this change is supposedly not required anyway * See https://progress.opensuse.org/issues/154783
* Use the settings key `temp_needle_refs_retention` which makes it clear that this setting is about temporary needle refs * Document settings key in default `openqa.ini` * Adapt the cleanup to be in accordance with the changed path for temporary needle refs * Rename cleanup task to `limit_temp_needle_refs` to be more in-line with existing cleanup tasks * Add systemd units to enqueue the cleanup task automatically * Change the default retention to two hours (from 30 minutes) * Use the modification time instead of the access time because the access time might be updated very to easily unintendedly * Use signal guard to retry cleanup in case the gru service is restarted * See https://progress.opensuse.org/issues/154783
b31f643
to
124af5c
Compare
Description=Enqueues a needle refs task for openQA every day. | ||
|
||
[Timer] | ||
OnCalendar=hourly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i wonder if the description is accurate? OnCalendar says hourly
@@ -117,6 +117,10 @@ | |||
#git_auto_clone = yes | |||
## Experimental - Ensure a git update of all test code and needles | |||
#git_auto_update = no | |||
# whether openQA should attempt to display needles of the correct version in the web UI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a feeling that the comment is a bit cryptic from users' side. What does it mean attempt
? Also correct version
I think is not very explanatory. I am talking from the POV of someone who do not know much about the openQA or a new user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we'll be able to explain this feature within the config file. I think this comment is good enough for people trying to enable this feature after reading documentation about it elsewhere. So we could document this feature as part of the normal openQA documentation. However, I'd be fine keeping this as an almost undocumented "expert feature" for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, maybe we can still change the word "correct". Who wouldn't want "correct versions"? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about
# whether openQA should attempt to display needles of the correct version in the web UI | |
# whether openQA should display needles from different branches in the web UI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not about the branch that was used (branches are moving targets) but really about the exact version that was used (not a moving target). This lookup will probably also be best effort. So I find the version of this comment as it is right now much better than the suggested change.
Who wouldn't want "correct versions"? :)
Someone who doesn't want to pay the price for it. We could state the disadvantages here, especially that the feature is still experimental.
my $old_timestamp = time - (120 * ONE_MINUTE + 1); | ||
my $new_timestamp = time - ONE_MINUTE + 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could avoid repeated calls to time
.
If the CASEDIR variable is a git repo URL with a fragment specifying the ref of the repository, we attempt to use it as the source of needles when viewing needle diffs in test results.
If the TEST_GIT_SHA variable is set, we use that instead as it should be the most accurate way to determine exactly which commit was used