Review dotnet sdk caching error message #28

Open
1 task
dgor82 opened this issue Apr 2, 2024 · 0 comments

dgor82 (Contributor) commented Apr 2, 2024

Relates to #16

  • Review whether the following error still appears in the dotnet SDK caching step of the main branch workflow:

Failed to restore: "/usr/bin/tar" failed with error: The process '/usr/bin/tar' failed with exit code 2

Despite this error, the cache seems to work fine: the dotnet SDK and workload installs take only 1 or 2 seconds. I suspect the flakiness is due to GitHub infrastructure and want to let a few weeks pass to see whether the problem resolves itself. See my chat with Paul B. about it further below.


Chat with Paul about the unexplained warning messages in the workflow:

Daniel

So this is the situation:

  • My GitHub Actions workflow behaves as expected when it comes to caching (and reusing the cache of) the dotnet SDK incl. workloads (several GB in total). I can tell because, when caches are present (which I can see in the GitHub UI), the SDK and workload installs take 2 seconds or less; otherwise they take between 40 seconds and 1 minute. Also, the GitHub UI shows when a particular cache was last used, and next to the correct cache it shows that it was just used.

  • However, GH’s output indicates in two places that using the cache did not work, including a warning.

The pragmatic side in me says to ignore the warnings and the output indicating failure to use the cache, because the system does what I expect it to right now. But there is another side in me that can’t find peace with it; it doesn’t feel clean to leave it like that. I want to get to the bottom of things.

What would you do in my shoes, I’m wondering?

In case that’s of relevance, here are the two faulty outputs from GH Actions:

  1. The following ‘step’ gets skipped i.e. the if condition is not met, which typically means no cache was found and used:
    - name: Check cache-dotnet hit
      if: steps.cache-dotnet.outputs.cache-hit == 'true'
      run: echo "There was a cache-hit for restoring dotnet SDK & Workload dependencies."

The syntax is correct; I use the exact same syntax for checking a cache-hit for the NuGet packages, and there it works as expected.
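A hedged sketch of how this check could be made more informative: as far as the `actions/cache` documentation describes it, `cache-hit` is true only on an exact match for `key`, so printing the raw output unconditionally would show what the cache step actually reported. The step `id` is taken from the condition above; the `path`, the `key`, and the name of the second step are placeholder assumptions, not the real workflow.

```yaml
# Sketch only: path and key are placeholders, not the project's real values.
- name: Cache dotnet SDK and workloads
  id: cache-dotnet            # must match the id used in steps.<id>.outputs
  uses: actions/cache@v4
  with:
    path: ~/.dotnet           # assumed install location
    key: ${{ runner.os }}-build-cache-dotnet-${{ hashFiles('global.json') }}  # assumed key

# No `if:` here, so the raw value is visible in the log on every run.
- name: Print cache-dotnet hit output
  run: echo "cache-hit='${{ steps.cache-dotnet.outputs.cache-hit }}'"
```

If the printed value is empty or `false` on runs where the warning appears, the skipped step is consistent with the restore being reported as failed rather than with a syntax problem.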

  2. The following output in the cache step:

Warning: Failed to restore: "/usr/bin/tar" failed with error: The process '/usr/bin/tar' failed with exit code 2

Cache not found for input keys: Linux-build-cache-dotnet-Release_Desktop-1ea4996cf89ace1a7b765f1d5241ef63e8a1c2a728d49df7953a55b3071f2dd6

(even though that’s EXACTLY the cache it actually does end up using)
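One way to test that observation, sketched under assumptions: a step placed right after the cache step that reports whether the cached directory exists and how large it is. The directory below is a guess at the install location, not taken from the real workflow, and would need to match the `path` given to the cache step.

```yaml
# Sketch only: DOTNET_DIR is a hypothetical location; use the cached path.
- name: Inspect dotnet cache path
  run: |
    DOTNET_DIR="$HOME/.dotnet"
    if [ -d "$DOTNET_DIR" ]; then
      # Print the total size of whatever is on disk after the restore attempt.
      du -sh "$DOTNET_DIR"
    else
      echo "No directory at $DOTNET_DIR"
    fi
```

A multi-GB directory on a run that also logs the tar warning would suggest that at least part of the archive was extracted before tar exited.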

Basically a case of an error message about an error that doesn’t actually happen. Ignore?
So yes, my question to you is not about the details or how to debug, but about the principle of whether, in my CI/CD pipeline, it’s pragmatic to ignore error messages about errors that don’t actually happen 😃
(or whether that’s negligent and stupid)

Paul

I understand where your concern comes from and I share it. I would want to get to the bottom of it.

There’s one mitigating factor: if this warning originated as part of the build of the code that comprises the final release, I would 100% want to get to the bottom of it (because I would be worried that there’s some problem that could affect production which might bite me at an inopportune moment).

This, though, is a warning in your build infrastructure, which clearly carries less risk.

So it would be OK to pragmatically decide to ignore it. I’d perhaps therefore want to “time box” an investigation: if I can’t work it out after an hour (or whatever) of investigation then allow it to slide.

Time boxing is a surprisingly powerful technique, particularly in “difficult” situations.

dgor82 added this to CheckMade Apr 2, 2024
dgor82 moved this to Backlog / User Story in CheckMade Apr 2, 2024
dgor82 self-assigned this Apr 2, 2024
dgor82 added the infrastructure (DevOps, Db, Scripts etc.) label Apr 2, 2024
dgor82 added this to the MVP milestone Apr 2, 2024
dgor82 moved this from Backlog to Idea / Raw Epic / LT in CheckMade Apr 2, 2024