Build puller/loader from source for repository rules #1918

Open

gravypod wants to merge 16 commits into master from the build-go-cmds-in-repository-rule branch

Conversation

gravypod
Collaborator

PR Checklist

Please check if your PR fulfills the following requirements:

  • Tests for the changes have been added (for bug fixes / features)
  • Docs have been added / updated (for bug fixes / features)

PR Type

What kind of change does this PR introduce?

  • Bugfix
  • Feature
  • Code style update (formatting, local variables)
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • CI related changes
  • Documentation content changes
  • Other... Please describe:

What is the current behavior?

Issue Number: N/A

What is the new behavior?

Does this PR introduce a breaking change?

  • Yes
  • No

Other information

@gravypod gravypod requested a review from pcj July 23, 2021 20:28
@google-cla google-cla bot added the cla: yes label Jul 23, 2021
@gravypod gravypod force-pushed the build-go-cmds-in-repository-rule branch from 8f01414 to b8cd944 on July 29, 2021 21:32
@gravypod gravypod force-pushed the build-go-cmds-in-repository-rule branch from b8cd944 to c617a02 on July 29, 2021 21:50
@gravypod gravypod requested a review from pcj as a code owner July 29, 2021 21:50
@gravypod gravypod force-pushed the build-go-cmds-in-repository-rule branch from c617a02 to d332683 on July 29, 2021 21:53
@sluongng
Contributor

sluongng commented Aug 4, 2021

Just a bit curious: what's the motivation for this PR? Also, what's the current status? Do you need any help, or do you have any blockers?

@gravypod gravypod changed the title from "WIP: Starting to build puller/pusher instead of downloading" to "Starting to build puller/pusher instead of downloading" Aug 6, 2021
@gravypod
Collaborator Author

gravypod commented Aug 6, 2021

@pcj, so I've done another pass and it looks like the pusher is not something that needs to be built during the repository phase. What we cooked up that night was mostly right. I think the last thing I need to add is some sort of CI check that the go.mod is up to date and in sync with the Bazel files, which I'm not too sure how to do yet. I'll ping some people from the Bazel Slack.

@gravypod gravypod requested a review from alexeagle August 6, 2021 22:27
@gravypod
Collaborator Author

gravypod commented Aug 6, 2021

Just a bit curious: what's the motivation for this PR? Also, what's the current status? Do you need any help, or do you have any blockers?

Sorry about the delayed reply @sluongng. My main blocker has been balancing this with other work. Right now rules_docker downloads a copy of the puller binary during the repository phase. This is for container_pull()s in your WORKSPACE file. It's done this way because during the repository phase we do not have access to go_binary() yet (that's loaded in later) and cannot build binaries. This PR does something slightly odd that some of the other Go tooling also does: it uses go build in the repository phase to make a hermetic build of our tooling for your local platform.
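
To make that concrete, here is a very rough sketch of the shape of the technique. The rule name, labels, and paths below are illustrative guesses, not the code in this PR:

def _puller_tool_impl(repository_ctx):
    # Resolve the go tool that rules_go has already downloaded for this host.
    go = repository_ctx.path(Label("@go_sdk//:bin/go"))

    # Locate the puller sources inside rules_docker (path is a guess).
    puller_dir = repository_ctx.path(
        Label("@io_bazel_rules_docker//container/go/cmd/puller:BUILD.bazel"),
    ).dirname

    # Build for the host platform, against the vendored dependencies.
    result = repository_ctx.execute(
        [go, "build", "-mod=vendor", "-o", str(repository_ctx.path("bin/puller")), "."],
        working_directory = str(puller_dir),
    )
    if result.return_code != 0:
        fail("building the puller failed: " + result.stderr)

    repository_ctx.file("BUILD.bazel", 'exports_files(["bin/puller"])')

puller_tool = repository_rule(implementation = _puller_tool_impl)

The real change also has to handle environment setup and Windows details; the point is just that the only host requirement is the @go_sdk that rules_go already fetches.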

The main motivations are as follows:

  1. None of us have access to the GCS bucket that rules_docker downloaded from previously.
  2. The CI for updating the puller broke at some point and no one knows how it was configured.
  3. Adding a new OS (Linux, Windows, macOS) and architecture (x86, ARM, etc.) pair requires us to build and push a new binary in CI.

This is obviously not ideal, and it has left the following platforms unsupported:

  1. M1 MacBooks
  2. Windows

By merging this we remove the need to download an external binary to use rules_docker, in exchange for a hard dependency on rules_go. That's not a perfect win in my book, as I'd prefer not to depend on rules_go, but many parts of the repo already do.

@gravypod gravypod force-pushed the build-go-cmds-in-repository-rule branch from d332683 to 42157c2 on August 6, 2021 23:35
@iffyio

iffyio commented Aug 19, 2021

Hi, just wanted to leave some findings here after trying this PR out on Windows. Unfortunately, we're still unable to get the rules to work on Windows:

  • this action doesn't run on Windows

    ctx.actions.run(
        outputs = [output_tar, output_metadata],
        executable = output_script,
        inputs = [image_tar],
        tools = [ctx.executable._extract_image_id],
        use_default_shell_env = True,
    )

    we get the following error

    Action failed to execute: java.io.IOException: ERROR: src/main/native/windows/process.cc(202): CreateProcessW("C:\bzl\xf6b35ik\execroot\embark\bazel-out\x64_windows-fastbuild\bin\pkgs.sh"): %1 is not a valid Win32 application.
    

    and needed to patch it to use a bash-compatible action to run the generated download_pkgs script (a rough sketch of that workaround is at the end of this list).

  • the generated script itself isn't compatible with Windows file paths: the toolchain currently uses which to figure out the Docker CLI path, which on Windows can return something like C:\Program Files\Docker\... (even after converting backslashes to forward slashes), so the download script's commands fail with something like

    C:/Program: No such file or directory

  • Beyond patching the file-path issue in the script, it also seems the Docker daemon itself doesn't like the Windows file paths the client sends to it; at some point we get the following error back from Docker:

    docker: Error response from daemon: the working directory 'C:/Program Files/Git/' is invalid, it needs to be an absolute path.
    See 'docker run --help'.
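
For reference, the workaround mentioned in the first bullet was roughly of this shape (a sketch, not our exact patch): run the generated script through the configured shell instead of asking Windows to execute the .sh file directly.

ctx.actions.run_shell(
    outputs = [output_tar, output_metadata],
    inputs = [image_tar, output_script],
    tools = [ctx.executable._extract_image_id],
    # The configured shell (bash from MSYS2 on Windows) runs the script,
    # instead of Windows trying to execute a .sh file as a native binary.
    command = output_script.path,
    use_default_shell_env = True,
)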

@gravypod
Collaborator Author

@iffyio I definitely need to do more testing on Windows. This is only a first step towards adding support for other platforms. As a test, could you let me know if you're able to execute the puller that is being built by the repository rules? If you can, then that's a big step in the right direction.

@sluongng
Contributor

Sorry about the delayed reply @sluongng. My main blocker has been balancing this with other work. Right now rules_docker downloads a copy of the puller binary during the repository phase. This is for container_pull()s in your WORKSPACE file. It's done this way because during the repository phase we do not have access to go_binary() yet (that's loaded in later) and cannot build binaries. This PR does something slightly odd that some of the other Go tooling also does: it uses go build in the repository phase to make a hermetic build of our tooling for your local platform.

The main motivations are as follows:

  1. None of us have access to the GCS bucket that rules_docker downloaded from previously.
  2. The CI for updating the puller broke at some point and no one knows how it was configured.
  3. Adding a new OS (Linux, Windows, macOS) and architecture (x86, ARM, etc.) pair requires us to build and push a new binary in CI.

@gravypod Hey. Yeah, these motivations are clear to me now. But I have a word of caution:

Many folks in different open-source projects using Bazel prefer to have a binary dependency instead of having to build the binary, to avoid expensive dependency downloads. One of the biggest examples is depending on the kubectl binary: to compile it, you would need to download the entire dependency tree of the Kubernetes project at that release, which adds hundreds of MB to a project's bootstrap process.

So it might be worth clarifying (perhaps in the changelog once this is merged) how much downloaded data switching from binary dependencies to 'compile from source' dependencies would add, and which additional Go libraries would need to be downloaded and compiled.

It would be even better if we had a knob somewhere that lets people override these binary dependencies with their own 🤔. They could then replace them with a binary dependency if needed, or point at an existing build target in their workspace to have it compiled from source. But that does not need to be in scope for this PR.

Thanks.

@gravypod
Collaborator Author

gravypod commented Aug 21, 2021

Many folks in different open-source projects using Bazel prefer to have a binary dependency instead of having to build the binary, to avoid expensive dependency downloads. One of the biggest examples is depending on the kubectl binary: to compile it, you would need to download the entire dependency tree of the Kubernetes project at that release, which adds hundreds of MB to a project's bootstrap process.

@sluongng, that was exactly my approach as well. I made an initial prototype of what it might look like if we did something like that in my personal binaries-in-repo branch. In that branch I set it up so that we could refer to the precompiled puller binaries as files and choose the correct binary at runtime during the rule's execution. I also added a _test to validate in presubmit that the binaries were up to date.
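
The core idea of that prototype looked roughly like this (labels and file names are from memory of my branch and should be treated as illustrative):

def _precompiled_puller(repository_ctx):
    # Pick a checked-in binary based on the host platform.
    os_name = repository_ctx.os.name.lower()
    if os_name.startswith("windows"):
        return Label("@io_bazel_rules_docker//precompiled:puller-windows-amd64.exe")
    if os_name.startswith("mac"):
        return Label("@io_bazel_rules_docker//precompiled:puller-darwin-amd64")
    return Label("@io_bazel_rules_docker//precompiled:puller-linux-amd64")

container_pull would then resolve that label with repository_ctx.path() instead of downloading anything.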

The other maintainers on rules_docker pointed out these issues:

  1. This is currently ~12 MB per OS+arch pair and will only grow if we add more functionality.
  2. If we wanted to slim this down we would need to do a rules_docker_{os}_{arch}.zip distribution, but this was considered too difficult.
  3. This would grow the git repo size. This wasn't my main concern, as I'm pretty used to using git-lfs, but it was raised. To mitigate it we'd have to go back to using http_file, which is a pretty bad development UX: if you change the puller/pusher there'd be no way to test that code in CI without pushing the puller to an object store first. This is all possible, it just reduces velocity. It's much easier to do ibazel run //precompiled:update_precompiled or something and have any changes automatically built for you.
  4. We'd need to manually curate a list of platforms and OSs to build for. This isn't impossible, since there's a finite number, but it is annoying for users of more obscure platforms (I think we had some people using s390x with this rule set?).

That's why we're trending this way. If there are easy solutions to these issues that you can point to, I'd much rather adopt them; this approach is pretty hacky.

So it might be worth clarifying (perhaps in the changelog once this is merged) how much downloaded data switching from binary dependencies to 'compile from source' dependencies would add, and which additional Go libraries would need to be downloaded and compiled.

Since we are using Go's vendoring feature there would be no other downloads besides the source in our repository. However, it is important to point out that all of this vendored source has materially increased our repo size:

➜  Downloads du -h rules_docker-build-go-cmds-in-repository-rule.zip rules_docker-master.zip
1.5M	rules_docker-build-go-cmds-in-repository-rule.zip
856K	rules_docker-master.zip

That's a 1.7x increase in repo size. This is, however, much smaller than the binaries we were downloading before:

➜  Downloads wget https://storage.googleapis.com/rules_docker/aad94363e63d31d574cf701df484b3e8b868a96a/puller-linux-amd64
--2021-08-21 09:50:37--  https://storage.googleapis.com/rules_docker/aad94363e63d31d574cf701df484b3e8b868a96a/puller-linux-amd64
Resolving storage.googleapis.com (storage.googleapis.com)... 142.251.35.176, 142.251.32.112, 142.250.81.240, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|142.251.35.176|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7955988 (7.6M) [application/octet-stream]
Saving to: ‘puller-linux-amd64’

puller-linux-amd64  100%[===================>]   7.59M  43.6MB/s    in 0.2s    

2021-08-21 09:50:38 (43.6 MB/s) - ‘puller-linux-amd64’ saved [7955988/7955988]

So, looking at this in terms of the absolute change in download size, we are saving ~7 MB and removing network access, assuming you use local_repository().

It would be even better if we had a knob somewhere that lets people override these binary dependencies with their own. They could then replace them with a binary dependency if needed, or point at an existing build target in their workspace to have it compiled from source. But that does not need to be in scope for this PR.

Agreed, this is a good idea. This is also a binary that, technically, not everyone actually needs. I'd like it if rules_docker shifted away from providing cc_image and go_image and instead made it easy to take a bunch of binaries + files and put them into image layers. Macros like cc_image could then be built on top of this base functionality. We could then use something like a rules_distroless to fetch all of the packages and create base images hermetically, which would sidestep the need for most users to ever use container_pull. We're a long way off from this kind of thinking, but anything we can do to reduce the feature set of rules_docker, or to let users opt out and supply their own pieces, would be great. I see rules_docker as rules_pkg with metadata.

@alexeagle
Collaborator

I think it's quite reasonable for us to publish releases to GitHub which consist of the existing .tgz as well as a distribution of the puller binary for each platform. rules_go makes it quite easy for us to build and support all the popular OS+arch combinations, similar to how https://github.com/bazelbuild/buildtools/releases/tag/4.0.1 does its releases.

Then we'd include a Bazel repository rule and toolchain in rules_docker that provides the puller for your platform. We have some of these in rules_nodejs that I can describe if that's helpful, for fetching node and now the esbuild and cypress binaries.

Reducing the dependencies here isn't just about network size. It can also be a spaghetti project to make all of your WORKSPACE deps happy with the rules_go we'd require (and maybe bazel_gazelle to get the go_repository rule?). You might have to upgrade your rules_go, and the error messages won't make it clear that that's the case. I'd be happy to set aside some time to help with publishing statically linked puller binaries as part of the release process to make this possible.
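
For the sake of discussion, the release-asset variant could look something like this. The URL layout and repository names are placeholders; no such release assets exist today:

load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_file")

def puller_binaries(version):
    # One external repository per platform, pointing at a GitHub release asset.
    for goos, goarch in [("linux", "amd64"), ("linux", "arm64"),
                         ("darwin", "amd64"), ("darwin", "arm64"),
                         ("windows", "amd64")]:
        http_file(
            name = "rules_docker_puller_%s_%s" % (goos, goarch),
            urls = ["https://github.com/bazelbuild/rules_docker/releases/download/" +
                    "%s/puller-%s-%s" % (version, goos, goarch)],
            executable = True,
            # sha256 would be pinned per release in the real thing.
        )

A toolchain in rules_docker would then pick the right @rules_docker_puller_* repository for the host, the way rules_nodejs does for the node binary.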

@alexeagle
Collaborator

alexeagle left a comment

Is there a reason to vendor the dependencies? I'd much rather use go_repository to fetch them.

I just recently added some Go code to rules_python (bazelbuild/rules_python#514), so the pattern for setting up the dependency resolution is quite fresh in my mind.
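
For example, the kind of entry I mean, assuming gazelle's update-repos generates it from go.mod (module chosen for illustration; the sum and version pins come from go.sum/go.mod):

load("@bazel_gazelle//:deps.bzl", "go_repository")

def go_dependencies():
    go_repository(
        name = "com_github_google_go_containerregistry",
        importpath = "github.com/google/go-containerregistry",
        sum = "",      # pinned from go.sum
        version = "",  # pinned from go.mod
    )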

@pcj
Member

pcj commented Aug 24, 2021 via email

@gravypod
Collaborator Author

@pcj, do you think it would make sense to use gazelle to generate BUILD files for our vendor folder instead of having two ways of specifying deps? This would help make sure that the two build chains never fall out of sync.

@MatthieuSarter

FYI, I tested this on Linux ppc64le to pull an image and it worked fine. Thanks 👍

@iffyio

iffyio commented Aug 31, 2021

@iffyio I definitely need to do more testing on Windows. This is only a first step towards adding support for other platforms. As a test, could you let me know if you're able to execute the puller that is being built by the repository rules? If you can, then that's a big step in the right direction.

Hi @gravypod, sorry about the slow response. I can confirm that the puller does in fact work in our repo on Windows, yes!

@gravypod
Collaborator Author

Friendly ping to @alexeagle and @pcj: let me know if we can get some reviews.

@pcj
Member

pcj commented Sep 16, 2021

@gravypod I think the code looks good (I didn't see any changes since we paired on it; is there a particular spot I should look at?). I think it is mostly just missing documentation.

As for Windows, yes, that should be tested more thoroughly. I'm not sure whether Windows is considered a supported platform for rules_docker. @philwo is Windows a possibility on Bazel Buildkite? If not, it might be worth setting up a GitHub Action to test this part of the rules.

@pcj
Member

pcj commented May 13, 2022

@ianoc A few comments:

  • Removing Go from rules_docker is IMHO out of scope; the effort is too large without a clear reason. If you really want containers with Bazel and no Go, I'd suggest starting a new repo.
  • The diff here is actually not big. Ignore the LOC from vendored code; it's a useless metric.
  • The puller is used in a repository rule. As such, the puller binary would be part of the repository cache, not the action cache. But yes, it's cached.
  • I'm not sure what you mean by confusing errors (all Bazel errors are confusing :) ). Is there a more specific concern?

@ianoc
Contributor

ianoc commented May 13, 2022

  • Could you speak to why the effort is large? The pattern that's in the repo now would support it easily, no? It just means adding a few more binaries to the same approach. The clear reason is that we don't use Go (and I assume plenty of others don't), but even when we have to for dependencies such as this, we have to chase down a compatible matrix of OS/Go/Bazel/rules_docker combinations, which seems unnecessary for a system like rules_docker to impose.
  • I'm not sure I'd agree it's useless; it's another set of dependencies and concerns, and it seems unnecessary.
  • Unless I'm mistaken, that's not how the repository cache works. Downloads are kept in that cache; actions taken by repository rules are not hermetic and are not cached by Bazel. They are just on-disk state. Changing branches back and forth, for instance, would blow them away with no caching.
  • I mean that we'd have a way of invoking Go that's different from rules_go. If a user gets an error they have to care about custom code paths here, in every possible location they're used; about anything set up or configured on the command line; and about things on disk that affect the commands run and the compiles, because they aren't sandboxed. All possible sources of issues.

@pcj
Member

pcj commented May 13, 2022

To restate the problem:

Note @ianoc this is not a response to your last comments, I was writing it independent of that.

  • The puller is used by pull.bzl's container_pull, which is a repository rule. That means the binary must be available by the time a container_pull rule is evaluated by Bazel, which is before any Bazel actions run, preventing one from building the puller via something like go_binary.
  • Historically, Cloud Build was used to build new pullers, and a rules_docker maintainer would have to go and update the repositories/repositories.bzl file with all the new storage URLs and hashes. Not fun. Since Cloud Build for rules_docker was discontinued a long time ago, the puller hasn't been updated in years. I doubt anyone remembers or knows exactly which version of the puller we are actually using.
  • One can build artifacts via GitHub Actions, or any CI service; that's not the hard part. But you still have to periodically go and update the source repo, or invent some bot to do that. Extra work.
  • Also, every time someone wants a new architecture, it has to be explicitly provided (e.g. Providing ppc64le puller and loader #1790).

Gazelle has the same issue as rules_docker in that it both houses the source code and references it in a repository rule. For that reason, @jayconrod came up with this solution, which I think is positively brilliant: you just use the go tool from @go_sdk and call go build inside a repository rule, setting up symlinks so that the source is visible to the compiler and the output binary goes into the correct place, which in the end (in this case) is @rules_docker_repository_tools//:bin/puller.

So, this PR makes it possible to support all architectures, without any extra work from maintainers, and with zero dependencies on external artifacts. I think that's a huge win.

@johnynek
Member

I think it is a real bummer to say that making a set of rules closer to hermetic is out of scope. You may think it is fine to require Go; the next ruleset wants jq on the path; the next assumes something else. No single one is a giant deal, but, as I thought was an agreed-upon principle in the Bazel community, the existence of each of them adds a tax when composing rules.

@pcj pcj changed the title from "Starting to build puller/pusher instead of downloading" to "Build puller/loader from source for repository rules" May 13, 2022
@pcj
Member

pcj commented May 13, 2022

@johnynek nice to hear from you, it's been a while since I sat next to you at BazelCon like 4 or 5 years ago! It is a misunderstanding that this PR compromises hermeticity. The compiler tool will have the same hash as the user-requested @go_sdk//:bin/go, the source is at a known commit, and the deps are vendored. I don't see how jq is relevant; this isn't about using random tools expected to exist in the user's workstation PATH.

@sluongng
Contributor

sluongng commented May 13, 2022

It might be interesting to point out how go_repository currently works. A go_repository target goes through the following steps during Bazel's loading phase:

  1. Download the archive (via different channels)
  2. gazelle is run over the downloaded archive
  3. Patches are applied (if there are any)

For these to happen during Bazel's loading phase, go_repository depends on two Go binaries: fetch_repo and gazelle.

So for a very long time, these two binaries have been prepared like this: https://github.com/bazelbuild/bazel-gazelle/blob/1f6da0ab55f094c2a33e21292f5e2db460b5d857/internal/go_repository_tools.bzl#L87-L101, where go install is invoked against the current archive of @bazel_gazelle, set up in a mock GOPATH.

This PR essentially uses that very same approach to prepare the Go binaries that rules_docker needs.

The problems that we are trying to solve with this PR are:

  1. how to keep the binaries relatively up to date compared to the rest of the repository,
  2. while keeping the maintenance cost of the rules_docker project relatively manageable.

With that said, I think there are sufficient community concerns about this PR that I would suggest we slow down how it is adopted:

  1. Let's merge the code that can compile the Go binaries without actually using it. We continue to use the old precompiled binaries after this PR is merged.
  2. In a separate PR, let's introduce a flag, use_precompiled_binaries, that lets people optionally opt out of compiling the binaries as part of rules_docker (a rough sketch is below this list). The flag defaults to True and effectively does not affect anybody for now.
  3. Prepare communication for a major release where we flip the default value of the flag (pinned issue, headline in README.md, announcement in the release notes, etc.).
  4. Users who do not wish to opt in after the major release should still have the option to bring their own precompiled binaries and use them. Of course, they will have to manage feature compatibility as well as updating the precompiled binaries themselves.
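
A rough sketch of what point 2 could look like (the attribute is the proposal above; the implementation name and surrounding attrs are just placeholders):

container_pull = repository_rule(
    implementation = _container_pull_impl,  # placeholder name for this sketch
    attrs = {
        # ... existing container_pull attributes ...
        "use_precompiled_binaries": attr.bool(
            default = True,
            doc = "If True, download the prebuilt puller as before; if False, " +
                  "build it from source with @go_sdk during the repository phase.",
        ),
    },
)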

@gravypod @pcj wdyt?

@sluongng
Contributor

To clarify a bit more on my suggestion:

I think this PR is changing a behavior that many users have been relying on. It's doing a bit too many things at once: introducing new code, a new feature, and new dependencies, and removing an old feature.

While that might be totally fine in an organization that's used to moving fast and OK with disruptive changes, in the open-source world there is less appetite for such a move. So I have a feeling that if we can deliver this feature more slowly, in smaller incremental steps, with enough backward compatibility during the transition period, it will be a much easier pill to swallow.

@ianoc
Contributor

ianoc commented May 13, 2022

@sluongng That's great context, thank you; I hadn't realized the Go rules do this themselves. I would still have reservations, since they are the ones controlling the rules (plus the Go version itself), and thus generally more leeway should be given there for horrible self-bootstrapping hacks. However, in general I think your compromise sounds pretty good.

On another note, I did open a PR which I believe would solve this issue without this build-Go-from-source approach. It's a much smaller diff that adds the ability for GitHub Actions to build these binaries, and the template would let us move in the other direction and eliminate the Go dependency of the rules if we wished to in the future. Both have a synchronization challenge: this one with the vendoring of dependencies (which, if I'm not mistaken, means we have two sets of dependencies these tools can be built with, what Bazel would use and what the vendored copies are?), and the other in that we have to trigger a separate release to have GitHub build all the binaries and then update the repositories file. Both approaches avoid the historic issues with the current binaries, I believe, around updating and forking.

@pcj
Member

pcj commented May 13, 2022

@sluongng Yes, thanks for making it clearer how the go_repository rule works. Note this is also how proto_repository works (https://github.com/stackb/rules_proto/blob/master/rules/private/proto_repository_tools.bzl).

I'd say if we really want to feature-flag it, let's just add the flag now so it can be tested by the community. I'm not sure this is a "feature" though; all we are doing is changing the binary used by container_pull and container_load, and the behavior is the same. If anyone would like to try it now, you can test it via https://github.com/pcj/rules_docker_1918.

We can also add a gazelle update-repos rule and make the go.mod file the source of truth, another small improvement that is long overdue.
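
Something along these lines, for instance (target name and macro path are illustrative):

load("@bazel_gazelle//:def.bzl", "gazelle")

# `bazel run //:gazelle-update-repos` regenerates the go_repository macros
# from go.mod, so the two never drift apart.
gazelle(
    name = "gazelle-update-repos",
    args = [
        "-from_file=go.mod",
        "-to_macro=repositories/go_repositories.bzl%go_deps",
        "-prune",
    ],
    command = "update-repos",
)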

@johnynek
Member

@johnynek nice to hear from you, it's been a while since I sat next to you at BazelCon like 4 or 5 years ago! It is a misunderstanding that this PR compromises hermeticity. The compiler tool will have the same hash as the user-requested @go_sdk//:bin/go, the source is at a known commit, and the deps are vendored. I don't see how jq is relevant; this isn't about using random tools expected to exist in the user's workstation PATH.

Indeed! I remember you as well. I hope you have been well since.

I apologize for my misunderstanding. My (clearly poor) understanding was that we were expecting go to be installed on the machine. As we use these rules more, I had become concerned that that was true, but I am happy to be wrong! Thanks for taking the time and being patient with me.

@alexeagle
Collaborator

the next ruleset wants jq on the path

off-topic, but that toolchain exists too: https://docs.aspect.build/aspect-build/bazel-lib/v0.9.6/docs/jq-docgen.html#jq

@uhthomas
Collaborator

uhthomas commented Jun 7, 2022

I have some concerns with building anything during the analysis phase.

The Rust ruleset has this exact same issue: the crate_universe rule uses a Rust binary which may either be a prebuilt asset or be built during the analysis phase, similar to this.

bazelbuild/rules_rust#1287

Go compiles much faster, but the problems are still the same: repository rules are not cacheable, are not guaranteed to be hermetic, and may make analysis much slower. This is mostly Bazel's fault, I think; most rulesets encounter this chicken-and-egg problem with repository rules, and there should be native support for it.

@sluongng
Contributor

sluongng commented Jun 7, 2022

This is mostly Bazel's fault, I think; most rulesets encounter this chicken-and-egg problem with repository rules, and there should be native support for it.

@uhthomas would you mind starting an issue in https://github.com/bazelbuild/bazel for this? It might be an interesting topic to have interleaved build-load-analysis phases in Bazel (a fairly complex problem as well).

@uhthomas
Collaborator

uhthomas commented Jun 7, 2022

@github-actions

This Pull Request has been automatically marked as stale because it has not had any activity for 180 days. It will be closed if no further activity occurs in 30 days.
Collaborators can add an assignee to keep this open indefinitely. Thanks for your contributions to rules_docker!

@github-actions github-actions bot added the "Can Close? Will close in 30 days unless there is a comment indicating why not" label Dec 25, 2022
@uhthomas uhthomas removed the "Can Close? Will close in 30 days unless there is a comment indicating why not" label Jan 13, 2023
@grepwood

I'm actually very much in favor of either this or moving these extra binaries into GitHub releases. Where I'm using rules_docker, I don't have Internet access except for GitHub access via a local Artifactory, so I'm looking at either smuggling storage.googleapis.com artifacts into a manually managed Artifactory repo or persuading our security team to open a mirror to storage.googleapis.com.
