feat: move build-push-ecr to Docker actions #41

Open · wants to merge 4 commits into main
Conversation

@boringcactus (Member) commented Nov 15, 2023

docker/build-push-action has a lot more functionality than is available in a bare docker build command, like caching with GitHub Actions and cleaner support for multiple tags. As such, some projects don't use mbta/actions/build-push-ecr, which means our Docker build process is fragmented. Moving this action to docker/build-push-action should let more complexity move to the shared actions.

This was also the intent of #28, which stalled out in the review process. It was a great foundation to build on - thanks a ton @thecristen! - but I've made a few additional changes:

  • As it stands, the docker-additional-args input is just arbitrary command line flags passed to docker build, and docker/build-push-action has no mechanism to provide arbitrary command line flags, since it wants to encapsulate all of the available complexity in its own inputs instead. Conveniently, however, GitHub is willing to list all the @mbta repositories that use this action, and only two of them pass docker-additional-args (Skate and OTP-deploy), and both of those are only using --build-arg, so with a bit of shell scripting it's possible to maintain backwards compatibility. It's not really pleasant, though, so I'd be inclined to encourage those projects to migrate to a more direct mechanism as soon as this gets released, so it can be removed promptly before anything else starts to use it.
  • docker/metadata-action, which I hadn't known about until reading build-push-ecr refactor using docker/build-push-action@v3 #28, has a lot of extremely cool functionality for applying tags conditionally. Glides, which applies an extra tag to prod deploys and uses that tag in the ECS task definition, has a ton of mess that can be mostly replaced with the docker/metadata-action tag logic. As such, it seems like it's worth trying to allow inputs to directly pass docker/metadata-action tag rules. The existing behavior of allowing a space-separated list in docker-additional-tags is more widely used than the existing docker-additional-args behavior, though, and it would be both more annoying and less valuable to migrate that department-wide, so I think it probably makes sense to support both behaviors indefinitely.
  • Tags are stuffed into the task output directly, via GitHub Actions's multiline strings in output files syntax, so there is no temporary file generated.
  • The dockerfile-path argument does not actually represent a path to a Dockerfile, but rather the path to the folder containing the Dockerfile. Since the current process is not docker build -f ${{ inputs.dockerfile-path }} but docker build ${{ inputs.dockerfile-path }}, the dockerfile-path input is properly provided to docker/build-push-action not as file: but as context:. The name is misleading, but that can't be changed backwards-compatibly.
  • Logs into the Docker registry with aws-actions/amazon-ecr-login instead of passing long-lived credentials to Docker directly.
  • Loads the container into the local Docker instance with load: true, and then pushes separately with docker push --all-tags. This is actually a really elegant solution to the fact that docker/build-push-action is broken in such a way that push: true and load: true are mutually exclusive, and I wish I'd found --all-tags in the docker push documentation earlier (although it may not have been there earlier). (A sketch of the combined steps follows this list.)
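
Putting those pieces together, a minimal sketch of what the refactored action's steps could look like (step IDs and input wiring are illustrative, not the exact contents of this PR):

- uses: docker/metadata-action@v5
  id: meta
  with:
    images: ${{ inputs.docker-repo }}
    tags: |
      type=sha,priority=1000,prefix=git-
- uses: docker/setup-buildx-action@v3
- uses: docker/build-push-action@v5
  with:
    context: ${{ inputs.dockerfile-path }}  # context:, not file: - see the note above
    tags: ${{ steps.meta.outputs.tags }}
    load: true  # make the image available to later steps in the same job
    pull: true  # re-pull mutable base images even when layers are cached
- shell: bash
  run: docker push --all-tags ${{ inputs.docker-repo }}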

Open questions:

  1. At what point should a composite action be replaced with a reusable workflow? I think this may be across that line - it seems like it could be helpful to have the nicer UI for nested workflow calls, given how much is going on in this process now - but I don't know where the line really is. I'd be at peace moving this to mbta/workflows, but everything would have to migrate manually. (That'd save us some backwards compatibility issues, though.)
  2. For some reason, aws-actions/amazon-ecr-login appears to technically leak the AWS account ID (via the ECR registry URL) in the GitHub Actions output. (Ask me on Slack for the link that shows this.) Does that matter?
  3. How much testing is it worth doing before releasing this? I think most repos pin to mbta/actions@v2 rather than a specific minor version, but if this does wind up breaking something, it's not all that difficult to pin to @v2.2.1 or whatever until we can get it fixed.
  4. (Resolved: not an issue if we pull: true.) Do we need the ability to disable the cache? I think we're mostly pretty good about starting with immutable images in our Dockerfiles, but if anything is doing FROM alpine:latest or what have you, it may pull from the cache before it pulls from Docker Hub, in which case it'd always be stuck on the old version of alpine:latest now that we're caching, making this technically a breaking change. Several things have to go very wrong before that becomes a problem, though.
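
For reference, the GitHub Actions layer caching mentioned above would be wired up roughly like this (a sketch; the final inputs may differ):

- uses: docker/build-push-action@v5
  with:
    cache-from: type=gha
    cache-to: type=gha,mode=max
    pull: true  # per question 4: re-pull mutable tags like alpine:latest even on cache hits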

@paulswartz (Member)

@boringcactus thanks for taking another stab at this!

Open questions:

At what point should a composite action be replaced with a reusable workflow? I think this may be across that line - it seems like it could be helpful to have the nicer UI for nested workflow calls, given how much is going on in this process now - but I don't know where the line really is. I'd be at peace moving this to mbta/workflows, but everything would have to migrate manually. (That'd save us some backwards compatibility issues, though.)

I don't see why you couldn't also make a reusable workflow for people to adopt, but I think that would be a larger change, as a reusable workflow can't be used in quite the same way as an action.

For some reason, aws-actions/amazon-ecr-login appears to technically leak the AWS account ID (via the ECR registry URL) in the GitHub Actions output. (Ask me on Slack for the link that shows this.) Does that matter?

I don't love it (cc @ianwestcott), but if Amazon thinks it's okay, I suppose it's fine.

How much testing is it worth doing before releasing this? I think most repos pin to mbta/actions@v2 rather than a specific minor version, but if this does wind up breaking something, it's not all that difficult to pin to @v2.2.1 or whatever until we can get it fixed.

I'd definitely want to test it with some existing applications, especially ones that you've already called out as using some of the non-default features.

Do we need the ability to disable the cache? I think we're mostly pretty good about starting with immutable images in our Dockerfiles, but if anything is doing FROM alpine:latest or what have you, it may pull from the cache before it pulls from the Docker Hub, in which case it'd always be at the old version of alpine:latest now that we're caching, causing this to technically be a breaking change. Several things have to go very wrong before that becomes a problem, though.

Is it possible to make using the cache optional, such that teams would need to opt-in to using it? Another approach would be to make this a non-backwards compatible update (either by releasing v3, making an action with a different name, or creating the reusable workflow).

with:
  images: ${{ inputs.docker-repo }}
  tags: |
    type=sha,priority=1000,prefix=git-
Member

question: does this generate the SHA tag in the same way the old script did?

Member Author

As it stands, it fetches the git SHA from the GitHub Actions context, which should still be the same thing since we aren't doing anything wacky like checking out a different commit than the workflow is actually running on. If we want to handle weird edge cases like that just in case, we can pass context: git, but I don't think there's any point in that.
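
(For completeness, opting into that would look like the following; not something this PR does, just illustrating the option:

- uses: docker/metadata-action@v5
  with:
    context: git  # compute tags from the checked-out commit rather than the workflow event
)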

- uses: docker/setup-buildx-action@v3
- uses: docker/build-push-action@v5
  with:
    load: true
Member

suggestion: you could add pull: true to ensure that even with the cache, Docker tries to pull down updated images. I think with that, we could enable caching by default.
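
i.e., extending the snippet above (sketch):

- uses: docker/build-push-action@v5
  with:
    load: true
    pull: true  # check the registry for newer base images even with a warm cache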

Member Author

That should work, yeah.

@boringcactus (Member Author)

Reusable Workflow

I think the benefits of moving to a reusable workflow would only really be apparent if something in this process breaks and needs to be debugged, and migrating would be a minor nuisance in the most basic cases but somewhat more involved for our web apps that need to extract static assets from the container image to upload them to a CDN or send source maps to Sentry. Having both a reusable workflow and a composite action would involve either maintaining them both in parallel (bad) or just calling the action from the workflow (pointless). If we were starting from scratch today, I think it'd make more sense to use a reusable workflow here and then put in more standardized tooling around asset extraction (which might be helpful anyway), but the migration probably wouldn't be worth the effort.

That said, Glides's existing deployment workflow involves asset extraction, so I could experiment somewhat with how difficult it actually is to build that process around a reusable build-push-ecr workflow. That migration effort might even be somewhat containable - when deploy-ecs moved from user-based to role-based authentication in mbta/actions@v2, @krisrjohnson21 went around to a bunch of different repos to make PRs to update GitHub Actions workflows accordingly, so if we thought it made sense and I had the time for it I could do something similar. This would be a more involved migration, though, since it isn't a drop-in replacement, so the teams maintaining the code would need to be more involved in the review process. Kris, is there any advice you'd give based on your experience with department-wide GitHub Actions workflow migration?

Guidelines for reusable workflow vs composite action that make sense to me as of this exact moment:

  • If the task requires more than just actions/checkout and perhaps asdf-vm/actions/install before it can be run, use a composite action. For example, even if mbta/actions/dialyzer were a shell script instead of a JS action, it wouldn't work as a reusable workflow, since it needs the Mix dependencies to be present. (If we standardize our dependency caching, this could change, but I didn't even know mbta/actions/npm-install existed, and only one repo appears to use it.)
  • If the task has a single nontrivial step, use a composite action. For example, there would be no benefit to having mbta/actions/notify-slack-deploy as a reusable workflow.
  • If the task can be run from a clean checkout of the repository and has multiple nontrivial steps which could conceivably fail and need to be debugged, use a reusable workflow. For example, mbta/workflows/deploy-ecs doesn't require anything beyond the code, and it can fail in the container build or in the deployment.

If these guidelines are valid, then the existing implementation of build-push-ecr arguably makes sense as a composite action, depending on whether or not the supplementary tagging is considered nontrivial, but this implementation of build-push-ecr splits some logic out into docker/metadata-action and thereby moves across the line into reusable workflow territory.
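
For comparison, the caller's side of a hypothetical reusable workflow would look something like this (the workflow path and inputs are assumptions; no such workflow exists yet):

jobs:
  build-push:
    uses: mbta/workflows/.github/workflows/build-push-ecr.yml@v1  # hypothetical location
    with:
      docker-repo: my-app  # hypothetical input
    secrets: inherit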

Testing

I've tested in a project that only uses the advanced tag manipulation, and that feature works properly, including the output selection based on priority. I've opened mbta/otp-deploy#29 to validate the build arg logic with multiple flags, so once that runs, we'll be confident it works.

Unfortunately, the GitHub dependency graph feature hides private repos and forks even if I have permission to see them, so I didn't see that we have a few more repos that also pass docker-additional-args, although conveniently they are still all --build-arg so there's no need to handle other types of additional args.
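
For the curious, the backwards-compatibility shim amounts to something like this (a sketch, not the PR's exact code):

- id: legacy-args
  shell: bash
  run: |
    # Split "--build-arg FOO=bar --build-arg BAZ=qux" into the newline-separated
    # list that docker/build-push-action's build-args input expects, and expose
    # it as a multiline step output.
    set -- ${{ inputs.docker-additional-args }}
    args=""
    while [ "$#" -gt 0 ]; do
      if [ "$1" = "--build-arg" ]; then
        args="${args}${2}"$'\n'
        shift 2
      else
        shift 1
      fi
    done
    {
      echo 'build-args<<EOF'
      printf '%s' "$args"
      echo 'EOF'
    } >> "$GITHUB_OUTPUT"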

Only a few projects use docker/build-push-action directly, so migrating those onto this action should be fairly straightforward.

@thecristen (Contributor)

That said, Glides's existing deployment workflow involves asset extraction, so I could experiment somewhat with how difficult it actually is to build that process around a reusable build-push-ecr workflow.

I'd be excited to see how that turns out!

Not sure if it's a similar situation, but IIRC the primary reason Dotcom has its own bespoke setup is because, in the middle of our build-push-deploy steps, we have an extra step that extracts assets from the built image and uploads them to S3. So I'm not sure how I'd manage that with a reusable build-push-deploy action. 🤔

@boringcactus (Member Author)

With a reusable workflow that has built-in support for extracting some files from the container, the process of migrating Glides winds up being only somewhat complicated, and the resulting UI is a lot nicer for exploring the output of each individual step than the composite action would be. Anyone who isn't handling Sentry release configuration the exact way we are would have a simpler time than that.

I did change my mind about having the cache be optional - since Glides passes the Sentry release name into the container via a Docker build arg, any layer after that is useless to cache, and any layer before that is already cached because we already build our container in CI via docker/build-push-action.

With a composite action, passing load: true into docker/build-push-action means later steps in the same job can just use the built container image directly, but with a reusable workflow, there are no later steps in the same job, so we have to get a bit creative. My current solution is to upload a tarball of the extracted files as a GitHub Actions artifact, so that a later job can download that artifact and extract the tarball, but that only matches how Glides approaches asset extraction, so perhaps a different approach would be more broadly compatible. (Both dotcom and Skate have shell scripts that take in a Docker image name as a parameter, so it might work to have the reusable workflow offer to run a shell script like that, but perhaps environment configuration or something would be an obstacle; I'd need to explore further. It'd probably be fairly straightforward to adjust the dotcom and Skate shell scripts to work with an assets tarball transferred via GitHub Actions artifacts.)
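
As a sketch of that artifact hand-off (the image variable and asset path are hypothetical, loosely modeled on a Phoenix app layout):

- shell: bash
  run: |
    # "$IMAGE" stands in for whatever tag the build step produced.
    docker create --name assets-source "$IMAGE"
    docker cp assets-source:/app/priv/static ./static
    docker rm assets-source
    tar -czf static-assets.tar.gz static
- uses: actions/upload-artifact@v4
  with:
    name: static-assets
    path: static-assets.tar.gz

and then a later job runs actions/download-artifact with the same artifact name and unpacks the tarball.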

@ianwestcott (Contributor)

For some reason, aws-actions/amazon-ecr-login appears to technically leak the AWS account ID (via the ECR registry URL) in the GitHub Actions output. (Ask me on Slack for the link that shows this.) Does that matter?

@boringcactus thanks for flagging this; I've opened #42 to address it.
