buildah 1.38.1 build "FROM oci-archive:" fails to find content generated inside a build #5952
Detected in the Image Mode pipelines, which cannot use v1.38.1 for now. https://gitlab.com/redhat/rhel/bifrost/compose-bootc/-/merge_requests/208/diffs
@henrywang do we have an easy reproducer?
I think this is a regression from 25a3b38. This feature is really key as a way to generate chunked/reproducible containers; we've been using it for quite some time for rpm-ostree, and we have plans in the immediate future to promote this more for custom base images (ref https://gitlab.com/fedora/bootc/tracker/-/issues/32 and coreos/rpm-ostree#5221 specifically). Way back in coreos/rpm-ostree#4688 (comment) there was a soft commitment to support this.

My understanding is that there are security implications (not mentioned in the commit message) that motivated 25a3b38. Can we elaborate on those now that the commit is public (I don't see a corresponding PR with discussion)?

It seems to me that what we always wanted here is a way to write not to the build context, but just to any internal filesystem (a tmpfs or a tempdir bind mount) whose lifetime is scoped to the entire build, right?
The relevant CVE is GHSA-5vpc-35f4-r8w6. A core part of this was the ability to write to the host filesystem from mounts, which is why it was disabled - so what you desire is a large part of a high-severity CVE. I don't see us turning this back on. Adding an option with the understanding that enabling it is explicitly insecure and exposes the host to potential escape by a malicious Dockerfile is a possibility, but that also feels very bad.
Here's a trivial example of a "build" that actually just copies from a registry into a local oci: directory via skopeo. But the point is that an "oci" directory is something that any tool running in the container build can synthesize however it wants - without e.g. being subject to injected floating-timestamp content from buildah itself, and more generally having total power over structuring the layout of the target container image.
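To make the "any tool can synthesize an oci directory" point concrete, here is a minimal sketch (mine, not from the thread) that writes the skeleton of an OCI image layout by hand. File names follow the OCI image-layout spec; a real image would additionally need config and layer blobs under `blobs/sha256` referenced from a manifest:

```shell
# Create the skeleton of an OCI layout directory by hand.
mkdir -p out.oci/blobs/sha256

# Required marker file identifying the layout version.
printf '{"imageLayoutVersion":"1.0.0"}' > out.oci/oci-layout

# Top-level index; an empty manifest list here, purely for illustration.
printf '{"schemaVersion":2,"manifests":[]}' > out.oci/index.json

ls out.oci
```

Any generator with this level of control over the files it emits controls the exact layer structure and timestamps of the resulting image.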
Yes, agree adding an option isn't what we want here. Let's instead dig more into the final paragraph of my previous comment.
To flesh this out slightly... maybe we could use a cache mount here.
Note that if we take this step and formalize things more, it actually gets a lot cleaner: the builder can know the ordering here, and we can declare that the cached OCI directory is automatically consumed (since why would you leave it around)?
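A hypothetical sketch of a cache-mount variant of this proposal (`RUN --mount=type=cache` is real Containerfile syntax today, but `build-tool` and the final `FROM` consuming the cache are invented for illustration; that consumption step is exactly the part that doesn't exist):

```dockerfile
# A stage writes its OCI output into a cache mount instead of the build context.
FROM quay.io/fedora/fedora-bootc:rawhide as builder
RUN --mount=type=cache,id=oci-out,target=/output \
    build-tool --output /output/out.oci   # hypothetical generator

# Hypothetical syntax: consume (and implicitly drain) the cached OCI directory.
FROM oci-cache://oci-out/out.oci
```

The builder could detect that the cache ends the build empty (or drained) and discard it rather than persisting it.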
Caches are not scoped to the lifetime of a given build.
Yes, I know - its lifetime is longer, which isn't needed here, but it should still work. In my proposal you'd just end up with an empty directory/image as a cache; we could actually detect that case and just discard it. Do you have any alternative proposals?
Maybe another possible fix would be to expand the lifetime of the overlayfs created from a writable mount of the build context. I guess a downside of this might be that it's different from what Docker is doing today, which could confuse people. But at the cost of getting more hacky, we could do that scope expansion only if we detect there's a `FROM oci-archive:` reference.
I should emphasize here again: our production builds use this feature today. Having a working version of this (ideally compatible with the existing Containerfile, although that wouldn't be a requirement) relatively soon (e.g. < 1 month) would be quite helpful. I think in the short term we can just hold off on applying the buildah update, but that's obviously not sustainable. Among other things, a big part of the entire point of this is that we can tell people the base image build is just "podman build" today, and after updating that no longer works.
It looks like... I guess the more I think about it, the more I'd say:
I would have (and have previously) suggested solving this at the pipeline level.

Anyway, the reason the FROM fails is that the OCI archive is no longer being written to the actual build context directory by the previous stage, so it's literally not where the FROM instruction says it is. Unless it's a remote build being done via podman-remote. When we teach the
It looks like that flag is there because the set of accepted flags for
Can you elaborate on that? Previously in this comment you said:
In what we're doing here we aren't using buildah in any way, because we want fine-grained control over the bit-for-bit output of the OCI archive. But your point more generally I would phrase as "don't try to do it via Containerfile" - which definitely makes sense as a starting point, except I'd just reiterate that going from "podman build" to "run this script/Makefile" is a large leap.

Actually, today, e.g. on macOS (or more generally with podman-machine), this trick doesn't work; I didn't dig in, but I think it has to do with us always copying the build context with podman-remote. But in theory it could work in podman-remote/podman-machine cases, which is quite important because there's lots of infrastructure built on top of "podman build". If we have a custom script/pipeline, then especially in podman-remote cases we'd have to do some careful juggling to make sure our build works there.

Hmm... I guess in theory we could implement our build process as "spawn a container image with write access to containers-storage:" (or, if one wants to isolate it more, "spawn a container image with write access to an empty tempdir, which at the end of the build should contain an oci directory with one image"). But we just went through a large effort to switch our Konflux builds over to the "stock" buildah task they expose. Reworking everywhere we do a build, from the documented local developer entrypoint to Konflux, would not be a small thing for us. It's big enough that if I had to, I'd rather spend that engineering time on implementing a more sustainable way to do this in buildah myself.
Hmm... but here we're generating an entire OCI layout inside a build stage; we can't provide it as a build context at the start of a build.
I'm pretty sure that at that point I was referring to "FROM oci-archive:", and if you're feeling charitable, depending on it.
Agreed, it's not as simple a story to tell.
I have personally tripped over the overlay driver's mount_program option having to be specified, or not specified, more often than I'd like, to the point that I tend to want to avoid sharing the storage space. That's before you start worrying about different storage drivers.
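For context, the option in question lives in containers-storage's storage.conf; a fragment like this (illustrative values) has to agree between any processes sharing one storage location, which is one reason sharing is fragile:

```toml
# storage.conf fragment: overlay driver options that must match across sharers.
[storage.options.overlay]
# Rootless setups often need fuse-overlayfs here; rootful kernels usually do not.
mount_program = "/usr/bin/fuse-overlayfs"
```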
Can you indicate the level of support you have for the "short+medium term options" listed in this comment, ranging from say "yes, I'd implement" to "would take a patch" to "no"? Most especially the "short term" one of lifetime-extending only if we detect `FROM oci-archive:`.

Actually, let me just repeat it here, and expand/elaborate.

Option lifetime-extend:

Advantages:

Option
Having just done the changes to make this an overlay, I can say that the number of places in the source code that this impacts would be high. I also don't relish creating more differences in behavior that a user has to remember and keep track of, so I would lean toward not doing this.
How would the syntax be incompatible? The leading
It's more than a story, it's about the ecosystem. While there are certainly multiple build systems for container images, Dockerfile is very, very popular, as I think everyone here knows. And that means that actual infrastructure has been built on top of it. At Red Hat, for example, with Konflux there's been an investment in a few things like "hermetic" builds, where the builds can't fetch arbitrary things from the network, only a restricted set of inputs. Also, there are policies around which container images can be used in tasks. Both of these things are today applied on top of Dockerfile/Containerfile (though there are also lockfiles for rpms and other source content needed). As soon as "build this project" becomes more freeform than that, it needs custom logic to match these types of policies.

Another related thing is that there are some parts of Dockerfile that function perfectly fine and don't need reinventing, such as e.g. The fact that we were able to fit custom container images into the "shape" of a Dockerfile, and again continue to reuse existing things like
There is no leading
The ideal for me here is an approach that doesn't depend on implementing custom syntax, and that doesn't break when we fix compliance bugs. |
Right, it would clearly make sense to try to get this into Docker at some point, though last I looked, the ecosystem there is around buildx, which is what a whole lot of people who want to build containers in a more intelligent way are using. But maybe they'd be amenable to the

One thing I realized in a conversation just now with the team is that, because of this writable mount to the build context and some other reasons, we'd required the build to be run with

So here's yet another, hopefully more targeted, short term proposal: change buildah to detect when the input build is privileged in this way (

or so
For the longer term RFE, one idea I had in coreos/rpm-ostree#4688 (comment) was to add the concept of "builder images". This is not novel of course; it's similar to OpenShift S2I and Buildpacks. But having it built into buildah would generalize the rpm-ostree case into something much more impactful for the ecosystem.
Since containers/buildah#5952 broke the Containerfile flow, let's add a workflow (that we wanted to support anyways) to operate outside of a container build. Signed-off-by: Colin Walters <[email protected]>
coreos/rpm-ostree#5268 will switch us over to more clearly supporting a flow that doesn't depend on this now-removed feature.
And as I am trying to document/test that PR, for the record, a pain point I'm hitting that I think is illuminating: I was recommending doing

One thing I did in containers/bootc#919 that was some nontrivial code (which I can certainly rework to make usable here too) is dynamically mounting these things and not forcing the user to specify them. But even there I only needed to care about rootful; here it's a bit more complicated, as really in the general case I want the code running in this container image to match the c/storage configuration of the host. I'm sure this has come up elsewhere? I guess most use cases of "build from a container" actually end up with a c/storage instance scoped to that container, isolated from the host storage; which we obviously want to support too, but which would be annoying for many local dev cases.

It's exactly this type of thing that we all just got For Free as being part of
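For the record, the rootful variant of "mount the host's storage into the build container" looks roughly like this. A sketch only, assuming a rootful host with the default storage path; rootless storage lives under the user's home and needs different wiring, which is exactly the complication described above:

```shell
# Share the host's containers-storage with a privileged build container.
podman run --rm --privileged \
    --security-opt label=disable \
    -v /var/lib/containers/storage:/var/lib/containers/storage \
    quay.io/podman/stable \
    podman images   # operates on the host's store, not a container-scoped one
```

Both sides must agree on the storage driver and its options (see the mount_program discussion above) for this sharing to work.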
OK, here's another proposal: again, since in this use case we already require privileges, we add a new
We should probably split off longer-term things into a distinct issue, but it would seem to me to be strictly better to have a model where the Containerfile can control this and it's not something that needs to be passed via a new CLI argument. I see my proposal above of
👍
Yes, for sure. But the point of build models like S2I and Buildpacks is that you decouple your app code from the buildroot so they can be maintained independently, possibly by different organizations. So you wouldn't even have a

Where it's useful for our case specifically is that there's less cargo-culting that needs to be done to get this working right. We can e.g. just implement the magic

Of course, having built-in
I am in favor of making the new security guard configurable. The CVE fix is valid, but currently there is no alternative to the previous behavior, and it can hit any user. So +1 from my end on the direction. I refrain from suggesting a better name :^) @nalind, how do you feel about it?
I think that the ability to bypass "shouldn't be allowed to write to the build context" isn't the kind of thing we're supposed to be trusting a Containerfile's author on.
Very fair point. It would need to be explicitly allowed by the build environment/builder/admin.
I actually don't think the feature we desire (and were previously using) requires writing to the build context. IIUC, all we really want is to be able to take a file or directory generated in an earlier stage of a multi-stage build and slurp that into the

Writing out that file to the build context and then using it in the form of

I think Colin suggested something like this earlier:
I guess an example here would be something like:

```dockerfile
FROM quay.io/fedora/fedora-bootc:rawhide as rootfs
RUN dnf -y install htop
RUN bootc container lint

FROM quay.io/fedora/fedora-bootc:rawhide as builder
RUN --mount=from=rootfs,dst=/rootfs <<EORUN
rpm-ostree experimental compose build-chunked-oci --bootc --format-version=1 \
  --rootfs=/rootfs --output /output/out.oci
EORUN

FROM oci://builder/output/out.oci
LABEL empty # not sure if this would be required or not
```

Alternatively, something like this could work:

```dockerfile
FROM --mount=from=builder,src=/output/,dst=/output/ oci:/output/out.oci
LABEL empty # not sure if this would be required or not
```

I think the exact syntax is not too important (you container experts are probably better equipped to know what syntax would be ideal here). What is important (to us at least) is the use case. I know what we're describing here is a specialized use case. Maybe it's worth a new issue/RFE?
I think I addressed this in my comment above - again, we already require privileges (for other reasons) for this build type, so gating this capability on e.g.

(But yes, of course, at a technical level there's no reason this should require privileges; we're just hung up on a syntax to tell the

Anyways... I have to say, on one hand it's good that this happened now and not months from now; that would have been dramatically more painful. And supporting this outside of a Podman-specific Containerfile feature is something we had to do regardless. While we are doing that work to decouple, bottom line, it'd still be really, really useful to have a short term change to re-enable this (again, gated on privileges) that e.g. emitted a warning like "this feature may be removed in a year" or something.
I may be missing something, but narrowing the scope of this exact issue/RFE would be useful. What's not clear to me is whether the feature we desire HERE actually requires privileges. The workflow I describe in my previous comment shouldn't. I guess another way to write the example (to remove the tooling from the equation, to try to add clarity):

```dockerfile
FROM quay.io/fedora/fedora:rawhide as curler
RUN curl https://builds.coreos.fedoraproject.org/prod/streams/stable/builds/41.20250117.3.0/x86_64/fedora-coreos-41.20250117.3.0-ostree.x86_64.ociarchive > /output/out.ociarchive

FROM oci-archive://curler/output/out.ociarchive
LABEL empty # not sure if this would be required or not
```

In this scenario, I don't see the need for privilege at all. Am I missing something?
I don't think suggestions that require changing the details of how we interpret containers-transports(5) locations are going to be part of the solution here, particularly when we've already established that the current acceptance of them has to have a limited shelf life. |
OK. That's acceptable, but I feel like one of the options from #5952 (comment) may not invalidate that?
i.e. no new containers-transports parsing required in that example, but support for a way to mount in the output from a previous stage.
Hmm, I missed that part. Are you implying that
They really shouldn't, no. Pathnames don't really make sense in remote builds.
To do that, when we have to allow for multiple users simultaneously, some read-only, some writing, we have to mount the overlay over the build context directory before any stage accesses the directory tree in any way, so that ADD, COPY, and RUN instructions that read from the location reflect changes that have been made by RUN instructions that modify that location in different stages that could be running concurrently. Fixing up the handling of paths in certain FROM values is still necessary, because "." is the directory from which we were invoked, and that's not where data being "written" to the build context will be; but there's still potential for problems there.

#5975 tries to get this workflow going again.
IIUC, #5975 deals with the build context directory, which is how we were using this before; but taking a step back and looking at what we want to do, wouldn't being able to
I think everyone agrees it would; I proposed something similar above. But the concern here is that it requires new syntax which doesn't exist in Dockerfile AIUI; see this comment and the one below.
AlmaLinux is facing the same issue since podman version

Using quay.io/buildah/stable:v1.38.1, we failed to build an image with `FROM oci-archive:./out.ociarchive` and got an error; quay.io/buildah/stable:v1.38.0 does not have this issue.