Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

buildfetch'd image.yaml always used instead of the one from the config #3616

Closed
mtalexan opened this issue Sep 14, 2023 · 9 comments · Fixed by #3618
Closed

buildfetch'd image.yaml always used instead of the one from the config #3616

mtalexan opened this issue Sep 14, 2023 · 9 comments · Fixed by #3618

Comments

@mtalexan
Copy link
Contributor

Bug Report

When doing a build where a prior buildfetch'd commit is present, the image.yaml from the fetched build in the builds folder overwrites the tmp/image.jsong that was generated from the current build right before it gets used.

Using buildfetch always results in the fetched image.yaml being used instead of the one from the config for the next build command run.

Environment

What operating system is being used to run coreos-assembler?
Container image quay.io/coreos-assembler/coreos-assembler:latest generated on 2023.09.11 (digest: 2d1ee8a77d31).

What operating system is being assembled?
Customized fedora-coreos-config variant.

Is coreos-assembler running in Podman or Docker?
Podman, though it's replicable in Docker as well.

If Podman, is coreos-assembler running privileged or unprivileged?
Privileged

Expected Behavior

The image.yaml from the current coreos config being built (merged on top of the image-default.yaml built into coreos-assembler to ensure it meets the minimum requirements) should be used for builds even when fetching a prior build.

Actual Behavior

The image.yaml from the fetched build overwrites the tmp/image.json generated from the current config right before it gets used.

Reproduction Steps

In the following, build ID 4cdf84b is a build from before the #3483 change added composefs to image-default.yaml and made it a mandatory key in the image.yaml.

  1. cosa init ....my-config...
  2. cosa buildfetch --artifact=ostree --url file:///...path.../artifacts_of_4cdf8fb
  3. cosa fetch
  4. cosa build --skip-config-archive --version=5b7b124.202309014163002 --parent=4cdf8fb
  5. See it fail with error:
Committing cosa-image-json: /srv/tmp/override/imagejson ... 8d747fe0544979ba75f72416f84a8b3d3792978740ce14577bbab6135cda6d8f
fatal: Unhandled composefs setting: null

Other Information

I've traced this issue specifically to the build command.

In cmd-build, line 163-171 determine that the a prior build exists in the form of the buildfetch'd build, which results in line 189 being run to import the ostree commit for the priot build. This leads to cmdlib.py import_ostree_commit being called, which calls extract_image_json to overwrite the tmp/image.json with the one from the imported ostree commit right before it returns.

Unfortunately this all happens in cmd-build after the prepare_build was called, which is what reads the config's image.yaml, applies it on top of image-default.yaml, and writes it to tmp/image.json.

@mtalexan
Copy link
Contributor Author

I'm working on the fix. cmd-build shouldn't extract the image.json from the prior build when it imports the prior ostree since we already generated the "correct" one from the config being built.

@mtalexan
Copy link
Contributor Author

mtalexan commented Sep 14, 2023

Better reproducer example:

  1. Pick any config
  2. Run a complete build and save the entire working directory somewhere
  3. Create a new working directory
  4. cosa init ... with the same config as before
  5. Make a change to the image.yaml of the config being built
  6. cosa buildfetch ...the other build...
  7. cosa fetch
  8. cosa build
  9. Examine the /usr/lib/coreos-assembler/image.json within the newly created ostree commit. and/or the working directory's tmp/image.json and/or tmp/override/imagejson/usr/lib/coreos-assembler/image.json
  10. Notice the change you made in step 5 is not present.

EDIT: Clarified that "Complete the build" means cosa fetch and cosa build

@dustymabe
Copy link
Member

dustymabe commented Sep 14, 2023

I posted this over in #3618 (comment) but I'll post it here too:

For the buildextend-* then, it works like an enforcement that the image.json/yaml being used is the one from the original build and not one that's been modified since.

I think we intentionally embed the image.json/image.yaml into the OSTree commit so that it's canonical for the later stages. i.e. the later stages are not intended to pick it up from the config but rather from the OCI Archive/OSTree commit.

If you are running cosa build ostree again in your buildfetched directory and it's telling you there is no new build then there is probably a bug in our our "no new build" detection logic.

TL;DR if you change the image.yaml, you need to cosa build ostree again to pick up the change and that's intentional.

@mtalexan
Copy link
Contributor Author

I posted this over in #3618 (comment) but I'll post it here too:

For the buildextend-* then, it works like an enforcement that the image.json/yaml being used is the one from the original build and not one that's been modified since.

I think we intentionally embed the image.json/image.yaml into the OSTree commit so that it's canonical for the later stages. i.e. the later stages are not intended to pick it up from the config but rather from the OCI Archive/OSTree commit.

If you are running cosa build ostree again in your buildfetched directory and it's telling you there is no new build then there is probably a bug in our our "no new build" detection logic.

TL;DR if you change the image.yaml, you need to cosa build ostree again to pick up the change and that's intentional.

The specific reproducer here is running cosa build ostree (as an implicit dependency when it calls cosa build) after calling buildfetch. That's actually where it fails.

Sorry I wasn't more clear about that, I've updated the reproducer example.

@dustymabe
Copy link
Member

OK looking at your reproducer steps:

  1. Pick any config
  2. Run a complete build and save the entire working directory somewhere
  3. Create a new working directory
  4. cosa init ... with the same config as before
  5. Make a change to the image.yaml of the config being built
  6. cosa buildfetch ...the other build...
  7. cosa fetch
  8. cosa build
  9. Examine the /usr/lib/coreos-assembler/image.json within the newly created ostree commit. and/or the working directory's tmp/image.json and/or tmp/override/imagejson/usr/lib/coreos-assembler/image.json
  10. Notice the change you made in step 5 is not present.

Are you sure that step 8 isn't skipping the build because there is nothing to do?

Can you change step 8 to cosa build --force and see if it still reproduces?

Also if you are doing a fresh build why do you need the buildfetch? Is it to preserve history (i.e. parent commit)?

@mtalexan
Copy link
Contributor Author

Are you sure that step 8 isn't skipping the build because there is nothing to do?

Yes, I'm sure it's doing the build, I added a bunch of debugging to the build to track down this problem and I can see it's generating the different/correct image.json file, and then overwriting it with the one loaded from the previous build (the buildfetch-d one).

Can you change step 8 to cosa build --force and see if it still reproduces?

Sure, I can change that to be sure, but it's a completely different config commit I'm building that has a modified manifest.yaml.

Also if you are doing a fresh build why do you need the buildfetch? Is it to preserve history (i.e. parent commit)?

Yup, exactly. The actual build command I want to run is cosa build --parent=<prior build id> but I simplified it for this reproducer.

@dustymabe
Copy link
Member

I'm able to reproduce this problem with something like:

mkdir fcos-test
cd fcos-test
cosa init
cosa buildfetch --stream=testing-devel --artifact=ostree
edit image-base.yaml and add `nameserver=8.8.8.8` to `extra-kargs:`
cosa fetch && cosa build --parent=38.20230926.20.0
cosa run
grep nameserver /proc/cmdline

The key part here is the buildfetch with --artifact=ostree. Strictly --artifact=ostree isn't necessary for this case which is why we've probably never observed this problem before.

@mtalexan
Copy link
Contributor Author

It turns out the --artifact=ostree part of the buildfetch is important to replicating this bug. (from discussion on Matrix)

@mtalexan
Copy link
Contributor Author

mtalexan commented Sep 28, 2023

So the functionality is that when you run buildfetch --artifact=ostree, it downloads the json metdata files for the build, and the ociarchive file for the build.

In the cmd-build* scripts, it calls the shared prepare_build function to set up the initial ostree repo and generate the image.json file from the image.yaml in the config and the image-default.yaml in the scripts (among other things). It then checks to see if there's an existing build in the builds/ folder and imports it if there is. This build can be either a "partial commit", which is metadata-only, or a full commit, which includes the ociarchive of the ostree commit. If it finds the full commit, it calls the python import_ostree_commit function to import it into the ostree repo, including extracting the image.json from it and overwriting the one that was generated.

This begs the question, why are we extracting the image.json from the ostree commit we imported?
Well it turns out that back in commit 816ebaed49d3e1efb2ba06e52969453ddad06d61 when the image.json was added to the ostree commit, the original design of the import_ostree_commit python function was that importing an ostree commit would restore everything in the working directory necessary to be able to rebuild that commit without the config. So given that design, the expected use would be to call it after creating the ostree repo to import it into, but nothing else.

This leaves us with a conundrum. The import_ostree_commit has to be called after the ostree repo is created, which is in prepare_build, but before the image.json (and some other files) are generated from the config, which is also in prepare_build.

The options therefore are:

  1. Remove the extraction of the image.json from the import_ostree_commit, changing the intent of the function to only what the name suggests.
  2. Put import_ostree_commit and associated logic in prepare_build
  3. Break up the prepare_build function so it does the setup and ostree repo creation separately from the image.json (etc) generation and call the import_ostree_commit between the two.
  4. Modify import_ostree_commit so we can tell it not to overwrite for cmd-build
  5. Make a copy of the image.json before doing the import_ostree_commit in cmd-build, and then restore it afterward the import is done.
  6. Modify import_ostree_commit so it can be told not to overwrite the image.json if it's already present.

Option 1 seems reasonable when looking only at cmd-build, no one actually appears to need the image.json extracted from the imported ostree commit when calling build, but in cmd-buildextend-metal (for example) there are comments explicitly expecting the image.json to be overwritten. The logic behind this in cmd-buildextend-metal is commented to say it wants to overwrite any changes that have happened in the config since build was run, which seems to suggest there's an expected use case of being able to do cosa buildfetch --artifacts=ostree && cosa buildextend-metal without any cosa build.

Option 2 doesn't really work, because some variables are collected during the import of the prior build that get used in unique ways later in some of the cmd-build* scripts. It would be difficult to get these variables back from the prepare_build function call.

Option 3 would be the correct way to fix the order of operations, but the prepare_build is used in a lot of scripts. It's also not completely clear whether it's reasonable to separate some generation logic (creating the ostree repo), from the rest of the generation logic (image.json, etc).

Option 4 is what I implemented in #3618. It's an ultra-low-impact partial version of Option 1, and allows cmd-build to tell the import_ostree_commit not to extract the image.json. This would still allow the buildfetch --artifacts=ostree && buildextend-* to work, but wouldn't let you do buildfetch --artifacts=ostree && build without a config present (which theoretically works currently?).

Option 5 is just an alternative for Option 4 that avoids changing common code.

Option 6 is Option 4, but allows the buildfetch --artifacts=ostree && build to still work without a config. It won't overwrite, which always causes problems, but it will create if there's nothing there. It would be most in line with the original design intent of import_ostree_commit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants