Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

filter known PBS release by the PBS-specific release tag #21710

Merged
merged 17 commits into from
Dec 11, 2024

Conversation

tdyas
Copy link
Contributor

@tdyas tdyas commented Dec 3, 2024

#21441 requests the ability to filter known versions of Python Build Standalone distribution by the PBS-specific release tag. This PR adds a new --python-build-standalone-release-constraints option to provide that filtering via a release constraint expression.

A constraint expression looks similar to Python build constraints; for example, >=20241201. Multiple constraints can be specified separated by commas; multiple constraints are AND'ed together.

The generator script for the known versions database has been updated to add a tag to each known version. The entire list as been regenerated.

Closes #21441.

Copy link
Contributor Author

@tdyas tdyas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cburroughs: Any thoughts on how user-supplied PBS versions should have the release tag specified?

@@ -0,0 +1,73 @@
# Copyright 2024 Pants project contributors (see CONTRIBUTORS.md).
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file could probably be moved to more library-like location in the source tree. For now, I'll just leave it here since this is the only user.

@tdyas
Copy link
Contributor Author

tdyas commented Dec 3, 2024

@cburroughs: Any thoughts on how user-supplied PBS versions should have the release tag specified?

We could probably parse it from the supplied URL. Of course, we could also probably parse out the Python version (which the user has to supply separately from the URL) as well at that point and not have the user specify it as they now must do.

@cburroughs
Copy link
Contributor

@cburroughs: Any thoughts on how user-supplied PBS versions should have the release tag specified?

We could probably parse it from the supplied URL. Of course, we could also probably parse out the Python version (which the user has to supply separately from the URL) as well at that point and not have the user specify it as they now must do.

Hmm, like in the case of using the

3.1.2|macos_x86_64|6d0f18cd84b918c7b3edd0203e75569e0c7caecb1367bbbe409b44e28514f5be|42813|https://<URL>

syntax? I'm not sure. These could plausibly be file:// urls today or something else that doesn't have the same pattern at all? Maybe if you want to use the release tags you have to follow the same pattern as upstream and parse it out of the url.

Some regex prior art: astral-sh/python-build-standalone#127 (comment)

@tdyas
Copy link
Contributor Author

tdyas commented Dec 4, 2024

syntax? I'm not sure. These could plausibly be file:// urls today or something else that doesn't have the same pattern at all? Maybe if you want to use the release tags you have to follow the same pattern as upstream and parse it out of the url.

The parsing could be best effort. If we can extract a release tag from the URL then we use it and match against it. If not, then we just do not apply the release tag filter to that PBS release (i.e., the filter always includes the release).

Copy link
Contributor

@cburroughs cburroughs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I played around with this in example-python and verified I got different openssl versions (aka different PBS releases) so the basic structure looks good. I'm not sure how best to handle "unsatisfied" constraints. Like if one asks for something that does not exist like:

[python-build-standalone-python-provider]
release_constraints="==20240000"

It isn't an error, which may be surprising.

@tdyas
Copy link
Contributor Author

tdyas commented Dec 4, 2024

I played around with this in example-python and verified I got different openssl versions (aka different PBS releases) so the basic structure looks good. I'm not sure how best to handle "unsatisfied" constraints. Like if one asks for something that does not exist like:

[python-build-standalone-python-provider]
release_constraints="==20240000"

It isn't an error, which may be surprising.

If there are no user-supplied versions, then this should error since it should find no matching PBS releases (but will add a test for this behavior). But yeah, if we "accept" any user-supplied version without a parseable release tag, then non-conforming versions would be accepted.

Maybe Pants should warn or error (configurable?) if it cannot parse the release tag out of a PBS release URL?

@cburroughs
Copy link
Contributor

Maybe Pants should warn or error (configurable?) if it cannot parse the release tag out of a PBS release URL?

I think that would be consistent with most of the other Pants warn/error configuration knobs.

Although FWIW in this case I was messing around with the bundled json encoded versions, so I don't think that would be the cause of the ==does-not-exist issue.

@tdyas
Copy link
Contributor Author

tdyas commented Dec 9, 2024

f there are no user-supplied versions, then this should error since it should find no matching PBS releases (but will add a test for this behavior). But yeah, if we "accept" any user-supplied version without a parseable release tag, then non-conforming versions would be accepted.

Some of the PBS versions in versions_info.json did not have tag data added by the script so they were "accepted" as conforming. I will filter them out for now.

@tdyas
Copy link
Contributor Author

tdyas commented Dec 9, 2024

Latest commits makes sure all known versions have a tag to match against.

@tdyas tdyas marked this pull request as ready for review December 9, 2024 19:32
@tdyas tdyas requested review from benjyw and huonw December 9, 2024 19:32
@tdyas
Copy link
Contributor Author

tdyas commented Dec 9, 2024

Inference of the release tag is implemented in #21739 (which stacks on top of this PR).

Copy link
Contributor

@huonw huonw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for improving the PBS provider!

tdyas and others added 2 commits December 9, 2024 18:16
@tdyas tdyas force-pushed the pbs/release-constraint branch from c65b0d7 to 69e2aee Compare December 10, 2024 16:36
@tdyas
Copy link
Contributor Author

tdyas commented Dec 10, 2024

Thanks for improving the PBS provider!

@huonw: I've updated the PR based on your comments. Please let me know if there is anything else!

Copy link
Contributor

@huonw huonw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code itself looks great, but I have two potential bigger-picture things:

  • The PBS versions Pants has scraped are now explicitly part of the 'public API'.
  • We currently only record one release tag for each python version, so release_constraints are somewhat redundant with interpreter_constraints.

I think the current generate_urls.py script isn't super-careful about how it records assert versions: it looks like it seems to only include the most recent release for each version(?).

This may lead to Pants scraping new versions being a breaking change for some users.

For instance, it seems all of these releases include 3.12.7:

If we had scraped on 2024-10-10, we would record the 20241008 assets only (e.g. asset URLs like https://github.com/indygreg/python-build-standalone/releases/download/20241008/cpython-3.12.7%2B20241008-aarch64-unknown-linux-gnu-install_only_stripped.tar.gz).

Thus, someone using Pants 2.24 might write interpreter_constraints == ["==3.12.7"], release_constraints = "==20241008" (e.g. they identified that 20241002 and 20241016 had bugs).

Then, a while later, we re-run the scraping, which picks up 20241016 and thus Pants 2.25 only has those assets (e.g. https://github.com/indygreg/python-build-standalone/releases/download/20241016/cpython-3.12.7%2B20241016-aarch64-unknown-linux-gnu-install_only_stripped.tar.gz).

The user upgrading from Pants 2.24 to Pants 2.25 will experience breakage because release_constraints = "==20241008" will no longer be satisfied: there's no longer 3.12.7 assets with that tag.


To solve this, I think we can record more assets into versions_info.json: every single release for every single version.

However, I think this can and should be a follow-up issue and follow-up work building on the groundwork done in this PR: this provider is experimental for a reason! We don't need to get every change perfect before landing.

@tdyas
Copy link
Contributor Author

tdyas commented Dec 11, 2024

To solve this, I think we can record more assets into versions_info.json: every single release for every single version.

I agree. Good points about the problems which ensue if all of the release data is not recorded.

@tdyas tdyas merged commit 2ad1796 into pantsbuild:main Dec 11, 2024
24 checks passed
@tdyas tdyas deleted the pbs/release-constraint branch December 11, 2024 15:36
@huonw
Copy link
Contributor

huonw commented Dec 11, 2024

I filed #21748

tdyas added a commit that referenced this pull request Dec 18, 2024
…21776)

As first mentioned by @huonw in
#21710 (review)
and more fully discussed in
#21748, the addition of
filtering PBS releases by the PBS release tag means that users may now
see a breaking change if Pants scrapes newer versions of PBS releases,
replaces some of the existing "known versions" data with the
newly-scraped release metadata, and the user selects an older release
via a PBS release constraint.

In that case, the user would see an error because Pants would be unable
to select that specific PBS version anymore due to no longer having any
metadata regarding that PBS release tag.

The solution is for Pants to store metadata for _all_ PBS release tags.
Newer versions will only add to the PBS "known versions" data and not
replace nor delete any metadata. With all metadata available, a user who
selects a particular PBS release via the release constraints will still
continue to see that release be selected. And users who want the latest
matching PBS release will continue to get the latest PBS release (known
to Pants).

Closes #21748.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

mechanism to control python-build-standalone "release" version
3 participants