Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace GitPython with pygit2 #2120

Open
wants to merge 13 commits into
base: main
Choose a base branch
from
Open

Replace GitPython with pygit2 #2120

wants to merge 13 commits into from

Conversation

mgorny
Copy link

@mgorny mgorny commented Nov 6, 2024

Checklist

  • Added a news entry
  • Regenerated schema JSON if schema altered (python conda_smithy/schema.py)

Replace the use of GitPython package with pygit2. The latter seems to have better git support, in particular it supports the newer index versions 3 and 4. Since it is backed by the libgit2 library that is also used by Cargo, it seems to have the best chances of being updated for compatibility with new git versions.

Admittedly, the API feels very low-level. In particular, it is necessary to explicitly request writing changes to index back, and explicitly reread it when it's modified externally (e.g. via another pygit2.Repository instance, as in tests). On the plus side, it does not invoke git at all -- everything is done by the library.

Fixes #2116


So far focused on feedstock_io.py and its tests. I need to figure out how to test the changes to other files properly, given that the tests mock the entire git.Repo.clone_from call.

@mgorny mgorny requested a review from a team as a code owner November 6, 2024 16:16
@mgorny mgorny marked this pull request as draft November 6, 2024 16:16
@mgorny mgorny force-pushed the pygit2 branch 2 times, most recently from f6dadb8 to e90a067 Compare November 6, 2024 16:21
conda_smithy/feedstock_io.py Outdated Show resolved Hide resolved
Replace the use of GitPython package with pygit2.  The latter seems
to have better git support, in particular it supports the newer index
versions 3 and 4.  Since it is backed by the libgit2 library that is
also used by Cargo, it seems to have the best chances of being updated
for compatibility with new git versions.

Admittedly, the API feels very low-level.  In particular, it is
necessary to explicitly request writing changes to index back,
and explicitly reread it when it's modified externally (e.g. via another
`pygit2.Repository` instance, as in tests).  On the plus side, it does
not invoke `git` at all -- everything is done by the library.

Fixes conda-forge#2116
Remove the `search_parent_directories` kwarg that's never been used,
and instead always enable searching parent directories for better
cross-version pygit2 compatibility.
@mgorny mgorny marked this pull request as ready for review November 11, 2024 14:20
Copy link
Member

@beckermr beckermr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an API change and is not allowed under semantic versioning. Please put back this functionality.

@mgorny
Copy link
Author

mgorny commented Nov 11, 2024

Restored, and instead added backwards compatibility for pygit2 < 1.14.

@mgorny
Copy link
Author

mgorny commented Nov 11, 2024

@beckermr, another use is in conda_smithy.feedstocks.feedstocks_repos() function. This function is used only in conda_smithy.feedstocks.feedstocks_yaml(), and FWICS the latter isn't exposed anywhere in the CLI. On top of that, feedstocks_repos() is currently broken:

In [10]: list(conda_smithy.feedstocks.feedstocks_repos(None, "/home/mgorny/git/conda"))
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[10], line 1
----> 1 list(conda_smithy.feedstocks.feedstocks_repos(None, "/home/mgorny/git/conda"))

File ~/git/conda-smithy/conda_smithy/feedstocks.py:197, in feedstocks_repos(organization, feedstocks_directory, pull_up_to_date, randomise, regexp)
    195 for feedstock in feedstocks:
    196     repo = git.Repo(feedstock.directory)
--> 197     upstream = repo.remotes.upstream
    199     if pull_up_to_date:
    200         print("Fetching ", feedstock.package)

File ~/miniforge3/envs/conda-smithy/lib/python3.12/site-packages/git/util.py:1198, in IterableList.__getattr__(self, attr)
   1196         return item
   1197 # END for each item
-> 1198 return list.__getattribute__(self, attr)

AttributeError: 'IterableList' object has no attribute 'upstream'

FWICS git.Repo.remotes was always a list, so I don't think this code ever worked. Should I migrate it and fix it, or remove both this function and feedstocks_yaml()?

@beckermr
Copy link
Member

We need to migrate and fit.

@mgorny
Copy link
Author

mgorny commented Nov 11, 2024

Ah, sorry — I was wrong, it wasn't broken. I've just realized it expected a remote called upstream, my bad.

environment.yml Outdated Show resolved Hide resolved
@mgorny
Copy link
Author

mgorny commented Nov 11, 2024

Another question: do we need SSH support for feedstock_tokens? FWICS the default is to use the repository over HTTPS. If we were to support SSH as well, there are two snags:

  1. libgit2 on Conda has a bug that results in SSH support being disabled. I've filed Actually enabling SSH support while building libgit2-feedstock#80 to fix that.
  2. Since libgit2 is quite low-level by design, SSH support requires a bit of effort. I'm not sure how much yet, but the example is a bit worrying.

Of course, another option is to call git directly for the push (much like GitPython does).

@beckermr
Copy link
Member

Anything that is currently supported in the code needs to be supported in this PR. We cannot have API changes like this.

@mgorny
Copy link
Author

mgorny commented Nov 11, 2024

Anything that is currently supported in the code needs to be supported in this PR. We cannot have API changes like this.

Is it okay to call git push as a subprocess then? That would guarantee the best compatibility (and be pretty much what GitPython did anyway). There's also a matter of whether to call git commit, or reimplement getting author/committer name from git config.

@beckermr
Copy link
Member

Subprocesses are fine, but we should be mindful of how complex this might get. The point of this PR is to support new git index versions. IDK anything about those. Are these in use? How have we not hit bugs for this before new?

@mgorny
Copy link
Author

mgorny commented Nov 11, 2024

Subprocesses will probably be less complex than doing everything via libgit2. Another option is to stick to GitPython for some of the code, at least until it actually breaks for someone.

From what I understand, to hit this issue you need to use new enough git to clone the repository. It is also possible that the repository itself must have some characteristics that actually trigger the use of new index format. Maybe I was just unlucky that the first feedstock that I've cloned triggered this, or it is possible that more people will hit this as they upgrade their git to new versions and clone new feedstocks.

@beckermr
Copy link
Member

I'd rather have one tool used and defer to subprocesses. Let's use a subprocess.

@mgorny
Copy link
Author

mgorny commented Nov 11, 2024

I've updated the remaining code. However, I still need to update the tests. I'll do that tomorrow. However, feel free to point out if you don't like some of the changes and would prefer a different approach.

)
if search_parent_directories:
path = pygit2.discover_repository(path)
if path is not None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if path is none? Should we raise?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original behavior was to return repo, i.e. None then, and I've preserved that. In other words, if anything fails (pygit2 is not installed, there is no repo, opening repo fails for some reason), the function returns None, as it used to.

Comment on lines +89 to +90
clone = pygit2.clone_repository(repo.clone_url, clone_directory)
clone.remotes.delete("origin")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So wait, now we can clone? I thought we were using git subprocesses for this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, libgit2 can clone directly. However, as noted in the PR discussion this (as implemented here) works for public HTTPS repos only. Using libgit2 to clone via SSH would require fixed libgit2 on conda and explicit credential handling which would be a lot of code and not 100% equivalent to external git anyway.

@jaimergp
Copy link
Member

I think we just need a news item now!

@mgorny
Copy link
Author

mgorny commented Nov 12, 2024

Ok, I'm done updating tests. I think the code is ready now.

@jaimergp
Copy link
Member

Looks like some tests are still using gitpython @mgorny

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

conda smithy rerender -c auto failing with AssertionError on git repositories with index v3
3 participants