Repo is unnecessarily large #158

Open
m3dwards opened this issue Nov 7, 2023 · 7 comments
Labels
bug Something isn't working

Comments

@m3dwards
Contributor

m3dwards commented Nov 7, 2023

I noticed that creating branches was taking a while, and after a quick look it seems to be because the repo is 247 MB.

I ran the following commands to list the largest blobs and it looks like some builds were accidentally committed early on:

git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
  sed -n 's/^blob //p' |
  sort --numeric-sort --key=2
# for example this blob was added and deleted a minute later
git whatchanged --all --find-object=6507a7347f3b151262807d43af4114d287b0d446
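
To keep the output manageable, the pipeline above can be wrapped in a small helper that prints only the largest blobs, biggest first. This is a minimal sketch rather than anything from the thread: the function name `largest_blobs` and its default of 10 entries are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the N largest blobs reachable from any ref.
# Output columns: <object id> <size in bytes> <path>
largest_blobs() {
  local count="${1:-10}"   # assumed default: show the top 10
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    sed -n 's/^blob //p' |   # keep blobs only, drop the type column
    sort -k2,2nr |           # sort by size, largest first
    head -n "$count"
}
```

Run from inside the repository, `largest_blobs 20` would show the twenty biggest blobs in the history, including ones that were deleted in later commits.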

The following Stack Overflow answer discusses techniques for removing blobs from history: https://stackoverflow.com/questions/2100907/how-to-remove-delete-a-large-file-from-commit-history-in-the-git-repository/61602985#61602985

As these files appear to have been committed and pushed in error, I would support their removal from the history.

@m3dwards m3dwards added the bug Something isn't working label Nov 7, 2023
@carlaKC
Contributor

carlaKC commented Nov 8, 2023

cc @okjodom @sr-gi, I think it's worthwhile doing a once off cleanup?

@sr-gi
Member

sr-gi commented Nov 8, 2023

I do agree. It's not worth having an unnecessarily big repo because of files that were pushed by accident.

@okjodom
Collaborator

okjodom commented Nov 8, 2023

+1 on cleanup

@okjodom
Collaborator

okjodom commented Nov 8, 2023

What do you think of an interactive rebase to drop PRs #9 and #62?

@sr-gi
Member

sr-gi commented Nov 8, 2023

That goes over my head git-wise, but I'll be ok with doing so if possible

@okjodom okjodom self-assigned this Nov 8, 2023
@okjodom
Collaborator

okjodom commented Nov 8, 2023

having a go at it

@okjodom
Collaborator

okjodom commented Nov 8, 2023

I just experimented with this on a fresh clone of the repo

My starting step was an interactive rebase to remove commit 72c4f11 and the range b87a0ae .. 1a75d06, followed by a further rewrite to remove the associated blobs.

`git rebase --interactive 4086f94` to drop `b87a0ae` .. `1a75d06` and `72c4f11`

For blob cleanup, git-filter-repo from the Stack Overflow thread worked effectively. According to the SO discussion, this tool provides the same capabilities as git filter-branch.

  • to remove activity-generator blobs
    python3 git-filter-repo --invert-paths --path-match activity-generator --force
  • to remove js blobs
    python3 git-filter-repo --invert-paths --path-match js --force

This results in the following blob set:
2.blobs.after.txt

whereas before, the list of blobs was
2.blobs.before.txt

I used the original rev-list command to list the blobs in the repo, writing the output to a file:

git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
  sed -n 's/^blob //p' |
  sort --numeric-sort --key=2 > file.txt
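
The before and after lists can be compared mechanically to confirm exactly which blobs the rewrite dropped. The sketch below assumes each line of both files has the form `<object id> <size> <path>`, as produced by the command above; the function name `removed_blobs` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print blobs present in the "before" list but absent
# from the "after" list, then the total number of bytes they account for.
# Each input line is expected to look like: <object id> <size> <path>
removed_blobs() {
  local before="$1" after="$2"
  awk '
    FILENAME == ARGV[1] { kept[$1] = 1; next }   # "after" file: surviving ids
    !($1 in kept)       { print; total += $2 }   # "before" file: report missing ids
    END { printf "total bytes removed: %.0f\n", total }
  ' "$after" "$before"
}
```

For the attachments in this comment, the call would be `removed_blobs 2.blobs.before.txt 2.blobs.after.txt`. Note that the total is the sum of uncompressed blob sizes, so the saving on disk will usually be smaller.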

From here, I'm not sure how we'd push this revised history upstream and get forks and clones to pick up the same history.
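
One common approach, offered here as a sketch and not as something the maintainers agreed on: because the rewrite changes commit ids, the new history has to be force-pushed, and every existing clone or fork then has to reset to it instead of merging. The function names `publish_rewrite` and `resync_clone` and their arguments are assumptions for illustration. git-filter-repo normally removes the `origin` remote after rewriting, so the remote may need to be re-added first.

```shell
#!/usr/bin/env bash
# Hypothetical helper, run in the rewritten clone: overwrite every branch
# and tag on the remote. "$1" is the remote name or URL.
publish_rewrite() {
  local remote="$1"
  git push --force --all "$remote" &&
  git push --force --tags "$remote"
}

# Hypothetical helper, run in every other clone or fork with the target
# branch checked out: fetch the rewritten history and move the current
# branch onto it. Local commits that are not on the remote are discarded.
resync_clone() {
  local remote="$1" branch="$2"
  git fetch "$remote" &&
  git reset --hard "$remote/$branch"
}
```

Open pull requests and local work based on the old history would need to be rebased onto the new commits, and anyone who merges old history back in would reintroduce the large blobs.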

@okjodom okjodom removed their assignment Nov 8, 2023
4 participants