postprocess: add --gc-cache to garbage collect any cache entries not used since the last --gc-cache call #1554

Open
kkysen wants to merge 5 commits into master from kkysen/postprocess-gc-cache
@kkysen kkysen commented Jan 20, 2026

When inputs change (the transpiler, refactorer, prompt, etc.), the existing cache entries become outdated and must be recalculated, but the old entries are never deleted. This PR tries to solve that. It tracks which cache entries are still actively tested/used (this tracking always happens, via llm-cache/.gc), and --gc-cache then deletes everything else. The intended usage is:

  • Run rm -f llm-cache/.gc.
  • Run all tests, updating the cache with new entries.
  • Run with --gc-cache to remove the outdated, unused entries.

I tried to keep this simple. It does seem necessary once testing is added to CI, because others will then need to clean up outdated cache entries on their own instead of relying on me to delete the right ones manually. If there is a better or simpler way to do this, I'd welcome it.

@kkysen kkysen force-pushed the kkysen/postprocess-fail-slow branch 2 times, most recently from f28d26b to a97239b Compare January 26, 2026 23:53
brk pushed a commit to Aarno-Labs/c2rust that referenced this pull request Jan 29, 2026
brk pushed a commit to Aarno-Labs/c2rust that referenced this pull request Jan 29, 2026
@kkysen kkysen force-pushed the kkysen/postprocess-fail-slow branch from a97239b to 6fc147a Compare February 13, 2026 10:04
@kkysen kkysen force-pushed the kkysen/postprocess-gc-cache branch from 2612c20 to ae20c1e Compare February 13, 2026 10:04
@kkysen kkysen left a comment

I've now detached --gc-cache from CommentTransferOptions (it was never really needed there) and switched to an mtime-based version. I haven't yet fully separated --gc-cache into its own script; that requires more work, since part of the existing script will have to be deduplicated or set up to be imported.

@kkysen kkysen force-pushed the kkysen/postprocess-gc-cache branch from ae20c1e to 8d0f2ee Compare February 20, 2026 22:30
@kkysen kkysen changed the base branch from kkysen/postprocess-fail-slow to master February 20, 2026 22:30
@kkysen kkysen force-pushed the kkysen/postprocess-gc-cache branch from 8d0f2ee to 4efc99c Compare February 20, 2026 22:40
@kkysen kkysen left a comment

This is now independent of #1553 and rebased on master, so it can merge independently.

Also, the mtime- and ctime-based implementation was broken because on Linux ctime is the change time, not the creation time, so I've switched to a purely mtime-based implementation and tested it.
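
To illustrate the ctime pitfall, here is a small self-contained sketch. It is not code from this PR, and the function name is hypothetical. It writes a file, then re-applies the file's own timestamps with `os.utime`. On Linux, this should leave the mtime unchanged while still advancing the ctime, because ctime records the last inode change rather than creation.

```python
import os
import tempfile
import time


def utime_effect_on_timestamps(pause: float = 0.1):
    """Return (mtime_unchanged, ctime_advanced) after re-applying a file's timestamps.

    Hypothetical helper for illustration only; not part of the PR.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "entry")
        with open(path, "w") as f:
            f.write("cached response")
        before = os.stat(path)

        # Wait so that a later inode change gets a visibly newer timestamp.
        time.sleep(pause)

        # Re-apply the same atime/mtime: the content and mtime stay the same,
        # but the inode metadata has been written, which updates ctime on Linux.
        os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))
        after = os.stat(path)

        return (
            after.st_mtime_ns == before.st_mtime_ns,
            after.st_ctime_ns > before.st_ctime_ns,
        )
```

Since any `os.utime` call on a cache entry also moves its ctime, ctime cannot stand in for a creation timestamp on Linux. Note that on Windows, `st_ctime` has historically reported creation time instead, so the behavior is platform-dependent.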

@kkysen kkysen force-pushed the kkysen/postprocess-gc-cache branch from 4efc99c to f67295e Compare February 20, 2026 22:42
@kkysen kkysen force-pushed the kkysen/postprocess-gc-cache branch 2 times, most recently from 3421a3c to 4c3b60a Compare February 20, 2026 23:03
Comment on lines +156 to +157
    if args.gc_cache:
        cache.gc_sweep()

If we want to move this into a separate script, we'll have to split out this cache.gc_sweep() call, the cache creation (cache = getattr(DirectoryCache, args.cache_scope)()), and thus args.cache_scope as well. So we'd either have to duplicate a bunch of code here, or refactor things so that we can import exactly those pieces (e.g., only --cache-scope, not the other arguments from build_arg_parser).
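
One possible shape for that refactor, sketched with `argparse` parent parsers so that `--cache-scope` is defined once and shared by both entry points. This is only an illustration under assumptions: `build_cache_arg_parser`, `build_gc_arg_parser`, and the `repo`/`user` scope names are hypothetical, and the real `build_arg_parser` has other arguments that are omitted here.

```python
import argparse


def build_cache_arg_parser() -> argparse.ArgumentParser:
    """Arguments shared by the main script and a standalone GC script (hypothetical)."""
    # add_help=False is required for parsers that are used as parents,
    # otherwise the child parser would get a conflicting second -h/--help.
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument(
        "--cache-scope",
        choices=["repo", "user"],  # assumed scope names, not taken from the PR
        default="repo",
    )
    return parser


def build_arg_parser() -> argparse.ArgumentParser:
    """Main postprocess parser: shared cache arguments plus its own."""
    parser = argparse.ArgumentParser(parents=[build_cache_arg_parser()])
    parser.add_argument("--gc-cache", action="store_true")
    return parser


def build_gc_arg_parser() -> argparse.ArgumentParser:
    """Parser for a separate GC script: only the shared cache arguments."""
    return argparse.ArgumentParser(
        description="Sweep unused cache entries.",
        parents=[build_cache_arg_parser()],
    )
```

With this layout, the standalone script would only need to import `build_gc_arg_parser` and the cache class, and could then construct the cache from `args.cache_scope` the same way the main script does.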

…t used since the last `--gc-cache` call

When there are input changes (like transpiler, refactorer, prompt, etc. changes),
the cache becomes outdated and must be recalculated, but the previous entries aren't deleted.
This tries to solve that.  It tracks which cache entries are still actively tested/used
(this is always done in `llm-cache/.gc`), and then `--gc-cache` deletes everything else.
So the normal intended usage is to:
* Run `rm -f llm-cache/.gc`.
* Run all tests, updating the cache with new entries.
* Run with `--gc-cache` to remove the outdated, unused entries.

Instead of storing paths in `.gc`, `.gc` is an empty file whose mtime serves as a cutoff:
every entry with an older mtime should be deleted.
When there's a cache hit, update the hit entry's mtime,
so that entries newer than `.gc` are known to be in use and are kept.

Note that `.gc` is only created once when it doesn't exist
and then never touched/modified again, as we use its mtime.
ctime is change time on Linux, not creation time, so we can't use that.
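
A minimal sketch of the mtime scheme described in this commit message, for illustration only. `MtimeGcCache`, its `get`/`put` methods, and `demo` are hypothetical stand-ins rather than the PR's `DirectoryCache`; only the `.gc` marker and the `gc_sweep` name come from the discussion above. The sketch also assumes the sweep leaves `.gc` in place, which the source does not specify.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path

GC_MARKER = ".gc"


class MtimeGcCache:
    """Hypothetical flat-file cache; one file per key under `root`."""

    def __init__(self, root: Path) -> None:
        self.root = root
        root.mkdir(parents=True, exist_ok=True)
        marker = root / GC_MARKER
        # The marker is created once and never touched again; its mtime is the cutoff.
        if not marker.exists():
            marker.touch()

    def put(self, key: str, value: str) -> None:
        (self.root / key).write_text(value)

    def get(self, key: str) -> str | None:
        path = self.root / key
        if not path.is_file():
            return None
        os.utime(path)  # cache hit: bump the entry's mtime to "now"
        return path.read_text()

    def gc_sweep(self) -> list[str]:
        """Delete every entry whose mtime is older than the marker's mtime."""
        cutoff = (self.root / GC_MARKER).stat().st_mtime_ns
        removed = []
        for path in sorted(self.root.iterdir()):
            if path.name == GC_MARKER or not path.is_file():
                continue
            if path.stat().st_mtime_ns < cutoff:
                path.unlink()
                removed.append(path.name)
        return removed


def demo() -> tuple[list[str], list[str]]:
    """Run the workflow on a temporary directory with pinned timestamps."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "llm-cache"
        cache = MtimeGcCache(root)
        cache.put("old", "stale response")
        cache.put("kept", "still used")
        # Backdate both entries, and pin the marker between them and "now",
        # so the outcome does not depend on timestamp granularity.
        for name in ("old", "kept"):
            os.utime(root / name, (1_000_000_000, 1_000_000_000))
        os.utime(root / GC_MARKER, (1_500_000_000, 1_500_000_000))

        cache.get("kept")                   # hit: mtime becomes newer than the marker
        cache.put("new", "fresh response")  # new entry: also newer than the marker
        removed = cache.gc_sweep()
        remaining = sorted(p.name for p in root.iterdir())
        return removed, remaining
```

In `demo`, only the backdated entry that was never hit falls below the cutoff, so the sweep removes `old` and keeps `kept` and `new`.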
…tries

I didn't realize we had unused entries, but these seem to be legitimately unused.
The tests (`pytest` and `json-c`) still work without an LLM afterward.
@kkysen kkysen force-pushed the kkysen/postprocess-gc-cache branch from 8d1554c to 7a4ca7f Compare February 26, 2026 08:06