feat: cache solution hash #153
Conversation
📝 Walkthrough

`Solution` is now immutable and precomputes a deterministic content hash at initialization. The hash is cached in a private attribute; `hash()` returns the memoized value, and `__hash__`/`__eq__` are derived from it.
Summary of Changes

This pull request introduces a performance optimization by refactoring the `Solution` model to pre-compute and cache its content hash at initialization.
Code Review
This pull request refactors the `Solution` model to pre-compute and cache its hash, which is a great optimization. The implementation correctly uses pydantic's `frozen=True` and the `model_post_init` hook for this.
I've found a critical issue in the hash computation logic where some fields that affect the solution's behavior (`destination_passing_style` and `target_hardware`) are omitted. This could lead to incorrect caching and runtime errors. I've also suggested an improvement to the documentation of the hashing function to improve maintainability.
Please review the comments for details and suggestions.
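For readers unfamiliar with the pattern, here is a minimal, self-contained sketch (illustrative field names, much simpler than the real `Solution` model) of how `frozen=True` plus `model_post_init` supports a cached hash:

```python
import hashlib
from pydantic import BaseModel, ConfigDict, PrivateAttr

class CachedHashModel(BaseModel):
    model_config = ConfigDict(frozen=True)  # field assignment after init raises

    name: str
    definition: str
    _hash_cache: str = PrivateAttr(default="")

    def model_post_init(self, __context: object) -> None:
        # Private attributes are not model fields, so assigning one here is
        # allowed even though the model is frozen.
        self._hash_cache = self._compute_hash()

    def _compute_hash(self) -> str:
        h = hashlib.sha1()
        for s in (self.name, self.definition):
            h.update(s.encode())
        return h.hexdigest()

    def hash(self) -> str:
        return self._hash_cache  # O(1): no re-hashing on repeated calls

m = CachedHashModel(name="gemm", definition="...")
print(m.hash())  # same 40-char digest every call, computed once
```

The key detail is that pydantic v2 private attributes bypass the frozen check, so the cache can be written during `model_post_init` while the public fields stay immutable.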
Ubospica
left a comment
Providing a hash cache is very helpful for avoiding dispatch overhead. Thanks for the contribution!
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@flashinfer_bench/data/solution.py`:
- Around lines 215-216: There are two duplicate declarations of `def hash(self) -> str:`, causing a syntax/IndentationError. Remove the extra declaration so only one `def hash(self) -> str:` remains (keeping the intended implementation body under that single definition), make sure the indentation of the method block is correct, and run the formatter/linters. Look for the duplicate `hash` method in the `Solution` class and remove the second declaration.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
flashinfer_bench/data/solution.py (1)
200-213: Add unambiguous boundaries in the `_compute_hash()` input stream.

At lines 203–211, concatenating strings without separators can make distinct tuples produce the same byte stream (e.g., `("ab", "c")` vs `("a", "bc")`), which can lead to incorrect cache hits. Add a delimiter or length prefix per field.

💡 Suggested fix
```diff
 def _compute_hash(self) -> str:
     """Compute a deterministic hash of the solution content."""
     h = hashlib.sha1()
+
+    def _update(s: str) -> None:
+        b = s.encode()
+        h.update(len(b).to_bytes(8, "big"))
+        h.update(b)
+
     for s in (
         self.name,
         self.definition,
         self.spec.language,
         self.spec.entry_point,
         *self.spec.dependencies,
         *(part for src in self.sources for part in (src.path, src.content)),
     ):
-        h.update(s.encode())
+        _update(s)
     return h.hexdigest()
```
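The collision the comment describes is easy to reproduce. A small standalone demonstration (not part of the PR) of the naive scheme versus the length-prefixed one:

```python
import hashlib

def naive(fields):
    # Concatenates raw bytes: field boundaries are lost.
    h = hashlib.sha1()
    for s in fields:
        h.update(s.encode())
    return h.hexdigest()

def prefixed(fields):
    # Length-prefixes each field, so the byte stream is unambiguous.
    h = hashlib.sha1()
    for s in fields:
        b = s.encode()
        h.update(len(b).to_bytes(8, "big"))
        h.update(b)
    return h.hexdigest()

assert naive(["ab", "c"]) == naive(["a", "bc"])        # collision
assert prefixed(["ab", "c"]) != prefixed(["a", "bc"])  # disambiguated
```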
🤖 Fix all issues with AI agents
In `@flashinfer_bench/data/solution.py`:
- Around lines 253-256: The `__eq__` implementation in `Solution` currently determines equality based solely on `_hash_cache`, which can misreport distinct objects as equal. Update `Solution.__eq__` to first check `isinstance(other, Solution)`, return False if the `_hash_cache` values differ, and, if the hashes match, perform a full content comparison (e.g., compare the model payload/serialized representation, including non-hashed metadata such as author and description) to ensure true equality rather than relying only on `_hash_cache`.
♻️ Duplicate comments (1)
flashinfer_bench/data/solution.py (1)
215-247: Remove the duplicated docstring/return in `hash()`. Lines 232–247 repeat the docstring and return, which is dead code and likely a merge artifact. This can also trigger lint warnings.
🧹 Suggested fix
```diff
 def hash(self) -> str:
     """Return the memoized deterministic hash of the solution content.
@@
     str
         A SHA1 hash (40 hex characters) uniquely identifying this solution's content.
     """
     return self._hash_cache
-    """Return the memoized deterministic hash of the solution content.
-
-    This hash is computed from all fields that affect the solution's behavior:
-    name, definition, language, entry point, dependencies, and all source file
-    paths and contents. This ensures that any meaningful change to the solution
-    results in a different hash.
-
-    The hash is used for caching build artifacts, allowing solutions with the same
-    hash to reuse the same cached build result.
-
-    Returns
-    -------
-    str
-        A SHA1 hash (40 hex characters) uniquely identifying this solution's content.
-    """
-    return self._hash_cache
```
```python
def __eq__(self, other: object) -> bool:  # pragma: no cover - trivial wrapper
    if not isinstance(other, Solution):
        return NotImplemented
    return self._hash_cache == other._hash_cache
```
Avoid equality based solely on hash.
At lines 253–256, `__eq__` treats hash equality as content equality. Hash collisions (or non-hashed fields like author/description) could make distinct solutions compare equal. Consider comparing full model content after a hash match.
✅ Suggested fix
```diff
 def __eq__(self, other: object) -> bool:  # pragma: no cover - trivial wrapper
     if not isinstance(other, Solution):
         return NotImplemented
-    return self._hash_cache == other._hash_cache
+    if self._hash_cache != other._hash_cache:
+        return False
+    return self.model_dump() == other.model_dump()
```
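To make the failure mode concrete, here is a hypothetical, simplified class (not the real `Solution`) showing how hash-only equality ignores non-hashed metadata:

```python
class Item:
    def __init__(self, name: str, description: str) -> None:
        self.name = name
        self.description = description  # not included in the hash
        self._hash_cache = hash(name)   # stand-in for the SHA1 cache

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Item):
            return NotImplemented
        return self._hash_cache == other._hash_cache  # buggy: ignores description

a = Item("gemm", "tiled")
b = Item("gemm", "naive")
assert a == b  # distinct contents, yet "equal" - the bug the review flags
```

With the suggested fix, the cheap hash comparison still short-circuits the common unequal case; the full `model_dump()` comparison only runs when the hashes already match.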
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
flashinfer_bench/data/solution.py (1)
113-199: Nested models `BuildSpec` and `SourceFile` are not frozen; the cached hash can go stale if they are mutated.

`Solution` uses `ConfigDict(frozen=True)`, but `BuildSpec` and `SourceFile` inherit from `BaseModelWithDocstrings`, which does not set `frozen=True`. This means fields like `spec.target_hardware`, `spec.dependencies`, and `sources[i].content` can be mutated post-init via methods like `.append()`, `.pop()`, or direct assignment, causing `_hash_cache`, `__hash__`, and `__eq__` to become inconsistent.

While the codebase currently has no such mutations, this design is fragile and relies on developers avoiding mutations of nested objects. Either freeze `BuildSpec` and `SourceFile` by setting `frozen=True` in their `model_config`, or remove the cached-hash optimization and compute hashes on demand.
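A minimal sketch of the first option, assuming plain pydantic `BaseModel` in place of the project's `BaseModelWithDocstrings` base class:

```python
from pydantic import BaseModel, ConfigDict

class SourceFile(BaseModel):
    model_config = ConfigDict(frozen=True)
    path: str
    content: str

class BuildSpec(BaseModel):
    model_config = ConfigDict(frozen=True)
    language: str
    entry_point: str
    dependencies: tuple[str, ...] = ()  # tuple, so there is no .append() to call

src = SourceFile(path="kernel.cu", content="...")
try:
    src.content = "mutated"  # pydantic raises: instance is frozen
except Exception as exc:
    print(type(exc).__name__)  # ValidationError
```

Note that freezing fields does not deep-freeze mutable containers, which is why the sketch also swaps the dependency list for a tuple.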
♻️ Duplicate comments (1)
flashinfer_bench/data/solution.py (1)
237-240: Avoid equality based solely on cached hash. Hash equality can misclassify distinct solutions (collisions or fields not included in the hash). Consider a full content comparison after a hash match.
✅ Suggested fix
```diff
 def __eq__(self, other: object) -> bool:  # pragma: no cover - trivial wrapper
     if not isinstance(other, Solution):
         return NotImplemented
-    return self._hash_cache == other._hash_cache
+    if self._hash_cache != other._hash_cache:
+        return False
+    return self.model_dump() == other.model_dump()
```
🧹 Nitpick comments (1)
flashinfer_bench/data/solution.py (1)
200-213: Consider sorting dependencies and sources before hashing for order-independence. If list order is not semantically meaningful, different orderings of the same content will produce different hashes and reduce cache hits. Sorting by dependency and source path makes the hash stable across equivalent inputs.
♻️ Proposed change
```diff
         for s in (
             self.name,
             self.definition,
             self.spec.language,
             self.spec.entry_point,
-            *self.spec.dependencies,
-            *(part for src in self.sources for part in (src.path, src.content)),
+            *sorted(self.spec.dependencies),
+            *(
+                part
+                for src in sorted(self.sources, key=lambda s: s.path)
+                for part in (src.path, src.content)
+            ),
         ):
```
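A standalone sketch (illustrative dependency names) showing what sorting buys, using the length-prefixed digest from the earlier suggestion:

```python
import hashlib

def digest(fields):
    # Length-prefixed SHA1 over a sequence of strings.
    h = hashlib.sha1()
    for s in fields:
        b = s.encode()
        h.update(len(b).to_bytes(8, "big"))
        h.update(b)
    return h.hexdigest()

deps_a = ["torch", "numpy"]  # hypothetical dependency lists
deps_b = ["numpy", "torch"]
assert digest(deps_a) != digest(deps_b)                   # order leaks into the hash
assert digest(sorted(deps_a)) == digest(sorted(deps_b))   # sorted: order-independent
```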
Ubospica
left a comment
LGTM. Thanks!
Pre-compute and cache solution hash for reduced apply() overhead.
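As a hedged sketch of where the saving comes from — the cache dictionary, the `build()` stub, and the runner interface below are hypothetical stand-ins, not the project's actual API; only `solution.hash()` is from this PR:

```python
from typing import Any

_build_cache: dict[str, Any] = {}

def build(solution: Any) -> Any:
    """Hypothetical expensive compile step; stubbed for illustration."""
    return lambda *args: None

def apply(solution: Any, *args: Any) -> Any:
    key = solution.hash()          # O(1): returns the memoized SHA1 digest
    runner = _build_cache.get(key)
    if runner is None:
        runner = build(solution)   # only pay the build cost on a cache miss
        _build_cache[key] = runner
    return runner(*args)
```

Because the hash is computed once at construction, every subsequent dispatch through such a cache skips the per-call hashing work entirely.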