
Conversation


@xslingcn xslingcn commented Jan 17, 2026

Pre-compute and cache the solution hash to reduce apply() overhead.

Summary by CodeRabbit

  • Refactor
    • Solution objects are now immutable after creation, improving stability.
    • Hash computation is memoized for faster repeated access.
    • Stable hashing and equality semantics added, enabling safe use in sets and as dictionary keys for reliable lookups and deduplication.



coderabbitai bot commented Jan 17, 2026

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Solution is now immutable and precomputes a deterministic content hash at initialization. The hash is cached in a private attribute; hash() returns the cached value. __hash__ and __eq__ are implemented to use this cached content hash.

Changes

Cohort / File(s): Solution class immutability and hashing — flashinfer_bench/data/solution.py
Summary: Adds model_config = ConfigDict(use_attribute_docstrings=True, frozen=True); a private _hash_cache: PrivateAttr; a model_post_init hook that initializes _hash_cache; a deterministic _compute_hash() covering name, definition, spec.language, spec.entry_point, spec.dependencies, and all source file paths and contents; a memoized hash() that returns _hash_cache; and __hash__/__eq__ implementations. Imports Any, ConfigDict, and PrivateAttr. (A sketch of the resulting class follows.)
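For orientation, a minimal sketch of the resulting shape of the class, assuming the field names above and Pydantic v2 semantics. The BuildSpec/SourceFile stubs and field types are simplified placeholders for the real models in flashinfer_bench/data/solution.py:

import hashlib
from typing import Any, List

from pydantic import BaseModel, ConfigDict, PrivateAttr


class SourceFile(BaseModel):  # placeholder stub for the real model
    path: str
    content: str


class BuildSpec(BaseModel):  # placeholder stub for the real model
    language: str
    entry_point: str
    dependencies: List[str] = []


class Solution(BaseModel):
    # frozen=True: assigning to any field after construction raises a ValidationError.
    model_config = ConfigDict(use_attribute_docstrings=True, frozen=True)

    name: str
    definition: str
    spec: BuildSpec
    sources: List[SourceFile]

    # Private attributes are exempt from frozen=True, so the cache can be set post-init.
    _hash_cache: str = PrivateAttr()

    def model_post_init(self, __context: Any) -> None:
        # Compute once at construction; every later hash() call is a plain attribute read.
        self._hash_cache = self._compute_hash()

    def _compute_hash(self) -> str:
        h = hashlib.sha1()
        for s in (
            self.name,
            self.definition,
            self.spec.language,
            self.spec.entry_point,
            *self.spec.dependencies,
            *(part for src in self.sources for part in (src.path, src.content)),
        ):
            h.update(s.encode())
        return h.hexdigest()

    def hash(self) -> str:
        return self._hash_cache

    def __hash__(self) -> int:
        return hash(self._hash_cache)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Solution):
            return NotImplemented
        return self._hash_cache == other._hash_cache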

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐇 I baked a hash and tucked it away,
Frozen in place to brighten the day,
One swift peek — no extra fray,
Cached and ready, come what may,
Hop, tiny bytes, and romp in play! 🥕✨

🚥 Pre-merge checks | ✅ 2 passed | ❌ 1 failed
❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 50.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed — Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed — The title 'feat: cache solution hash' directly and concisely summarizes the main change: caching the Solution hash to reduce apply() overhead.



@gemini-code-assist
Contributor

Summary of Changes

Hello @xslingcn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a performance optimization by refactoring the Solution data model to pre-compute and cache its content hash. By making Solution instances immutable and storing the hash upon initialization, subsequent operations that require the solution's hash or involve comparing Solution objects will benefit from reduced computational overhead, leading to more efficient processing.

Highlights

  • Immutability for Solution Model: The Solution Pydantic model has been configured as immutable by setting frozen=True in model_config, ensuring that instances of Solution cannot be modified after creation.
  • Cached Solution Hash: A private attribute _hash_cache has been introduced to store the pre-computed hash of the solution content, avoiding redundant hash calculations.
  • Refactored Hash Computation: The logic for computing the solution's hash has been moved into a new private method _compute_hash, improving code organization.
  • Automatic Hash Caching: The model_post_init hook is now used to automatically compute and store the solution's hash in _hash_cache immediately after a Solution object is initialized.
  • Optimized Hashing and Equality: The __hash__ and __eq__ methods have been overridden to use the cached _hash_cache, providing faster object hashing and equality checks for Solution instances (illustrated in the usage sketch after this list).
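A hypothetical usage sketch of those semantics. The field values here are made up, and it assumes the simplified stub models from the class sketch earlier:

spec = BuildSpec(language="python", entry_point="main", dependencies=["torch"])
sources = [SourceFile(path="kernel.py", content="def main(): ...")]

a = Solution(name="gemm", definition="gemm-def", spec=spec, sources=sources)
b = Solution(name="gemm", definition="gemm-def", spec=spec, sources=sources)

# Identical content -> identical cached hash -> deduplicated in sets,
# usable as dictionary keys for build-artifact lookups.
assert a == b
assert len({a, b}) == 1
build_cache = {a: "/path/to/artifact"}
assert build_cache[b] == "/path/to/artifact"

# frozen=True: post-init field assignment is rejected, keeping the cached hash valid.
try:
    a.name = "other"
except Exception:  # pydantic raises a ValidationError for frozen instances
    pass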




vercel bot commented Jan 17, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: flashinfer-bench — Deployment: Ready — Review: Ready (Preview, Comment) — Updated (UTC): Jan 18, 2026 2:27pm


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request refactors the Solution model to pre-compute and cache its hash, which is a great optimization. The implementation correctly uses pydantic's frozen=True and model_post_init hook for this.

I've found a critical issue in the hash computation logic: some fields that affect the solution's behavior (destination_passing_style and target_hardware) are omitted, which could lead to incorrect caching and runtime errors. I've also suggested documenting the hashing function more fully to aid maintainability.

Please review the comments for details and suggestions.
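A hedged sketch of what folding the omitted fields into the hash input could look like, assuming both fields live on spec and that destination_passing_style has a stable string form; the real fix should mirror the actual field types in the repository:

    def _compute_hash(self) -> str:
        h = hashlib.sha1()
        for s in (
            self.name,
            self.definition,
            self.spec.language,
            self.spec.entry_point,
            str(self.spec.destination_passing_style),  # previously omitted (assumed on spec)
            *sorted(self.spec.target_hardware),        # previously omitted (assumed on spec)
            *self.spec.dependencies,
            *(part for src in self.sources for part in (src.path, src.content)),
        ):
            h.update(s.encode())
        return h.hexdigest()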

@Ubospica Ubospica left a comment

Providing a hash cache is very helpful for avoiding dispatch overhead. Thanks for the contribution!

@Ubospica Ubospica changed the title from 'refactor: cache solution hash' to 'feat: cache solution hash' on Jan 18, 2026
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In flashinfer_bench/data/solution.py:
- Around lines 215-216: There are two duplicate declarations of def hash(self) -> str:, causing a syntax/IndentationError. Remove the extra duplicate so only one def hash(self) -> str: remains (keeping the intended implementation body under that single definition), ensure the method block is indented correctly, and run the formatter/linters. Look for the duplicate hash method in the Solution class and remove the second declaration.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
flashinfer_bench/data/solution.py (1)

200-213: Add unambiguous boundaries to the _compute_hash() input stream.

At Line 203–211, concatenating strings without separators can make distinct tuples produce the same byte stream (e.g., "ab","c" vs "a","bc"), which can lead to incorrect cache hits. Add a delimiter or length prefix per field.

💡 Suggested fix
     def _compute_hash(self) -> str:
         """Compute a deterministic hash of the solution content."""
         h = hashlib.sha1()
+        def _update(s: str) -> None:
+            b = s.encode()
+            h.update(len(b).to_bytes(8, "big"))
+            h.update(b)
         for s in (
             self.name,
             self.definition,
             self.spec.language,
             self.spec.entry_point,
             *self.spec.dependencies,
             *(part for src in self.sources for part in (src.path, src.content)),
         ):
-            h.update(s.encode())
+            _update(s)
 
         return h.hexdigest()
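The ambiguity is easy to reproduce standalone; a quick demonstration (not code from the PR) of why the length prefix matters:

import hashlib

def naive(fields):
    # No boundaries: ("ab", "c") and ("a", "bc") both feed SHA1 the bytes b"abc".
    h = hashlib.sha1()
    for s in fields:
        h.update(s.encode())
    return h.hexdigest()

def delimited(fields):
    h = hashlib.sha1()
    for s in fields:
        b = s.encode()
        h.update(len(b).to_bytes(8, "big"))  # length prefix disambiguates field boundaries
        h.update(b)
    return h.hexdigest()

assert naive(("ab", "c")) == naive(("a", "bc"))          # collision
assert delimited(("ab", "c")) != delimited(("a", "bc"))  # distinct digests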
🤖 Fix all issues with AI agents
In flashinfer_bench/data/solution.py:
- Around lines 253-256: The __eq__ implementation in Solution currently decides equality based solely on _hash_cache, which can misreport distinct objects as equal. Update Solution.__eq__ to first check isinstance(other, Solution), then return False if the _hash_cache values differ; when the hashes match, perform a full content comparison (e.g., compare the serialized model payload and any non-hashed metadata such as author and description) so equality does not rely on _hash_cache alone.
♻️ Duplicate comments (1)
flashinfer_bench/data/solution.py (1)

215-247: Remove the duplicated docstring/return in hash().

Lines 232–247 repeat the docstring and return, which is dead code and likely a merge artifact. It can also trigger lint warnings.

🧹 Suggested fix
     def hash(self) -> str:
         """Return the memoized deterministic hash of the solution content.
@@
         str
             A SHA1 hash (40 hex characters) uniquely identifying this solution's content.
         """
         return self._hash_cache
-        """Return the memoized deterministic hash of the solution content.
-
-        This hash is computed from all fields that affect the solution's behavior:
-        name, definition, language, entry point, dependencies, and all source file
-        paths and contents. This ensures that any meaningful change to the solution
-        results in a different hash.
-
-        The hash is used for caching build artifacts, allowing solutions with the same
-        hash to reuse the same cached build result.
-
-        Returns
-        -------
-        str
-            A SHA1 hash (40 hex characters) uniquely identifying this solution's content.
-        """
-        return self._hash_cache

Comment on lines +253 to +256

    def __eq__(self, other: object) -> bool:  # pragma: no cover - trivial wrapper
        if not isinstance(other, Solution):
            return NotImplemented
        return self._hash_cache == other._hash_cache

⚠️ Potential issue | 🟠 Major

Avoid equality based solely on hash.

At Line 253–256, __eq__ treats hash equality as content equality. Hash collisions (or non-hashed fields like author/description) could make distinct solutions compare equal. Consider comparing full model content after a hash match.

✅ Suggested fix
     def __eq__(self, other: object) -> bool:  # pragma: no cover - trivial wrapper
         if not isinstance(other, Solution):
             return NotImplemented
-        return self._hash_cache == other._hash_cache
+        if self._hash_cache != other._hash_cache:
+            return False
+        return self.model_dump() == other.model_dump()
🤖 Prompt for AI Agents
In flashinfer_bench/data/solution.py around lines 253-256: The __eq__ implementation in Solution currently decides equality based solely on _hash_cache, which can misreport distinct objects as equal. Update Solution.__eq__ to first check isinstance(other, Solution), then return False if the _hash_cache values differ; when the hashes match, perform a full content comparison (e.g., compare the serialized model payload and any non-hashed metadata such as author and description) so equality does not rely on _hash_cache alone.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
flashinfer_bench/data/solution.py (1)

113-199: Nested models BuildSpec and SourceFile are not frozen; cached hash can go stale if they are mutated.

Solution uses ConfigDict(frozen=True), but BuildSpec and SourceFile inherit from BaseModelWithDocstrings which does not set frozen=True. This means fields like spec.target_hardware, spec.dependencies, and sources[i].content can be mutated post-init via methods like .append(), .pop(), or direct assignment, causing _hash_cache, __hash__, and __eq__ to become inconsistent.

While the codebase currently has no such mutations, this design is fragile and relies on developers avoiding mutations of nested objects. Either freeze BuildSpec and SourceFile by setting frozen=True in their model_config, or remove the cached hash optimization and compute hashes on-demand.
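A minimal sketch of the first option, assuming the field names mentioned in this thread and using plain BaseModel in place of BaseModelWithDocstrings. Note that frozen=True only blocks attribute reassignment, so tuple-typed fields are used to also rule out in-place list mutation:

from typing import Tuple

from pydantic import BaseModel, ConfigDict


class SourceFile(BaseModel):
    model_config = ConfigDict(frozen=True)  # src.content = ... now raises
    path: str
    content: str


class BuildSpec(BaseModel):
    model_config = ConfigDict(frozen=True)  # spec.entry_point = ... now raises
    language: str
    entry_point: str
    # Tuples instead of lists: frozen=True alone would not stop
    # spec.dependencies.append(...), but a tuple has no mutating methods.
    dependencies: Tuple[str, ...] = ()
    target_hardware: Tuple[str, ...] = ()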

♻️ Duplicate comments (1)
flashinfer_bench/data/solution.py (1)

237-240: Avoid equality based solely on cached hash.

Hash equality can misclassify distinct solutions (collisions or fields not included in the hash). Consider a full content comparison after a hash match.

✅ Suggested fix
     def __eq__(self, other: object) -> bool:  # pragma: no cover - trivial wrapper
         if not isinstance(other, Solution):
             return NotImplemented
-        return self._hash_cache == other._hash_cache
+        if self._hash_cache != other._hash_cache:
+            return False
+        return self.model_dump() == other.model_dump()
🧹 Nitpick comments (1)
flashinfer_bench/data/solution.py (1)

200-213: Consider sorting dependencies and sources before hashing for order-independence.

If list order is not semantically meaningful, different ordering of the same content will produce different hashes and reduce cache hits. Sorting by dependency and source path makes the hash stable across equivalent inputs.

♻️ Proposed change
-        for s in (
+        for s in (
             self.name,
             self.definition,
             self.spec.language,
             self.spec.entry_point,
-            *self.spec.dependencies,
-            *(part for src in self.sources for part in (src.path, src.content)),
+            *sorted(self.spec.dependencies),
+            *(
+                part
+                for src in sorted(self.sources, key=lambda s: s.path)
+                for part in (src.path, src.content)
+            ),
         ):

@Ubospica Ubospica left a comment

LGTM. Thanks!

@Ubospica Ubospica merged commit 15c9b4a into main Jan 19, 2026
15 checks passed
@Ubospica Ubospica deleted the cache-hash branch January 19, 2026 07:17