feat: cache solution hash #153
Conversation
📝 Walkthrough

`Solution` is now immutable and precomputes a deterministic content hash at initialization. The hash is cached in a private attribute; `hash()` returns the memoized value, and `__hash__`/`__eq__` are derived from it.
Summary of Changes

This pull request introduces a performance optimization by refactoring the `Solution` model to pre-compute and cache its content hash at initialization.
Code Review
This pull request refactors the `Solution` model to pre-compute and cache its hash, which is a great optimization. The implementation correctly uses pydantic's `frozen=True` and the `model_post_init` hook for this.
I've found a critical issue in the hash computation logic where some fields that affect the solution's behavior (`destination_passing_style` and `target_hardware`) are omitted. This could lead to incorrect caching and runtime errors. I've also suggested an improvement to the documentation of the hashing function to improve maintainability.
Please review the comments for details and suggestions.
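For readers unfamiliar with the pattern, here is a minimal, self-contained sketch (illustrative field names, much simpler than the real `Solution` model) of how `frozen=True` plus `model_post_init` supports a cached hash:

```python
import hashlib
from pydantic import BaseModel, ConfigDict, PrivateAttr

class CachedHashModel(BaseModel):
    model_config = ConfigDict(frozen=True)  # field assignment after init raises

    name: str
    definition: str
    _hash_cache: str = PrivateAttr(default="")

    def model_post_init(self, __context: object) -> None:
        # Private attributes are not model fields, so assigning one here is
        # allowed even though the model is frozen.
        self._hash_cache = self._compute_hash()

    def _compute_hash(self) -> str:
        h = hashlib.sha1()
        for s in (self.name, self.definition):
            h.update(s.encode())
        return h.hexdigest()

    def hash(self) -> str:
        return self._hash_cache  # O(1): no re-hashing on repeated calls

m = CachedHashModel(name="gemm", definition="...")
print(m.hash())  # same 40-char digest every call, computed once
```

The key detail is that pydantic v2 private attributes bypass the frozen check, so the cache can be written during `model_post_init` while the public fields stay immutable.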
Ubospica
left a comment
Providing a hash cache is very helpful for avoiding dispatch overhead. Thanks for the contribution!
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@flashinfer_bench/data/solution.py`:
- Around lines 215-216: There are two duplicate declarations of `def hash(self) -> str:`, causing a syntax/IndentationError. Remove the extra declaration so only one `def hash(self) -> str:` remains (keeping the intended implementation body under that single definition), make sure the indentation of the method block is correct, and run the formatter/linters. Look for the duplicate `hash` method in the `Solution` class and remove the second declaration.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
flashinfer_bench/data/solution.py (1)
200-213: Add unambiguous boundaries in the `_compute_hash()` input stream.

At lines 203–211, concatenating strings without separators can make distinct tuples produce the same byte stream (e.g., `("ab", "c")` vs `("a", "bc")`), which can lead to incorrect cache hits. Add a delimiter or length prefix per field.

💡 Suggested fix
```diff
 def _compute_hash(self) -> str:
     """Compute a deterministic hash of the solution content."""
     h = hashlib.sha1()
+
+    def _update(s: str) -> None:
+        b = s.encode()
+        h.update(len(b).to_bytes(8, "big"))
+        h.update(b)
+
     for s in (
         self.name,
         self.definition,
         self.spec.language,
         self.spec.entry_point,
         *self.spec.dependencies,
         *(part for src in self.sources for part in (src.path, src.content)),
     ):
-        h.update(s.encode())
+        _update(s)
     return h.hexdigest()
```
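The collision the comment describes is easy to reproduce. A small standalone demonstration (not part of the PR) of the naive scheme versus the length-prefixed one:

```python
import hashlib

def naive(fields):
    # Concatenates raw bytes: field boundaries are lost.
    h = hashlib.sha1()
    for s in fields:
        h.update(s.encode())
    return h.hexdigest()

def prefixed(fields):
    # Length-prefixes each field, so the byte stream is unambiguous.
    h = hashlib.sha1()
    for s in fields:
        b = s.encode()
        h.update(len(b).to_bytes(8, "big"))
        h.update(b)
    return h.hexdigest()

assert naive(["ab", "c"]) == naive(["a", "bc"])        # collision
assert prefixed(["ab", "c"]) != prefixed(["a", "bc"])  # disambiguated
```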
🤖 Fix all issues with AI agents
In `@flashinfer_bench/data/solution.py`:
- Around lines 253-256: The `__eq__` implementation in `Solution` currently determines equality based solely on `_hash_cache`, which can misreport distinct objects as equal. Update `Solution.__eq__` to first check `isinstance(other, Solution)`, return False if the `_hash_cache` values differ, and, if the hashes match, perform a full content comparison (e.g., compare the model payload/serialized representation, including non-hashed metadata such as author and description) to ensure true equality rather than relying only on `_hash_cache`.
♻️ Duplicate comments (1)
flashinfer_bench/data/solution.py (1)
215-247: Remove the duplicated docstring/return in `hash()`. Lines 232–247 repeat the docstring and return, which is dead code and likely a merge artifact. This can also trigger lint warnings.
🧹 Suggested fix
```diff
 def hash(self) -> str:
     """Return the memoized deterministic hash of the solution content.
@@
     str
         A SHA1 hash (40 hex characters) uniquely identifying this solution's content.
     """
     return self._hash_cache
-    """Return the memoized deterministic hash of the solution content.
-
-    This hash is computed from all fields that affect the solution's behavior:
-    name, definition, language, entry point, dependencies, and all source file
-    paths and contents. This ensures that any meaningful change to the solution
-    results in a different hash.
-
-    The hash is used for caching build artifacts, allowing solutions with the same
-    hash to reuse the same cached build result.
-
-    Returns
-    -------
-    str
-        A SHA1 hash (40 hex characters) uniquely identifying this solution's content.
-    """
-    return self._hash_cache
```
```python
def __eq__(self, other: object) -> bool:  # pragma: no cover - trivial wrapper
    if not isinstance(other, Solution):
        return NotImplemented
    return self._hash_cache == other._hash_cache
```
Avoid equality based solely on hash.
At lines 253–256, `__eq__` treats hash equality as content equality. Hash collisions (or non-hashed fields like author/description) could make distinct solutions compare equal. Consider comparing full model content after a hash match.
✅ Suggested fix
```diff
 def __eq__(self, other: object) -> bool:  # pragma: no cover - trivial wrapper
     if not isinstance(other, Solution):
         return NotImplemented
-    return self._hash_cache == other._hash_cache
+    if self._hash_cache != other._hash_cache:
+        return False
+    return self.model_dump() == other.model_dump()
```
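To make the failure mode concrete, here is a hypothetical, simplified class (not the real `Solution`) showing how hash-only equality ignores non-hashed metadata:

```python
class Item:
    def __init__(self, name: str, description: str) -> None:
        self.name = name
        self.description = description  # not included in the hash
        self._hash_cache = hash(name)   # stand-in for the SHA1 cache

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Item):
            return NotImplemented
        return self._hash_cache == other._hash_cache  # buggy: ignores description

a = Item("gemm", "tiled")
b = Item("gemm", "naive")
assert a == b  # distinct contents, yet "equal" - the bug the review flags
```

With the suggested fix, the cheap hash comparison still short-circuits the common unequal case; the full `model_dump()` comparison only runs when the hashes already match.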
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
flashinfer_bench/data/solution.py (1)
113-199: Nested models `BuildSpec` and `SourceFile` are not frozen; the cached hash can go stale if they are mutated.

`Solution` uses `ConfigDict(frozen=True)`, but `BuildSpec` and `SourceFile` inherit from `BaseModelWithDocstrings`, which does not set `frozen=True`. This means fields like `spec.target_hardware`, `spec.dependencies`, and `sources[i].content` can be mutated post-init via methods like `.append()`, `.pop()`, or direct assignment, causing `_hash_cache`, `__hash__`, and `__eq__` to become inconsistent.

While the codebase currently has no such mutations, this design is fragile and relies on developers avoiding mutations of nested objects. Either freeze `BuildSpec` and `SourceFile` by setting `frozen=True` in their `model_config`, or remove the cached-hash optimization and compute hashes on demand.
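A minimal sketch of the first option, assuming plain pydantic `BaseModel` in place of the project's `BaseModelWithDocstrings` base class:

```python
from pydantic import BaseModel, ConfigDict

class SourceFile(BaseModel):
    model_config = ConfigDict(frozen=True)
    path: str
    content: str

class BuildSpec(BaseModel):
    model_config = ConfigDict(frozen=True)
    language: str
    entry_point: str
    dependencies: tuple[str, ...] = ()  # tuple, so there is no .append() to call

src = SourceFile(path="kernel.cu", content="...")
try:
    src.content = "mutated"  # pydantic raises: instance is frozen
except Exception as exc:
    print(type(exc).__name__)  # ValidationError
```

Note that freezing fields does not deep-freeze mutable containers, which is why the sketch also swaps the dependency list for a tuple.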
♻️ Duplicate comments (1)
flashinfer_bench/data/solution.py (1)
237-240: Avoid equality based solely on cached hash. Hash equality can misclassify distinct solutions (collisions or fields not included in the hash). Consider a full content comparison after a hash match.
✅ Suggested fix
```diff
 def __eq__(self, other: object) -> bool:  # pragma: no cover - trivial wrapper
     if not isinstance(other, Solution):
         return NotImplemented
-    return self._hash_cache == other._hash_cache
+    if self._hash_cache != other._hash_cache:
+        return False
+    return self.model_dump() == other.model_dump()
```
🧹 Nitpick comments (1)
flashinfer_bench/data/solution.py (1)
200-213: Consider sorting dependencies and sources before hashing for order-independence. If list order is not semantically meaningful, different orderings of the same content will produce different hashes and reduce cache hits. Sorting by dependency and source path makes the hash stable across equivalent inputs.
♻️ Proposed change
```diff
         for s in (
             self.name,
             self.definition,
             self.spec.language,
             self.spec.entry_point,
-            *self.spec.dependencies,
-            *(part for src in self.sources for part in (src.path, src.content)),
+            *sorted(self.spec.dependencies),
+            *(
+                part
+                for src in sorted(self.sources, key=lambda s: s.path)
+                for part in (src.path, src.content)
+            ),
         ):
```
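A standalone sketch (illustrative dependency names) showing what sorting buys, using the length-prefixed digest from the earlier suggestion:

```python
import hashlib

def digest(fields):
    # Length-prefixed SHA1 over a sequence of strings.
    h = hashlib.sha1()
    for s in fields:
        b = s.encode()
        h.update(len(b).to_bytes(8, "big"))
        h.update(b)
    return h.hexdigest()

deps_a = ["torch", "numpy"]  # hypothetical dependency lists
deps_b = ["numpy", "torch"]
assert digest(deps_a) != digest(deps_b)                   # order leaks into the hash
assert digest(sorted(deps_a)) == digest(sorted(deps_b))   # sorted: order-independent
```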
Ubospica
left a comment
LGTM. Thanks!
Pre-compute and cache solution hash for reduced apply() overhead.
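As a hedged sketch of where the saving comes from — the cache dictionary, the `build()` stub, and the runner interface below are hypothetical stand-ins, not the project's actual API; only `solution.hash()` is from this PR:

```python
from typing import Any

_build_cache: dict[str, Any] = {}

def build(solution: Any) -> Any:
    """Hypothetical expensive compile step; stubbed for illustration."""
    return lambda *args: None

def apply(solution: Any, *args: Any) -> Any:
    key = solution.hash()          # O(1): returns the memoized SHA1 digest
    runner = _build_cache.get(key)
    if runner is None:
        runner = build(solution)   # only pay the build cost on a cache miss
        _build_cache[key] = runner
    return runner(*args)
```

Because the hash is computed once at construction, every subsequent dispatch through such a cache skips the per-call hashing work entirely.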