Optimize qubit hash for Set operations #6908
base: main
Conversation
Improves amortized `Set` operations perf by around 50%, though with the caveat that sets with qudits of different dimensions but the same index will always have the same key (not just the same bucket) and thus have to check `__eq__`, causing degenerate perf impact. It seems unlikely that anyone would intentionally do this, though.

```python
s = set()
for q in cirq.GridQubit.square(100):
    s = s.union({q})
```

Fixes #6886, if we decide to do this.
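To make the caveat concrete, here is a minimal sketch; it describes the behaviour the proposed hash would have, not what current `main` does:

```python
import cirq

# Two qudits at the same (row, col) but with different dimensions.
q2 = cirq.GridQid(0, 1, dimension=2)
q3 = cirq.GridQid(0, 1, dimension=3)

# Under the proposed scheme the hash depends only on the (row, col) ring index,
# so these would hash identically and set/dict lookups must fall back to __eq__.
assert q2 != q3                # equality still distinguishes them by dimension
# hash(q2) == hash(q3)         # would hold with this PR applied (assumption)
```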
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@            Coverage Diff            @@
##              main    #6908    +/-  ##
=========================================
  Coverage    97.86%   97.86%
=========================================
  Files         1084     1084
  Lines        94290    94308    +18
=========================================
+ Hits         92280    92298    +18
  Misses        2010     2010
```

☔ View full report in Codecov by Sentry.
```python
# This approach seems to perform better than traditional "random" hash in `Set`
# operations for typical circuits, as it reduces bucket collisions. Caveat: it does not
```
How did you evaluate this reduction in bucket collisions? Would be good to show this explicitly before we decide to abandon the standard tuple hash.
Test code is up in the description. It's about 50% faster with this implementation.

One note is that it seems like it's only faster for copy-on-change ops like `s = s.union({q})`. It doesn't seem to have any effect when we operate on sets mutably like `s |= {q}`. But given most of our stuff is immutable, we see a lot more of the former in our codebase.
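For anyone who wants to reproduce that observation, here is a scaled-down sketch of the description's test that times both idioms (not part of the PR; the grid size is reduced so the quadratic loop stays quick):

```python
import timeit

import cirq

qubits = cirq.GridQubit.square(50)  # smaller grid than the description's test

def copy_on_change():
    # Rebuilds the set each iteration, re-inserting every existing element.
    s = set()
    for q in qubits:
        s = s.union({q})
    return s

def in_place():
    # Mutates a single set, so each qubit is inserted only once.
    s = set()
    for q in qubits:
        s |= {q}
    return s

print("s = s.union({q}):", timeit.timeit(copy_on_change, number=3))
print("s |= {q}:        ", timeit.timeit(in_place, number=3))
```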
cirq-core/cirq/devices/grid_qubit.py (Outdated)
```python
square_index = max(abs_row, abs_col)
inner_square_side_len = square_index * 2 - 1
outer_square_side_len = inner_square_side_len + 2
inner_square_area = inner_square_side_len**2
if abs_row == square_index:
    offset = 0 if row < 0 else outer_square_side_len
    i = inner_square_area + offset + (col + square_index)
else:
    offset = (2 * outer_square_side_len) + (0 if col < 0 else inner_square_side_len)
    i = inner_square_area + offset + (row + (square_index - 1))
self._hash = hash(i)
```
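For reference, a standalone sketch (not part of the diff) that reconstructs this ring enumeration as a free function, checks that it assigns every coordinate a distinct integer, and counts first-bucket collisions against the tuple hash for a 100×100 block:

```python
def ring_index(row: int, col: int) -> int:
    # Same computation as the diff above, reconstructed as a free function.
    if row == 0 and col == 0:
        return 0
    abs_row, abs_col = abs(row), abs(col)
    square_index = max(abs_row, abs_col)
    inner_square_side_len = square_index * 2 - 1
    outer_square_side_len = inner_square_side_len + 2
    inner_square_area = inner_square_side_len**2
    if abs_row == square_index:
        offset = 0 if row < 0 else outer_square_side_len
        return inner_square_area + offset + (col + square_index)
    offset = (2 * outer_square_side_len) + (0 if col < 0 else inner_square_side_len)
    return inner_square_area + offset + (row + (square_index - 1))

# The enumeration is injective: every coordinate gets a distinct integer,
# so any collisions come from the hash table itself, not from the index.
coords = [(r, c) for r in range(-25, 25) for c in range(-25, 25)]
assert len({ring_index(r, c) for r, c in coords}) == len(coords)

# Count duplicate first-bucket indices modulo a power-of-two table size
# (the slot CPython's set computes from the hash before probing).
def collisions(hashes, mask=2**17 - 1):
    buckets = [h & mask for h in hashes]
    return len(buckets) - len(set(buckets))

block = [(r, c) for r in range(100) for c in range(100)]
print("tuple hash:", collisions([hash((r, c, 2)) for r, c in block]))
print("ring index:", collisions([hash(ring_index(r, c)) for r, c in block]))
```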
It looks like this is almost 3x slower than the current tuple hash, which is quite a big regression, so unless we can really show that this reduces hash collisions I'm not sure we would want to make this change.
```python
In [1]: def tuple_hash(row, col, d):
   ...:     return hash((row, col, d))
   ...:

In [2]: def square_hash(row, col, d):
   ...:     if row == 0 and col == 0:
   ...:         return 0
   ...:     abs_row = abs(row)
   ...:     abs_col = abs(col)
   ...:     square_index = max(abs_row, abs_col)
   ...:     inner_square_side_len = square_index * 2 - 1
   ...:     outer_square_side_len = inner_square_side_len + 2
   ...:     inner_square_area = inner_square_side_len**2
   ...:     if abs_row == square_index:
   ...:         offset = 0 if row < 0 else outer_square_side_len
   ...:         i = inner_square_area + offset + (col + square_index)
   ...:     else:
   ...:         offset = (2 * outer_square_side_len) + (0 if col < 0 else inner_square_side_len)
   ...:         i = inner_square_area + offset + (row + (square_index - 1))
   ...:     return hash(i)
   ...:

In [3]: %timeit [tuple_hash(r, c, d) for r in range(20) for c in range(20) for d in [2, 3, 4]]
151 µs ± 427 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [4]: %timeit [square_hash(r, c, d) for r in range(20) for c in range(20) for d in [2, 3, 4]]
437 µs ± 2.37 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
```
I'm not married to it. It was something I noticed when looking into creating very wide circuits and got nerd sniped. It's a reasonable optimization for copy-on-change operations on large sets. But if we want to stick to the existing approach, I'd say it's completely justifiable.