-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[#18360] YSQL: Fix wrong results for IndexScan yb_hash_code(i) > -1 #22551
base: master
Are you sure you want to change the base?
Conversation
@@ -2316,7 +2316,7 @@ YbBindHashKeys(YbScanDesc ybScan) | |||
Assert(YbIsHashCodeSearch(key)); | |||
YbBound bound = { | |||
.type = YB_YQL_BOUND_VALID, | |||
.value = key->sk_argument | |||
.value = (lc->data.int_value<0)?0:key->sk_argument |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please leave a space around all the operators. Please look around the codebase for examples of how ListCell
instances are accessed.
Can you achieve the same thing by only checking something on key->sk_argument
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made changes to depend only on key->sk_argument.
Updated changes as per comment to depend on the decision-making only on key->sk_argument. If the value is greater than the maximum int value, then it is safe to assume that it has come originally from a negative integer.
✅ Deploy Preview for infallible-bardeen-164bc9 ready!Built without sensitive environment variables
To edit notification comments on pull requests, go to your Netlify site configuration. |
@@ -2316,7 +2316,7 @@ YbBindHashKeys(YbScanDesc ybScan) | |||
Assert(YbIsHashCodeSearch(key)); | |||
YbBound bound = { | |||
.type = YB_YQL_BOUND_VALID, | |||
.value = key->sk_argument | |||
.value = (key->sk_argument > INT_MAX) ? 0 : key->sk_argument |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about the case where it's below INT_MIN
?
Why is this >
and not >=
?
Is there a test case against key->sk_argument
being equal to INT_MAX
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Key->sk_argument is of type Datum, meaning it will never hold a negative integer value, so it will never be below INT_MIN rather it will never be below 0 (INT_MAX = 2147483647 (for 32-bit Integers)
INT_MAX = 9,223,372,036,854,775,807 (for 64-bit Integers) INT_MIN = –2147483648 (for 32-bit Integers)
INT_MIN = –9,223,372,036,854,775,808 (for 64-bit Integers))
Why would we want to set the key value to 0 when it holds value = INT_MAX? INT_MAX is a valid positive integer a user can put in the SQL query, this change aims to handle those positive values that came by converting a negative integer to an unsigned integer.
As of now, I have only added the test cases present on the bug as you can see in file [yb_hash_code.sql] , Should I add the case where the user is passing INT_MAX as input instead of a negative integer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tanujnay112 Please let me know if any further changes or testing needed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As of now, I have only added the test cases present on the bug as you can see in file [yb_hash_code.sql] , Should I add the case where the user is passing INT_MAX as input instead of a negative integer?
Yes, please add more test cases as requested. Please add a test case for INT_MAX + 1 and INT_MAX.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tanujnay112 INT_MAX is not accepted by yugabyte sql command line and since this value can be different depending on whether machine is 32 bit or 64 , it can not be hard coded in test cases. Can you suggest some way to integrate this test in here?
Also I tested locally with yb_hash_code(i) > 2^31 - 1 (INT_MAX for my machine) and yb_hash_code(i) > 2^31 (INT_MAX+1) and they both resulted in 0 rows while yb_hash_code(i) > -1 resulted in 2 rows. Since lc->data.int_value is of type signed int , when value bigger than INT_MAX is passed , it is not read as it is but only 32 bits are read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes please add those as test cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tanujnay112 Added the tests as suggested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @parkhi, I've been looking into other yb_hash_code-related issue, basically, yb_hash_code(hkey) < 65536
was reported not working properly.
I triaged the report and went to GitHub to check if it was reported before and found this work in progress.
Your fix is in right place, but you are missing a few points.
First of all, the yb_hash_code returns a value from the range [0..65535]. If compared to an out of range constant, it is either "all" or "nothing", depending on the operation and whether the constant is above or below the range.
It seems convenient to have out of bound condition checked when bound is applied to the total range in the YbApplyStartBound and YbApplyEndBound functions. The "Start" and "End" there defines the side, and out of bound value won't change the range, the function would just return true or false if the condition is respectively "all" or "nothing".
Second point (minor) is that uint64_t is not the best data type for the YbBound.value. It basically takes the value from the query, which is signed. The "long" value would allow typecast without data loss.
The title should be in this format:
|
Done |
INSERT INTO tt VALUES (1, 2); | ||
EXPLAIN (COSTS OFF, TIMING OFF, SUMMARY OFF, ANALYZE)/*+IndexScan(tt)*/ SELECT * FROM tt WHERE yb_hash_code(i) > -1; | ||
/*+IndexScan(tt)*/ SELECT * FROM tt WHERE yb_hash_code(i) > -1; | ||
/*+SeqScan(tt)*/ SELECT * FROM tt WHERE yb_hash_code(i) > -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggested test cases:
yb_hash_code(i) compared to various constant out of the [0..65535] range: negative values as well as positive values around max(uint16), max(int32), max(uint32), etc.
Use various strict and non-strict inequality operators.
Issue #18360 : [YSQL] Wrong results for IndexScan yb_hash_code(i) > -1
RootCause : Hash code keys are stored in datatype ListCell which allows negative intergers as data while normal keys are stored in datatype ScanKey which accepts data in the format of uint64_t. When a negative number like -1 is passed in the sql query , it gets converted into uint64_t type as 18446744073709551615. The query internally becomes :
/+IndexScan(tt)/ SELECT * FROM tt WHERE yb_hash_code(i) > 18446744073709551615 and code tries to filter out records accordingly which fails.