Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#18360] YSQL: Fix wrong results for IndexScan yb_hash_code(i) > -1 #22551

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

parkhi
Copy link

@parkhi parkhi commented May 27, 2024

Issue #18360 : [YSQL] Wrong results for IndexScan yb_hash_code(i) > -1

RootCause : Hash code keys are stored in datatype ListCell which allows negative intergers as data while normal keys are stored in datatype ScanKey which accepts data in the format of uint64_t. When a negative number like -1 is passed in the sql query , it gets converted into uint64_t type as 18446744073709551615. The query internally becomes :

/+IndexScan(tt)/ SELECT * FROM tt WHERE yb_hash_code(i) > 18446744073709551615 and code tries to filter out records accordingly which fails.

@CLAassistant
Copy link

CLAassistant commented May 27, 2024

CLA assistant check
All committers have signed the CLA.

@@ -2316,7 +2316,7 @@ YbBindHashKeys(YbScanDesc ybScan)
Assert(YbIsHashCodeSearch(key));
YbBound bound = {
.type = YB_YQL_BOUND_VALID,
.value = key->sk_argument
.value = (lc->data.int_value<0)?0:key->sk_argument
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please leave a space around all the operators. Please look around the codebase for examples of how ListCell instances are accessed.

Can you achieve the same thing by only checking something on key->sk_argument?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made changes to depend only on key->sk_argument.

Updated changes as per comment to depend on the decision-making only on key->sk_argument. If the value is greater than the maximum int value, then it is safe to assume that it has come originally from a negative integer.
Copy link

netlify bot commented May 30, 2024

Deploy Preview for infallible-bardeen-164bc9 ready!

Built without sensitive environment variables

Name Link
🔨 Latest commit 2125f72
🔍 Latest deploy log https://app.netlify.com/sites/infallible-bardeen-164bc9/deploys/6687c254e0d3ee0008e45472
😎 Deploy Preview https://deploy-preview-22551--infallible-bardeen-164bc9.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@@ -2316,7 +2316,7 @@ YbBindHashKeys(YbScanDesc ybScan)
Assert(YbIsHashCodeSearch(key));
YbBound bound = {
.type = YB_YQL_BOUND_VALID,
.value = key->sk_argument
.value = (key->sk_argument > INT_MAX) ? 0 : key->sk_argument
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the case where it's below INT_MIN?
Why is this > and not >=?
Is there a test case against key->sk_argument being equal to INT_MAX?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Key->sk_argument is of type Datum, meaning it will never hold a negative integer value, so it will never be below INT_MIN rather it will never be below 0 (INT_MAX = 2147483647 (for 32-bit Integers)
INT_MAX = 9,223,372,036,854,775,807 (for 64-bit Integers) INT_MIN = –2147483648 (for 32-bit Integers)
INT_MIN = –9,223,372,036,854,775,808 (for 64-bit Integers)
)

Why would we want to set the key value to 0 when it holds value = INT_MAX? INT_MAX is a valid positive integer a user can put in the SQL query, this change aims to handle those positive values that came by converting a negative integer to an unsigned integer.

As of now, I have only added the test cases present on the bug as you can see in file [yb_hash_code.sql] , Should I add the case where the user is passing INT_MAX as input instead of a negative integer?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tanujnay112 Please let me know if any further changes or testing needed here.

Copy link
Contributor

@tanujnay112 tanujnay112 Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As of now, I have only added the test cases present on the bug as you can see in file [yb_hash_code.sql] , Should I add the case where the user is passing INT_MAX as input instead of a negative integer?

Yes, please add more test cases as requested. Please add a test case for INT_MAX + 1 and INT_MAX.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tanujnay112 INT_MAX is not accepted by yugabyte sql command line and since this value can be different depending on whether machine is 32 bit or 64 , it can not be hard coded in test cases. Can you suggest some way to integrate this test in here?
Also I tested locally with yb_hash_code(i) > 2^31 - 1 (INT_MAX for my machine) and yb_hash_code(i) > 2^31 (INT_MAX+1) and they both resulted in 0 rows while yb_hash_code(i) > -1 resulted in 2 rows. Since lc->data.int_value is of type signed int , when value bigger than INT_MAX is passed , it is not read as it is but only 32 bits are read.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes please add those as test cases.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tanujnay112 Added the tests as suggested.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @parkhi, I've been looking into other yb_hash_code-related issue, basically, yb_hash_code(hkey) < 65536 was reported not working properly.
I triaged the report and went to GitHub to check if it was reported before and found this work in progress.
Your fix is in right place, but you are missing a few points.
First of all, the yb_hash_code returns a value from the range [0..65535]. If compared to an out of range constant, it is either "all" or "nothing", depending on the operation and whether the constant is above or below the range.
It seems convenient to have out of bound condition checked when bound is applied to the total range in the YbApplyStartBound and YbApplyEndBound functions. The "Start" and "End" there defines the side, and out of bound value won't change the range, the function would just return true or false if the condition is respectively "all" or "nothing".

Second point (minor) is that uint64_t is not the best data type for the YbBound.value. It basically takes the value from the query, which is signed. The "long" value would allow typecast without data loss.

@hari90
Copy link
Contributor

hari90 commented Jun 18, 2024

The title should be in this format:

[#18360] YSQL: Fix wrong results for IndexScan yb_hash_code(i) > -1

@parkhi parkhi changed the title PR for fixing issue #18360 [#18360] YSQL: Fix wrong results for IndexScan yb_hash_code(i) > -1 Jul 3, 2024
@parkhi
Copy link
Author

parkhi commented Jul 3, 2024

The title should be in this format:

[#18360] YSQL: Fix wrong results for IndexScan yb_hash_code(i) > -1

Done

INSERT INTO tt VALUES (1, 2);
EXPLAIN (COSTS OFF, TIMING OFF, SUMMARY OFF, ANALYZE)/*+IndexScan(tt)*/ SELECT * FROM tt WHERE yb_hash_code(i) > -1;
/*+IndexScan(tt)*/ SELECT * FROM tt WHERE yb_hash_code(i) > -1;
/*+SeqScan(tt)*/ SELECT * FROM tt WHERE yb_hash_code(i) > -1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested test cases:
yb_hash_code(i) compared to various constant out of the [0..65535] range: negative values as well as positive values around max(uint16), max(int32), max(uint32), etc.
Use various strict and non-strict inequality operators.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants