[BUG] [Java] CudfException on conversion of data between Arrow and Cudf #16794
Comments
Here are my findings so far:
I can narrow this repro down further with a Java test shortly.
I should revise this: It wouldn't be productive to convert this into a standalone Java test. The problem is specifically caused because the result of the UDF in the repro is returning an Arrow

I have done a fair bit of digging into the problem, and bisecting the changes. The crux of my findings concerns the effect of this part of the changes in #16590:

    - auto result = cudf::to_arrow(*tview, state->get_column_metadata(*tview));
    + auto got_arrow_schema = cudf::to_arrow_schema(*tview, state->get_column_metadata(*tview));
    + cudf::jni::set_nullable(got_arrow_schema.get());
    + auto got_arrow_array = cudf::to_arrow_host(*tview);
    + auto batch =
    +   arrow::ImportRecordBatch(&got_arrow_array->array, got_arrow_schema.get()).ValueOrDie();
    + auto result = arrow::Table::FromRecordBatches({batch}).ValueOrDie();

I can confirm that rolling back to using

It seems odd, but something in the way

On the face of it, the schemas of the tables constructed in both methods (i.e.

I'm still hopeful that we should be able to remedy this with a change in either
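One way to check whether the two construction paths really produce the same schema is to dump the `ArrowSchema` tree field by field, including the nullable flag, and compare the output. The following is only a sketch: the struct is a local copy of the layout from the Arrow C Data Interface so that it compiles without Arrow headers, `describe_schema` and `mark_nullable` are hypothetical helper names, and `mark_nullable` merely assumes that `cudf::jni::set_nullable` sets the nullable flag on every field, as its name suggests.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Local copy of the struct layout from the Arrow C Data Interface,
// so this sketch builds without Arrow or cudf headers.
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  ArrowSchema** children;
  ArrowSchema* dictionary;
  void (*release)(ArrowSchema*);
  void* private_data;
};

// Value of ARROW_FLAG_NULLABLE in the Arrow C Data Interface.
constexpr int64_t kArrowFlagNullable = 2;

// Hypothetical helper: one line per field (name, format string, nullability),
// indented by nesting depth, so two schema trees can be compared as text.
std::string describe_schema(const ArrowSchema& schema, int depth = 0) {
  std::string out(static_cast<std::size_t>(depth) * 2, ' ');
  out += schema.name ? schema.name : "<unnamed>";
  out += " format=";
  out += schema.format ? schema.format : "<null>";
  out += (schema.flags & kArrowFlagNullable) ? " nullable" : " non-nullable";
  out += '\n';
  for (int64_t i = 0; i < schema.n_children; ++i) {
    out += describe_schema(*schema.children[i], depth + 1);
  }
  return out;
}

// Hypothetical helper: set the nullable flag on a field and all descendants.
// ASSUMPTION: this mirrors what cudf::jni::set_nullable is meant to do.
void mark_nullable(ArrowSchema* schema) {
  schema->flags |= kArrowFlagNullable;
  for (int64_t i = 0; i < schema->n_children; ++i) {
    mark_nullable(schema->children[i]);
  }
}
```

Calling `describe_schema(*got_arrow_schema)` before and after the nullability fix-up, and on the schema exported from the old `cudf::to_arrow` result, would show any field-level differences in format or flags that a top-level schema comparison might hide.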
Description
The following exception is seen when CUDF JNI bindings are used to convert CUDF data to Arrow format, and then back to CUDF:
This was found during integration tests with https://github.com/NVIDIA/spark-rapids and https://github.com/NVIDIA/spark-rapids-jni.
I have narrowed it down to when #16590 was merged. Prior versions of CUDF that don't have this commit seem to work fine.
Repro
We don't yet have a narrow repro that uses only CUDF/JNI. I will include the pyspark repro here, and replace it with something smaller once we have it:
Running this PySpark script causes the CudfException to occur, and the query to fail.
Expected behaviour
One would expect type conversions between CUDF and Arrow not to fail.