[data] Monotonically increasing id #59290
Signed-off-by: Rishi Chandra <[email protected]>
Code Review
This pull request introduces add_unique_id to add a monotonically increasing ID column to a dataset, which is a valuable addition. The implementation is clear and follows the described logic from Spark. However, I've identified a significant correctness issue in the handling of pyarrow.Table objects, which results in creating a nested list column instead of a flat integer column. I've also provided suggestions to improve efficiency, remove unreachable code, and enhance the clarity of the documentation. The tests are well-structured, but they may not be catching the pyarrow bug I've pointed out.
awesome! thanks a bunch for this contribution. will let @gvspraveen find someone to shepherd this.
Thanks for the contribution. @bveeramani to shepherd this.
hey @rishic3, thanks a bunch for the contribution! if you're interested in chatting more with the contributors, feel free to join our community sync -- https://docs.google.com/forms/d/e/1FAIpQLSeYWjNExnr6gbhO5rpM0i6wm4TBTdsm3y5S0LR8Syzk_2gelQ/viewform
The previous commit implemented this as a dataset method (see e0fc987). I pushed an update that implements it as an expression instead, as the original issue intended.
Description
Implements a monotonically increasing ID expression. This closely follows the Spark implementation: https://github.com/apache/spark/blob/9bbdc0743034b40a904ca87a08da4e0bf2b1386c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/MonotonicallyIncreasingID.scala
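For readers unfamiliar with the Spark scheme referenced above, here is a rough sketch of the ID layout it uses: the partition index goes in the upper 31 bits and the row offset within the partition in the lower 33 bits of a 64-bit integer. This is a plain-Python illustration only, not the code in this PR. The names `monotonically_increasing_ids` and `assign_ids` are hypothetical, and how the PR maps Ray Data blocks or tasks onto "partitions" is not shown here.

```python
from typing import Iterator, List, Sequence

# Bit layout used by Spark's MonotonicallyIncreasingID:
# upper 31 bits = partition index, lower 33 bits = row offset in the partition.
ROW_BITS = 33
PARTITION_BITS = 31
MAX_ROWS_PER_PARTITION = 1 << ROW_BITS
MAX_PARTITIONS = 1 << PARTITION_BITS


def monotonically_increasing_ids(partition_index: int, num_rows: int) -> Iterator[int]:
    """Yield one ID per row of a single partition (hypothetical helper)."""
    if not 0 <= partition_index < MAX_PARTITIONS:
        raise ValueError("partition index does not fit in 31 bits")
    if not 0 <= num_rows <= MAX_ROWS_PER_PARTITION:
        raise ValueError("row count does not fit in 33 bits")
    # Shifting the partition index past the row bits keeps ID ranges disjoint.
    base = partition_index << ROW_BITS
    for offset in range(num_rows):
        yield base + offset


def assign_ids(partitions: Sequence[Sequence[object]]) -> List[List[int]]:
    """Assign IDs to every row of every partition, in partition order."""
    return [
        list(monotonically_increasing_ids(index, len(rows)))
        for index, rows in enumerate(partitions)
    ]


if __name__ == "__main__":
    # Two partitions with three and two rows respectively.
    print(assign_ids([["a", "b", "c"], ["d", "e"]]))
```

IDs produced this way are unique and increase with partition order and row order, but they are not consecutive: each new partition starts at a multiple of 2^33, so gaps between partitions are expected.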
Example usage:
Related issues
Closes #57806