[data] Monotonically increasing id #59290

rishic3 · 2025-12-09T06:36:33Z

Description

Implements a monotonically increasing ID expression. The implementation closely follows Spark's MonotonicallyIncreasingID expression: https://github.com/apache/spark/blob/9bbdc0743034b40a904ca87a08da4e0bf2b1386c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/MonotonicallyIncreasingID.scala

Example usage:

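import ray
from ray.data.expressions import monotonically_increasing_id

ds = ray.data.range(100)
# Attach a unique, monotonically increasing ID to every row.
ds = ds.with_column("uid", monotonically_increasing_id())
# Hash on the new ID column to get a deterministic train/test split.
train, test = ds.streaming_train_test_split(test_size=0.25, split_type="hash", hash_column="uid")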
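
For context, the Spark expression linked above builds each 64-bit ID from the partition index in the upper 31 bits and the record number within that partition in the lower 33 bits. Below is a minimal sketch of that bit layout in plain Python. It is not the code added in this PR, and the helper names (`spark_style_id`, `ids_for_partition`) are hypothetical, chosen only for illustration.

```python
# Illustration of the Spark-style ID layout only; helper names are hypothetical
# and this is not the Ray Data implementation from this PR.
RECORD_BITS = 33  # lower 33 bits: row position within a partition
MAX_PARTITIONS = 1 << 31  # upper 31 bits: partition index
MAX_ROWS_PER_PARTITION = 1 << RECORD_BITS


def spark_style_id(partition_index: int, row_in_partition: int) -> int:
    """Pack a partition index and a row offset into one 64-bit ID."""
    if not 0 <= partition_index < MAX_PARTITIONS:
        raise ValueError("partition index does not fit in 31 bits")
    if not 0 <= row_in_partition < MAX_ROWS_PER_PARTITION:
        raise ValueError("row offset does not fit in 33 bits")
    return (partition_index << RECORD_BITS) | row_in_partition


def ids_for_partition(partition_index: int, num_rows: int) -> list[int]:
    """Return the IDs assigned to the first `num_rows` rows of a partition."""
    return [spark_style_id(partition_index, i) for i in range(num_rows)]


if __name__ == "__main__":
    print(ids_for_partition(0, 3))  # [0, 1, 2]
    print(ids_for_partition(1, 3))  # [8589934592, 8589934593, 8589934594]
```

Under this layout the IDs are unique and increasing within and across partitions, but they are not consecutive: the first ID of each partition starts at a multiple of 2^33.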

Related issues

Closes #57806

Signed-off-by: Rishi Chandra <[email protected]>

gemini-code-assist

Code Review

This pull request introduces add_unique_id to add a monotonically increasing ID column to a dataset, which is a valuable addition. The implementation is clear and follows the described logic from Spark. However, I've identified a significant correctness issue in the handling of pyarrow.Table objects, which results in creating a nested list column instead of a flat integer column. I've also provided suggestions to improve efficiency, remove unreachable code, and enhance the clarity of the documentation. The tests are well-structured, but they may not be catching the pyarrow bug I've pointed out.

richardliaw · 2025-12-10T00:12:05Z

awesome! thanks a bunch for this contribution. will let @gvspraveen find someone to shepherd this.

gvspraveen · 2025-12-10T01:43:23Z

Thanks for the contribution. @bveeramani to shepherd this.

richardliaw · 2025-12-10T03:01:29Z

hey @rishic3, thanks a bunch for the contribution! if you're interested in chatting more with the contributors, feel free to join our community sync -- https://docs.google.com/forms/d/e/1FAIpQLSeYWjNExnr6gbhO5rpM0i6wm4TBTdsm3y5S0LR8Syzk_2gelQ/viewform

rishic3 · 2025-12-10T06:50:25Z

The previous commits implemented this as a dataset method (see e0fc987). I pushed an update that instead implements it as an expression, as the original issue intended.

rishic3 added 3 commits December 8, 2025 21:57

monotonically increasing id

e0fc987

Signed-off-by: Rishi Chandra <[email protected]>

fix signature

d06875f

Signed-off-by: Rishi Chandra <[email protected]>

cleanups

2f1ffbc

Signed-off-by: Rishi Chandra <[email protected]>

rishic3 requested a review from a team as a code owner December 9, 2025 06:36

rishic3 mentioned this pull request Dec 9, 2025

[data] Support expression for monotonically increasing id #57806

Open

gemini-code-assist bot reviewed Dec 9, 2025

address comments

60459af

Signed-off-by: Rishi Chandra <[email protected]>

cursor bot reviewed Dec 9, 2025

ray-gardener bot added the data (Ray Data-related issues) and community-contribution (Contributed by the community) labels Dec 9, 2025

gvspraveen requested a review from bveeramani December 10, 2025 01:42

implement as expression

9ea1575

Signed-off-by: Rishi Chandra <[email protected]>

add test to bazel

9fa308c

Signed-off-by: Rishi Chandra <[email protected]>
