Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move EnforceDistribution into datafusion-physical-optimizer crate #14190

Merged
merged 29 commits into from
Jan 22, 2025

Conversation

logan-keede
Copy link
Contributor

@logan-keede logan-keede commented Jan 18, 2025

Which issue does this PR close?

Closes #14186 #14185 #14184.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added optimizer Optimizer rules core Core DataFusion crate labels Jan 18, 2025
Fix circular dependency
@alamb alamb changed the title Move crates Move `Encrates Jan 18, 2025
@alamb alamb changed the title Move `Encrates Move EnforceDistribution into datafusion-physical-optimizer crate Jan 18, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @logan-keede

datafusion/physical-optimizer/Cargo.toml Outdated Show resolved Hide resolved
use datafusion_physical_plan::repartition::RepartitionExec;
use datafusion_physical_plan::sorts::sort::SortExec;
use datafusion_physical_plan::sorts::sort_preserving_merge::SortPreservingMergeExec;
// use datafusion_physical_plan::union::UnionExec;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps we can clean these up too

Copy link
Contributor Author

@logan-keede logan-keede Jan 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept it because I thought it would need to be moved here sooner or later, but sure it can be cleaned.
PS: you are talking about comments, right?

@xudong963 xudong963 self-requested a review January 20, 2025 06:37
@logan-keede logan-keede requested a review from alamb January 20, 2025 22:24
@logan-keede
Copy link
Contributor Author

This PR now fixes #14185 #14186 #14184.
moving tests to core_integration tests and organising it into suggested structure is still pending, but for now cargo build works.

@berkaysynnada
Copy link
Contributor

Hi @logan-keede. Thank you for working on this exhaustive issue. The join_selection rule has been moved to the physical_optimizer crate, but enforce_sorting, enforce_distribution, and other utilities are still left in core for now. This will leave our fork in a very unstable state (and perhaps many others as well).

I’d like to push this PR (and eventually all rules) to the finish line as soon as possible. Are you facing any challenges that are slowing progress? Or would you mind if I push some commits to help move things forward?

@logan-keede
Copy link
Contributor Author

@berkaysynnada This PR is pretty much in its last stretch, I was facing some problems with tests, but I think I have got it now, additionally @buraksenn should be pushing some commits to fix EnforceSorting tests soon.

I do not mind if you still want to help. I would certainly ask you if I get stuck again.

Thanks,
Logan

@logan-keede
Copy link
Contributor Author

To move tests to datafusion/core/tests/physical_optimizer,
I choose to make a few required function public, I am not sure if that is detrimental to code quality or not.
Alternatively, we could keep those tests with the original file(EnforceDistribution).
Except that, I think it is mostly complete.
Thanks,
Logan

@logan-keede
Copy link
Contributor Author

cc @alamb @xudong963

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

datafusion-testing is using the commit: 36283d195c728f26b16b517ba999fd62509b6649 (https://github.com/apache/datafusion-testing/commits/36283d195c728f26b16b517ba999fd62509b6649/)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dup with: https://github.com/apache/datafusion/pull/14219/files and #14219 is ready to merge. Maybe you can do a rebase after it merges.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dup with: https://github.com/apache/datafusion/pull/14219/files and #14219 is ready to merge. Maybe you can do a rebase after it merges.

I'll get these PR's, and resolve any conflicts.

@berkaysynnada
Copy link
Contributor

@logan-keede I've resolved the conflicts and made some further simplifications to remove duplicated utilities. If you're okay with it, I'll send a few commits directly to your branch and then merge this PR.

Thanks for the collaboration!

Copy link
Contributor

@berkaysynnada berkaysynnada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is ready. I’ve spent quite some time on this, and with the completion of #14235, we'll have completed the entire migration.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much @berkaysynnada for pushing this along and @logan-keede for starting the process. Let's merge this PR in asap to avoid accumulating conflicts

@@ -0,0 +1,861 @@
// Licensed to the Apache Software Foundation (ASF) under one
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like how you have split out the tests here

@@ -41,6 +41,7 @@ datafusion-common = { workspace = true, default-features = true }
datafusion-execution = { workspace = true }
datafusion-expr = { workspace = true }
datafusion-expr-common = { workspace = true, default-features = true }
datafusion-functions-aggregate = { workspace = true }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to avoid this dependency if possible -- I will like to make a follow on PR to remove it -- similar to how I did in #14134

However when I just used the stub a few tests started failing on me

@alamb alamb merged commit 2938fb1 into apache:main Jan 22, 2025
27 checks passed
@alamb
Copy link
Contributor

alamb commented Jan 22, 2025

Here is a PR that proposes to consolidate the tests:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate optimizer Optimizer rules
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Move EnforceDistribution into datafusion-physical-optimizer crate
5 participants