Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[EPIC] Extract remaining physical optimizer out of core #11502

Open
9 of 18 tasks
jayzhan211 opened this issue Jul 17, 2024 · 8 comments
Open
9 of 18 tasks

[EPIC] Extract remaining physical optimizer out of core #11502

jayzhan211 opened this issue Jul 17, 2024 · 8 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@jayzhan211
Copy link
Contributor

jayzhan211 commented Jul 17, 2024

Is your feature request related to a problem or challenge?

Historically DataFusion was one (very) large crate datafusion, and as it grew bigger we extracted various functionality into separate crates. This leads to both faster compile times (as the crates can be compiled in parallel) as well easier to navigate code (as the crates force a cleaner dependency separation)

This project tracks moving the physical optimizers:

Describe the solution you'd like

Pull physical optimizer rules to datafusion-physical-optimizer.

There are some tests that have dependencies on datasource, which they are not possible to move out of core at this moment. We may need to leave those tests inside core for now.

Describe alternatives you've considered

No response

Additional context

No response

@jayzhan211 jayzhan211 added the enhancement New feature or request label Jul 17, 2024
@alamb alamb changed the title [EPIC] Extract physical optimizer out of core [EPIC] Extract remaining physical optimizer out of core Jul 18, 2024
@jonathanc-n
Copy link
Contributor

@jayzhan211 @lewiszlw How do we plan on moving the rest of the functions. If we move the tests separately many of the tests depend on the traits in the functions that are being moved. Due to this it would make sense to import together, but then problems with the datasource and other util dependencies will pop up again. Do you guys have any solutions in mind?

@jayzhan211
Copy link
Contributor Author

jayzhan211 commented Nov 8, 2024

We may need to move datasource out of core.

We probably need to revisit this #10782

@jayzhan211
Copy link
Contributor Author

jayzhan211 commented Nov 8, 2024

Try to move optimization that doesn't have datasource dependency like

These are good first issues

  • enforce_distribution
  • enforce_sorting
  • join_selection
  • pruning
  • replace_with_order_preserving_variants
  • sanity_checker
  • sort_pushdown
  • enforce_distribution
  • join_selection

Require datasource

  • projection_pushdown (Require CsvExec)

If we only need datasource in test, leave the test in core

@jayzhan211 jayzhan211 added the good first issue Good for newcomers label Nov 8, 2024
@jonathanc-n
Copy link
Contributor

@jayzhan211 Would you like to open a ticket for moving datasource out of core, so others are more aware of this transition?

@jayzhan211
Copy link
Contributor Author

@jayzhan211 Would you like to open a ticket for moving datasource out of core, so others are more aware of this transition?

Not sure how do achieve this yet, the high level idea is tracked in #10782

@josh0yeh
Copy link

josh0yeh commented Nov 9, 2024

take

@alamb
Copy link
Contributor

alamb commented Jan 10, 2025

@alamb
Copy link
Contributor

alamb commented Jan 18, 2025

I filed a last few tickets

Once those are moved I think we can move the last remnants of the physical optimizer and close this ticket

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

4 participants