[DISCUSSION] 2024 Q4 / 2025 Q1 Roadmap #13274
Comments
BTW my personal plans over the next few months are likely to focus on consolidating some of the gains / improvements we have made recently. That includes:
Improve the project's documentation
Performance wise I plan to
|
I am not sure if this is the place for it but I have been putting a lot of work into |
For anyone else following along, |
I may want to help with the delta / iceberg integration; I think they are quite important. But I will work on performance tasks first |
@jayzhan211 I agree, they are very important. Unfortunately, we have been held up because the crates use different versions of datafusion. The idea was to converge on 42, which iceberg and hudi currently use, but deltalake (which we already have an integration for) is on 41 and hasn't been able to upgrade yet. It looks like they are skipping version 42 now and will use 43, so hopefully this is resolved soon. Here is some relevant work |
@jayzhan211 and to be more explicit on my release plans, I did not plan on releasing until iceberg and hudi were added. |
LogicalType is important too #12622 |
I think another potentially very interesting approach here will be to use the The idea there would be to wrap delta / iceberg in a stable ABI (aka the FFI bindings) so we could call delta.rs / iceberg which used a different version of DataFusion from |
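To make the "stable ABI" idea above concrete, here is a minimal std-only sketch of the general pattern: a `#[repr(C)]` struct that carries an opaque pointer plus `extern "C"` function pointers, so the caller never depends on the provider's Rust types or crate versions. All names here (`FfiTableHandle`, `LocalTable`, `row_count`) are hypothetical illustrations and are not the actual DataFusion FFI API.

```rust
use std::os::raw::c_void;

/// Hypothetical C-compatible handle. Only this layout crosses the boundary,
/// so both sides can be built against different library versions.
#[repr(C)]
pub struct FfiTableHandle {
    /// Opaque pointer to state owned by the providing side.
    private_data: *mut c_void,
    /// Returns the number of rows held by the provider.
    row_count: unsafe extern "C" fn(*const c_void) -> u64,
    /// Frees `private_data`; must be called exactly once.
    release: unsafe extern "C" fn(*mut c_void),
}

/// Provider-side type (assumption: stands in for a real table implementation).
struct LocalTable {
    rows: Vec<i64>,
}

unsafe extern "C" fn local_row_count(data: *const c_void) -> u64 {
    // The pointer was created from a Box<LocalTable> in `from_rows`.
    let table = unsafe { &*(data as *const LocalTable) };
    table.rows.len() as u64
}

unsafe extern "C" fn local_release(data: *mut c_void) {
    // Reclaim ownership so the allocation is freed by the side that made it.
    drop(unsafe { Box::from_raw(data as *mut LocalTable) });
}

impl FfiTableHandle {
    /// Provider side: wrap local state behind the C-compatible handle.
    pub fn from_rows(rows: Vec<i64>) -> Self {
        let data = Box::into_raw(Box::new(LocalTable { rows })) as *mut c_void;
        Self {
            private_data: data,
            row_count: local_row_count,
            release: local_release,
        }
    }

    /// Consumer side: call through the function pointer only.
    pub fn row_count(&self) -> u64 {
        unsafe { (self.row_count)(self.private_data) }
    }
}

impl Drop for FfiTableHandle {
    fn drop(&mut self) {
        unsafe { (self.release)(self.private_data) }
    }
}
```

The consumer only ever touches the function pointers, which is what lets the two sides disagree about everything behind `private_data`.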
On the python side, getting better integration with the python delta-rs package was the entire reason for pushing for the FFI bindings. I have branches ready to go for For the pure rust implementations, I think it would be best to not cross the |
I think as soon as DataFusion 43.0.0 is released we'll be able to test it out:
It should be quite sweet |
I don't know if this discussion is the place we want to track work in the other related projects, but my top goals for 2024 Q4 are:
|
One thing that's not clear to me with the FFI approach is who the intended owner of the bindings is - should it be Of course in the short term it could be prototyped in |
The version of DataFusion used in the bindings has to match the client program ( One thing we might be able to do is have a separate crate like |
@matthewmturner Do you have anything in mind moving forward for integrating the rest of the data lakes? (such as a list of what needs to be done moving forward) |
Yes, now that DataFusion v43 has been released I am hoping that the rust implementations of the three main data lake formats (Deltalake / Iceberg / Hudi) update to that version. Then I will:
I am interested in the FFI bindings but I don't anticipate working on that prior to the current release I am planning. |
I have had a few days to reflect, and I personally think making it easy to integrate DataFusion into the "open data lake" stack might be my top priority over the coming months. @julienledem wrote up a very nice piece describing this: The advent of the Open Data Lake. In my mind, the specific work this entails is stuff like
More to come |
This is what I had in mind: Thanks for the link to the dft one. That is a good one |
@alamb I will work on that next. Will ping you when ready for review. |
I filed to try and organize my thoughts here better |
Adds a dedicated executor for running CPU bound work on the FlightSQL server. There is interest from the [DataFusion community](apache/datafusion#13274 (comment)) for this, it was already on our [roadmap](#197) and I think the DFT FlightSQL server is a great place to have a reference implementation. Initial inspiration and context can be found [here](https://thenewstack.io/using-rustlangs-async-tokio-runtime-for-cpu-bound-tasks/). Most of the initial implementation was copied from [here](https://github.com/influxdata/influxdb3_core/blob/6fcbb004232738d55655f32f4ad2385523d10696/executor/src/lib.rs) with some tweaks for our current setup. In particular we don't have metrics yet in the FlightSQL server implementation (but it is on the [roadmap](#210)) - I expect to do a follow-on where metrics are integrated.
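For readers unfamiliar with the pattern, the core idea of a dedicated executor is to run CPU bound work on its own pool of threads so it cannot starve the threads serving network requests. The sketch below shows that idea using only the standard library. It is not the implementation referenced above (which builds on a separate Tokio runtime); the `DedicatedExecutor` type and its methods here are assumptions made for illustration.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// A boxed unit of CPU bound work.
type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical executor: a fixed pool of worker threads kept separate
/// from whatever threads handle I/O.
pub struct DedicatedExecutor {
    sender: Option<Sender<Job>>,
    workers: Vec<JoinHandle<()>>,
}

impl DedicatedExecutor {
    pub fn new(num_threads: usize) -> Self {
        let (sender, receiver) = mpsc::channel::<Job>();
        let receiver = Arc::new(Mutex::new(receiver));
        let workers = (0..num_threads)
            .map(|i| {
                let receiver = Arc::clone(&receiver);
                thread::Builder::new()
                    .name(format!("cpu-worker-{i}"))
                    .spawn(move || loop {
                        // The lock guard is dropped at the end of this
                        // statement, before the job itself runs.
                        let job = receiver.lock().unwrap().recv();
                        match job {
                            Ok(job) => job(),
                            Err(_) => break, // channel closed: shut down
                        }
                    })
                    .expect("failed to spawn worker thread")
            })
            .collect();
        Self { sender: Some(sender), workers }
    }

    /// Submit CPU bound work; the returned receiver yields its result.
    pub fn spawn<T, F>(&self, work: F) -> Receiver<T>
    where
        F: FnOnce() -> T + Send + 'static,
        T: Send + 'static,
    {
        let (tx, rx) = mpsc::channel();
        let job: Job = Box::new(move || {
            // Ignore the error if the caller dropped the receiver.
            let _ = tx.send(work());
        });
        self.sender
            .as_ref()
            .expect("executor already shut down")
            .send(job)
            .expect("worker threads have exited");
        rx
    }
}

impl Drop for DedicatedExecutor {
    fn drop(&mut self) {
        // Closing the channel makes every worker's `recv` fail and exit.
        drop(self.sender.take());
        for worker in self.workers.drain(..) {
            let _ = worker.join();
        }
    }
}
```

A caller would create one executor at server start-up, hand query execution to `spawn`, and wait on the returned receiver, leaving the request-handling threads free for I/O.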
Is your feature request related to a problem or challenge?
The last roadmap discussion we had seems to have worked out well to galvanize and get us organized around some common goals
Describe the solution you'd like
Let's collect any projects that people think they are likely to spend time on or projects that the broader community would really like to see done and write them down!
Describe alternatives you've considered
No response
Additional context
No response