Release DataFusion 44.0.0
#13334
I would love to get this one in: Some things to highlight (that are already merged) |
Also this hopefully: |
@Omega359 and @andygrove suggested #13525 (comment) that for this release we
Perhaps for 44.0.0 we can try with own subprojects (Ballista, Comet, DF Python, DF Ray) |
I started gathering a list of items we think we should fix prior to the release in the description |
Does anyone have any opinion about holding the DataFusion 44 release for the next major arrow release? That would fix a bunch of the StringView issues, but would likely delay DataFusion 44 for a few more weeks |
It's soon to be the holiday season so I'm all for cooking the release a bit more. |
I would personally love to see DataFusion 44.0.0 be lauded as "super stable" and have few upgrade issues (we would largely achieve this by testing upgrades with other projects prior to release). The downside is that the changeset in DataFusion 44.0.0 might be larger. |
I'd like to see an upgrade guide for the 44 release (and am willing to take the lead on this). I am trying to upgrade Comet now and am running into some issues. I am +1 for waiting for the next Arrow release. |
I filed #13702 |
How would everyone feel about increasing the length of release votes from 3 days to 7 days to give downstream projects more time to test the release? Sometimes the vote starts on a Friday and passes on a Monday and we can't expect everyone to be working weekends. |
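The weekend concern above can be checked with a little date arithmetic. This is only an illustrative sketch: the start date is a hypothetical example (6 December 2024 was a Friday), and the helper name `working_days_in_vote` is made up for this example.

```python
from datetime import date, timedelta


def working_days_in_vote(start: date, length_days: int) -> int:
    """Count Monday-Friday days in the window [start, start + length_days)."""
    days = (start + timedelta(days=i) for i in range(length_days))
    return sum(1 for d in days if d.weekday() < 5)


# Hypothetical vote opened on a Friday (assumption for illustration only).
vote_start = date(2024, 12, 6)

for length in (3, 7):
    closes = vote_start + timedelta(days=length)
    # Print the vote length, the closing date, and the working days it covers.
    print(length, closes.isoformat(), working_days_in_vote(vote_start, length))
```

Under these assumptions a 3-day vote that opens on a Friday covers one working day, while a 7-day vote always covers five, whatever day it starts on.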
In my opinion, we would ideally do the testing before we make the release candidate (as the overhead of making RCs is nontrivial). However, I am not opposed to extending the voting timeline if that would get us more testing time |
Testing before we create the RC makes sense |
I have been thinking about this upgrade and the next arrow release. Specifically, I think we need this upgrade in order to resolve some of the issues in 44 related to string view. However, what I would like to propose is:
The downside of this plan is
The upsides are
If someone is critically waiting on the string view fixes, we can also contemplate making another arrow incremental release |
I am also going to coordinate with the delta-rs folks to try and test upgrading prior to release: |
Starting an issue and branch in |
I'll give my 2 cents from my experience of how other popular projects handle this issue. In Node.js (I'm a core collaborator), before each release we run the tests of the most popular packages against the new Node version to make sure we are not breaking them; if we are, we let them know. This is really helpful for avoiding breaking the entire world. The process is called CITGM; you can watch this talk for more info |
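To make the idea concrete, here is a minimal sketch of that kind of pre-release check: run each downstream project's test command and report which ones fail. It is not how CITGM itself is implemented; the project names and commands are placeholders (assumptions), and a real setup would first build each project against the candidate commit.

```python
import subprocess
import sys


def run_downstream_checks(checks: dict[str, list[str]]) -> dict[str, bool]:
    """Run each project's test command and map project name -> passed."""
    results = {}
    for project, command in checks.items():
        completed = subprocess.run(command, capture_output=True, text=True)
        # A zero exit code is treated as "tests passed".
        results[project] = completed.returncode == 0
    return results


# Placeholder commands standing in for real downstream test suites
# (hypothetical names; one is made to pass and one to fail).
checks = {
    "downstream-a": [sys.executable, "-c", "raise SystemExit(0)"],
    "downstream-b": [sys.executable, "-c", "raise SystemExit(1)"],
}

results = run_downstream_checks(checks)
broken = sorted(name for name, ok in results.items() if not ok)
print("needs attention:", broken)
```

Projects listed under "needs attention" are the ones whose maintainers would be notified before the release candidate is cut.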
I am +1 for releasing DF 44 without the next major version of Arrow. However, there are some bug fixes and performance improvements in Arrow that I would like to use so would like to see a new minor release (and am happy to help with this if needed). |
Hi - would like to get #13647 out as soon as is practical - a small fix but does unblock my use of Thanks! |
I think we should start pushing to make this release before it accumulates more API changes such as @andygrove are you willing to try and make this release later this week? Maybe we can rally people to agree on what we need for the next release and try and get the upgrades lined up |
(Just for what it's worth, I appreciate smaller releases more often over bigger ones less often, even if it means I need to fix downstream stuff due to breaks more times; presumably it usually also means fewer big breaks. It also means a faster time to get my and others' improvements and fixes into test/use, and thus faster finding of unintended regressions when they happen. That said, I appreciate there's a bunch of factors including the work needed etc, just wanted to offer my 2 cents 😄) |
@alamb Sure, but I didn't have time yet to help with creating a migration guide, other than the notes I made on #13702 |
The main thing we are waiting on for Comet is #13778 |
I agree -- thank you @Blizzara. In this case I think we are running into bandwidth / maintainer limitations -- I would really like to make sure our releases are less jarring. Does anyone have time to help test with delta-rs? |
https://github.com/lakehq/sail has imported over 3,800 tests from the Spark code base. With each DataFusion upgrade, we have found that any bugs/breaks likely cause one of our tests to fail. If helpful, we're happy to test DataFusion with Sail and include Sail as part of the testing process for each release! |
@shehabgamin that would be awesome, thanks! |
@findepi Great will do! Also, over time, we're happy to work on porting any tests into DataFusion that would make sense to have in DataFusion. We'll integrate Sail with the latest commit in the DataFusion main branch and report back our findings. When should Sail complete testing by? We do have bandwidth this week and next week. We’d be happy to prioritize testing this on our side, just lemme know expectations around timing! |
Agreed, let's do this |
I am personally hoping we are ready to make an RC from main by early next week
100% agree -- a good place to start might be any tests that you find fail in your project but pass all DataFusion tests |
Just fyi, I may not be online much next week, not that I need to be involved in cutting the RC, but will likely have time the following week to help, if needed. |
What I plan to do is try and clean up any remaining outstanding PRs and get them merged over the next few days, and I'll try and prepare the RC asap. Any help people can give to help figure out how to get the approved PRs merged would be super helpful: |
Given that many changes will be merged in after pre-release testing, I think it makes sense to have people run their tests one more time once there is a release candidate. This should be effortless since further breaking changes should be minimal or non-existent. Otherwise, this defeats the purpose of pre-release testing because we cannot guarantee the stability of the final release without validating the accumulated changes. I'm getting started now with testing on Sail. Using the current latest commit from main branch: EDIT: Adding #13855 |
Test complete: #13855 (comment) |
I am looking at #13510 now, and then plan to make a PR with release notes, version update, etc. Update:
@alamb The miri issue does not seem to be resolved for Comet - #13835 (comment) Is there anything I should have done other than set |
That sounds like it is the right thing to me 🤔
Is your feature request related to a problem or challenge?
Tracking ticket for next release, also a place to track desired inclusions
The last release was https://crates.io/crates/datafusion/43.0.0 on November 8, 2024, so the next major release would be around December 8, 2024
Prior release tickets:
Items to possibly fix before release
Invalid comparison operation: Utf8 == Utf8View error during LEFT ANTI JOIN #13510
Improved upgrade experience:
Signature::nullary in 44.0.0 easier / less confusing #13763
DocBuilder migration in 44.0.0 easier #13764
recursive dependency an optional feature #13766
Downstream project testing: