The size of DataFusion's binary has grown significantly over the last few releases.
This likely leads to longer compile times as well as a larger overall binary size.
datafusion-cli binary, by version: main (at 57d1309), 43.0.0, 42.0.0, 41.0.0, 40.0.0, 39.0.0
The sizes are measured like this:
    git checkout version
    cd datafusion-cli
    cargo build --release
    du -h target/release/datafusion-cli
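For scripting the measurement above, a minimal Rust sketch along these lines could report the size of the built binary. The helper names `format_mib` and `binary_size` are hypothetical, and the path is assumed from the build commands. It reads the apparent file size from metadata, so the number may differ slightly from what `du -h` prints, since `du` reports disk usage.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Format a byte count as mebibytes with one decimal place, e.g. "1.5MiB".
/// (Hypothetical helper, not part of DataFusion.)
fn format_mib(bytes: u64) -> String {
    format!("{:.1}MiB", bytes as f64 / (1024.0 * 1024.0))
}

/// Return the apparent size in bytes of the file at `path`.
fn binary_size(path: &Path) -> io::Result<u64> {
    Ok(fs::metadata(path)?.len())
}

fn main() {
    // Assumed location of the release binary, relative to datafusion-cli/.
    let path = Path::new("target/release/datafusion-cli");
    match binary_size(path) {
        Ok(bytes) => println!("{}\t{}", format_mib(bytes), path.display()),
        Err(err) => eprintln!("could not stat {}: {}", path.display(), err),
    }
}
```

Running this after each `git checkout` and `cargo build --release` would give one size per version.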
Also, people such as @g3blv have noticed that the size of the WASM build has increased by 50%: #9834 (comment)
I would like to reduce the binary size of DataFusion if possible.
At a minimum, I would like to understand where the code size comes from and offer hints about how to reduce it if needed.
A common source of code size is generic ("templated") functions, since the compiler generates a separate copy of the function for each set of concrete types it is used with.
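As an illustration of that effect, here is a small sketch (not taken from the DataFusion code base; both function names are made up for the example). The first function is fully generic, so its whole body is compiled once per argument type. The second uses a common mitigation: a thin generic wrapper that forwards to a single non-generic inner function.

```rust
use std::path::{Path, PathBuf};

// Fully generic: the compiler emits one copy of this whole body for every
// concrete `P` it is called with (&str, String, PathBuf, ...).
fn extension_generic<P: AsRef<Path>>(path: P) -> Option<String> {
    let path = path.as_ref();
    path.extension()
        .and_then(|ext| ext.to_str())
        .map(|ext| ext.to_ascii_lowercase())
}

// Same behaviour, but the generic part only converts the argument and
// forwards to one non-generic inner function, so the real work is
// compiled a single time.
fn extension_thin<P: AsRef<Path>>(path: P) -> Option<String> {
    fn inner(path: &Path) -> Option<String> {
        path.extension()
            .and_then(|ext| ext.to_str())
            .map(|ext| ext.to_ascii_lowercase())
    }
    inner(path.as_ref())
}

fn main() {
    // Each distinct argument type creates a separate instantiation
    // of the generic function it is passed to.
    println!("{:?}", extension_generic("data.PARQUET"));
    println!("{:?}", extension_generic(String::from("data.csv")));
    println!("{:?}", extension_thin(PathBuf::from("data.json")));
}
```

Whether this pattern helps in a given case depends on how large the body is and how many types it is instantiated with, so it is worth measuring before and after.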
Here is some fascinating information from running cargo bloat -p datafusion:
     File  .text     Size                          Crate Name
     0.1%   0.3%  79.7KiB                         blake2 blake2::Blake2bVarCore::compress
     0.1%   0.2%  70.7KiB                         blake2 blake2::Blake2sVarCore::compress
     0.1%   0.2%  67.1KiB                      sqlparser <sqlparser::ast::Statement as core::fmt::Display>::fmt
     0.1%   0.2%  61.4KiB                         blake3 _blake3_hash4_neon
     0.1%   0.2%  56.4KiB                      chrono_tz <chrono_tz::timezones::Tz as chrono_tz::timezone_impl::TimeSpans>::timespans
     0.1%   0.2%  44.7KiB                     arrow_cast <i64 as lexical_write_integer::api::ToLexical>::to_lexical
     0.1%   0.1%  42.8KiB                     arrow_cast arrow_cast::cast::cast_with_options
     0.0%   0.1%  35.9KiB                           rand <rand_chacha::chacha::ChaCha12Core as rand_core::block::BlockRngCore>::generate
     0.0%   0.1%  34.9KiB                     arrow_cast lexical_parse_float::slow::parse_mantissa
     0.0%   0.1%  33.1KiB                     arrow_cast lexical_parse_float::parse::parse_complete
     0.0%   0.1%  33.1KiB                     arrow_cast lexical_parse_float::parse::parse_complete
     0.0%   0.1%  29.0KiB                 regex_automata regex_automata::hybrid::search::find_fwd
     0.0%   0.1%  27.6KiB                         blake3 blake3::portable::compress_in_place
     0.0%   0.1%  27.1KiB                   aho_corasick aho_corasick::automaton::try_find_fwd
     0.0%   0.1%  25.2KiB                      sqlparser <sqlparser::ast::Expr as core::fmt::Display>::fmt
     0.0%   0.1%  23.8KiB              datafusion_common datafusion_common::scalar::ScalarValue::iter_to_array
     0.0%   0.1%  23.7KiB              datafusion_common datafusion_common::scalar::ScalarValue::iter_to_array
     0.0%   0.1%  23.7KiB       datafusion_physical_expr datafusion_common::scalar::ScalarValue::iter_to_array
     0.0%   0.1%  23.7KiB datafusion_functions_aggregate datafusion_common::scalar::ScalarValue::iter_to_array
     0.0%   0.1%  22.0KiB                     arrow_cast <u64 as lexical_write_integer::api::ToLexical>::to_lexical
    36.7%  97.4%  27.7MiB                                And 139272 smaller methods. Use -n N to show more.
    37.7% 100.0%  28.4MiB                                .text section size, the file size is 75.4MiB
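Since the per-function rows above are each tiny, it can help to roll them up by crate. The following is a rough sketch that sums the Size column per crate from text shaped like the report above. `parse_size` and `size_by_crate` are hypothetical helpers, the column layout is assumed from this one report, and the lowercase-identifier check used to skip the two summary rows is only a heuristic.

```rust
use std::collections::BTreeMap;

// Three rows copied from the report above, plus one summary row.
const SAMPLE: &str = "\
 File  .text     Size     Crate Name
 0.1%   0.3%  79.7KiB    blake2 blake2::Blake2bVarCore::compress
 0.1%   0.2%  70.7KiB    blake2 blake2::Blake2sVarCore::compress
 0.1%   0.1%  42.8KiB arrow_cast arrow_cast::cast::cast_with_options
36.7%  97.4%  27.7MiB           And 139272 smaller methods. Use -n N to show more.
";

/// Parse a size such as "79.7KiB" or "28.4MiB" into bytes.
fn parse_size(s: &str) -> Option<f64> {
    // Check the longer suffixes first, because they all end in 'B'.
    let (number, multiplier) = if let Some(n) = s.strip_suffix("KiB") {
        (n, 1024.0)
    } else if let Some(n) = s.strip_suffix("MiB") {
        (n, 1024.0 * 1024.0)
    } else if let Some(n) = s.strip_suffix("GiB") {
        (n, 1024.0 * 1024.0 * 1024.0)
    } else if let Some(n) = s.strip_suffix('B') {
        (n, 1.0)
    } else {
        return None;
    };
    number.parse::<f64>().ok().map(|v| v * multiplier)
}

/// Sum the Size column per crate, in bytes.
fn size_by_crate(report: &str) -> BTreeMap<String, f64> {
    let mut totals = BTreeMap::new();
    for line in report.lines() {
        let cols: Vec<&str> = line.split_whitespace().collect();
        // Function rows look like: File%  .text%  Size  Crate  Name...
        if cols.len() < 5 || !cols[0].ends_with('%') || !cols[1].ends_with('%') {
            continue;
        }
        // Heuristic: skip summary rows, whose fourth column is not a crate name.
        let krate = cols[3];
        let looks_like_crate = krate
            .chars()
            .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_');
        if !looks_like_crate {
            continue;
        }
        if let Some(bytes) = parse_size(cols[2]) {
            *totals.entry(krate.to_string()).or_insert(0.0) += bytes;
        }
    }
    totals
}

fn main() {
    let mut rows: Vec<(String, f64)> = size_by_crate(SAMPLE).into_iter().collect();
    // Largest crates first.
    rows.sort_by(|a, b| b.1.total_cmp(&a.1));
    for (krate, bytes) in rows {
        println!("{:>10.1}KiB {}", bytes / 1024.0, krate);
    }
}
```

To my knowledge cargo-bloat also has a `--crates` option that prints a per-crate summary directly, which is likely the simpler route when re-running the tool is an option.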
The print_functions_docs and print_functions_config binaries can be moved out from the main release.
Some good experiments are described at https://github.com/johnthagen/min-sized-rust?tab=readme-ov-file#optimize-libstd-with-xargo
with this profile:
    [profile.release]
    codegen-units = 1
    strip = true
    panic = "abort"
The cli size went from 94.3MiB down to 48.6MiB 🤔
That is a very cool page 🤔