Datafusion binary size has been getting bigger #13816

Open
Tracked by #13813
alamb opened this issue Dec 17, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@alamb
Contributor

alamb commented Dec 17, 2024

Is your feature request related to a problem or challenge?

The size of DataFusion's binary has grown significantly over the last few releases.

This likely leads to longer compile times as well as a larger overall binary.

| version | size of datafusion-cli binary |
| --- | --- |
| main at 57d1309 | 92M |
| 43.0.0 | 87M |
| 42.0.0 | 83M |
| 41.0.0 | 72M |
| 40.0.0 | 69M |
| 39.0.0 | 68M |
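
To put the table in relative terms, here is a small sketch that computes the growth between consecutive versions and against 39.0.0. The figures are copied from the table as reported by `du -h`; the names `SIZES_M` and `growth_pct` are made up for this illustration. By this arithmetic the binary grew roughly 35% overall (68M to 92M), and the largest single jump is 41.0.0 to 42.0.0 (72M to 83M, about 15%).

```rust
/// (version, size as reported by `du -h`) pairs copied from the table, oldest first.
const SIZES_M: [(&str, f64); 6] = [
    ("39.0.0", 68.0),
    ("40.0.0", 69.0),
    ("41.0.0", 72.0),
    ("42.0.0", 83.0),
    ("43.0.0", 87.0),
    ("main at 57d1309", 92.0),
];

/// Percentage change going from `old` to `new`.
fn growth_pct(old: f64, new: f64) -> f64 {
    (new - old) / old * 100.0
}

fn main() {
    let baseline = SIZES_M[0].1;
    // Walk adjacent pairs so each row can be compared with the release before it.
    for pair in SIZES_M.windows(2) {
        let (_, prev) = pair[0];
        let (version, size) = pair[1];
        println!(
            "{version:<16} {size:>4.0}M  {:+5.1}% vs previous, {:+5.1}% vs 39.0.0",
            growth_pct(prev, size),
            growth_pct(baseline, size),
        );
    }
}
```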

The sizes are measured like this:

git checkout version
cd datafusion-cli
cargo build --release
du -h target/release/datafusion-cli

Also, people such as @g3blv have noticed that the size of the WASM build has increased by 50%:
#9834 (comment)

Describe the solution you'd like

I would like to reduce the binary size of DataFusion if possible.

At a minimum, I would like to understand where the code size comes from and offer hints about how to reduce it if needed.

Describe alternatives you've considered

A common source of code size is generic ("templated") functions, because the compiler generates a separate copy of the function for each concrete type it is instantiated with.
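
As an illustration of that point (not code from DataFusion), here is a minimal sketch of one common mitigation: keep the generic part of a function as a thin shim and move the bulk of the body into a non-generic inner function, so only the shim is duplicated per type. The function names are hypothetical, and whether this saves anything in practice depends on inlining, so it should be checked with `cargo bloat`.

```rust
use std::fmt::Display;

/// Fully generic: the whole body is compiled once per concrete `T` used by callers.
pub fn describe_generic<T: Display>(value: T) -> String {
    let text = value.to_string();
    format!("value={text} (len {})", text.len())
}

/// Thin generic shim plus a non-generic inner function:
/// only the conversion to `String` is duplicated per `T`.
pub fn describe_slim<T: Display>(value: T) -> String {
    // Not generic, so there is a single copy regardless of how many `T`s exist.
    fn inner(text: &str) -> String {
        format!("value={text} (len {})", text.len())
    }
    inner(&value.to_string())
}

fn main() {
    // Two instantiations of each function: one for i64, one for &str.
    println!("{}", describe_generic(42_i64));
    println!("{}", describe_generic("abc"));
    println!("{}", describe_slim(42_i64));
    println!("{}", describe_slim("abc"));
}
```

Both versions return the same strings; the difference is only in how much machine code each instantiation adds.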

Here is some fascinating information from running cargo bloat -p datafusion:

 File  .text    Size                          Crate Name
 0.1%   0.3% 79.7KiB                         blake2 blake2::Blake2bVarCore::compress
 0.1%   0.2% 70.7KiB                         blake2 blake2::Blake2sVarCore::compress
 0.1%   0.2% 67.1KiB                      sqlparser <sqlparser::ast::Statement as core::fmt::Display>::fmt
 0.1%   0.2% 61.4KiB                         blake3 _blake3_hash4_neon
 0.1%   0.2% 56.4KiB                      chrono_tz <chrono_tz::timezones::Tz as chrono_tz::timezone_impl::TimeSpans>::timespans
 0.1%   0.2% 44.7KiB                     arrow_cast <i64 as lexical_write_integer::api::ToLexical>::to_lexical
 0.1%   0.1% 42.8KiB                     arrow_cast arrow_cast::cast::cast_with_options
 0.0%   0.1% 35.9KiB                           rand <rand_chacha::chacha::ChaCha12Core as rand_core::block::BlockRngCore>::generate
 0.0%   0.1% 34.9KiB                     arrow_cast lexical_parse_float::slow::parse_mantissa
 0.0%   0.1% 33.1KiB                     arrow_cast lexical_parse_float::parse::parse_complete
 0.0%   0.1% 33.1KiB                     arrow_cast lexical_parse_float::parse::parse_complete
 0.0%   0.1% 29.0KiB                 regex_automata regex_automata::hybrid::search::find_fwd
 0.0%   0.1% 27.6KiB                         blake3 blake3::portable::compress_in_place
 0.0%   0.1% 27.1KiB                   aho_corasick aho_corasick::automaton::try_find_fwd
 0.0%   0.1% 25.2KiB                      sqlparser <sqlparser::ast::Expr as core::fmt::Display>::fmt
 0.0%   0.1% 23.8KiB              datafusion_common datafusion_common::scalar::ScalarValue::iter_to_array
 0.0%   0.1% 23.7KiB              datafusion_common datafusion_common::scalar::ScalarValue::iter_to_array
 0.0%   0.1% 23.7KiB       datafusion_physical_expr datafusion_common::scalar::ScalarValue::iter_to_array
 0.0%   0.1% 23.7KiB datafusion_functions_aggregate datafusion_common::scalar::ScalarValue::iter_to_array
 0.0%   0.1% 22.0KiB                     arrow_cast <u64 as lexical_write_integer::api::ToLexical>::to_lexical
36.7%  97.4% 27.7MiB                                And 139272 smaller methods. Use -n N to show more.
37.7% 100.0% 28.4MiB                                .text section size, the file size is 75.4MiB
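
Since the 20 largest functions account for under 3% of `.text` and the remaining 97.4% is spread over 139272 smaller methods, a per-crate view is probably more telling than a per-function one. `cargo bloat` has a `--crates` flag for that; as a rough alternative, the sketch below sums the `Size` column of a report like the one above by crate. The parsing rules are assumptions based only on the layout shown here, and `parse_size_kib`, `is_crate_name`, and `size_by_crate` are hypothetical helper names.

```rust
use std::collections::BTreeMap;

/// Parse a size such as "79.7KiB" or "27.7MiB" into KiB.
fn parse_size_kib(s: &str) -> Option<f64> {
    if let Some(n) = s.strip_suffix("MiB") {
        n.parse::<f64>().ok().map(|v| v * 1024.0)
    } else if let Some(n) = s.strip_suffix("KiB") {
        n.parse::<f64>().ok()
    } else if let Some(n) = s.strip_suffix('B') {
        n.parse::<f64>().ok().map(|v| v / 1024.0)
    } else {
        None
    }
}

/// Assumption: crate names use only lowercase letters, digits, '_' and '-'.
/// This also filters out the "And ... smaller methods" and ".text" summary rows.
fn is_crate_name(s: &str) -> bool {
    !s.is_empty()
        && s.chars()
            .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_' || c == '-')
}

/// Sum the Size column per crate. Expected columns: File%, .text%, Size, Crate, Name...
fn size_by_crate(report: &str) -> BTreeMap<String, f64> {
    let mut totals = BTreeMap::new();
    for line in report.lines() {
        let cols: Vec<&str> = line.split_whitespace().collect();
        // Skip the header and anything that does not look like a data row.
        if cols.len() < 5 || !cols[0].ends_with('%') || !is_crate_name(cols[3]) {
            continue;
        }
        if let Some(kib) = parse_size_kib(cols[2]) {
            *totals.entry(cols[3].to_string()).or_insert(0.0) += kib;
        }
    }
    totals
}

fn main() {
    // A few rows copied from the report above.
    let sample = "\
 File  .text    Size      Crate Name
 0.1%   0.3% 79.7KiB     blake2 blake2::Blake2bVarCore::compress
 0.1%   0.2% 70.7KiB     blake2 blake2::Blake2sVarCore::compress
 0.1%   0.2% 61.4KiB     blake3 _blake3_hash4_neon
36.7%  97.4% 27.7MiB            And 139272 smaller methods. Use -n N to show more.
37.7% 100.0% 28.4MiB            .text section size, the file size is 75.4MiB";

    let mut rows: Vec<(String, f64)> = size_by_crate(sample).into_iter().collect();
    rows.sort_by(|a, b| b.1.total_cmp(&a.1)); // largest crate first
    for (krate, kib) in rows {
        println!("{krate:<12} {kib:>8.1} KiB");
    }
}
```

Note that `ScalarValue::iter_to_array` shows up four times in the report at about 23.7KiB each, attributed to different crates, which is the kind of duplication a per-crate or per-symbol aggregation helps surface.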

Additional context

No response

@comphead
Contributor

The print_functions_docs and print_functions_config binaries can be moved out of the main release.

@comphead
Contributor

Some good experiments are described at https://github.com/johnthagen/min-sized-rust?tab=readme-ov-file#optimize-libstd-with-xargo

With this profile:

[profile.release]
codegen-units = 1
strip = true
panic = "abort"

The CLI binary size went from 94.3MiB down to 48.6MiB 🤔

@alamb
Contributor Author

alamb commented Dec 22, 2024

That is a very cool page 🤔
