[Propeller] Exploration and Integration into CachyOS (Blocked by LLVM 18) #343

ptr1337 · 2024-12-02T18:54:55Z

Propeller is a profile-guided, post-link optimization framework developed by Google to enhance the performance of large-scale applications. It operates by relinking binaries based on precise runtime profiles, enabling optimizations that are challenging to achieve during the initial compilation phase.

Propeller performs the following optimizations:

Basic Block Reordering
Function Reordering
Function Splitting

So it is broadly similar to BOLT.

Propeller should be applied after the final AutoFDO compilation; the resulting kernel then needs to be profiled and compiled once more. The workflow would look like the following:

Compile the kernel with AUTOFDO_CLANG enabled
Boot into the kernel built with AUTOFDO_CLANG and profile it
Convert the AutoFDO profile
Compile the kernel with the AutoFDO profile passed and with AUTOFDO_CLANG and PROPELLER_CLANG enabled
Boot into the AutoFDO-optimized kernel built with PROPELLER_CLANG and profile it
Convert the profile with the following command:

create_llvm_prof --binary=<vmlinux> --profile=<perf_file> \
     --format=propeller --propeller_output_module_name \
     --out=<propeller_profile_prefix>_cc_profile.txt \
     --propeller_symorder=<propeller_profile_prefix>_ld_profile.txt

Compile the kernel with both the AutoFDO and Propeller profiles passed:

make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo_profile> CLANG_PROPELLER_PROFILE_PREFIX=<propeller_profile_prefix>
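The last two steps can be sketched as a small dry-run helper that only prints the commands rather than running them. This is a minimal sketch, not a packaging script: the function names `propeller_convert_cmd` and `propeller_build_cmd` and all paths are hypothetical, and it assumes the build variables are passed on a `make LLVM=1` command line. The flags themselves are the ones from the command above.

```shell
#!/bin/sh
# Dry-run sketch: print the Propeller conversion and final build commands.
# Function names and all paths below are hypothetical placeholders.

# Print the create_llvm_prof call that converts a perf recording into
# the two Propeller profile files sharing one prefix.
propeller_convert_cmd() {
    vmlinux=$1
    perf_file=$2
    prefix=$3
    printf 'create_llvm_prof --binary=%s --profile=%s --format=propeller --propeller_output_module_name --out=%s_cc_profile.txt --propeller_symorder=%s_ld_profile.txt\n' \
        "$vmlinux" "$perf_file" "$prefix" "$prefix"
}

# Print the final build command with both profiles passed.
# Assumption: the variables are given to "make LLVM=1".
propeller_build_cmd() {
    autofdo_profile=$1
    prefix=$2
    printf 'make LLVM=1 CLANG_AUTOFDO_PROFILE=%s CLANG_PROPELLER_PROFILE_PREFIX=%s\n' \
        "$autofdo_profile" "$prefix"
}

# Example with placeholder paths.
propeller_convert_cmd ./vmlinux ./perf.data ./propeller
propeller_build_cmd ./kernel.afdo ./propeller
```

Both commands use the same prefix, so the build picks up the `_cc_profile.txt` and `_ld_profile.txt` files that the conversion step wrote.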

Adding Propeller support on top of AutoFDO requires one additional compilation, for a total of 3 compilations per architecture.
We need to check whether we can reuse the Propeller profile for the v4 architecture, since it is by far the less used architecture.
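To make the build cost concrete, here is a rough count. It assumes (this is not confirmed anywhere above) that an architecture reusing existing profiles needs only a single build with both profiles passed. The helper name `count_builds` is hypothetical.

```shell
#!/bin/sh
# Rough build-count sketch. Assumption: an architecture that reuses
# existing profiles needs only one build (the final one).

# Full cycle per architecture: AutoFDO profiling build,
# AutoFDO build with PROPELLER_CLANG, final AutoFDO + Propeller build.
BUILDS_FULL_CYCLE=3
BUILDS_REUSED=1

count_builds() {
    full_cycle_archs=$1   # architectures profiled from scratch
    reusing_archs=$2      # architectures reusing existing profiles
    echo $(( full_cycle_archs * BUILDS_FULL_CYCLE + reusing_archs * BUILDS_REUSED ))
}

count_builds 2 0   # two architectures, both profiled separately
count_builds 1 1   # one profiled, one reusing its profiles
```

Under that assumption, reusing the profiles for a second architecture would cut the total from 6 builds to 4.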

According to Google's paper, the performance benefit for common applications on top of ThinLTO and PGO varies between 1% and 8%:

Propeller has been deployed in production environments at Google, with tens of millions of cores executing Propeller-optimized code. Evaluations on internal warehouse-scale applications have demonstrated performance improvements ranging from 1.1% to 8% beyond existing optimizations like Profile-Guided Optimization (PGO) and ThinLTO. For instance, compiler tools such as Clang have shown a 7% performance increase, while MySQL has improved by 1%.

Paper: https://research.google/pubs/propeller-a-profile-guided-relinking-optimizer-for-warehouse-scale-applications/?utm_source=chatgpt.com

1Naim changed the title ~~[Propeller] Exploration and Integration into CachyOS (Blocked by LLVM 19 currently)~~ [Propeller] Exploration and Integration into CachyOS (Blocked by LLVM 18) Dec 2, 2024
