Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use greg7mdp/parallel-hashmap for faster execution #2468

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

atetubou
Copy link
Contributor

@atetubou atetubou commented Jun 21, 2024

This implements option 1 in #2462 (comment).

This uses https://github.com/greg7mdp/parallel-hashmap instead of unordered_map for faster ninja execution.
Some more context is in https://crbug.com/346910589.

This makes noop build for chromium

  • 12% faster on Linux
  • 2% slower on Windows
  • 96% faster on macOS

Linux

$ hyperfine --warmup 1 \
   '~/ninja_master -C out/non_component build:buildflag_header_h' \
   '~/ninja_faster_hash -C out/non_component build:buildflag_header_h'
Benchmark 1: ~/ninja_master -C out/non_component build:buildflag_header_h
  Time (mean ± σ):      5.251 s ±  0.027 s    [User: 4.641 s, System: 0.608 s]
  Range (min … max):    5.222 s …  5.309 s    10 runs
 
Benchmark 2: ~/ninja_faster_hash -C out/non_component build:buildflag_header_h
  Time (mean ± σ):      4.689 s ±  0.022 s    [User: 4.055 s, System: 0.633 s]
  Range (min … max):    4.648 s …  4.737 s    10 runs
 
Summary
  ~/ninja_faster_hash -C out/non_component build:buildflag_header_h ran
    1.12 ± 0.01 times faster than ~/ninja_master -C out/non_component build:buildflag_header_h

Windows

Benchmark 1: C:\Users\tikuta\source\repos\ninja-build\ninja\ninja_master -C out/non_component build:buildflag_header_h
  Time (mean ± σ):     13.161 s ±  0.105 s    [User: 10.657 s, System: 2.194 s]
  Range (min … max):   13.005 s … 13.328 s    10 runs

Benchmark 2: C:\Users\tikuta\source\repos\ninja-build\ninja\ninja_faster_hash -C out/non_component build:buildflag_header_h
  Time (mean ± σ):     13.206 s ±  0.210 s    [User: 10.653 s, System: 2.194 s]
  Range (min … max):   12.951 s … 13.578 s    10 runs

Summary
  C:\Users\tikuta\source\repos\ninja-build\ninja\ninja_master -C out/non_component build:buildflag_header_h ran
    1.00 ± 0.02 times faster than C:\Users\tikuta\source\repos\ninja-build\ninja\ninja_faster_hash -C out/non_component build:buildflag_header_h

macOS

$ hyperfine --warmup 1 '~/ninja_master -C out/non_component build:buildflag_header_h' '~/ninja_faster_hash -C out/non_component build:buildflag_header_h'
Benchmark 1: ~/ninja_master -C out/non_component build:buildflag_header_h
  Time (mean ± σ):     14.360 s ±  0.706 s    [User: 9.006 s, System: 1.681 s]
  Range (min … max):   12.832 s … 15.297 s    10 runs

Benchmark 2: ~/ninja_faster_hash -C out/non_component build:buildflag_header_h
  Time (mean ± σ):      7.340 s ±  0.260 s    [User: 6.263 s, System: 0.828 s]
  Range (min … max):    6.948 s …  7.799 s    10 runs

Summary
  ~/ninja_faster_hash -C out/non_component build:buildflag_header_h ran
    1.96 ± 0.12 times faster than ~/ninja_master -C out/non_component build:buildflag_header_h

@atetubou atetubou changed the title Use parallel_hashmap for faster execution Use parallel-hashmap for faster execution Jun 24, 2024
@atetubou atetubou changed the title Use parallel-hashmap for faster execution Use greg7mdp/parallel-hashmap for faster execution Jun 24, 2024
@atetubou
Copy link
Contributor Author

@digit-google @jhasse how do you think about this PR instead of #2462?

@atetubou atetubou marked this pull request as ready for review June 24, 2024 07:12
@digit-google
Copy link
Contributor

Thanks, I have tested this CL with a local Fuchsia build on Linux, I get a consistent 2% (x1.02) speedup there.

Can you fuse all commits into a single one though? I'll add a few inline comments too.

I suggest adding a top-level .clang-format-ignore file to avoid editors to reformat third_party/*. Unfortunately, it seems that this file is ignored by git-clang-format for now, so that's not a hard-requirement.

@atetubou atetubou force-pushed the copy_parallel_hashmap branch from c1a74de to de4f281 Compare June 25, 2024 06:53
@atetubou
Copy link
Contributor Author

Can you fuse all commits into a single one though? I'll add a few inline comments too.

Done.

I suggest adding a top-level .clang-format-ignore file to avoid editors to reformat third_party/*.

Done.

@atetubou atetubou force-pushed the copy_parallel_hashmap branch from de4f281 to 52d0c4a Compare June 25, 2024 06:56
@digit-google
Copy link
Contributor

Thanks, the latest commit is much better, but please change the commit message to something informative. This is what will be submitted into the final git repository, so the current message will be useless for people looking at the git log in the future. And the github conversation here, and its description message at the top, will essentially be lost in the wind after that.

@jhasse , what do you think?

This implements option 1 in ninja-build#2462 (comment).

This uses https://github.com/greg7mdp/parallel-hashmap instead of unordered_map for faster ninja execution.
Some more context is in https://crbug.com/346910589.

This makes noop build for chromium
* 12% faster on Linux
* 2% slower on Windows
* 96% faster on macOS

```
$ hyperfine --warmup 1 \
   '~/ninja_master -C out/non_component build:buildflag_header_h' \
   '~/ninja_faster_hash -C out/non_component build:buildflag_header_h'
Benchmark 1: ~/ninja_master -C out/non_component build:buildflag_header_h
  Time (mean ± σ):      5.251 s ±  0.027 s    [User: 4.641 s, System: 0.608 s]
  Range (min … max):    5.222 s …  5.309 s    10 runs

Benchmark 2: ~/ninja_faster_hash -C out/non_component build:buildflag_header_h
  Time (mean ± σ):      4.689 s ±  0.022 s    [User: 4.055 s, System: 0.633 s]
  Range (min … max):    4.648 s …  4.737 s    10 runs

Summary
  ~/ninja_faster_hash -C out/non_component build:buildflag_header_h ran
    1.12 ± 0.01 times faster than ~/ninja_master -C out/non_component build:buildflag_header_h
```

```
Benchmark 1: C:\Users\tikuta\source\repos\ninja-build\ninja\ninja_master -C out/non_component build:buildflag_header_h
  Time (mean ± σ):     13.161 s ±  0.105 s    [User: 10.657 s, System: 2.194 s]
  Range (min … max):   13.005 s … 13.328 s    10 runs

Benchmark 2: C:\Users\tikuta\source\repos\ninja-build\ninja\ninja_faster_hash -C out/non_component build:buildflag_header_h
  Time (mean ± σ):     13.206 s ±  0.210 s    [User: 10.653 s, System: 2.194 s]
  Range (min … max):   12.951 s … 13.578 s    10 runs

Summary
  C:\Users\tikuta\source\repos\ninja-build\ninja\ninja_master -C out/non_component build:buildflag_header_h ran
    1.00 ± 0.02 times faster than C:\Users\tikuta\source\repos\ninja-build\ninja\ninja_faster_hash -C out/non_component build:buildflag_header_h
```

```
$ hyperfine --warmup 1 '~/ninja_master -C out/non_component build:buildflag_header_h' '~/ninja_faster_hash -C out/non_component build:buildflag_header_h'
Benchmark 1: ~/ninja_master -C out/non_component build:buildflag_header_h
  Time (mean ± σ):     14.360 s ±  0.706 s    [User: 9.006 s, System: 1.681 s]
  Range (min … max):   12.832 s … 15.297 s    10 runs

Benchmark 2: ~/ninja_faster_hash -C out/non_component build:buildflag_header_h
  Time (mean ± σ):      7.340 s ±  0.260 s    [User: 6.263 s, System: 0.828 s]
  Range (min … max):    6.948 s …  7.799 s    10 runs

Summary
  ~/ninja_faster_hash -C out/non_component build:buildflag_header_h ran
    1.96 ± 0.12 times faster than ~/ninja_master -C out/non_component build:buildflag_header_h
```
@atetubou atetubou force-pushed the copy_parallel_hashmap branch from 52d0c4a to d91e07c Compare July 3, 2024 08:59
@atetubou
Copy link
Contributor Author

atetubou commented Jul 3, 2024

Thank you for your review, I updated commit message.

sesse pushed a commit to sesse/ninja that referenced this pull request Nov 1, 2024
This is much faster than std::unordered_map, and also slightly faster
than phmap::flat_hash_map that was included in PR ninja-build#2468.
It is MIT-licensed, and we just include the .h file wholesale.

I haven't done a detailed test of all the various unordered_maps
out there, but this is the overall highest-ranking contender on

  https://martin.ankerl.com/2022/08/27/hashmap-bench-01/

except for ankerl::unordered_dense::map, which requires C++17.

For a no-op build of Chromium (Linux, Zen 2),
this reduces time spent from 5.14 to 4.62 seconds.
sesse pushed a commit to sesse/ninja that referenced this pull request Nov 1, 2024
This is much faster than std::unordered_map, and also slightly faster
than phmap::flat_hash_map that was included in PR ninja-build#2468.
It is MIT-licensed, and we just include the .h file wholesale.

I haven't done a detailed test of all the various unordered_maps
out there, but this is the overall highest-ranking contender on

  https://martin.ankerl.com/2022/08/27/hashmap-bench-01/

except for ankerl::unordered_dense::map, which requires C++17.

For a no-op build of Chromium (Linux, Zen 2),
this reduces time spent from 5.14 to 4.62 seconds.
sesse pushed a commit to sesse/ninja that referenced this pull request Nov 1, 2024
This is much faster than std::unordered_map, and also slightly faster
than phmap::flat_hash_map that was included in PR ninja-build#2468.
It is MIT-licensed, and we just include the .h file wholesale.

I haven't done a detailed test of all the various unordered_maps
out there, but this is the overall highest-ranking contender on

  https://martin.ankerl.com/2022/08/27/hashmap-bench-01/

except for ankerl::unordered_dense::map, which requires C++17.

For a no-op build of Chromium (Linux, Zen 2),
this reduces time spent from 5.14 to 4.62 seconds.
sesse pushed a commit to sesse/ninja that referenced this pull request Nov 4, 2024
This is much faster than std::unordered_map, and also slightly faster
than phmap::flat_hash_map that was included in PR ninja-build#2468.
It is MIT-licensed, and we just include the .h file wholesale.

I haven't done a detailed test of all the various unordered_maps
out there, but this is the overall highest-ranking contender on

  https://martin.ankerl.com/2022/08/27/hashmap-bench-01/

except for ankerl::unordered_dense::map, which requires C++17.

For a no-op build of Chromium (Linux, Zen 2),
this reduces time spent from 5.14 to 4.62 seconds.
digit-google pushed a commit to digit-google/ninja that referenced this pull request Nov 4, 2024
This is much faster than std::unordered_map, and also slightly faster
than phmap::flat_hash_map that was included in PR ninja-build#2468.
It is MIT-licensed, and we just include the .h file wholesale.

I haven't done a detailed test of all the various unordered_maps
out there, but this is the overall highest-ranking contender on

  https://martin.ankerl.com/2022/08/27/hashmap-bench-01/

except for ankerl::unordered_dense::map, which requires C++17.

For a no-op build of Chromium (Linux, Zen 2),
this reduces time spent from 5.14 to 4.62 seconds.
sesse pushed a commit to sesse/ninja that referenced this pull request Nov 4, 2024
This is much faster than std::unordered_map, and also slightly faster
than phmap::flat_hash_map that was included in PR ninja-build#2468.
It is MIT-licensed, and we just include the .h file wholesale.

I haven't done a detailed test of all the various unordered_maps
out there, but this is the overall highest-ranking contender on

  https://martin.ankerl.com/2022/08/27/hashmap-bench-01/

except for ankerl::unordered_dense::map, which requires C++17.

For a no-op build of Chromium (Linux, Zen 2),
this reduces time spent from 5.14 to 4.62 seconds.
sesse pushed a commit to sesse/ninja that referenced this pull request Nov 5, 2024
This is much faster than std::unordered_map, and also slightly faster
than phmap::flat_hash_map that was included in PR ninja-build#2468.
It is MIT-licensed, and we just include the .h file wholesale.

I haven't done a detailed test of all the various unordered_maps
out there, but this is the overall highest-ranking contender on

  https://martin.ankerl.com/2022/08/27/hashmap-bench-01/

except for ankerl::unordered_dense::map, which requires C++17.

For a no-op build of Chromium (Linux, Zen 2),
this reduces time spent from 5.14 to 4.62 seconds.
sesse pushed a commit to sesse/ninja that referenced this pull request Nov 5, 2024
This is much faster than std::unordered_map, and also slightly faster
than phmap::flat_hash_map that was included in PR ninja-build#2468.
It is MIT-licensed, and we just include the .h file wholesale.

I haven't done a detailed test of all the various unordered_maps
out there, but this is the overall highest-ranking contender on

  https://martin.ankerl.com/2022/08/27/hashmap-bench-01/

except for ankerl::unordered_dense::map, which requires C++17.

For a no-op build of Chromium (Linux, Zen 2),
this reduces time spent from 5.14 to 4.62 seconds.
@jhasse
Copy link
Collaborator

jhasse commented Nov 7, 2024

That it's slower on Windows is a bit unfortunate. I'm not sure if adding a third party hash map is worth it for a 12 % speed up - the macOS speed up is nice though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants