Use FastBroadcast.jl #143

Merged
JoshuaLampert merged 7 commits into main from hr/fast_broadcast
Aug 19, 2024

Collaborator

@ranocha ranocha commented Aug 18, 2024

Closes #126

Some performance benchmarks similar to #135 (comment)

Before

julia> trixi_include("examples/hyperbolic_serre_green_naghdi_1d/hyperbolic_serre_green_naghdi_conservation.jl", io = devnull, N = 1_000);
[ Info: You just called `trixi_include`. Julia may now compile the code, please be patient.

julia> @benchmark solve(ode, RDPK3SpFSAL35(); save_everystep = false, callback = callbacks)
BenchmarkTools.Trial: 23 samples with 1 evaluation.
 Range (min … max):  206.044 ms … 398.889 ms  ┊ GC (min … max): 0.00% … 46.52%
 Time  (median):     210.281 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   218.234 ms ±  39.461 ms  ┊ GC (mean ± σ):  4.12% ±  9.64%

  █▄▁
  ███▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃ ▁
  206 ms           Histogram: frequency by time          399 ms <

 Memory estimate: 55.27 MiB, allocs estimate: 1586311.

julia> trixi_include("examples/hyperbolic_serre_green_naghdi_1d/hyperbolic_serre_green_naghdi_conservation.jl", io = devnull, N = 2_000);
[ Info: You just called `trixi_include`. Julia may now compile the code, please be patient.

julia> @benchmark solve(ode, RDPK3SpFSAL35(); save_everystep = false, callback = callbacks)
BenchmarkTools.Trial: 8 samples with 1 evaluation.
 Range (min … max):  644.031 ms … 833.443 ms  ┊ GC (min … max): 0.66% … 21.71%
 Time  (median):     649.244 ms               ┊ GC (median):    0.66%
 Time  (mean ± σ):   671.120 ms ±  65.665 ms  ┊ GC (mean ± σ):  3.84% ±  7.50%

  █ █
  █▇█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇ ▁
  644 ms           Histogram: frequency by time          833 ms <

 Memory estimate: 172.13 MiB, allocs estimate: 5050532.

julia> trixi_include("examples/hyperbolic_serre_green_naghdi_1d/hyperbolic_serre_green_naghdi_conservation.jl", io = devnull, N = 3_000);
[ Info: You just called `trixi_include`. Julia may now compile the code, please be patient.

julia> @benchmark solve(ode, RDPK3SpFSAL35(); save_everystep = false, callback = callbacks)
BenchmarkTools.Trial: 4 samples with 1 evaluation.
 Range (min … max):  1.433 s …  1.438 s  ┊ GC (min … max): 1.18% … 1.19%
 Time  (median):     1.435 s             ┊ GC (median):    1.03%
 Time  (mean ± σ):   1.435 s ± 2.288 ms  ┊ GC (mean ± σ):  1.01% ± 0.21%

  █                 █                           █        █
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁█ ▁
  1.43 s        Histogram: frequency by time        1.44 s <

 Memory estimate: 377.09 MiB, allocs estimate: 11179547.

julia> trixi_include("examples/hyperbolic_serre_green_naghdi_1d/hyperbolic_serre_green_naghdi_conservation.jl", io = devnull, N = 4_000);
[ Info: You just called `trixi_include`. Julia may now compile the code, please be patient.

julia> @benchmark solve(ode, RDPK3SpFSAL35(); save_everystep = false, callback = callbacks)
BenchmarkTools.Trial: 2 samples with 1 evaluation.
 Range (min … max):  2.512 s …  2.521 s  ┊ GC (min … max): 0.97% … 0.91%
 Time  (median):     2.517 s             ┊ GC (median):    0.94%
 Time  (mean ± σ):   2.517 s ± 6.450 ms  ┊ GC (mean ± σ):  0.94% ± 0.04%

  █                                                      █
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  2.51 s        Histogram: frequency by time        2.52 s <

 Memory estimate: 659.02 MiB, allocs estimate: 19624765.

julia> trixi_include("examples/hyperbolic_serre_green_naghdi_1d/hyperbolic_serre_green_naghdi_conservation.jl", io = devnull, N = 5_000);
[ Info: You just called `trixi_include`. Julia may now compile the code, please be patient.

julia> @benchmark solve(ode, RDPK3SpFSAL35(); save_everystep = false, callback = callbacks)
BenchmarkTools.Trial: 2 samples with 1 evaluation.
 Range (min … max):  4.702 s …  4.712 s  ┊ GC (min … max): 0.75% … 0.81%
 Time  (median):     4.707 s             ┊ GC (median):    0.78%
 Time  (mean ± σ):   4.707 s ± 7.052 ms  ┊ GC (mean ± σ):  0.78% ± 0.04%

  █                                                      █
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  4.7 s         Histogram: frequency by time        4.71 s <

 Memory estimate: 1023.24 MiB, allocs estimate: 30531275.

With FastBroadcast.jl:

julia> trixi_include("examples/hyperbolic_serre_green_naghdi_1d/hyperbolic_serre_green_naghdi_conservation.jl", io = devnull, N = 1_000);
[ Info: You just called `trixi_include`. Julia may now compile the code, please be patient.

julia> @benchmark solve(ode, RDPK3SpFSAL35(); save_everystep = false, callback = callbacks)
BenchmarkTools.Trial: 30 samples with 1 evaluation.
 Range (min … max):  162.611 ms … 310.515 ms  ┊ GC (min … max): 1.16% … 46.98%
 Time  (median):     165.715 ms               ┊ GC (median):    1.18%
 Time  (mean ± σ):   170.553 ms ±  26.495 ms  ┊ GC (mean ± σ):  3.71% ±  8.43%

  ██▃
  ███▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃ ▁
  163 ms           Histogram: frequency by time          311 ms <

 Memory estimate: 54.14 MiB, allocs estimate: 1566822.

julia> trixi_include("examples/hyperbolic_serre_green_naghdi_1d/hyperbolic_serre_green_naghdi_conservation.jl", io = devnull, N = 2_000);
[ Info: You just called `trixi_include`. Julia may now compile the code, please be patient.

julia> @benchmark solve(ode, RDPK3SpFSAL35(); save_everystep = false, callback = callbacks)
BenchmarkTools.Trial: 10 samples with 1 evaluation.
 Range (min … max):  505.914 ms … 529.077 ms  ┊ GC (min … max): 1.19% … 1.18%
 Time  (median):     511.600 ms               ┊ GC (median):    1.18%
 Time  (mean ± σ):   514.138 ms ±   7.899 ms  ┊ GC (mean ± σ):  1.12% ± 0.14%

  █  █  █     █ ██     █     █                          █     █
  █▁▁█▁▁█▁▁▁▁▁█▁██▁▁▁▁▁█▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁█ ▁
  506 ms           Histogram: frequency by time          529 ms <

 Memory estimate: 170.31 MiB, allocs estimate: 5018991.

julia> trixi_include("examples/hyperbolic_serre_green_naghdi_1d/hyperbolic_serre_green_naghdi_conservation.jl", io = devnull, N = 3_000);
[ Info: You just called `trixi_include`. Julia may now compile the code, please be patient.

julia> @benchmark solve(ode, RDPK3SpFSAL35(); save_everystep = false, callback = callbacks)
BenchmarkTools.Trial: 5 samples with 1 evaluation.
 Range (min … max):  1.124 s …  1.145 s  ┊ GC (min … max): 1.08% … 1.02%
 Time  (median):     1.128 s             ┊ GC (median):    1.07%
 Time  (mean ± σ):   1.131 s ± 8.178 ms  ┊ GC (mean ± σ):  1.10% ± 0.09%

  █      █   █   █                                       █
  █▁▁▁▁▁▁█▁▁▁█▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  1.12 s        Histogram: frequency by time        1.14 s <

 Memory estimate: 374.37 MiB, allocs estimate: 11132609.

julia> trixi_include("examples/hyperbolic_serre_green_naghdi_1d/hyperbolic_serre_green_naghdi_conservation.jl", io = devnull, N = 4_000);
[ Info: You just called `trixi_include`. Julia may now compile the code, please be patient.

julia> @benchmark solve(ode, RDPK3SpFSAL35(); save_everystep = false, callback = callbacks)
BenchmarkTools.Trial: 3 samples with 1 evaluation.
 Range (min … max):  1.978 s …  1.990 s  ┊ GC (min … max): 1.01% … 1.08%
 Time  (median):     1.988 s             ┊ GC (median):    1.08%
 Time  (mean ± σ):   1.985 s ± 6.361 ms  ┊ GC (mean ± σ):  1.06% ± 0.05%

  █                                          █           █
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁█ ▁
  1.98 s        Histogram: frequency by time        1.99 s <

 Memory estimate: 655.46 MiB, allocs estimate: 19563351.

julia> trixi_include("examples/hyperbolic_serre_green_naghdi_1d/hyperbolic_serre_green_naghdi_conservation.jl", io = devnull, N = 5_000);
[ Info: You just called `trixi_include`. Julia may now compile the code, please be patient.

julia> @benchmark solve(ode, RDPK3SpFSAL35(); save_everystep = false, callback = callbacks)
BenchmarkTools.Trial: 2 samples with 1 evaluation.
 Range (min … max):  3.157 s …   3.180 s  ┊ GC (min … max): 1.05% … 1.06%
 Time  (median):     3.168 s              ┊ GC (median):    1.06%
 Time  (mean ± σ):   3.168 s ± 15.603 ms  ┊ GC (mean ± σ):  1.06% ± 0.01%

  █                                                       █
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  3.16 s         Histogram: frequency by time        3.18 s <

 Memory estimate: 1018.79 MiB, allocs estimate: 30454484.
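
As a rough summary of the two runs above, the speedup can be read off the reported medians. The snippet below is only an illustrative calculation: the numbers are copied from the benchmark output and converted to milliseconds, and the variable names are made up for this sketch.

```julia
# Median run times copied from the benchmark output above, in milliseconds.
N = [1_000, 2_000, 3_000, 4_000, 5_000]
median_before = [210.281, 649.244, 1435.0, 2517.0, 4707.0]
median_fast = [165.715, 511.600, 1128.0, 1988.0, 3168.0]

# Speedup factor: how many times faster the run is with FastBroadcast.jl.
speedup = median_before ./ median_fast

for (n, s) in zip(N, speedup)
    println("N = ", n, ": speedup ≈ ", round(s; digits = 2))
end
```

With these medians, the factor is roughly 1.27 for N up to 4_000 and roughly 1.49 for N = 5_000. The larger runs have only two to five samples each, so these ratios are indicative rather than precise.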

To get values that are closer to our paper, we have to omit the analysis callback (which allocates quite a lot):

julia> trixi_include("examples/hyperbolic_serre_green_naghdi_1d/hyperbolic_serre_green_naghdi_conservation.jl", io = devnull, N = 1_000);
[ Info: You just called `trixi_include`. Julia may now compile the code, please be patient.

julia> @benchmark solve(ode, RDPK3SpFSAL35(); save_everystep = false)
BenchmarkTools.Trial: 63 samples with 1 evaluation.
 Range (min … max):  76.745 ms … 83.645 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     80.823 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   80.531 ms ±  1.448 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

                                 █      ▂ ▂▅▂ ▂    ▂
  ▅▁▁▅▁▁▁▁▁▁█▅▅▁▅▁▁▅▁█▁▅▅▁▅▁▁█▅███▅█▁▅█▅█████▁██▅▅██▁▁▁▁▁▁▅▅▅ ▁
  76.7 ms         Histogram: frequency by time        83.1 ms <

 Memory estimate: 823.79 KiB, allocs estimate: 16682.

julia> trixi_include("examples/hyperbolic_serre_green_naghdi_1d/hyperbolic_serre_green_naghdi_conservation.jl", io = devnull, N = 2_000);
[ Info: You just called `trixi_include`. Julia may now compile the code, please be patient.

julia> @benchmark solve(ode, RDPK3SpFSAL35(); save_everystep = false)
BenchmarkTools.Trial: 21 samples with 1 evaluation.
 Range (min … max):  240.631 ms … 247.424 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     243.653 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   243.883 ms ±   1.989 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▁    ▁ ▁  ▁▁     ▁ ▁▁▁    ▁█  ▁    ▁   ▁      ▁█  ▁        ▁▁
  █▁▁▁▁█▁█▁▁██▁▁▁▁▁█▁███▁▁▁▁██▁▁█▁▁▁▁█▁▁▁█▁▁▁▁▁▁██▁▁█▁▁▁▁▁▁▁▁██ ▁
  241 ms           Histogram: frequency by time          247 ms <

 Memory estimate: 1.47 MiB, allocs estimate: 27287.

julia> trixi_include("examples/hyperbolic_serre_green_naghdi_1d/hyperbolic_serre_green_naghdi_conservation.jl", io = devnull, N = 3_000);
[ Info: You just called `trixi_include`. Julia may now compile the code, please be patient.

julia> @benchmark solve(ode, RDPK3SpFSAL35(); save_everystep = false)
BenchmarkTools.Trial: 10 samples with 1 evaluation.
 Range (min … max):  542.508 ms … 555.255 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     547.203 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   547.475 ms ±   3.929 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █  █    █   █     █       █      █ █ █                      █
  █▁▁█▁▁▁▁█▁▁▁█▁▁▁▁▁█▁▁▁▁▁▁▁█▁▁▁▁▁▁█▁█▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  543 ms           Histogram: frequency by time          555 ms <

 Memory estimate: 2.19 MiB, allocs estimate: 40682.

julia> trixi_include("examples/hyperbolic_serre_green_naghdi_1d/hyperbolic_serre_green_naghdi_conservation.jl", io = devnull, N = 4_000);
[ Info: You just called `trixi_include`. Julia may now compile the code, please be patient.

julia> @benchmark solve(ode, RDPK3SpFSAL35(); save_everystep = false)
BenchmarkTools.Trial: 6 samples with 1 evaluation.
 Range (min … max):  951.352 ms … 964.108 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     955.896 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   957.041 ms ±   5.164 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █     █        █            █                      █        █
  █▁▁▁▁▁█▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁█ ▁
  951 ms           Histogram: frequency by time          964 ms <

 Memory estimate: 2.88 MiB, allocs estimate: 52925.

julia> trixi_include("examples/hyperbolic_serre_green_naghdi_1d/hyperbolic_serre_green_naghdi_conservation.jl", io = devnull, N = 5_000);
[ Info: You just called `trixi_include`. Julia may now compile the code, please be patient.

julia> @benchmark solve(ode, RDPK3SpFSAL35(); save_everystep = false)
BenchmarkTools.Trial: 4 samples with 1 evaluation.
 Range (min … max):  1.585 s …   1.625 s  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.591 s              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.598 s ± 18.259 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █ █          █                                          █
  █▁█▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  1.59 s         Histogram: frequency by time        1.62 s <

 Memory estimate: 3.60 MiB, allocs estimate: 66113.

However, this is still significantly slower (between 5% and 25%).

@ranocha ranocha requested a review from JoshuaLampert August 18, 2024 17:46
Collaborator Author

ranocha commented Aug 18, 2024

Simulations could be much faster if SciML/RecursiveArrayTools.jl#400 was resolved...

Collaborator Author

ranocha commented Aug 18, 2024

@JoshuaLampert Do you know why you set the lower compat of RecursiveArrayTools.jl to 0.3.3 in #118? This is involved in compatibility issues on Julia 1.9...


codecov bot commented Aug 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.76%. Comparing base (36c8f28) to head (cf9f159).
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #143   +/-   ##
=======================================
  Coverage   97.75%   97.76%           
=======================================
  Files          22       23    +1     
  Lines        1741     1747    +6     
=======================================
+ Hits         1702     1708    +6     
  Misses         39       39           


Owner

JoshuaLampert commented Aug 18, 2024

Thanks a lot! The performance improvement already looks quite nice.

> To get values that are closer to our paper, we have to omit the analysis callback (which allocates quite a lot):

Can you assess whether the AnalysisCallback needs to allocate this much, or do you think we can optimize it (not necessarily in this PR)?

> However, this is still significantly slower (between 5% and 25%).

Do you have an idea where we lose the remaining 5 to 25%?

Do you recommend using FastBroadcast.jl for the other equations, too? The performance gain would probably not be too high, because solving the elliptic equations is the most expensive part, but I'll take what I can get. If you think it could help, I could do that in a future PR.

> @JoshuaLampert Do you know why you set the lower compat of RecursiveArrayTools.jl to 0.3.3 in #118? This is involved in compatibility issues on Julia 1.9...

I don't remember in detail, but judging from the commit history in that PR and the respective CI runs, it looks like it was because RecursiveArrayTools.jl > v3.3 doesn't support Julia v1.9. This comes from SciML/RecursiveArrayTools.jl#324, which was included in v3.4 of RecursiveArrayTools.jl.

Owner

JoshuaLampert commented Aug 18, 2024

> I don't remember in detail, but judging from the commit history in that PR and the respective CI runs, it looks like it was because RecursiveArrayTools.jl > v3.3 doesn't support Julia v1.9. This comes from SciML/RecursiveArrayTools.jl#324, which was included in v3.4 of RecursiveArrayTools.jl.

This, of course, only explains why I didn't choose it to be higher. I don't remember why I didn't choose it to be lower.

Collaborator Author

ranocha commented Aug 19, 2024

> To get values that are closer to our paper, we have to omit the analysis callback (which allocates quite a lot):
>
> Can you appraise whether the AnalysisCallback needs to allocate so much or do you think we can optimize it (not necessarily in this PR)?

At least the modified energy/entropy computation allocates a full vector over the spatial grid every time it is called.
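
To illustrate the kind of allocation meant here, a minimal sketch follows. The function names and the energy-like expression are hypothetical stand-ins, not the package's actual implementation; the point is only the difference between creating a temporary vector on each call and writing into a preallocated buffer.

```julia
# Hypothetical stand-in for an energy-like functional; not the package's code.
# Allocating version: the broadcast creates a temporary vector on every call.
function energy_allocating(h, v)
    e = h .* v .^ 2 ./ 2
    return sum(e)
end

# Cached version: the caller passes a preallocated buffer that is overwritten.
function energy_cached!(tmp, h, v)
    @. tmp = h * v^2 / 2
    return sum(tmp)
end

h = fill(2.0, 1_000)
v = fill(3.0, 1_000)
tmp = similar(h)

# Call both once so that compilation is not counted below.
energy_allocating(h, v)
energy_cached!(tmp, h, v)

# Bytes allocated by a single call of each variant.
bytes_allocating = @allocated energy_allocating(h, v)
bytes_cached = @allocated energy_cached!(tmp, h, v)
println("allocating: ", bytes_allocating, " bytes, cached: ", bytes_cached, " bytes")
```

If the analysis callback runs often, a buffer like `tmp` would have to live in some cache object that the callback can access; where exactly that would go is not discussed in this thread.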

Collaborator Author

ranocha commented Aug 19, 2024

> Do you have an idea where we lose the remaining 5 to 25%?

We compute the derivative of the bathymetry b_x from scratch every time. Moreover, we have some additional overhead from storing eta instead of h, D instead of b, etc. Maybe we could get rid of some of the overhead, but we would have to analyze the split form in terms of the variables we use in this repo.
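
One way to avoid recomputing b_x could look like the sketch below. It assumes a time-independent bathymetry and uses a dense periodic central-difference matrix as a stand-in for the derivative operator; `periodic_central_difference`, `cache`, and `rhs_term!` are hypothetical names for this illustration and do not reflect the repository's actual operators or cache layout.

```julia
# Hypothetical sketch: a dense periodic central-difference matrix stands in
# for the derivative operator used in the actual semidiscretization.
function periodic_central_difference(N, dx)
    D1 = zeros(N, N)
    for i in 1:N
        D1[i, mod1(i + 1, N)] = 1 / (2 * dx)
        D1[i, mod1(i - 1, N)] = -1 / (2 * dx)
    end
    return D1
end

N = 64
x = range(0, 2pi; length = N + 1)[1:N]  # periodic grid without the duplicate endpoint
dx = step(x)
D1 = periodic_central_difference(N, dx)
b = sin.(x)  # example bathymetry, assumed not to change in time

# Compute b_x once up front and keep it in a cache ...
cache = (; b_x = D1 * b)

# ... and reuse it in every right-hand side evaluation instead of
# evaluating D1 * b again.
function rhs_term!(du, u, cache)
    b_x = cache.b_x
    @. du = -u * b_x
    return du
end
```

Whether this pays off depends on how the variables eta and D are stored, as noted above, so this is only a starting point for such an analysis.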

Collaborator Author

ranocha commented Aug 19, 2024

> Do you recommend using FastBroadcast.jl for the other equations, too? Probably the performance gain would not be too high because solving the elliptic equations is the most expensive, but I take what I can get. If you think it could help, I could do that in a future PR.

As you said, I expect the elliptic solve(s) to dominate significantly for the other equations. Maybe it could help a bit, but I would not expect significant performance gains like here.
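
For reference, switching a broadcast to FastBroadcast.jl is mostly a matter of replacing `@.` with the `@..` macro that the package exports. The sketch below uses made-up variable names and a made-up update expression, not code from this PR, and it assumes all arrays have the same size.

```julia
using FastBroadcast  # provides the `@..` macro

u = rand(1_000)
du = similar(u)
du_ref = similar(u)
a = 0.5

# Base broadcasting with the standard `@.` macro.
@. du_ref = a * u + u^2

# The same update written with FastBroadcast.jl's `@..` macro.
@.. du = a * u + u^2

println("results agree: ", du ≈ du_ref)
```

Both versions should give the same result for same-sized arrays. Any gain comes from reduced broadcasting overhead, so it matters most where broadcasts take a large share of the run time, which is not expected for the equations dominated by the elliptic solve.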

Co-authored-by: Joshua Lampert <[email protected]>
Owner

@JoshuaLampert JoshuaLampert left a comment

Thanks!

@JoshuaLampert JoshuaLampert merged commit 99583ab into main Aug 19, 2024
17 checks passed
@JoshuaLampert JoshuaLampert deleted the hr/fast_broadcast branch August 19, 2024 08:15
Development

Successfully merging this pull request may close these issues.

Consider using FastBroadcast.jl
2 participants