Reductions with FieldTimeSeries on an ImmersedBoundaryGrid are very slow #3750

Closed
ali-ramadhan opened this issue Aug 30, 2024 · 6 comments · Fixed by #3801 · May be fixed by #3794
Labels
output 💾 performance 🏍️ So we can get the wrong answer even faster

Comments

@ali-ramadhan
Member

ali-ramadhan commented Aug 30, 2024

This also makes data_summary and Base.show very slow, since showing a FieldTimeSeries prints its min, mean, and max. So it's harder to work with FieldTimeSeries interactively. It seems fine when the FieldTimeSeries is not on an ImmersedBoundaryGrid.

I'm guessing it's slower because it's masking out the immersed values, but I don't know if we should expect it to be ~2000x slower than without an immersed boundary. It's those memory allocations...

A quick quality-of-life fix could be to not call data_summary when showing a FieldTimeSeries.
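For instance, a user-side sketch of that fix could look like the following. This just overrides the display in a local session; it is not the library's actual show method, and summary here only produces the lightweight one-line description:

using Oceananigans

# Workaround sketch: display only the one-line summary, skipping the expensive
# min/mean/max reduction over the data.
function Base.show(io::IO, ::MIME"text/plain", fts::FieldTimeSeries)
    print(io, summary(fts))
end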

MWE

using Oceananigans

arch = CPU()

L = 1
H = 1

underlying_grid = LatitudeLongitudeGrid(
    arch;
    topology = (Bounded, Bounded, Bounded),
    size = (512, 512, 64),
    latitude = (-L/2, L/2),
    longitude = (-L/2, L/2),
    z = (-H, 0),
    halo = (4, 4, 4)
)

h = L/2
w = L/5 
mount(x, y) = h * exp(-x^2 / 2w^2) * exp(-y^2 / 2w^2)
bottom(x, y) = -H + mount(x, y)

grid = ImmersedBoundaryGrid(underlying_grid, GridFittedBottom(bottom))

model = HydrostaticFreeSurfaceModel(; grid)

simulation = Simulation(model, Δt=1, stop_iteration=1)

simulation.output_writers[:fields] =
    JLD2OutputWriter(
        model,
        model.velocities;
        filename = "test.jld2",
        schedule = IterationInterval(1),
        overwrite_existing = true
    )

run!(simulation)

u = FieldTimeSeries("test.jld2", "u")
u2 = u[2]

Reduction over the FieldTimeSeries:

julia> @time minimum(u2)
 20.954897 seconds (118.72 M allocations: 130.792 GiB, 25.74% gc time)
0.0

Reduction over the underlying data:

julia> @time minimum(u2.data)
  0.011304 seconds (3 allocations: 1.562 KiB)
0.0

That's almost 2000x faster.
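As an interim workaround (my addition, not from the original report), one can also reduce over the interior view, which skips the halos without going through the immersed-field machinery, assuming interior behaves as usual on this windowed field and accepting that immersed points are not masked out:

julia> minimum(interior(u2))  # plain-array reduction over the interior; immersed points are not masked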

Environment

Oceananigans.jl main branch with

Julia Version 1.10.5
Commit 6f3fdf7b362 (2024-08-27 14:19 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 48 × AMD Ryzen Threadripper 7960X 24-Cores
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 48 virtual cores)
@ali-ramadhan
Member Author

ali-ramadhan commented Aug 30, 2024

I thought this might be an issue with Fields themselves, but no:

using Oceananigans

arch = CPU()

L = 1
H = 1

underlying_grid = LatitudeLongitudeGrid(
    arch;
    topology = (Bounded, Bounded, Bounded),
    size = (512, 512, 64),
    latitude = (-L/2, L/2),
    longitude = (-L/2, L/2),
    z = (-H, 0),
    halo = (4, 4, 4)
)

h = L/2
w = L/5 
mount(x, y) = h * exp(-x^2 / 2w^2) * exp(-y^2 / 2w^2)
bottom(x, y) = -H + mount(x, y)

grid = ImmersedBoundaryGrid(underlying_grid, GridFittedBottom(bottom))

model = HydrostaticFreeSurfaceModel(; grid)

u = model.velocities.u

then

julia> @time minimum(u)
  0.063563 seconds (344 allocations: 31.789 KiB)
0.0

julia> @time minimum(u.data)
  0.013262 seconds (3 allocations: 1.391 KiB)
0.0

Only ~6x slower.

So maybe the FieldTimeSeries is creating the field from u[2] differently?

I haven't looked into the code much but wanted to at least open this issue.
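One quick way to check that hypothesis (a diagnostic sketch I'm adding here, assuming u2 from the first script and u = model.velocities.u from this one are both in scope) is to compare how the two fields are windowed and stored:

julia> u2.indices, typeof(parent(u2))  # restricted indices here would mean u2 is a windowed field
julia> u.indices, typeof(parent(u))    # the model's field spans the full grid, including halos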

@ali-ramadhan ali-ramadhan added performance 🏍️ So we can get the wrong answer even faster output 💾 labels Aug 30, 2024
@simone-silvestri
Collaborator

simone-silvestri commented Aug 30, 2024

Reductions on a FieldTimeSeries are performed individually for each element, by constructing two Fields and reducing one into the other. Probably the construction of the individual field is what causes the loss of performance?
We don't necessarily need to do that; we could just wrap the data in a ConditionalOperation.

@glwagner
Member

Just a thought: we probably want reductions on FieldTimeSeries to be performant anyway. So it's better that we do call data_summary, because then more people will be annoyed that it's slow => more pressure to fix it 😆

@glwagner
Member

PS there are more minimal ways to create a FieldTimeSeries by invoking the constructor directly.
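For example, a minimal in-memory construction might look like this (a sketch assuming the FieldTimeSeries{LX, LY, LZ}(grid, times) constructor; the small grid and the filling below are made up for illustration):

using Oceananigans

grid = RectilinearGrid(size=(8, 8, 8), extent=(1, 1, 1))
times = [0.0, 1.0]
fts = FieldTimeSeries{Center, Center, Center}(grid, times)  # in-memory, no simulation or output file needed

c = CenterField(grid)
set!(c, (x, y, z) -> rand())
set!(fts[1], c)  # fill the first time slice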

@glwagner
Member

glwagner commented Sep 26, 2024

> I'm guessing it's slower because it's masking out the immersed values but I don't know if we expect it to be ~2000x slower than without an immersed boundary. It's those memory allocations...

But why are there memory allocations when calling minimum(u2)? This is a type inference problem I think.

The problem is actually with computing reductions of windowed immersed fields. u2 is windowed because halos were excluded from the output. If we write the output with with_halos = true and re-run everything, there is no issue:

julia> @time minimum(u2)
  0.031018 seconds (366 allocations: 33.102 KiB, 6.05% compilation time)
0.0
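For reference, this is where the flag goes in the MWE above (assuming with_halos is accepted as a JLD2OutputWriter keyword that saves the halo regions along with the interior):

simulation.output_writers[:fields] =
    JLD2OutputWriter(
        model,
        model.velocities;
        filename = "test.jld2",
        schedule = IterationInterval(1),
        overwrite_existing = true,
        with_halos = true
    )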

@glwagner
Member

Here's an illustration:

using Oceananigans

Nx, Ny, Nz = 100, 100, 100
latitude = longitude = z = (0, 1)
underlying_grid = LatitudeLongitudeGrid(size=(Nx, Ny, Nz); latitude, longitude, z)
grid = ImmersedBoundaryGrid(underlying_grid, GridFittedBottom((λ, φ) -> 0.5))

ci = CenterField(grid)                # field on the immersed grid
ciw = view(ci, 1:Nx, 1:Ny, 1:Nz)      # windowed (view of the) immersed field

cu = CenterField(underlying_grid)     # field on the underlying, non-immersed grid
cuw = view(cu, 1:Nx, 1:Ny, 1:Nz)      # windowed non-immersed field

for n = 1:10
    @time minimum(ci)
    @time minimum(ciw)
    @time minimum(cu)
    @time minimum(cuw)
end

Note there is such a thing as "stubborn compilation", so we sometimes have to invoke functions a few times before the timings settle...

Now I get:

julia> @time minimum(ci)
  0.000888 seconds (331 allocations: 33.148 KiB)
0.0

julia> @time minimum(ciw)
  1.611260 seconds (7.27 M allocations: 7.968 GiB, 37.23% gc time)
0.0

julia> @time minimum(cu)
  0.001069 seconds (387 allocations: 21.586 KiB)
0.0

julia> @time minimum(cuw)
  0.001060 seconds (686 allocations: 33.258 KiB)
0.0
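To probe the type-inference angle mentioned above (a diagnostic suggestion on my part, not from the original comment), one could inspect the slow case directly:

julia> @code_warntype minimum(ciw)  # look for Any / unstable types along the windowed immersed reduction path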
