Duplicate memory allocations #98
Just FYI, this is the memory allocation that happens within Julia:

```julia
julia> using BitInformation

julia> A = rand(Float32,100,200);

julia> @time bitinformation(A);
  0.001958 seconds (281 allocations: 13.328 KiB)

julia> A = rand(Float32,1000,2000);

julia> @time bitinformation(A);
  0.118632 seconds (281 allocations: 13.328 KiB)
```

As you can see, it's independent of the array size and should only contain the counter arrays (but I haven't checked, as it's so small anyway). I don't know how you want to check memory allocation on the Python side, but in case it includes the allocations within Julia, this is the lower bound.
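One way to check allocations on the Python side could be the standard-library `tracemalloc` module (a hedged sketch: NumPy reports its buffer allocations to `tracemalloc`, and `copies_input` below is only a stand-in for a call like `xb.get_bitinformation(ds)`, not the actual wrapper; allocations made inside the embedded Julia runtime would not show up here):

```python
import tracemalloc

import numpy as np


def measure_peak(func, *args):
    """Run func(*args) and return (result, peak traced bytes)."""
    tracemalloc.start()
    result = func(*args)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, peak


def copies_input(a):
    # Stand-in for a function that duplicates its input once.
    return a.copy()


a = np.zeros((1000, 2000), dtype=np.float32)  # 8 MB, allocated before tracing starts
_, peak = measure_peak(copies_input, a)
# A full duplicate of the input should appear in the traced peak.
print(peak >= a.nbytes)
```

If the wrapper really holds on to a copy, calling `measure_peak` around repeated invocations would also show whether the traced memory grows instead of being freed.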
Thanks for these numbers! @milankl, can the calculation of the information content be separated into chunks? I thought that an array could easily be chunked into 1D-

```python
import xbitinfo as xb
import numpy as np
import xarray as xr

xr.set_options(display_style="text")
ds = xr.tutorial.load_dataset("eraint_uvz")
ds_selection = ds[['z']].isel(latitude=10, month=1)  # selection to have only 2D data
print(ds_selection)
"""
<xarray.Dataset>
Dimensions:    (level: 3, longitude: 480)
Coordinates:
  * longitude  (longitude) float32 -180.0 -179.2 -178.5 ... 177.8 178.5 179.2
  * level      (level) int32 200 500 850
Data variables:
    z          (level, longitude) float32 1.151e+05 1.151e+05 ... 1.359e+04
Attributes:
    Conventions:  CF-1.0
    Info:         Monthly ERA-Interim data. Downloaded and edited by fabien.m...
"""

# Apply BitInformation.jl on each `level` separately
bitinfo_chunks = {}
for level in range(3):
    bitinfo_chunks[level] = xb.get_bitinformation(ds_selection.isel(level=level), dim='longitude').z.values

# Combine the information content across `level`s by a simple mean
bitinfo_chunks_combined = np.array([np.hstack(v) for v in bitinfo_chunks.values()]).mean(axis=0)
print(bitinfo_chunks_combined)
"""
[0.         0.         0.         0.         0.         0.
 0.         0.         0.         0.         0.         0.
 0.         0.24053964 0.27775998 0.2840053  0.83751976 0.77918835
 0.71564043 0.55766442 0.37690905 0.33012151 0.08764519 0.15900722
 0.30933046 0.10688834 0.21276485 0.30603788 0.06007946 0.1924336
 0.0458691  0.1069354 ]
"""

# Apply BitInformation.jl on all `level`s simultaneously
bitinfo_all = xb.get_bitinformation(ds_selection, dim='longitude').z.values
print(bitinfo_all)
"""
[0.         0.         0.         0.         0.91829583 0.91829583
 0.91829583 0.91829583 0.91829583 0.         0.91829583 0.91829583
 0.         0.8119377  0.46199519 0.94738475 0.89059076 0.79981288
 0.70050295 0.51163389 0.2799572  0.33485714 0.08154634 0.05379317
 0.28785182 0.07245253 0.07962509 0.30929713 0.06492267 0.11333437
 0.18299123 0.14846125]
"""
```

but the outputs are obviously not identical. Is this a bug in our wrapper, do I need to combine the fields differently, or is this expected? Understanding how the analysis of a dataset can be split into chunks would be a good start for fixing this issue.
I agree; as the Python side will be xarray-aware, the chunking should happen there. You then call, with every chunk, the
Yes, you can't average information calculated from chunks. Think about a bitstream like 00001111, for which the entropy is 1 bit. If you cut it into two chunks (0000, 1111), however, the entropy of both is 0. But the biggest work is done in the bitcounting, which can be called with the counter array.
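The 00001111 argument can be made concrete with a small sketch (hedged: this uses plain Shannon entropy of a 0/1 string, not BitInformation.jl's bitwise mutual-information measure, and the helper names are made up). Averaging per-chunk entropies loses information, whereas accumulating the per-chunk *counters* and computing the entropy once from the combined counts reproduces the full-stream result:

```python
from collections import Counter
from math import log2


def entropy_from_counts(counts):
    """Shannon entropy (bits/symbol) from a Counter of symbol frequencies."""
    n = sum(counts.values())
    return -sum(c / n * log2(c / n) for c in counts.values()) + 0.0  # +0.0 avoids -0.0


def entropy(bits):
    return entropy_from_counts(Counter(bits))


stream = "00001111"
chunks = ["0000", "1111"]

print(entropy(stream))                                # 1.0 bit for the full stream
print(sum(entropy(c) for c in chunks) / len(chunks))  # 0.0 — averaging chunk entropies is wrong

# Correct chunked computation: accumulate counters across chunks,
# then compute the entropy once from the combined counts.
total = Counter()
for c in chunks:
    total += Counter(c)
print(entropy_from_counts(total))                     # 1.0 — matches the full stream
```

This is why the chunked xarray approach above should pass accumulated counter arrays into the final information calculation rather than averaging per-chunk results.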
Btw, this here from https://github.com/JuliaPy/PyCall.jl#readme tells me that it should be possible to call Julia code from Python without a copy by using
Currently, the input data to `xbitinfo.get_bitinformation` is duplicated, because the assignment in xbitinfo/xbitinfo/xbitinfo.py, lines 185 to 186 in c91a21f,

```python
Main.X = X
```

is a deep copy operation. This is a general problem for large datasets, because a single copy of the dataset can already be too much to load into memory. Further, I observed that the memory is not freed when calling `xbitinfo.get_bitinformation` again.

This results in several issues/tasks: