reproject on larger dataset #189

Closed
joemcglinchy opened this issue Nov 16, 2020 · 4 comments
Labels
question Further information is requested

@joemcglinchy
joemcglinchy commented Nov 16, 2020

Hello, I'm trying to use the reprojection capability on a global raster dataset. I can read the data without a problem; its shape is (band: 1, y: 43200, x: 86400). The data is public, so the code below should load it for anyone: https://earthlab-jmcglinchy.s3-us-west-2.amazonaws.com/for_public/lc_mosaic_2003.tif

It seems that the attempt to allocate a numpy array with the same dimensions is what causes the memory error.

Code Sample, a copy-pastable example if possible

A "Minimal, Complete and Verifiable Example" will make it much easier for maintainers to help you:
http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports

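import rioxarray as rx

# Opening is lazy: no pixel data is read yet
lulc = rx.open_rasterio('/vsis3/earthlab-jmcglinchy/for_public/lc_mosaic_2003.tif')
# reproject reads the whole array into memory, which triggers the MemoryError below
lulc.rio.reproject("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs")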

Problem description

I get the following traceback:

C:\software\Anaconda3\envs\x-python\lib\site-packages\rioxarray\rioxarray.py:270: UserWarning: The nodata value (-3.4e+38) has been automatically changed to (-3.3999999521443642e+38) to match the dtype of the data.
  f"The nodata value ({original_nodata}) has been automatically "
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\software\Anaconda3\envs\x-python\lib\site-packages\rioxarray\rioxarray.py", line 1194, in reproject
    source=self._obj.values,
  File "C:\software\Anaconda3\envs\x-python\lib\site-packages\xarray\core\dataarray.py", line 569, in values
    return self.variable.values
  File "C:\software\Anaconda3\envs\x-python\lib\site-packages\xarray\core\variable.py", line 510, in values
    return _as_array_or_item(self._data)
  File "C:\software\Anaconda3\envs\x-python\lib\site-packages\xarray\core\variable.py", line 272, in _as_array_or_item
    data = np.asarray(data)
  File "C:\software\Anaconda3\envs\x-python\lib\site-packages\numpy\core\_asarray.py", line 83, in asarray
    return array(a, dtype, copy=False, order=order)
  File "C:\software\Anaconda3\envs\x-python\lib\site-packages\xarray\core\indexing.py", line 685, in __array__
    self._ensure_cached()
  File "C:\software\Anaconda3\envs\x-python\lib\site-packages\xarray\core\indexing.py", line 682, in _ensure_cached
    self.array = NumpyIndexingAdapter(np.asarray(self.array))
  File "C:\software\Anaconda3\envs\x-python\lib\site-packages\numpy\core\_asarray.py", line 83, in asarray
    return array(a, dtype, copy=False, order=order)
  File "C:\software\Anaconda3\envs\x-python\lib\site-packages\xarray\core\indexing.py", line 655, in __array__
    return np.asarray(self.array, dtype=dtype)
  File "C:\software\Anaconda3\envs\x-python\lib\site-packages\numpy\core\_asarray.py", line 83, in asarray
    return array(a, dtype, copy=False, order=order)
  File "C:\software\Anaconda3\envs\x-python\lib\site-packages\xarray\core\indexing.py", line 560, in __array__
    return np.asarray(array[self.key], dtype=None)
  File "C:\software\Anaconda3\envs\x-python\lib\site-packages\rioxarray\_io.py", line 177, in __getitem__
    key, self.shape, indexing.IndexingSupport.OUTER, self._getitem
  File "C:\software\Anaconda3\envs\x-python\lib\site-packages\xarray\core\indexing.py", line 845, in explicit_indexing_adapter
    result = raw_indexing_method(raw_key.tuple)
  File "C:\software\Anaconda3\envs\x-python\lib\site-packages\rioxarray\_io.py", line 160, in _getitem
    out = riods.read(band_key, window=window, masked=self.masked)
  File "rasterio\_io.pyx", line 337, in rasterio._io.DatasetReaderBase.read
MemoryError: Unable to allocate 13.9 GiB for an array with shape (1, 43200, 86400) and data type float32
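
The size in the error message can be checked by hand: the traceback shows `reproject` accessing `self._obj.values`, which materializes the full array. A minimal sketch of the arithmetic, assuming 4 bytes per float32 element (the helper name `array_size_gib` is made up for illustration):

```python
import math


def array_size_gib(shape, itemsize):
    """Return the in-memory size of a dense array in GiB."""
    n_bytes = math.prod(shape) * itemsize
    return n_bytes / 2**30


# Shape and dtype taken from the MemoryError message; float32 uses 4 bytes per element.
size = array_size_gib((1, 43200, 86400), itemsize=4)
print(f"{size:.1f} GiB")  # prints "13.9 GiB"
```

So the full-resolution array alone needs about 13.9 GiB before any reprojected output is allocated.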

Expected Output

The xarray dataset reprojected from EPSG:9001 to EPSG:4326.

Environment Information

rioxarray (0.1.1) deps:
rasterio: 1.1.7
xarray: 0.16.1
GDAL: 3.1.4

Other python deps:
scipy: 1.5.2
pyproj: 2.6.1.post1

System:
python: 3.7.8 | packaged by conda-forge | (default, Jul 31 2020, 01:53:57) [MSC v.1916 64 bit (AMD64)]
executable: C:\software\Anaconda3\envs\x-python\python.exe
machine: Windows-10-10.0.18362-SP0

Installation method

conda install -c conda-forge rioxarray

Conda environment information (if you installed with conda):


Environment (conda list):
gdal                      3.1.4            py37hebdd5d2_0    conda-forge
libgdal                   3.1.4                h0e5aa5a_0    conda-forge
rasterio                  1.1.7            py37hce843d0_1    conda-forge
rioxarray                 0.1.1              pyhd8ed1ab_0    conda-forge
sklearn-xarray            0.4.0                    pypi_0    pypi
xarray                    0.16.1                     py_0    conda-forge



Details about conda and system ( conda info ):
active environment : x-python
    active env location : C:\software\Anaconda3\envs\x-python
            shell level : 1
       user config file : C:\Users\joe\.condarc
 populated config files : C:\Users\joe\.condarc
          conda version : 4.8.1
    conda-build version : 3.17.6
         python version : 3.7.1.final.0
       virtual packages : __cuda=11.0
       base environment : C:\software\Anaconda3  (writable)
           channel URLs : https://conda.anaconda.org/rios/win-64
                          https://conda.anaconda.org/rios/noarch
                          https://conda.anaconda.org/conda-forge/win-64
                          https://conda.anaconda.org/conda-forge/noarch
                          https://repo.anaconda.com/pkgs/main/win-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/win-64
                          https://repo.anaconda.com/pkgs/r/noarch
                          https://repo.anaconda.com/pkgs/msys2/win-64
                          https://repo.anaconda.com/pkgs/msys2/noarch
          package cache : C:\software\Anaconda3\pkgs
                          C:\Users\joe\.conda\pkgs
                          C:\Users\joe\AppData\Local\conda\conda\pkgs
       envs directories : C:\software\Anaconda3\envs
                          C:\Users\joe\.conda\envs
                          C:\Users\joe\AppData\Local\conda\conda\envs
               platform : win-64
             user-agent : conda/4.8.1 requests/2.21.0 CPython/3.7.1 Windows/10 Windows/10.0.18362
          administrator : False
             netrc file : None
           offline mode : False

@joemcglinchy joemcglinchy added the bug Something isn't working label Nov 16, 2020
@snowman2 snowman2 added question Further information is requested and removed bug Something isn't working labels Nov 16, 2020
@snowman2
Member

I think using a WarpedVRT when loading the data is your best option. Example: #119 (comment)

@joemcglinchy
Author

Interesting! Thanks @snowman2, I'll give that a shot.

@joemcglinchy
Author

Yes, this seems to have worked. Thanks again!

@snowman2
Member

Glad to hear that it worked 👍
