Add Soil data macrosys #1547

Open
henrykironde opened this issue Feb 26, 2021 · 26 comments

Comments
@henrykironde
Contributor

Soil water content (volumetric %) for 33kPa and 1500kPa suctions predicted at 6 standard depths (0, 10, 30, 60, 100 and 200 cm) at 250 m resolution

source https://zenodo.org/record/2784001#.YDlJ02pKiBR
or
https://developers.google.com/earth-engine/datasets/catalog/OpenLandMap_SOL_SOL_WATERCONTENT-33KPA_USDA-4B1C_M_v01

citation: "Tomislav Hengl, & Surya Gupta. (2019). Soil water content (volumetric %) for 33kPa and 1500kPa suctions predicted at 6 standard depths (0, 10, 30, 60, 100 and 200 cm) at 250 m resolution (Version v0.1) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.2784001"

License (for files):
Creative Commons Attribution Share Alike 4.0 International


@henrykironde changed the title from "Add Soil water content data - macrosys" to "Add Soil data macrosys" on Mar 1, 2021
@MarconiS
Contributor

MarconiS commented Mar 1, 2021

Here is the link to the Zenodo archive for all derived datasets of global soil properties (0.065 km² spatial resolution).

@Aakash3101
Contributor

Have these datasets been added to scripts in retriever-recipes? If not, I would like to work on this issue.

@henrykironde
Contributor Author

@Aakash3101 feel free to work on the issue. I recommend that you start from the bottom of the list and work your way up.

@Aakash3101
Contributor

Sure @henrykironde

@Aakash3101
Contributor

@henrykironde I wanted to clear up a doubt: for the last dataset, "Soil available water capacity in mm derived for 5 standard layers", I can make a single script for all the files in the dataset, right? The dataset has 7 files, so when I run retriever autocreate, can I have all the files in the same directory?

@Aakash3101
Contributor

Also, shall I make separate commits for each dataset or one combined commit?

@henrykironde
Contributor Author

I can make a single script for all the files in the dataset, right? The dataset has 7 files, so when I run retriever autocreate, can I have all the files in the same directory?

Yes, all the files in the same directory. In this case, I think a fitting name for the directory would be Soil_available_water_capacity.

@Aakash3101
Contributor

Aakash3101 commented Mar 24, 2021

@henrykironde I think this PR can be completed during my GSoC project, if I get selected, because these files are indeed very big 😂, and it might take me time to check each one and then make a PR for the datasets added.

@henrykironde
Contributor Author

Each checkbox is a single PR. I am actually working on them, so don't worry about the whole issue. Your goal should be to understand, or get a good overview of, the moving parts in the project.

@Aakash3101
Contributor

Each checkbox is a single PR. I am actually working on them, so don't worry about the whole issue. Your goal should be to understand, or get a good overview of, the moving parts in the project.

Yes, actually I am enjoying doing this kind of work as I am learning new things.

@Aakash3101
Contributor

@henrykironde I am not able to load the .tif files into PostgreSQL. There seems to be some size limitation on how efficiently raster2pgsql can work: it works completely fine with small files, but it just hangs when I run it on the big files, which are around 3 to 4 GB.
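For reference, the kind of load being attempted here can be sketched in Python by building the raster2pgsql command line; the table name below is hypothetical, and the SQL that raster2pgsql emits is normally piped to psql:

```python
import shlex

def build_raster2pgsql_cmd(tif_path, table, tile_size="100x100", srid=4326):
    """Build a raster2pgsql command that tiles a GeoTIFF for PostGIS.

    -s sets the SRID, -t the tile size (WIDTHxHEIGHT), and -I creates a
    GiST spatial index on the resulting raster column.
    """
    return ["raster2pgsql", "-s", str(srid), "-t", tile_size, "-I",
            tif_path, table]

# Hypothetical table name; the file is the one discussed in this thread.
cmd = build_raster2pgsql_cmd(
    "sol_available.water.capacity_usda.mm_m_250m_30..60cm_1950..2017_v0.1.tif",
    "public.soil_available_water_capacity",
)
print(shlex.join(cmd))  # pipe the emitted SQL to psql, e.g. `... | psql -d mydb`
```

The choice of the -t tile size is what the rest of the thread turns out to hinge on.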

@henrykironde
Contributor Author

I will check this out

@Aakash3101
Contributor

I will check this out

Well, I am also investigating something, and it turns out that the tile size can impact the processing time. In the code for the install command the tile size is 100x100; when I tried a tile size of 2000x2000, the file was saved in the database, but I cannot view it in QGIS. Both pgAdmin 4 and the DB Manager in QGIS show that the table does have raster values.

@Aakash3101
Contributor

I will check this out

Any updates @henrykironde? It seems to me that when a tile size of 100x100 is used, a very large number of rows will be generated.
For example, the size of this file is 172800x71698:

aakash01@aakash01-G3-3579:~/.retriever/raw_data/soil-available-water-capacity $ gdalinfo sol_available.water.capacity_usda.mm_m_250m_30..60cm_1950..2017_v0.1.tif 
Driver: GTiff/GeoTIFF
Files: sol_available.water.capacity_usda.mm_m_250m_30..60cm_1950..2017_v0.1.tif
Size is 172800, 71698
Coordinate System is:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4326]]
Data axis to CRS axis mapping: 2,1
Origin = (-180.000000000000000,87.370000000000005)
Pixel Size = (0.002083333000000,-0.002083333000000)
Metadata:
  AREA_OR_POINT=Area
Image Structure Metadata:
  COMPRESSION=DEFLATE
  INTERLEAVE=BAND
Corner Coordinates:
Upper Left  (-180.0000000,  87.3700000) (180d 0' 0.00"W, 87d22'12.00"N)
Lower Left  (-180.0000000, -62.0008094) (180d 0' 0.00"W, 62d 0' 2.91"S)
Upper Right ( 179.9999424,  87.3700000) (179d59'59.79"E, 87d22'12.00"N)
Lower Right ( 179.9999424, -62.0008094) (179d59'59.79"E, 62d 0' 2.91"S)
Center      (  -0.0000288,  12.6845953) (  0d 0' 0.10"W, 12d41' 4.54"N)
Band 1 Block=172800x1 Type=Int16, ColorInterp=Gray
  NoData Value=-32768
  Overviews: 86400x35849, 43200x17925, 21600x8963, 10800x4482, 5400x2241, 2700x1121, 1350x561

When I run the raster2pgsql command with a tile size of 100x100, it takes an indefinite amount of time to process, while for tile sizes of 2000x2000 or 5000x5000 it takes about 40 minutes to an hour. But when I try to view the raster in QGIS, it seems to add the layer to the canvas and then crashes after 10 minutes or so.
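The row counts behind these timings can be estimated from the gdalinfo dimensions above: raster2pgsql inserts roughly one row per tile, so shrinking the tile size blows up the table. A back-of-the-envelope sketch:

```python
import math

def tile_count(width, height, tile_w, tile_h):
    """Approximate number of tiles (and hence rows) produced by raster2pgsql -t."""
    return math.ceil(width / tile_w) * math.ceil(height / tile_h)

W, H = 172800, 71698  # dimensions reported by gdalinfo for this file
for t in (100, 2000, 5000):
    print(f"{t}x{t}: {tile_count(W, H, t, t):,} tiles")
```

At 100x100 this is over 1.2 million rows, versus a few thousand at 2000x2000, which is consistent with the processing times reported here.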

Another way to deal with this processing-time issue is to reference the file from the database using the -R flag of the raster2pgsql command; with this flag only the reference is stored in the database, not the raster data itself.

But this undercuts the reason we store the data in the database in the first place: if the file is moved from where it is expected to be, the reference stops working. I had the idea for the -R flag because the raw data is downloaded when you first install the dataset and does not get deleted, so referencing the data would save the user some storage on the system.
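For illustration, the in-db and out-of-db loads being weighed here differ only by the -R flag; the file, table, and database names below are hypothetical:

```python
# In-db load: tile pixel data is copied into the table, so the database is
# self-contained but duplicates the raw file kept in ~/.retriever/raw_data.
in_db = ("raster2pgsql -s 4326 -t 2000x2000 -I soil.tif "
         "public.soil_awc | psql -d retriever_db")

# Out-of-db load (-R): only the file path and metadata are stored, so the
# table stays tiny, but queries break if the file is moved or deleted.
out_of_db = ("raster2pgsql -s 4326 -t 2000x2000 -I -R soil.tif "
             "public.soil_awc | psql -d retriever_db")

print(out_of_db)
```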

@henrykironde
Contributor Author

@Aakash3101 what are your computational resources?

@Aakash3101
Contributor

@Aakash3101 what are your computational resources?

CPU: i7 8th Gen
GPU: GeForce GTX 1050 Ti
RAM: 8 GB DDR4
GPU RAM: 4 GB
OS: Ubuntu 20.04 LTS

@henrykironde
Contributor Author

Could you try closing other applications (especially IDEs), then open QGIS and try to load the map? I will try it later from my end.
Give it a few minutes to render.

@Aakash3101
Contributor

I can load and view the map from the raw data file, but not from the PostGIS database.

@henrykironde
Contributor Author

henrykironde commented Mar 29, 2021

Yes, load the data from the PostGIS database and give it at least 10 minutes, depending on your resources. Make sure to free at least 4 GB of memory; most IDEs take about 2 GB. Closing them will let QGIS load the data.

@Aakash3101
Contributor

Okay, I will let you know if it opens.

@Aakash3101
Contributor

So this time, while loading the file in QGIS, I monitored my RAM usage from the terminal, and it uses all my memory; then the application is terminated. I don't know the reason yet, but I will soon find out.

@Aakash3101
Contributor

Aakash3101 commented Mar 29, 2021

And when I open the raw data file directly, it uses only around 2 GB of my RAM.
I think the extra memory usage is caused by PostGIS running queries in the background.

@Aakash3101
Contributor

When I query the table in pgAdmin 4 to show all the values, postgres uses all the RAM and then freezes, so I think I need to limit the memory available for queries. Please let me know if you find something useful for optimizing the memory usage.
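One way to sidestep this, assuming the issue is that a bare SELECT * streams every raster tile to the client: summarize or limit on the server side instead. ST_SummaryStats, ST_Width, and ST_Height are standard PostGIS raster functions; the table and column names below are hypothetical:

```python
# Instead of `SELECT * FROM soil_awc;`, which pulls every tile into
# pgAdmin, compute per-tile statistics on the server for a few tiles.
summary_sql = """
SELECT rid, (ST_SummaryStats(rast)).*
FROM soil_awc
LIMIT 10;
"""

# Or just peek at tile dimensions to sanity-check the load.
peek_sql = "SELECT rid, ST_Width(rast), ST_Height(rast) FROM soil_awc LIMIT 5;"

print(summary_sql.strip())
```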

@henrykironde
Contributor Author

Okay, I think at this point you should let me handle this. It could take at least a day or two; I will try to find a way around it.
This is at a good point/phase. I will update you; I need to finish up some other spatial datasets first.

@Aakash3101
Contributor

Sure @henrykironde
