Data Storage Protocol
John Brandt edited this page Sep 5, 2020
- The projects are stored in a comma-separated file with the columns `lat`, `long`, `unique_path`, and `name`.
- This file is loaded into 4-predict and 4-download and indexed by name or by unique path.
- Everything in `raw/*` is stored as int16, via `np.trunc(array * 65535).astype(np.int16)`, because the original reflectance values are int16 and minimal calculations have occurred.
- Everything in `interim/*` is float32, via `np.float32(array)`, because there are still calculations to be done.
- Everything in `processed/*` is int32, via `np.trunc(array * 65535).astype(np.int32)`.
- All calculations and all tensors are float32, so on loading any array, call `np.float32(array)` and assert that its values lie between -10 and 10.
- `unique_path` is created as `country/admin1/name-uniqueid/`.
- Local and cloud storage locations are kept separate with a `local_prefix` and a `cloud_prefix`.
- The current file format is `hickle`.
- All data in `raw/` is persistent.
- All other data is processed on demand and should be deleted from the respective folders before closing the Docker containers.
- The `processed/*` data has int16 sizing (values scaled by 65535) but is saved as int32 because it is signed: a signed 16-bit integer only reaches 32767.
- The `hickle` protocol does not seem to allow streaming to or from S3, so the format may switch back to `pickle` in the future.
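The project table described above could be read and indexed with the standard library roughly as follows. This is a minimal sketch, not the code used in 4-predict or 4-download: the function names `parse_projects` and `load_projects` and the sample row are hypothetical, and it assumes the file has a header row naming the four columns.

```python
import csv
from typing import Dict, Iterable


def parse_projects(lines: Iterable[str]) -> Dict[str, Dict[str, dict]]:
    """Parse CSV text with the columns lat, long, unique_path, name.

    Returns two lookups over the same row objects: one keyed by name and
    one keyed by unique_path.
    """
    rows = []
    for row in csv.DictReader(lines):
        row["lat"] = float(row["lat"])    # coordinates parsed as floats
        row["long"] = float(row["long"])
        rows.append(row)
    return {
        "by_name": {row["name"]: row for row in rows},
        "by_unique_path": {row["unique_path"]: row for row in rows},
    }


def load_projects(csv_path: str) -> Dict[str, Dict[str, dict]]:
    """Read the projects file from disk; the path is supplied by the caller."""
    with open(csv_path, newline="") as f:
        return parse_projects(f)
```

If two projects share a `name`, the last row wins in `by_name`, which is one reason to prefer `unique_path` as the key.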
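Building `unique_path` and combining it with a storage prefix might look like the sketch below. The values of `LOCAL_PREFIX` and `CLOUD_PREFIX`, the helper names, and the `prefix/stage/unique_path` ordering are all hypothetical; this page only fixes the `country/admin1/name-uniqueid/` pattern and the existence of the two prefixes.

```python
# Hypothetical values: the page names local_prefix and cloud_prefix but not their contents.
LOCAL_PREFIX = "data/"
CLOUD_PREFIX = "s3://example-bucket/"
STAGES = ("raw", "interim", "processed")


def make_unique_path(country: str, admin1: str, name: str, unique_id: str) -> str:
    """Build country/admin1/name-uniqueid/ as described on this page."""
    return f"{country}/{admin1}/{name}-{unique_id}/"


def storage_path(unique_path: str, stage: str, cloud: bool = False) -> str:
    """Join a prefix, a stage folder, and a unique_path.

    The prefix/stage/unique_path ordering is an assumption for illustration.
    """
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    prefix = CLOUD_PREFIX if cloud else LOCAL_PREFIX
    return f"{prefix}{stage}/{unique_path}"
```

Keeping the prefix as the only difference between local and cloud paths means the same `unique_path` identifies a project in both places.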
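The per-folder dtype rules and the float32 loading check can be sketched with NumPy as below. Two parts are assumptions rather than rules from this page: dividing by 65535 when loading a scaled array (without it, scaled values would fail the [-10, 10] check), and the overflow guard in `to_raw`. The guard is there because 65535 is the uint16 maximum while int16 only reaches 32767, so inputs beyond roughly ±0.5 cannot round-trip through `raw/`. The range check raises `ValueError`, and the helper names are hypothetical.

```python
import numpy as np

SCALE = 65535  # scale factor quoted on this page
BOUND = 10.0   # loaded arrays must lie in [-BOUND, BOUND]


def to_raw(array: np.ndarray) -> np.ndarray:
    """raw/*: scale by 65535, truncate toward zero, store as int16."""
    scaled = np.trunc(np.asarray(array, dtype=np.float64) * SCALE)
    # Guard (an assumption, not on this page): int16 only spans -32768..32767,
    # so refuse values that would not survive the cast.
    info = np.iinfo(np.int16)
    if scaled.min() < info.min or scaled.max() > info.max:
        raise ValueError("scaled values overflow int16")
    return scaled.astype(np.int16)


def to_interim(array: np.ndarray) -> np.ndarray:
    """interim/*: plain float32, since more calculations follow."""
    return np.float32(array)


def to_processed(array: np.ndarray) -> np.ndarray:
    """processed/*: scale by 65535, truncate toward zero, store as int32."""
    return np.trunc(np.asarray(array, dtype=np.float64) * SCALE).astype(np.int32)


def load(array: np.ndarray, scaled: bool = False) -> np.ndarray:
    """Convert any stored array to float32 and check the [-10, 10] invariant.

    Dividing raw/processed arrays by SCALE is an assumption; the page does
    not say how the scaling is undone.
    """
    out = np.float32(array)
    if scaled:
        out = out / np.float32(SCALE)
    if out.min() < -BOUND or out.max() > BOUND:
        raise ValueError("array values outside [-10, 10]")
    return out
```

Because `np.trunc` rounds toward zero, a round trip through `raw/` or `processed/` can lose up to 1/65535 per value.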
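The split between persistent `raw/` data and on-demand data could be enforced with a cleanup step run before the containers shut down. This is a sketch that assumes the `raw/`, `interim/`, and `processed/` folders sit directly under a single local root; `clean_temporary` is a hypothetical helper.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List

TEMPORARY_STAGES = ("interim", "processed")  # raw/ is persistent and never touched


def clean_temporary(root: str) -> List[Path]:
    """Delete everything inside root/interim and root/processed; keep root/raw."""
    removed: List[Path] = []
    for stage in TEMPORARY_STAGES:
        folder = Path(root) / stage
        if not folder.is_dir():
            continue  # nothing was produced for this stage
        for child in list(folder.iterdir()):
            if child.is_dir() and not child.is_symlink():
                shutil.rmtree(child)
            else:
                child.unlink()  # files and symlinks
            removed.append(child)
    return removed


if __name__ == "__main__":
    # Demo on a throwaway tree: only raw/example.hkl should remain.
    with tempfile.TemporaryDirectory() as tmp:
        for stage in ("raw", "interim", "processed"):
            (Path(tmp) / stage).mkdir()
            (Path(tmp) / stage / "example.hkl").write_bytes(b"")
        clean_temporary(tmp)
        print(sorted(str(p.relative_to(tmp)) for p in Path(tmp).rglob("*.hkl")))
```

Emptying the stage folders rather than deleting them keeps the layout in place for the next run.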
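If storage moves back to `pickle`, one property that helps with S3 is that `pickle.dump` and `pickle.load` work on any binary file-like object, so an in-memory buffer can be written or read without touching local disk. The sketch below uses `io.BytesIO` as a stand-in for the S3 stream; the actual upload or download call is omitted, and the helper names are hypothetical. Only unpickle data from a trusted source, since loading a pickle can execute arbitrary code.

```python
import io
import pickle

import numpy as np


def dump_to_stream(array: np.ndarray, stream) -> None:
    """Write an array to any binary file-like object with pickle."""
    pickle.dump(array, stream, protocol=pickle.HIGHEST_PROTOCOL)


def load_from_stream(stream) -> np.ndarray:
    """Read an array back; only unpickle data from a trusted source."""
    return pickle.load(stream)


if __name__ == "__main__":
    buffer = io.BytesIO()  # stands in for a stream to or from S3
    dump_to_stream(np.zeros((2, 3), dtype=np.int16), buffer)
    buffer.seek(0)  # rewind before reading
    restored = load_from_stream(buffer)
    print(restored.dtype, restored.shape)  # int16 (2, 3)
```

The stored dtype survives the round trip, so the int16/float32/int32 conventions above carry over unchanged.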