How can we make intake-esm more transparent? #163
Comments
It should be relatively easy to return the nested dictionary. A couple of other ideas include enabling an …
👍
More thoughts: how would this work? What would the keys be? Would it just group by all columns?
It would return a dataset for each row in the database. We could form keys from a groupby applied to all the columns, but maybe it would be more accessible if the key were just the row index. What do you think?
What would intake-esm currently do if there were no aggregations defined in the catalog?
Answer: it raises an exception. That is NOT the right behavior. Aggregation should be 100% optional in these catalogs.
Agreed, that's a bug, but it's easy to fix. Without any aggregations we can fall back to groups = self.df.groupby(self.df.columns.tolist()), and the returned keys will be of the same format. We can trigger the same behavior when aggregation is explicitly disabled.
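For reference, here is a minimal standalone sketch of that fallback; the toy DataFrame and its column names are made up for illustration, and only the groupby-over-all-columns pattern comes from the comment above:

```python
import pandas as pd

# Toy stand-in for the catalog dataframe (columns are illustrative only).
df = pd.DataFrame({
    "source_id": ["CanESM5", "IPSL-CM6A-LR"],
    "variable_id": ["o2", "o2"],
    "zstore": [
        "gs://cmip6/CMIP/CCCma/CanESM5/historical/r1i1p1f1/Oyr/o2/gn/",
        "gs://cmip6/CMIP/IPSL/IPSL-CM6A-LR/historical/r1i1p1f1/Oyr/o2/gn/",
    ],
})

# Grouping by every column yields one group per unique row, i.e. no aggregation at all.
groups = df.groupby(df.columns.tolist())
print(list(groups.groups))  # one tuple key per row, e.g. ('CanESM5', 'o2', 'gs://cmip6/...')
```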
With #164 the following works:

```python
import intake

col_file = "https://raw.githubusercontent.com/NCAR/intake-esm-datastore/master/catalogs/pangeo-cmip6.json"
col = intake.open_esm_datastore(col_file)

query = dict(experiment_id='historical', table_id='Oyr',
             variable_id='o2', grid_label='gn', member_id='r1i1p1f1')
cat = col.search(**query)

# Disable aggregations
dsets_pp = cat.to_dataset_dict(aggregate=False)
print(dsets_pp.keys())
```

Output:

```
--> The keys in the returned dictionary of datasets are constructed as follows:
        'zstore'
--> There will be 2 group(s)
dict_keys(['gs://cmip6/CMIP/CCCma/CanESM5/historical/r1i1p1f1/Oyr/o2/gn/', 'gs://cmip6/CMIP/IPSL/IPSL-CM6A-LR/historical/r1i1p1f1/Oyr/o2/gn/'])
```
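For anyone wanting to inspect the results, a minimal sketch of how the un-aggregated dictionary could be consumed (this assumes the `aggregate=False` behavior from #164 shown above; the loop itself is only illustrative):

```python
# Each entry maps a zstore path to a single, un-merged xarray.Dataset,
# so the datasets can be examined one at a time.
for key, ds in dsets_pp.items():
    print(key, dict(ds.dims))
```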
@andersy005 - nice! However, I would prefer for the keys to be the groups, not the paths, as @matt-long suggested. Are the keys the datasets themselves?
Assuming that we have a row with the following attributes:

```
activity_id                                            AerChemMIP
institution_id                                                BCC
source_id                                                BCC-ESM1
experiment_id                                              ssp370
member_id                                                r1i1p1f1
table_id                                                     Amon
variable_id                                                    pr
grid_label                                                     gn
zstore            gs://cmip6/AerChemMIP/BCC/BCC-ESM1/ssp370/r1i1...
dcpp_init_year                                                NaN
Name: 0, dtype: object
```
Should we have something along these lines?

```
{'AerChemMIP.BCC.BCC-ESM1.ssp370.r1i1p1f1.Amon.pr.gn.NaN':
 <xarray.Dataset>
 Dimensions:    (bnds: 2, lat: 64, lon: 128, time: 492)
 Coordinates:
   * lat        (lat) float64 -87.86 -85.1 -82.31 -79.53 ... 82.31 85.1 87.86
     lat_bnds   (lat, bnds) float64 dask.array<chunksize=(64, 2), meta=np.ndarray>
   * lon        (lon) float64 0.0 2.812 5.625 8.438 ... 348.8 351.6 354.4 357.2
     lon_bnds   (lon, bnds) float64 dask.array<chunksize=(128, 2), meta=np.ndarray>
   * time       (time) object 2015-01-16 12:00:00 ... 2055-12-16 12:00:00
     time_bnds  (time, bnds) object dask.array<chunksize=(492, 2), meta=np.ndarray>
 Dimensions without coordinates: bnds
 Data variables:
     pr         (time, lat, lon) float32 dask.array<chunksize=(492, 64, 128), meta=np.ndarray>
 Attributes:
     Conventions:       CF-1.7 CMIP-6.2
     activity_id:       AerChemMIP
     further_info_url:  https://furtherinfo.es-doc.org/CMIP6.BCC.BCC-ESM1...
     grid:              T42
     ...
}
```
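If the keys were built from the row attributes instead of the zstore path, here is a hypothetical sketch of how such a dotted key could be assembled; the '.' separator, the NaN handling, and the helper below are assumptions based on the example key above, not intake-esm's actual implementation:

```python
import pandas as pd

# Illustrative row mirroring the attributes shown above (zstore omitted from the key).
row = pd.Series({
    "activity_id": "AerChemMIP", "institution_id": "BCC", "source_id": "BCC-ESM1",
    "experiment_id": "ssp370", "member_id": "r1i1p1f1", "table_id": "Amon",
    "variable_id": "pr", "grid_label": "gn", "dcpp_init_year": float("nan"),
})

# Join the column values with '.'; render missing values as 'NaN' (assumed convention).
key = ".".join("NaN" if pd.isna(v) else str(v) for v in row.values)
print(key)  # AerChemMIP.BCC.BCC-ESM1.ssp370.r1i1p1f1.Amon.pr.gn.NaN
```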
I'm sitting with @naomi-henderson, and we are discussing how we might make intake-esm more transparent about what it's doing under the hood.
It would be nice if there were a mode where, rather than running all the merge operations, intake-esm returned a nested dictionary similar to the one I showed in my recursive merge demo.
This would allow users to manually descend into the individual datasets and examine them one at a time, optionally applying their own merge logic.
This should be relatively easy, since intake-esm probably has an internal data structure like this already.
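To make the idea concrete, here is a minimal sketch of what such a nested dictionary and user-driven merge could look like; the structure, keys, and toy datasets below are invented for illustration and are not intake-esm's actual internals:

```python
import xarray as xr

# Hypothetical nested dictionary: the outer key identifies a group of compatible datasets,
# the inner keys identify the individual, un-merged member datasets.
nested = {
    "CMIP.CCCma.CanESM5.historical.Oyr": {
        "r1i1p1f1": xr.Dataset({"o2": (("time",), [0.1, 0.2])}),
        "r2i1p1f1": xr.Dataset({"o2": (("time",), [0.3, 0.4])}),
    }
}

# A user could descend into the dictionary, examine the pieces one at a time,
# and apply their own merge logic, e.g. concatenating members along a new dimension.
group = nested["CMIP.CCCma.CanESM5.historical.Oyr"]
merged = xr.concat(list(group.values()), dim="member_id")
merged = merged.assign_coords(member_id=list(group.keys()))
print(merged)
```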