Allow mappings for tabular data #680
Comments
I think this is a good idea. So the proposal is to allow mappings in this way, right?
rather than just
which requires the dimension name to appear exactly as-is in the data, and should fail in all other cases.
Pretty much. If possible, though, I think supporting both is the best case. Basically, if the type is a string, assume it's a match. If it's a dict, assume it's a mapping?
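To make the string-or-dict idea concrete, here is a minimal sketch of how both forms could be normalised to one internal shape. This is not Calliope code: the function name `normalise_dim_spec` and the direction of the mapping (Calliope dimension name to name in the file) are assumptions made for illustration only.

```python
from typing import Dict, Union

# Hypothetical type for a config entry: either a bare name or an explicit mapping.
DimSpec = Union[str, Dict[str, str]]


def normalise_dim_spec(spec: DimSpec) -> Dict[str, str]:
    """Return a {calliope_dim: name_in_file} mapping for either accepted form."""
    if isinstance(spec, str):
        # Plain string: the name in the file must already match the dimension.
        return {spec: spec}
    if isinstance(spec, dict):
        # Dict: explicit mapping; copy it so the caller's config is not mutated.
        return dict(spec)
    raise TypeError(f"Expected str or dict, got {type(spec).__name__}")


print(normalise_dim_spec("timesteps"))
print(normalise_dim_spec({"timesteps": "utc_timestamp"}))
```

Normalising early would keep the rest of the loader working with a single type, even if the config accepts two.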
@irm-codebase I'd argue that those different columns actually introduce some ambiguity. How is one to know that the ones not suffixed with Still, I'm willing to introduce this mapping for these edge cases. Having different data types for a single config entry is always a pain to maintain and to document. I would prefer to have something like the following:

data_sources:
  demand_elec_timeseries:
    source: timeseries/demand/electricity.csv
    columns: nodes
    rows: timesteps
    map_dims:
      timesteps: time
    add_dims:
      techs: demand_elec
      parameters: sink_use_equals
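As a rough illustration of how such a `map_dims` entry could behave, the sketch below reads a wide CSV using only the standard library. It is not how Calliope implements loading: the function `load_wide_table`, the `value` key, and the sample data are all hypothetical, and it assumes `map_dims` maps a Calliope dimension name to the name used in the file, and that the row labels sit in a column whose header carries that file name.

```python
import csv
import io


def load_wide_table(text, columns, rows, map_dims=None, add_dims=None):
    """Turn a wide CSV into long records keyed by Calliope dimension names."""
    map_dims = map_dims or {}
    add_dims = add_dims or {}
    # Name the file uses for the row dimension; defaults to the Calliope name.
    row_label = map_dims.get(rows, rows)
    reader = csv.DictReader(io.StringIO(text))
    if row_label not in (reader.fieldnames or []):
        # Still strict: without a matching name or mapping, loading fails.
        raise KeyError(f"Column '{row_label}' for dimension '{rows}' not found")
    records = []
    for line in reader:
        row_value = line.pop(row_label)
        # Every remaining header is a member of the `columns` dimension.
        for col_value, value in line.items():
            record = {rows: row_value, columns: col_value, "value": float(value)}
            record.update(add_dims)  # constant dimensions not present in the file
            records.append(record)
    return records


# Made-up sample data, for illustration only.
sample = "time,DEU,FRA\n2005-01-01 00:00,10.0,20.0\n2005-01-01 01:00,11.0,21.0\n"
records = load_wide_table(
    sample,
    columns="nodes",
    rows="timesteps",
    map_dims={"timesteps": "time"},
    add_dims={"techs": "demand_elec", "parameters": "sink_use_equals"},
)
print(records[0])
```

Calling the same function without `map_dims` on this sample raises a `KeyError`, which matches the default of expecting data in the correct format.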
I agree, @brynpickering. Our default should be to assume the data is provided in the correct format. But adding a bit of extra flexibility should avoid extra code in certain cases. I think your mapping approach is also better than my proposal, so no issues there!
What can be improved?
Being able to load tabular data is a wonderful new feature.
But it is currently limited by forcing the data to 'match' Calliope's dimension names in certain cases. Enforcing strict naming by default makes a lot of sense: you avoid ambiguity and you also avoid the risks of relying on column position.
However, it will often be too inflexible.
Reasoning
Some of our names might lead to files being less human readable, or finicky:
- `nodes` is less informative than `country`
- `technology` or `tech` instead of `techs`, since it can be intuitive to name columns in singular
- `time` or `utc_timestamp`, over `timesteps`
- ...

This will lead to a lot of 'boilerplate' code that just shapes the data to fit Calliope's naming. See the following 3 examples for different names used for timeseries in Euro Calliope with v6.10:
All 3 are equally 'human' readable, but since they do not specify `timesteps`, they won't load into Calliope.
Proposal
An option to use mappings would solve this issue.
For example, you could load one of the timeseries above this way:
This is still strict, but more flexible.
Version
v0.7.0.dev3