Allow mappings for tabular data #680

Open
irm-codebase opened this issue Sep 10, 2024 · 4 comments
Labels
enhancement v0.7 (upcoming) version 0.7


@irm-codebase
Contributor

irm-codebase commented Sep 10, 2024

What can be improved?

Being able to load tabular data is a wonderful new feature.
But it is currently limited: in certain cases the data must 'match' Calliope's dimension names exactly. Enforcing strict naming by default makes a lot of sense: you avoid ambiguity and you also avoid the risks of relying on column position.

However, it will often be too inflexible.

Reasoning

Some of our names might make files less human-readable, or finicky to work with:

  • if a model is national resolution, nodes is less informative than country
  • people will often go for technology or tech instead of techs, since it can be intuitive to name columns in singular
  • people might prefer time or utc_timestamp, over timesteps...

This will lead to a lot of 'boilerplate' code that just shapes the data to fit Calliope's naming. See the following 3 examples for different names used for timeseries in Euro Calliope with v6.10:

image
image
image

All 3 are equally 'human' readable, but since they do not specify timesteps, they won't load into Calliope.
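To make the 'boilerplate' concrete, here is a minimal sketch of the kind of pre-processing step a user ends up writing today. The function name `rename_index_column` and the sample data are hypothetical, and only the Python standard library is used; it simply rewrites one header cell so the file matches the expected dimension name.

```python
import csv
import io


def rename_index_column(csv_text: str, old: str, new: str) -> str:
    """Return `csv_text` with the header cell `old` renamed to `new`."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header = rows[0]
    if old not in header:
        raise ValueError(f"column {old!r} not found in header {header}")
    # Only the header row changes; the data rows are copied through untouched.
    rows[0] = [new if cell == old else cell for cell in header]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Hypothetical file content that uses "utc_timestamp" instead of "timesteps".
raw = "utc_timestamp,DEU,FRA\n2016-01-01 00:00,10.0,8.0\n"
fixed = rename_index_column(raw, old="utc_timestamp", new="timesteps")
```

Every dataset with a non-matching name needs a step like this, plus a second copy of the file, before it can be referenced from the model definition.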

Proposal

An option to use mappings would solve this issue.
For example, you could load one of the timeseries above this way:

data_sources:
  demand_elec_timeseries:
    source: timeseries/demand/electricity.csv
    columns: nodes
    rows: {timesteps: time}
    add_dims:
      techs: demand_elec
      parameters: sink_use_equals

This is still strict, but more flexible.

Version

v0.7.0.dev3

@irm-codebase irm-codebase added enhancement v0.7 (upcoming) version 0.7 labels Sep 10, 2024
@sjpfenninger
Member

I think this is a good idea. So the proposal is to allow mappings in this way, right?

rows: {column_name_in_data: dimension_name_in_model}

rather than just

rows: dimension_name_in_model

which requires the dimension name to appear exactly as-is in the data, and should fail in all other cases.

@irm-codebase
Contributor Author

Pretty much. If possible, though, I think supporting both is the best case.

Basically, if the type is a string, assume it's a match. If it's a dict, assume it's a mapping?
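A rough sketch of that string-or-dict rule, not actual Calliope code: the helper name `normalise_dim_spec` is made up, and it assumes the `{timesteps: time}` direction from the proposal above, i.e. the key is the model dimension and the value is the name used in the data.

```python
def normalise_dim_spec(spec):
    """Return {model_dimension: name_in_data} for a `rows`/`columns` entry.

    Hypothetical helper: a string means the data uses the dimension name
    as-is; a dict is treated as an explicit mapping.
    """
    if isinstance(spec, str):
        # Strict case: the dimension name must appear verbatim in the data.
        return {spec: spec}
    if isinstance(spec, dict):
        # Mapping case: copy so the caller's config is not mutated later.
        return dict(spec)
    raise TypeError(f"expected str or dict, got {type(spec).__name__}")


strict = normalise_dim_spec("timesteps")
mapped = normalise_dim_spec({"timesteps": "time"})
```

Both forms end up as the same internal structure, so the rest of the loading code would not need to care which one the user wrote.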

@brynpickering
Member

@irm-codebase I'd argue that those different columns actually introduce some ambiguity. How is one to know that the ones not suffixed with _utc are in the UTC timezone? Generally, it makes a lot of sense to follow a standard index name format (e.g., there's a relatively strict set used by the climate community). It's so much easier to maintain tabular data if you follow a standard format.

Still, I'm willing to introduce this mapping for these edge cases.

Having different data types for a single config entry is always a pain to maintain and to document. I would prefer to have something like a mapping key, e.g.:

data_sources:
  demand_elec_timeseries:
    source: timeseries/demand/electricity.csv
    columns: nodes
    rows: timesteps
    map_dims:
      timesteps: time
    add_dims:
      techs: demand_elec
      parameters: sink_use_equals
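As an illustration only, a separate `map_dims` key could be applied as a rename step ahead of the existing strict check. The function `apply_map_dims` below is hypothetical, not part of Calliope, and it assumes the direction used in the example above: keys are model dimensions, values are the names found in the file.

```python
def apply_map_dims(data_names, expected_dims, map_dims=None):
    """Rename names found in the data to model dimensions, then check strictly.

    Hypothetical sketch: `map_dims` is {model_dimension: name_in_data}.
    """
    map_dims = map_dims or {}
    # Invert the mapping so it can be looked up by the name used in the file.
    data_to_dim = {data_name: dim for dim, data_name in map_dims.items()}
    renamed = [data_to_dim.get(name, name) for name in data_names]
    # The strict default is unchanged: every expected dimension must be present.
    missing = [dim for dim in expected_dims if dim not in renamed]
    if missing:
        raise KeyError(f"dimensions missing from data: {missing}")
    return renamed


# The file labels its row index "time"; the config asks for `rows: timesteps`.
renamed = apply_map_dims(["time"], ["timesteps"], map_dims={"timesteps": "time"})
```

With this shape, `rows` and `columns` keep a single data type, and leaving out `map_dims` gives exactly the current strict behaviour.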

@irm-codebase
Contributor Author

I agree @brynpickering

Our default should be to assume the data is provided in the correct format. But adding a bit of extra flexibility should avoid extra code in certain cases.

I think your mapping approach is also better than my proposal, so no issues there!
