Build the input workflow #289

Closed
9 of 20 tasks
clizbe opened this issue Nov 22, 2023 · 21 comments
Labels
epic Epic issues (collection of smaller tasks towards a goal)

Comments

@clizbe
Member

clizbe commented Nov 22, 2023

Build the basic input workflow from raw data to the model.

See discussion #288

Considerations

Meta considerations

  • Do we need parallel execution of pipelines?
  • Maybe supporting parallel jobs with shared inputs is sufficient
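As a rough illustration of "parallel jobs with shared inputs", here is a minimal Python sketch. The functions `load_shared_input` and `run_scenario` and the numbers are hypothetical stand-ins, not Tulipa code: the input is loaded once and several scenario runs read from it concurrently.

```python
from concurrent.futures import ThreadPoolExecutor


def load_shared_input() -> dict:
    """Stand-in for an expensive load step; the values are made up."""
    return {"demand": [10.0, 12.0, 9.0]}


def run_scenario(shared: dict, scale: float) -> float:
    """Stand-in for a model run that only reads the shared input."""
    return sum(d * scale for d in shared["demand"])


shared = load_shared_input()  # loaded once, reused by every job
scales = [0.5, 1.0, 2.0]      # one hypothetical parameter per scenario

with ThreadPoolExecutor() as pool:
    # pool.map keeps the results in the same order as `scales`
    results = list(pool.map(lambda s: run_scenario(shared, s), scales))
```

Threads are used here only to keep the sketch short; whether threads, processes, or separate machines fit best would depend on the actual workload.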

Capabilities/Usability requirements

WHAT WE WANT

  • Build the network once (in a while)
  • Use draft networks to build new networks
  • Sufficient flexibility for ad-hoc code for experimentation
  • Definition of temporal stuff
  • Definition of scenarios (what is included here?)
  • Scope: just model or parts of pipeline (which parts?)
  • Definition of solver specifications
  • Be able to mix data sources (ESDL + ENTSO-E for example)
  • Self-hosted Tulipa database (in case sources change/vanish, & reduce re-pulling/processing data)
  • Export ESDL to simplified representation that is compatible with Tulipa

@clizbe clizbe added the epic Epic issues (collection of smaller tasks towards a goal) label Nov 22, 2023
@abelsiqueira
Member

Does this include the representative periods and the assets and flows partitions, or is it just for the data sources?

@suvayu
Member

suvayu commented Nov 23, 2023

The representative periods come from an algorithm, so that should be included, but optionally. A scenario might not require the algorithm and use fixed periods instead. Or the algorithm may already have run once, and if the input hasn't changed, it need not run again.

As for the flow partitions, aren't they derivable from the profiles? If so, then that would also be along the lines of "compute if input changes".
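A minimal Python sketch of the "compute if input changes" idea, assuming the inputs are files on disk. The function names and the stamp-file layout are hypothetical, not an existing Tulipa API: a step is skipped when the hash of its input matches the hash recorded on the previous run.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def run_if_changed(input_path: Path, stamp_path: Path, step) -> bool:
    """Run `step(input_path)` only if the input differs from the last run.

    Returns True if the step ran, False if it was skipped.
    """
    current = file_digest(input_path)
    previous = None
    if stamp_path.exists():
        previous = json.loads(stamp_path.read_text())["sha256"]
    if current == previous:
        return False  # input unchanged, so the earlier result can be reused
    step(input_path)
    # Record the digest only after the step has finished without raising
    stamp_path.write_text(json.dumps({"sha256": current}))
    return True
```

Here `step` could be the representative-period algorithm or the partition computation; a fixed-period scenario would simply never call it.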

@clizbe
Member Author

clizbe commented Nov 23, 2023

@Lokkij I tagged you on this one too if you're interested. You're of course our source for ESDL knowledge but I thought you might also be interested in this stuff. :)

@clizbe clizbe mentioned this issue Nov 23, 2023
8 tasks
@suvayu
Member

suvayu commented Nov 23, 2023

Is it possible to filter out attributes not used in Tulipa when exporting ESDL to JSON?

I thought we decided against this because that would be a choice on the Tulipa side and not the ESDL side?

@clizbe I'm guessing you left that comment? Best to discuss in the thread instead of editing the top post.

I see that my wording is pretty unclear. AFAICT, there are two levels of filtering. The top level covers things that are not in Tulipa because of fundamental modelling choices, e.g. no connections. So having the Port attributes from ESDL may never make sense. The next level is any other finer choices that we make, which evolve with time.

In this case I mean the top-level fundamental choices. But maybe I'm overthinking it, and doing everything in one go is simpler.

@clizbe
Member Author

clizbe commented Nov 28, 2023

Yes I think some of it will be specifying the type of ESDL file that Tulipa accepts - which variables should be filled, etc. And then probably a step of converting that ESDL into the form that Tulipa likes, which will include throwing out anything else and maybe some conversion trickery. I would prefer if the ESDL file looks normal before conversion and that we don't build really weird ESDLs - but we'll see what works.

@Lokkij

Lokkij commented Nov 28, 2023

Is it possible to filter out attributes not used in Tulipa when exporting ESDL to JSON?

I thought we decided against this because that would be a choice on the Tulipa side and not the ESDL side?

Usually the approach here is to leave attributes in ESDL and simply not read them from the model if you don't need them. In our case, I would keep the filtering as close to Tulipa as possible. That will likely make it easier to write back results to ESDL while keeping the original attributes intact.
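A small Python sketch of that approach, with a plain dict standing in for a parsed ESDL asset. The attribute names and the `USED_ATTRIBUTES` set are made up for illustration and are not Tulipa's or ESDL's actual schema. The full record is kept, the model only sees a filtered view, and results are merged back without touching the other attributes.

```python
# Hypothetical list of attributes the model reads; not Tulipa's real list.
USED_ATTRIBUTES = {"id", "name", "power"}


def view_for_model(asset: dict) -> dict:
    """Return a copy holding only the attributes the model reads."""
    return {k: v for k, v in asset.items() if k in USED_ATTRIBUTES}


def write_back(asset: dict, results: dict) -> dict:
    """Merge results into the full asset, keeping unused attributes intact."""
    return {**asset, **results}


# Invented example record; "port" plays the role of an attribute Tulipa ignores.
asset = {"id": "pp1", "name": "Plant", "power": 100.0, "port": "out1"}
model_input = view_for_model(asset)             # no "port" key
updated = write_back(asset, {"power": 80.0})    # "port" is still present
```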

Do we need a local data store?

What would the local data store be used for? To store temporary in-between data, or something else?

@suvayu
Member

suvayu commented Nov 28, 2023 via email

@clizbe
Member Author

clizbe commented Nov 28, 2023

Just saw this at a Spine meeting and thought it would be super handy to have something similar! (Maybe you had this in mind already, but it's new to me.) From what I understand it shows where specific data is coming from and the lines sort of indicate how it's processed?
[image: diagram seen at the Spine meeting]

@Lokkij

Lokkij commented Nov 28, 2023

As my understanding goes, for larger datasets we will have to connect to InfluxDB (or similar) and download for Tulipa to read. There will also be intermediate steps (e.g. different ways to compute representative days), etc. I doubt we want to download the dataset every time, or recompute unchanged steps every time.

Ah, so a sort of local DB to store data while doing other operations? I wouldn't expect our data to be so big as to need it, honestly - you can fit a lot of profiles in a few GBs of RAM. But maybe I'm missing something?

Just saw this at a Spine meeting and thought it would be super handy to have something similar! (Maybe you had this in mind already, but it's new to me.) From what I understand it shows where specific data is coming from and the lines sort of indicate how it's processed?

To me this looks like a class diagram, very similar to the diagrams for ESDL. The ESDL documentation has diagrams for all classes, for example: https://energytransition.github.io/#router/doc-content/687474703a2f2f7777772e746e6f2e6e6c2f6573646c/PowerPlant.html

@clizbe
Member Author

clizbe commented Nov 29, 2023

@datejada @gnawin @clizbe
Add some use-cases of how you're going to use the model and what your workflow is so they have a better idea of what we need.
"I want to run the model from the train" is valid. :)

@clizbe
Member Author

clizbe commented Nov 30, 2023

Use Cases
I would like to be able to:

  • summarize/visualize my input data (in tables or graphs), such as total wind capacity, transport line capacities, available technologies.
  • make transport capacities in certain areas unlimited, while still constraining others.
  • set up multiple scenarios to run in parallel or (otherwise) in series - set and forget.
  • visualize output data from one scenario, as well as compare multiple scenarios.
  • keep track of what model version and what data was used for a particular run/analysis - reproducibility.
  • easily specify scenario parameters for multiple scenarios.
  • occasionally add new data / data sources.
  • specify which data sources to use to build a scenario.
  • run the model somewhere that I can go about other work while it runs.
  • know when the model is finished running.
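For the reproducibility point in the list above, one possible shape is a small "run manifest" saved next to each result. This is only a Python sketch: the field names and the `build_manifest` helper are assumptions, not an existing Tulipa feature.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def build_manifest(model_version: str, data_files: list[Path]) -> dict:
    """Describe a run: which model version and exactly which input data."""
    return {
        "model_version": model_version,
        "created_utc": datetime.now(timezone.utc).isoformat(),
        # A content hash identifies the data even if a file is later renamed
        "inputs": {
            str(p): hashlib.sha256(p.read_bytes()).hexdigest()
            for p in data_files
        },
    }


def save_manifest(manifest: dict, path: Path) -> None:
    """Write the manifest as readable JSON next to the run's outputs."""
    path.write_text(json.dumps(manifest, indent=2))
```

Comparing two manifests then shows whether two analyses used the same model version and the same data.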

My current workflow for running scenarios is:

  • Duplicate a "default" Access dataset - this has everything needed to do a run.
  • In Excel, process scenario-unique (new) data, so it works with the model.
  • In Access, filter for and delete any data that will be replaced by the new data.
  • Copy and paste the new data into the dataset.
  • Go into the model, Browse for the dataset, Load it, Run the model.
  • Check frequently if the model has finished running.
  • Export data to Excel to make graphs (although Wester is building a UI to make this nicer).

Pros/Cons of Access

  • Can easily see data (once you know where it is)
  • Easy to learn how to edit
  • Takes a long time to edit
  • Sometimes you don't know where the data is
  • Huge tables make it slow even loading/filtering

@suvayu
Member

suvayu commented Dec 1, 2023

Ah, so a sort of local DB to store data while doing other operations? I wouldn't expect our data to be so big as to need it, honestly - you can fit a lot of profiles in a few GBs of RAM. But maybe I'm missing something?

I guess that's pretty small. However, I would really like to support a workflow that doesn't require you to be online. But if people say there's no such need, we can drop it.

Edit: the more I think about it, the more I think we need it. For example, when running different scenarios it makes no sense to download the same data repeatedly, even if it is small. So the question is: should the local store also be accessible to normal users for inspection and analysis? Based on @clizbe's points, I think it should be.
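A minimal Python sketch of such a local store, assuming datasets can be kept as files in a cache directory. The `cached_fetch` helper and the `download` callable are hypothetical; a real version would wrap the actual source (InfluxDB or similar). Data is downloaded once, later runs read the local copy and work offline, and the files stay open to inspection.

```python
from pathlib import Path
from typing import Callable


def cached_fetch(name: str, cache_dir: Path, download: Callable[[], bytes]) -> bytes:
    """Return the dataset `name`, downloading it only if no local copy exists."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / name
    if target.exists():
        return target.read_bytes()  # offline path: no network access needed
    data = download()               # only reached on the first request
    target.write_bytes(data)
    return data
```

Because the store is just a directory of files in this sketch, users could inspect it with their usual tools; a database-backed store would trade that simplicity for query support.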

@suvayu
Member

suvayu commented Dec 1, 2023

Pros/Cons of Access

  • Can easily see data (once you know where it is)
  • Easy to learn how to edit
  • Takes a long time to edit
  • Sometimes you don't know where the data is
  • Huge tables make it slow even loading/filtering

@clizbe Do you know SQL? Is it fair to expect someone who is doing analysis to know/learn a bit of SQL?

@clizbe
Member Author

clizbe commented Jan 15, 2024

@suvayu Sorry I don't know if I responded in person.
Learning SQL is totally feasible. I don't think our current modellers know it. (I've used it once.)
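To give a sense of the level of SQL involved, here is a small Python sketch using the standard-library `sqlite3` module. The table layout and the numbers are invented for illustration and do not reflect any agreed Tulipa schema. It answers a question like "what is the total wind capacity?" with one query.

```python
import sqlite3

# In-memory database with a made-up `assets` table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assets (name TEXT, technology TEXT, capacity_mw REAL)")
conn.executemany(
    "INSERT INTO assets VALUES (?, ?, ?)",
    [
        ("wind_north", "wind", 120.0),
        ("wind_south", "wind", 80.0),
        ("gas_1", "gas", 300.0),
    ],
)

# Total installed capacity per technology
rows = conn.execute(
    "SELECT technology, SUM(capacity_mw) FROM assets GROUP BY technology"
).fetchall()
totals = dict(rows)
conn.close()
```

Filtering and replacing scenario data would similarly be single `DELETE` and `INSERT` statements.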

@clizbe
Member Author

clizbe commented Feb 6, 2024

Compiling the model takes a lot of time (Julia thing) with future runs going faster. How are we dealing with this in the workflow? Is the stable version of Tulipa something that compiles once and then can take any data through it? Or will the scenario define a model that needs precompiling before doing multiple runs?

@suvayu
Member

suvayu commented Feb 6, 2024

I think this request needs to be separated according to use case. For example, if you changed an input dataset, naively, you have to rerun. However if you say "I'm doing a sensitivity study, and my changes are only limited to X" then theoretically the repetitions need not start from scratch. But I think that's a very advanced feature which requires deep technical research. AFAIU, this is in @g-moralesespana and @datejada's wishlist (GUSS in GAMS). But there could be simpler use cases between these two extremes.

That said, I'm not sure whether this would fall under the purview of pipeline/workflow or model building. My hunch is, it'll depend on the use case.

I hope that makes sense :-P

@clizbe
Member Author

clizbe commented Feb 7, 2024

Yeah I figured I'd comment here in case it's a simple answer, but it's probably a bigger discussion.

This is becoming an issue with Spine, so it's good to think about it early.

@datejada
Member

For the ENTSO-E database I found this, but I'm not sure if we have access (or if we could have). It might be interesting to explore it.

https://www.linkedin.com/posts/activity-7140005469414133760-f4XH/?utm_source=share&utm_medium=member_desktop

@datejada
Member

@nope82 commented the following about ENTSO-E:

From just a quick check, it seems that this PEMMDB is only accessible to TSOs (author's comment: “Sadly no, (data transparency) it is only for sharing between TSO members”). When looking for access to the data, I only found a regulatory document from ACER about the ERAA study asking for the PEMMDB data:

“On 23 November 2021, ACER requested ENTSO-E to provide all input data for the ERAA 2021. On 2 December 2021, ENTSO-E provided ACER with access to the pan-European market modelling database (PEMMDB) and the assumptions for the economic viability assessment (EVA)”.

So it seems that ENTSO-E is the only one that could grant access, and that access is probably a one-time arrangement for specific data (or requires recurring access requests) rather than completely open access to the data.

@clizbe
Member Author

clizbe commented Jul 29, 2024

@clizbe Reorganize the info here and close this issue

@clizbe
Member Author

clizbe commented Sep 19, 2024

Stale issue - ongoing efforts moved to other places (links provided)


5 participants