
inconsistent format of results of EntsoePandasClient.query_generation() #328

Open
mkaut opened this issue May 2, 2024 · 1 comment
Labels: more info needed (when more info is needed from the person opening the issue)


mkaut commented May 2, 2024

I am testing EntsoePandasClient and getting inconsistently formatted results from query_generation(), in several ways.
In all cases, I am asking for data from 2019 to 2023, i.e., I am calling client.query_generation(country_code, start=pd.Timestamp('20190101', tz='Europe/Brussels'), end=pd.Timestamp('20240101', tz='Europe/Brussels')).
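For reference, the calls can be reproduced with a script along these lines. This is a minimal sketch, assuming entsoe-py is installed and an API key is available in an environment variable; the variable name ENTSOE_API_KEY and the helper fetch_generation are hypothetical, chosen only for illustration.

```python
import os

import pandas as pd

# Query window used for every call in this report: from 2019-01-01 up to
# (not including) 2024-01-01, Brussels local time.
START = pd.Timestamp('20190101', tz='Europe/Brussels')
END = pd.Timestamp('20240101', tz='Europe/Brussels')


def fetch_generation(country_code, psr_type=None):
    """Run the generation query from this report (hypothetical helper).

    Assumes entsoe-py is installed and the API key is stored in the
    hypothetical ENTSOE_API_KEY environment variable.
    """
    # Imported inside the function so this module loads without entsoe-py.
    from entsoe import EntsoePandasClient

    client = EntsoePandasClient(api_key=os.environ['ENTSOE_API_KEY'])
    return client.query_generation(country_code, start=START, end=END,
                                   psr_type=psr_type)
```

Calling fetch_generation('DE_LU'), fetch_generation('DK_1') and fetch_generation('DK_1', psr_type='B18') should correspond to the three queries whose output is shown in this report.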

For Germany (country_code='DE_LU'), the result has multi-indexed columns:

                                    Biomass Fossil Brown coal/Lignite Fossil Coal-derived gas        Fossil Gas                     ...              Solar             Waste     Wind Offshore      Wind Onshore
                          Actual Aggregated         Actual Aggregated       Actual Aggregated Actual Aggregated Actual Consumption  ... Actual Consumption Actual Aggregated Actual Aggregated Actual Aggregated Actual Consumption
2019-01-01 00:00:00+01:00            4812.0                    6932.0                   273.0            3410.0                1.0  ...                NaN             783.0            3177.0           19366.0                NaN
2019-01-01 00:15:00+01:00            4828.0                    6351.0                   481.0            3295.0                1.0  ...                NaN             772.0            3174.0           20132.0                NaN
2019-01-01 00:30:00+01:00            4834.0                    6221.0                   481.0            3228.0                1.0  ...                NaN             779.0            3167.0           20863.0                NaN
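With two-level columns like these, generation and consumption can be separated by selecting on the second level. A minimal sketch using pandas' DataFrame.xs; the helper aggregated_generation is hypothetical, and the toy frame only mimics the layout printed above with a few of its columns.

```python
import pandas as pd


def aggregated_generation(df):
    """Return only the 'Actual Aggregated' columns, keyed by production type.

    Assumes df.columns is a two-level MultiIndex like the DE_LU result.
    xs drops the selected level, so the result has flat column labels.
    """
    return df.xs('Actual Aggregated', axis=1, level=1)


# Toy frame with the same two-level layout (a few columns only).
cols = pd.MultiIndex.from_tuples([
    ('Biomass', 'Actual Aggregated'),
    ('Wind Onshore', 'Actual Aggregated'),
    ('Wind Onshore', 'Actual Consumption'),
])
toy = pd.DataFrame([[4812.0, 19366.0, float('nan')]], columns=cols)
print(aggregated_generation(toy).columns.tolist())  # ['Biomass', 'Wind Onshore']
```

This relies on the columns being a MultiIndex, which is exactly what the DK_1 result lacks.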

The same query for Denmark's DK-1 zone (country_code='DK_1') returns single-indexed columns:

                           Biomass  (Biomass, Actual Aggregated)  (Biomass, Actual Consumption)  ...  (Wind Offshore, Actual Consumption)  (Wind Onshore, Actual Aggregated)  (Wind Onshore, Actual Consumption)
2019-01-01 00:00:00+01:00      NaN                          79.0                            NaN  ...                                  NaN                             2330.0                                 NaN
2019-01-01 01:00:00+01:00      NaN                          62.0                            NaN  ...                                  NaN                             2427.0                                 NaN
2019-01-01 03:00:00+01:00      NaN                          62.0                            NaN  ...                                  NaN                             2290.0                                 NaN
2019-01-01 04:00:00+01:00      NaN                          58.0                            NaN  ...                                  NaN                             2229.0                                 NaN

Note that most column names are tuples (the first one, Biomass, is a plain string), yet .columns is still a flat Index rather than a MultiIndex as for Germany.
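One way to bring such a result into the same shape as the DE_LU one is to rebuild the columns as a MultiIndex on the user side. A minimal sketch; the helper to_multiindex_columns is hypothetical, and giving bare string labels an empty second level is a choice made here so that no guess is made about which quantity those columns hold.

```python
import pandas as pd


def to_multiindex_columns(df):
    """Return a copy of df whose columns form a two-level MultiIndex.

    Tuple labels such as ('Biomass', 'Actual Aggregated') are kept as-is.
    Bare string labels such as 'Biomass' get an empty second level, so no
    assumption is made about which quantity they hold.
    """
    out = df.copy()
    if not isinstance(out.columns, pd.MultiIndex):
        out.columns = pd.MultiIndex.from_tuples(
            [col if isinstance(col, tuple) else (col, '') for col in out.columns]
        )
    return out
```

The original frame is left untouched; a result that already has MultiIndex columns is simply copied.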

What's even worse, the column assignment changes when I use the psr_type argument in the call. To illustrate this, consider all columns for offshore wind from the previous dataframe:

                           Wind Offshore  (Wind Offshore, Actual Aggregated)  (Wind Offshore, Actual Consumption)
2019-01-01 00:00:00+01:00            NaN                               638.0                                  NaN
2019-01-01 01:00:00+01:00            NaN                               686.0                                  NaN
2019-01-01 02:00:00+01:00            NaN                               296.0                                  NaN
2019-01-01 03:00:00+01:00            NaN                               289.0                                  NaN
2019-01-01 04:00:00+01:00            NaN                               283.0                                  NaN
...                                  ...                                 ...                                  ...
2023-12-31 19:00:00+01:00         1129.0                                 NaN                                  NaN
2023-12-31 20:00:00+01:00         1093.0                                 NaN                                  NaN
2023-12-31 21:00:00+01:00         1165.0                                 NaN                                  NaN
2023-12-31 22:00:00+01:00         1191.0                                 NaN                                  NaN
2023-12-31 23:00:00+01:00         1163.0                                 NaN                                  NaN

Here we can see that the values actually switch columns somewhere during the period.
EDIT: It turns out the data switch columns several times: they are in (Wind Offshore, Actual Aggregated) in 2019 and 2021, and in Wind Offshore in 2020, 2022, and 2023.
Also note the inconsistent naming: the first column's name is a string, while the other two are tuples.
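The year-by-year switching can be checked by counting non-missing values per year in each offshore-wind column, and, as a stopgap, the split columns can be merged into one series. A minimal sketch; both helpers are hypothetical, and treating the bare Wind Offshore column as the same quantity as (Wind Offshore, Actual Aggregated) is an assumption, not something confirmed by entsoe-py.

```python
import pandas as pd


def nonnull_counts_by_year(df, columns):
    """Count non-missing values per calendar year in each listed column.

    Columns are looked up one at a time, so string and tuple labels can be
    mixed; the result's columns are the labels converted with str().
    """
    years = df.index.year
    return pd.DataFrame(
        {str(col): df[col].notna().groupby(years).sum() for col in columns}
    )


def merge_split_column(df, bare, aggregated):
    """Combine a bare-string column with its 'Actual Aggregated' twin.

    ASSUMPTION: both labels hold the same quantity (aggregated generation)
    and at most one of them is filled at any timestamp. Where both are
    filled, the value from `aggregated` is kept.
    """
    return df[aggregated].combine_first(df[bare])
```

Applied to the full DK_1 result with the three offshore-wind labels, nonnull_counts_by_year would show in which years each label is populated.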

On the other hand, asking only for offshore wind with psr_type='B18' returns

                           (Wind Offshore, Actual Aggregated)  (Wind Offshore, Actual Consumption)  Wind Offshore
2019-01-01 00:00:00+01:00                                 NaN                                  NaN          638.0
2019-01-01 01:00:00+01:00                                 NaN                                  NaN          686.0
2019-01-01 02:00:00+01:00                                 NaN                                  NaN          296.0
2019-01-01 03:00:00+01:00                                 NaN                                  NaN          289.0
2019-01-01 04:00:00+01:00                                 NaN                                  NaN          283.0
...                                                       ...                                  ...            ...
2023-12-31 19:00:00+01:00                                 NaN                                  NaN         1129.0
2023-12-31 20:00:00+01:00                                 NaN                                  NaN         1093.0
2023-12-31 21:00:00+01:00                                 NaN                                  NaN         1165.0
2023-12-31 22:00:00+01:00                                 NaN                                  NaN         1191.0
2023-12-31 23:00:00+01:00                                 NaN                                  NaN         1163.0

i.e., the values are in the Wind Offshore column in all years.
EDIT: The values turned out to be in the (Wind Offshore, Actual Aggregated) column in 2021.

In other words, the values returned with the psr_type argument are not a subset of those returned without it, contrary to what I would expect.
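That expectation can be written as a check: every column of the psr_type='B18' result should appear in the unfiltered result with identical values. A minimal sketch; is_column_subset is a hypothetical helper, and the test frames reuse two rows from the tables above.

```python
import pandas as pd


def is_column_subset(full, filtered):
    """True if every column of `filtered` exists in `full` with identical values.

    NaNs in the same positions count as equal (Series.equals semantics);
    `full` is aligned to the index of `filtered` before comparing.
    """
    for col in filtered.columns:
        if col not in full.columns:
            return False
        if not full[col].reindex(filtered.index).equals(filtered[col]):
            return False
    return True
```

With the two DK_1 results above, this check would return False: (Wind Offshore, Actual Aggregated) holds 638.0 at 2019-01-01 00:00 in the unfiltered result but NaN in the filtered one.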

fboerman (Collaborator) commented May 2, 2024 via email
