I am testing the EntsoePandasClient and getting inconsistent formatting of results of query_generation(), in several ways.
In all cases, I am asking for data from 2019 to 2023, i.e., I am calling client.query_generation(country_code, start=pd.Timestamp('20190101', tz='Europe/Brussels'), end=pd.Timestamp('20240101', tz='Europe/Brussels')).
For Germany (country_code = 'DE_LU'), the result has multi-indexed columns:
Biomass Fossil Brown coal/Lignite Fossil Coal-derived gas Fossil Gas ... Solar Waste Wind Offshore Wind Onshore
Actual Aggregated Actual Aggregated Actual Aggregated Actual Aggregated Actual Consumption ... Actual Consumption Actual Aggregated Actual Aggregated Actual Aggregated Actual Consumption
2019-01-01 00:00:00+01:00 4812.0 6932.0 273.0 3410.0 1.0 ... NaN 783.0 3177.0 19366.0 NaN
2019-01-01 00:15:00+01:00 4828.0 6351.0 481.0 3295.0 1.0 ... NaN 772.0 3174.0 20132.0 NaN
2019-01-01 00:30:00+01:00 4834.0 6221.0 481.0 3228.0 1.0 ... NaN 779.0 3167.0 20863.0 NaN
The same query for Denmark's DK-1 zone (country_code='DK_1') returns single-indexed columns:
Biomass (Biomass, Actual Aggregated) (Biomass, Actual Consumption) ... (Wind Offshore, Actual Consumption) (Wind Onshore, Actual Aggregated) (Wind Onshore, Actual Consumption)
2019-01-01 00:00:00+01:00 NaN 79.0 NaN ... NaN 2330.0 NaN
2019-01-01 01:00:00+01:00 NaN 62.0 NaN ... NaN 2427.0 NaN
2019-01-01 03:00:00+01:00 NaN 62.0 NaN ... NaN 2290.0 NaN
2019-01-01 04:00:00+01:00 NaN 58.0 NaN ... NaN 2229.0 NaN
Note that some of the column names are tuples, but .columns is still a plain Index, not a MultiIndex as it is for Germany.
What's even worse, the column assignment changes when I use the psr_type argument in the call. To illustrate this, consider all columns for offshore wind from the previous dataframe:
Wind Offshore (Wind Offshore, Actual Aggregated) (Wind Offshore, Actual Consumption)
2019-01-01 00:00:00+01:00 NaN 638.0 NaN
2019-01-01 01:00:00+01:00 NaN 686.0 NaN
2019-01-01 02:00:00+01:00 NaN 296.0 NaN
2019-01-01 03:00:00+01:00 NaN 289.0 NaN
2019-01-01 04:00:00+01:00 NaN 283.0 NaN
... ... ... ...
2023-12-31 19:00:00+01:00 1129.0 NaN NaN
2023-12-31 20:00:00+01:00 1093.0 NaN NaN
2023-12-31 21:00:00+01:00 1165.0 NaN NaN
2023-12-31 22:00:00+01:00 1191.0 NaN NaN
2023-12-31 23:00:00+01:00 1163.0 NaN NaN
There, we can see that the values actually switch columns somewhere during the period. EDIT: It turns out the data switch columns several times: they are in (Wind Offshore, Actual Aggregated) in 2019 and 2021, and in Wind Offshore in 2020, 2022, and 2023.
Also note the inconsistency in naming: the first column's name is a string, while the other two are tuples.
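As a stopgap on the user side, the split columns can be merged back into one series. The following is only a sketch under my own assumptions: the function name `coalesce_aggregated` is hypothetical, and it assumes that the bare `Wind Offshore` column and the `(Wind Offshore, Actual Aggregated)` column carry the same quantity and are never both filled for the same timestamp. If both were filled, the value from the column that comes first in the frame would win silently.

```python
import pandas as pd


def coalesce_aggregated(df: pd.DataFrame, psr_name: str,
                        level: str = "Actual Aggregated") -> pd.Series:
    """Merge the bare-string column and the (name, level) tuple column
    into one series, taking the first non-NaN value in each row.
    Hypothetical workaround, not part of entsoe-py."""
    wanted = {psr_name, (psr_name, level)}
    # Select by position so the code works for both a plain Index with
    # mixed labels and a MultiIndex.
    positions = [i for i, col in enumerate(df.columns) if col in wanted]
    if not positions:
        raise KeyError(psr_name)
    block = df.iloc[:, positions]
    # Back-fill along each row, then keep the first column of the result.
    merged = block.bfill(axis=1).iloc[:, 0]
    return merged.rename(psr_name)


nan = float("nan")
index = pd.date_range("2019-01-01", periods=4, freq="h", tz="Europe/Brussels")

# Toy data: values sit in the tuple column first, then in the string column.
toy = pd.DataFrame(
    [[nan, 638.0, nan],
     [nan, 686.0, nan],
     [1129.0, nan, nan],
     [1093.0, nan, nan]],
    index=index,
    columns=["Wind Offshore",
             ("Wind Offshore", "Actual Aggregated"),
             ("Wind Offshore", "Actual Consumption")],
)

offshore = coalesce_aggregated(toy, "Wind Offshore")
print(offshore)
```

This only hides the symptom; the column assignment itself still looks like a bug to me.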
On the other hand, asking only for offshore wind with psr_type='B18' returns
(Wind Offshore, Actual Aggregated) (Wind Offshore, Actual Consumption) Wind Offshore
2019-01-01 00:00:00+01:00 NaN NaN 638.0
2019-01-01 01:00:00+01:00 NaN NaN 686.0
2019-01-01 02:00:00+01:00 NaN NaN 296.0
2019-01-01 03:00:00+01:00 NaN NaN 289.0
2019-01-01 04:00:00+01:00 NaN NaN 283.0
... ... ... ...
2023-12-31 19:00:00+01:00 NaN NaN 1129.0
2023-12-31 20:00:00+01:00 NaN NaN 1093.0
2023-12-31 21:00:00+01:00 NaN NaN 1165.0
2023-12-31 22:00:00+01:00 NaN NaN 1191.0
2023-12-31 23:00:00+01:00 NaN NaN 1163.0
i.e., the values are in the Wind Offshore column in all years. EDIT: The values turned out to be in column (Wind Offshore, Actual Aggregated) in 2021.
In other words, the values one gets with the psr_type argument are not a subset of the values one gets without it, which is what I would expect.
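The subset expectation can be written down as a check. This is a sketch with a hypothetical helper name, `columns_not_reproduced`, and toy data; it assumes both frames cover the same timestamps. It lists every column of the filtered result that is either missing from the full result or holds different values there (NaNs in the same positions count as equal).

```python
import pandas as pd


def columns_not_reproduced(full: pd.DataFrame, filtered: pd.DataFrame) -> list:
    """Return the labels of columns in `filtered` that do not appear
    with identical values in `full` (hypothetical helper)."""
    full_pos = {col: i for i, col in enumerate(full.columns)}
    mismatched = []
    for j, col in enumerate(filtered.columns):
        if col not in full_pos:
            mismatched.append(col)
            continue
        a = full.iloc[:, full_pos[col]]
        b = filtered.iloc[:, j]
        # Series.equals treats NaNs at the same location as equal.
        if not a.equals(b):
            mismatched.append(col)
    return mismatched


nan = float("nan")
agg = ("Wind Offshore", "Actual Aggregated")
cons = ("Wind Offshore", "Actual Consumption")

# Toy stand-in for the result without psr_type.
full = pd.DataFrame([[nan, 638.0, nan], [1129.0, nan, nan]],
                    columns=["Wind Offshore", agg, cons])
# Toy stand-in for the result with psr_type='B18'.
filtered = pd.DataFrame([[nan, nan, 638.0], [nan, nan, 1129.0]],
                        columns=[agg, cons, "Wind Offshore"])

print(columns_not_reproduced(full, filtered))
```

An empty list would mean the filtered result is a proper subset of the full one; with data shaped like mine, it is not.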