You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I would like to calculate the cumulative sum of a target column by rows by using ibis. Here is a sample code snippet used:
import ibis
import ibis.backends.bigquery as bigquery
con = ibis.backends.bigquery.connect(project_id="pers-decision-engine-dev", dataset_id="pde_food_sow_20250121_dktlu3ov")
tbl = con.table("pers-decision-engine-dev.pde_food_sow_20250121_dktlu3ov.ranking_stg_chosen_exploit_offers_grow")
tbl_muted = tbl.mutate(running_cost=tbl.cost.cumsum(order_by="alpha"))
print(ibis.to_sql(tbl_muted))
The generated SQL is
SELECT
`t0`.`cust_id`,
`t0`.`campaign_id`,
`t0`.`cost`,
`t0`.`model_predicted_cost`,
`t0`.`calibration_factor`,
`t0`.`rewards_estimate`,
`t0`.`static_rand`,
`t0`.`roi`,
`t0`.`alpha`,
SUM(`t0`.`cost`) OVER (ORDER BY `t0`.`alpha` ASC) AS `running_cost` -- problematic window function to work out cumulative sum by rows
FROM `pers-decision-engine-dev`.`pde_food_sow_20250121_dktlu3ov`.`ranking_stg_chosen_exploit_offers_grow` AS `t0`
I did a experiment to prove this Bigquery behavior with different window frames by the calculation of the cumulative sum by rows:
From above result, It can tell that only the rows_window_frame_running_rewards with explict rows window frame giving out the correct running sum result.
Therefore, an explicit rows window frame must be respected by ibis instead of dropping the window spec and using the default window frame when the range is BETWEEN UNBOUND PRECEDING AND CURRENT ROW.
The Bigquery default window frame behavior shouldn't be used by ibis to implement the cumusum function and this logic further impacts the cumulative_window, rows_window and et al.
What happened?
What happened?
I would like to calculate the cumulative sum of a target column by rows by using ibis. Here is a sample code snippet used:
The generated SQL is
According to the paper searched in google here the Bigquery logic with default window frame is :
I did a experiment to prove this Bigquery behavior with different window frames by the calculation of the cumulative sum by rows:
From above result, It can tell that only the
rows_window_frame_running_rewards
with explict rows window frame giving out the correct running sum result.Therefore, an explicit rows window frame must be respected by ibis instead of dropping the window spec and using the default window frame when the range is
BETWEEN UNBOUND PRECEDING AND CURRENT ROW
.The Bigquery default window frame behavior shouldn't be used by ibis to implement the
cumusum
function and this logic further impacts thecumulative_window
,rows_window
and et al.What version of ibis are you using?
As limited testing, the problem exists:
What backend(s) are you using, if any?
big query
Relevant log output
(venv) ➜ ibis-debug pip list | grep ibis ibis-framework 9.5.0 (venv) ➜ ibis-debug python -V Python 3.10.16 (venv) ➜ ibis-debug
Code of Conduct
The text was updated successfully, but these errors were encountered: