Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: cumsum function does not work as expected in Bigquery backend due to ibis default window frame logic #10699

Open
1 task done
maxshine opened this issue Jan 21, 2025 · 1 comment · May be fixed by #10700
Open
1 task done
Assignees
Labels
bug Incorrect behavior inside of ibis

Comments

@maxshine
Copy link
Contributor

What happened?

What happened?

I would like to calculate the cumulative sum of a target column by rows by using ibis. Here is a sample code snippet used:

import ibis
import ibis.backends.bigquery as bigquery

con = ibis.backends.bigquery.connect(project_id="pers-decision-engine-dev", dataset_id="pde_food_sow_20250121_dktlu3ov")
tbl = con.table("pers-decision-engine-dev.pde_food_sow_20250121_dktlu3ov.ranking_stg_chosen_exploit_offers_grow")
tbl_muted = tbl.mutate(running_cost=tbl.cost.cumsum(order_by="alpha"))
print(ibis.to_sql(tbl_muted))

The generated SQL is

SELECT
  `t0`.`cust_id`,
  `t0`.`campaign_id`,
  `t0`.`cost`,
  `t0`.`model_predicted_cost`,
  `t0`.`calibration_factor`,
  `t0`.`rewards_estimate`,
  `t0`.`static_rand`,
  `t0`.`roi`,
  `t0`.`alpha`,
  SUM(`t0`.`cost`) OVER (ORDER BY `t0`.`alpha` ASC) AS `running_cost` -- problematic window function to work out cumulative sum by rows
FROM `pers-decision-engine-dev`.`pde_food_sow_20250121_dktlu3ov`.`ranking_stg_chosen_exploit_offers_grow` AS `t0`

According to the paper searched in google here the Bigquery logic with default window frame is :

Image

I did a experiment to prove this Bigquery behavior with different window frames by the calculation of the cumulative sum by rows:

Image

From above result, It can tell that only the rows_window_frame_running_rewards with explict rows window frame giving out the correct running sum result.

Therefore, an explicit rows window frame must be respected by ibis instead of dropping the window spec and using the default window frame when the range is BETWEEN UNBOUND PRECEDING AND CURRENT ROW.

The Bigquery default window frame behavior shouldn't be used by ibis to implement the cumusum function and this logic further impacts the cumulative_window, rows_window and et al.

What version of ibis are you using?

As limited testing, the problem exists:

  • 9.2.0
  • 9.5.0

What backend(s) are you using, if any?

big query

Relevant log output

(venv) ➜  ibis-debug pip list | grep ibis
ibis-framework                9.5.0
(venv) ➜  ibis-debug python -V
Python 3.10.16
(venv) ➜  ibis-debug

Code of Conduct

  • I agree to follow this project's Code of Conduct
@maxshine maxshine added the bug Incorrect behavior inside of ibis label Jan 21, 2025
@maxshine
Copy link
Contributor Author

/take

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Incorrect behavior inside of ibis
Projects
Status: backlog
Development

Successfully merging a pull request may close this issue.

1 participant