[async] Support microbatching when using ExecutionMode.AIRFLOW_ASYNC #1270

Open
1 task
tatiana opened this issue Oct 21, 2024 · 1 comment
Labels
area:execution Related to the execution environment/mode, like Docker, Kubernetes, Local, VirtualEnv, etc do-not-stale Related to stale job and dosubot

Comments

Collaborator

tatiana commented Oct 21, 2024

Context

Incremental models in dbt are a materialization strategy designed to update data warehouse tables efficiently by transforming and loading only the data that is new or has changed since the last run. Instead of reprocessing the entire dataset on every run, an incremental model appends or updates only the new rows, which significantly reduces the time and resources required for your data transformations.

Even with these benefits, incremental models as they exist today have limitations, such as:

  • The burden is on you to calculate what is “new”: what has already been loaded, what still needs to be loaded, and so on.
  • They can be slow when there are many partitions to process (for example, when running in full-refresh mode), because everything is done in one big SQL statement. That statement can time out, and if it fails you end up retrying partitions that had already succeeded.
  • If you want to name a specific partition for your incremental model to process, you have to add extra “hacky” logic, likely using vars.
  • Data tests run on your entire model, rather than just the “new” data.
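To make the "one big SQL statement" limitation concrete, here is a minimal sketch of the batch-wise alternative: split a time range into independent daily windows and record only the windows that fail, so a retry does not repeat successful ones. This is an illustration of the idea only, not dbt's or Cosmos's implementation; `daily_batches`, `run_batches`, and the `process` callback are hypothetical names.

```python
from datetime import datetime, timedelta
from typing import Callable, Iterable, Iterator, List, Tuple

Batch = Tuple[datetime, datetime]  # half-open window: [start, end)


def daily_batches(start: datetime, end: datetime) -> Iterator[Batch]:
    """Yield one-day windows that together cover [start, end)."""
    cursor = start
    while cursor < end:
        batch_end = min(cursor + timedelta(days=1), end)
        yield cursor, batch_end
        cursor = batch_end


def run_batches(
    batches: Iterable[Batch],
    process: Callable[[datetime, datetime], None],
) -> List[Batch]:
    """Call `process` once per batch and return the batches that raised.

    `process` is a hypothetical stand-in for whatever runs the
    transformation for a single window.
    """
    failed: List[Batch] = []
    for batch in batches:
        try:
            process(*batch)
        except Exception:
            # Keep going: one bad window should not block the others.
            failed.append(batch)
    return failed
```

Calling `run_batches(failed, process)` again would then retry only the windows that did not succeed.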

dbt-labs/dbt-core#10624

Acceptance criteria

  • ExecutionMode.AIRFLOW_ASYNC can leverage dbt microbatching strategies
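
As a rough illustration of what "leveraging microbatching" in an asynchronous setting could look like, the sketch below submits independent batches concurrently with plain `asyncio` and reports which ones failed. It makes no claim about how Cosmos or Airflow would implement this; `run_batches_concurrently`, `submit`, and the `(start, end)` batch shape are assumptions made for the example.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, Tuple

Batch = Tuple[str, str]  # (start, end) as ISO date strings; hypothetical shape


async def run_batches_concurrently(
    batches: Sequence[Batch],
    submit: Callable[[Batch], Awaitable[None]],
    max_concurrency: int = 4,
) -> List[Batch]:
    """Submit every batch, at most `max_concurrency` at a time.

    Returns the batches whose submission raised, in input order.
    `submit` is a hypothetical coroutine that runs one batch.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(batch: Batch) -> None:
        async with semaphore:
            await submit(batch)

    # return_exceptions=True keeps one failure from cancelling the rest.
    results = await asyncio.gather(
        *(guarded(b) for b in batches), return_exceptions=True
    )
    return [b for b, r in zip(batches, results) if isinstance(r, Exception)]
```

Because each batch is independent, a failed batch can be resubmitted on its own without touching the ones that completed.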
@dosubot dosubot bot added the area:execution Related to the execution environment/mode, like Docker, Kubernetes, Local, VirtualEnv, etc label Oct 21, 2024

This issue is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Nov 20, 2024
@pankajastro pankajastro added do-not-stale Related to stale job and dosubot and removed stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed labels Nov 20, 2024