Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace the foreign composite key on TI to ti.id #44147

Open
kaxil opened this issue Nov 18, 2024 · 3 comments
Open

Replace the foreign composite key on TI to ti.id #44147

kaxil opened this issue Nov 18, 2024 · 3 comments
Assignees
Labels
area:db-migrations PRs with DB migration area:task-execution-interface-aip72 AIP-72: Task Execution Interface (TEI) aka Task SDK
Milestone

Comments

@kaxil
Copy link
Member

kaxil commented Nov 18, 2024

Now that we have UUID primary key on TI table (since #43243), let's adjust foreign key constraints on tables like XCom, Task Reschedule, TI note etc

@ashb
Copy link
Member

ashb commented Feb 3, 2025

Thinking through this a little bit more @jedcunningham and I have decided that:

TaskReschedule, XCom, TaskMap and TINote should not be linked to a specific TI Try/TIHistory row and should instead apply to the whole "dag run task" (a concept which doesn't currently exist in Airflow 2 or 3 or even have a good name).

Our thinking is:

  • TI Note: For UX reasons note should be shared across all attempts of a Task
  • TaskMap and TaskReschedule don't serve any value in storing beyond the currently active task (so they could either be FKd to a specific TI uuid, or be deleted when the TI state is terminal.)
  • XCom: Bit of a funny one, we clear the XCom value on start of the next attempt anyway, so we don't ever store history. However there could be some value in some of these being tied to an atempt. For example, many BaseOperatorLink subclasses store the URL in xcom, and being able to see external job logs (EMR, DataProc etc) for a previous attempt would be a nice feature.

The only other relationships onto TaskInstance table is TIHistory and RenderedTaskInstanceFIelds.

While it might be nice to see some Xcom values and the RTIF for a given TIHistory row, the cost of keeping TIHistory in sync with TaskInstance changes and the shenanigans we'd need to pull to get the FKs update means we don't need to update anything.

One option if we only want to avoid composite PKs of (dag_id,run_id,task_id,map_index) would be to create a "Dag Task id" (this name sucks though) column which is an integer/UUID key that is unique over (dag_id,run_id)

For example, we might store these TI history rows

 dag_id | run_id | task_id |                NEW_ID                |                  id                  | map_index | try_number
--------+--------+---------+--------------------------------------+--------------------------------------+-----------+------------
 dag1   | run1   | my_task | 64379755-f717-4463-9f8b-5e2cb16c74f8 | f4578f24-1ecb-4ae1-b23d-2da43863cf80 |        -1 |          1
 dag1   | run1   | my_task | 64379755-f717-4463-9f8b-5e2cb16c74f8 | 7fdd63b8-b6da-4726-8aab-423612dc98b7 |        -1 |          2

(Note: id is 100% unique, each mapped task/map_index value gets a unique id).

and this "active" TI:

 dag1   | run1   | my_task | 64379755-f717-4463-9f8b-5e2cb16c74f8 | 87dc53d3-d46c-4e80-a3fe-e7a986879951 |        -1 |          3

The goal of the id column in the first place was driven by the Task Execution Interface needing a single value (that is easy/convient to use) that will uniquely identify the TI attempt. Also the fact that the number of "PK" columns in TI keeps growing, and adding a UUID pk now makes that tennable going forward to add more "ID" columns without ever bloating the FKs etc.

@potiuk
Copy link
Member

potiuk commented Feb 5, 2025

I am glad then that we have not started the work on multi-team in 3.0 - because those are the kind of changes / dependencies I expected to be worked out /cleaned before we complete 3.0.

@potiuk
Copy link
Member

potiuk commented Feb 5, 2025

cc: @o-nikolas

@phanikumv phanikumv moved this from Todo to Next 2 Weeks in AIP-72 - Task Execution Interface and SDK Feb 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:db-migrations PRs with DB migration area:task-execution-interface-aip72 AIP-72: Task Execution Interface (TEI) aka Task SDK
Projects
Development

Successfully merging a pull request may close this issue.

4 participants