Add history mode guide #49

Draft
wants to merge 14 commits into
base: main
6 changes: 4 additions & 2 deletions development-guide.md
@@ -105,12 +105,14 @@ This operation should report all columns in the destination table, including Fiv
- This operation might be requested for a table that does not exist in the destination. In that case, it should NOT fail, simply ignore the request and return `success = true`.
- `utc_delete_before` has millisecond precision.

#### WriteBatchRequest
- `replace_files` is for the `upsert` operation, where rows are inserted if they don't exist or updated if they do. Each row will always provide values for all columns. Set the `_fivetran_synced` column in the destination with the values coming in from the CSV files.

- `update_files` is for the `update` operation, where modified columns have actual values whereas unmodified columns have the special value `unmodified_string` in `CsvFileParams`. Soft-deleted rows will arrive here as well. Update the `_fivetran_synced` column in the destination with the values coming in from the CSV files.

- `delete_files` is for the `hard delete` operation. Use primary key columns (or the `_fivetran_id` system column for primary-keyless tables) to perform `DELETE FROM`.

Note: To handle `replace_files`, `update_files` and `delete_files` in history mode, follow the [How to Handle History Mode Data](how-to-handle-history-mode-files.md) guide.

Also, Fivetran will deduplicate operations such that each primary key will show up only once in any of the operations.

220 changes: 220 additions & 0 deletions how-to-handle-history-mode-files.md
@@ -0,0 +1,220 @@
#### What is History Mode

History mode allows us to capture every version of each record processed by the Fivetran connectors.
In order to keep all versions of a record, we have introduced three new system columns for tables with history mode enabled.


Column | Type | Description
--- | --- | ---
_fivetran_active | Boolean | TRUE if it is the currently active record. FALSE if it is a historical version of the record. Only one version of the record can be TRUE.
_fivetran_start | TimeStamp | The time when the record was first created or modified in the source.
_fivetran_end | TimeStamp | The value for this column depends on whether the record is active. If the record is not active (`_fivetran_active` = FALSE), then `_fivetran_end` is the `_fivetran_start` of the next version of the record minus 1 millisecond. If the record is deleted, then the value is the same as the timestamp of the delete operation. If the record is active (`_fivetran_active` = TRUE), then `_fivetran_end` is the maximum value allowed for a TIMESTAMP column.
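
To make the relationship between the three columns concrete, here is a minimal sketch that derives `_fivetran_end` and `_fivetran_active` for successive versions of one record. It assumes timestamps are integer epoch milliseconds; `TMAX` and `build_versions` are hypothetical names, and the `TMAX` value is only one possible stand-in for a destination's maximum TIMESTAMP.

```python
# Assumption: timestamps are integer epoch milliseconds, and TMAX stands in
# for the destination's maximum TIMESTAMP value (one possible choice).
TMAX = 253402300799999  # 9999-12-31T23:59:59.999Z


def build_versions(start_times):
    """Return history rows for one record, given each version's start time."""
    starts = sorted(start_times)
    rows = []
    for i, start in enumerate(starts):
        is_last = i == len(starts) - 1
        rows.append({
            "_fivetran_start": start,
            # A historical version ends 1 ms before the next version starts;
            # the active version ends at TMAX.
            "_fivetran_end": TMAX if is_last else starts[i + 1] - 1,
            "_fivetran_active": is_last,
        })
    return rows


versions = build_versions([1000, 2000, 3000])
```

Only the last version comes out active, and each earlier version ends one millisecond before its successor starts.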


#### Points to remember in history mode

- In `WriteBatchRequest`, we pass a new optional field, `HistoryMode`, which indicates whether the connector is in history mode. This field carries the `deleted_column` column name, which we need to modify only if it is present in the destination.
- If the existing table is not empty, the batch file also includes a boolean column `_fivetran_earliest`. If an `upsert` contains multiple versions of the same record in a flush, `_fivetran_earliest` is set to `TRUE` for the version with the earliest `_fivetran_start` and to `FALSE` for the rest of the versions.
- For each Replace, Update and Delete batch file, DELETE the existing records from the destination table if the `_fivetran_start` of the destination table is greater than or equal to the `_fivetran_start` of the batch file (refer to Replace Example 1 and Example 2).

Note: The `_fivetran_earliest` column shouldn't be added to the destination table. It is introduced to easily identify the earliest record and can be used to optimize data load queries.

Below is an example of a `replace_file`:

Id(PK) | COL1 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_earliest
---|---------|---------------------| --- |------------------| ---
1 | abc | T1 | T2-1 | FALSE | TRUE
2 | xyz | T1 | TMAX | TRUE | TRUE
1 | pqr | T2 | T3-1 | FALSE | FALSE
1 | def | T3 | TMAX | TRUE | FALSE
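
As a rough illustration of how a destination might use `_fivetran_earliest`, the sketch below reads the earliest `_fivetran_start` per primary key from the flag and strips the helper column before writing. The rows mirror the table above; the integer timestamps and the lowercase keys `id` and `col1` are assumptions made for the example.

```python
# T1..T3 and TMAX are symbolic in the guide; integer milliseconds are an
# assumption made here purely for illustration.
T1, T2, T3, TMAX = 1000, 2000, 3000, 10**15

batch = [
    {"id": 1, "col1": "abc", "_fivetran_start": T1, "_fivetran_end": T2 - 1,
     "_fivetran_active": False, "_fivetran_earliest": True},
    {"id": 2, "col1": "xyz", "_fivetran_start": T1, "_fivetran_end": TMAX,
     "_fivetran_active": True, "_fivetran_earliest": True},
    {"id": 1, "col1": "pqr", "_fivetran_start": T2, "_fivetran_end": T3 - 1,
     "_fivetran_active": False, "_fivetran_earliest": False},
    {"id": 1, "col1": "def", "_fivetran_start": T3, "_fivetran_end": TMAX,
     "_fivetran_active": True, "_fivetran_earliest": False},
]

# Earliest _fivetran_start per primary key, read straight from the flag.
earliest_start = {
    row["id"]: row["_fivetran_start"]
    for row in batch
    if row["_fivetran_earliest"]
}

# Rows as they would be written: the helper column is dropped.
rows_to_insert = [
    {k: v for k, v in row.items() if k != "_fivetran_earliest"}
    for row in batch
]
```

Because the flag already marks one row per key, the earliest start can be found without sorting or grouping the batch.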

#### How to Handle Replaces, Updates and Deletes

##### Replace

###### Example 1

This example applies when the `_fivetran_start` of the destination table is less than the `_fivetran_start` of the batch file.

Suppose the existing table in the destination is as below:

Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_synced
--- |------|----| --- | --- | --- | ---
1 | abc | 1 |T1 | T2-1 | FALSE | T100
1 | pqr | 2 | T2 | TMAX | TRUE | T101
2 | mno | 3 | T2 | TMAX | TRUE | T103

At the source, new records are added:

Id(PK) | COL1 | COL2 | Timestamp | Type
--- | --- | --- |-----------| ---
1 | def |1 | T3 | Inserted
1 | ghi | 1 | T4 | Inserted

The replace batch file will be:

Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_earliest | _fivetran_synced
--- |------|-------|---------------------| --- | --- | --- | ---
1 | def | 1 | T3 | T4-1 | FALSE | TRUE | T104
1 | ghi | 1| T4 | TMAX | TRUE | FALSE | T105


The final destination table will be:

Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_synced
--- |---|--------|---------------------| --- |------------------| ---
1 | abc | 1 | T1 | T2-1 | FALSE | T100
1 | pqr | 2 | T2 | T3-1 | FALSE | T101
2 | mno | 3 | T2 | TMAX | TRUE | T103
1 | def | 1 |T3 | T4-1 | FALSE | T104
1 | ghi | 1 | T4 | TMAX | TRUE | T105

**Explanation:**
- We got new records for id = 1.
- Find the corresponding earliest record (`_fivetran_earliest` is TRUE) and DELETE the existing records from the destination table whose `_fivetran_start` is greater than or equal to the `_fivetran_start` of that batch record (in the above example, there are none).
- Set the `_fivetran_end` of the active record in the destination table to the `_fivetran_start` of the `_fivetran_earliest` record of the batch file, minus 1 millisecond.
- Set `_fivetran_active` to FALSE for the record updated above, and set `deleted_column` (if present in the destination table) to TRUE.
- Insert the new records as is into the destination table, excluding the `_fivetran_earliest` column.
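
The steps above can be sketched over an in-memory table. This is not the SDK's implementation, only an illustration under stated assumptions: timestamps are integer milliseconds, rows are dicts, and `apply_replace`, `row` and the lowercase column keys are hypothetical names. The data is taken from Example 1.

```python
T1, T2, T3, T4, TMAX = 1000, 2000, 3000, 4000, 10**15  # symbolic times as ms


def row(id_, col1, col2, start, end, active, synced, earliest=None):
    """Build a row dict; `earliest` is only present on batch rows."""
    r = {"id": id_, "col1": col1, "col2": col2, "_fivetran_start": start,
         "_fivetran_end": end, "_fivetran_active": active,
         "_fivetran_synced": synced}
    if earliest is not None:
        r["_fivetran_earliest"] = earliest
    return r


def apply_replace(dest, batch, deleted_column=None):
    """Apply a history mode replace batch to an in-memory destination."""
    for first in (r for r in batch if r["_fivetran_earliest"]):
        pk, start = first["id"], first["_fivetran_start"]
        # Drop destination versions that start at or after the batch.
        dest[:] = [r for r in dest
                   if not (r["id"] == pk and r["_fivetran_start"] >= start)]
        # Close out the version that is still active, if any.
        for r in dest:
            if r["id"] == pk and r["_fivetran_active"]:
                r["_fivetran_end"] = start - 1
                r["_fivetran_active"] = False
                if deleted_column in r:
                    r[deleted_column] = True
    # Insert batch rows as is, without the helper column.
    dest.extend({k: v for k, v in r.items() if k != "_fivetran_earliest"}
                for r in batch)
    return dest


dest = [row(1, "abc", 1, T1, T2 - 1, False, 100),
        row(1, "pqr", 2, T2, TMAX, True, 101),
        row(2, "mno", 3, T2, TMAX, True, 103)]
batch = [row(1, "def", 1, T3, T4 - 1, False, 104, earliest=True),
         row(1, "ghi", 1, T4, TMAX, True, 105, earliest=False)]
apply_replace(dest, batch)
```

After the call, the `pqr` version is closed at T3-1, the id = 2 row is untouched, and the two batch rows are appended without `_fivetran_earliest`.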

###### Example 2

This example applies when the `_fivetran_start` of the destination table is greater than or equal to the `_fivetran_start` of the batch file.

Suppose the existing table in the destination is as below:

Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_synced
--- |---|--------|---------------------| --- |------------------| ---
1 | xyz | 4 | T1 | T3-1 | FALSE | T100
1 | abc | 1 | T3 | T4-1 | FALSE | T100
1 | pqr | 2 | T4 | TMAX | TRUE | T101
2 | mno | 3 | T4 | TMAX | TRUE | T103

At the source, new records are added:

Id(PK) | COL1 | COL2 | Timestamp | Type
--- | --- | --- | --- | ---
1 | ghi | 1 | T2 | Inserted



The replace batch file will be:

Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_earliest | _fivetran_synced
--- | --- | --- | --- | --- | --- | --- | ---
1 | ghi | 1 | T2 | TMAX | TRUE | TRUE | T104

The final destination table will be:

Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_synced
--- | --- | --- | --- | --- | --- | ---
1 | ghi | 1 | T2 | TMAX | TRUE | T104
1 | xyz | 4 | T1 | T3-1 | FALSE | T100
2 | mno | 3 | T4 | TMAX | TRUE | T103

**Explanation:**
- We got new records for id = 1.
- Find the corresponding earliest record (`_fivetran_earliest` is TRUE) and DELETE the existing records from the destination table whose `_fivetran_start` is greater than or equal to the `_fivetran_start` of that batch record (in the above example, this deletes the id = 1 records with `_fivetran_start` = T3 and T4).
- Set the `_fivetran_end` of the active record in the destination table to the `_fivetran_start` of the `_fivetran_earliest` record of the batch file, minus 1 millisecond.
- Set `_fivetran_active` to FALSE for the record updated above, and set `deleted_column` (if present in the destination table) to TRUE.
- Insert the new records as is into the destination table, excluding the `_fivetran_earliest` column.
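
For a SQL destination, the same steps might translate into three statements. The sketch below runs them against an in-memory SQLite database using the Example 2 data. The table name `dest`, the integer timestamps and the lowercase column names are assumptions, and `deleted_column` is left out because the example table does not have one.

```python
import sqlite3

T1, T2, T3, T4, TMAX = 1000, 2000, 3000, 4000, 10**15  # symbolic times as ms

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE dest (
    id INTEGER, col1 TEXT, col2 INTEGER, _fivetran_start INTEGER,
    _fivetran_end INTEGER, _fivetran_active INTEGER, _fivetran_synced INTEGER,
    PRIMARY KEY (id, _fivetran_start))""")
conn.executemany("INSERT INTO dest VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (1, "xyz", 4, T1, T3 - 1, 0, 100),
    (1, "abc", 1, T3, T4 - 1, 0, 100),
    (1, "pqr", 2, T4, TMAX, 1, 101),
    (2, "mno", 3, T4, TMAX, 1, 103),
])

# The single batch row from Example 2; it is also the _fivetran_earliest row.
new_row = (1, "ghi", 1, T2, TMAX, 1, 104)
pk, start = new_row[0], new_row[3]

# Delete destination versions that start at or after the earliest batch row.
conn.execute("DELETE FROM dest WHERE id = ? AND _fivetran_start >= ?",
             (pk, start))
# Close out the active version, if one survived the delete (none here).
conn.execute("""UPDATE dest SET _fivetran_end = ?, _fivetran_active = 0
                WHERE id = ? AND _fivetran_active = 1""", (start - 1, pk))
# Insert the batch row without the _fivetran_earliest helper column.
conn.execute("INSERT INTO dest VALUES (?, ?, ?, ?, ?, ?, ?)", new_row)

final = conn.execute(
    "SELECT id, col1, _fivetran_start, _fivetran_end, _fivetran_active "
    "FROM dest ORDER BY id, _fivetran_start").fetchall()
```

In this example no active version of id = 1 survives the delete, so the UPDATE touches no rows and the T1 version keeps its original `_fivetran_end`, which matches the final table shown above.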

##### Updates

Suppose the existing table in the destination is:

Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_synced
--- | --- | --- | --- | --- | --- | ---
1 | abc | 1 | T1 | T2-1 | FALSE | T100
1 | pqr | 2 | T2 | TMAX | TRUE | T101
2 | mno | 3 | T2 | TMAX | TRUE | T103


At the source, the record with Id = 1 is updated:

Id(PK) | COL1 | Timestamp | Type
--- | --- | --- | ---
1 | xyz | T3 | Updated



And the record with Id = 2 is updated as:

Id(PK) | COL2 | Timestamp | Type
--- | --- | --- | ---
2 | 1000 | T4 | Updated

And the record with Id = 1 is updated again as:

Id(PK) | COL1 | Timestamp | Type
--- | --- | --- | ---
1 | def | T5 | Updated



The update batch file will be:


Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_earliest | _fivetran_synced
--- | --- | --- | --- | --- | --- | --- | ---
1 | xyz | | T3| T5-1 | FALSE | TRUE | T107
2 | | 1000 | T4 | TMAX | TRUE | TRUE | T108
1 | def | | T5 | TMAX | TRUE | FALSE | T109


The final destination table will be:

Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_synced
--- | --- | --- | --- | --- | --- | ---
1 | abc | 1 | T1 | T2-1 | FALSE | T100
1 | pqr | 2 | T2 | T3-1 | FALSE | T101
2 | mno | 3 | T2 | T4-1 | FALSE | T103
1 | def | 2 | T5 | TMAX | TRUE | T109
1 | xyz | 2 | T3 | T5-1 | FALSE | T107
2 | mno | 1000 | T4 | TMAX | TRUE | T108



**Explanation:**
- In the batch file we got records with id = 1 and id = 2.
- Set the columns that were not updated to the values of the active record. In the above case, for id = 2 we didn't get a COL1 value, so we set COL1 to `mno` (the COL1 value of the active record).
- Set the `_fivetran_end` of the active record in the destination table to the `_fivetran_start` of the `_fivetran_earliest` record of the batch file, minus 1 millisecond.
- Set `_fivetran_active` to FALSE for the record updated above, and set `deleted_column` (if present in the destination table) to TRUE.
- Set the other columns in the destination table as is from the batch file, except the `_fivetran_earliest` column.
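
A minimal sketch of the update steps, using the data from this example. It assumes integer millisecond timestamps and that every key in the batch has an active record in the destination. `UNMODIFIED` is a hypothetical stand-in for the `unmodified_string` value, and `apply_update` and `row` are illustrative names. The DELETE step from "Points to remember" is omitted because this example does not exercise it.

```python
T1, T2, T3, T4, T5, TMAX = 1000, 2000, 3000, 4000, 5000, 10**15
UNMODIFIED = object()  # stand-in for the batch file's unmodified marker
DATA_COLUMNS = ("col1", "col2")


def row(id_, col1, col2, start, end, active, synced, earliest=None):
    """Build a row dict; `earliest` is only present on batch rows."""
    r = {"id": id_, "col1": col1, "col2": col2, "_fivetran_start": start,
         "_fivetran_end": end, "_fivetran_active": active,
         "_fivetran_synced": synced}
    if earliest is not None:
        r["_fivetran_earliest"] = earliest
    return r


def apply_update(dest, batch, deleted_column=None):
    """Apply a history mode update batch to an in-memory destination."""
    # Remember the active version of each key before anything changes.
    active = {r["id"]: r for r in dest if r["_fivetran_active"]}
    new_rows = []
    for b in batch:
        prev = active[b["id"]]
        new = {k: v for k, v in b.items() if k != "_fivetran_earliest"}
        # Unmodified columns take the value of the active record.
        for col in DATA_COLUMNS:
            if new[col] is UNMODIFIED:
                new[col] = prev[col]
        new_rows.append(new)
    for b in batch:
        if b["_fivetran_earliest"]:
            # Close out the previously active version.
            prev = active[b["id"]]
            prev["_fivetran_end"] = b["_fivetran_start"] - 1
            prev["_fivetran_active"] = False
            if deleted_column in prev:
                prev[deleted_column] = True
    dest.extend(new_rows)
    return dest


dest = [row(1, "abc", 1, T1, T2 - 1, False, 100),
        row(1, "pqr", 2, T2, TMAX, True, 101),
        row(2, "mno", 3, T2, TMAX, True, 103)]
batch = [row(1, "xyz", UNMODIFIED, T3, T5 - 1, False, 107, earliest=True),
         row(2, UNMODIFIED, 1000, T4, TMAX, True, 108, earliest=True),
         row(1, "def", UNMODIFIED, T5, TMAX, True, 109, earliest=False)]
apply_update(dest, batch)
```

Both new versions of id = 1 take COL2 = 2 from the previously active `pqr` row, and the new version of id = 2 takes COL1 = `mno`.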


##### Deletes

Existing table in the destination:

Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_synced
--- | --- | --- | --- | --- | --- | ---
1 | abc | 1 | T1 | T2-1 | FALSE | T100
1 | pqr | 2 | T2 | TMAX | TRUE | T101
2 | mno | 3 | T2 | TMAX | TRUE | T103



At the source, a record is deleted:


Id(PK) | Timestamp | Type
--- | --- | ---
1 | T3 | Deleted


The delete batch file will be:

Id(PK) | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_earliest | _fivetran_synced
--- | --- |---------------|------| --- | ---
1 | | T3-1 | | TRUE | T104


The final destination table will be:

Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_synced
--- | --- | --- | --- | --- |------------------| ---
1 | abc | 1 | T1 | T2-1 | FALSE | T100
1 | pqr | 2 | T2 | T3-1 | FALSE | T101
2 | mno | 3 | T2 | TMAX | TRUE | T103

**Explanation:**
- For the active record, set `_fivetran_active` to FALSE, set `_fivetran_end` to T3-1, and set `deleted_column` (if present in the destination) to TRUE.
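
The delete case can be sketched the same way. The code below assumes integer millisecond timestamps and represents the blank cells of the delete batch file as `None`; `apply_delete` and `row` are hypothetical names.

```python
T1, T2, T3, TMAX = 1000, 2000, 3000, 10**15  # symbolic times as ms


def row(id_, col1, col2, start, end, active, synced):
    """Build a destination row dict."""
    return {"id": id_, "col1": col1, "col2": col2, "_fivetran_start": start,
            "_fivetran_end": end, "_fivetran_active": active,
            "_fivetran_synced": synced}


def apply_delete(dest, batch, deleted_column=None):
    """Apply a history mode delete batch to an in-memory destination."""
    for b in batch:
        for r in dest:
            if r["id"] == b["id"] and r["_fivetran_active"]:
                # Close out the active version using the batch's end time.
                r["_fivetran_end"] = b["_fivetran_end"]
                r["_fivetran_active"] = False
                if deleted_column in r:
                    r[deleted_column] = True
    return dest


dest = [row(1, "abc", 1, T1, T2 - 1, False, 100),
        row(1, "pqr", 2, T2, TMAX, True, 101),
        row(2, "mno", 3, T2, TMAX, True, 103)]
batch = [{"id": 1, "_fivetran_start": None, "_fivetran_end": T3 - 1,
          "_fivetran_active": None, "_fivetran_earliest": True,
          "_fivetran_synced": 104}]
apply_delete(dest, batch)
```

No row is removed or inserted: only the active version of id = 1 changes, and its `_fivetran_synced` stays as it was, matching the final table above.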

