-
Notifications
You must be signed in to change notification settings - Fork 8
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
doc(development-guide): add details for history mode (#84)
* added history mode guide * doc grammar update * Update how-to-handle-history-mode-files.md * incorporate tech writer changes * small refactor: * more refactor * changed grammar * changed insert to upserted * incorporate more comments * Apply suggestions from code review * Update how-to-handle-history-mode-batch-files.md * removed deleted_column * Apply suggestions from code review * doc(development-guide): add details for history mode * doc(development-guide): add diagram * doc(development-guide): add diagram * Update development-guide.md Co-authored-by: Alex Ilyichov <[email protected]> * Update development-guide.md Co-authored-by: Alex Ilyichov <[email protected]> * Update development-guide.md * Update how-to-handle-history-mode-batch-files.md * Update how-to-handle-history-mode-batch-files.md Co-authored-by: Alex Ilyichov <[email protected]> * Update development-guide.md Co-authored-by: Alex Ilyichov <[email protected]> * Update how-to-handle-history-mode-batch-files.md Co-authored-by: Alex Ilyichov <[email protected]> * Update how-to-handle-history-mode-batch-files.md * Add files via upload * Update how-to-handle-history-mode-batch-files.md * update image --------- Co-authored-by: fivetran-abdulsalam <[email protected]> Co-authored-by: Alex Ilyichov <[email protected]> Co-authored-by: Rishabh Ghosh <[email protected]>
- Loading branch information
1 parent
04b72eb
commit 31d2f13
Showing
3 changed files
with
105 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,100 @@ | ||
# What is History Mode | ||
|
||
History mode allows to capture every available version of each record from Fivetran source connectors. | ||
In order to keep all versions of the records, the following new system columns are added to tables with history mode enabled: | ||
|
||
|
||
Column | Type | Description | ||
--- | --- | --- | ||
_fivetran_active | Boolean | TRUE if it is the currently active record. FALSE if it is a historical version of the record. Only one version of the record can be TRUE. | ||
_fivetran_start | TimeStamp | The time when the record was first created or modified in the source. | ||
_fivetran_end | TimeStamp | The value for this column depends on whether the record is active. If the record is not active, then `_fivetran_end` value will be `_fivetran_start` of the next version of the record minus 1 millisecond. If the record is deleted, then the value will be the same as the timestamp of the delete operation. If the record is active, then `_fivetran_end` is set to maximum TIMESTAMP value. | ||
|
||
|
||
### How to track historical changes? | ||
|
||
The following types of files are a part of the `WriteHistoryBatchRequest` gRPC call. These files must be processed in the exact order as described in the following subsections. | ||
|
||
#### `earliest_start_files` | ||
|
||
In `WriteHistoryBatchRequest`, we pass a new field, `earliest_start_files`. This file contains a single record for each primary key in the incoming batch, with the earliest `_fivetran_start`. | ||
|
||
For this file, the following operations must be implemented in the exact order as they are listed: | ||
1. Removing any overlapping records where existing `_fivetran_start` is greater than the `earliest_fivetran_start` timestamp value in the `earliest_start_files` file: | ||
|
||
```sql | ||
DELETE FROM <schema.table> WHERE pk1 = <val> {AND pk2 = <val>.....} AND _fivetran_start >= val<_earliest_fivetran_start>; | ||
``` | ||
|
||
3. Updating of the values of the history mode-specific system columns `fivetran_active` and `fivetran_end` in the destination. | ||
|
||
```sql | ||
UPDATE <schema.table> SET fivetran_active = FALSE, _fivetran_end = earliest_fivetran_start - 1 msec WHERE _fivetran_active = TRUE AND pk1 = <val> {AND pk2 = <val>.....}` | ||
``` | ||
|
||
#### `update_files` | ||
|
||
This file contains records where only some column values where modified in the source. The modified column values are provided as they are in the source whereas the columns without changes in the source are assigned the `unmodified_string` value. For such records, all column values must be populated before the records are inserted to the table in the destination. The column values that are not modified in the source, i.e. that are `unmodified_string`, are populated with the corresponding column's value of the the last active record in the destination, i.e., the record that has the same primary key and `_fivetran_active` set to `true`. | ||
Suppose the existing table in destination is as follows: | ||
ID(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_synced | ||
--- | --- | --- | --- | --- | --- | --- | ||
1 | abc | 1 | T1 | T2-1 | FALSE | T100 | ||
1 | pqr | 2 | T2 | TMAX | TRUE | T101 | ||
2 | mno | 3 | T2 | TMAX | TRUE | T103 | ||
At the source, the record with ID = 1 is updated: | ||
ID(PK) | COL1 | Timestamp | Type | ||
--- | --- | --- | --- | ||
1 | xyz | T3 | Updated | ||
and the record with ID = 2 is updated: | ||
Id(PK) | COL2 | Timestamp | Type | ||
--- | --- | --- | --- | ||
2 | 1000 | T4 | Updated | ||
And lastly, the record with ID = 1 is updated again: | ||
Id(PK) | COL1 | Timestamp | Type | ||
--- | --- | --- | --- | ||
1 | def | T5 | Updated | ||
The update batch file will be as follows: | ||
ID(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_synced | ||
--- | --- | --- | --- | --- | --- | --- | ||
1 | xyz | | T3| T5-1 | FALSE | T107 | ||
1 | def | | T5 | TMAX | TRUE | T109 | ||
2 | | 1000 | T4 | TMAX | TRUE | T108 | ||
The final destination table will be as follows: | ||
ID(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_synced | ||
--- | --- | --- | --- | --- | --- | --- | ||
1 | abc | 1 | T1 | T2-1msec | FALSE | T100 | ||
1 | pqr | 2 | T2 | T3-1msec | FALSE | T101 | ||
2 | mno | 3 | T2 | T4-1msec | FALSE | T103 | ||
1 | def | 2 | T5 | TMAX | TRUE | T109 | ||
1 | xyz | 2 | T3 | T5-1msec | FALSE | T107 | ||
2 | mno | 1000 | T4 | TMAX | TRUE | T108 | ||
The column values that were not modified in the source are set to the values of the active records. In this example, for ID = 2, we didn’t get COL1 value from the source, so we set COL1 to “mno” (COL1 value of the active record in the destination). | ||
#### `upsert_files` | ||
For upsert files, the column values are inserted in the destiation table. This is the case where all column values are modified in the source, as per incoming batch. | ||
#### `delete_files` | ||
For the active record (the one that has `_fivetran_active = TRUE`) with a given primary key in the destination, the `_fivetran_active` column value is set to FALSE, and the `_fivetran_end` column value is set to the `_fivetran_end` column value of the record with the same primary key in the batch file. | ||
![History Mode Batch File](./history_mode.png) |