Skip to content

Commit

Permalink
doc(development-guide): add details for history mode (#84)
Browse files Browse the repository at this point in the history
* added history mode guide

* doc grammar update

* Update how-to-handle-history-mode-files.md

* incorporate tech writer changes

* small refactor:

* more refactor

* changed grammar

* changed insert to upserted

* incorporate more comments

* Apply suggestions from code review

* Update how-to-handle-history-mode-batch-files.md

* removed deleted_column

* Apply suggestions from code review

* doc(development-guide): add details for history mode

* doc(development-guide): add diagram

* doc(development-guide): add diagram

* Update development-guide.md

Co-authored-by: Alex Ilyichov <[email protected]>

* Update development-guide.md

Co-authored-by: Alex Ilyichov <[email protected]>

* Update development-guide.md

* Update how-to-handle-history-mode-batch-files.md

* Update how-to-handle-history-mode-batch-files.md

Co-authored-by: Alex Ilyichov <[email protected]>

* Update development-guide.md

Co-authored-by: Alex Ilyichov <[email protected]>

* Update how-to-handle-history-mode-batch-files.md

Co-authored-by: Alex Ilyichov <[email protected]>

* Update how-to-handle-history-mode-batch-files.md

* Add files via upload

* Update how-to-handle-history-mode-batch-files.md

* update image

---------

Co-authored-by: fivetran-abdulsalam <[email protected]>
Co-authored-by: Alex Ilyichov <[email protected]>
Co-authored-by: Rishabh Ghosh <[email protected]>
  • Loading branch information
4 people authored Jan 30, 2025
1 parent 04b72eb commit 31d2f13
Show file tree
Hide file tree
Showing 3 changed files with 105 additions and 0 deletions.
5 changes: 5 additions & 0 deletions development-guide.md
Original file line number Diff line number Diff line change
Expand Up @@ -181,6 +181,11 @@ Also, Fivetran deduplicates operations such that each primary key shows up only
- `null_string` value is used to represent `NULL` value in all batch files.
- `unmodified_string` value is used to indicate columns in `update_files` where the values did not change.

#### WriteHistoryBatchRequest
The `WriteHistoryBatchRequest` RPC call provides details about the batch files containing the records to be written to the destination for [**History Mode**](https://fivetran.com/docs/using-fivetran/features#historymode). In addition to the parameters of the [`WriteBatchRequest`](#writebatchrequest), this request also contains the `earliest_start_files` parameter used for updating history mode-specific columns for the existing rows in the destination.

> NOTE: To learn how to handle `earliest_start_files`, `replace_files`, `update_files` and `delete_files` in history mode, follow the [How to Handle History Mode Batch Files](how-to-handle-history-mode-batch-files.md) guide.
### Examples of Data Types
Examples of each [DataType](https://github.com/fivetran/fivetran_sdk/blob/main/common.proto#L73C6-L73C14) as they would appear in CSV batch files are as follows:
- UNSPECIFIED: This data type never appears in batch files
Expand Down
Binary file added history_mode.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
100 changes: 100 additions & 0 deletions how-to-handle-history-mode-batch-files.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,100 @@
# What is History Mode

History mode allows to capture every available version of each record from Fivetran source connectors.
In order to keep all versions of the records, the following new system columns are added to tables with history mode enabled:


Column | Type | Description
--- | --- | ---
_fivetran_active | Boolean | TRUE if it is the currently active record. FALSE if it is a historical version of the record. Only one version of the record can be TRUE.
_fivetran_start | TimeStamp | The time when the record was first created or modified in the source.
_fivetran_end | TimeStamp | The value for this column depends on whether the record is active. If the record is not active, then `_fivetran_end` value will be `_fivetran_start` of the next version of the record minus 1 millisecond. If the record is deleted, then the value will be the same as the timestamp of the delete operation. If the record is active, then `_fivetran_end` is set to maximum TIMESTAMP value.


### How to track historical changes?

The following types of files are a part of the `WriteHistoryBatchRequest` gRPC call. These files must be processed in the exact order as described in the following subsections.

#### `earliest_start_files`

In `WriteHistoryBatchRequest`, we pass a new field, `earliest_start_files`. This file contains a single record for each primary key in the incoming batch, with the earliest `_fivetran_start`.

For this file, the following operations must be implemented in the exact order as they are listed:
1. Removing any overlapping records where existing `_fivetran_start` is greater than the `earliest_fivetran_start` timestamp value in the `earliest_start_files` file:

```sql
DELETE FROM <schema.table> WHERE pk1 = <val> {AND pk2 = <val>.....} AND _fivetran_start >= val<_earliest_fivetran_start>;
```

3. Updating of the values of the history mode-specific system columns `fivetran_active` and `fivetran_end` in the destination.

```sql
UPDATE <schema.table> SET fivetran_active = FALSE, _fivetran_end = earliest_fivetran_start - 1 msec WHERE _fivetran_active = TRUE AND pk1 = <val> {AND pk2 = <val>.....}`
```

#### `update_files`

This file contains records where only some column values where modified in the source. The modified column values are provided as they are in the source whereas the columns without changes in the source are assigned the `unmodified_string` value. For such records, all column values must be populated before the records are inserted to the table in the destination. The column values that are not modified in the source, i.e. that are `unmodified_string`, are populated with the corresponding column's value of the the last active record in the destination, i.e., the record that has the same primary key and `_fivetran_active` set to `true`.
Suppose the existing table in destination is as follows:
ID(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_synced
--- | --- | --- | --- | --- | --- | ---
1 | abc | 1 | T1 | T2-1 | FALSE | T100
1 | pqr | 2 | T2 | TMAX | TRUE | T101
2 | mno | 3 | T2 | TMAX | TRUE | T103
At the source, the record with ID = 1 is updated:
ID(PK) | COL1 | Timestamp | Type
--- | --- | --- | ---
1 | xyz | T3 | Updated
and the record with ID = 2 is updated:
Id(PK) | COL2 | Timestamp | Type
--- | --- | --- | ---
2 | 1000 | T4 | Updated
And lastly, the record with ID = 1 is updated again:
Id(PK) | COL1 | Timestamp | Type
--- | --- | --- | ---
1 | def | T5 | Updated
The update batch file will be as follows:
ID(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_synced
--- | --- | --- | --- | --- | --- | ---
1 | xyz | | T3| T5-1 | FALSE | T107
1 | def | | T5 | TMAX | TRUE | T109
2 | | 1000 | T4 | TMAX | TRUE | T108
The final destination table will be as follows:
ID(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_synced
--- | --- | --- | --- | --- | --- | ---
1 | abc | 1 | T1 | T2-1msec | FALSE | T100
1 | pqr | 2 | T2 | T3-1msec | FALSE | T101
2 | mno | 3 | T2 | T4-1msec | FALSE | T103
1 | def | 2 | T5 | TMAX | TRUE | T109
1 | xyz | 2 | T3 | T5-1msec | FALSE | T107
2 | mno | 1000 | T4 | TMAX | TRUE | T108
The column values that were not modified in the source are set to the values of the active records. In this example, for ID = 2, we didn’t get COL1 value from the source, so we set COL1 to “mno” (COL1 value of the active record in the destination).
#### `upsert_files`
For upsert files, the column values are inserted in the destiation table. This is the case where all column values are modified in the source, as per incoming batch.
#### `delete_files`
For the active record (the one that has `_fivetran_active = TRUE`) with a given primary key in the destination, the `_fivetran_active` column value is set to FALSE, and the `_fivetran_end` column value is set to the `_fivetran_end` column value of the record with the same primary key in the batch file.
![History Mode Batch File](./history_mode.png)

0 comments on commit 31d2f13

Please sign in to comment.