Skip to content

Latest commit

 

History

History
160 lines (117 loc) · 5.8 KB

README.md

File metadata and controls

160 lines (117 loc) · 5.8 KB

CMS De-Identification

Empty tables (just like the identified ones) are created in the de-identified schema. Then, following the KUMC/HERON De-Identification Strategy, the de-identified tables are loaded with data such that:

Drop/build i2b2-style patient mapping table in the DEID schema:

# Drop/create deid mapping tables on the DEID side
cd deid
sqlplus $DEID_USER/$DEID_PASSWORD@$ORACLE_SID <<EOF

set echo on;

whenever sqlerror continue;
start i2b2_patient_mapping_init.sql;

EOF

Build some helper functions from the ETL code in the ID schema (cms_source_cd is the i2b2-style source string for the patient mapping).:

# Build functions, constants include in the ETL code
cd etl_i2b2/sql_scripts
sqlplus $ID_USER/$ID_PASSWORD@$ORACLE_SID <<EOF

set echo on;

whenever oserror exit 9;
whenever sqlerror exit sql.sqlcode;

define cms_source_cd='''${cms_source_cd}''';
define design_digest='''dummy'''

start cms_keys.pls

EOF

Build the mappings in the ID schema - pmap_upload_id, cms_source_cd, and pmap_project_id are i2b2 style columns for the patient mapping.

# Build mappings
cd deid
sqlplus $ID_USER/$ID_PASSWORD@$ORACLE_SID <<EOF

set echo on;

whenever oserror exit 9;
whenever sqlerror exit sql.sqlcode;

define cms_source_cd=${cms_source_cd};
define design_digest=dummy
start cms_mapping_init.sql;

define deid_schema=${DEID_USER};
define upload_id=${pmap_upload_id};
define cms_source_cd=${cms_source_cd};
define project_id=${pmap_project_id};
--bene_id_deid_start = previous year's max bene_id_deid + 1
define bene_id_deid_seq=${bene_id_deid_seq};
--msis_id_deid_seq_start =previous year's max msis_id_deid + 1 
define msis_id_deid_seq=${msis_id_deid_seq};

start bene_id_mapping.sql;

Indexes for the patient mapping:

# Build mapping indexes on the DEID side
cd deid
sqlplus $DEID_USER/$DEID_PASSWORD@$ORACLE_SID <<EOF

set echo on;

whenever sqlerror continue;
start i2b2_patient_mapping_index.sql;

EOF

Drop/create empty CMS tables in the DEID schema using oracle_create.sql generated by the staging code:

# Drop/create deid tables
sqlplus $DEID_USER/$DEID_PASSWORD@$ORACLE_SID <<EOF

set echo on;

whenever sqlerror continue;
start oracle_drop.sql; --This .sql is generated using parse_fts.py

whenever oserror exit 9;
whenever sqlerror exit sql.sqlcode;

start oracle_create.sql; --This .sql is generated using parse_fts.py
EOF

Load the de-identified tables from the identified copy:

# Run the deid script
cd deid

sqlplus $ID_USER/$ID_PASSWORD@$ORACLE_SID <<EOF

set echo on;

whenever oserror exit 9;
whenever sqlerror exit sql.sqlcode;

start cms_deid_init.sql;
start events.sql;
start hipaa_dob_shift.sql;

define deid_schema=${DEID_USER}; --Variable to replace "&&deid_schema" with the appropriate schema name

start cms_deid.sql;
EOF

Drop/create i2b2-style patient mapping.

Initialize tables mapping identified and de-identified data.

Create a mapping between bene_id (or, msis_id + state_cd) and a sequence number representing the de-identified bene_id/msis_id. The mapping also includes the per-person random date shift and the date of birth shift (or, NULL if no shift is required) and the date of birth shift noted above. This file also builds an i2b2-style patient mapping table.

Indexes for i2b2-style patient mappings.

Drop/create intermediate tables, sequences, etc. in preparation for de-identification.

Any date field is considered an "event". A table is created with person identifier columns (bene_id for Medicare tables or potentially msis_is + state_cd for Medicaid data) and a column for the date field. This table is used to calculate how far to shift birth dates forward in time to make sure that no person appears to be over the age of 89.

Calculate the date of birth shift based on the "events" table.

Tests to help make sure that de-identification was done properly.

Populates the de-id CMS tables with de-identified data based on the date/age shifts calculated prior.

"Helper" script used to generate cms_deid.sql and events.sql.

Usage:
gen_deid_sql.py <path/to/oracle_create.sql> <path/to/table_column_desc.csv> <cms_deid_sql|date_events>
  • <path/to/oracle_create.sql> is the path to the "create table" script created by the staging code.
  • <path/to/table_column_desc.csv> is the path to the table/column description .csv file also created by the staging code.
  • Option cms_deid_sql means the script generates the deid sql (cms_deid.sql)
  • Option date_events means the script generates the events sql (events.sql)

work in progress