Welcome
Welcome to Seattle Flu Study (SFS) GitHub documentation! If you're reading this, you're probably new here. This page, while currently a stub, dreams of being a one-stop shop for getting you set up and answering your questions before you start using or contributing to our code.
The Seattle Flu Study practices open science. This means that our code and documentation (with very few, notable exceptions) are all open source. Even though our work can be highly specific and not always useful outside the study, we take a "public by default" rather than a "private by default" approach to our developer material. We only make repositories or documentation private when they contain sensitive data or when releasing them could pose a security risk to the study or its members. Here is a quick guide to what's safe to share and what's not.
Safe to share:
- Documentation that is Seattle Flu specific but not sensitive
- Links to Slack conversations
- Links to Trello cards
- Links to Metabase (or other apps at backoffice.seattleflu.org)
- First names of study team contacts
Not safe to share:
- Personally identifiable information of study participants or employees, including full names or email addresses
- Real study barcodes
- Secrets
  - Passwords
  - API authorization IDs
  - API access tokens
  - Deidentification secrets
  - etc.
- Links to Google Drive (the risk is that link-based sharing could be turned on by accident for documents/folders, allowing access to anyone with the link)
Slack is the primary method of communication used by Seattle Flu Study members. Our culture is to post messages in public channels wherever possible so that relevant parties have as much context as possible. We recognize that private channels are often appropriate for certain types of data-sensitive or intra-team communication, but we discourage the use of direct messages for study-related questions, requests, or planning.
An overview of the SFS Slack:
- #channel-map - organized list of broad channels and their purposes
- #directory - study-wide directory
Noteworthy Slack channels for SFS developers include:
- #barcodes - used for requesting newly minted barcodes. See the channel description for the upload destination for new labels.
- #clia - used for questions about CLIA compliance
- #data-transfer-ellume - a shared channel used to communicate with Ellume about data sharing
- #data-transfer-labmed (private) - a shared channel used to communicate with UW SecureLink about data sharing
- #data-transfer-nwgc - used to communicate with Northwest Genomics Center about data sharing
- #data-transfer-retrospectives - used to communicate with Seattle Children's about data sharing
- #id3c - used to discuss the ID3C code base
- #id3c-alerts - alert system for cronjobs that run on the backoffice server
- #informatics - questions about SFS ID3C views or Seattle Flu Metabase queries
- #ncov-reporting (private) - alert system for positive or inconclusive hCoV-19 results
- #record-troubleshooting - used to communicate bad barcodes or REDCap records
- #redcap - general REDCap questions
- #website - questions about the public-facing study website (currently managed by Formative)
Before you can get access to sensitive data, you must complete all required trainings and sign the confidentiality agreement. Please get in touch with the management team to complete all the necessary paperwork!
You will need access to the following:
- ID3C - basic user, admin user, & postgres user
- Seattle Flu Metabase - Admin, Seattle Flu Study, & hCoV-19 visibility groups
- Seattle Flu GitHub
  - private specimen-manifests repo
  - private security-audit repo (additional documentation)
  - LabMed private securelink repo
- Seattle Flu "backoffice" server, the core of our infrastructure
- AWS access
  - Seattle Flu Study AWS account
  - Fred Hutch Bedford S3 Bucket
  - Securelink S3 Bucket
- UW OneDrive - for specimen manifest sheets
- UW ITHS REDCap plus access to all SFS/SCAN projects
- SCAN Switchboard
- Kaiser Permanente Secure File Transfer
- #record-troubleshooting Trello Board
SFS developers maintain the following codebases and apps.
Core ID3C and its extensions, ID3C-customizations, are the repositories that contain our CLI tools, ETL pipelines, database schemas, database roles, and API endpoints.
The long-term vision for ID3C's design is to create a general platform that can be used by other researchers, in other cities, for other organisms. Decoupling SFS/SCAN-specific code from ID3C and porting it to ID3C-customizations is still a work in progress.
Data sources include prospective data (SFS or SCAN enrollments via REDCap) and retrospective data (Seattle Children's, UW Medicine, Kaiser Permanente, or Fred Hutch data exports).
See more details about data ingestion at data-flow.
There are data quality checks throughout the ingestion and ETL pipelines in ID3C/ID3C-customizations. Errors and warnings are posted in the #id3c-alerts Slack channel.
See more details about how to handle common errors at troubleshooting.
The dev team gets requests for minting new barcodes in the #barcodes Slack channel. More detailed instructions are available at barcodes.
CLIA certified lab results are returned to participants via a web portal. The dev team is responsible for generating PDF reports for these lab results, exporting the results and PDFs to UW LabMed, and maintaining the SCAN customizations for the web portal.
The lab-result-reports repo contains the templates and code for generating the individual PDF reports for lab results.
SCAN is currently using the UW LabMed results portal since their infrastructure is considered CLIA approved. The securelink repo contains the code for the results portal. The SFS dev team is only responsible for maintaining the front-end customizations for the SCAN study.
Used with barcode minting.
Metabase is an open-source business intelligence (BI) tool that is used widely across SFS. The Seattle Flu Metabase service is documented here.
The SCAN Switchboard is an internal tool built to speed up the lab's unboxing and quality control processes for received SCAN kits. The source code is separate from the deployment configuration.
Git is our version control tool, and our repositories are all hosted on GitHub. We generally follow these guidelines for writing git commit messages. We typically do development in feature branches and merge into master. Generally, we deploy from master (or from images/snapshots created from master). Before merging, we rebase our commits to keep the history of our codebase as human-readable as possible.
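The linked commit message guidelines aren't reproduced on this page. As a minimal sketch, assuming they are the widely used conventions (a summary line of at most 50 characters with no trailing period, a blank second line, and a body wrapped at 72 characters), a message could be checked like this, for example from a local commit-msg hook. The function name and exact limits are hypothetical.

```python
from typing import List

# Assumed limits; the linked guidelines are not reproduced here.
SUMMARY_MAX = 50  # first (summary) line
BODY_MAX = 72     # wrap width for body lines


def lint_commit_message(message: str) -> List[str]:
    """Return a list of formatting problems; an empty list means it passes."""
    problems: List[str] = []
    lines = message.rstrip("\n").split("\n")
    summary = lines[0]

    if not summary.strip():
        problems.append("summary line is empty")
    if len(summary) > SUMMARY_MAX:
        problems.append(f"summary is {len(summary)} chars (max {SUMMARY_MAX})")
    if summary.endswith("."):
        problems.append("summary ends with a period")
    # A blank line separates the summary from the body.
    if len(lines) > 1 and lines[1].strip():
        problems.append("second line should be blank")
    # Body lines start at line 3 (1-based numbering for readable messages).
    for number, line in enumerate(lines[2:], start=3):
        if len(line) > BODY_MAX:
            problems.append(f"line {number} is {len(line)} chars (max {BODY_MAX})")
    return problems


print(lint_commit_message("Add barcode set for SCAN kits\n\nExplain why it is needed."))
# → []
```

A hook would print the returned problems and exit non-zero when the list is not empty.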
We use GitHub Actions in the following repositories:
Python 3 is the programming language of choice for our ETL pipelines. We also use it for other tasks or services in the backoffice repo.
Across our codebases, we manage Python dependencies with Pipenv. See an example Pipfile here.
We use Flask to set up the ID3C web API.
We use Click to set up the ID3C command-line interface.
We're currently using PostgreSQL version 10 for our production database with plans to eventually upgrade to PostgreSQL 12.
See ID3C's design for the motivation behind our database schema.
See this flowchart for an overview of the warehouse schema of the seattleflu database.
See our motivation for using sqitch.
See our high level AWS documentation.
We currently don't have many tests in ID3C, as we tend to rely more heavily on our alert system (see the #id3c-alerts Slack channel).
Run doctests in the ID3C or ID3C-customizations repo with pytest -v.
Run mypy type checking in the ID3C or ID3C-customizations repo with ./dev/mypy.
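For anyone new to doctests, here is a small sketch of the style: the >>> examples in a docstring are executed and their output compared against the expected text, and the type annotations are what mypy checks. Plain pytest only collects doctests when doctest collection is enabled (e.g. with --doctest-modules), so this assumes the repos' pytest configuration turns that on. normalize_barcode and its lowercasing rule are hypothetical, chosen only for illustration.

```python
import doctest
from typing import Optional


def normalize_barcode(raw: Optional[str]) -> Optional[str]:
    """
    Strip surrounding whitespace and lowercase a barcode; blank input gives None.

    >>> normalize_barcode("  AAAAAAAA ")
    'aaaaaaaa'
    >>> normalize_barcode("   ") is None
    True
    >>> normalize_barcode(None) is None
    True
    """
    if raw is None:
        return None
    cleaned = raw.strip().lower()
    # An empty string after stripping is treated as "no barcode".
    return cleaned or None


if __name__ == "__main__":
    # Under pytest (with doctest collection enabled) the >>> examples are
    # collected automatically; this runs them when the file is executed directly.
    doctest.testmod(verbose=True)
```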
REDCap is an online tool used to build and manage surveys for collecting information from SFS/SCAN study participants. A REDCap team within SFS works with UW ITHS to build and manage these surveys, so the dev team is not directly responsible for them. However, it is useful to understand how REDCap works when debugging ingestion and data quality issues.
See ITHS REDCap Training to sign up for classes and/or read training materials.
The REDCap CLI can be helpful for digging into a project's data.
To be able to better integrate with other data systems, we have begun adopting HL7 FHIR vocabulary and documents wherever possible. Our REDCap DET ETLs produce FHIR bundles that then get processed by our FHIR ETL. See some minimal FHIR bundle examples.
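For orientation, here is a hedged sketch of what a minimal FHIR R4 Bundle looks like when built in Python: a "collection" Bundle holding a Patient and a Specimen, where the Specimen's subject refers to the Patient through its urn:uuid fullUrl. This is not the exact structure our REDCap DET ETLs emit; the minimal_bundle helper, the identifier system URLs, and the placeholder values are all hypothetical.

```python
import json
import uuid
from typing import Any, Dict


def new_full_url() -> str:
    # Bundle-internal references commonly use urn:uuid: fullUrls.
    return f"urn:uuid:{uuid.uuid4()}"


def minimal_bundle(patient_id: str, barcode: str) -> Dict[str, Any]:
    patient_url = new_full_url()
    specimen_url = new_full_url()
    patient = {
        "resourceType": "Patient",
        # Hypothetical identifier system; not a real SFS namespace.
        "identifier": [{"system": "https://example.org/patient-id", "value": patient_id}],
    }
    specimen = {
        "resourceType": "Specimen",
        "identifier": [{"system": "https://example.org/barcode", "value": barcode}],
        # The Specimen points at the Patient via the Patient entry's fullUrl.
        "subject": {"reference": patient_url},
    }
    return {
        "resourceType": "Bundle",
        "type": "collection",
        "entry": [
            {"fullUrl": patient_url, "resource": patient},
            {"fullUrl": specimen_url, "resource": specimen},
        ],
    }


# Placeholder values only; never use real study barcodes in examples.
bundle = minimal_bundle("hashed-participant-id", "aaaaaaaa")
print(json.dumps(bundle, indent=2))
```

A Bundle of type "transaction" would additionally give each entry a request element (method and url) telling the server what to do with that resource.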
- Refresh your local dev database with a copy from production
From within the id3c or id3c-customizations repo, run the following:
PGDATABASE=seattleflu pipenv run id3c --help
Here's how you might see the identifier (barcode) sets in our production instance:
PGSERVICE=seattleflu-production pipenv run id3c identifier set ls
whereas in local testing, I'd do:
PGDATABASE=seattleflu pipenv run id3c identifier set ls
PGSERVICE points to a named service definition in your ~/.pg_service.conf file.
ID3C doesn't provide application-specific connection defaults; it relies on the standard PostgreSQL (libpq) environment variables, such as PGSERVICE and PGDATABASE, to define the connection.
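To illustrate the format: ~/.pg_service.conf is an INI-style file in which each [section] name is a service name and each key=value line is a libpq connection parameter. libpq reads the file itself, so the sketch below only parses a sample with Python's configparser to show how a service name maps to connection settings. The host, dbname, and user values are hypothetical placeholders, not our real configuration.

```python
import configparser
from typing import Dict

# Hypothetical service file contents; real hosts and usernames differ.
SAMPLE_PG_SERVICE_CONF = """\
# Comments start with a hash sign
[seattleflu-production]
host=db.example.org
port=5432
dbname=example_db
user=your-username
"""


def read_services(text: str) -> Dict[str, Dict[str, str]]:
    """Map each [service] section to its libpq connection parameters."""
    # Interpolation is disabled so values containing '%' are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


services = read_services(SAMPLE_PG_SERVICE_CONF)
print(services["seattleflu-production"])
```

With an entry like this in ~/.pg_service.conf (or in the file named by PGSERVICEFILE), setting PGSERVICE=seattleflu-production selects these settings for the id3c commands above.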
Set the processing log for all targeted rows to be blank. When an ETL such as the REDCap DET ETL runs, it processes every row whose processing log has no entry for the ETL's current revision (e.g. 12). Running the following SQL blanks the processing log of the affected rows (here, those received after 2020-06-25), so they will be picked up by the next REDCap DET ETL run:
update receiving.redcap_det
set processing_log = '[]'
where received > '2020-06-25';
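The selection rule described above can be sketched as a simplified in-memory model: a row is picked up unless its processing_log already holds an entry for this ETL at the current revision. This is an illustration, not ID3C's actual query; the "etl" and "revision" key names, ETL_NAME, and the revision number are assumptions.

```python
from typing import Any, Dict, List

ETL_NAME = "redcap-det"  # assumed value of the log's "etl" key
CURRENT_REVISION = 12    # example revision number, as in the text


def needs_processing(processing_log: List[Dict[str, Any]]) -> bool:
    # Roughly mirrors a jsonb containment test such as
    #   not processing_log @> '[{"etl": ..., "revision": ...}]'
    # Extra keys in a log entry do not prevent a match.
    return not any(
        entry.get("etl") == ETL_NAME and entry.get("revision") == CURRENT_REVISION
        for entry in processing_log
    )


rows = [
    {"id": 1, "processing_log": [{"etl": ETL_NAME, "revision": CURRENT_REVISION}]},
    {"id": 2, "processing_log": [{"etl": ETL_NAME, "revision": 11}]},  # older revision
    {"id": 3, "processing_log": []},  # blanked by the update statement
]
to_process = [row["id"] for row in rows if needs_processing(row["processing_log"])]
print(to_process)  # [2, 3]
```

In this model, blanking the log and bumping the ETL revision both cause a row to be reprocessed.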
To process redcap-det, run:
PGDATABASE=seattleflu pipenv run id3c etl redcap-det --prompt
To process clinical, run:
PGDATABASE=seattleflu pipenv run id3c etl clinical --prompt
If you don't have LOG_LEVEL=debug turned on, full logs should be available on your system via:
grep seattleflu /var/log/syslog
We configure our editors to use spaces instead of tabs, trim trailing whitespace at the end of each line, and add an empty newline to the end of each file.
If you use VS Code, add these settings to your settings.json:
"files.insertFinalNewline": true,
"files.trimFinalNewlines": true,
"files.trimTrailingWhitespace": true
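For files edited outside VS Code, the same whitespace rules can be applied by a script. Here is a minimal sketch, assuming you want to trim trailing whitespace, end each file with exactly one newline, and report (not rewrite) tab-indented lines, since converting tabs safely depends on the indent width. normalize_whitespace is a hypothetical helper, not an existing tool in our repos.

```python
from typing import List, Tuple


def normalize_whitespace(text: str) -> Tuple[str, List[int]]:
    """Return the cleaned text and the 1-based line numbers indented with tabs."""
    # Trim trailing whitespace on every line.
    lines = [line.rstrip() for line in text.splitlines()]
    # Report tab indentation so it can be fixed by hand.
    tab_lines = [number for number, line in enumerate(lines, start=1) if line.startswith("\t")]
    # Drop trailing blank lines, then end the file with a single newline.
    while lines and not lines[-1]:
        lines.pop()
    cleaned = "\n".join(lines) + "\n" if lines else ""
    return cleaned, tab_lines


print(normalize_whitespace("a  \n\tb\t\n\n\n"))  # ('a\n\tb\n', [2])
```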
We add the following configuration to our global .gitconfig:
[pull]
    rebase = true
With this setting, git pull rebases your local commits on top of the upstream branch (e.g. master) instead of creating "Merge branch 'master'..." commits.
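Set the PAGER environment variable to less and use the LESS environment variable to choose its options: S chops long lines instead of wrapping them, F quits immediately if the output fits on one screen (so less than a screenful of output isn't paged), R passes color escape sequences through, X keeps the output on screen after less exits, and i makes searches case-insensitive unless the pattern contains uppercase letters.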
PAGER=less LESS=SFRXi psql seattleflu
To save these settings, add the following lines to your ~/.psqlrc:
\setenv PAGER less
\setenv LESS SFRXi