Criteria2Query 2.0 is published! Online Demo
Criteria2Query (C2Q) is an automatic cohort identification system. It enhances human-computer collaboration to convert complex eligibility criteria text into more accurate and feasible cohort SQL queries. It synergizes machine efficiency and human intelligence of domain experts to enable real-time user intervention for criteria selection and simplification, parsing error correction, and context-dependent concept mapping.
- An editable user interface with functions to prioritize or simplify the eligibility criteria text for cohort querying;
- Accessible and portable cohort SQL query formulation based on the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) version 5;
- Real-time cohort query execution with result visualization.
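To give a feel for what a cohort SQL query over the OMOP CDM can look like, here is a minimal sketch. It is not the SQL that C2Q generates. The class name `CohortQuerySketch`, the method `buildQuery`, and the `public` schema are hypothetical, and the concept ID `201826` is assumed to denote Type 2 diabetes mellitus (verify it in Athena). The sketch approximates age from `year_of_birth` and ignores concept hierarchies and temporal constraints.

```java
/** Illustrative only: builds a simple cohort query over OMOP CDM v5 tables. */
public class CohortQuerySketch {

    /**
     * Returns SQL selecting persons who are at least minAge years old in
     * referenceYear (approximated from year_of_birth) and who have at least
     * one condition_occurrence row with the given condition concept ID.
     * The schema name is concatenated directly, so it must come from a
     * trusted source.
     */
    public static String buildQuery(String schema, int minAge,
                                    int referenceYear, long conditionConceptId) {
        int latestBirthYear = referenceYear - minAge;
        return "SELECT DISTINCT p.person_id\n"
             + "FROM " + schema + ".person p\n"
             + "JOIN " + schema + ".condition_occurrence co\n"
             + "  ON co.person_id = p.person_id\n"
             + "WHERE p.year_of_birth <= " + latestBirthYear + "\n"
             + "  AND co.condition_concept_id = " + conditionConceptId + ";";
    }

    public static void main(String[] args) {
        // Hypothetical criterion: adults (18+ in 2010) with the assumed concept ID 201826.
        System.out.println(buildQuery("public", 18, 2010, 201826L));
    }
}
```

The two constraints in the query correspond to two criteria a user might keep after simplifying the eligibility text: an age limit and a condition.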
- Java 8+
- Apache Maven 3
- Apache Tomcat
- Python 3.7.6+
- PostgreSQL DBMS (to demonstrate the real-time cohort SQL query execution, not strictly required)
- SynPUF_1K and SynPUF_5% datasets in CDM Version 5.2.2 format.
- OMOP CDM Vocabulary version 5 files. These can be obtained from Athena.
- The SynPUF datasets and vocabulary files above are only needed to demonstrate real-time cohort SQL query execution (not strictly required).
- Download and install everything based on the system requirements above.
- Git clone this repository.
- Download the negation scope detection model and move it to the folder `NegationDetection`.
- Create a virtual environment in Python and install packages based on `venv_requirements.txt`. (Instructions: https://packaging.python.org/en/latest/guides/installing-using-pip-and-virtual-environments/)
- Change the directories of Negation Detection and the Python virtual environment in the file `/criteria2query/src/main/java/edu/columbia/dbmi/ohdsims/pojo/GlobalSetting.java`:
//Change the directories (examples)
public final static String negateDetectionFolder = "/opt/tomcat/NegationDetection";
public final static String virtualEnvFolder = "/opt/tomcat/python_virtualenvs/C2Q_NEGATION/bin"; // or "D:\\C2Q\\python_virtualenvs\\C2Q_NEGATION\\Scripts";
- Import the SynPUF_1K and SynPUF_5% datasets into your PostgreSQL DBMS. (You can skip this step if they are already imported.)
  - Download the SynPUF_1K and SynPUF_5% datasets in CDM Version 5.2.2 format.
  - Download the OMOP CDM Vocabulary version 5 files from Athena.
  - Follow the instructions here (https://github.com/OHDSI/CommonDataModel/tree/v5.2.2/PostgreSQL) to create your instantiations of the Common Data Model for SynPUF_1K and SynPUF_5%, respectively.
- Connect to your own databases (SynPUF_1K and SynPUF_5%) by changing the URL, user, and password in the file `/criteria2query/src/main/java/edu/columbia/dbmi/ohdsims/pojo/GlobalSetting.java`:
//Connect to the databases
public final static String databaseURL1K = "jdbc:postgresql://localhost/synpuf1k";
public final static String databaseURL5pct = "jdbc:postgresql://localhost/synpuf5pct";
public final static String databaseUser = "Please connect to a database.";
public final static String databasePassword = "*****";
- Deploy C2Q and visit it in your web browser.
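Before deploying, it can help to confirm that the values entered in `GlobalSetting.java` point at real folders and reachable databases. The following is a minimal sketch of such a check and is not part of C2Q. The class `SettingsCheck` and its methods are hypothetical, the paths and URLs in `main` are the example values from the steps above, and the credentials are placeholders. It assumes the PostgreSQL JDBC driver is on the classpath; without it, the connection check simply reports failure.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

/** Hypothetical helper that sanity-checks the settings edited in GlobalSetting.java. */
public class SettingsCheck {

    /** True if the path exists and is a directory. */
    public static boolean isDirectory(String folder) {
        return Files.isDirectory(Paths.get(folder));
    }

    /** True if the folder contains a Python executable (python or python.exe). */
    public static boolean hasPython(String virtualEnvFolder) {
        Path dir = Paths.get(virtualEnvFolder);
        return Files.isRegularFile(dir.resolve("python"))
            || Files.isRegularFile(dir.resolve("python.exe"));
    }

    /** True if a JDBC connection can be opened and validated within 5 seconds. */
    public static boolean canConnect(String url, String user, String password) {
        try (Connection conn = DriverManager.getConnection(url, user, password)) {
            return conn.isValid(5);
        } catch (SQLException e) {
            // Covers a missing driver, an unreachable server, and bad credentials.
            System.err.println("Connection failed for " + url + ": " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // Example values copied from the steps above; replace with your own.
        String negationFolder = "/opt/tomcat/NegationDetection";
        String venvFolder = "/opt/tomcat/python_virtualenvs/C2Q_NEGATION/bin";
        String user = "your_user";         // placeholder
        String password = "your_password"; // placeholder

        System.out.println("Negation folder found: " + isDirectory(negationFolder));
        System.out.println("Python in virtualenv:  " + hasPython(venvFolder));
        System.out.println("SynPUF_1K reachable:   "
            + canConnect("jdbc:postgresql://localhost/synpuf1k", user, password));
        System.out.println("SynPUF_5% reachable:   "
            + canConnect("jdbc:postgresql://localhost/synpuf5pct", user, password));
    }
}
```

Each check returns a boolean instead of throwing, so a single run reports every misconfigured setting at once.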
Fang, Y., Idnay, B., Sun, Y., Liu, H., Chen, Z., Marder, K., Xu, H., Schnall, R., & Weng, C. (2022). Combining human and machine intelligence for clinical trial eligibility querying. Journal of the American Medical Informatics Association : JAMIA, ocac051. Advance online publication. https://doi.org/10.1093/jamia/ocac051
Yuan, C., Ryan, P. B., Ta, C., Guo, Y., Li, Z., Hardin, J., Makadia, R., Jin, P., Shang, N., Kang, T., & Weng, C. (2019). Criteria2Query: a natural language interface to clinical databases for cohort definition. Journal of the American Medical Informatics Association : JAMIA, 26(4), 294–305. https://doi.org/10.1093/jamia/ocy178
If you have any questions/comments/feedback, please submit a form here or contact Dr. Chunhua Weng at Columbia University.