karafecho edited this page Sep 8, 2023 · 2 revisions

Drug Conflator

Description

The Drug Conflator is a tool that identifies "essentially the same" drugs based on the drug information (e.g., ingredients, brand names, drug components, clinical groups) provided by the RxNav service. For any given drug curie, the tool can return all RXCUI identifiers related to that curie. Two drugs are considered "essentially the same" when their sets of RXCUI identifiers overlap sufficiently: the tool measures the overlap between the two sets and reports whether the two drug curies should be conflated according to a specified threshold.

The process to find the RXCUI identifiers for a curie is as follows:

1. Find all equivalent curies and names of a given curie using both the [node normalizer](https://github.com/TranslatorSRI/NodeNormalization) and the [node synonymizer](https://github.com/RTXteam/RTX/tree/master/code/ARAX/NodeSynonymizer).
2. If the equivalent curies include identifiers from ATC, Drugbank, GCN_SEQNO (NDDF), HIC_SEQN (NDDF), MESH, UNII_CODE (UNII), or VUID (VANDF), we call the [findRxcuiById](https://lhncbc.nlm.nih.gov/RxNav/APIs/api-RxNorm.findRxcuiById.html) API to get the corresponding RXCUI identifiers. For each returned RXCUI ID, we collect all related RXCUI IDs from the [RxNav](https://mor.nlm.nih.gov/RxNav/) service according to ingredient, precise ingredient, brand name, clinical drug component, branded drug component, clinical drug or pack, branded drug or pack, clinical dose form group, and branded dose form group.
3. If the equivalent curies include identifiers from CHEMBL, UMLS, KEGG.DRUG, DRUGBANK, NCIT, CHEBI, VANDF, HMDB, DrugCentral, or UNII, we call the [mychem.info](https://mychem.info/) API to get the corresponding RXCUI identifiers.
4. For each equivalent name, we use the [getApproximateMatch](https://lhncbc.nlm.nih.gov/RxNav/APIs/api-RxNorm.getApproximateMatch.html) API with rank==1 to get the corresponding RXCUI identifiers (this step can be turned off by setting `use_curie_name = False` in `get_rxcui_results`).
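The identifier-based steps above amount to routing each equivalent curie to a lookup service by its prefix. The following is a minimal sketch of that routing only, with no network calls. The helper `route_curie`, the two lookup tables, and the exact prefix spellings are assumptions made for illustration and are not taken from the DrugConflator repository. A curie whose prefix appears in both lists is routed to both services.

```python
# Sketch only: route_curie, both tables, and the prefix spellings are
# hypothetical; they mirror the lists in the steps above, not the real code.

# Curie prefix -> idtype value(s) to try with findRxcuiById.
RXNAV_IDTYPES = {
    "ATC": ["ATC"],
    "DRUGBANK": ["Drugbank"],
    "NDDF": ["GCN_SEQNO", "HIC_SEQN"],
    "MESH": ["MESH"],
    "UNII": ["UNII_CODE"],
    "VANDF": ["VUID"],
}

# Curie prefixes looked up through mychem.info (compared in upper case).
MYCHEM_PREFIXES = {
    "CHEMBL", "UMLS", "KEGG.DRUG", "DRUGBANK", "NCIT",
    "CHEBI", "VANDF", "HMDB", "DRUGCENTRAL", "UNII",
}


def route_curie(curie):
    """Return the lookups that apply to one curie.

    Each lookup is a (service, idtype, local_id) tuple; idtype is None
    for mychem.info because no idtype is chosen in this sketch.
    """
    prefix, _, local_id = curie.partition(":")
    prefix = prefix.upper()
    lookups = []
    for idtype in RXNAV_IDTYPES.get(prefix, []):
        lookups.append(("findRxcuiById", idtype, local_id))
    if prefix in MYCHEM_PREFIXES:
        lookups.append(("mychem.info", None, local_id))
    return lookups
```

For example, `route_curie("CHEBI:15365")` yields a single mychem.info lookup, while a curie with an unlisted prefix yields an empty list and would contribute RXCUI identifiers only through the name-based step.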

Given the RxCUI lists of two query curies, we provide two metrics to measure how close they are:

- Jaccard Similarity: $$ JS(A, B) = \frac{|A \cap B|}{|A \cup B|} $$
- Max Containment: $$ MC(A, B) = \frac{|A \cap B|}{\min(|A|, |B|)} $$

where A is the RxCUI list of curie1 and B is the RxCUI list of curie2.
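As an illustration, both metrics and the threshold decision can be sketched in a few lines of Python. The function names, the default threshold of 0.5, the `>=` comparison, and the choice to return 0.0 for empty lists are all assumptions made for this sketch, not the values or behavior of the DrugConflator code. The RxCUI values are placeholders.

```python
def jaccard_similarity(a, b):
    """JS(A, B) = |A ∩ B| / |A ∪ B|; returns 0.0 if both are empty (assumption)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def max_containment(a, b):
    """MC(A, B) = |A ∩ B| / min(|A|, |B|); returns 0.0 if either is empty (assumption)."""
    a, b = set(a), set(b)
    smaller = min(len(a), len(b))
    if smaller == 0:
        return 0.0
    return len(a & b) / smaller


def should_conflate(a, b, threshold=0.5, metric=max_containment):
    """Hypothetical decision rule: conflate when the metric reaches the threshold."""
    return metric(a, b) >= threshold


# Placeholder RxCUI lists for two curies (not real identifiers).
rxcuis_1 = ["r1", "r2", "r3"]
rxcuis_2 = ["r2", "r3", "r4", "r5"]

print(jaccard_similarity(rxcuis_1, rxcuis_2))  # 2 shared / 5 total = 0.4
print(max_containment(rxcuis_1, rxcuis_2))     # 2 shared / 3 in the smaller list
```

Max Containment is never lower than Jaccard Similarity for the same pair, because min(|A|, |B|) is at most |A ∪ B|. It is the more permissive metric when one drug's RxCUI list is much shorter than the other's.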

To review code, see https://github.com/RTXteam/DrugConflator.
