NCATS Text Mining Provider Roadmap

This repository is the central location for organizing and tracking development of the NCATS Translator Text Mining Provider. The Text Mining Provider aims to provide an up-to-date, Biolink-compatible knowledge graph composed of assertions mined from the available full-text biomedical literature. While tools to extract Biolink associations from the literature are under development, a knowledge graph based on the cooccurrence of concepts in sentences serves as a proxy, with each cooccurrence indicating a related_to relation between the two concepts.
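
As a rough illustration of the sentence-level cooccurrence proxy described above, the following Python sketch counts how many sentences mention each pair of concepts and emits one related_to edge per pair. It is a minimal sketch under stated assumptions, not the Text Mining Provider's implementation: the function name `cooccurrence_edges`, the edge field names, and the input layout (one list of concept identifiers per sentence) are hypothetical, and the identifiers are illustrative only.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(sentences):
    """Count, for each unordered concept pair, the sentences that mention both.

    `sentences` is a list of lists of concept identifiers (hypothetical layout).
    """
    pair_counts = Counter()
    for concept_ids in sentences:
        # Deduplicate and sort so each pair is counted once per sentence,
        # always in the same (alphabetical) order.
        for subj, obj in combinations(sorted(set(concept_ids)), 2):
            pair_counts[(subj, obj)] += 1
    # Field names below are assumptions made for this sketch.
    return [
        {"subject": s, "predicate": "biolink:related_to", "object": o, "sentence_count": n}
        for (s, o), n in sorted(pair_counts.items())
    ]


# Illustrative input: concept mentions found in three sentences.
sentences = [
    ["CHEBI:15377", "GO:0005634"],
    ["CHEBI:15377", "GO:0005634", "CL:0000540"],
    ["CL:0000540"],
]
edges = cooccurrence_edges(sentences)
for edge in edges:
    print(edge)
```

A sentence with a single concept contributes no edge, and repeated mentions of the same concept within one sentence are counted once.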

A project board is available to monitor progress of the milestones from the initial Text Mining Provider proposal.

Feature requests and issues

The Text Mining Provider aims to serve the needs of the NCATS Translator community, so development efforts are prioritized based on community input and feedback. If you have a specific target you would like the Text Mining Provider to address, or would like to report a text mining error, please submit an issue to this repository.

A project board is available to monitor feature requests and issues as they migrate through the different stages of development.

Available Knowledge Graphs

- Targeted text-mined association KG: a KG of Biolink associations extracted from scientific text
- Ontology concept cooccurrence KG: a KG linking ontology concepts that cooccur in scientific text, scored using the normalized Google distance
- Ontology KGs: KGs containing the subclass and relation hierarchies of biomedical ontologies, normalized to Biolink concepts and relations
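
For readers unfamiliar with the normalized Google distance (NGD) used to score cooccurrence, the sketch below implements the standard NGD formula from occurrence counts. This is a minimal sketch, not the provider's scoring code: the function name and parameters are hypothetical, and the counting unit (sentences here, but it could equally be documents) is an assumption, since this README does not specify it.

```python
import math


def normalized_google_distance(count_x, count_y, count_xy, total):
    """Standard NGD from occurrence counts.

    count_x, count_y: number of units (assumed: sentences) mentioning each concept
    count_xy:         number of units mentioning both concepts
    total:            total number of units; must exceed min(count_x, count_y)
    """
    if count_xy == 0:
        # Concepts that never cooccur are treated as infinitely distant.
        return math.inf
    log_x, log_y = math.log(count_x), math.log(count_y)
    return (max(log_x, log_y) - math.log(count_xy)) / (math.log(total) - min(log_x, log_y))


# Illustrative counts only: lower values mean the concepts cooccur more
# often than their individual frequencies would suggest.
score = normalized_google_distance(count_x=1000, count_y=100, count_xy=10, total=100000)
print(score)
```

A score of 0 means the two concepts always appear together; larger scores indicate weaker association.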

Concept Recognition

Currently, the Text Mining Provider extracts mentions of concepts from ten Biolink-compatible Open Biomedical Ontologies: CHEBI, CL, GO_BP, GO_CC, GO_MF, MOP, NCBITAXON, PR, SO, and UBERON.

Evaluation of concept recognition on the CRAFT test corpus

The CRAFT corpus contains an evaluation set of 30 full-text articles that has been used to evaluate the Text Mining Provider's concept recognition systems throughout their development. The table below lists the versions of the concept recognition systems used for the available KGs, with the precision (P), recall (R), and F-score (F) of each.

Ontology and system	Version	Precision (P)	Recall (R)	F-score (F)
CHEBI OGER	0.1.0	0.6997	0.6609	0.6797
CHEBI OGER+CRF	0.2.0	0.8559	0.5536	0.6723

CL OGER	0.1.0	0.7849	0.6712	0.7236
CL OGER+CRF	0.2.0	0.7862	0.6419	0.7067

GO_BP OGER	0.1.0	0.5137	0.2823	0.3644
GO_BP OGER+CRF	0.2.0	0.5863	0.2405	0.3411

GO_CC OGER	0.1.0	0.8004	0.8447	0.8220
GO_CC OGER+CRF	0.2.0	0.9712	0.7801	0.8652

GO_MF OGER	0.1.0	0.7467	0.6127	0.6731
GO_MF OGER+CRF	0.2.0	0.8135	0.4956	0.6159

MOP OGER	0.1.0	0.3043	0.6306	0.4106
MOP OGER+CRF	0.2.0	0.6000	0.3243	0.4211

NCBITAXON OGER	0.1.0	0.4861	0.7816	0.5994
NCBITAXON OGER+CRF	n/a	0.5749	0.7511	0.6513
NCBITAXON OGER+CRF+USE_GENERAL	0.2.0	0.9222	0.7511	0.8279

PR OGER	0.1.0	0.3433	0.8121	0.4826
PR OGER+CRF	n/a	0.5097	0.6445	0.5693
PR OGER+CRF+CAT=GENE	0.2.0	0.6690	0.6448	0.6567

SO OGER	0.1.0	0.4096	0.5257	0.4604
SO OGER+CRF	0.2.0	0.6554	0.5018	0.5684

UBERON OGER	0.1.0	0.8162	0.5842	0.6810
UBERON OGER+CRF	0.2.0	0.9266	0.5062	0.6547
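
The F values in the table are consistent with the balanced F-score, that is, the harmonic mean of precision and recall. The README does not state the formula, so treat this as an inference from the numbers. The short sketch below recomputes F for two rows of the table; the function name `f_score` is hypothetical.

```python
def f_score(precision, recall):
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rows taken from the table above.
chebi_oger = round(f_score(0.6997, 0.6609), 4)      # CHEBI OGER, version 0.1.0
go_cc_oger_crf = round(f_score(0.9712, 0.7801), 4)  # GO_CC OGER+CRF, version 0.2.0
print(chebi_oger, go_cc_oger_crf)  # 0.6797 0.8652
```

Because the harmonic mean penalizes imbalance, a large precision gain can still lower F when recall drops, as in the CHEBI and UBERON rows.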

Biolink Association Extraction

Initial development of tools to extract explicit Biolink associations is underway. For details, please see these issues.

mikebada/Text-Mining-Provider-Roadmap
