
Milestone 2, Segment II: Develop new tools for extracting associations from text

No due date 0% complete


During Segment I, we successfully demonstrated that BlueBERT (formerly NCBIBert) language models can be fine-tuned to classify Biolink associations in sentences. In Segment II, we will apply this approach to improve performance on the associations targeted in Segment I while also targeting a new set of six associations. Selection of the six new associations will be based on feedback we receive from the Translator community. The effort required for the relation annotation work in Segment I demonstrated a need to increase our efficiency. We therefore propose to develop an association annotation tool that is closely integrated with BERT model training. The tool will incorporate filtering mechanisms that our annotators found useful during Segment I, e.g., sorting sentences by length and filtering based on the concept types that are present. Incorporating an active learning component to help prioritize sentences for annotation is a stretch goal for the year (not a milestone). An investigation of existing annotation tools will guide development efforts, with the hope of reusing existing code.
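As a rough illustration of the two filtering mechanisms mentioned above (sorting by sentence length and filtering on the concept types present), the following is a minimal sketch, not the planned tool. The `CandidateSentence` structure, the function names, the example sentences, and the concept type labels are all assumptions made for illustration; sentence length is assumed here to mean whitespace-separated token count.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CandidateSentence:
    """A sentence awaiting annotation (hypothetical structure)."""
    text: str
    # Concept types detected in the sentence; labels are illustrative only.
    concept_types: frozenset = field(default_factory=frozenset)


def filter_by_concept_types(sentences, required_types):
    """Keep only sentences that contain every required concept type."""
    required = frozenset(required_types)
    return [s for s in sentences if required <= s.concept_types]


def sort_by_length(sentences, longest_first=False):
    """Order sentences by whitespace token count (shortest first by default)."""
    return sorted(
        sentences,
        key=lambda s: len(s.text.split()),
        reverse=longest_first,
    )


# Made-up example data, not drawn from the project's corpus.
candidates = [
    CandidateSentence(
        "BRCA1 mutations are strongly associated with hereditary breast cancer.",
        frozenset({"Gene", "Disease"}),
    ),
    CandidateSentence(
        "Aspirin inhibits COX1.",
        frozenset({"ChemicalSubstance", "Gene"}),
    ),
    CandidateSentence(
        "TP53 loss drives tumors.",
        frozenset({"Gene", "Disease"}),
    ),
]

# Build an annotation queue: gene-disease sentences, shortest first.
queue = sort_by_length(filter_by_concept_types(candidates, {"Gene", "Disease"}))
for sentence in queue:
    print(sentence.text)
```

Keeping each filter as a separate function over a plain list lets annotators combine them in any order. A later prioritization step, such as the active learning stretch goal, could slot in as one more function of the same shape.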
