Formalize TRAPI attribute structure for concept cooccurrence query results #93

Open
bill-baumgartner opened this issue Sep 17, 2021 · 0 comments
Labels
in progress status - This issue is actively being addressed

Comments

@bill-baumgartner (Collaborator)

Create a preliminary TRAPI attribute structure for returning concept cooccurrence results. This structure can be modeled after the COHD attribute structure proposed by Matt Brush.

COHD example provided by Matt Brush

[Image: Screen Shot 2021-09-16 at 5.29.14 PM]

Proposed Cooccurrence Attribute Structure

[Image: cooccurrence-attribute-schema]

Proposed Node TSV

| id | name | category |
| --- | --- | --- |
| CHEBI:3215 | bupivacaine | biolink:ChemicalEntity |
| PR:000031567 | leucine-rich repeat-containing protein 3B | biolink:Protein |

Proposed Edge TSV (Note: scroll table to see all columns)

| subject | predicate | object | id | association_type | supporting_study_results | _attributes |
| --- | --- | --- | --- | --- | --- | --- |
| CHEBI:3215 | biolink:related_to | PR:000031567 | hcR2-6QIJratLDFyFxwcSO6UW1M | biolink:Association | tmkp:a1a1a1a1a1a1\|tmkp:b2b2b2b2b2b2\|tmkp:c3c3c3c3c3c3\|tmkp:d4d4d4d4d4d4 | ATTRIBUTE_JSON_BLOB |

Here, ATTRIBUTE_JSON_BLOB stands for the JSON serialization of the structure shown in the following YAML:

- attribute_type_id: biolink:original_knowledge_source
  value: infores:text-mining-provider-cooccurrence
  value_type_id: biolink:InformationResource
  description: The Text Mining Provider Concept Cooccurrence KP from NCATS Translator provides cooccurrence metrics for text-mined concepts that cooccur at various levels, e.g. document, sentence, etc. in the biomedical literature.
  attribute_source: infores:text-mining-provider-cooccurrence

- attribute_type_id: biolink:supporting_data_source
  value: infores:pubmed
  value_type_id: biolink:InformationResource
  attribute_source: infores:text-mining-provider-cooccurrence

- attribute_type_id: biolink:supporting_study_result
  value: tmkp:a1a1a1a1a1a1
  value_type_id: biolink:DocumentLevelConceptCooccurrenceAnalysisResult
  description: A single result from computing cooccurrence metrics between two concepts that cooccur at the document level.
  attribute_source: infores:text-mining-provider-cooccurrence
  attributes:

    - attribute_type_id: biolink:supporting_document    # NOT CURRENTLY IN BIOLINK
      value: PMID:29085514|PMID:1236578
      value_type_id: biolink:Publication
      description: The documents where the concepts of this assertion were observed to cooccur at the document level.
      attribute_source: infores:pubmed

    - attribute_type_id: biolink:tmkp_concept1_count
      value: 123
      value_type_id: SIO:000794     # SIO:count
      description: "The number of times concept #1 was observed to occur at the document level in the documents that were processed."
      attribute_source: infores:text-mining-provider-cooccurrence

    - attribute_type_id: biolink:tmkp_concept2_count
      value: 321
      value_type_id: SIO:000794     # SIO:count
      description: "The number of times concept #2 was observed to occur at the document level in the documents that were processed."
      attribute_source: infores:text-mining-provider-cooccurrence

    - attribute_type_id: biolink:tmkp_concept_pair_count
      value: 2
      value_type_id: SIO:000794     # SIO:count
      description: The number of times the concepts of this assertion were observed to cooccur at the document level in the documents that were processed.
      attribute_source: infores:text-mining-provider-cooccurrence

    - attribute_type_id: biolink:tmkp_normalized_google_distance
      value: 0.876
      value_type_id: EDAM:data_1772     # EDAM:score
      description: The Normalized Google Distance score for the concepts in this assertion, based on their cooccurrence in the documents that were processed.
      attribute_source: infores:text-mining-provider-cooccurrence

    - attribute_type_id: biolink:tmkp_pointwise_mutual_information
      value: 0.876
      value_type_id: EDAM:data_1772     # EDAM:score
      description: The pointwise mutual information score for the concepts in this assertion, based on their cooccurrence in the documents that were processed.
      attribute_source: infores:text-mining-provider-cooccurrence

    - attribute_type_id: biolink:tmkp_normalized_pointwise_mutual_information
      value: 0.876
      value_type_id: EDAM:data_1772     # EDAM:score
      description: The normalized pointwise mutual information score for the concepts in this assertion, based on their cooccurrence in the documents that were processed.
      attribute_source: infores:text-mining-provider-cooccurrence

    - attribute_type_id: biolink:tmkp_mutual_dependence
      value: 0.876
      value_type_id: EDAM:data_1772     # EDAM:score
      description: The mutual dependence (PMI^2) score for the concepts in this assertion, based on their cooccurrence in the documents that were processed.
      attribute_source: infores:text-mining-provider-cooccurrence

    - attribute_type_id: biolink:tmkp_normalized_pointwise_mutual_information_max
      value: 0.876
      value_type_id: EDAM:data_1772     # EDAM:score
      description: A variant of the normalized pointwise mutual information score for the concepts in this assertion, based on their cooccurrence in the documents that were processed.
      attribute_source: infores:text-mining-provider-cooccurrence

    - attribute_type_id: biolink:tmkp_log_frequency_biased_mutual_dependence
      value: 0.876
      value_type_id: EDAM:data_1772     # EDAM:score
      description: The log frequency biased mutual dependence score for the concepts in this assertion, based on their cooccurrence in the documents that were processed.
      attribute_source: infores:text-mining-provider-cooccurrence

- attribute_type_id: biolink:supporting_study_result
  value: tmkp:b2b2b2b2b2b2
  value_type_id: biolink:SentenceLevelConceptCooccurrenceAnalysisResult
  description: A single result from computing cooccurrence metrics between two concepts that cooccur at the sentence level.
  attribute_source: infores:text-mining-provider-cooccurrence
  attributes:

    # [SAME ATTRIBUTES AS ABOVE]

- attribute_type_id: biolink:supporting_study_result
  value: tmkp:c3c3c3c3c3c3
  value_type_id: biolink:TitleLevelConceptCooccurrenceAnalysisResult
  description: A single result from computing cooccurrence metrics between two concepts that cooccur in the document title.
  attribute_source: infores:text-mining-provider-cooccurrence
  attributes:

    # [SAME ATTRIBUTES AS ABOVE]

- attribute_type_id: biolink:supporting_study_result
  value: tmkp:d4d4d4d4d4d4
  value_type_id: biolink:AbstractLevelConceptCooccurrenceAnalysisResult
  description: A single result from computing cooccurrence metrics between two concepts that cooccur in the abstract.
  attribute_source: infores:text-mining-provider-cooccurrence
  attributes:

    # [SAME ATTRIBUTES AS ABOVE]
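
To make the relationship between the YAML listing and the `_attributes` column concrete, here is a minimal Python sketch of how such a blob might be assembled and written into an edge TSV row. The helper names (`attribute`, `study_result`, `KP`) are hypothetical and exist only for this illustration, only two of the nested metrics are included for brevity, and the compact JSON encoding and the quoting behavior of Python's `csv` module are assumptions rather than anything this proposal specifies.

```python
import csv
import io
import json

# Hypothetical constant for this sketch; the identifier comes from the YAML above.
KP = "infores:text-mining-provider-cooccurrence"


def attribute(type_id, value, value_type_id, source=KP, attributes=None):
    """Build one attribute dict with the field names used in the YAML listing."""
    attr = {
        "attribute_type_id": type_id,
        "value": value,
        "value_type_id": value_type_id,
        "attribute_source": source,
    }
    if attributes:
        attr["attributes"] = attributes  # nested attributes of a study result
    return attr


def study_result(result_id, level, metrics):
    """Wrap per-level metrics in a biolink:supporting_study_result attribute."""
    nested = [
        attribute(f"biolink:tmkp_{name}", value, value_type)
        for name, (value, value_type) in metrics.items()
    ]
    result_type = f"biolink:{level}LevelConceptCooccurrenceAnalysisResult"
    return attribute("biolink:supporting_study_result", result_id, result_type,
                     attributes=nested)


attributes = [
    attribute("biolink:original_knowledge_source", KP, "biolink:InformationResource"),
    attribute("biolink:supporting_data_source", "infores:pubmed",
              "biolink:InformationResource"),
    study_result("tmkp:a1a1a1a1a1a1", "Document", {
        "concept_pair_count": (2, "SIO:000794"),
        "normalized_google_distance": (0.876, "EDAM:data_1772"),
    }),
]

# Serialize compactly so the blob fits in a single TSV cell.
blob = json.dumps(attributes, separators=(",", ":"))

# Write one edge row; csv quotes the JSON cell because it contains '"'.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow([
    "CHEBI:3215", "biolink:related_to", "PR:000031567",
    "hcR2-6QIJratLDFyFxwcSO6UW1M", "biolink:Association",
    "tmkp:a1a1a1a1a1a1", blob,
])
edge_line = buffer.getvalue()
```

A consumer reading the row back with `csv.reader(..., delimiter="\t")` and `json.loads` on the last column should recover the same nested structure, provided it uses the same quoting convention as the writer.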

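The score attributes listed above can be related to the three count attributes using the standard textbook definitions of these metrics. The sketch below is an illustration under assumptions, not the KP's actual implementation: the function name `cooccurrence_metrics` and the `total_documents` parameter are hypothetical (the proposal has no attribute for the corpus size), natural logarithms are assumed, and the "max" variant of normalized PMI is omitted because this issue does not define it.

```python
import math


def cooccurrence_metrics(concept1_count, concept2_count, pair_count, total_documents):
    """Compute cooccurrence scores from counts using the standard definitions.

    total_documents is an assumed input: the number of documents (or sentences,
    titles, abstracts) processed at the level being scored.
    """
    if not 0 < pair_count <= min(concept1_count, concept2_count):
        raise ValueError("pair_count must be positive and no larger than either concept count")
    if max(concept1_count, concept2_count) >= total_documents:
        raise ValueError("concept counts must be smaller than total_documents")

    # Maximum-likelihood probability estimates.
    p_x = concept1_count / total_documents
    p_y = concept2_count / total_documents
    p_xy = pair_count / total_documents

    # PMI = log( p(x,y) / (p(x) p(y)) )
    pmi = math.log(p_xy / (p_x * p_y))
    # NPMI = PMI / -log p(x,y), bounded in [-1, 1]
    npmi = pmi / -math.log(p_xy)
    # Mutual dependence (PMI^2) = log( p(x,y)^2 / (p(x) p(y)) )
    md = math.log(p_xy ** 2 / (p_x * p_y))
    # Log-frequency biased mutual dependence = MD + log p(x,y)
    lfmd = md + math.log(p_xy)

    # NGD = (max(log f(x), log f(y)) - log f(x,y)) / (log N - min(log f(x), log f(y)))
    log_x, log_y = math.log(concept1_count), math.log(concept2_count)
    ngd = (max(log_x, log_y) - math.log(pair_count)) / (
        math.log(total_documents) - min(log_x, log_y)
    )

    return {
        "normalized_google_distance": ngd,
        "pointwise_mutual_information": pmi,
        "normalized_pointwise_mutual_information": npmi,
        "mutual_dependence": md,
        "log_frequency_biased_mutual_dependence": lfmd,
    }


# Counts from the example above; the corpus size of 1,000,000 is made up.
scores = cooccurrence_metrics(123, 321, 2, total_documents=1_000_000)
```

Normalized Google Distance and normalized PMI are ratios of logarithms, so they do not depend on the log base; PMI, mutual dependence, and the log-frequency biased variant do, which is one reason the base would need to be documented alongside these attributes.
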
@bill-baumgartner added the "incoming" and "in progress" labels and removed the "incoming" label on Sep 17, 2021.