OhdsiAuthorGraph

Code for visualizing the network of OHDSI authors. Nodes represent authors, links represent co-authorships.
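As an illustration of this graph structure, the sketch below derives node and link counts from a list of papers. It is not the logic of PrepareGraphData.R; the function name `build_coauthor_graph` and the sample author names are made up for this example, and each paper is assumed to be a plain list of author names.

```python
from collections import Counter
from itertools import combinations


def build_coauthor_graph(papers):
    """Return (node_counts, link_counts) for a list of author lists.

    node_counts maps author -> number of papers.
    link_counts maps (author1, author2) -> number of shared papers.
    """
    node_counts = Counter()
    link_counts = Counter()
    for authors in papers:
        # Sort and deduplicate so each pair has one canonical ordering.
        unique = sorted(set(authors))
        node_counts.update(unique)
        # Every unordered pair of authors on a paper is one co-authorship.
        link_counts.update(combinations(unique, 2))
    return node_counts, link_counts


# Made-up example data, not taken from the OHDSI publication list.
papers = [
    ["Author A", "Author B", "Author C"],
    ["Author A", "Author B"],
    ["Author C"],
]
nodes, links = build_coauthor_graph(papers)
print(nodes["Author A"])                # 2
print(links[("Author A", "Author B")])  # 2
```

Counting papers per node and per link in this way corresponds to the kind of paperCount values that the Cytoscape styling steps use for node size and edge transparency.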

This repo contains the code for preparing the data for visualization. There are three ways to do the visualization:

Using Cytoscape, an open-source software platform for visualizing complex networks.
Using JavaScript embedded in a web page. This uses the D3 JavaScript library.
Using Python matplotlib (preferred).

How to use

Get a list of PMIDs (PubMed IDs) of OHDSI papers.
Run PrepareGraphData.R.
View the results in the provided HTML page (see the docs folder), load the .tsv files from the cytoscape folder into Cytoscape, or continue with the matplotlib instructions below.

Cytoscape instructions

File --> Import --> Network from file --> Select links.tsv

File --> Import --> Table from file --> Select authors.tsv

You will probably want to enable 'Always Show Graphics Details' (the icon that looks like a pixelated diamond) in the bottom right of the graph pane. Otherwise the labels and other details will not appear in the preview.

In Style - Node (see bottom tab):

Fill color:
- Column: firstYear
- Mapping Type: Continuous mapping
Label Font Size: 15
Label Position (Click Properties dropdown to show):
- Node Anchor Points: East
- Label Anchor Points: West
- Label Justification: Left Justified
- X Offset Value: 1
- Y Offset Value: 0
Shape: Ellipse
Lock node width and height: check
Size:
- Column: paperCount
- Mapping Type: Continuous mapping

In Style - Edge (see bottom tab):

Stroke color: RGB (150, 150, 150)
Transparency:
- Column: paperCount
- Mapping Type: Continuous mapping
- Open the mapping. Double-click the left box and set its value to 40. Double-click the right box and set its value to 100.
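To make the effect of this mapping concrete, here is a minimal sketch of a linear mapping from paperCount to a transparency value between 40 and 100. It assumes simple linear interpolation between the two end points and clamps values outside the range; the function name `continuous_mapping` and the sample range of 1 to 11 papers are hypothetical, and Cytoscape's own handling of out-of-range values may differ.

```python
def continuous_mapping(value, low, high, out_low=40.0, out_high=100.0):
    """Linearly map value from [low, high] onto [out_low, out_high]."""
    if high == low:
        # Degenerate range: every value maps to the lower output.
        return out_low
    fraction = (value - low) / (high - low)
    # Clamp so values outside [low, high] stay within the output range
    # (an assumption of this sketch).
    fraction = max(0.0, min(1.0, fraction))
    return out_low + fraction * (out_high - out_low)


# Hypothetical paperCount range of 1 to 11 shared papers.
print(continuous_mapping(1, 1, 11))   # 40.0
print(continuous_mapping(6, 1, 11))   # 70.0
print(continuous_mapping(11, 1, 11))  # 100.0
```

With these settings, links between authors who share few papers are drawn faintly, and links between frequent co-authors are drawn more opaquely.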

Layout --> yFiles Organic Layout --> yFiles Remove Overlaps (Tip: temporarily change the node shape to rectangle, uncheck 'Lock node width and height', set the height to 25 and the width to 50, and set the node anchor to West. This makes the layout avoid most label overlap.)

Next, move nodes manually to fill the screen and avoid label overlap (this may take a while).

File --> Export --> Network to image

Matplotlib instructions

The Matplotlib code colors authors by the type of papers they publish. This requires first classifying the papers by type, which is done with an LLM:

Run ExtractPubAbstractTitle.R. This will save the titles and abstracts as XML in the intermediaryData folder.
Run PaperClassification.R. This requires access to an LLM like GPT-4. This will write the classifications to the paperClassification folder.
Run matplotlib/PlotAuthorGraph.py. Make sure to delete the pickle files (positionsSpringForce.pkl and positionsNoOverlap.pkl) first. These are caches from a previous run, so leaving them in place would reuse the old node positions.
In an image editor (e.g. GIMP), combine the plot (matplotlib/plot.png) with the legend (matplotlib/legend.png).
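The pickle files mentioned above are caches from a previous run. The sketch below shows a common load-or-compute caching pattern that would explain why a stale cache affects a new run. It is an assumption about how such caches typically behave, not the actual code in PlotAuthorGraph.py; `load_or_compute`, `first_layout`, and `second_layout` are hypothetical names, and the example writes to a temporary folder rather than the matplotlib folder.

```python
import os
import pickle
import tempfile


def load_or_compute(cache_path, compute):
    """Return the cached result if cache_path exists, else compute and cache it."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    result = compute()
    with open(cache_path, "wb") as f:
        pickle.dump(result, f)
    return result


def first_layout():
    # Stand-in for an expensive layout computation on the old data.
    return {"Author A": (0.0, 0.0)}


def second_layout():
    # Stand-in for the layout computed on updated data.
    return {"Author A": (1.0, 1.0)}


with tempfile.TemporaryDirectory() as folder:
    cache = os.path.join(folder, "positionsSpringForce.pkl")
    a = load_or_compute(cache, first_layout)   # computes and writes the cache
    b = load_or_compute(cache, second_layout)  # cache hit: old positions returned
    os.remove(cache)                           # removing the cache forces a recompute
    c = load_or_compute(cache, second_layout)

print(a == b)  # True
print(c)       # {'Author A': (1.0, 1.0)}
```

Under this pattern, positions cached for an older set of papers would be reused silently, which is why the cache files matter when the underlying data has changed.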
