Author Disambiguation #25
At the moment, every "duplicate" author stems from poor source data. This is usually the result of authors using different representations of their names (or being referenced with a different representation of their names). At the other end of the spectrum, authors who share a name (e.g. a common name like Tom Miller) are currently merged into one author. To get around these problems, authors can attach an ORCID iD to their papers. More and more authors do this nowadays, but unfortunately ORCID iDs are missing from the CORD19 dataset. One could improve the current situation by creating a new data source script that matches papers against PubMed data and tries to obtain more detailed author data from there. As the author name representations in the references in the CORD19 data are very poor, this data will be dropped with the next data model release anyway.
Some additional ideas from the Matrix chat:
I used Springer's SciGraph in the past, which contains links between persons and organisations. Don't forget to consider that a person switches organisations over time. See: SciGraph Ontology
Just took a quick look at the data they make available for download. Not sure how useful it is. We may need to develop a database of our own that is specific to the COVID authors and can rely on information on institutions, co-authors, etc. in the COVID-19 dataset.
The authors on references are duplicated for each reference node and should be unified across all references. With a single author node, one would then have links to all references on which that person is an author.
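A rough sketch of that unification, in plain Python rather than against the graph itself: each distinct author name becomes one entry that links to every reference it appears on. The input shape (dicts with `id` and `authors`) and the function name `unify_reference_authors` are hypothetical, and the key here is only the lower-cased, whitespace-collapsed name, so it does not address the same-name problem.

```python
from collections import defaultdict


def unify_reference_authors(references):
    """Map each author key to the set of reference ids that author appears on.

    `references` is assumed to be an iterable of dicts like
    {"id": "r1", "authors": ["Tom Miller", "Ann Lee"]}.
    """
    author_to_refs = defaultdict(set)
    for ref in references:
        for name in ref["authors"]:
            # Collapse case and whitespace so trivially different spellings merge.
            key = " ".join(name.lower().split())
            author_to_refs[key].add(ref["id"])
    return dict(author_to_refs)


refs = [
    {"id": "r1", "authors": ["Tom Miller", "Ann Lee"]},
    {"id": "r2", "authors": ["tom  miller"]},
]
unified = unify_reference_authors(refs)
```

In the graph, each key of `unified` would correspond to a single author node, and each id in its set to one relationship from that node to a reference node.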
covidgraph/documentation#1