Compare the performance of PHI annotators on i2b2 and MCW data #10

Open
tschaffter opened this issue Oct 19, 2021 · 1 comment

@tschaffter

We currently benchmark PHI annotators on 5 different PHI annotation tasks using two datasets: the 2014 i2b2 dataset and the MCW dataset. The submitted tools sometimes achieve similar performance on both datasets, and sometimes the performance gap between the two datasets is quite large.

The goal of this task is to compare the performance of the tools on these two datasets. One hypothesis is that the performance difference is due to a difference in the number of annotations of a given type. This difference may in turn come from the type of clinical notes used or from a difference in the annotation protocol.
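To test this hypothesis, one simple check is to compare the share of each PHI annotation type in the two datasets. The sketch below assumes each dataset has already been reduced to a flat list of annotation type labels; the function names `type_proportions` and `compare_distributions` and the type labels are hypothetical, and the inputs are toy values for illustration, not real i2b2 or MCW counts.

```python
from collections import Counter
from typing import Iterable, Mapping


def type_proportions(annotation_types: Iterable[str]) -> dict[str, float]:
    """Return the fraction of all annotations that belong to each PHI type."""
    counts = Counter(annotation_types)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {t: n / total for t, n in counts.items()}


def compare_distributions(a: Mapping[str, float], b: Mapping[str, float]) -> dict[str, float]:
    """Difference in proportion (a - b) for every type seen in either dataset, sorted by type."""
    types = sorted(set(a) | set(b))
    return {t: a.get(t, 0.0) - b.get(t, 0.0) for t in types}


if __name__ == "__main__":
    # Toy inputs for illustration only (hypothetical labels, not real dataset counts).
    i2b2_types = ["date", "date", "date", "person_name", "location"]
    mcw_types = ["date", "person_name", "person_name", "location", "contact"]
    diff = compare_distributions(type_proportions(i2b2_types), type_proportions(mcw_types))
    for phi_type, delta in diff.items():
        print(f"{phi_type}: {delta:+.2f}")
```

Proportions are used rather than raw counts because the two corpora likely differ in size; a large positive or negative value for a type would point to where the annotation distributions diverge.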

Workflow

  • Write a notebook in this GH repository that captures the required information (@yy6linda)
  • Run the notebook to get data from the Sage data node (@yy6linda) and from the MCW data node (@gkowalski)
  • Compare the data (@yy6linda)
@gkowalski

gkowalski commented Oct 20, 2021

Well, I ran data-Node.Rmd and received:

{'name': 'datasets/mcw-phi-20210608'}

@yy6linda, let me know when you have a notebook to run that measures performance. In the meantime, this exercise confirms that my RStudio environment and tunnels to the data node are set up properly.
