Active learning sampling strategy #125

aenglebert · 2023-02-23T18:31:53Z

Hello !

I am trying to use MedCAT with MedCATtrainer in an active learning setup to label a subset of a large set of unannotated French documents.

The MedCATtrainer paper ( https://arxiv.org/pdf/1907.07322.pdf ), in section 3.2 Active Learning, describes using selective certainty-based sampling to guide which documents are annotated.

But the only parameter related to active learning that I found in MedCATtrainer is the "train_model_on_submit" parameter in ProjectAnnotateEntities.

MedCATtrainer/webapp/api/api/models.py, line 246 in ec7900f:

    train_model_on_submit = models.BooleanField(default=True, help_text='Active learning - configured CDB is trained '

From what I found, this parameter is responsible for a call to the train_medcat function when a document is submitted, but it seems to have no influence on the order/sampling of documents in the project annotation interface.

Is there another option I missed or misunderstood that allows for replicating the certainty-based sampling described in the paper?

Or does this part need to be done outside of MedCATTrainer with the creation of a new project at each annotation step containing only the sampled documents?

By the way, thank you for this amazing tool !

tomolopolis · 2023-03-01T23:04:47Z

Hi @aenglebert - this feature did exist in an early version of MedCATtrainer, v0.x - we've since removed it and did mean to reimplement it.

Yes - that is the only project parameter for online learning.

We'll let you know if we get around to implementing it in this version, but it's possible right now to programmatically upload datasets and assign them to projects, so a crude version could be performed semi-automatically, I think.
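For illustration, here is a minimal sketch of what such a crude, semi-automatic loop could look like on the selection side. It is not the sampling method from the paper and not part of MedCATtrainer: the function names are hypothetical, the per-document confidence scores are assumed to come from whatever model output you have (for example the confidences of detected concepts), and the `name`/`text` CSV layout is an assumption about the dataset upload format that should be checked against your MedCATtrainer version. The idea is simply to rank documents by mean confidence, keep the `k` least certain ones, and write them out as a dataset file for the next annotation round.

```python
import csv
import io
from typing import Dict, List, Sequence


def document_certainty(scores: Sequence[float]) -> float:
    """Mean confidence of a document's annotations.

    Assumption: a document with no annotations is scored 0.0, so it is
    treated as maximally uncertain and surfaces first.
    """
    if not scores:
        return 0.0
    return sum(scores) / len(scores)


def select_least_certain(doc_scores: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return the names of the k documents with the lowest mean confidence.

    Ties are broken by document name so the result is deterministic.
    """
    ranked = sorted(
        doc_scores,
        key=lambda name: (document_certainty(doc_scores[name]), name),
    )
    return ranked[:k]


def subset_to_csv(texts: Dict[str, str], names: Sequence[str]) -> str:
    """Serialise the selected documents as CSV text.

    The 'name' and 'text' column headers are an assumption about the
    dataset format expected on upload.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["name", "text"])
    for name in names:
        writer.writerow([name, texts[name]])
    return buffer.getvalue()
```

After each annotation round, the scores would be recomputed with the updated model, the already annotated documents dropped from `doc_scores`, and a new subset uploaded as a fresh dataset and project. Using the minimum confidence instead of the mean is an equally plausible choice if a single doubtful concept should be enough to surface a document.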

aenglebert · 2023-03-02T08:48:13Z

Hello.
OK, I understand, thank you for the answer.
I will look into automating the upload of subsets; that could be a good compromise.
Thanks !
