I am trying to use MedCAT with MedCATtrainer in an active learning setup to label a subset of a large set of unannotated French documents.
Section 3.2 (Active Learning) of the MedCATtrainer paper ( https://arxiv.org/pdf/1907.07322.pdf ) describes selective certainty-based sampling to guide which documents get annotated.
But the only parameter I found related to active learning in MedCATtrainer is the "train_model_on_submit" parameter in ProjectAnnotateEntities (MedCATtrainer/webapp/api/api/models.py, line 246 in ec7900f):
train_model_on_submit=models.BooleanField(default=True, help_text='Active learning - configured CDB is trained '
From what I found, this parameter is responsible for a call to the train_medcat function when a document is submitted, but it seems to have no influence on the order/sampling of documents in the project annotation interface.
Is there another option I missed or misunderstood that allows for replicating the certainty-based sampling described in the paper?
Or does this part need to be done outside of MedCATTrainer with the creation of a new project at each annotation step containing only the sampled documents?
By the way, thank you for this amazing tool!
Hi @aenglebert - this feature did exist in an early version of MedCATtrainer, v0.x. We have since removed it and did mean to reimplement it.
Yes - that is the only project parameter for online learning.
We'll let you know if we get around to implementing it in this version, but it's possible right now to programmatically upload datasets and assign them to projects, so a crude version could be performed semi-automatically, I think.
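As a rough illustration of that semi-automatic route, the selection step could be sketched as below. This is only a sketch under stated assumptions, not the sampling scheme from the paper or anything built into MedCATtrainer: it assumes you have already collected a list of per-entity confidence scores for each unannotated document from the current model, it ranks documents by lowest mean confidence, and it writes the chosen batch to a CSV with `name` and `text` columns (assumed here to be the layout expected for dataset upload). The function names `select_uncertain_docs` and `write_batch_csv` are hypothetical.

```python
import csv
from statistics import mean
from typing import Dict, List


def select_uncertain_docs(doc_scores: Dict[str, List[float]], batch_size: int) -> List[str]:
    """Return the ids of the `batch_size` documents with the lowest mean confidence.

    `doc_scores` maps a document id to the confidence scores of the entities
    detected in it (hypothetical input, gathered outside this function).
    """
    def mean_confidence(doc_id: str) -> float:
        scores = doc_scores[doc_id]
        # Assumption: a document with no detected entities counts as fully
        # certain (1.0), so it is sampled last.
        return mean(scores) if scores else 1.0

    # Lowest mean confidence first, i.e. most uncertain documents first.
    ranked = sorted(doc_scores, key=mean_confidence)
    return ranked[:batch_size]


def write_batch_csv(path: str, texts: Dict[str, str], doc_ids: List[str]) -> None:
    """Write the selected documents to a CSV with `name` and `text` columns."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "text"])  # assumed upload layout
        for doc_id in doc_ids:
            writer.writerow([doc_id, texts[doc_id]])
```

For example, with scores `{"doc_a": [0.9, 0.8], "doc_b": [0.3, 0.5], "doc_c": [], "doc_d": [0.6]}` and a batch size of 2, the function picks `doc_b` and `doc_d`. Each round you would annotate the uploaded batch, let the model retrain, rescore the remaining documents, and repeat.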