fynnos commented Feb 19, 2025

This is a rather large PR that provides...

Quotation detection

Who says what to whom is automatically extracted from text using the quotect package.

Integrate quotect into the Ray model worker

Send tokenized text to the quotect model in Ray, retrieve the results, and create span annotations as well as span groups to persist the predictions in the database.
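As a rough illustration of the last step, the sketch below maps token-level predictions back to character spans and bundles the spans of one quotation into a group. The prediction shape (one dict per quotation, mapping a role to inclusive token ranges), the role names, and the `SpanAnnotation` / `SpanGroup` classes are assumptions made for this example; they are not the actual quotect output format or the database models used in this PR.

```python
from dataclasses import dataclass, field


@dataclass
class SpanAnnotation:
    code: str   # role label, e.g. "speaker" (hypothetical label set)
    begin: int  # character offset, inclusive
    end: int    # character offset, exclusive
    text: str


@dataclass
class SpanGroup:
    name: str
    spans: list = field(default_factory=list)


def whitespace_token_offsets(text):
    """Return (begin, end) character offsets for whitespace-separated tokens."""
    offsets, pos = [], 0
    for tok in text.split():
        begin = text.index(tok, pos)
        offsets.append((begin, begin + len(tok)))
        pos = begin + len(tok)
    return offsets


def predictions_to_groups(text, token_offsets, predictions):
    """Turn token-level predictions into one span group per quotation.

    predictions: one dict per quotation, mapping a role name to a list of
    inclusive (first_token, last_token) ranges. This shape is assumed.
    """
    groups = []
    for i, quote in enumerate(predictions, start=1):
        group = SpanGroup(name=f"quote_{i}")
        for role, token_ranges in quote.items():
            for first, last in token_ranges:
                begin = token_offsets[first][0]
                end = token_offsets[last][1]
                group.spans.append(
                    SpanAnnotation(role, begin, end, text[begin:end])
                )
        groups.append(group)
    return groups


text = "Anna told Ben that they leave at noon"
predictions = [{"speaker": [(0, 0)], "addressee": [(2, 2)], "quote": [(4, 7)]}]
groups = predictions_to_groups(text, whitespace_token_offsets(text), predictions)
```

Keeping the spans of one quotation in a single group is what lets the speaker, addressee, and quoted text be linked again after they are stored as separate annotations.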

Generic ML Jobs (Backend + Frontend)

Because the model is compute-heavy, it is not added to our already too large pre-processing pipeline. Instead, this PR adds a new generic ML job that runs the quotation detection task on any unprocessed documents. On request, the job can also delete existing annotations and groups and re-compute them (e.g. to apply another quotation detection model to the same data).
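The document selection described above could look roughly like the following sketch. All names here (`MLJobType`, `MLJobParams`, `select_documents`, the `recompute` flag, and the `delete_predictions` callback) are hypothetical and only illustrate the idea; the job parameters and database calls in this PR may differ.

```python
from dataclasses import dataclass
from enum import Enum


class MLJobType(str, Enum):
    # Only one task exists so far; more can be added to the same job type enum.
    QUOTATION_DETECTION = "quotation_detection"


@dataclass
class MLJobParams:
    job_type: MLJobType
    project_id: int
    recompute: bool = False  # hypothetical flag: discard old results first


def select_documents(params, all_doc_ids, processed_doc_ids, delete_predictions):
    """Return the sorted ids of the documents this job run should process.

    By default only unprocessed documents are selected. With recompute=True,
    existing predictions are deleted via the callback and every document is
    selected again.
    """
    if params.recompute:
        for doc_id in sorted(processed_doc_ids):
            delete_predictions(doc_id)
        return sorted(all_doc_ids)
    return sorted(set(all_doc_ids) - set(processed_doc_ids))


deleted = []
todo = select_documents(
    MLJobParams(MLJobType.QUOTATION_DETECTION, project_id=1),
    {1, 2, 3}, {2}, deleted.append,
)
redo = select_documents(
    MLJobParams(MLJobType.QUOTATION_DETECTION, project_id=1, recompute=True),
    {1, 2, 3}, {2}, deleted.append,
)
```

Making deletion an explicit opt-in keeps the default run idempotent: triggering the job twice without `recompute` only touches documents that have no predictions yet.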

Added a new page, reachable from the drawer on the left, for triggering ML automation features (currently only quotation detection):
Screenshot 2025-02-19 at 14 52 03
Screenshot 2025-02-19 at 14 53 07

Rendering span groups in the frontend UI

The document viewer and annotator now show span groups by numbering the span tags:

Screenshot 2025-02-19 at 14 54 27
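One way to derive such numbers is to give each group a 1-based index in order of first appearance and append it to the tag label. The sketch below shows that idea in Python for brevity; the function name, input shapes, and label format are assumptions and do not reflect the frontend implementation.

```python
def numbered_tag_labels(spans, group_of):
    """Build display labels for span tags, numbering grouped spans.

    spans: list of (span_id, code) in document order.
    group_of: dict mapping span_id to a group id; ungrouped spans are absent.
    """
    numbers = {}  # group id -> display number, assigned on first appearance
    labels = []
    for span_id, code in spans:
        group_id = group_of.get(span_id)
        if group_id is None:
            labels.append(code)  # ungrouped spans keep their plain label
            continue
        number = numbers.setdefault(group_id, len(numbers) + 1)
        labels.append(f"{code} {number}")
    return labels


spans = [(10, "speaker"), (11, "quote"), (12, "speaker"), (13, "quote"), (14, "location")]
group_of = {10: "a", 11: "a", 12: "b", 13: "b"}
labels = numbered_tag_labels(spans, group_of)
```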

fynnos requested a review from bigabig on February 25, 2025, 16:22
bigabig merged commit 75fee76 into main on Feb 28, 2025
2 checks passed
bigabig deleted the dev/quotect branch on March 31, 2025, 08:26