
How is the task id from the frontend passed to the REST api of the backend and then to the ML backend? #6966

chrisdukeLlama opened this issue Jan 25, 2025 · 8 comments

@chrisdukeLlama

I would really appreciate some help understanding how the task ID from the frontend is passed to the REST API of the backend and then on to the ML backend.

In more detail: when I click on a task in the browser, I get the predicted annotation from my interactive ML backend (that already works). I assumed that, for this to happen, DataManager.jsx sends the task ID to the backend, the task data is added there, and ml/api.py sends both to the predict endpoint of the ML backend.

In DataManager.jsx, this part looks like the right one to me, because "mlInteractive" sounds like the appropriate endpoint:

if (interactiveBacked) {
  dataManager.on("lsf:regionFinishedDrawing", (reg, group) => {
    const { lsf, task, currentAnnotation: annotation } = dataManager.lsf;
    const ids = group.map((r) => r.cleanId);
    const result = annotation.serializeAnnotation().filter((res) => ids.includes(res.id));
    const suggestionsRequest = api.callApi("mlInteractive", {
      params: { pk: interactiveBacked.id },
      body: {
        task: task.id,
        context: { result },
      },
    });

But it appears to be triggered on regionFinishedDrawing, which does not sound right, given that in my case the prediction is triggered by opening a task.
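
For what it's worth, I think that interactive call boils down to a request roughly like the following (just my reading of the code; host, token and the exact endpoint path are assumptions on my part, not something I verified end to end):

import requests

# Rough, hypothetical equivalent of api.callApi("mlInteractive", ...).
# Host, token and IDs are placeholders.
LS_HOST = "http://localhost:8080"
API_TOKEN = "<api-token>"
ML_BACKEND_ID = 1  # the "pk" from interactiveBacked.id

payload = {
    "task": 123,                 # task.id
    "context": {"result": []},   # the serialized regions from the annotation
}

resp = requests.post(
    f"{LS_HOST}/api/ml/{ML_BACKEND_ID}/interactive-annotating",
    headers={"Authorization": f"Token {API_TOKEN}"},
    json=payload,
)
print(resp.json())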

Could you point me to the route by which the task.id (and finally the task itself) is passed to the ML backend for the prediction in cases where the whole process is initiated by a click on the task and with an interactive backend?

Thanks so much in advance!

@chrisdukeLlama

Okay, now I understand that this is done by tasks/api.py and not by ml/api.py. Could you just tell me where in the frontend code the API call is invoked? I cannot find the call to tasks/:taskid/project that then triggers:

def get(self, request, pk):
    context = self.get_retrieve_serializer_context(request)
    context['project'] = project = self.task.project

    # get prediction
    if (
        project.evaluate_predictions_automatically or project.show_collab_predictions
    ) and not self.task.predictions.exists():
        evaluate_predictions([self.task])
        self.task.refresh_from_db()

    serializer = self.get_serializer_class()(
        self.task, many=False, context=context, expand=['annotations.completed_by']
    )
    data = serializer.data
    return Response(data)

I would be really thankful if you could help me with that. I need it to pass the DOM to the ML backend... an underestimated feature ;)

@chrisdukeLlama

Okay, it is here: web/libs/datamanager/src/stores/DataStores/tasks.js

  loadTask: flow(function* (taskID, { select = true } = {}) {
    if (!isDefined(taskID)) {
      console.warn("Task ID must be provided");
      return;
    }

    self.setLoading(taskID);

    const taskData = yield self.root.apiCall("task", { taskID });

    const task = self.applyTaskSnapshot(taskData, taskID);

    if (select !== false) self.setSelected(task);

    self.finishLoading(taskID);

    return task;
  }),
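
As far as I can tell, that apiCall("task", { taskID }) resolves to a plain GET on the tasks endpoint, so the same request can be reproduced outside the browser roughly like this (host and token are placeholders; this is only a sketch of my understanding):

import requests

LS_HOST = "http://localhost:8080"
API_TOKEN = "<api-token>"
TASK_ID = 123

# Fetch a single task. If the project is configured to evaluate predictions
# automatically, this is the request that ends up triggering
# evaluate_predictions() in TaskAPI.get() on the backend.
resp = requests.get(
    f"{LS_HOST}/api/tasks/{TASK_ID}",
    headers={"Authorization": f"Token {API_TOKEN}"},
)
task = resp.json()
print(task.get("predictions"))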

@chrisdukeLlama

@heidi-humansignal of course, feel free to delete the issue if you think it is not helping anyone.

@heidi-humansignal

Hi there! Here’s a quick rundown of how the task ID flows to the ML backend when you open a task (as opposed to the “interactive” flow triggered by drawing a region):

  1. Frontend loading the task
  • In web/libs/datamanager/src/stores/DataStores/tasks.js, the loadTask function calls the Label Studio backend route GET /api/tasks/<taskID> to fetch that task’s data.
  2. Tasks API
  • On the backend side, tasks/api.py (for example, the TaskAPI.get(...) method) checks whether the project is configured to fetch predictions automatically for tasks without them. If so, the backend calls evaluate_predictions([self.task]).
  3. Predicting tasks
  • Under the hood, evaluate_predictions in data_manager/functions.py calls ml_backend.predict_tasks([...]), passing along the task info.
  • Then the ML backend’s /predict endpoint is called (in ml/api_connector.py), sending the task data by ID. From there your ML backend’s predict method gets the relevant data.

If you want to pass part of the DOM to the ML backend, you would need to modify (or override) how Label Studio gathers or serializes data in predict_tasks or in your custom ML backend. But by default, it only passes task data, not the entire rendered interface.
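
As a rough sketch of where the task data lands on your side, a minimal ML backend built on the label-studio-ml SDK looks something like this (the "dom" field here is purely hypothetical and is not sent by default; field names are placeholders):

from label_studio_ml.model import LabelStudioMLBase

class MyModel(LabelStudioMLBase):
    def predict(self, tasks, context=None, **kwargs):
        # "tasks" is the list of task dicts that Label Studio sends to /predict;
        # each one carries the task "data" payload defined at import time.
        predictions = []
        for task in tasks:
            html = task["data"].get("html")  # whatever key your labeling config uses
            dom = task["data"].get("dom")    # hypothetical extra field, not sent by default
            # ... run your model / regex logic and build Label Studio results here ...
            predictions.append({"result": [], "score": 0.0})
        return predictions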

Thank you,
Abu

Comment by Abubakar Saad

@chrisdukeLlama

@heidi-humansignal
Hello Abu,

Thanks a lot for your detailed summary of the data flow!

Yesterday, I successfully passed a static DOM through tasks.js, evaluate_predictions, predict_tasks, and api_connector to the ML backend and was able to retrieve it from the request data.

The missing part now is to dynamically add the DOM from the DOMManager to the API call in tasks.js. Currently, it is not available (anymore? or not yet?) when I send the API request.

I suspect that the DOMManager might depend on the response from tasks.js, which would make sense since it needs the HTML from the backend to render. However, I need to dig deeper into this to confirm (I’m still new to this and figuring out the basics).

Any insights on this would be much appreciated!

Best,
Chris

@nehalecky

@chrisdukeLlama, thanks for all your engagement and feedback here. Glad to see you making progress and that the feedback has been helpful. Regarding your initial comment above:

I need it to pass the DOM to the ML backend... an underestimated feature ;)

I'm interested to understand better your use case. Can you give us a little bit more color? Thanks!

@chrisdukeLlama

@nehalecky
Thanks a lot for the chance to make my case!

I’m working with PDFs that contain complex tables. I convert them to HTML because labeling based on raw text alone would be a nightmare—the user would struggle to find the relevant passages quickly. There’s no way around that. I tested the labeling, and it works great.

In the long run, I want to implement regex-based prelabeling, followed by ML-based prelabeling. But to make this usable, the user must see pre-labeled passages in the right location for correction & quality control. This requires the DOM to align perfectly with predictions.

The Problem: HTML & DOM Alignment Fails Without the User’s Actual DOM
I tried many ways to parse the HTML to match the browser’s DOM structure—including minifying it (though I suspect minification has been removed from Label Studio?). The issue:

- My PDFs are ~100 pages each, and while the PDF-to-HTML conversion looks good visually, the underlying HTML is messy.
- Without the user’s real DOM, trying to align text spans with the HTML structure is impossible.
Temporary (Messy) Solution That Works
Now, I’m not a developer—just a simple country lawyer—but I wrote some ugly code to:
1️⃣ Automatically download the user's DOM into a file.
2️⃣ Read the DOM from the ML backend.
3️⃣ Use regex to find spans in the text.
4️⃣ Match those spans against the real DOM → Perfect prelabeling alignment!

This proves it works, but obviously this is not a long-term solution; a rough sketch of the matching step is below.
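
To make the matching step a bit more concrete, the core of steps 3️⃣ and 4️⃣ is conceptually something like this (heavily simplified; the pattern, field names and offsets are only illustrative, and the real code still has to translate each hit into the xpath/offset format Label Studio expects):

import re

# Illustrative sketch: find a pattern in the extracted plain text, then
# re-anchor the same span inside the serialized DOM so the pre-label lands
# where the user actually sees it.
PATTERN = re.compile(r"Total amount:\s*([\d.,]+)")

def find_spans(plain_text, dom_html):
    spans = []
    for match in PATTERN.finditer(plain_text):
        needle = match.group(1)
        dom_pos = dom_html.find(needle)  # naive re-anchoring against the real DOM
        if dom_pos != -1:
            spans.append({
                "text": needle,
                "text_offset": match.start(1),
                "dom_offset": dom_pos,
            })
    return spans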

A Proper Implementation Could Benefit Others
I don’t know what your priorities are, but I think this could help:
✅ Anyone working with HTML tables or complex structured HTML.
✅ Anyone converting PDFs to HTML for annotation.
✅ A standardized way to label PDFs in Label Studio, if good PDF-to-HTML conversion were added.

For LLMs, I’d still need to pass the HTML as-is to the ML backend, do span prediction, then realign spans with the DOM (which I’d need passed to the ML backend as well). Without the DOM, this is impossible (at least for my files and me).

Anyway, I really appreciate Label Studio—this is some fantastic software, and I’m rooting for you! Hope all the hard work pays off!

Best,
Chris

@chrisdukeLlama

chrisdukeLlama commented Jan 30, 2025

Hey @nehalecky,

I wanted to add a thought that might align with Label Studio’s broader PDF labeling strategy.

I completely understand why Label Studio is introducing PDF-to-image workflows, especially for handling scanned documents. But for structured PDFs that already contain selectable text, tables, and formatting, converting them into images feels like an unnecessary step in terms of both processing power and accuracy.

Since Label Studio already works well with HTML-based annotation, wouldn’t it make sense to also support PDF-to-HTML conversion while preserving document structure? This would allow for span-based labeling while keeping the original text fully selectable.

The key missing piece for making this work efficiently is access to the actual DOM. If PDFs are converted to HTML but ML predictions are based on raw extracted text, the results will often be misaligned. However, if the DOM were passed to the ML backend, predictions could be more easily aligned with the actual rendered structure, ensuring accurate pre-labeling.

For my use case, converting PDFs of 100 pages each into a hundred images first seems somewhat excessive. I imagine this could be useful for other users dealing with structured PDFs as well. I am not criticizing OCR, and I love that it is implemented, but I think it is just not the best solution for all cases. In my case it would also not work, because I am ultimately building a database from the labels, with each task/file as a row and each labelling category as a column (using Label Studio to keep a perfect documentation trail of the database creation).

Best regards,
Chris
