Question about extracting labels of PDF Elements #218

@Samyssmile

First of all, thank you for making these models available—great work!

I have tried several AI models that extract content from PDFs and identify its type—e.g.,

  • text
  • title
  • list
  • table
  • figure.

The problem is that I haven’t yet found a model that correctly recognizes the hierarchy of headings, such as H1, H2, and H3. Can any of your models do that? What I am looking for is a way to detect:

  • text
  • title
  • list
  • table
  • figure
  • H1
  • H2
  • H3
  • H4

Is this possible with one of your models?
