[Web] Text selection on web always starts at start of line #235

Open
MarcVanDaele90 opened this issue Sep 18, 2024 · 5 comments

@MarcVanDaele90

As mentioned in Issue #4, text selection on Web has one remaining issue: it always selects complete lines.
This can be reproduced by trying to select a couple of words in the demo application: https://espresso3389.github.io/pdfrx/

@MarcVanDaele90
Author

Some more observations (which might be obvious to you).

I noticed the following when opening the same (two-page) PDF:

  • on Linux, PdfPageTextPdfium._loadText(...) created 581/292 fragments
  • on Web, PdfPageTextWeb._loadText(...) created only 72/43 fragments

When printing the text of the resulting PdfPageTextFragment objects, I noticed that Pdfium seems to create fragments at the word level, while Web seems to create one fragment per line.

I guess this explains why a selection always starts at the beginning of the line.
Is there some way to get word-level fragments on Web as well?
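
To make the word-versus-line difference concrete, here is a minimal TypeScript sketch of one possible workaround: splitting a line-level fragment into approximate word-level fragments. It is not how pdfrx or pdf.js works. The `Rect` and `Fragment` shapes, the `Measure` type, and `splitLineIntoWords` are hypothetical names made up for illustration, and the sketch assumes horizontal left-to-right text whose line bounds are already known.

```typescript
// Hypothetical shapes for illustration only; not the pdfrx or pdf.js types.
interface Rect { left: number; top: number; right: number; bottom: number; }
interface Fragment { text: string; index: number; bounds: Rect; }

// Returns the width of a string in arbitrary units.
type Measure = (s: string) => number;

// Default measurer: assumes every character is equally wide.
// This is only a rough approximation for proportional fonts.
const byCharCount: Measure = (s) => s.length;

// Splits one line-level fragment into word-level fragments by distributing
// the line's horizontal extent in proportion to the measured text widths.
function splitLineIntoWords(line: Fragment, measure: Measure = byCharCount): Fragment[] {
  const total = measure(line.text);
  if (total <= 0) return [];
  const scale = (line.bounds.right - line.bounds.left) / total;
  const words: Fragment[] = [];
  const re = /\S+/g; // each run of non-whitespace counts as one word
  let m: RegExpExecArray | null;
  while ((m = re.exec(line.text)) !== null) {
    const start = m.index;
    const end = start + m[0].length;
    words.push({
      text: m[0],
      index: line.index + start, // character offset within the page text
      bounds: {
        left: line.bounds.left + measure(line.text.slice(0, start)) * scale,
        top: line.bounds.top,
        right: line.bounds.left + measure(line.text.slice(0, end)) * scale,
        bottom: line.bounds.bottom,
      },
    });
  }
  return words;
}
```

In a browser, the default measurer could be swapped for one backed by `CanvasRenderingContext2D.measureText`, which should give better offsets for proportional fonts, provided the canvas font matches the PDF font closely enough. That match is an assumption and would need checking.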

@espresso3389
Owner

You're right. I don't know how to extract word-level coordinates with pdf.js. The pdf.js example viewer can handle word-level coordinates, but it uses something provided by the HTML canvas or similar. I need to do more research on that...

@StroeAndreX

Any updates on the text selection feature for the web? There also seems to be a consistency issue when selecting text. For example, it sometimes misses certain words or skips some parts.

@espresso3389
Owner

espresso3389 commented Nov 19, 2024

I've just googled this and found a related issue.

It explains that the part dedicated to extracting text positions is;

I'll read the code to learn how pdf.js handles text coordinates.

@MarcVanDaele90
Author

This is great news! Thanks for the heads up!
