API will not parse text in PDF image #405

Open
SomebodySysop opened this issue Sep 17, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@SomebodySysop

I am using LlamaParse via the REST API: https://docs.cloud.llamaindex.ai/llamaparse/getting_started/api

I have a PDF I am trying to parse: https://s3.us-west-2.amazonaws.com/docs.scbbs.com/docs/test/67_SL_23.pdf

The first two pages are indexable text, but the 3rd page is an image.

When I submit the PDF in the LlamaCloud parser dashboard, it returns all pages correctly.

However, when I submit the same PDF to the API, it only returns the first two pages.

I've tried these parameters to force OCR, but I still only get the first 2 pages via the API:

// Define the body parameters
$data = [
	'language' => 'en',
	'parsing_instruction' => 'Please use OCR to extract all text in the page 3 image.',
	'accurate_mode' => true,
	'fast_mode' => false,
	'disable_ocr' => false // Note: this is a boolean; sent as form-data it may need to be the string 'false'
];

Any suggestions to get all 3 pages via the API?
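
One thing that may be worth ruling out with the parameters above: PHP converts the boolean `false` to an empty string and `true` to `'1'` when a value is cast to a string, so boolean entries in a form-data array might not reach the server as the literal text `false` / `true`. Whether the LlamaParse API actually requires those literals is an assumption here, not something confirmed in this thread. The sketch below uses a hypothetical helper, `normalizeFormBooleans`, to convert booleans to explicit strings before the array is handed to whatever HTTP client sends the request.

```php
<?php
// Hypothetical helper (not part of any LlamaParse SDK): replaces PHP booleans
// with the literal strings 'true' / 'false' so they survive form-data encoding.
function normalizeFormBooleans(array $fields): array
{
    return array_map(
        static fn ($value) => is_bool($value) ? ($value ? 'true' : 'false') : $value,
        $fields
    );
}

// Same parameters as in the report, minus the parsing instruction for brevity.
$data = normalizeFormBooleans([
    'language' => 'en',
    'accurate_mode' => true,
    'fast_mode' => false,
    'disable_ocr' => false,
]);

// Inspect what would be sent; non-boolean values are left unchanged.
var_export($data);
```

If the API response is the same with explicit strings, the boolean encoding can be excluded as the cause and the difference between the dashboard and the API lies elsewhere.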

@hexapode hexapode added the bug Something isn't working label Sep 18, 2024