API does not parse text in PDF image #405

SomebodySysop · 2024-09-17T17:58:35Z

I am using LlamaParse via the REST API: https://docs.cloud.llamaindex.ai/llamaparse/getting_started/api

I have a PDF I am trying to parse: https://s3.us-west-2.amazonaws.com/docs.scbbs.com/docs/test/67_SL_23.pdf

The first two pages are indexable text, but the 3rd page is an image.

When I submit the PDF in the LlamaCloud parser dashboard, it returns all pages correctly.

However, when I submit the same PDF to the API, it only returns the first two pages.

I've tried these parameters to force the OCR, but I still only get the first 2 pages via the API:

// Define the body parameters
$data = [
	'language' => 'en',
	'parsing_instruction' => 'Please use OCR to extract all text in the page 3 image.',
	'accurate_mode' => true,
	'fast_mode' => false,
	'disable_ocr' => false // Note: Using string 'false' as it will be sent as form-data
];

Any suggestions to get all 3 pages via the API?
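
One detail that may be worth ruling out: the snippet above passes PHP booleans, while its comment mentions the string 'false'. When PHP casts a boolean to a string, `true` becomes "1" and `false` becomes an empty string, so a form-data body built from this array may not carry the literal words the comment describes. Whether the API treats "1" or an empty value differently from 'true' / 'false' is an assumption and is not confirmed anywhere in this thread. A minimal sketch, using a hypothetical helper named `normalize_form_fields` (not part of any LlamaParse SDK), that spells the booleans out before the request is built:

```php
<?php
// Hypothetical helper: convert PHP booleans to the literal strings
// 'true' / 'false' so they are explicit in a form-data body.
function normalize_form_fields(array $fields): array
{
    $out = [];
    foreach ($fields as $key => $value) {
        // Booleans are spelled out; every other value is left unchanged.
        $out[$key] = is_bool($value) ? ($value ? 'true' : 'false') : $value;
    }
    return $out;
}

// Same parameter names as in the snippet above.
$data = normalize_form_fields([
    'language' => 'en',
    'accurate_mode' => true,
    'fast_mode' => false,
    'disable_ocr' => false,
]);

// Inspect what would actually be sent.
var_dump($data);
```

This only changes how the flags are encoded; it is not a confirmed fix for the missing third page.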


hexapode added the bug label on Sep 18, 2024
