"Failed to upload document. Please upload an unstructured text document." #226

JusSil501 opened this issue Oct 22, 2024 · 2 comments

@JusSil501

JusSil501 commented Oct 22, 2024

I am getting this error when uploading a PDF: "Failed to upload document. Please upload an unstructured text document."
I can confirm that my API key is valid and has credits.

chain-server logs:

2024-10-22 03:52:48 WARNING:unstructured:PDF text extraction failed, skip text extraction...
2024-10-22 03:52:48 INFO:unstructured:Processing entire page OCR with tesseract...
2024-10-22 03:53:00 ERROR:example:Failed to ingest document due to exception
2024-10-22 03:53:00 **********************************************************************
2024-10-22 03:53:00 Resource punkt_tab not found.
2024-10-22 03:53:00 Please use the NLTK Downloader to obtain the resource:
2024-10-22 03:53:00
2024-10-22 03:53:00 >>> import nltk
2024-10-22 03:53:00 >>> nltk.download('punkt_tab')
2024-10-22 03:53:00
2024-10-22 03:53:00 For more information see: https://www.nltk.org/data.html
2024-10-22 03:53:00
2024-10-22 03:53:00 Attempted to load tokenizers/punkt_tab/english/
2024-10-22 03:53:00
2024-10-22 03:53:00 Searched in:
2024-10-22 03:53:00 - '/tmp-data/nltk_data/'
2024-10-22 03:53:00 - '/root/nltk_data'
2024-10-22 03:53:00 - '/usr/nltk_data'
2024-10-22 03:53:00 - '/usr/share/nltk_data'
2024-10-22 03:53:00 - '/usr/lib/nltk_data'
2024-10-22 03:53:00 - '/usr/share/nltk_data'
2024-10-22 03:53:00 - '/usr/local/share/nltk_data'
2024-10-22 03:53:00 - '/usr/lib/nltk_data'
2024-10-22 03:53:00 - '/usr/local/lib/nltk_data'
2024-10-22 03:53:00 **********************************************************************
2024-10-22 03:53:00
2024-10-22 03:53:00 ERROR:RetrievalAugmentedGeneration.common.server:Error from POST /documents endpoint. Ingestion of file: /tmp/gradio/92f4570d0bbd4d801f7fbbae0ad13db83f59b1f6518c47156f6cbb0b605472d7/Justin_Silva_Resume.pdf failed with error: Failed to upload document. Please upload an unstructured text document.
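The log above points at a missing NLTK resource (`punkt_tab`) rather than at the PDF itself, and suggests `nltk.download('punkt_tab')`. A minimal sketch of acting on that hint follows. The helper names `missing_nltk_resources` and `download_missing` are hypothetical, and running the download inside the chain-server container (for example into `/tmp-data/nltk_data/`, one of the searched paths) is an assumption. Nobody in this thread has confirmed that it resolves the upload error.

```python
import re

# Matches ANSI color codes, which NLTK may wrap around the resource name.
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*m")


def missing_nltk_resources(log_text):
    """Return NLTK resource names reported as missing, in order, de-duplicated."""
    clean = ANSI_ESCAPE.sub("", log_text)
    names = re.findall(r"Resource\s+(\S+)\s+not found", clean)
    return list(dict.fromkeys(names))


def download_missing(log_text, download_dir=None):
    """Download every missing resource; needs nltk installed and network access.

    download_dir is an assumption: it should be one of the directories listed
    under "Searched in:" in the log so that NLTK can find the data afterwards.
    """
    import nltk  # imported lazily so the parser above works without nltk

    return {
        name: nltk.download(name, download_dir=download_dir)
        for name in missing_nltk_resources(log_text)
    }
```

For the log in this issue, `missing_nltk_resources` yields `punkt_tab`, so `download_missing(log_text, download_dir="/tmp-data/nltk_data/")` would be roughly equivalent to the `nltk.download('punkt_tab')` call the log recommends, run in the environment where ingestion happens.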


@DennisFaucher

Same. I had to export my PDF as text.

@chinyeanyee0916

Same here.
