bug/<raise BadZipFile> #3865

Open
Vampxgg opened this issue Jan 15, 2025 · 0 comments
Labels
bug Something isn't working


Vampxgg commented Jan 15, 2025

Describe the bug
No matter how I call partition_pdf, it raises:
zipfile.BadZipFile: File is not a zip file


Environment Info
Traceback (most recent call last):
  File "D:\pythonprojects\LANGCHAIN\main.py", line 87, in <module>
    elements = partition_pdf("D:\pythonprojects\LANGCHAIN\inputs\智能传感器装配调试台架-产品手册.pdf")
  File "D:\miniconda\envs\LANGCHAIN\lib\site-packages\unstructured\documents\elements.py", line 581, in wrapper
    elements = func(*args, **kwargs)
  File "D:\miniconda\envs\LANGCHAIN\lib\site-packages\unstructured\file_utils\filetype.py", line 725, in wrapper
    elements = func(*args, **kwargs)
  File "D:\miniconda\envs\LANGCHAIN\lib\site-packages\unstructured\file_utils\filetype.py", line 683, in wrapper
    elements = func(*args, **kwargs)
  File "D:\miniconda\envs\LANGCHAIN\lib\site-packages\unstructured\chunking\dispatch.py", line 74, in wrapper
    elements = func(*args, **kwargs)
  File "D:\miniconda\envs\LANGCHAIN\lib\site-packages\unstructured\partition\pdf.py", line 209, in partition_pdf
    return partition_pdf_or_image(
  File "D:\miniconda\envs\LANGCHAIN\lib\site-packages\unstructured\partition\pdf.py", line 350, in partition_pdf_or_image
    out_elements = _process_uncategorized_text_elements(elements)
  File "D:\miniconda\envs\LANGCHAIN\lib\site-packages\unstructured\partition\pdf.py", line 930, in _process_uncategorized_text_elements
    new_el = element_from_text(cast(Text, el).text)
  File "D:\miniconda\envs\LANGCHAIN\lib\site-packages\unstructured\partition\text.py", line 149, in element_from_text
    elif is_possible_narrative_text(text):
  File "D:\miniconda\envs\LANGCHAIN\lib\site-packages\unstructured\partition\text_type.py", line 74, in is_possible_narrative_text
    if exceeds_cap_ratio(text, threshold=cap_threshold):
  File "D:\miniconda\envs\LANGCHAIN\lib\site-packages\unstructured\partition\text_type.py", line 270, in exceeds_cap_ratio
    if sentence_count(text, 3) > 1:
  File "D:\miniconda\envs\LANGCHAIN\lib\site-packages\unstructured\partition\text_type.py", line 219, in sentence_count
    sentences = sent_tokenize(text)
  File "D:\miniconda\envs\LANGCHAIN\lib\site-packages\unstructured\nlp\tokenize.py", line 56, in sent_tokenize
    _download_nltk_packages_if_not_present()
  File "D:\miniconda\envs\LANGCHAIN\lib\site-packages\unstructured\nlp\tokenize.py", line 41, in _download_nltk_packages_if_not_present
    tagger_available = check_for_nltk_package(
  File "D:\miniconda\envs\LANGCHAIN\lib\site-packages\unstructured\nlp\tokenize.py", line 29, in check_for_nltk_package
    nltk.find(f"{package_category}/{package_name}", paths=paths)
  File "D:\miniconda\envs\LANGCHAIN\lib\site-packages\nltk\data.py", line 551, in find
    return find(modified_name, paths)
  File "D:\miniconda\envs\LANGCHAIN\lib\site-packages\nltk\data.py", line 538, in find
    return ZipFilePathPointer(p, zipentry)
  File "D:\miniconda\envs\LANGCHAIN\lib\site-packages\nltk\data.py", line 391, in __init__
    zipfile = OpenOnDemandZipFile(os.path.abspath(zipfile))
  File "D:\miniconda\envs\LANGCHAIN\lib\site-packages\nltk\data.py", line 1020, in __init__
    zipfile.ZipFile.__init__(self, filename)
  File "D:\miniconda\envs\LANGCHAIN\lib\zipfile.py", line 1268, in __init__
    self._RealGetContents()
  File "D:\miniconda\envs\LANGCHAIN\lib\zipfile.py", line 1335, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
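The last frames show nltk.find resolving an NLTK resource to a .zip archive and zipfile.ZipFile rejecting that archive, which suggests a corrupted or incomplete NLTK data archive rather than a problem with the PDF itself. Below is a minimal, stdlib-only sketch for locating such archives, assuming they sit somewhere under the directories NLTK searches. find_corrupt_zips is a hypothetical helper, not part of NLTK or unstructured; it opens each .zip the same way the failing frame does and records the ones that raise BadZipFile.

```python
import os
import zipfile
from typing import Iterable, List


def find_corrupt_zips(search_paths: Iterable[str]) -> List[str]:
    """Return sorted paths of *.zip files under search_paths that zipfile cannot open.

    Hypothetical diagnostic helper (not an NLTK or unstructured API).
    """
    corrupt = []
    for root_dir in search_paths:
        if not os.path.isdir(root_dir):
            # Search lists often contain directories that do not exist; skip them.
            continue
        for dirpath, _dirnames, filenames in os.walk(root_dir):
            for name in filenames:
                if not name.lower().endswith(".zip"):
                    continue
                full_path = os.path.join(dirpath, name)
                try:
                    # Opening in read mode runs ZipFile._RealGetContents, the same
                    # call that raised "File is not a zip file" in the traceback.
                    with zipfile.ZipFile(full_path) as archive:
                        archive.namelist()
                except zipfile.BadZipFile:
                    corrupt.append(full_path)
    return sorted(corrupt)
```

In the affected environment one could call find_corrupt_zips(nltk.data.path). Note that unstructured passes its own paths argument to nltk.find (see check_for_nltk_package above), so the directories actually searched may differ from nltk.data.path. Deleting a flagged archive and fetching the corresponding package again with nltk.download is a plausible remedy, but it is untested here.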

Vampxgg added the bug (Something isn't working) label on Jan 15, 2025