TypeError: zip() argument after * must be an iterable, not NoneType #1269

Open
JeisonJimenezA opened this issue Dec 11, 2024 · 1 comment
Labels
bug Something isn't working

Description of the bug

When processing a specific file I get this error:

2024-12-11 09:56:54.076 | INFO | magic_pdf.model.pdf_extract_kit:__call__:226 - -----page total time: 1.12-----
2024-12-11 09:56:54.573 | INFO | magic_pdf.model.pdf_extract_kit:__call__:153 - layout detection time: 0.5
2024-12-11 09:56:54.690 | INFO | magic_pdf.model.pdf_extract_kit:__call__:161 - mfd time: 0.11
2024-12-11 09:56:54.691 | INFO | magic_pdf.model.pdf_extract_kit:__call__:168 - formula nums: 0, mfr time: 0.0
2024-12-11 09:56:54.691 | INFO | magic_pdf.model.pdf_extract_kit:__call__:194 - ocr time: 0.0
2024-12-11 09:56:55.762 | ERROR | __main__:pdf_parse_main:83 - zip() argument after * must be an iterable, not NoneType
Traceback (most recent call last):

File "C:\IA\MinerU\processing.py", line 88, in <module>
pdf_parse_main(
└ <function pdf_parse_main at 0x00000193ADFB3E20>

File "C:\IA\MinerU\processing.py", line 57, in pdf_parse_main
pipe.pipe_analyze() # Document analysis
│ └ <function UNIPipe.pipe_analyze at 0x00000193E2BA1870>
└ <magic_pdf.pipe.UNIPipe.UNIPipe object at 0x00000193E2C70070>

File "C:\IA\MinerU\env_2\lib\site-packages\magic_pdf\pipe\UNIPipe.py", line 37, in pipe_analyze
self.model_list = doc_analyze(self.pdf_bytes, ocr=True,
│ │ │ │ └ b'%PDF-1.7\n%\xbf\xf7\xa2\xfe\n1 0 obj\n<< /Metadata 30 0 R /Pages 31 0 R /Type /Catalog >>\nendobj\n2 0 obj\n<< /Type /ObjSt...
│ │ │ └ <magic_pdf.pipe.UNIPipe.UNIPipe object at 0x00000193E2C70070>
│ │ └ <function doc_analyze at 0x00000193E241C670>
│ └ []
└ <magic_pdf.pipe.UNIPipe.UNIPipe object at 0x00000193E2C70070>

File "C:\IA\MinerU\env_2\lib\site-packages\magic_pdf\model\doc_analyze_by_custom_model.py", line 166, in doc_analyze
result = custom_model(img)
│ └ array([[[255, 255, 255],
│ [255, 255, 255],
│ [255, 255, 255],
│ ...,
│ [255, 255, 255],
│ [255...
└ <magic_pdf.model.pdf_extract_kit.CustomPEKModel object at 0x00000193E2C384C0>

File "C:\IA\MinerU\env_2\lib\site-packages\magic_pdf\model\pdf_extract_kit.py", line 211, in __call__
html_code, table_cell_bboxes, elapse = self.table_model.predict(new_image)
│ │ │ │ └ <PIL.Image.Image image mode=RGB size=1398x2008 at 0x19395916920>
│ │ │ └ <function RapidTableModel.predict at 0x0000019395762DD0>
│ │ └ <magic_pdf.model.sub_modules.table.rapidtable.rapid_table.RapidTableModel object at 0x000001939627FA60>
│ └ <magic_pdf.model.pdf_extract_kit.CustomPEKModel object at 0x00000193E2C384C0>
└ None

File "C:\IA\MinerU\env_2\lib\site-packages\magic_pdf\model\sub_modules\table\rapidtable\rapid_table.py", line 13, in predict
html_code, table_cell_bboxes, elapse = self.table_model(np.asarray(image), ocr_result)
│ │ │ │ │ └ None
│ │ │ │ └ <PIL.Image.Image image mode=RGB size=1398x2008 at 0x19395916920>
│ │ │ └
│ │ └ <module 'numpy' from 'C:\IA\MinerU\env_2\lib\site-packages\numpy\__init__.py'>
│ └ <rapid_table.main.RapidTable object at 0x00000193AD23EA10>
└ <magic_pdf.model.sub_modules.table.rapidtable.rapid_table.RapidTableModel object at 0x000001939627FA60>

File "C:\IA\MinerU\env_2\lib\site-packages\rapid_table\main.py", line 55, in __call__
dt_boxes, rec_res = self.get_boxes_recs(ocr_result, h, w)
│ │ │ │ └ 1398
│ │ │ └ 2008
│ │ └ None
│ └ <function RapidTable.get_boxes_recs at 0x0000019395746DD0>
└ <rapid_table.main.RapidTable object at 0x00000193AD23EA10>

File "C:\IA\MinerU\env_2\lib\site-packages\rapid_table\main.py", line 69, in get_boxes_recs
dt_boxes, rec_res, scores = list(zip(*ocr_result))
└ None

TypeError: zip() argument after * must be an iterable, not NoneType
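
The traceback shows ocr_result arriving as None in get_boxes_recs, where it is unpacked with zip(*ocr_result). The snippet below is only a minimal sketch of that failure mode and of one possible guard. It is not the rapid_table or magic_pdf implementation: the helper name unpack_ocr_result is hypothetical, and the (box, text, score) item layout is an assumption inferred from the three names on the failing line. Whether empty lists are an acceptable result downstream is also an assumption, and the real fix may belong in the caller that passes None.

```python
from typing import Any, List, Optional, Sequence, Tuple


def unpack_ocr_result(
    ocr_result: Optional[Sequence[Tuple[Any, str, float]]],
) -> Tuple[List[Any], List[str], List[float]]:
    """Hypothetical helper: split OCR items into boxes, texts and scores.

    Assumes each item is a (box, text, score) triple. Returns three empty
    lists when there is nothing to unpack instead of raising TypeError.
    """
    if not ocr_result:  # covers both None and an empty list
        return [], [], []
    # zip(*items) transposes a list of triples into three columns.
    dt_boxes, rec_res, scores = (list(column) for column in zip(*ocr_result))
    return dt_boxes, rec_res, scores


def raises_type_error_on_none() -> bool:
    """Reproduce the reported failure: star-unpacking None inside zip()."""
    try:
        list(zip(*None))
    except TypeError:
        return True
    return False
```

With a guard like this, a page whose OCR step yields nothing would produce empty box and text lists rather than aborting the whole document.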

How to reproduce the bug

Processing the specific file produces the same log output and traceback as in the description above.

Operating system

Windows

Python version

3.10

Software version (magic-pdf --version)

0.9.x

Device mode

cuda

JeisonJimenezA added the bug label Dec 11, 2024
myhloli (Collaborator) commented Dec 11, 2024

Can you upload the sample file?
