TypeError: zip() argument after * must be an iterable, not NoneType #1269

JeisonJimenezA · 2024-12-11T15:02:20Z

Description of the bug

When processing a specific file I get this error:

2024-12-11 09:56:54.076 | INFO | magic_pdf.model.pdf_extract_kit:__call__:226 - -----page total time: 1.12-----
2024-12-11 09:56:54.573 | INFO | magic_pdf.model.pdf_extract_kit:__call__:153 - layout detection time: 0.5
2024-12-11 09:56:54.690 | INFO | magic_pdf.model.pdf_extract_kit:__call__:161 - mfd time: 0.11
2024-12-11 09:56:54.691 | INFO | magic_pdf.model.pdf_extract_kit:__call__:168 - formula nums: 0, mfr time: 0.0
2024-12-11 09:56:54.691 | INFO | magic_pdf.model.pdf_extract_kit:__call__:194 - ocr time: 0.0
2024-12-11 09:56:55.762 | ERROR | __main__:pdf_parse_main:83 - zip() argument after * must be an iterable, not NoneType
Traceback (most recent call last):

File "C:\IA\MinerU\processing.py", line 88, in <module>
pdf_parse_main(
└ <function pdf_parse_main at 0x00000193ADFB3E20>

File "C:\IA\MinerU\processing.py", line 57, in pdf_parse_main
pipe.pipe_analyze() # Document analysis
│ └ <function UNIPipe.pipe_analyze at 0x00000193E2BA1870>
└ <magic_pdf.pipe.UNIPipe.UNIPipe object at 0x00000193E2C70070>

File "C:\IA\MinerU\env_2\lib\site-packages\magic_pdf\pipe\UNIPipe.py", line 37, in pipe_analyze
self.model_list = doc_analyze(self.pdf_bytes, ocr=True,
│ │ │ │ └ b'%PDF-1.7\n%\xbf\xf7\xa2\xfe\n1 0 obj\n<< /Metadata 30 0 R /Pages 31 0 R /Type /Catalog >>\nendobj\n2 0 obj\n<< /Type /ObjSt...
│ │ │ └ <magic_pdf.pipe.UNIPipe.UNIPipe object at 0x00000193E2C70070>
│ │ └ <function doc_analyze at 0x00000193E241C670>
│ └ []
└ <magic_pdf.pipe.UNIPipe.UNIPipe object at 0x00000193E2C70070>

File "C:\IA\MinerU\env_2\lib\site-packages\magic_pdf\model\doc_analyze_by_custom_model.py", line 166, in doc_analyze
result = custom_model(img)
│ └ array([[[255, 255, 255],
│ [255, 255, 255],
│ [255, 255, 255],
│ ...,
│ [255, 255, 255],
│ [255...
└ <magic_pdf.model.pdf_extract_kit.CustomPEKModel object at 0x00000193E2C384C0>

File "C:\IA\MinerU\env_2\lib\site-packages\magic_pdf\model\pdf_extract_kit.py", line 211, in __call__
html_code, table_cell_bboxes, elapse = self.table_model.predict(new_image)
│ │ │ │ └ <PIL.Image.Image image mode=RGB size=1398x2008 at 0x19395916920>
│ │ │ └ <function RapidTableModel.predict at 0x0000019395762DD0>
│ │ └ <magic_pdf.model.sub_modules.table.rapidtable.rapid_table.RapidTableModel object at 0x000001939627FA60>
│ └ <magic_pdf.model.pdf_extract_kit.CustomPEKModel object at 0x00000193E2C384C0>
└ None

File "C:\IA\MinerU\env_2\lib\site-packages\magic_pdf\model\sub_modules\table\rapidtable\rapid_table.py", line 13, in predict
html_code, table_cell_bboxes, elapse = self.table_model(np.asarray(image), ocr_result)
│ │ │ │ │ └ None
│ │ │ │ └ <PIL.Image.Image image mode=RGB size=1398x2008 at 0x19395916920>
│ │ │ └
│ │ └ <module 'numpy' from 'C:\IA\MinerU\env_2\lib\site-packages\numpy\__init__.py'>
│ └ <rapid_table.main.RapidTable object at 0x00000193AD23EA10>
└ <magic_pdf.model.sub_modules.table.rapidtable.rapid_table.RapidTableModel object at 0x000001939627FA60>

File "C:\IA\MinerU\env_2\lib\site-packages\rapid_table\main.py", line 55, in __call__
dt_boxes, rec_res = self.get_boxes_recs(ocr_result, h, w)
│ │ │ │ └ 1398
│ │ │ └ 2008
│ │ └ None
│ └ <function RapidTable.get_boxes_recs at 0x0000019395746DD0>
└ <rapid_table.main.RapidTable object at 0x00000193AD23EA10>

File "C:\IA\MinerU\env_2\lib\site-packages\rapid_table\main.py", line 69, in get_boxes_recs
dt_boxes, rec_res, scores = list(zip(*ocr_result))
└ None

TypeError: zip() argument after * must be an iterable, not NoneType
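
The last frames show ocr_result arriving as None at the line that unpacks it with zip(*ocr_result). The snippet below is only a minimal sketch of that failure and of one possible guard. It assumes each OCR entry is a (box, text, score) triple, as the three-way unpacking in the traceback suggests. unpack_ocr_result is a hypothetical helper written for illustration; it is not part of rapid_table or magic_pdf, and it is not a confirmed fix.

```python
def unpack_ocr_result(ocr_result):
    """Hypothetical helper: split OCR entries into boxes, texts and scores.

    Returns three empty lists when ocr_result is None or empty, instead of
    letting zip(*ocr_result) raise a TypeError.
    """
    if not ocr_result:
        # Covers both None (the case in the traceback) and an empty list.
        return [], [], []
    dt_boxes, rec_res, scores = zip(*ocr_result)
    return list(dt_boxes), list(rec_res), list(scores)


# Reproduce the unguarded failure: unpacking None with * raises TypeError.
try:
    list(zip(*None))
except TypeError as exc:
    message = str(exc)
    print(message)

# The guarded helper returns empty lists for None.
print(unpack_ocr_result(None))

# A single made-up OCR entry, for illustration only.
sample = [([[0, 0], [10, 0], [10, 5], [0, 5]], "cell text", 0.98)]
print(unpack_ocr_result(sample))
```

Whether an empty result should be handled at this point, or whether the caller should skip table recognition when OCR returns nothing, is a design choice for the maintainers. The sketch only shows where the None value breaks the unpacking.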

How to reproduce the bug

Processing one specific file produces the error; the log output and traceback are identical to those in the description above.

Operating system

Windows

Python version

3.10

Software version (magic-pdf --version)

0.9.x

Device mode

cuda

myhloli · 2024-12-11T15:40:41Z

Can you upload the sample file?

JeisonJimenezA added the bug label ("Something isn't working") on Dec 11, 2024
