Commit 72bf34f

fix issues with exporting tables containing cells that span multiple rows and columns (#119)
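* 1. Fix the exception raised when adding colspan and rowspan while exporting table annotations. 2. Fix the compliance of the HTML tags in the gt attribute of the exported gt file.
* fix code style
* Fix the error raised when exporting a cell that spans both multiple rows and multiple columns. Issue: "Exporting table annotations raises an error" #113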
1 parent 8bb57f1 commit 72bf34f

2 files changed: +10 -9 lines changed


PPOCRLabel.py (-1)

@@ -3181,7 +3181,6 @@ def exportJSON(self):
         """
         export PPLabel and CSV to JSON (PubTabNet)
         """
-        import pandas as pd
 
         # automatically save annotations
         self.saveFilestate()

libs/utils.py (+10 -8)

@@ -232,14 +232,16 @@ def convert_token(html_list):
             elif col == "td":
                 token_list.extend(["<td>", "</td>"])
             else:
-                token_list.append("<td")
-                if "colspan" in col:
-                    _, n = col.split("colspan=")
-                    token_list.append(' colspan="{}"'.format(int(n)))
-                if "rowspan" in col:
-                    _, n = col.split("rowspan=")
-                    token_list.append(' rowspan="{}"'.format(int(n)))
-                token_list.extend([">", "</td>"])
+                token_list.append("<td")  # Start the td tag
+                # Use regex to match "colspan" and "rowspan" attributes and their values
+                colspan_match = re.search(r"colspan=(\d+)", col)
+                rowspan_match = re.search(r"rowspan=(\d+)", col)
+                if colspan_match:
+                    token_list.append(f' colspan="{colspan_match.group(1)}"')
+                if rowspan_match:
+                    token_list.append(f' rowspan="{rowspan_match.group(1)}"')
+                token_list.append(">")  # End the opening td tag
+                token_list.append("</td>")  # Close the td tag
         token_list.append("</tr>")
     token_list.append("</tbody>")
