Conversation

@masumsoft

fix: handle nested tables inside list component

  • Fixes #633 (Account for empty cells in table extraction (xml)) by keeping empty cells inside tables as they are, so that the column structure stays consistent.
  • Also handles nested tables inside list components and allows text inside divs to be extracted when favor_recall is set to True.
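
To illustrate why empty cells matter for column structure, here is a minimal standalone sketch. It uses only the standard library and is not trafilatura's implementation; the sample table and the helper `rows_as_lists` are hypothetical, and the `row`/`cell` element names are assumed to mirror the XML output discussed in #633.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the second cell of the second row is empty.
TABLE = (
    "<table>"
    "<row><cell>Name</cell><cell>Role</cell><cell>Since</cell></row>"
    "<row><cell>Ada</cell><cell></cell><cell>2021</cell></row>"
    "</table>"
)


def rows_as_lists(table_xml: str, keep_empty: bool) -> list[list[str]]:
    """Return the text of each cell, row by row.

    With keep_empty=False, blank cells are dropped, which shifts the
    remaining values to the left and misaligns the columns.
    """
    table = ET.fromstring(table_xml)
    rows = []
    for row in table.iter("row"):
        cells = [(cell.text or "").strip() for cell in row.iter("cell")]
        if not keep_empty:
            cells = [text for text in cells if text]
        rows.append(cells)
    return rows


kept = rows_as_lists(TABLE, keep_empty=True)
dropped = rows_as_lists(TABLE, keep_empty=False)
```

With `keep_empty=True` both rows have three entries, so "2021" stays under "Since"; with `keep_empty=False` the second row shrinks to two entries and "2021" slides under "Role".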

fix: convert graphic, row and cell to html

@adbar
Owner

adbar commented Jun 17, 2025

Hi @masumsoft, could you please make sure that the tests pass?

@masumsoft
Author

@adbar I've fixed the root cause and all tests are passing now.

Because blank cells are now kept as they are to keep the table structure consistent, blank link elements inside tables were, as a side effect, also passed through as table cells. The root cause was that the table link density test did not remove blank links, which made the test fail. I've now fixed the table link density test so that it no longer blindly allows blank elements.
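
The distinction described above, between an empty cell that should stay and a blank link that should go, can be sketched as follows. This is only an illustration with the standard library, not the code changed in this PR: the function names are hypothetical, and the `ref` tag for links is an assumption about the internal element naming.

```python
import xml.etree.ElementTree as ET


def text_length(element: ET.Element) -> int:
    """Length of all text inside the element, outer whitespace stripped."""
    return len("".join(element.itertext()).strip())


def blank_links(table: ET.Element, link_tag: str = "ref") -> list[ET.Element]:
    """Links that carry no text at all (assumed tag name: 'ref')."""
    return [link for link in table.iter(link_tag) if text_length(link) == 0]


def drop_blank_links(table: ET.Element, link_tag: str = "ref") -> None:
    """Remove blank links but keep the cells that contained them.

    Note: ElementTree's remove() also discards the link's tail text,
    which is acceptable for this sketch.
    """
    for parent in table.iter():
        for child in list(parent):
            if child.tag == link_tag and text_length(child) == 0:
                parent.remove(child)


def table_link_density(table: ET.Element, link_tag: str = "ref") -> float:
    """Share of the table's text that sits inside links."""
    total = text_length(table)
    if total == 0:
        return 0.0
    linked = sum(text_length(link) for link in table.iter(link_tag))
    return linked / total


# Hypothetical sample row: plain text, a blank link, and a real link.
table = ET.fromstring(
    "<table><row>"
    "<cell>Intro text here</cell>"
    '<cell><ref target="#"/></cell>'
    '<cell><ref target="/a">more</ref></cell>'
    "</row></table>"
)

blank_before = len(blank_links(table))
drop_blank_links(table)
blank_after = len(blank_links(table))
cells_after = len(table.findall("./row/cell"))
density = table_link_density(table)
```

After `drop_blank_links`, the row still has three cells, so the column count is unchanged, while the blank link itself no longer appears in the tree.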

@adbar
Owner

adbar commented Jun 19, 2025

Hi @masumsoft, thanks, could you please fix the mypy error in Python 3.13?

@masumsoft
Author

Hi @adbar, I've fixed the mypy error on Python 3.13 (incompatible argument type).

@codecov

codecov bot commented Jun 23, 2025

Codecov Report

❌ Patch coverage is 40.00000% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 99.07%. Comparing base (badd594) to head (a041139).

Files with missing lines Patch % Lines
trafilatura/main_extractor.py 33.33% 8 Missing ⚠️
trafilatura/htmlprocessing.py 66.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #803      +/-   ##
==========================================
- Coverage   99.29%   99.07%   -0.22%     
==========================================
  Files          21       21              
  Lines        3664     3676      +12     
==========================================
+ Hits         3638     3642       +4     
- Misses         26       34       +8     


@adbar
Owner

adbar commented Jun 30, 2025

Hi @masumsoft, please make sure your changes are covered in the unit tests.

@muziqiushan

When can this issue be expected to be fixed? I would really appreciate it.


3 participants