feat(rag): add excel reader to support excel files in the rag module #872

qbc2016 · 2025-10-28T08:47:09Z

AgentScope Version

[The version of AgentScope you are working on, e.g. import agentscope; print(agentscope.__version__)]

Description

As the title says.

Checklist

Please check the following items before code is ready to be reviewed.

Code has been formatted with pre-commit run --all-files command
All tests are passing
Docstrings are in Google style
Related documentation has been updated (e.g. links, examples, etc.)
Code is ready for review

cla-assistant · 2025-12-02T09:51:31Z

All committers have signed the CLA.

Copilot

Pull request overview

This pull request adds an ExcelReader to the RAG module to support reading and chunking Excel files (.xlsx, .xls). The implementation follows the pattern of existing readers (PDFReader, WordReader) and includes support for extracting text, tables, and images from Excel sheets.

Changes:

Added new ExcelReader class with support for table extraction in Markdown or JSON format, image extraction, and sheet-level processing control
Updated WordReader hash algorithm from MD5 to SHA256 for document ID generation
Added comprehensive test cases for the Excel reader functionality
Added required dependencies (pandas, openpyxl) to the full extras list
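To make the "Markdown or JSON format" option concrete, here is a minimal sketch of rendering one sheet's rows in either form. It is not the ExcelReader code from this PR: the helper names `rows_to_markdown` and `rows_to_json` are hypothetical, the first row is assumed to be the header, and cell values are assumed not to contain pipe characters or newlines.

```python
import json


def rows_to_markdown(rows):
    """Render rows (first row is the header) as a Markdown table.

    Hypothetical helper for illustration; not the PR's implementation.
    """
    if not rows:
        return ""
    # Convert every cell to text so numbers and strings join uniformly.
    header, *body = [[str(cell) for cell in row] for row in rows]
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


def rows_to_json(rows):
    """Render rows (first row is the header) as a JSON list of records."""
    if not rows:
        return "[]"
    header, *body = rows
    records = [dict(zip(header, row)) for row in body]
    return json.dumps(records, ensure_ascii=False)


sample = [["name", "age"], ["Ann", 30]]
markdown_table = rows_to_markdown(sample)
json_table = rows_to_json(sample)
```

Markdown output tends to be compact for chunking, while JSON records repeat the column names with every row and preserve value types.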

Reviewed changes

Copilot reviewed 6 out of 8 changed files in this pull request and generated 3 comments.

File	Description
src/agentscope/rag/_reader/_excel_reader.py	New ExcelReader implementation with 630 lines including table/image extraction
tests/rag_reader_test.py	Added test case for Excel reader with images and tables
tests/test.xlsx	Binary Excel test file with sample data
src/agentscope/rag/_reader/_word_reader.py	Changed hash algorithm from MD5 to SHA256
src/agentscope/rag/_reader/__init__.py	Exported ExcelReader class
src/agentscope/rag/__init__.py	Exported ExcelReader at package level
pyproject.toml	Added pandas and openpyxl to full dependencies

tests/rag_reader_test.py

src/agentscope/rag/_reader/_word_reader.py

pyproject.toml

DavdGao

Similarly, why do we remove example.docx here?

src/agentscope/rag/_reader/_excel_reader.py

DavdGao · 2026-01-17T13:33:34Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a new ExcelReader for the RAG module, which is a great addition for handling Excel files. The implementation correctly uses pandas and openpyxl to extract table data and images. The new reader is properly integrated and new dependencies are added to pyproject.toml.

I've found a couple of issues that need attention:

The include_cell_coordinates parameter in ExcelReader is currently unused.
There's a bug in how Markdown tables are generated: pipe characters are not escaped correctly, which will lead to rendering issues.
The new tests include some debugging code that should be removed and also have assertions that rely on the incorrect table generation, which will need to be updated.
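The pipe-escaping issue can be illustrated with a short sketch. This is not the fix that landed in the PR; `escape_markdown_cell` and `markdown_row` are hypothetical helpers, and replacing newlines with a space is an assumption made here to keep each table row on a single line.

```python
def escape_markdown_cell(value):
    """Return cell text that is safe to place inside a Markdown table row.

    Hypothetical helper for illustration; not the PR's implementation.
    """
    text = "" if value is None else str(value)
    # A bare "|" would be parsed as a column separator, so escape it.
    text = text.replace("|", "\\|")
    # A newline would end the table row, so collapse it to a space.
    return text.replace("\r\n", " ").replace("\n", " ")


def markdown_row(cells):
    """Join escaped cells into one Markdown table row."""
    return "| " + " | ".join(escape_markdown_cell(c) for c in cells) + " |"


row = markdown_row(["a|b", None, 3])
```

Escaping at the cell level, before joining, keeps the separators added by `markdown_row` distinguishable from pipes that were part of the data.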

Additionally, the update to use sha256 for document ID generation in WordReader is a good improvement for consistency.

Overall, this is a solid contribution with a few minor fixes required.

src/agentscope/rag/_reader/_excel_reader.py

tests/rag_reader_test.py

src/agentscope/rag/_reader/_excel_reader.py

tests/rag_reader_test.py

qbc2016 · 2026-01-19T06:31:41Z

Similarly, why do we remove example.docx here?

Since it's not used anywhere and is just a legacy dependency.

qbc2016 · 2026-01-19T06:59:42Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a new ExcelReader to the RAG module, enabling the processing of Excel files. The implementation is robust, covering text, tables, and images, and includes good error handling and comprehensive unit tests. The use of lazy loading for dependencies and the structured approach to extracting and ordering content from sheets are well done.

I have a couple of suggestions for improvement:

For performance and safety, it's better to open the Excel workbook in read-only mode.
There is an inconsistency in the hashing algorithm used for generating document IDs across different readers, which could be standardized for better maintainability.
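One way to standardize document ID generation is a single shared helper that every reader calls. The sketch below assumes IDs are derived from the chunk text with SHA-256; the name `make_doc_id` is hypothetical and not taken from the AgentScope code base.

```python
import hashlib


def make_doc_id(text: str) -> str:
    """Derive a deterministic document ID from chunk text using SHA-256.

    Hypothetical shared helper; the readers in the PR may build IDs differently.
    """
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


first_id = make_doc_id("Sheet1: quarterly totals")
second_id = make_doc_id("Sheet1: quarterly totals")
other_id = make_doc_id("Sheet2: quarterly totals")
```

Routing every reader through one function means a later change of algorithm touches a single place instead of each reader separately.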

Overall, this is a great addition to the project's capabilities.

src/agentscope/rag/_reader/_excel_reader.py

DavdGao

lgtm

Add excel reader

cbc4e5e

qbc2016 mentioned this pull request Oct 28, 2025

Extend the current RAG implementation #811

Open

10 tasks

qbc2016 added 2 commits October 28, 2025 17:03

update

3d4d9ca

add packages

2ea023d

DavdGao self-requested a review October 29, 2025 02:12

DavdGao added the Ready for Review, Roadmap, and module: rag labels Oct 29, 2025

qbc2016 added 2 commits November 3, 2025 15:12

merge main and fix conflicts

ff528a6

update

8d49df4

DavdGao changed the title ~~Implement Excel Reader~~ feat(rag): add excel reader to support excel files in the rag module Nov 3, 2025

add test file

9ac4463

DavdGao requested a review from Copilot January 16, 2026 05:55

Copilot started reviewing on behalf of DavdGao January 16, 2026 05:56

Copilot AI reviewed Jan 16, 2026

tests/rag_reader_test.py (outdated, resolved)

src/agentscope/rag/_reader/_word_reader.py (outdated, resolved)

pyproject.toml (outdated, resolved)

DavdGao reviewed Jan 17, 2026

src/agentscope/rag/_reader/_excel_reader.py (outdated, resolved)

src/agentscope/rag/_reader/_excel_reader.py (outdated, resolved)

gemini-code-assist bot reviewed Jan 17, 2026

src/agentscope/rag/_reader/_excel_reader.py (outdated, resolved)

tests/rag_reader_test.py (resolved)

src/agentscope/rag/_reader/_excel_reader.py (resolved)

tests/rag_reader_test.py (outdated, resolved)

qbc2016 added 2 commits January 19, 2026 11:27

merge main and fix conflicts

eb155bb

modified according to comments

40d982a

gemini-code-assist bot reviewed Jan 19, 2026

src/agentscope/rag/_reader/_excel_reader.py (resolved)

src/agentscope/rag/_reader/_excel_reader.py (resolved)

DavdGao approved these changes Jan 19, 2026

DavdGao merged commit c29bdde into agentscope-ai:main Jan 19, 2026
11 checks passed
