Fix markdown reader image path #334

Ceceliachenen · 2025-01-07T08:40:55Z

No description provided.

github-actions · 2025-01-07T08:56:45Z

☂️ Python Coverage

current status: ✅

Overall Coverage

Lines	Covered	Coverage	Threshold	Status
8633	4256	49%	40%	🟢

New Files

No new covered files...

Modified Files

File	Coverage	Status
src/pai_rag/integrations/readers/pai_html_reader.py	89%	🟢
src/pai_rag/integrations/readers/pai_markdown_reader.py	28%	🟢
TOTAL	59%	🟢

updated for commit: fd90c57 by action🐍

moria97 · 2025-01-07T09:17:48Z

src/pai_rag/integrations/readers/pai_markdown_reader.py

@@ -55,9 +58,12 @@ def replace_image_paths(self, markdown_name: str, content: str):
        for match in html_image_matches:


It just occurred to me that HTML files may need the same handling.

moria97 · 2025-01-07T09:59:52Z

src/pai_rag/integrations/readers/pai_markdown_reader.py

@@ -55,9 +58,12 @@ def replace_image_paths(self, markdown_name: str, content: str):
        for match in html_image_matches:
            full_match = match.group(0)  # the entire match
            local_url = match.group(1)  # the captured URL
+            image_name = os.path.basename(local_url)


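The image may sit under parent directories, for example "figures/docs/1.jpg".

I suggest writing it like this:

    import os
    from urllib.parse import urlparse

    def is_url(url):
        """Return True if the string is a URL."""
        try:
            result = urlparse(url)
            return all([result.scheme, result.netloc])
        except ValueError:
            return False

    base_dir = os.path.dirname(markdown_path)
    if not is_url(image_path):
        # os.path.join leaves an absolute image_path unchanged
        image_path = os.path.join(base_dir, image_path)

When the images are uploaded, their parent directories are not preserved.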
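For reference, here is a minimal runnable sketch of the suggestion above. The helper name `resolve_image_path` and the example paths are hypothetical and not taken from the PR; the sketch only illustrates how `os.path.join` treats relative paths, absolute paths, and URLs on a POSIX system.

```python
import os
from urllib.parse import urlparse


def is_url(path: str) -> bool:
    """Return True if the string has both a scheme and a network location."""
    try:
        result = urlparse(path)
    except ValueError:
        return False
    return bool(result.scheme and result.netloc)


def resolve_image_path(markdown_path: str, image_path: str) -> str:
    """Hypothetical helper: resolve image_path against the markdown file's directory."""
    if is_url(image_path):
        # Remote images are left untouched.
        return image_path
    base_dir = os.path.dirname(markdown_path)
    # os.path.join discards base_dir when image_path is absolute,
    # so absolute local paths pass through unchanged.
    return os.path.join(base_dir, image_path)


if __name__ == "__main__":
    print(resolve_image_path("/data/docs/readme.md", "figures/docs/1.jpg"))
    print(resolve_image_path("/data/docs/readme.md", "/tmp/1.jpg"))
    print(resolve_image_path("/data/docs/readme.md", "https://example.com/a.png"))
```

Unlike `os.path.basename(local_url)`, this keeps the parent directories of a relative image path, so it only helps if the upload step preserves them as well.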


moria97 · 2025-01-08T09:18:56Z

src/pai_rag/integrations/readers/pai_html_reader.py

            else:
-                content = content.replace(f"![{alt_text}]({image_url})", "")
+                content = content.replace(full_match, "")
+        for match in html_image_matches:


This block looks identical to lines 165-178 above. Could the two be combined?
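One way the two loops could be merged is sketched below. This is not the reader's actual code: the regular expressions, the `url_map` argument (standing in for whatever maps a local image path to its uploaded URL), and the function signature are all assumptions made for illustration.

```python
import re
from typing import Dict

# Assumed patterns; the real reader may use different expressions.
MARKDOWN_IMAGE_RE = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")
HTML_IMAGE_RE = re.compile(r'''<img[^>]*\ssrc=["']([^"']+)["'][^>]*>''')


def replace_image_paths(content: str, url_map: Dict[str, str]) -> str:
    """Rewrite or drop image references, handling both syntaxes in one loop."""
    # Collect all matches up front, against the original content.
    matches = list(MARKDOWN_IMAGE_RE.finditer(content))
    matches += list(HTML_IMAGE_RE.finditer(content))
    for match in matches:
        full_match = match.group(0)  # the entire image reference
        local_url = match.group(1)  # the captured path or URL
        new_url = url_map.get(local_url)
        if new_url:
            # Swap only the path inside the matched reference.
            replacement = full_match.replace(local_url, new_url)
            content = content.replace(full_match, replacement)
        else:
            # No uploaded counterpart: remove the reference entirely.
            content = content.replace(full_match, "")
    return content
```

Because both patterns expose the path as group 1, the loop body does not need to know which syntax produced the match.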

fix_markdown_reader_image_path

ed4b3b9

Ceceliachenen requested a review from moria97 January 7, 2025 08:40

moria97 changed the title ~~fix_markdown_reader_image_path~~ Fix markdown reader image path Jan 7, 2025

moria97 reviewed Jan 7, 2025


Ceceliachenen added 2 commits January 8, 2025 10:32

fix read local path

72e3609

Merge branch 'feature' into personal/ranxia/fix_markdown_reader_image…

6b47244

…_path

moria97 reviewed Jan 8, 2025


remove duplicate

fd90c57
