Feat:somark plugins #2487
Add Somark tool plugin for converting documents (PDFs, images, etc.) into structured Markdown or JSON format using the Somark API.
Features:
- Document extraction with OXR (Optical Everything Recognition) algorithm
- Support for multiple file formats (PDF, PNG, JPG, etc.)
- Configurable API endpoint and authentication
- Max file size: 50 MB / 50 pages
Summary of Changes
Hello @Soul-Code, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request integrates the Somark DocAI platform as a new plugin within Dify, significantly enhancing its document processing capabilities. Users can now convert diverse document types, such as PDFs and images, into structured Markdown or JSON outputs. This integration provides advanced document understanding and data extraction, facilitating the incorporation of document content into LLM training, RAG systems, and intelligent agents.
Code Review
This pull request adds a new somark plugin, which is a great addition. The code is well-structured, but I've found a few issues that should be addressed before merging. These include a critical bug in API URL construction, missing credential validation, potential runtime errors, and several inconsistencies in metadata and documentation. Addressing these points will improve the plugin's robustness and user experience.
en_US: https://somark.tech/api/v1/extract
zh_Hans: https://somark.tech/api/v1/extract
The placeholder for base_url is https://somark.tech/api/v1/extract, which is misleading because the code appends /acc_sync to it. To prevent configuration errors, the placeholder should reflect the expected base path. Please update the placeholder to https://somark.tech/api/v1.
Suggested change:
en_US: https://somark.tech/api/v1
zh_Hans: https://somark.tech/api/v1
# 2. Get the configuration
# Use the v1 API by default
base_url = self.runtime.credentials.get("base_url", "https://somark.tech/api/v1/extract")
The default base_url is set to https://somark.tech/api/v1/extract, but the code on line 31 appends /acc_sync, resulting in an incorrect URL https://somark.tech/api/v1/extract/acc_sync. This will cause API requests to fail with the default configuration. The base URL should only contain the base path of the API. Please change the default value to https://somark.tech/api/v1. This also needs to be corrected in provider/somark.yaml.
Suggested change:
- base_url = self.runtime.credentials.get("base_url", "https://somark.tech/api/v1/extract")
+ base_url = self.runtime.credentials.get("base_url", "https://somark.tech/api/v1")
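The URL problem described in this comment can be illustrated with a minimal sketch. It assumes, as the review does, that the tool builds its endpoint by appending acc_sync to base_url; build_endpoint is a hypothetical helper and is not part of the pull request.

```python
# Hypothetical helper (not from the PR): joins a base API path and an
# endpoint name with exactly one slash between them.
def build_endpoint(base_url: str, path: str = "acc_sync") -> str:
    return f"{base_url.rstrip('/')}/{path.lstrip('/')}"


# Default value proposed in the review comment.
DEFAULT_BASE_URL = "https://somark.tech/api/v1"

# With the proposed default, the suffix is appended once.
print(build_endpoint(DEFAULT_BASE_URL))

# With the original default, the suffix lands after /extract, which is
# the URL the review flags as incorrect.
print(build_endpoint("https://somark.tech/api/v1/extract"))
```

Stripping the slashes before joining also tolerates a user-supplied base URL that ends in a slash, which a plain string concatenation would turn into a double slash.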
def _validate_credentials(self, credentials: dict[str, Any]) -> None:
    """
    Check whether the credentials are valid
    """
    # No strict validation for now; accept everything
    pass
The _validate_credentials method is currently a placeholder and does not perform any validation. This means users will only discover that their credentials are wrong when they try to use the tool, which is a poor user experience. It is a security and usability best practice to validate credentials when they are first configured. Please implement this method to make a lightweight API call to an endpoint (like a user status or ping endpoint) to verify that the api_key and base_url are valid.
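One possible shape for such a check is sketched below. Several parts are assumptions rather than facts about Somark or the PR: the /ping path, the Bearer authorization header, and the helper names are all hypothetical, and CredentialValidationError is a local stand-in for whatever credential-validation error the plugin SDK expects the provider to raise. The HTTP call is injected as a function so the logic can be exercised without network access.

```python
import urllib.error
import urllib.request
from typing import Any, Callable, Mapping


class CredentialValidationError(Exception):
    """Local stand-in for the SDK's credential-validation error (assumption)."""


# Hypothetical lightweight endpoint; the real Somark API may not expose it.
PING_PATH = "/ping"


def urllib_get_status(url: str, headers: Mapping[str, str], timeout: float = 10.0) -> int:
    """Send a GET request and return the HTTP status code."""
    request = urllib.request.Request(url, headers=dict(headers), method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses arrive as exceptions; report their status code.
        return exc.code


def validate_credentials(
    credentials: Mapping[str, Any],
    get_status: Callable[[str, Mapping[str, str]], int] = urllib_get_status,
) -> None:
    """Raise CredentialValidationError if api_key or base_url look unusable."""
    api_key = credentials.get("api_key")
    if not api_key:
        raise CredentialValidationError("api_key is required")
    base_url = str(credentials.get("base_url") or "https://somark.tech/api/v1").rstrip("/")
    # Assumption: the API accepts a Bearer token in the Authorization header.
    headers = {"Authorization": f"Bearer {api_key}"}
    try:
        status = get_status(base_url + PING_PATH, headers)
    except OSError as exc:  # urllib.error.URLError is a subclass of OSError
        raise CredentialValidationError(f"Could not reach {base_url}: {exc}") from exc
    if status in (401, 403):
        raise CredentialValidationError("The API key was rejected")
    if status >= 400:
        raise CredentialValidationError(f"Unexpected HTTP status {status} from {base_url}")
```

In the plugin itself, _validate_credentials would delegate to a function like this and translate the failure into the SDK's own error type, so that a wrong key or base URL is reported when the credentials are saved rather than on first use.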
return

# 7. Handle the response
result = response.json()
The call response.json() will raise a json.JSONDecodeError if the API returns a non-JSON response (e.g., an HTML error page from a proxy or server error). This unhandled exception will be caught by the generic except Exception, but it's better to handle it specifically to provide a more informative error message. Please wrap the response.json() call in a try...except json.JSONDecodeError block.
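A minimal sketch of the defensive parsing this comment asks for follows. It works on the raw response body with the standard json module so that it stays self-contained; parse_json_body and its error wording are hypothetical, and in the tool the body and status would come from the HTTP response object.

```python
import json
from typing import Any


def parse_json_body(body: str, status_code: int) -> dict[str, Any]:
    """Parse an API response body, failing with a descriptive message."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError as exc:
        # Include a short preview so an HTML error page is recognizable.
        preview = body[:200]
        raise ValueError(
            f"Somark API returned a non-JSON response (HTTP {status_code}): {preview!r}"
        ) from exc
    if not isinstance(data, dict):
        raise ValueError(f"Expected a JSON object, got {type(data).__name__}")
    return data


try:
    parse_json_body("<html>502 Bad Gateway</html>", 502)
except ValueError as exc:
    print(exc)
```

Catching the decode error next to the parse keeps the generic except Exception handler as a last resort and lets the tool tell the user that the server answered with something other than JSON.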
author: soulcode
name: somark
label:
  en_US: Somark
  zh_Hans: Somark
description:
  en_US: SoMark is a DocAI that can convert diverse documents—such as PDFs, images, and more—into structured Markdown or JSON. It is designed to work seamlessly across all scenarios.
  zh_Hans: SoMark 是一款 DocAI 产品,能够将各种文档(如 PDF、图片等)转换为结构化的 Markdown 或 JSON 格式。它旨在无缝适用于所有场景。
icon: icon.png
There are a couple of metadata inconsistencies across the plugin files:
- Missing Icon: icon.png is referenced on line 11 (and also in provider/somark.yaml), but the file is not included in the pull request. Please add the icon file.
- Inconsistent Author Name: The author is soulcode on line 3, but SoulCode in provider/somark.yaml and tools/extract.yaml. For consistency, please use the same name across all files. SoulCode is recommended.
Suggested change:
author: SoulCode
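From basic layout segmentation and reading-order restoration to complex elements (such as tables, formulas, images, and even chemical symbols), every component is accurately extracted and reconstructed. The output is a complete, highly structured representation of the document.

![Illustration 1](_assets/chatu1.png)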
This README file contains references to images (e.g., chatu1.png on this line, and others on lines 127 and 137) located in an _assets directory. However, this directory and the images are not included in the pull request, which will lead to broken images in the documentation. Please add the _assets folder and its contents.
def _invoke(self, tool_parameters: Dict[str, Any]) -> Generator[ToolInvokeMessage, None, None]:
    # 1. Get the parameters
    file = tool_parameters.get("file")
    lang = "auto"
The lang parameter is hardcoded to "auto". While this is a reasonable default, allowing users to specify the document's language can improve extraction accuracy. Please consider adding lang as an optional tool parameter in tools/extract.yaml and using its value here.
For example, in tools/extract.yaml:
- name: lang
  type: string
  required: false
  label:
    en_US: Language
    zh_Hans: 语言
  human_description:
    en_US: "Language of the document. Default is 'auto' for automatic detection."
    zh_Hans: "文档语言。默认为 'auto' 自动检测。"
  form: llm
  default: 'auto'
Then you can retrieve it here with lang = tool_parameters.get("lang", "auto").
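A slightly more defensive variant of that lookup is sketched here. It assumes the parameter may arrive missing, as None, or as an empty string when the user leaves the field blank; resolve_lang is a hypothetical helper, and the language codes in the usage lines are illustrative only.

```python
from typing import Any, Mapping

DEFAULT_LANG = "auto"


def resolve_lang(tool_parameters: Mapping[str, Any]) -> str:
    """Return the requested document language, falling back to 'auto'."""
    value = tool_parameters.get("lang")
    # dict.get's default only covers a missing key, so None and blank
    # strings are handled explicitly here.
    if not isinstance(value, str) or not value.strip():
        return DEFAULT_LANG
    return value.strip()


print(resolve_lang({"file": "report.pdf"}))
print(resolve_lang({"lang": " en "}))
```

A plain tool_parameters.get("lang", "auto") returns None when the key is present with a None value, which is why the sketch checks the value instead of relying on the default argument.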
human:
  en_US: Convert various document types—including PDFs, images, and more—into structured Markdown or JSON using the Somark API.
  zh_Hans: 使用 Somark API 将各种文档(如 PDF、图片等)转换为结构化的 Markdown 或 JSON 格式。
llm: A precise and reliable tool that utilizes the Somark API to convert documents (including PDFs, images, and DOC files) into clean, structured Markdown or JSON format, preserving the original layout and content hierarchy.
The llm description states that the tool supports DOC files. However, the README.md file lists DOC and DOCX support as a future feature. This creates an inconsistency and could mislead users. Please update the description to accurately reflect the currently supported file types.
Suggested change:
llm: A precise and reliable tool that utilizes the Somark API to convert documents (including PDFs and images) into clean, structured Markdown or JSON format, preserving the original layout and content hierarchy.
crazywoola left a comment
See comments.
@@ -0,0 +1,31 @@
version: 0.0.1
type: plugin
author: soulcode
The author in this repo should be langgenius. If you want to keep the author unchanged, please submit a packaged plugin to https://github.com/langgenius/dify-plugins/pulls instead.
Related Issues or Context
- This PR contains Changes to Non-Plugin
- This PR contains Changes to Non-LLM Models Plugin
- This PR contains Changes to LLM Models Plugin
Version Control (Any Changes to the Plugin Will Require Bumping the Version)
- Version Field, Not in Meta Section)
Dify Plugin SDK Version
- dify_plugin>=0.3.0,<0.6.0 is in requirements.txt (SDK docs)
Environment Verification (If Any Code Changes)
- Local Deployment Environment
- SaaS Environment