Feat：somark plugins #2487

Soul-Code · 2026-01-26T05:44:38Z

Add Somark tool plugin for converting documents (PDFs, images, etc.) into structured Markdown or JSON format using the Somark API.

Features:

- Document extraction with OXR (Optical Everything Recognition) algorithm
- Support for multiple file formats (PDF, PNG, JPG, etc.)
- Configurable API endpoint and authentication
- Max file size: 50MB/50 pages

Related Issues or Context

This PR contains Changes to Non-Plugin

Documentation
Other

This PR contains Changes to Non-LLM Models Plugin

I have Run Comprehensive Tests Relevant to My Changes

This PR contains Changes to LLM Models Plugin

My Changes Affect Message Flow Handling (System Messages and User→Assistant Turn-Taking)

My Changes Affect Tool Interaction Flow (Multi-Round Usage and Output Handling, for both Agent App and Agent Node)

My Changes Affect Multimodal Input Handling (Images, PDFs, Audio, Video, etc.)

My Changes Affect Multimodal Output Generation (Images, Audio, Video, etc.)

My Changes Affect Structured Output Format (JSON, XML, etc.)

My Changes Affect Token Consumption Metrics

My Changes Affect Other LLM Functionalities (Reasoning Process, Grounding, Prompt Caching, etc.)

Other Changes (Add New Models, Fix Model Parameters etc.)

Version Control (Any Changes to the Plugin Will Require Bumping the Version)

I have Bumped Up the Version in Manifest.yaml (Top-Level Version Field, Not in Meta Section)

Dify Plugin SDK Version

I have Ensured dify_plugin>=0.3.0,<0.6.0 is in requirements.txt (SDK docs)

Environment Verification (If Any Code Changes)

Local Deployment Environment

Dify Version is: , I have Tested My Changes on Local Deployment Dify with a Clean Environment That Matches the Production Configuration.

SaaS Environment

I have Tested My Changes on cloud.dify.ai with a Clean Environment That Matches the Production Configuration


gemini-code-assist · 2026-01-26T05:44:56Z

Summary of Changes

Hello @Soul-Code, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates the Somark DocAI platform as a new plugin within Dify, significantly enhancing its document processing capabilities. Users can now convert diverse document types, such as PDFs and images, into structured Markdown or JSON outputs. This integration provides advanced document understanding and data extraction, facilitating the incorporation of document content into LLM training, RAG systems, and intelligent agents.

Highlights

- New Somark Plugin Integration: This pull request introduces a new Dify plugin for Somark, a DocAI platform designed for advanced document processing.
- Document AI Capabilities: The plugin enables the conversion of various document types, including PDFs and images, into structured Markdown or JSON formats. It leverages Somark's proprietary 'OXR' (Optical Everything Recognition) algorithm for precise content extraction.
- Seamless Workflow Integration: A new 'Extract Document' tool is provided, allowing users to easily integrate Somark's document parsing capabilities into their Dify workflows. Configuration requires a Somark API Key for authentication.


¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request adds a new somark plugin, which is a great addition. The code is well-structured, but I've found a few issues that should be addressed before merging. These include a critical bug in API URL construction, missing credential validation, potential runtime errors, and several inconsistencies in metadata and documentation. Addressing these points will improve the plugin's robustness and user experience.

gemini-code-assist · 2026-01-26T05:46:58Z

tools/somark/provider/somark.yaml

+      en_US: https://somark.tech/api/v1/extract
+      zh_Hans: https://somark.tech/api/v1/extract


The placeholder for base_url is https://somark.tech/api/v1/extract, which is misleading because the code appends /acc_sync to it. To prevent configuration errors, the placeholder should reflect the expected base path. Please update the placeholder to https://somark.tech/api/v1.

en_US: https://somark.tech/api/v1
zh_Hans: https://somark.tech/api/v1

gemini-code-assist · 2026-01-26T05:46:58Z

tools/somark/tools/extract.py

+
+        # 2. Get configuration
+        # Use the v1 API by default
+        base_url = self.runtime.credentials.get("base_url", "https://somark.tech/api/v1/extract")


The default base_url is set to https://somark.tech/api/v1/extract, but the code on line 31 appends /acc_sync, resulting in an incorrect URL https://somark.tech/api/v1/extract/acc_sync. This will cause API requests to fail with the default configuration. The base URL should only contain the base path of the API. Please change the default value to https://somark.tech/api/v1. This also needs to be corrected in provider/somark.yaml.

Suggested change

base_url = self.runtime.credentials.get("base_url", "https://somark.tech/api/v1/extract")

base_url = self.runtime.credentials.get("base_url", "https://somark.tech/api/v1")
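The failure described in this comment comes from appending an endpoint segment to a base URL that already contains one. Below is a minimal sketch of the joining step, assuming the plugin builds the request URL by appending `/acc_sync` to `base_url` as stated above. `build_endpoint` and `DEFAULT_BASE_URL` are hypothetical names used for illustration, not code from the PR.

```python
from typing import Optional

# Default proposed in the review comment above.
DEFAULT_BASE_URL = "https://somark.tech/api/v1"


def build_endpoint(base_url: Optional[str], path: str = "acc_sync") -> str:
    """Join a base URL and an endpoint path with exactly one slash between them."""
    # Fall back to the default when the credential is missing or empty.
    base = (base_url or DEFAULT_BASE_URL).rstrip("/")
    return f"{base}/{path.lstrip('/')}"


if __name__ == "__main__":
    print(build_endpoint(None))
    # The old default reproduces the broken URL from the review comment.
    print(build_endpoint("https://somark.tech/api/v1/extract"))
```

Stripping the trailing slash also tolerates users who enter `https://somark.tech/api/v1/` in the credential form.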

gemini-code-assist · 2026-01-26T05:46:59Z

tools/somark/provider/somark.py

+    def _validate_credentials(self, credentials: dict[str, Any]) -> None:
+        """
+        Check whether the credentials are valid.
+        """
+        # No strict validation for now; pass through directly
+        pass


The _validate_credentials method is currently a placeholder and does not perform any validation. This means users will only discover that their credentials are wrong when they try to use the tool, which is a poor user experience. It is a security and usability best practice to validate credentials when they are first configured. Please implement this method to make a lightweight API call to an endpoint (like a user status or ping endpoint) to verify that the api_key and base_url are valid.
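One possible shape for such a check is sketched below using only the standard library. The thread does not name a status or ping endpoint, so the HTTP call is injected as a hypothetical `probe` callable rather than guessed. `CredentialError` is a stand-in for whatever exception the plugin SDK expects from `_validate_credentials`, and the default base URL is the one proposed elsewhere in this review. All of these are assumptions.

```python
from typing import Any, Callable, Mapping

DEFAULT_BASE_URL = "https://somark.tech/api/v1"  # assumption: default from the review


class CredentialError(Exception):
    """Stand-in for the SDK's credential validation exception (assumption)."""


def validate_credentials(
    credentials: Mapping[str, Any],
    probe: Callable[[str, str], int],
) -> None:
    """Raise CredentialError unless the credentials look usable.

    `probe(base_url, api_key)` is a hypothetical hook that performs one
    lightweight authenticated request and returns the HTTP status code.
    """
    api_key = str(credentials.get("api_key") or "").strip()
    base_url = str(credentials.get("base_url") or DEFAULT_BASE_URL).strip()
    if not api_key:
        raise CredentialError("api_key is required")
    if not base_url.startswith(("http://", "https://")):
        raise CredentialError("base_url must be an http(s) URL")
    try:
        status = probe(base_url.rstrip("/"), api_key)
    except OSError as exc:  # network-level failures such as refused connections
        raise CredentialError(f"could not reach {base_url}: {exc}") from exc
    if status in (401, 403):
        raise CredentialError("api_key was rejected by the server")
    if status >= 400:
        raise CredentialError(f"unexpected HTTP status {status}")
```

Inside the plugin, `_validate_credentials` could call a function like this with a `probe` that issues the real request, and re-raise failures as the SDK's own validation error.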

gemini-code-assist · 2026-01-26T05:46:59Z

tools/somark/tools/extract.py

+                return
+
+            # 7. Process the response
+            result = response.json()


The call response.json() will raise a json.JSONDecodeError if the API returns a non-JSON response (e.g., an HTML error page from a proxy or server error). This unhandled exception will be caught by the generic except Exception, but it's better to handle it specifically to provide a more informative error message. Please wrap the response.json() call in a try...except json.JSONDecodeError block.
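A minimal sketch of the suggested handling follows, written against a plain response body so it runs without any HTTP library. `parse_json_body` is a hypothetical helper, not code from the PR. It catches `ValueError`, the parent class of `json.JSONDecodeError`, on the assumption that this also covers the decode errors raised by common HTTP clients.

```python
import json
from typing import Any, Optional, Tuple


def parse_json_body(text: str, status_code: int) -> Tuple[Optional[Any], Optional[str]]:
    """Return (data, None) on success or (None, error_message) on failure."""
    try:
        return json.loads(text), None
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        # Include a short preview so the user can see what the server sent back.
        preview = text[:80].replace("\n", " ")
        return None, f"Non-JSON response (HTTP {status_code}): {preview!r}"


if __name__ == "__main__":
    print(parse_json_body('{"code": 0}', 200))
    print(parse_json_body("<html>Bad Gateway</html>", 502))
```

Returning the message instead of raising lets the caller emit it as a tool message, which fits a generator-style `_invoke`. Raising a dedicated exception would work equally well.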

gemini-code-assist · 2026-01-26T05:46:59Z

tools/somark/manifest.yaml

+author: soulcode
+name: somark
+label:
+  en_US: Somark
+  zh_Hans: Somark
+description:
+  en_US: SoMark is a DocAI that can convert diverse documents—such as PDFs, images, and more—into structured Markdown or JSON. It is designed to work seamlessly across all scenarios.
+  zh_Hans: SoMark 是一款 DocAI 产品，能够将各种文档（如 PDF、图片等）转换为结构化的 Markdown 或 JSON 格式。它旨在无缝适用于所有场景。
+icon: icon.png


There are a couple of metadata inconsistencies across the plugin files:

- Missing Icon: icon.png is referenced on line 11 (and also in provider/somark.yaml), but the file is not included in the pull request. Please add the icon file.
- Inconsistent Author Name: The author is soulcode on line 3, but SoulCode in provider/somark.yaml and tools/extract.yaml. For consistency, please use the same name across all files. SoulCode is recommended.

author: SoulCode

gemini-code-assist · 2026-01-26T05:46:59Z

tools/somark/README.md

+
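+From basic layout segmentation and reading-order restoration to complex elements (such as tables, formulas, images, and even chemical symbols), every component can be accurately extracted and reconstructed. The output is a complete, highly structured representation of the document.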
+
+![](./_assets/chatu1.png)


This README file contains references to images (e.g., chatu1.png on this line, and others on lines 127 and 137) located in an _assets directory. However, this directory and the images are not included in the pull request, which will lead to broken images in the documentation. Please add the _assets folder and its contents.

gemini-code-assist · 2026-01-26T05:46:59Z

tools/somark/tools/extract.py

+    def _invoke(self, tool_parameters: Dict[str, Any]) -> Generator[ToolInvokeMessage, None, None]:
+        # 1. Get parameters
+        file = tool_parameters.get("file")
+        lang = "auto"


The lang parameter is hardcoded to "auto". While this is a reasonable default, allowing users to specify the document's language can improve extraction accuracy. Please consider adding lang as an optional tool parameter in tools/extract.yaml and using its value here.

For example, in tools/extract.yaml:

- name: lang
  type: string
  required: false
  label:
    en_US: Language
    zh_Hans: 语言
  human_description:
    en_US: "Language of the document. Default is 'auto' for automatic detection."
    zh_Hans: "文档语言。默认为 'auto' 自动检测。"
  form: llm
  default: 'auto'

Then you can retrieve it here with lang = tool_parameters.get("lang", "auto").
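One caveat with `tool_parameters.get("lang", "auto")` is that the default only applies when the key is absent; a key present with `None` or an empty string is returned as-is. A slightly more defensive sketch follows, assuming the runtime may pass such values for an unset optional parameter. `resolve_lang` is a hypothetical helper, not code from the PR.

```python
from typing import Any, Mapping


def resolve_lang(tool_parameters: Mapping[str, Any], default: str = "auto") -> str:
    """Read the optional `lang` parameter, falling back to the default."""
    raw = tool_parameters.get("lang")
    # Treat missing, None, non-string, and blank values the same way.
    if not isinstance(raw, str) or not raw.strip():
        return default
    return raw.strip()


if __name__ == "__main__":
    print(resolve_lang({}))
    print(resolve_lang({"lang": " en "}))
```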

gemini-code-assist · 2026-01-26T05:46:59Z

tools/somark/tools/extract.yaml

+    human:
+      en_US: Convert various document types—including PDFs, images, and more—into structured Markdown or JSON using the Somark API.
+      zh_Hans: 使用 Somark API 将各种文档（如 PDF、图片等）转换为结构化的 Markdown 或 JSON 格式。
+    llm: A precise and reliable tool that utilizes the Somark API to convert documents (including PDFs, images, and DOC files) into clean, structured Markdown or JSON format, preserving the original layout and content hierarchy.


The llm description states that the tool supports DOC files. However, the README.md file lists DOC and DOCX support as a future feature. This creates an inconsistency and could mislead users. Please update the description to accurately reflect the currently supported file types.

llm: A precise and reliable tool that utilizes the Somark API to convert documents (including PDFs and images) into clean, structured Markdown or JSON format, preserving the original layout and content hierarchy.

crazywoola

See comments.

crazywoola · 2026-01-26T06:26:35Z

tools/somark/manifest.yaml

@@ -0,0 +1,31 @@
+version: 0.0.1
+type: plugin
+author: soulcode


The author in this repo should be langgenius. If you want to keep the current author unchanged, please submit a packaged plugin to https://github.com/langgenius/dify-plugins/pulls instead.

Soul-Code added 3 commits January 26, 2026 13:23

fix: revert dify_plugin version to 0.3.0 for compatibility

2e8c4a0

Merge remote-tracking branch 'upstream/main' into feat：somark-plugins

671f14c

dosubot bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Jan 26, 2026

dosubot bot added the enhancement New feature or request label Jan 26, 2026

