@Soul-Code commented Jan 26, 2026

Add Somark tool plugin for converting documents (PDFs, images, etc.) into structured Markdown or JSON format using the Somark API.

Features:

  • Document extraction with OXR (Optical Everything Recognition) algorithm

  • Support for multiple file formats (PDF, PNG, JPG, etc.)

  • Configurable API endpoint and authentication

  • Max file size: 50MB/50 pages

Related Issues or Context

This PR contains Changes to Non-Plugin

  • Documentation
  • Other

This PR contains Changes to Non-LLM Models Plugin

  • I have Run Comprehensive Tests Relevant to My Changes

This PR contains Changes to LLM Models Plugin

  • My Changes Affect Message Flow Handling (System Messages and User→Assistant Turn-Taking)
  • My Changes Affect Tool Interaction Flow (Multi-Round Usage and Output Handling, for both Agent App and Agent Node)
  • My Changes Affect Multimodal Input Handling (Images, PDFs, Audio, Video, etc.)
  • My Changes Affect Multimodal Output Generation (Images, Audio, Video, etc.)
  • My Changes Affect Structured Output Format (JSON, XML, etc.)
  • My Changes Affect Token Consumption Metrics
  • My Changes Affect Other LLM Functionalities (Reasoning Process, Grounding, Prompt Caching, etc.)
  • Other Changes (Add New Models, Fix Model Parameters etc.)

Version Control (Any Changes to the Plugin Will Require Bumping the Version)

  • I have Bumped Up the Version in Manifest.yaml (Top-Level Version Field, Not in Meta Section)

Dify Plugin SDK Version

  • I have Ensured dify_plugin>=0.3.0,<0.6.0 is in requirements.txt (SDK docs)

Environment Verification (If Any Code Changes)

Local Deployment Environment

  • Dify Version is: , I have Tested My Changes on Local Deployment Dify with a Clean Environment That Matches the Production Configuration.

SaaS Environment

  • I have Tested My Changes on cloud.dify.ai with a Clean Environment That Matches the Production Configuration

@dosubot (bot) added the size:XL label ("This PR changes 500-999 lines, ignoring generated files.") on Jan 26, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello @Soul-Code, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates the Somark DocAI platform as a new plugin within Dify, significantly enhancing its document processing capabilities. Users can now convert diverse document types, such as PDFs and images, into structured Markdown or JSON outputs. This integration provides advanced document understanding and data extraction, facilitating the incorporation of document content into LLM training, RAG systems, and intelligent agents.

Highlights

  • New Somark Plugin Integration: This pull request introduces a new Dify plugin for Somark, a DocAI platform designed for advanced document processing.
  • Document AI Capabilities: The plugin enables the conversion of various document types, including PDFs and images, into structured Markdown or JSON formats. It leverages Somark's proprietary 'OXR' (Optical Everything Recognition) algorithm for precise content extraction.
  • Seamless Workflow Integration: A new 'Extract Document' tool is provided, allowing users to easily integrate Somark's document parsing capabilities into their Dify workflows. Configuration requires a Somark API Key for authentication.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@dosubot (bot) added the enhancement label ("New feature or request") on Jan 26, 2026
Contributor

@gemini-code-assist (bot) left a comment

Code Review

This pull request adds a new somark plugin, which is a great addition. The code is well-structured, but I've found a few issues that should be addressed before merging. These include a critical bug in API URL construction, missing credential validation, potential runtime errors, and several inconsistencies in metadata and documentation. Addressing these points will improve the plugin's robustness and user experience.

Comment on lines +7 to +8
en_US: https://somark.tech/api/v1/extract
zh_Hans: https://somark.tech/api/v1/extract

critical

The placeholder for base_url is https://somark.tech/api/v1/extract, which is misleading because the code appends /acc_sync to it. To prevent configuration errors, the placeholder should reflect the expected base path. Please update the placeholder to https://somark.tech/api/v1.

      en_US: https://somark.tech/api/v1
      zh_Hans: https://somark.tech/api/v1


# 2. Get the configuration
# Use the v1 API by default
base_url = self.runtime.credentials.get("base_url", "https://somark.tech/api/v1/extract")

critical

The default base_url is set to https://somark.tech/api/v1/extract, but the code on line 31 appends /acc_sync, resulting in an incorrect URL https://somark.tech/api/v1/extract/acc_sync. This will cause API requests to fail with the default configuration. The base URL should only contain the base path of the API. Please change the default value to https://somark.tech/api/v1. This also needs to be corrected in provider/somark.yaml.

Suggested change
- base_url = self.runtime.credentials.get("base_url", "https://somark.tech/api/v1/extract")
+ base_url = self.runtime.credentials.get("base_url", "https://somark.tech/api/v1")
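
To illustrate the failure described above, here is a minimal sketch of joining a configured base URL with the endpoint path. The helper name `build_endpoint` is hypothetical and not part of the PR; only the `/acc_sync` suffix and the two URLs come from this review.

```python
def build_endpoint(base_url: str, path: str = "acc_sync") -> str:
    """Join a base URL and an endpoint path with exactly one slash.

    Hypothetical helper for illustration; the PR builds the URL inline.
    """
    return f"{base_url.rstrip('/')}/{path.lstrip('/')}"


# Default proposed in this review: the base path only.
print(build_endpoint("https://somark.tech/api/v1"))
# → https://somark.tech/api/v1/acc_sync

# Default currently in the PR: the suffix is appended after /extract.
print(build_endpoint("https://somark.tech/api/v1/extract"))
# → https://somark.tech/api/v1/extract/acc_sync
```

Stripping slashes on both sides also keeps a user-supplied value with a trailing slash from producing a double slash in the request URL.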

Comment on lines +5 to +10
def _validate_credentials(self, credentials: dict[str, Any]) -> None:
    """
    Check whether the credentials are valid.
    """
    # No strict validation for now; accept everything.
    pass

high

The _validate_credentials method is currently a placeholder and does not perform any validation. This means users will only discover that their credentials are wrong when they try to use the tool, which is a poor user experience. It is a security and usability best practice to validate credentials when they are first configured. Please implement this method to make a lightweight API call to an endpoint (like a user status or ping endpoint) to verify that the api_key and base_url are valid.
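
A rough sketch of what such validation could look like, written as a standalone function so it runs without the SDK. Everything here is an assumption for illustration: `CredentialError` stands in for whatever exception the plugin SDK expects, and `probe` is a hypothetical callable that performs the lightweight request and returns the HTTP status code. This thread does not say which Somark endpoint is suitable for such a check.

```python
from typing import Any, Callable, Mapping


class CredentialError(Exception):
    """Stand-in for the SDK's credential validation error (assumption)."""


def validate_credentials(
    credentials: Mapping[str, Any],
    probe: Callable[[str, str], int],
) -> None:
    """Reject obviously bad credentials, then ask the API to confirm them.

    `probe(base_url, api_key)` is hypothetical: it should make one cheap
    authenticated request and return the HTTP status code.
    """
    api_key = str(credentials.get("api_key") or "").strip()
    base_url = str(credentials.get("base_url") or "https://somark.tech/api/v1").strip()

    if not api_key:
        raise CredentialError("api_key is required")
    if not base_url.startswith(("http://", "https://")):
        raise CredentialError("base_url must start with http:// or https://")

    status = probe(base_url.rstrip("/"), api_key)
    if status in (401, 403):
        raise CredentialError("the API rejected the provided api_key")
    if status >= 400:
        raise CredentialError(f"credential check failed with HTTP {status}")
```

Injecting `probe` keeps the local checks testable offline; inside the plugin, the same logic would live in `_validate_credentials` and raise the SDK's own error type.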

return

# 7. Handle the response
result = response.json()

high

The call response.json() will raise a json.JSONDecodeError if the API returns a non-JSON response (e.g., an HTML error page from a proxy or server error). This unhandled exception will be caught by the generic except Exception, but it's better to handle it specifically to provide a more informative error message. Please wrap the response.json() call in a try...except json.JSONDecodeError block.
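
One way to apply this is sketched below. It works on the raw response body so that it does not depend on a particular HTTP client. The names `ExtractionError` and `parse_json_body` are hypothetical and do not appear in the PR.

```python
import json
from typing import Any


class ExtractionError(Exception):
    """Hypothetical error type for a failed extraction call."""


def parse_json_body(body: str, status_code: int) -> Any:
    """Decode a response body, turning a non-JSON reply into a clear error."""
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        # Include a short excerpt so users can tell a proxy error page
        # apart from a malformed API payload.
        snippet = body[:200].strip()
        raise ExtractionError(
            f"Somark API returned a non-JSON response (HTTP {status_code}): {snippet!r}"
        ) from exc
```

In the tool, the caught error could then be surfaced as a text message instead of falling through to the generic `except Exception` branch.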

Comment on lines +3 to +11
author: soulcode
name: somark
label:
  en_US: Somark
  zh_Hans: Somark
description:
  en_US: SoMark is a DocAI that can convert diverse documents—such as PDFs, images, and more—into structured Markdown or JSON. It is designed to work seamlessly across all scenarios.
  zh_Hans: SoMark 是一款 DocAI 产品,能够将各种文档(如 PDF、图片等)转换为结构化的 Markdown 或 JSON 格式。它旨在无缝适用于所有场景。
icon: icon.png

medium

There are a couple of metadata inconsistencies across the plugin files:

  1. Missing Icon: icon.png is referenced on line 11 (and also in provider/somark.yaml), but the file is not included in the pull request. Please add the icon file.
  2. Inconsistent Author Name: The author is soulcode on line 3, but SoulCode in provider/somark.yaml and tools/extract.yaml. For consistency, please use the same name across all files. SoulCode is recommended.
author: SoulCode


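From basic layout segmentation and reading-order restoration to complex elements (such as tables, formulas, images, and even chemical symbols), every component is accurately extracted and reconstructed. The output is a complete, highly structured representation of the document.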

![](./_assets/chatu1.png)

medium

This README file contains references to images (e.g., chatu1.png on this line, and others on lines 127 and 137) located in an _assets directory. However, this directory and the images are not included in the pull request, which will lead to broken images in the documentation. Please add the _assets folder and its contents.
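
A check like this can be automated before pushing. The sketch below uses only the standard library and a hypothetical function name, `missing_images`. It scans a Markdown file for inline image references and lists local targets that do not exist on disk. It handles only the `![alt](path)` form seen in this README, not HTML `<img>` tags or reference-style links.

```python
import re
from pathlib import Path

# Matches inline Markdown images and captures the target path or URL.
IMAGE_REF = re.compile(r"!\[[^\]]*\]\(([^)\s]+)")


def missing_images(readme: Path) -> list[str]:
    """Return local image targets referenced by `readme` that are not files."""
    text = readme.read_text(encoding="utf-8")
    missing = []
    for target in IMAGE_REF.findall(text):
        if target.startswith(("http://", "https://")):
            continue  # remote images are out of scope for this check
        if not (readme.parent / target).is_file():
            missing.append(target)
    return missing
```

Calling `missing_images(Path("README.md"))` from the plugin directory would list every broken local reference in that file.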

def _invoke(self, tool_parameters: Dict[str, Any]) -> Generator[ToolInvokeMessage, None, None]:
    # 1. Get parameters
    file = tool_parameters.get("file")
    lang = "auto"

medium

The lang parameter is hardcoded to "auto". While this is a reasonable default, allowing users to specify the document's language can improve extraction accuracy. Please consider adding lang as an optional tool parameter in tools/extract.yaml and using its value here.

For example, in tools/extract.yaml:

- name: lang
  type: string
  required: false
  label:
    en_US: Language
    zh_Hans: 语言
  human_description:
    en_US: "Language of the document. Default is 'auto' for automatic detection."
    zh_Hans: "文档语言。默认为 'auto' 自动检测。"
  form: llm
  default: 'auto'

Then you can retrieve it here with lang = tool_parameters.get("lang", "auto").
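
A slightly more defensive variant is sketched below, assuming an optional parameter may arrive as `None` or as an empty string rather than being absent. The helper name `resolve_lang` is hypothetical, and this thread does not state which language codes the Somark API accepts, so the value is passed through unchanged apart from trimming whitespace.

```python
from typing import Any, Mapping


def resolve_lang(tool_parameters: Mapping[str, Any], default: str = "auto") -> str:
    """Return the requested language, falling back to automatic detection."""
    value = tool_parameters.get("lang")
    # Treat a missing key, None, non-strings, and blank strings the same way.
    if not isinstance(value, str) or not value.strip():
        return default
    return value.strip()


print(resolve_lang({}))                # → auto
print(resolve_lang({"lang": None}))    # → auto
print(resolve_lang({"lang": " en "}))  # → en
```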

human:
  en_US: Convert various document types—including PDFs, images, and more—into structured Markdown or JSON using the Somark API.
  zh_Hans: 使用 Somark API 将各种文档(如 PDF、图片等)转换为结构化的 Markdown 或 JSON 格式。
llm: A precise and reliable tool that utilizes the Somark API to convert documents (including PDFs, images, and DOC files) into clean, structured Markdown or JSON format, preserving the original layout and content hierarchy.

medium

The llm description states that the tool supports DOC files. However, the README.md file lists DOC and DOCX support as a future feature. This creates an inconsistency and could mislead users. Please update the description to accurately reflect the currently supported file types.

    llm: A precise and reliable tool that utilizes the Somark API to convert documents (including PDFs and images) into clean, structured Markdown or JSON format, preserving the original layout and content hierarchy.

Member

@crazywoola left a comment

See comments.

@@ -0,0 +1,31 @@
version: 0.0.1
type: plugin
author: soulcode

The author in this repo should be langgenius. If you want to keep your own author name, please submit a packaged plugin to https://github.com/langgenius/dify-plugins/pulls instead.
