v0.2.0

@whhe released this 16 Dec 04:00
· 448 commits to main since this release
575d9a2

Release Notes

v0.2.0

RAGFlow Integration

This release integrates RAGFlow updates from v0.21.1 to v0.22.1, bringing the following improvements:

From RAGFlow v0.22.1:

  • Agent: Supports exporting Agent outputs in Word or Markdown formats
  • Agent: Adds a List operations component
  • Agent: Adds a Variable aggregator component
  • Data sources: Supports S3-compatible data sources, e.g., MinIO
  • Data sources: Adds data synchronization with JIRA
  • Continues the redesign of the Profile page layouts
  • Upgrades the Flask web framework from synchronous to asynchronous request handling, which increases concurrency and prevents requests from blocking while they wait on upstream LLM services
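
The concurrency benefit of the synchronous-to-asynchronous change can be illustrated with a minimal sketch. It uses only Python's standard asyncio module and is not RAGFlow's or Flask's code. The function call_upstream_llm and its 0.1-second delay are hypothetical stand-ins for a slow upstream LLM request.

```python
import asyncio
import time


async def call_upstream_llm(prompt: str, delay: float = 0.1) -> str:
    """Hypothetical stand-in for a slow upstream LLM request."""
    # Awaiting yields control to the event loop, so other requests can proceed.
    await asyncio.sleep(delay)
    return f"answer to {prompt}"


async def handle_requests(prompts: list[str]) -> list[str]:
    # Each simulated request awaits its own upstream call; the waits overlap.
    return await asyncio.gather(*(call_upstream_llm(p) for p in prompts))


def run_demo() -> tuple[list[str], float]:
    """Run three simulated requests and report the total wall-clock time."""
    start = time.perf_counter()
    answers = asyncio.run(handle_requests(["a", "b", "c"]))
    return answers, time.perf_counter() - start
```

In this sketch the three simulated calls finish in roughly the time of one call, because their waits overlap. A synchronous handler using time.sleep would take about three times as long.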

From RAGFlow v0.22.0:

  • Dataset: Supports data synchronization from five online sources (AWS S3, Google Drive, Notion, Confluence, and Discord)
  • Dataset: RAPTOR can be built across an entire dataset or on individual documents
  • Ingestion pipeline: Supports Docling document parsing in the Parser component
  • Launches a new administrative Web UI dashboard for graphical user management and service status monitoring
  • Agent: Supports structured output
  • Agent: Supports metadata filtering in the Retrieval component
  • Agent: Introduces a Variable aggregator component with data operation and session variable definition capabilities
  • Upgrades RAGFlow's document engine Infinity to v0.6.5

New Features

  • Optimized Gotenberg Functions (#7)
    • Enhanced document conversion capabilities
  • OceanBase Docker Configuration (#4)
    • Updated OceanBase docker configuration for better deployment
  • Enhanced Search Performance (#17)
    • Improved search functionality for better performance

Improvements

  • Refactored Title and Regex Based Chunk Method (#16)
    • Improved chunking logic for better document processing
  • Updated Merging Logic in split_with_title_chunks (#8)
    • Enhanced chunk merging algorithm
  • Simplified String Escaping (#13)
    • Refactored string escaping in get_value_str and OBConnection for better maintainability
  • Docker Configuration and Documentation (#12)
    • Updated docker configurations and README
  • Build Workflow (#9)
    • Added workflow to build dev docker image
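
For readers unfamiliar with title-based chunking and chunk merging (#16 and #8 above), the general idea can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation. The names split_by_titles and merge_small_chunks, the Markdown-heading pattern, and the min_chars threshold are all hypothetical.

```python
import re

# Assumption: titles are Markdown-style headings such as "# Intro" or "## Scope".
TITLE_RE = re.compile(r"^#{1,6}\s+\S")


def split_by_titles(text: str) -> list[str]:
    """Start a new chunk at every line that matches the title pattern."""
    chunks, current = [], []
    for line in text.splitlines():
        if TITLE_RE.match(line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [c for c in chunks if c]


def merge_small_chunks(chunks: list[str], min_chars: int = 40) -> list[str]:
    """Fold the next chunk into the previous one while the previous is too short."""
    merged: list[str] = []
    for chunk in chunks:
        if merged and len(merged[-1]) < min_chars:
            merged[-1] = merged[-1] + "\n" + chunk
        else:
            merged.append(chunk)
    return merged
```

A short section is merged forward into the section that follows it, so very small chunks do not end up indexed on their own. A short final chunk is left as is in this sketch.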

Bug Fixes

  • Fixed PowerRAG Server Timeout Error (#5)
    • Resolved timeout issues in PowerRAG server
  • Fixed Image Source Lost in Smart Chunks (#11)
    • Fixed issue where image sources were lost during smart chunking
  • Fixed Security Alerts and Chunk Saved Error (#19)
    • Resolved security issues and chunk saving errors

Contributors

Thanks to all contributors who made this release possible.

Full Changelog: v0.1.0...v0.2.0

