v0.2.0
Release Notes
RAGFlow Integration
This release integrates RAGFlow updates from v0.21.1 to v0.22.1, bringing the following improvements:
From RAGFlow v0.22.1:
- Agent: Supports exporting Agent outputs in Word or Markdown formats
- Agent: Adds a List operations component
- Agent: Adds a Variable aggregator component
- Data sources: Supports S3-compatible data sources, e.g., MinIO
- Data sources: Adds data synchronization with JIRA
- Continues the redesign of the Profile page layouts
- Upgrades the Flask web framework from synchronous to asynchronous, increasing concurrency and preventing requests from blocking while they wait on upstream LLM services
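
To illustrate why the move from synchronous to asynchronous request handling helps with slow upstream LLM calls, here is a minimal sketch using only Python's standard `asyncio`. It is not RAGFlow's code: `call_upstream_llm`, `handle_requests`, and `run_demo` are hypothetical names, and the upstream call is simulated with a sleep rather than a real network request.

```python
import asyncio
import time


async def call_upstream_llm(prompt: str, delay: float = 0.2) -> str:
    # Hypothetical stand-in for a slow upstream LLM request (no real network I/O).
    await asyncio.sleep(delay)
    return f"answer to {prompt!r}"


async def handle_requests(prompts: list[str]) -> list[str]:
    # While one request awaits the upstream service, the event loop
    # is free to make progress on the others.
    return await asyncio.gather(*(call_upstream_llm(p) for p in prompts))


def run_demo() -> tuple[list[str], float]:
    # Returns the answers and the wall-clock time taken to serve all prompts.
    start = time.perf_counter()
    answers = asyncio.run(handle_requests(["a", "b", "c"]))
    return answers, time.perf_counter() - start


if __name__ == "__main__":
    answers, elapsed = run_demo()
    print(answers, f"{elapsed:.2f}s")
```

In this sketch the three simulated requests overlap, so the total time stays close to one request's delay instead of the sum of all three, which is what a blocking handler would take.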
From RAGFlow v0.22.0:
- Dataset: Supports data synchronization from five online sources (AWS S3, Google Drive, Notion, Confluence, and Discord)
- Dataset: RAPTOR can be built across an entire dataset or on individual documents
- Ingestion pipeline: Supports Docling document parsing in the Parser component
- Launches a new administrative Web UI dashboard for graphical user management and service status monitoring
- Agent: Supports structured output
- Agent: Supports metadata filtering in the Retrieval component
- Agent: Introduces a Variable aggregator component with data operation and session variable definition capabilities
- Upgrades RAGFlow's document engine Infinity to v0.6.5
New Features
- Optimized Gotenberg Functions (#7)
  - Enhanced document conversion capabilities
- OceanBase Docker Configuration (#4)
  - Updated OceanBase docker configuration for better deployment
- Enhanced Search Performance (#17)
  - Improved search functionality for better performance
Improvements
- Refactored Title and Regex Based Chunk Method (#16)
  - Improved chunking logic for better document processing
- Updated Merging Logic in split_with_title_chunks (#8)
  - Enhanced chunk merging algorithm
- Simplified String Escaping (#13)
  - Refactored string escaping in get_value_str and OBConnection for better maintainability
- Docker Configuration and Documentation (#12)
  - Updated docker configurations and README
- Build Workflow (#9)
  - Added workflow to build dev docker image
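
For readers unfamiliar with title- and regex-based chunking, the general idea behind the chunking items above can be sketched as follows. This is an illustrative assumption, not the project's implementation: `split_by_titles`, `merge_small_chunks`, the default heading pattern, and the `min_chars` threshold are all hypothetical and do not reflect the actual behavior of split_with_title_chunks.

```python
import re

# Hypothetical default: treat Markdown-style headings as titles.
DEFAULT_TITLE_PATTERN = r"^#{1,6}\s+\S.*$"


def split_by_titles(text: str, title_pattern: str = DEFAULT_TITLE_PATTERN) -> list[str]:
    """Start a new chunk at every line that matches the title regex."""
    title_re = re.compile(title_pattern)
    groups: list[list[str]] = []
    for line in text.splitlines():
        if title_re.match(line) or not groups:
            groups.append([line])
        else:
            groups[-1].append(line)
    chunks = ["\n".join(g).strip() for g in groups]
    return [c for c in chunks if c]


def merge_small_chunks(chunks: list[str], min_chars: int = 40) -> list[str]:
    """Fold chunks shorter than min_chars into the chunk that follows them."""
    merged: list[str] = []
    pending = ""
    for chunk in chunks:
        candidate = f"{pending}\n{chunk}" if pending else chunk
        if len(candidate) < min_chars:
            pending = candidate  # still too small, keep accumulating
        else:
            merged.append(candidate)
            pending = ""
    if pending:
        # A trailing small chunk has no successor, so attach it to the previous one.
        if merged:
            merged[-1] = f"{merged[-1]}\n{pending}"
        else:
            merged.append(pending)
    return merged
```

Merging after splitting keeps very short sections (a heading followed by a single line, for example) from becoming standalone chunks with little retrievable context.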
Bug Fixes
- Fixed PowerRAG Server Timeout Error (#5)
  - Resolved timeout issues in PowerRAG server
- Fixed Image Source Lost in Smart Chunks (#11)
  - Fixed issue where image sources were lost during smart chunking
- Fixed Security Alerts and Chunk Saved Error (#19)
  - Resolved security issues and chunk saving errors
Contributors
Thanks to all contributors who made this release possible:
Full Changelog: v0.1.0...v0.2.0