fix: enforce embedding model max token limit via proper truncation #13178
Draft
timon0305 wants to merge 1 commit into infiniflow:main from
Summary
Fixes the embedding model's max token limit not being enforced, which caused API errors when input text exceeded the model's context window. This was reported across multiple providers (Ollama, vLLM, HuggingFace TEI) with models such as bge-large (512 tokens) and multilingual-e5-large-instruct (512 tokens).
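To see why character slicing is not enough, here is a hedged, standalone illustration of the failure mode. It uses `tiktoken` purely as a stand-in tokenizer and is not RAGFlow code; exact counts differ per tokenizer, but the effect is the same whenever one character can map to more than one token.

```python
import tiktoken

# Stand-in tokenizer for the illustration; the target embedding model's
# own tokenizer will produce different (but similarly inflated) counts.
enc = tiktoken.get_encoding("cl100k_base")

text = "自然言語処理のためのベクトル表現" * 50  # multi-byte (Japanese) input
max_length = 512                                 # the model limit, in TOKENS

sliced = text[:max_length]       # pre-fix behavior: slice by CHARACTER index
print(len(sliced))               # 512 characters...
print(len(enc.encode(sliced)))   # ...but typically well over 512 tokens

fixed = enc.decode(enc.encode(text)[:max_length])  # token-level truncation
print(len(enc.encode(fixed)))    # stays at (or just under) the 512-token limit
```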
Root causes fixed
1. `LLMBundle.encode()` truncated by character count instead of token count: `text[:target_len]` slices by character index, but `max_length` is in tokens, so multi-byte text (Chinese/Japanese) could still exceed the token limit. Fixed to use `truncate()`, which properly tokenizes and truncates at the token level.
2. `LLMBundle.encode_queries()` had no truncation at all: user search queries were passed directly to the embedding API without any length check. Added token-level truncation using the configured `max_length`.
3. Twelve embedding provider classes lacked input truncation: added `truncate()` calls to both `encode()` and `encode_queries()` for `OllamaEmbed`, `XinferenceEmbed`, `LocalAIEmbed`, `HuggingFaceEmbed`, `CoHereEmbed`, `SILICONFLOWEmbed`, `ReplicateEmbed`, `BaiduYiyanEmbed`, `VoyageEmbed`, `VolcEngineEmbed`, `ZhipuEmbed` (`encode_queries`), and `QWenEmbed` (`encode_queries` character slice → token truncation).
4. Misleading tooltip for the max_tokens setting: the UI tooltip said "Defaults to 512", but the actual default is 8192. Updated the en/zh locales to describe the field as the maximum input context length, with the correct default.

A sketch of the fixed pattern appears after this list.
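A minimal sketch of the first two fixes, trimmed to the two code paths the PR touches. The local `truncate()` stands in for RAGFlow's token-level helper, and `mdl` / `max_length` are stand-ins for the real class's state, so treat this as a pattern illustration rather than the actual implementation.

```python
import tiktoken

_enc = tiktoken.get_encoding("cl100k_base")

def truncate(text: str, max_len: int) -> str:
    # Stand-in for RAGFlow's token-level truncate() helper: encode,
    # cut the token sequence at max_len, decode back to a string.
    return _enc.decode(_enc.encode(text)[:max_len])

class LLMBundle:
    def __init__(self, mdl, max_length: int = 8192):
        self.mdl = mdl               # the wrapped provider client
        self.max_length = max_length # the configured token limit

    def encode(self, texts: list[str]):
        # Before the fix: texts = [t[:self.max_length] for t in texts]
        # (a character slice, so multi-byte text could still blow the limit)
        texts = [truncate(t, self.max_length) for t in texts]
        return self.mdl.encode(texts)

    def encode_queries(self, query: str):
        # Before the fix: the query went through with no length check at all
        return self.mdl.encode_queries(truncate(query, self.max_length))
```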
Changes
- `api/db/services/llm_service.py`: fixed `LLMBundle.encode()` to use token-level `truncate()` instead of character slicing; added truncation to `LLMBundle.encode_queries()`
- `rag/llm/embedding_model.py`: added `truncate()` to 12 provider classes missing input truncation (provider-side sketch below)
- `web/src/locales/en.ts`: fixed the maxTokensTip tooltip text and default value
- `web/src/locales/zh.ts`: fixed the maxTokensTip tooltip text and default value (Chinese)

Closes #4683
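The provider-side change applies the same guard at each client boundary. Below is a hedged sketch for one of the twelve classes, using `OllamaEmbed` as the example; the real class in `rag/llm/embedding_model.py` handles batching and configuration differently, and the `ollama` client usage here is an assumption.

```python
import tiktoken
from ollama import Client  # assumes the official `ollama` Python package

_enc = tiktoken.get_encoding("cl100k_base")

def truncate(text: str, max_len: int) -> str:
    # Stand-in for RAGFlow's token-level truncate() helper.
    return _enc.decode(_enc.encode(text)[:max_len])

class OllamaEmbed:
    # Sketch only: shows where the added truncate() guards sit,
    # not the real implementation.
    def __init__(self, model_name: str, max_length: int = 512):
        self.client = Client()
        self.model_name = model_name
        self.max_length = max_length

    def encode(self, texts: list[str]):
        texts = [truncate(t, self.max_length) for t in texts]  # added guard
        res = self.client.embed(model=self.model_name, input=texts)
        return res["embeddings"]

    def encode_queries(self, text: str):
        # added guard on the query path as well
        res = self.client.embed(model=self.model_name,
                                input=[truncate(text, self.max_length)])
        return res["embeddings"][0]
```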