
fix: enforce embedding model max token limit via proper truncation #13178

Draft
timon0305 wants to merge 1 commit into infiniflow:main from timon0305:fix-embedding-max-token-limit


@timon0305

Summary

Fixes the embedding model's max token limit not being enforced, which caused API errors whenever input text exceeded the model's context window. This was reported across multiple providers (Ollama, vLLM, HuggingFace TEI) with models such as bge-large (512 tokens) and multilingual-e5-large-instruct (512 tokens).

Root causes fixed

  1. LLMBundle.encode() truncated by character count instead of token count: text[:target_len] slices by character index, but max_length is measured in tokens. For multi-byte text (e.g., Chinese or Japanese), a single character can map to one or more tokens, so the sliced text can still exceed the token limit. Fixed by using truncate(), which tokenizes the input and truncates at the token level.

  2. LLMBundle.encode_queries() had no truncation at all: user search queries were passed directly to the embedding API without any length check. Added token-level truncation using the configured max_length.

  3. 12 embedding provider classes lacked input truncation: added truncate() calls to both encode() and encode_queries() for OllamaEmbed, XinferenceEmbed, LocalAIEmbed, HuggingFaceEmbed, CoHereEmbed, SILICONFLOWEmbed, ReplicateEmbed, BaiduYiyanEmbed, VoyageEmbed, and VolcEngineEmbed, and to encode_queries() only for ZhipuEmbed and QWenEmbed (for QWenEmbed, replacing the existing character slice with token-level truncation).

  4. Misleading tooltip for the max_tokens setting: the UI tooltip said "Defaults to 512", but the actual default is 8192. Updated the en/zh locales to describe the field accurately as the "maximum input context length" and to state the correct default.
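The difference between root cause 1's character slicing and token-level truncation can be seen in a minimal sketch. It assumes a toy byte-level tokenizer in which every UTF-8 byte counts as one token; tokenize, detokenize, truncate_by_chars, and truncate_by_tokens are hypothetical names for illustration, not RAGFlow's truncate() or its actual tokenizer.

```python
# Hypothetical byte-level tokenizer, for illustration only: every UTF-8 byte
# counts as one token. Real embedding tokenizers are subword-based, but the
# point carries over: one character is not always one token.


def tokenize(text: str) -> list[int]:
    """Toy tokenizer: one token per UTF-8 byte."""
    return list(text.encode("utf-8"))


def detokenize(tokens: list[int]) -> str:
    """Inverse of tokenize(); silently drops a trailing partial character."""
    return bytes(tokens).decode("utf-8", errors="ignore")


def truncate_by_chars(text: str, max_length: int) -> str:
    """Old approach: slices by character index, ignoring token counts."""
    return text[:max_length]


def truncate_by_tokens(text: str, max_length: int) -> str:
    """Token-level approach: tokenize, keep the first max_length tokens, decode."""
    return detokenize(tokenize(text)[:max_length])


max_length = 512
text = "中文" * 300  # 600 CJK characters, 3 toy tokens (UTF-8 bytes) each

print(len(tokenize(truncate_by_chars(text, max_length))))   # 1536, over the limit
print(len(tokenize(truncate_by_tokens(text, max_length))))  # 510, within the limit
```

In this toy, decoding with errors="ignore" drops the partial character left when the cut lands mid-character, so the result can come out slightly under max_length tokens but never over it.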

Changes

  • api/db/services/llm_service.py — Fixed LLMBundle.encode() to use token-level truncate() instead of character slicing; added truncation to LLMBundle.encode_queries()
  • rag/llm/embedding_model.py — Added truncate() to 12 provider classes missing input truncation
  • web/src/locales/en.ts — Fixed maxTokensTip tooltip text and default value
  • web/src/locales/zh.ts — Fixed maxTokensTip tooltip text and default value (Chinese)
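The provider-level change and the encode_queries() fix share one pattern: truncate every input before the request leaves the client. A minimal sketch, assuming the same toy byte-level tokenizer and a stubbed API call; HypotheticalEmbed, _call_api, and the simplified return types are illustrative and do not mirror RAGFlow's actual classes, and how each real provider obtains its limit is not specified here.

```python
# Hypothetical provider class illustrating the pattern: every text is truncated
# to max_length tokens before the (stubbed) API call, in both encode() and
# encode_queries(). Names and return types are simplified for illustration.


def tokenize(text: str) -> list[int]:
    """Toy tokenizer: one token per UTF-8 byte (illustration only)."""
    return list(text.encode("utf-8"))


def truncate(text: str, max_length: int) -> str:
    """Keep at most max_length toy tokens; drop any trailing partial character."""
    return bytes(tokenize(text)[:max_length]).decode("utf-8", errors="ignore")


class HypotheticalEmbed:
    def __init__(self, max_length: int = 512) -> None:
        self.max_length = max_length
        self.sent: list[str] = []  # records exactly what reached the "API"

    def _call_api(self, texts: list[str]) -> list[list[float]]:
        # Stand-in for the provider's HTTP request: returns a dummy vector per text.
        self.sent.extend(texts)
        return [[float(len(t))] for t in texts]

    def encode(self, texts: list[str]) -> list[list[float]]:
        # Document chunks: truncate each one before sending.
        return self._call_api([truncate(t, self.max_length) for t in texts])

    def encode_queries(self, text: str) -> list[float]:
        # Search queries: same token-level truncation.
        return self._call_api([truncate(text, self.max_length)])[0]


model = HypotheticalEmbed(max_length=512)
model.encode(["a" * 1000, "短"])
model.encode_queries("中" * 1000)
print([len(tokenize(t)) for t in model.sent])  # [512, 3, 510]
```

Truncating inside the client rather than relying on the server keeps behavior consistent across providers, since some backends reject over-long inputs instead of truncating them.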

Closes #4683

@dosubot (bot) added the size:M label (This PR changes 30-99 lines, ignoring generated files.) on Feb 22, 2026
@dosubot (bot) commented on Feb 22, 2026

Related Documentation

Checked 0 published document(s) in 1 knowledge base(s). No updates required.


@dosubot (bot) added the 🌈 python, 🐞 bug, and 🧰 typescript labels on Feb 22, 2026
@timon0305 marked this pull request as draft on February 23, 2026 14:34
@KevinHuSh added the wip label on Feb 24, 2026
