
Gemma3 Model Token Logits Mismatch with Different max_prefill_predict_length #2179

@babyplutokurt

Bug report

Description

When running prefill with the Gemma3 4B model, the output token logits differ depending on the max_prefill_predict_length value.
In contrast, other models (e.g., Qwen3, LLaMA 2, LLaMA 3) produce consistent logits regardless of the prefill length.

Steps to Reproduce

  1. Load the Gemma3 4B model.
  2. Use the same input prompt in both cases: "The capital of France is"
  3. Run inference with:
     • Case 1: max_prefill_predict_length = 1024, max_target_length = 2048
     • Case 2: max_prefill_predict_length = 4096, max_target_length = 8192

Observed Behavior (Gemma3 4B)

Case 1:
[null, {"The": -1.68888}, {" capital": -20.57046}, {" of": -0.15805}, {" France": -3.24782}, {" is": -1.67389}]
Full output: The capital of France is Paris.\n\nParis is a global center for art

Case 2:
[null, {"The": -1.68888}, {" capital": -20.30916}, {" of": -0.16284}, {" France": -3.24021}, {" is": -1.64998}]
Full output: The capital of France is Paris�Ar¡ 3<start_of_image>--- 3

Note: The logits for the very first token match, but the remaining tokens start to diverge. I strongly suspect this is related to the KV cache.
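
The size of the divergence can be checked directly from the two runs above. Below is a small sketch in plain Python. The scores are copied from Case 1 and Case 2 (the leading null entry, which carries no score, is left out). The `compare` helper and its 1e-6 tolerance are illustrative choices, not part of MaxText or any other library.

```python
# Per-token scores copied from Case 1 and Case 2 above (null entry omitted).
case1 = [("The", -1.68888), (" capital", -20.57046), (" of", -0.15805),
         (" France", -3.24782), (" is", -1.67389)]
case2 = [("The", -1.68888), (" capital", -20.30916), (" of", -0.16284),
         (" France", -3.24021), (" is", -1.64998)]

def compare(a, b, tol=1e-6):
    """Return per-token absolute differences and the index of the first mismatch."""
    diffs = []
    first_mismatch = None
    for i, ((tok_a, lp_a), (tok_b, lp_b)) in enumerate(zip(a, b)):
        # Both runs must score the same token at each position.
        assert tok_a == tok_b, f"token mismatch at position {i}"
        d = abs(lp_a - lp_b)
        diffs.append((tok_a, d))
        if first_mismatch is None and d > tol:
            first_mismatch = i
    return diffs, first_mismatch

diffs, first = compare(case1, case2)
for tok, d in diffs:
    print(f"{tok!r:>12}: {d:.5f}")
print("first mismatch at index", first)
```

With these values the first mismatch is reported at index 1 (" capital"). The largest gap, about 0.26, also falls on " capital".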

Note: These small logit differences accumulate and lead to large divergence in the generated output for longer sequences.
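
Greedy decoding is one way small score differences can snowball. When the top two candidates are close, a shift of a few hundredths can change which token is picked, and every later token is then conditioned on a different prefix. The toy sketch below assumes greedy (argmax) decoding. Its scores are made up for illustration and are not taken from the runs above. The 0.02 shift is, however, of the same order as the difference observed on " is" (about 0.024).

```python
def greedy_pick(scores):
    """Greedy decoding step: return the index of the highest-scoring token."""
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical next-token scores in which the top two candidates are close.
run_a = [-1.20, -1.21, -3.00]
run_b = [-1.22, -1.21, -3.00]  # only the first score moved, by 0.02

print(greedy_pick(run_a), greedy_pick(run_b))  # prints: 0 1
```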

Expected Behavior

Token logits for the same input prompt should remain identical across different max_prefill_predict_length values, as observed with Qwen3, LLaMA 2, and LLaMA 3 models.
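
The expected behavior is an invariance: sizing the prefill buffer differently should not change the result for the real prompt tokens. The sketch below shows that kind of check as a toy in plain Python. It uses scalar single-head causal attention over a fixed-size "cache" whose unused slots hold an arbitrary pad value. The functions, the cache layout, and the pad value are all hypothetical. This is not MaxText or Gemma3 code and does not model Gemma3's architecture.

```python
import math

def masked_attention(q, keys, values, mask):
    """Scalar dot-product attention; slots with mask[j] == False get zero weight."""
    # Masked slots get a score of -inf, so exp() maps them to exactly 0.0.
    scores = [q * k if keep else -math.inf for k, keep in zip(keys, mask)]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def toy_prefill(prompt, cache_len, pad_value=7.0):
    """Causal attention over a fixed-size toy cache holding `prompt`.

    Slots past len(prompt) hold an arbitrary `pad_value`, standing in for the
    unused region of a cache sized by the prefill length."""
    n = len(prompt)
    cache = prompt + [pad_value] * (cache_len - n)
    outputs = []
    for i in range(n):
        # Causal mask: attend only to slots at or before position i, so the
        # padded slots (which sit past the prompt) never receive weight.
        mask = [j <= i for j in range(cache_len)]
        outputs.append(masked_attention(prompt[i], cache, cache, mask))
    return outputs

prompt = [0.5, -1.0, 2.0, 0.25]
out_short = toy_prefill(prompt, cache_len=8)
out_long = toy_prefill(prompt, cache_len=16)
print(out_short == out_long)  # prints: True
```

Changing `pad_value` or `cache_len` should never change the outputs for the prompt positions. Applying the same comparison to real logits from the two configurations is the check the expected behavior describes.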

Additional Information

  • Affected Model: Gemma3 4B
  • Verified Models (working as expected): Qwen3, LLaMA 2, LLaMA 3

Logs/Output

No response

Environment Information

No response

Additional Context

No response

Labels: bug (Something isn't working)
