Bug report
Gemma3 Model Token Logits Mismatch with Different max_prefill_predict_length
Description
When running prefill with the Gemma3 4B model, the output token logits differ depending on the max_prefill_predict_length value.
In contrast, other models (e.g., Qwen3, LLaMA 2, LLaMA 3) produce consistent logits regardless of the prefill length.
Steps to Reproduce
- Load the Gemma3 4B model.
- Use the same input prompt: The capital of France is
- Run inference with the following two configurations (a reproduction skeleton follows this list):
- Case 1: max_prefill_predict_length = 1024, max_target_length = 2048
- Case 2: max_prefill_predict_length = 4096, max_target_length = 8192
Observed Behavior (Gemma3 4B)
Case 1:
[null, {"The": -1.68888}, {" capital": -20.57046}, {" of": -0.15805}, {" France": -3.24782}, {" is": -1.67389}]
Full output: The capital of France is Paris.\n\nParis is a global center for art
Case 2:
[null, {"The": -1.68888}, {" capital": -20.30916}, {" of": -0.16284}, {" France": -3.24021}, {" is": -1.64998}]
Full output: The capital of France is Paris�Ar¡ 3<start_of_image>--- 3
Note: The logits for the very first token match exactly; the remaining tokens start to diverge. This is strongly suspected to be related to the KV cache.
Note: These small logit differences accumulate and lead to a large divergence in the generated output for longer sequences (quantified in the snippet below).
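To make the note above concrete, the per-token differences can be computed directly from the log-probabilities reported in the two cases (NumPy only):

```python
import numpy as np

# Log-probabilities copied verbatim from Case 1 and Case 2 above
# (tokens: "The", " capital", " of", " France", " is").
case1 = np.array([-1.68888, -20.57046, -0.15805, -3.24782, -1.67389])
case2 = np.array([-1.68888, -20.30916, -0.16284, -3.24021, -1.64998])

diff = np.abs(case1 - case2)
print("per-token abs diffs:", diff)  # first token: exactly 0.0
print("max abs diff:", diff.max())   # ~0.2613 on " capital"
```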
Expected Behavior
Token logits for the same input prompt should remain identical across different max_prefill_predict_length values, as observed with the Qwen3, LLaMA 2, and LLaMA 3 models.
Additional Information
- Affected Model: Gemma3 4B
- Verified Models (working as expected): Qwen3, LLaMA 2, LLaMA 3
Logs/Output
No response
Environment Information
No response
Additional Context
No response