[BFCL] [Bug] the accuracy of multi-turn is lower than bfcl leardboard when using Qwen3  local model inference

**Describe the issue**
When using local Qwen-3(FC) model for inference, the multi-turn acc is lower than bfcl leardboard, while the live and non-live acc is normal.
Same issue as #1145 #1109 

local model inference with Qwen3-8b-FC
<img width="1402" height="138" alt="Image" src="https://github.com/user-attachments/assets/b7ffe36b-8136-469c-90a2-722e5947efea" />

bfcl leardboard

<img width="2142" height="904" alt="Image" src="https://github.com/user-attachments/assets/d7a18f8a-1e16-487b-8124-6347ca5826d2" />

**Reason**
When inference, the prompt is not align with the training stage. the code of gorilla/berkeley-function-call-leaderboard/bfcl_eval/model_handler/local_inference/qwen_fc.py  lost the content of tool call in the next round
<img width="1382" height="812" alt="Image" src="https://github.com/user-attachments/assets/f6adcb0d-c72a-49f0-8133-2ee2a96c4b0f" />

<img width="1902" height="1072" alt="Image" src="https://github.com/user-attachments/assets/dfba9765-5f21-4bf1-8e9e-085ab69f5f19" />


**Solutions**
update the code  of qwen_fc.py

<img width="1770" height="472" alt="Image" src="https://github.com/user-attachments/assets/4b9187ab-e5cc-4bf0-adc4-5306b3a3589a" />

After update, the acc of multi-turn can match with the leardboard

<img width="1262" height="198" alt="Image" src="https://github.com/user-attachments/assets/cf03a3c8-a77c-424c-98f4-2a6bcd001d93" />

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[BFCL] [Bug] the accuracy of multi-turn is lower than bfcl leardboard when using Qwen3 local model inference #1147

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

[BFCL] [Bug] the accuracy of multi-turn is lower than bfcl leardboard when using Qwen3 local model inference #1147

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions