Skip to content

[BFCL] [Bug] the accuracy of multi-turn is lower than bfcl leardboard when using Qwen3 local model inference #1147

@Yayalelelelele

Description

@Yayalelelelele

Describe the issue
When using local Qwen-3(FC) model for inference, the multi-turn acc is lower than bfcl leardboard, while the live and non-live acc is normal.
Same issue as #1145 #1109

local model inference with Qwen3-8b-FC
Image

bfcl leardboard

Image

Reason
When inference, the prompt is not align with the training stage. the code of gorilla/berkeley-function-call-leaderboard/bfcl_eval/model_handler/local_inference/qwen_fc.py lost the content of tool call in the next round
Image

Image

Solutions
update the code of qwen_fc.py

Image

After update, the acc of multi-turn can match with the leardboard

Image

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions