
UI-S1 open-source evaluation dataset: annotated click coordinates are not on the icon #241

@YokiDia

[Figure 1] [Figure 2] (screenshots with the click point marked by a red dot)

This is the second test record in https://github.com/X-PLUG/MobileAgent/blob/main/UI-S1/datasets/android_control_evaluation_std.jsonl:
{"goal": "I'd like to add this item to my cart.", "is_successful": true, "steps": [{"action_content": {"action": "click", "coordinate": [542, 2226]}, "screenshot": "/datasets/AndroidControl/images/8197_0.png", "step_instruction": "click on the Add to cart button", "check_options": {"action": "click", "candidate_bbox": [[0, 262, 1084, 2339]], "coordinate": [542, 2226]}}, {"action_content": {"action": "wait", "time": 2}, "screenshot": "/datasets/AndroidControl/images/8197_1.png", "step_instruction": "click on the Add to cart button", "check_options": {"action": "wait", "time": 2}}, {"action_content": {"action": "wait", "time": 2}, "screenshot": "/datasets/AndroidControl/images/8197_2.png", "step_instruction": "click on the Add to cart button", "check_options": {"action": "wait", "time": 2}}], "episode_id": 8197}
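The click coordinate for the action {"action": "click", "coordinate": [542, 2226]} is not on the icon (see the red dot in Figure 1).
In contrast, the open-source UI-S1 7B model outputs {"action": "click", "coordinate": [795, 2261]}, whose click coordinate is at the center of the icon (see the red dot in Figure 2).
Is this a problem with the test dataset, or with something else?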
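As a quick sanity check on the annotation itself, the sketch below parses the first step of the record quoted above and tests whether the annotated point falls inside `candidate_bbox`. It assumes the bbox layout is `[x_min, y_min, x_max, y_max]` in original-image pixels; the dataset does not state this, and `point_in_bbox` is a hypothetical helper, not a function from the repository.

```python
import json

# First step of episode 8197, trimmed to the fields used here.
step_json = (
    '{"action_content": {"action": "click", "coordinate": [542, 2226]}, '
    '"check_options": {"action": "click", '
    '"candidate_bbox": [[0, 262, 1084, 2339]], "coordinate": [542, 2226]}}'
)

def point_in_bbox(point, bbox):
    """Hypothetical helper. Assumes bbox = [x_min, y_min, x_max, y_max]."""
    x, y = point
    x_min, y_min, x_max, y_max = bbox
    return x_min <= x <= x_max and y_min <= y <= y_max

opts = json.loads(step_json)["check_options"]
bbox = opts["candidate_bbox"][0]

# Is the annotated ground-truth point inside the candidate bbox?
gt_in_bbox = point_in_bbox(opts["coordinate"], bbox)

# How much of the 1080x2400 screenshot does the bbox cover?
bbox_area = (bbox[2] - bbox[0]) * (bbox[3] - bbox[1])
screen_ratio = bbox_area / (1080 * 2400)

print(gt_in_bbox, round(screen_ratio, 3))
```

Under that layout assumption the bbox spans most of the screen, so it may describe a container rather than the Add to cart button itself.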

When running the evaluation with python /evaluation/eval_qwenvl.py --model_name UI-S1-7B, the image is resized.
Comparison before and after the resize:
current_check_pam:{'action': 'click', 'coordinate': [0.5018518518518519, 0.9275], 'candidate_bbox': []}
Original image: (width, height) = (1080, 2400)
Annotated data: {"action": "click", "coordinate": [542, 2226]}

pred_action:'candidate_bbox': []} {'action': 'click', 'coordinate': [0.728021978021978, 0.938953488372093]}
Resized image: (resized_width, resized_height) = (1092, 2408)
Eval output: {"action": "click", "coordinate": [795, 2261]}
def check_click(click, candidate_bbox, gt_point) returned false, so the step for 8197_0.png was judged as not extract_match.
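The normalized values in the logs can be reproduced by dividing each pixel coordinate by the size of the image it refers to. The sketch below does that and measures the distance between the two normalized points. The `HYPOTHETICAL_THRESHOLD` value is an assumption for illustration only; the actual rule inside `check_click` is not shown in this issue.

```python
import math

def normalize(point, size):
    """Map a pixel coordinate to [0, 1] using the (width, height) of its image."""
    x, y = point
    width, height = size
    return (x / width, y / height)

# Annotation is expressed in original-image pixels (1080 x 2400).
gt = normalize((542, 2226), (1080, 2400))
# Model output is expressed in resized-image pixels (1092 x 2408).
pred = normalize((795, 2261), (1092, 2408))

# Euclidean distance in normalized coordinates.
distance = math.dist(gt, pred)

# Assumed threshold, NOT taken from the repository.
HYPOTHETICAL_THRESHOLD = 0.14
within_threshold = distance <= HYPOTHETICAL_THRESHOLD

print(gt, pred, round(distance, 4), within_threshold)
```

Almost all of the gap is horizontal: the annotated point sits near the horizontal center of the screen, while the prediction sits well to the right of it.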
