Skip to content

A GUI Agent application based on UI-TARS(Vision-Lanuage Model) that allows you to control your computer using natural language.

License

Notifications You must be signed in to change notification settings

bytedance/UI-TARS-desktop

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

28 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

UI-TARS

UI-TARS Desktop

UI-TARS Desktop is a GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language.

   📑 Paper    | 🤗 Hugging Face Models   |
🖥️ Desktop Application    |    👓 Midscene (use in browser)

Showcases

Instruction Video
Get the current weather in SF using the web browser
new_mac_action_weather.mp4
Send a twitter with the content "hello world"
new_send_twitter_windows.mp4

Features

  • 🤖 Natural language control powered by Vision-Language Model
  • 🖥️ Screenshot and visual recognition support
  • 🎯 Precise mouse and keyboard control
  • 💻 Cross-platform support (Windows/MacOS)
  • 🔄 Real-time feedback and status display

Quick Start

Download

You can download the latest release version of UI-TARS Desktop from our releases page.

Install

MacOS

  1. Drag UI TARS application into the Applications folder

  1. Enable the permission of UI TARS in MacOS:
  • System Settings -> Privacy & Security -> Accessibility
  • System Settings -> Privacy & Security -> Screen Recording

  1. Then open UI TARS application, you can see the following interface:

Note: If app broken, you can use sudo xattr -dr com.apple.quarantine /Applications/UI\ TARS.app in Terminal to fix it.

Windows

Still to run the application, you can see the following interface:

Settings

VLM (Vision-Language Model)

Support HuggingFace(Cloud) and Ollama(Local) deployment.

We recommend using HuggingFace Inference Endpoints for fast deployment. We provide two docs for users to refer:

GUI Model Deployment Guide

Note: VLM Base Url is OpenAI compatible API endpoints (see OpenAI API protocol document for more details).

Development

Just simple two steps to run the application:

pnpm install
pnpm run dev

System Requirements

  • Node.js >= 20
  • Supported Operating Systems:
    • Windows 10/11
    • macOS 10.15+

License

UI-TARS Desktop is licensed under the Apache License 2.0.

Citation

If you find our paper and code useful in your research, please consider giving a star ⭐ and citation 📝

@article{uitars2025,
  author    = {Yujia Qin, Yining Ye, Junjie Fang, Haoming Wang, Shihao Liang, Shizuo Tian, Junda Zhang, Jiahao Li, Yunxin Li, Shijue Huang, Wanjun Zhong, Kuanye Li, Jiale Yang, Yu Miao, Woyu Lin, Longxiang Liu, Xu Jiang, Qianli Ma, Jingyu Li, Xiaojun Xiao, Kai Cai, Chuang Li, Yaowei Zheng, Chaolin Jin, Chen Li, Xiao Zhou, Minchao Wang, Haoli Chen, Zhaojian Li, Haihua Yang, Haifeng Liu, Feng Lin, Tao Peng, Xin Liu, Guang Shi},
  title     = {UI-TARS: Pioneering Automated GUI Interaction with Native Agents},
  journal   = {arXiv preprint arXiv:2501.12326},
  url       = {https://github.com/bytedance/UI-TARS},
  year      = {2025}
}