repo owner: Danni (Danqing) Zhang ([email protected])
- [2024-10-01] Completed a major refactoring of LiteWebAgent so that it can be imported as a package, which makes it possible to add web browsing capabilities to any AI agent.
- [2024-09-20] We reimplemented the paper Tree Search for Language Model Agents in the LiteWebAgent framework. The search agent can now explore different trajectories for a web browsing task and return the most promising one. This is useful for finding the best path through complex web browsing tasks offline.
- [2024-08-22] The initial version of LiteWebAgent was released, providing a robust framework for using natural language to control a web agent.
From PyPI: https://pypi.org/project/litewebagent/
pip install litewebagent
Then set up playwright (this step is required) by running
playwright install chromium
Test playwright & chromium installation by running this script
python test_installation.py
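For readers who want to see what such a check involves, here is a minimal sketch using Playwright's synchronous Python API. It is not the contents of test_installation.py; the function name `check_chromium` is hypothetical. It launches headless Chromium, loads an inline page (so no network is needed), and reports failure instead of raising.

```python
from typing import Tuple


def check_chromium() -> Tuple[bool, str]:
    """Launch headless Chromium via Playwright and load a tiny inline page."""
    try:
        # Imported lazily so a missing package is reported instead of crashing.
        from playwright.sync_api import sync_playwright
    except ImportError:
        return False, "the playwright package is not installed"

    try:
        with sync_playwright() as p:
            browser = p.chromium.launch(headless=True)
            try:
                page = browser.new_page()
                # Inline HTML, so the check does not depend on network access.
                page.set_content("<title>playwright ok</title>")
                title = page.title()
            finally:
                browser.close()
    except Exception as exc:  # e.g. the Chromium binary was never downloaded
        return False, f"could not launch Chromium: {exc}"

    return title == "playwright ok", f"page title: {title!r}"
```

If the first element of the result is False, the message usually indicates whether the package or the browser binary is missing.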
Then please create a .env file, and update your API keys:
cp .env.example .env
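Before running an agent it can help to confirm that the keys in .env were actually filled in. The sketch below parses simple KEY=VALUE lines with the standard library and lists the keys that are absent or empty. The helper names and the key name `EXAMPLE_API_KEY` are hypothetical; check .env.example for the keys the project really expects.

```python
from pathlib import Path
from typing import Dict, List


def read_env_file(path: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: Dict[str, str], required: List[str]) -> List[str]:
    """Return the required keys that are absent or have an empty value."""
    return [key for key in required if not values.get(key)]
```

For example, `missing_keys(read_env_file(".env"), ["EXAMPLE_API_KEY"])` returns an empty list once that placeholder key has a value.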
You are ready to go! Try FunctionCallingAgent on google.com
python examples/google_test.py
Set up locally
First set up a virtual environment and install the package in editable mode so that your code can import 'litewebagent'
python3 -m venv venv
. venv/bin/activate
pip install -e .
Then please create a .env file, and update your API keys:
cp .env.example .env
Test playwright & chromium installation by running this script
python test_installation.py
- use the prompting-based web agent to complete a task and save the workflow
python -m prompting_main --agent_type PromptAgent --starting_url https://www.google.com --goal 'search dining table' --plan 'search dining table' --log_folder log
- we also provide function-calling-based web agents
python -m function_calling_main --agent_type FunctionCallingAgent --starting_url https://www.google.com --goal 'search dining table' --plan 'search dining table' --log_folder log
python -m function_calling_main --agent_type HighLevelPlanningAgent --starting_url https://www.google.com --goal 'search dining table' --plan 'search dining table' --log_folder log
python -m function_calling_main --agent_type ContextAwarePlanningAgent --starting_url https://www.google.com --goal 'search dining table' --plan 'search dining table' --log_folder log
https://www.loom.com/share/1018bcc4e21c4a7eb517b60c2931ee3c
https://www.loom.com/share/aa48256478714d098faac740239c9013
https://www.loom.com/share/89f5fa69b8cb49c8b6a60368ddcba103
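The three function-calling commands above differ only in --agent_type, so they can be generated from one template. The sketch below builds the argument list using only the module name and flags shown above; the helper `build_command` is hypothetical and not part of LiteWebAgent.

```python
import shlex
from typing import List

# Agent types taken from the function_calling_main commands above.
AGENT_TYPES = [
    "FunctionCallingAgent",
    "HighLevelPlanningAgent",
    "ContextAwarePlanningAgent",
]


def build_command(
    agent_type: str,
    starting_url: str,
    goal: str,
    plan: str,
    log_folder: str = "log",
) -> List[str]:
    """Return the argv list for one function_calling_main run."""
    return [
        "python", "-m", "function_calling_main",
        "--agent_type", agent_type,
        "--starting_url", starting_url,
        "--goal", goal,
        "--plan", plan,
        "--log_folder", log_folder,
    ]


def print_all_commands() -> None:
    """Print one shell-quoted command per agent type."""
    for agent_type in AGENT_TYPES:
        cmd = build_command(
            agent_type, "https://www.google.com",
            "search dining table", "search dining table",
        )
        print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` from the repository root would execute a run; a list avoids shell quoting problems with multi-word goals and plans.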
- replay the workflow verified by the web agent
If you haven't used the web agent to try any tests yet, first copy our example.json file.
cp log/flow/example.json log/flow/steps.json
then you can replay the session
python litewebagent/action/replay.py --log_folder log
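The copy step can be made conditional so that an existing recorded workflow is never overwritten. This is a minimal sketch that assumes only the log/flow/example.json and log/flow/steps.json paths shown above; the helper `ensure_steps_file` is hypothetical.

```python
import shutil
from pathlib import Path


def ensure_steps_file(log_folder: str = "log") -> Path:
    """Return <log_folder>/flow/steps.json, seeding it from example.json if absent."""
    flow_dir = Path(log_folder) / "flow"
    steps = flow_dir / "steps.json"
    example = flow_dir / "example.json"
    if not steps.exists():
        if not example.exists():
            raise FileNotFoundError(f"neither {steps} nor {example} exists")
        # Copy only when no recorded workflow is present.
        shutil.copyfile(example, steps)
    return steps
```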
- enable interaction between the user and the agent
python -m cli_main --agent_type FunctionCallingAgent --log_folder log
python -m cli_main --agent_type HighLevelPlanningAgent --log_folder log
python -m cli_main --agent_type PromptAgent --log_folder log
https://www.loom.com/share/93e3490a6d684cddb0fbefce4813902a
We use axtree by default. Alternatively, you can pass a comma-separated string of the desired input feature types with --features, for example interactive_elements or axtree,interactive_elements.
python -m function_calling_main --agent_type FunctionCallingAgent --starting_url https://www.airbnb.com --goal 'set destination as San Francisco, then search the results' --plan '(1) enter the "San Francisco" as destination, (2) and click search' --log_folder log
python -m function_calling_main --agent_type FunctionCallingAgent --starting_url https://www.airbnb.com --goal 'set destination as San Francisco, then search the results' --plan '(1) enter the "San Francisco" as destination, (2) and click search' --features interactive_elements --log_folder log
python -m function_calling_main --agent_type FunctionCallingAgent --starting_url https://www.airbnb.com --goal 'set destination as San Francisco, then search the results' --plan '(1) enter the "San Francisco" as destination, (2) and click search' --features axtree,interactive_elements --log_folder log
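To illustrate how a comma-separated --features value with an axtree default might be interpreted, here is a small sketch. The function `parse_features` is hypothetical and is not LiteWebAgent's actual parsing code; only the feature names axtree and interactive_elements come from the commands above.

```python
from typing import List, Optional

DEFAULT_FEATURES = ["axtree"]  # default stated in the text above


def parse_features(raw: Optional[str]) -> List[str]:
    """Split a comma-separated feature string, falling back to the default."""
    if not raw:
        return list(DEFAULT_FEATURES)
    features: List[str] = []
    for item in raw.split(","):
        name = item.strip()
        # Skip empty entries and duplicates while preserving order.
        if name and name not in features:
            features.append(name)
    return features or list(DEFAULT_FEATURES)
```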
python -m search_main --agent_type PromptSearchAgent --starting_url https://www.google.com --goal 'search dining table' --plan 'search dining table' --search_algorithm 'bfs' --log_folder log
https://www.loom.com/share/986f0addf10949d88ae25cd802588a85
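The idea of exploring several trajectories and returning the most promising one can be sketched with plain breadth-first search over a toy action space. This is an illustration only, not the paper's algorithm or the PromptSearchAgent implementation; the action names, the scoring function, and `bfs_best_trajectory` are all hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Sequence, Tuple

Trajectory = Tuple[str, ...]


def bfs_best_trajectory(
    actions_for: Callable[[Trajectory], Sequence[str]],
    score: Callable[[Trajectory], float],
    max_depth: int,
) -> Tuple[Trajectory, float]:
    """Expand trajectories level by level and keep the highest-scoring one."""
    best: Trajectory = ()
    best_score = score(best)
    queue: Deque[Trajectory] = deque([best])
    while queue:
        trajectory = queue.popleft()
        if len(trajectory) >= max_depth:
            continue  # depth budget reached, do not expand further
        for action in actions_for(trajectory):
            child = trajectory + (action,)
            child_score = score(child)
            # Strict comparison keeps the earliest (shortest) best trajectory.
            if child_score > best_score:
                best, best_score = child, child_score
            queue.append(child)
    return best, best_score


# Toy problem: the "goal" is to type a query and then submit it.
TARGET: Trajectory = ("type", "submit")


def toy_actions(trajectory: Trajectory) -> Sequence[str]:
    return ["click", "type", "submit"]


def toy_score(trajectory: Trajectory) -> float:
    matches = sum(1.0 for a, b in zip(trajectory, TARGET) if a == b)
    extra_steps = max(0, len(trajectory) - len(TARGET))
    return matches - 0.1 * extra_steps  # penalize unnecessary extra steps
```

Calling `bfs_best_trajectory(toy_actions, toy_score, max_depth=3)` returns the trajectory `("type", "submit")`. In a real web agent, expanding a node means executing browser actions and scoring means evaluating the resulting page, which is why this kind of search is better suited to offline use.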
Paper | Agent |
---|---|
SoM (Set-of-Mark) Agent | PromptAgent |
Mind2Web | ContextAwarePlanningAgent |
Tree Search for Language Model Agents | PromptSearchAgent |
@misc{zhang2024litewebagent,
title={LiteWebAgent: The Library for LLM-based web-agent applications},
author={Zhang, Danqing and Rama, Balaji and He, Shiying and Ni, Jingyi},
journal={https://github.com/PathOnAI/LiteWebAgent},
year={2024}
}