This repo provides a user-friendly web interface for interacting with the Llama-3.2-11B-Vision model, which generates text responses from image and text prompts.
Get a Hugging Face Token
- Sign up for a Hugging Face account at huggingface.co.
- Create an access token in your account settings; you may also need to request access to the Llama-3.2-11B-Vision model, as it is gated.
Project Setup
- Clone the repository:
git clone https://github.com/spacewalk01/llama3.2-vision-webui.git
cd llama3.2-vision-webui
- Install dependencies:
pip install -r requirements.txt
Run the Application
- Start the Gradio interface by running:
python main.py --token Your_Hugging_Face_Token
- Open the local URL printed in the terminal to upload images, enter text prompts, and view the Llama 3.2 Vision model's responses.
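The `--token` flag shown above is presumably used to authenticate with Hugging Face before the model weights are downloaded. As a rough illustration, such argument handling might look like the sketch below; the flag name comes from the command above, but the `HF_TOKEN` environment-variable fallback and the function name are assumptions, not the repo's actual code:

```python
import argparse
import os

def parse_args(argv=None):
    """Parse the command-line options that main.py appears to accept.

    Illustrative reconstruction only; check the repository's actual
    main.py for the real option set.
    """
    parser = argparse.ArgumentParser(description="Llama 3.2 Vision web UI")
    parser.add_argument(
        "--token",
        default=os.environ.get("HF_TOKEN"),  # assumed fallback, not confirmed
        help="Hugging Face access token used to download the model",
    )
    return parser.parse_args(argv)

# The parsed token would then typically be passed to
# huggingface_hub.login(token=...) or to from_pretrained(..., token=...)
# before the Gradio interface starts.
```

A token passed this way stays out of the source code, and the environment-variable fallback lets you avoid typing it on the command line at all.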
License
This project is licensed under the MIT License. See the LICENSE file for details.