ArtAgents is a prototype framework designed for artists, designers, and creators to experiment with LLM-based prompt engineering and creative content generation. It leverages Ollama for local model serving, allowing users to interact with various text and multimodal models through specialized AI 'agents' and structured, configurable workflows ("Teams").
Select predefined agents, load custom agents, or utilize multi-agent "Teams" to generate detailed prompts, descriptions, image captions, or other text outputs. Provide text instructions and optionally images as input. Fine-tune generation using Ollama API parameters, prompt style limiters, and agent presets. Experiment systematically using the Sweep feature and manage image captions directly within the application.
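As a rough illustration of how text, an optional image, and Ollama API parameters come together in one request, the sketch below builds a body for Ollama's `/api/generate` endpoint. It is not taken from ArtAgents' own `ollama_agent.py`; the function names `build_generate_payload` and `generate` are hypothetical, and the URL is Ollama's default address, which may differ from the one in your settings.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address (assumption for this sketch)


def build_generate_payload(model, prompt, options=None, image_bytes=None):
    """Build the JSON body for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if options:
        payload["options"] = dict(options)  # e.g. temperature, top_p, seed
    if image_bytes is not None:
        # Multimodal models take base64-encoded images in an "images" list.
        payload["images"] = [base64.b64encode(image_bytes).decode("ascii")]
    return payload


def generate(payload, base_url=OLLAMA_URL, timeout=120):
    """Send the payload to a running Ollama instance and return the response text."""
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

Keeping payload construction separate from the network call makes it possible to inspect or log exactly what would be sent before any model is invoked.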
- (v0.9.5-alpha, October 2025)
This project has undergone a significant technical upgrade and has been enhanced with new creative capabilities.
The application has been successfully migrated from the legacy Gradio 3.x framework to a modern Gradio 5.x+ version, refactoring the user interface and event handling system.
- Enhanced Security: The app now operates under Gradio's secure file-access model.
- Improved Performance & Stability: The new version provides a more robust and performant foundation for future development.
To boost the experimental and artistic value of the outputs, we have implemented several new creative assembly strategies. These move beyond simple description toward transformative, conceptually driven prompt engineering.
Three new teams have been added to agent_teams.json to leverage these strategies:
- Creative - Metaphorical Vision
  - Strategy: metaphorical_synthesis
  - How it works: This team gathers rich, multi-sensory input (mood, color, texture) and reinterprets it through a dynamically chosen creative metaphor. It's excellent for generating abstract and evocative results that break creative blocks.
- Creative - Hybrid Concept Factory
  - Strategy: conceptual_blend
  - How it works: This team is built to create productive conflict by forcing the final agent to blend three distinct concepts: a concrete object, a broad world/style, and an abstract theme. This structure is a recipe for generating genuinely unique and unexpected ideas.
- Creative - Themed Content Writer
  - Strategy: stylistic_mashup
  - How it works: This strategy separates what is being described from how it is described. The team builds a complete, detailed picture of a scene's content, and the final synthesis step reframes it by rewriting the entire prompt in a dynamically chosen literary or textual style, creating a powerful juxtaposition between substance and style.
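The three strategies can be pictured as different ways of building the final synthesis prompt. The sketch below is only an illustration under stated assumptions: the function bodies, the `METAPHORS` and `STYLES` pools, and the prompt wording are all hypothetical and are not the actual implementation in `agent_manager.py`. "Dynamically chosen" is modeled here as a random pick.

```python
import random

# Hypothetical pools; the real choices used by ArtAgents are not shown in this README.
METAPHORS = ["a tide pool at dusk", "a clockwork garden", "a cathedral of static"]
STYLES = ["a ship's log entry", "a museum wall label", "a weather forecast"]


def metaphorical_synthesis(step_outputs, rng):
    """Reinterpret the gathered sensory notes through one chosen metaphor."""
    metaphor = rng.choice(METAPHORS)
    notes = "\n".join(f"- {name}: {text}" for name, text in step_outputs.items())
    return f"Reinterpret these notes as if the scene were {metaphor}:\n{notes}"


def conceptual_blend(obj, world, theme):
    """Ask the final agent to fuse three distinct concepts into one idea."""
    return (
        "Blend the following into a single coherent image prompt.\n"
        f"Concrete object: {obj}\nWorld/style: {world}\nAbstract theme: {theme}"
    )


def stylistic_mashup(content, rng):
    """Keep the content fixed and rewrite it in a chosen textual style."""
    style = rng.choice(STYLES)
    return f"Rewrite the description below in the style of {style}:\n{content}"
```

Passing a seeded `random.Random` instance as `rng` makes a given run repeatable, which can be useful when comparing strategies side by side.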
Core Functionality:
- Ollama Integration: Connects to a running Ollama instance to utilize locally served LLMs (text & multimodal) with startup check.
- Agent System: Define and use specialized agents (Designer, Photographer, Styler, etc.) with unique instructions and optional API overrides (agent_roles.json, custom_agent_roles.json).
- Agent Team / Workflow Execution: Define (agent_teams.json) and run multi-step agent sequences ("Teams"). Supports sequential execution with context passing and multiple result assembly strategies (concatenate, refine_last, summarize_all, structured_concatenate, and other innovative experimental strategies).
- Team Editor: Create, edit, save, and delete Agent Teams via a dedicated UI tab.
- Chat Interface: Main tab for direct interaction with selected agents or teams, including session history and response refinement.
- Multimodal Input: Supports single image upload or processing images within a specified folder for chat or captioning context.
- Image Captioning: Dedicated tab to load images from a folder, view/edit associated .txt caption files, save changes, and generate captions using selected agents/teams and vision models.
- Experiment Sweeps: Systematically run base prompts across multiple selected Agent Teams and Worker Models. Saves detailed JSON protocol files for each run and separate .txt files containing the raw generated prompts per model.
- Configuration Management: External JSON files for easy customization of settings, models, limiters, API profiles, agent roles, and agent teams.
- App Settings UI: Dedicated tab to configure Ollama URL, agent loading preferences, default behaviors, UI theme, and detailed Ollama API parameters (with loadable profiles).
- Persistent History: Logs all single interactions and detailed workflow steps to core/history.json, viewable and clearable in the "Full History" tab.
- Utilities: Copy-to-clipboard for responses, optional prompt artifact cleaning, model release functions, contextual help tooltips, setup scripts.
- Modular Codebase: Organized structure (core, agents, ui) for maintainability.
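Sequential execution with context passing, followed by result assembly, can be sketched as below. This is a simplified model, not the code in `core/agent_manager.py`: the function `run_team`, the `llm` callable, the `"summarizer"` agent name, and the exact behavior attributed to each strategy are assumptions made for illustration.

```python
from typing import Callable, Dict, List

LLM = Callable[[str, str], str]  # (agent_name, prompt) -> response text


def run_team(steps: List[str], user_input: str, strategy: str, llm: LLM) -> str:
    """Run agents in order, passing accumulated context, then assemble a result."""
    outputs: Dict[str, str] = {}
    context = user_input
    for agent in steps:
        outputs[agent] = llm(agent, context)
        # Context passing: each later agent sees the request plus earlier outputs.
        context = f"{context}\n\n[{agent}]\n{outputs[agent]}"

    if strategy == "concatenate":
        return "\n\n".join(outputs.values())
    if strategy == "structured_concatenate":
        return "\n\n".join(f"## {name}\n{text}" for name, text in outputs.items())
    if strategy == "refine_last":
        # Assumed meaning: the last agent's output is already the refined result.
        return outputs[steps[-1]]
    if strategy == "summarize_all":
        # Assumed meaning: one extra call condenses every step's output.
        return llm("summarizer", "Summarize:\n" + "\n".join(outputs.values()))
    raise ValueError(f"Unknown assembly strategy: {strategy}")
```

Because `llm` is just a callable, the workflow can be exercised with a stub function before connecting it to a real model. Note that this sketch keys outputs by agent name, so a team that uses the same agent twice would need a different key scheme.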
ArtAgent/
│
├── app.py # Main Gradio App: UI Structure, Event Wiring, State Mgmt
├── requirements.txt # Python Dependencies (Consider migrating to pyproject.toml/Poetry)
├── settings.json # App Config: Ollama URL, defaults, global API opts, theme
├── models.json # Ollama models known to the app (name, vision)
├── limiters.json # Prompt style limiters (name, tokens, format string)
├── ollama_profiles.json # Presets for Ollama API options
├── agent_teams.json # Stores PREDEFINED & USER-SAVED Agent Team/Workflow definitions
│
├── agents/ # --- Agent Logic & Definitions ---
│ ├── __init__.py
│ ├── roles_config.py # Logic to load/merge roles
│ ├── ollama_agent.py # Interacts with Ollama API (get_llm_response)
│ ├── agent_roles.json # Default agent definitions
│ ├── custom_agent_roles.json # User's custom persistent agents
│ └── examples/ # --- Optional: Example Agent Files ---
│     └── *.json
│
├── core/ # --- Core Logic & Utilities ---
│ ├── __init__.py
│ ├── app_logic.py # Callback logic functions (router, UI callbacks)
│ ├── refinement_logic.py # Logic for comment/refinement feature
│ ├── agent_manager.py # Orchestrates Agent Team Workflows
│ ├── captioning_logic.py # Logic for caption editing & generation
│ ├── history_manager.py # Loads/saves persistent history
│ ├── ollama_checker.py # Ollama startup check logic
│ ├── ollama_manager.py # Ollama model release logic
│ ├── sweep_manager.py # Logic for running experiment sweeps
│ ├── utils.py # Common utilities (JSON loading, cleaning etc.)
│ ├── help_content.py # Stores help text for UI
│ └── history.json # Persistent history data file
│
├── ui/ # --- UI Tab Definitions (Gradio components) ---
│ ├── __init__.py
│ ├── chat_tab.py
│ ├── captions_tab.py # UI for caption editing & generation
│ ├── team_editor_tab.py # UI for editing teams
│ ├── sweep_tab.py # UI for experiment sweeps
│ ├── history_tab.py
│ ├── info_tab.py # Consolidated info tab (replaces roles_tab.py)
│ ├── app_settings_tab.py
│ └── common_ui_elements.py
│
├── scripts/ # --- Utility & Setup Scripts ---
│ ├── (Batch files: setup.bat, setupvenv.bat, go.bat, govenv.bat)
│ ├── full_project_creator.py
│ └── (Optional: .sh equivalents)
│
├── docs/ # --- Detailed Documentation ---
│ ├── index.md # Overview (Placeholder)
│ ├── user-guide.md # User manual (Placeholder)
│ ├── architecture.md # System design (Placeholder)
│ └── api.md # Core function details (Placeholder, Optional)
│
├── sweep_runs/ # Default Output folder for Sweep Protocols (add to .gitignore)
│
├── tests/ # --- Automated Tests ---
│ ├── __init__.py
│ ├── test_agent.py # Example tests (Needs Expansion)
│ └── (Placeholder: other test files)
│
├── .gitignore
└── README.md # This file
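The tree lists `agents/roles_config.py` as the place where default and custom roles are loaded and merged. A minimal sketch of such a merge follows, assuming each JSON file holds a top-level object keyed by role name and that a custom role overrides a default role with the same name. Both assumptions, and the helper names `load_json` and `merge_roles`, are hypothetical and not confirmed by this README.

```python
import json
from pathlib import Path


def load_json(path, default=None):
    """Return parsed JSON from path, or a default when the file is missing."""
    file = Path(path)
    if not file.exists():
        return {} if default is None else default
    return json.loads(file.read_text(encoding="utf-8"))


def merge_roles(default_roles, custom_roles):
    """Combine role dictionaries; a custom role replaces a default of the same name."""
    merged = dict(default_roles)  # copy so the defaults stay untouched
    merged.update(custom_roles)
    return merged
```

With these helpers, `merge_roles(load_json("agents/agent_roles.json"), load_json("agents/custom_agent_roles.json"))` would yield one dictionary of available agents, and a missing custom file would simply contribute nothing.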
- Install Ollama: Download and install from ollama.com. Ensure the ollama command is available in your terminal.
- Clone Repository: git clone https://github.com/sandner-art/ArtAgents.git and navigate into the ArtAgent directory (cd ArtAgent).
- Setup Python Environment (Recommended):
  - Using Venv (Manual): Create and activate a virtual environment (Python 3.10+ required for Gradio 5.x), then install requirements:

        python -m venv venv
        # On Windows:
        .\venv\Scripts\activate
        # On Linux/macOS:
        source venv/bin/activate
        pip install --upgrade pip
        pip install -r requirements.txt

  - (Alternative) Using Scripts: Run .\scripts\setupvenv.bat (Windows) or the equivalent .sh script to automate venv creation and pip install.
  - (Future) Using Poetry: If Poetry is implemented, replace this step with poetry install.
- Setup Ollama Models: Run .\scripts\setup.bat (Windows) or the equivalent .sh script. This checks Ollama connectivity and downloads recommended models listed in models.json. Alternatively, use ollama pull <model_name> manually for desired models.
- Configure (Optional): Review and edit JSON files (settings.json, models.json, agent_teams.json, etc.) to customize the application.
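To see which recommended models still need an `ollama pull`, one option is to compare a list of wanted names against what Ollama reports through its `/api/tags` endpoint. The sketch below is not the logic of the bundled setup scripts; the function names are hypothetical, the URL is Ollama's default address, and extracting names from models.json is left out because its exact layout is not shown here.

```python
import json
import urllib.request


def installed_models(base_url="http://localhost:11434", timeout=5):
    """Return names of models available locally, via Ollama's /api/tags endpoint."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
        data = json.loads(response.read())
    return {entry["name"] for entry in data.get("models", [])}


def missing_models(wanted, installed):
    """List wanted models that are not installed, treating a bare name as ':latest'."""
    def normalize(name):
        return name if ":" in name else f"{name}:latest"

    have = {normalize(name) for name in installed}
    return [name for name in wanted if normalize(name) not in have]
```

A call such as `missing_models(wanted_names, installed_models())` would then return the names to pull manually. If the Ollama service is not running, `installed_models` raises a connection error, which doubles as a basic connectivity check.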
- Start Ollama Service: Ensure the Ollama service is running (e.g., launch the Ollama Desktop application or run ollama serve in a separate terminal).
- Activate Environment: If using a virtual environment, activate it (source venv/bin/activate or .\venv\Scripts\activate).
- Run ArtAgents:
  - If using venv: python app.py
  - Using Scripts: .\scripts\govenv.bat (Windows) or the equivalent .sh script.
  - (Future) Using Poetry: poetry run python app.py
- Access UI: Open the local URL provided in the console (usually http://127.0.0.1:7860) in your web browser.
For more detailed information, please refer to the documents in the /docs directory:
- /docs/user-guide.md
- /docs/architecture.md
Phase 0: Stabilization & Core Refinement (Complete)
- Agent Captioning functionality stabilized.
- Agent Team Editor implemented and stabilized.
- Core assembly strategies (concatenate, refine_last, summarize_all, structured_concatenate) tested.
- Sweep output format implemented (per-model .txt prompt files + JSON protocols).
- Optional prompt artifact cleaner added.
- Copy-to-clipboard button added.
- Consolidated "Info" tab implemented.
- Error handling reviewed and improved.
- Gradio 5.x Upgrade: Evaluated and executed the upgrade from Gradio 3.x.
Phase 1: Foundational Expansion & Modernization (Current Focus)
- Implement Select Novel Synthesis Strategies: Add 2-3 creative strategies (e.g., Metaphorical Synthesis, Conceptual Blending) to agent_manager.py and Team Editor UI.
- NLP Library Integration (nlpaug): Integrate for noise/synonym capabilities within strategies or as agent steps.
- Unit Testing Expansion: Write comprehensive pytest tests for core logic and new features.
Future / Planned Enhancements (Phase 2+):
- Advanced Agent Teams (Hierarchical agents, conditional logic, feedback loops).
- Advanced Experimentation (Parameter sweeping via Hydra, potentially MLFlow integration).
- Direct Image Generation API Integration (e.g., ComfyUI, A1111).
- Workflow Visualization.
- Enhanced UI/UX (Improved Team Editor, potential Gradio custom components).
- Explainability / XAI Features.
- More Novel Synthesis Strategies & NLP features.
- Hydra Integration: Migrate .json configurations to Hydra (.yaml) for improved experiment management.
Contributions are welcome! Please refer to CONTRIBUTING.md for guidelines on reporting issues, suggesting features, or submitting pull requests.
ArtAgents by Daniel Sandner © 2024 - 2025. Adapt and use creatively. No guarantees provided. MIT LICENSE.


