
feat: Add ModelsLab LLM provider#4508

Open
adhikjoshi wants to merge 1 commit into crewAIInc:main from adhikjoshi:ml

Conversation


adhikjoshi commented Feb 18, 2026

ModelsLab Provider for CrewAI

ModelsLab + CrewAI · Python 3.10+ · MIT License

A comprehensive multi-modal LLM provider for CrewAI that integrates ModelsLab's AI APIs, enabling your agents to generate text, images, videos, and audio directly within their workflows.

🚀 Key Features

  • 🤖 Full CrewAI Compatibility: Seamless integration with CrewAI's agent framework
  • 🎨 Multi-Modal Generation: First CrewAI provider supporting text, image, video, and audio generation
  • 🧠 Intelligent Content Detection: Automatically detects when agents need multi-modal content
  • 🛠️ Function Calling Support: Works with CrewAI tools and custom functions
  • ⚡ Async Processing: Handles ModelsLab's async endpoints with intelligent polling
  • 🔧 Flexible Configuration: Easy setup for text-only or full multi-modal capabilities
  • 💰 Cost-Effective: Leverage ModelsLab's competitive pricing for enterprise workloads

📦 Installation

pip install -r requirements.txt

Or install individually:

pip install crewai requests typing-extensions

🔑 Setup

  1. Get your ModelsLab API key from modelslab.com
  2. Set your environment variable:
    export MODELSLAB_API_KEY="your_api_key_here"
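
You can then read the key from the environment instead of hard-coding it. A minimal sketch (constructor arguments as in the Quick Start below):

import os
from modelslab_llm import ModelsLabLLM

# Read the key exported in step 2 and fail fast if it is missing
api_key = os.environ.get("MODELSLAB_API_KEY")
if not api_key:
    raise RuntimeError("MODELSLAB_API_KEY is not set")

llm = ModelsLabLLM(api_key=api_key, model="gpt-4")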

🎯 Quick Start

Basic Text Generation Agent

from crewai import Agent, Task, Crew
from modelslab_llm import ModelsLabLLM

# Initialize ModelsLab LLM
llm = ModelsLabLLM(
    api_key="your_modelslab_api_key",
    model="gpt-4",
    temperature=0.7
)

# Create an agent
agent = Agent(
    role="Research Specialist",
    goal="Conduct thorough research on given topics",
    backstory="You are an experienced researcher.",
    llm=llm
)

# Create and run a task
task = Task(
    description="Research the latest AI developments in 2024",
    expected_output="A comprehensive research summary",
    agent=agent
)

crew = Crew(agents=[agent], tasks=[task])
result = crew.kickoff()
print(result.raw)

Multi-Modal Creative Agent

from modelslab_llm import create_modelslab_multimodal_llm

# Create multi-modal LLM
llm = create_modelslab_multimodal_llm(
    api_key="your_api_key",
    model="gpt-4",
    temperature=0.8
)

# Creative agent that can generate images, videos, and audio
creative_agent = Agent(
    role="Creative Content Creator",
    goal="Create engaging multimedia content",
    backstory="You are a creative professional.",
    llm=llm
)

# Task that triggers multi-modal generation
task = Task(
    description="""
    Create a marketing campaign for our new AI app:
    1. Write compelling copy
    2. Generate a product hero image
    3. Create a demo video concept
    """,
    expected_output="Complete multimedia marketing materials",
    agent=creative_agent
)

crew = Crew(agents=[creative_agent], tasks=[task])
result = crew.kickoff()

🎨 Multi-Modal Capabilities

The ModelsLab provider automatically detects when your agents need multi-modal content:

🖼️ Image Generation

# Agent automatically generates images when requested
result = agent.execute("Generate an image of a futuristic city skyline")
# Returns: "I've generated an image for you: https://modelslab.com/output/image.jpg"

🎬 Video Creation

# Agent creates videos from text descriptions
result = agent.execute("Create a video showing our product demo")
# Returns: "I've generated a video for you: https://modelslab.com/output/video.mp4"

🔊 Audio Generation

# Agent generates speech and audio content
result = agent.execute("Generate audio narration for our presentation")
# Returns: "I've generated audio for you: https://modelslab.com/output/audio.mp3"

⚙️ Advanced Configuration

Custom Configuration Options

llm = ModelsLabLLM(
    api_key="your_key",
    model="gpt-4",
    temperature=0.7,
    max_tokens=2048,
    base_url="https://modelslab.com/api/v6",  # Custom endpoint
    timeout=120,  # Request timeout
    enable_multimodal=True,  # Enable multi-modal capabilities
)

Text-Only Mode

from modelslab_llm import create_modelslab_chat_llm

# Create text-only LLM (faster, lower cost)
llm = create_modelslab_chat_llm(
    api_key="your_key",
    model="gpt-3.5-turbo"
)

Multiple Agents with Different Capabilities

# Text-focused analyst
analyst_llm = ModelsLabLLM(
    api_key="your_key",
    model="gpt-4",
    temperature=0.3,  # Lower temperature for analysis
    enable_multimodal=False
)

# Creative multi-modal designer
designer_llm = create_modelslab_multimodal_llm(
    api_key="your_key",
    temperature=0.9  # Higher creativity
)

analyst = Agent(llm=analyst_llm, role="Data Analyst", ...)
designer = Agent(llm=designer_llm, role="Visual Designer", ...)

# Collaborative workflow
crew = Crew(agents=[analyst, designer], tasks=[...])

🛠️ Function Calling & Tools

ModelsLab LLM supports CrewAI's function calling:

def search_web(query: str) -> str:
    return f"Search results for: {query}"

tools = [{
    "function": {
        "name": "search_web",
        "description": "Search the web",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"}
            },
            "required": ["query"]
        }
    }
}]

agent = Agent(
    llm=llm,
    tools=tools,
    ...
)

# Agent can now use tools + generate multi-modal content
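
You can also exercise the function-calling path directly, outside an agent, via the call signature documented in the API Reference below. A hedged sketch (the exact shape of the response is up to the provider):

# Map function names to callables so the provider can execute them
available_functions = {"search_web": search_web}

response = llm.call(
    messages=[{"role": "user", "content": "Find recent news about CrewAI"}],
    tools=tools,
    available_functions=available_functions,
)
print(response)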

📚 Examples

Explore comprehensive examples in examples.py:

  • Basic Text Agent: Simple research and analysis
  • Multi-Modal Creative Agent: Content creation with images/videos
  • Collaborative Agents: Multiple agents with different capabilities
  • Agent with Tools: Combining function calling with multi-modal generation
  • Advanced Workflows: Complex multi-step multimedia production

Run examples:

python examples.py

🧪 Testing

Run the test suite:

pytest test_modelslab_llm.py -v

The test suite covers:

  • ✅ LLM initialization and configuration
  • ✅ Text generation functionality
  • ✅ Multi-modal content detection and generation
  • ✅ Function calling and tool integration
  • ✅ Error handling and edge cases
  • ✅ CrewAI agent integration

🔧 Supported Models & Endpoints

Text Generation

  • Models: GPT-4, GPT-3.5-turbo, Claude-3, Claude-2, and more
  • Endpoint: /uncensored_chat (OpenAI-compatible)

Image Generation

  • Models: Flux, Stable Diffusion, Community models
  • Endpoint: /images/text2img
  • Features: Text-to-image, custom dimensions, style control
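
For reference, a direct call to this endpoint might look like the sketch below; the payload field names are illustrative assumptions, so check the ModelsLab API docs for the exact schema:

import requests

# Hypothetical text-to-image request against the documented base URL and endpoint
payload = {
    "key": "your_modelslab_api_key",  # assumed auth field
    "prompt": "a futuristic city skyline at dusk",
    "width": 1024,                    # assumed dimension fields
    "height": 1024,
}
response = requests.post(
    "https://modelslab.com/api/v6/images/text2img",
    json=payload,
    timeout=120,
)
response.raise_for_status()
print(response.json())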

Video Generation

  • Models: Zeroscope, Runway, and more
  • Endpoint: /video/text2video
  • Features: Text-to-video, configurable length and quality

Audio Generation

  • Models: ElevenLabs multilingual, voice cloning
  • Endpoint: /tts
  • Features: Text-to-speech, custom voices, audio effects

🌟 Why Choose ModelsLab for CrewAI?

🏆 First Multi-Modal Provider

  • Only CrewAI provider supporting comprehensive multi-modal generation
  • Agents can create visual presentations, video demos, and audio content
  • Seamless switching between content types based on context

💸 Cost-Effective Enterprise Solution

  • Competitive pricing compared to multiple provider setups
  • Single API key for all content types
  • Transparent, usage-based billing

🚀 Production-Ready

  • Built for enterprise-scale agentic workflows
  • Robust async processing and error handling
  • Comprehensive monitoring and observability

🔧 Developer-Friendly

  • Drop-in replacement for existing CrewAI LLMs
  • Extensive documentation and examples
  • Active community support

📖 API Reference

ModelsLabLLM

Main class for CrewAI integration.

ModelsLabLLM(
    api_key: str,                    # ModelsLab API key (required)
    model: str = "gpt-4",           # Model for text generation  
    temperature: float = 0.7,        # Sampling temperature
    max_tokens: int = None,         # Max tokens to generate
    base_url: str = "...",          # API base URL
    timeout: int = 120,             # Request timeout
    enable_multimodal: bool = True, # Enable multi-modal features
    **kwargs                        # Additional parameters
)

Methods

  • call(messages, tools, callbacks, available_functions): Main generation method
  • supports_function_calling(): Returns True (supports CrewAI tools)
  • supports_stop_words(): Returns True (supports stop sequences)
  • get_context_window_size(): Returns model context window size
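
A minimal sketch of using these methods directly, without an agent:

llm = ModelsLabLLM(api_key="your_key", model="gpt-4")

print(llm.supports_function_calling())  # True
print(llm.get_context_window_size())    # context window for the configured model

reply = llm.call(
    messages=[{"role": "user", "content": "Summarize CrewAI in one sentence."}]
)
print(reply)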

Convenience Functions

# Text-only LLM
create_modelslab_chat_llm(api_key, model="gpt-4", **kwargs)

# Full multi-modal LLM  
create_modelslab_multimodal_llm(api_key, model="gpt-4", **kwargs)

🤝 Contributing

We welcome contributions! Here's how to get started:

Development Setup

git clone <repository_url>
cd crewai-modelslab
pip install -r requirements.txt

Running Tests

pytest test_modelslab_llm.py -v --cov=modelslab_llm

Code Style

pip install black isort flake8
black modelslab_llm.py
isort modelslab_llm.py
flake8 modelslab_llm.py

Contributing Guidelines

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Write tests for your changes
  4. Ensure all tests pass: pytest
  5. Follow code style: Run black and isort
  6. Commit your changes: git commit -m 'Add amazing feature'
  7. Push to the branch: git push origin feature/amazing-feature
  8. Open a Pull Request

🐛 Issue Reporting

Found a bug? Have a feature request? Please open an issue with:

  • Clear description of the problem or feature
  • Steps to reproduce (for bugs)
  • Expected vs actual behavior
  • Environment details (Python version, CrewAI version, etc.)
  • Code samples demonstrating the issue

📋 Roadmap

  • LiteLLM Integration: Add ModelsLab as a native LiteLLM provider
  • Streaming Support: Real-time response streaming
  • Advanced Multi-Modal: Image-to-video, video-to-video workflows
  • Fine-Tuning: Support for custom model fine-tuning
  • Caching: Response caching for improved performance
  • Monitoring: Built-in metrics and observability

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.



Built with ❤️ for the AI agent community

Transform your CrewAI agents into multi-modal powerhouses with ModelsLab's comprehensive AI capabilities.


Note

Medium Risk
Introduces a new external-API-backed LLM provider with synchronous polling and heuristic keyword routing, which could affect reliability/latency and error handling but doesn’t modify existing core logic.

Overview
Adds a new ModelsLabLLM provider implementing BaseLLM.call() and routing requests to ModelsLab’s API for standard chat (/uncensored_chat).

When enable_multimodal is on, the provider keyword-detects image/video/audio requests and calls the corresponding ModelsLab endpoints (with async polling support). It also includes a JSON-based, prompt-driven tool/function-calling shim that executes available_functions and feeds the results back into the chat.

Written by Cursor Bugbot for commit c55d2c0.


chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c55d2c088f


Comment on lines +92 to +96
messages: Union[str, List[Dict[str, str]]],
tools: Optional[List[dict]] = None,
callbacks: Optional[List[Any]] = None,
available_functions: Optional[Dict[str, Any]] = None,
) -> Union[str, Any]:


P1: Accept executor kwargs in ModelsLabLLM.call

The call signature is incompatible with CrewAI's executor path: get_llm_response invokes llm.call(..., from_task=..., from_agent=..., response_model=...) (lib/crewai/src/crewai/utilities/agent_utils.py), but this method does not accept those keyword arguments. In normal Agent/Crew runs with this provider, Python raises TypeError for unexpected kwargs before any API request, so the provider cannot be used through the standard runtime flow.
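
A signature along these lines would accept the executor's extra keyword arguments (a sketch only, not the PR's code; the parameter names follow the review note above):

from typing import Any, Dict, List, Optional, Union

# Method on ModelsLabLLM; extra executor kwargs are accepted instead of raising TypeError
def call(
    self,
    messages: Union[str, List[Dict[str, str]]],
    tools: Optional[List[dict]] = None,
    callbacks: Optional[List[Any]] = None,
    available_functions: Optional[Dict[str, Any]] = None,
    from_task: Optional[Any] = None,
    from_agent: Optional[Any] = None,
    response_model: Optional[Any] = None,
    **kwargs: Any,
) -> Union[str, Any]:
    ...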


Comment on lines +124 to +125
if tools and available_functions:
return self._handle_function_calling(messages, tools, available_functions)


P1: Handle native tool mode when available_functions is None

Native tool execution in CrewAI passes tool schemas with available_functions=None so the model returns tool calls for the executor to run (_invoke_loop_native_tools in crew_agent_executor.py). This implementation only enters tool handling when both tools and available_functions are set, so in native mode it silently skips tool-calling logic and does plain text generation even though supports_function_calling() returns True, preventing tool-enabled agents from emitting executable tool calls.



cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 4 potential issues.



tools: Optional[List[dict]] = None,
callbacks: Optional[List[Any]] = None,
available_functions: Optional[Dict[str, Any]] = None,
) -> Union[str, Any]:

Missing call() parameters cause a runtime TypeError

High Severity

The call() override is missing the from_task, from_agent, and response_model parameters that BaseLLM.call() declares and that CrewAI's agent_utils.py passes when invoking llm.call(...). Since the method also lacks a **kwargs catchall, this will raise a TypeError (got an unexpected keyword argument 'from_task') every time an agent tries to use this provider, making it completely non-functional.


if attempt >= max_attempts - 1:
raise RuntimeError(f"Failed to fetch {content_type} result: {str(e)}")
time.sleep(10)
attempt += 1

Polling swallows "failed" status RuntimeError, retries needlessly

Medium Severity

When the async poll returns "status": "failed", the code raises a RuntimeError on line 333, but the broad except Exception block on line 339 immediately catches it. Instead of propagating the failure, the method sleeps and retries the already-failed request up to 30 times (~5 minutes of wasted polling) before eventually raising a generic error that loses the original failure message.
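
One way to make the failure terminal is to check the status before falling into the generic retry path. A sketch with hypothetical helper names (_poll_result and _fetch are illustrative, not the PR's code):

import time

def _poll_result(self, fetch_url: str, content_type: str, max_attempts: int = 30) -> dict:
    for attempt in range(max_attempts):
        data = self._fetch(fetch_url)  # hypothetical helper performing the HTTP GET
        status = data.get("status")
        if status == "success":
            return data
        if status == "failed":
            # Propagate the original failure immediately instead of retrying it
            raise RuntimeError(f"{content_type} generation failed: {data}")
        time.sleep(10)  # still processing; wait and poll again
    raise RuntimeError(f"Timed out waiting for {content_type} result")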


elif any(keyword in latest_message for keyword in video_keywords):
return self._generate_video(latest_message)
elif any(keyword in latest_message for keyword in audio_keywords):
return self._generate_audio(latest_message)

Overly broad keywords cause false multimodal triggers

High Severity

The multimodal keyword lists include extremely common words like "draw", "picture", "render", "show me", "say", "speak", "voice", "sound", "video", "clip", "film". Combined with substring matching (in), normal text prompts such as "draw conclusions from…", "say more about…", "show me the analysis…", or "render a verdict" will incorrectly trigger expensive multimodal API calls instead of text generation, producing unusable responses for routine agent tasks.
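
One mitigation (a sketch, not the PR's code) is to require an explicit generation verb near a media noun, matched on word boundaries rather than bare substrings:

import re

# Hypothetical stricter detector for image intent
IMAGE_INTENT = re.compile(
    r"\b(generate|create|draw|render)\b.{0,40}\b(image|picture|photo|illustration)\b",
    re.IGNORECASE,
)

def wants_image(prompt: str) -> bool:
    return bool(IMAGE_INTENT.search(prompt))

wants_image("Generate an image of a futuristic city")  # True
wants_image("Draw conclusions from the sales data")    # False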


messages.extend([
{"role": "assistant", "content": f"I'll use the {func_name} function."},
{"role": "function", "name": func_name, "content": str(result)}
])

Function calling mutates caller's messages list in-place

Medium Severity

_handle_function_calling calls messages.extend(...) which mutates the original messages list passed into call(). Since lists are passed by reference, this silently modifies the caller's message history with assistant/function entries, potentially corrupting the conversation state for subsequent calls or retry logic in the CrewAI framework.
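
A minimal fix sketch: copy the incoming history before appending the tool-execution turns, leaving the caller's list untouched:

working_messages = list(messages)  # shallow copy; the caller's history is not mutated
working_messages.extend([
    {"role": "assistant", "content": f"I'll use the {func_name} function."},
    {"role": "function", "name": func_name, "content": str(result)},
])
# continue the follow-up chat request with working_messages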

