A Chrome extension that enables you to ask questions about any website you're visiting. The extension crawls web pages, parses and interprets their content, and allows you to chat with the site's content.
- Ask questions about any website's content in natural language
- Smart web crawling that respects site structure and robots.txt
- Chat-like interface with streaming responses
- Context-aware responses with references to specific pages
- Maintains conversation history per domain
- Real-time streaming responses
- Highly configurable crawling parameters
- Secure API key management
- Markdown formatting support in responses
- Automatic link references in answers
-
Initial Scan: When you click "Start Crawl", the extension:
- Validates your API keys
- Checks the current domain
- Initializes a new conversation history
- Starts the Firecrawl process
-
Progress Tracking:
- Shows a progress bar with real-time updates
- Displays number of pages crawled
- Indicates crawling status and completion
-
Content Processing:
- Converts HTML to clean markdown format
- Extracts page titles and URLs
- Maintains source references for citations
- Truncates content to stay within API limits
-
Content Preparation:
- Organizes crawled content by relevance
- Maintains page structure and hierarchy
- Preserves source URLs for citations
-
AI Processing:
- Streams requests to GPT-4o-mini
- Maintains conversation context
- Generates responses with source citations
-
Response Handling:
- Real-time streaming of answers
- Markdown formatting for readability
- Automatic link insertion
- Source reference preservation
- API Key: Your Firecrawl authentication key
- Max Depth (default: 3):
- Controls how many links deep the crawler will go
- Higher values explore more nested pages
- Recommended range: 1-5 for optimal performance
-
Page Limit (default: 50):
- Maximum number of pages to crawl
- Higher limits allow more comprehensive coverage
- Consider API usage when adjusting
- Range: 1-1000 pages
-
Allow Backward Links (default: true):
- When enabled, crawler follows links to previously visited domains
- Useful for sites with cross-referenced content
- Disable to stay within single domain
-
Timeout (default: 20000ms):
- Maximum time to wait for each page
- Prevents hanging on slow-loading pages
- Recommended range: 5000-30000ms
-
Wait For (default: 2000ms):
- Delay between page requests
- Helps respect server rate limits
- Adjust based on site's robustness
- Range: 0-5000ms
- Max Content Length (default: 250,000 characters):
- Maximum characters sent to GPT-4o-mini
- Balances comprehensiveness with API limits
- Range: 1-500,000 characters
- Higher values may increase API costs
- Model Selection:
gpt-4o-mini
: Faster, more concise responsesgpt-4o
: More detailed, nuanced answers, much more expensive- Choose based on your needs for speed vs. detail and cost
The extension maintains separate conversation histories for each domain:
{
"example.com": [
{"role": "user", "content": "What is this site about?"},
{"role": "assistant", "content": "Based on [Home Page](https://example.com), this site..."},
// Additional messages...
]
}
The extension processes content in multiple stages:
-
HTML Processing:
// Example of content processing { "title": "Page Title", "url": "https://example.com/page", "content": "Processed markdown content...", "metadata": { "sourceURL": "https://example.com/page", "crawlTime": "2024-01-01T00:00:00Z" } }
-
Content Organization:
- Groups related content
- Maintains hierarchy
- Preserves source references
Responses are streamed in real-time:
// Example streaming response format
{
"role": "assistant",
"content": "According to [About Page](https://example.com/about)...",
"references": [
{"title": "About Page", "url": "https://example.com/about"},
// Additional references...
]
}
- Optimal Depth: 2-3 levels for most sites
- Recommended Limits:
- Small sites: 20-50 pages
- Medium sites: 50-200 pages
- Large sites: 200-500 pages
- Rate Limiting: Automatically handles server restrictions
The extension manages memory by:
- Truncating large pages
- Clearing old conversation histories
- Optimizing content storage
- API keys stored securely in Chrome storage
- Content sanitization for XSS prevention
- Secure communication with APIs
- Rate limiting protection
-
Crawling Failures:
- Check API key validity
- Verify site accessibility
- Adjust timeout settings
- Check for rate limiting
-
Response Issues:
- Verify content length limits
- Check API quota
- Validate model selection
-
Performance Problems:
- Reduce crawl depth
- Lower page limits
- Increase timeouts
// Example Firecrawl API request
{
"url": "https://example.com",
"scrapeOptions": {
"formats": ["markdown"],
"waitFor": 2000,
"timeout": 20000
},
"limit": 50,
"allowBackwardLinks": true,
"maxDepth": 3
}
// Example OpenAI API request
{
"model": "your-preferred-model-deployment-name",
"messages": [
{"role": "system", "content": "..."},
{"role": "user", "content": "..."}
],
"stream": true,
"temperature": 0.1
}
For help:
- Check Troubleshooting Guide
- Review existing GitHub issues
- Create new issues with:
- Problem description
- Steps to reproduce
- Relevant settings/configuration
- Error messages if any
- Screenshots if possible
Made with ❤️ by Rohit
Use responsibly and in accordance with all applicable terms of service and robots.txt policies.