QA-with-website - AI powered Chrome Extension

A Chrome extension that enables you to ask questions about any website you're visiting. The extension crawls web pages, parses and interprets their content, and allows you to chat with the site's content.

🌟 Features

Ask questions about any website's content in natural language
Smart web crawling that respects site structure and robots.txt
Chat-like interface with streaming responses
Context-aware responses with references to specific pages
Maintains conversation history per domain
Real-time streaming responses
Highly configurable crawling parameters
Secure API key management
Markdown formatting support in responses
Automatic link references in answers

How It Works

Crawling Process

Initial Scan: When you click "Start Crawl", the extension:
- Validates your API keys
- Checks the current domain
- Initializes a new conversation history
- Starts the Firecrawl process
Progress Tracking:
- Shows a progress bar with real-time updates
- Displays number of pages crawled
- Indicates crawling status and completion
Content Processing:
- Converts HTML to clean markdown format
- Extracts page titles and URLs
- Maintains source references for citations
- Truncates content to stay within API limits

Question-Answering Flow

Content Preparation:
- Organizes crawled content by relevance
- Maintains page structure and hierarchy
- Preserves source URLs for citations
AI Processing:
- Streams requests to GPT-4o-mini
- Maintains conversation context
- Generates responses with source citations
Response Handling:
- Real-time streaming of answers
- Markdown formatting for readability
- Automatic link insertion
- Source reference preservation

Detailed Configuration Options

Firecrawl Settings

Basic Configuration

API Key: Your Firecrawl authentication key
Max Depth (default: 3):
- Controls how many links deep the crawler will go
- Higher values explore more nested pages
- Recommended range: 1-5 for optimal performance

Crawling Parameters

Page Limit (default: 50):
- Maximum number of pages to crawl
- Higher limits allow more comprehensive coverage
- Consider API usage when adjusting
- Range: 1-1000 pages
Allow Backward Links (default: true):
- When enabled, crawler follows links to previously visited domains
- Useful for sites with cross-referenced content
- Disable to stay within single domain

Performance Settings

Timeout (default: 20000ms):
- Maximum time to wait for each page
- Prevents hanging on slow-loading pages
- Recommended range: 5000-30000ms
Wait For (default: 2000ms):
- Delay between page requests
- Helps respect server rate limits
- Adjust based on site's robustness
- Range: 0-5000ms

Content Processing Settings

Max Content Length (default: 250,000 characters):
- Maximum characters sent to GPT-4o-mini
- Balances comprehensiveness with API limits
- Range: 1-500,000 characters
- Higher values may increase API costs

Model Settings

Model Selection:
- gpt-4o-mini: Faster, more concise responses
- gpt-4o: More detailed, nuanced answers, much more expensive
- Choose based on your needs for speed vs. detail and cost

Advanced Usage

Conversation Management

The extension maintains separate conversation histories for each domain:

{
  "example.com": [
    {"role": "user", "content": "What is this site about?"},
    {"role": "assistant", "content": "Based on [Home Page](https://example.com), this site..."},
    // Additional messages...
  ]
}

Content Formatting

The extension processes content in multiple stages:

HTML Processing:

// Example of content processing
{
  "title": "Page Title",
  "url": "https://example.com/page",
  "content": "Processed markdown content...",
  "metadata": {
    "sourceURL": "https://example.com/page",
    "crawlTime": "2024-01-01T00:00:00Z"
  }
}

Content Organization:
- Groups related content
- Maintains hierarchy
- Preserves source references

API Response Handling

Responses are streamed in real-time:

// Example streaming response format
{
  "role": "assistant",
  "content": "According to [About Page](https://example.com/about)...",
  "references": [
    {"title": "About Page", "url": "https://example.com/about"},
    // Additional references...
  ]
}

Performance Considerations

Crawling Performance

Optimal Depth: 2-3 levels for most sites
Recommended Limits:
- Small sites: 20-50 pages
- Medium sites: 50-200 pages
- Large sites: 200-500 pages
Rate Limiting: Automatically handles server restrictions

Memory Usage

The extension manages memory by:

Truncating large pages
Clearing old conversation histories
Optimizing content storage

Security Features

API keys stored securely in Chrome storage
Content sanitization for XSS prevention
Secure communication with APIs
Rate limiting protection

Troubleshooting

Common Issues

Crawling Failures:
- Check API key validity
- Verify site accessibility
- Adjust timeout settings
- Check for rate limiting
Response Issues:
- Verify content length limits
- Check API quota
- Validate model selection
Performance Problems:
- Reduce crawl depth
- Lower page limits
- Increase timeouts

API Documentation

Firecrawl API

// Example Firecrawl API request
{
  "url": "https://example.com",
  "scrapeOptions": {
    "formats": ["markdown"],
    "waitFor": 2000,
    "timeout": 20000
  },
  "limit": 50,
  "allowBackwardLinks": true,
  "maxDepth": 3
}

OpenAI API

// Example OpenAI API request
{
  "model": "your-preferred-model-deployment-name",
  "messages": [
    {"role": "system", "content": "..."},
    {"role": "user", "content": "..."}
  ],
  "stream": true,
  "temperature": 0.1
}

Contributions and Help

For help:

Check Troubleshooting Guide
Review existing GitHub issues
Create new issues with:
- Problem description
- Steps to reproduce
- Relevant settings/configuration
- Error messages if any
- Screenshots if possible

Made with ❤️ by Rohit

Use responsibly and in accordance with all applicable terms of service and robots.txt policies.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
icons		icons
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
background.js		background.js
manifest.json		manifest.json
options.html		options.html
options.js		options.js
popup.html		popup.html
popup.js		popup.js

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

QA-with-website - AI powered Chrome Extension

🌟 Features

How It Works

Crawling Process

Question-Answering Flow

Detailed Configuration Options

Firecrawl Settings

Basic Configuration

Crawling Parameters

Performance Settings

Content Processing Settings

Model Settings

Advanced Usage

Conversation Management

Content Formatting

API Response Handling

Performance Considerations

Crawling Performance

Memory Usage

Security Features

Troubleshooting

Common Issues

API Documentation

Firecrawl API

OpenAI API

Contributions and Help

About

Releases

Packages

Languages

License

TRohit20/QA-with-website

Folders and files

Latest commit

History

Repository files navigation

QA-with-website - AI powered Chrome Extension

🌟 Features

How It Works

Crawling Process

Question-Answering Flow

Detailed Configuration Options

Firecrawl Settings

Basic Configuration

Crawling Parameters

Performance Settings

Content Processing Settings

Model Settings

Advanced Usage

Conversation Management

Content Formatting

API Response Handling

Performance Considerations

Crawling Performance

Memory Usage

Security Features

Troubleshooting

Common Issues

API Documentation

Firecrawl API

OpenAI API

Contributions and Help

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages