Advanced Bot Detection System 🤖🛡️

A bot detection system that combines a deep learning classifier, SQLite persistence, CAPTCHA integration, and a FastAPI service.

Features ✨

🧠 Deep Learning: Neural network with three hidden layers (64→32→16 units) for pattern recognition
💾 Persistent Storage: SQLite database for tracking, training data, and model checkpoints
🔐 CAPTCHA Integration: Support for Google reCAPTCHA, hCaptcha, and Cloudflare Turnstile
⚡ FastAPI Integration: RESTful API with built-in rate limiting
📊 Real-time Detection: 7-feature analysis including timing, entropy, and burst detection
🎯 Online Learning: Continuous model improvement from feedback
🚦 Multi-level Actions: ALLOW, MONITOR, CHALLENGE, BLOCK
📈 Analytics Dashboard: Track bot traffic, IP reputation, and model performance

Architecture

┌─────────────────────────────────────────────────────────────┐
│                    FastAPI Endpoints                         │
│  /check  /feedback  /captcha/verify  /stats  /train         │
└────────────────────┬────────────────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────────────────┐
│              Advanced Bot Detector                           │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │
│  │   Feature    │  │ Deep Learning│  │   CAPTCHA    │      │
│  │  Extraction  │  │    Model     │  │   Manager    │      │
│  └──────────────┘  └──────────────┘  └──────────────┘      │
└────────────────────┬────────────────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────────────────┐
│                  SQLite Database                             │
│  Requests | IP Reputation | Training Data | Checkpoints     │
└─────────────────────────────────────────────────────────────┘

Installation 🚀

Prerequisites

Python 3.8+
pip

Quick Start

# 1. Clone or download the files
git clone <repository-url>
cd bot-detection

# 2. Install dependencies
pip install -r requirements.txt

# 3. Run the demo
python bot_detector.py

# 4. Start the API server
uvicorn bot_detector:app --reload

# 5. Visit API documentation
# Open browser: http://localhost:8000/docs

Configuration ⚙️

Edit config.yaml to customize:

Detection thresholds
Rate limiting rules
CAPTCHA provider settings
Deep learning parameters
Database settings

Usage Examples

Python Integration

from bot_detector import AdvancedBotDetector

# Initialize
detector = AdvancedBotDetector()

# Check a request
result = detector.check_request(
    ip="192.168.1.100",
    user_agent="Mozilla/5.0...",
    endpoint="/api/products"
)

print(f"Is Bot: {result['is_bot']}")
print(f"Score: {result['score']:.2f}/100")
print(f"Action: {result['action']}")

# Provide feedback for learning
detector.provide_feedback(
    ip="192.168.1.100",
    user_agent="Mozilla/5.0...",
    endpoint="/api/products",
    is_bot=False  # Confirmed human
)

# Train model
detector.batch_train(epochs=10, batch_size=32)

API Usage

Check Request

curl -X POST "http://localhost:8000/check" \
  -H "Content-Type: application/json" \
  -d '{
    "user_agent": "python-requests/2.28.0",
    "endpoint": "/api/data"
  }'

Response:

{
  "is_bot": true,
  "score": 85.6,
  "ml_score": 89.2,
  "ml_probability": 0.892,
  "action": "CHALLENGE",
  "captcha_token": "abc123...",
  "features": [0.1, 0.8, 0.15, 0.2, 0.5, 0.0, 0.0]
}

Provide Feedback

curl -X POST "http://localhost:8000/feedback" \
  -H "Content-Type: application/json" \
  -d '{
    "user_agent": "curl/7.68.0",
    "endpoint": "/api/data",
    "is_bot": true
  }'

Verify CAPTCHA

curl -X POST "http://localhost:8000/captcha/verify" \
  -H "Content-Type: application/json" \
  -d '{
    "token": "captcha_token_here",
    "response": "user_captcha_response"
  }'

Get Statistics

curl http://localhost:8000/stats

Response:

{
  "total_requests": 1523,
  "bot_requests": 456,
  "human_requests": 1067,
  "bot_percentage": 29.9,
  "unique_ips": 234,
  "blocked_ips": 12,
  "training_examples": 567,
  "deep_learning_enabled": true
}

Trigger Training

curl -X POST "http://localhost:8000/train?epochs=10&batch_size=32"

FastAPI Middleware Integration

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
from bot_detector import AdvancedBotDetector

app = FastAPI()
detector = AdvancedBotDetector()

@app.middleware("http")
async def bot_detection_middleware(request: Request, call_next):
    # Check request
    result = detector.check_request(
        # request.client can be None (e.g. some test clients or proxies)
        ip=request.client.host if request.client else "unknown",
        user_agent=request.headers.get("user-agent", ""),
        endpoint=request.url.path
    )
    
    # Block bots
    if result["action"] == "BLOCK":
        return JSONResponse(
            status_code=403,
            content={"error": "Access denied"}
        )
    
    # Challenge suspicious requests
    if result["action"] == "CHALLENGE":
        return JSONResponse(
            status_code=403,
            content={
                "error": "CAPTCHA required",
                "captcha_token": result["captcha_token"]
            }
        )
    
    # Continue to endpoint
    response = await call_next(request)
    return response

Features Explained 📚

7-Feature Detection System

Request Rate: Normalized request frequency per IP
User Agent Score: Detection of bot signatures in UA strings
Endpoint Pattern: Analysis of accessed endpoints
Timing Regularity: Bots tend to send requests at consistent intervals, while human timing varies
Request Entropy: Diversity of behavior patterns
Failed Login Rate: Credential stuffing detection
Burst Detection: Sudden spikes in activity
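
As a rough illustration of three of the features above (timing regularity, request entropy, and burst detection), here is a minimal sketch. The function names, the 0-to-1 normalisation, and the window and threshold values are assumptions made for this example; the project's own feature extraction may compute these differently.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def timing_regularity(timestamps):
    """Return a value in [0, 1]; values near 1 mean evenly spaced requests (bot-like)."""
    if len(timestamps) < 3:
        return 0.0  # not enough intervals to judge
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    avg = mean(intervals)
    if avg <= 0:
        return 1.0  # all requests arrived at the same instant
    # Coefficient of variation: 0 for perfectly regular intervals.
    cv = pstdev(intervals) / avg
    return max(0.0, 1.0 - cv)


def request_entropy(endpoints):
    """Shannon entropy of the endpoint distribution, normalised to [0, 1]."""
    counts = Counter(endpoints)
    if len(counts) < 2:
        return 0.0  # a single endpoint carries no diversity
    total = len(endpoints)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return entropy / math.log2(len(counts))


def is_burst(timestamps, window=1.0, max_requests=10):
    """True if more than max_requests fall inside any span of `window` seconds."""
    ordered = sorted(timestamps)
    start = 0
    for end, ts in enumerate(ordered):
        # Slide the left edge forward until the span fits in the window.
        while ts - ordered[start] > window:
            start += 1
        if end - start + 1 > max_requests:
            return True
    return False
```

For example, `timing_regularity([0, 1, 2, 3, 4])` scores a perfectly periodic client as fully regular, while irregular gaps push the value towards 0.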

Deep Learning Model

Architecture: 7 → 64 → 32 → 16 → 1
Activation: ReLU + Sigmoid output
Regularization: Batch normalization + Dropout (0.3)
Training: Adam optimizer, BCE loss
Online Learning: Updates in real-time from feedback
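
To make the 7 → 64 → 32 → 16 → 1 shape concrete, the following is a dependency-free sketch of an inference-time forward pass with ReLU hidden layers and a sigmoid output. It is not the project's PyTorch model: the weights here are random stand-ins, batch normalization and dropout are omitted, and the helper names (`init_layers`, `forward`, `parameter_count`) are hypothetical.

```python
import math
import random

LAYER_SIZES = [7, 64, 32, 16, 1]  # taken from the architecture line above


def init_layers(sizes, seed=0):
    """Build random (weights, biases) pairs; a stand-in for trained parameters."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        scale = math.sqrt(2.0 / n_in)  # He-style scaling, a common choice for ReLU
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, features):
    """Run one feature vector through the network and return a bot probability."""
    activations = list(features)
    last = len(layers) - 1
    for index, (weights, biases) in enumerate(layers):
        pre = [sum(w * a for w, a in zip(row, activations)) + b
               for row, b in zip(weights, biases)]
        if index < last:
            activations = [max(0.0, z) for z in pre]  # ReLU on hidden layers
        else:
            activations = [1.0 / (1.0 + math.exp(-z)) for z in pre]  # sigmoid output
    return activations[0]


def parameter_count(sizes):
    """Weights plus biases for fully connected layers of the given sizes."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))
```

Without the batch normalization parameters, the fully connected layers alone hold 3,137 trainable values, which suggests the model is small enough for fast CPU inference.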

Action Levels

Score	Action	Description
0-49	ALLOW	Normal traffic, no intervention
50-69	MONITOR	Suspicious, logged for analysis
70-89	CHALLENGE	Show CAPTCHA to verify human
90-100	BLOCK	High confidence bot, deny access
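
The table above can be read as a simple threshold lookup. The sketch below assumes each band's lower bound is inclusive (so 49.9 is ALLOW and 50 is MONITOR); the function name and parameter names are hypothetical, and the defaults mirror the table rather than any value read from config.yaml.

```python
def action_for_score(score, monitor_at=50, challenge_at=70, block_at=90):
    """Map a 0-100 bot score to one of the four action levels in the table above."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    # Check the strictest band first so higher scores win.
    if score >= block_at:
        return "BLOCK"
    if score >= challenge_at:
        return "CHALLENGE"
    if score >= monitor_at:
        return "MONITOR"
    return "ALLOW"
```

Passing higher values for `challenge_at` and `block_at` makes the mapping more lenient, which is one way to reduce false positives.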

Database Schema 🗄️

Tables

requests: All request logs with scores and features
ip_reputation: IP-level aggregated reputation scores
failed_logins: Failed authentication attempts
model_checkpoints: Saved model states
training_data: Labeled examples for training
captcha_challenges: CAPTCHA token tracking
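
The column layout of these tables is not documented here, so the following is only a sketch of what the requests table might look like, using Python's built-in sqlite3 module. Every column name, the index name, and the helper functions are assumptions for illustration. It also shows parameterized inserts and an age-based cleanup similar in spirit to cleanup_old_data.

```python
import json
import sqlite3
import time

# Hypothetical schema: column names are illustrative, not the project's own.
SCHEMA = """
CREATE TABLE IF NOT EXISTS requests (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    ip         TEXT NOT NULL,
    user_agent TEXT,
    endpoint   TEXT,
    score      REAL,
    features   TEXT,          -- JSON-encoded feature vector
    timestamp  REAL NOT NULL  -- Unix time in seconds
);
CREATE INDEX IF NOT EXISTS idx_requests_ip_ts ON requests (ip, timestamp);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the schema exists."""
    conn = sqlite3.connect(path, timeout=30)
    conn.executescript(SCHEMA)
    return conn


def log_request(conn, ip, user_agent, endpoint, score, features, now=None):
    """Insert one request row using a parameterized query."""
    conn.execute(
        "INSERT INTO requests (ip, user_agent, endpoint, score, features, timestamp) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (ip, user_agent, endpoint, score, json.dumps(features),
         time.time() if now is None else now),
    )
    conn.commit()


def cleanup_old_requests(conn, days=30, now=None):
    """Delete rows older than `days` days and return how many were removed."""
    cutoff = (time.time() if now is None else now) - days * 86400
    cursor = conn.execute("DELETE FROM requests WHERE timestamp < ?", (cutoff,))
    conn.commit()
    return cursor.rowcount
```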

Performance Optimization 🚄

In-memory caching: Recent requests cached for fast lookup
Batch processing: Database writes grouped into batches to reduce per-request overhead
Async operations: Non-blocking I/O with FastAPI
Model inference: Optimized PyTorch forward pass
Index optimization: Database indexes on IP and timestamp

Security Considerations 🔒

SQL Injection: Parameterized queries throughout
Rate Limiting: Per-endpoint configurable limits
Input Validation: Pydantic models validate all inputs
CAPTCHA Tokens: One-time use, expiring tokens
IP Whitelisting: Protect known good IPs
Data Privacy: No account or profile data stored, only request metadata (IP address, user agent, endpoint) and behavioral features
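
One way to get the "one-time use, expiring" behaviour described for CAPTCHA tokens is sketched below. This is an in-memory illustration only: the class name, the 300-second default lifetime, and the method names are assumptions, and the project itself tracks tokens in the captcha_challenges table.

```python
import secrets
import time


class CaptchaTokenStore:
    """In-memory store of single-use tokens that expire after ttl_seconds."""

    def __init__(self, ttl_seconds=300):
        self.ttl_seconds = ttl_seconds
        self._expiry_by_token = {}

    def issue(self, now=None):
        """Create an unguessable token and remember when it expires."""
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(32)
        self._expiry_by_token[token] = now + self.ttl_seconds
        return token

    def consume(self, token, now=None):
        """Return True once for a valid, unexpired token; False otherwise."""
        now = time.time() if now is None else now
        # pop() removes the token whether or not it is still valid,
        # so a second attempt with the same token always fails.
        expires_at = self._expiry_by_token.pop(token, None)
        return expires_at is not None and now <= expires_at
```

Removing the token on every lookup, valid or not, keeps replay attempts from succeeding even if the first attempt arrived after expiry.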

Monitoring & Maintenance 📊

Check System Health

# Get statistics
stats = detector.db.get_stats()
print(f"Bot detection rate: {stats['bot_percentage']:.1f}%")

# Get IP reputation
rep = detector.db.get_ip_reputation("192.168.1.100")
print(f"Reputation score: {rep['reputation_score']}")

Clean Old Data

# Remove data older than 30 days
detector.db.cleanup_old_data(days=30)

Model Retraining

# Retrain with latest data
detector.batch_train(epochs=20, batch_size=64)

# Save checkpoint
detector.save_model()

Production Deployment 🌐

Using Gunicorn

pip install gunicorn
gunicorn -w 4 -k uvicorn.workers.UvicornWorker bot_detector:app

Using Docker

FROM python:3.10-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD ["uvicorn", "bot_detector:app", "--host", "0.0.0.0", "--port", "8000"]

Environment Variables

export DB_PATH="/var/lib/bot_detection/bot_detection.db"
export CAPTCHA_SECRET="your_secret_key"
export API_HOST="0.0.0.0"
export API_PORT="8000"
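
How the application reads these variables is not shown here. A minimal sketch of loading them in Python follows; the `Settings` class, the `load_settings` function, and the fallback values are assumptions chosen for local development, not the project's actual defaults.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    db_path: str
    captcha_secret: str
    api_host: str
    api_port: int


def load_settings(env=None):
    """Read the four variables listed above, with fallbacks for local development."""
    env = os.environ if env is None else env
    return Settings(
        db_path=env.get("DB_PATH", "bot_detection.db"),
        captcha_secret=env.get("CAPTCHA_SECRET", ""),
        api_host=env.get("API_HOST", "127.0.0.1"),
        api_port=int(env.get("API_PORT", "8000")),  # env values are always strings
    )
```

Accepting an optional mapping instead of always reading os.environ makes the loader easy to exercise in tests.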

Advanced Features 🎯

IP Reputation API Integration

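# Integrate with AbuseIPDB, IPQualityScore, etc.
import requests

def check_ip_reputation(ip):
    """Look up an IP address with the AbuseIPDB v2 check endpoint."""
    response = requests.get(
        "https://api.abuseipdb.com/api/v2/check",
        params={"ipAddress": ip},
        headers={"Key": "YOUR_API_KEY", "Accept": "application/json"},
        timeout=10,  # avoid hanging the request path on a slow lookup
    )
    response.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return response.json()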

Fingerprinting (Browser-based)

// Client-side fingerprinting
const fingerprint = {
    canvas: getCanvasFingerprint(),
    webgl: getWebGLFingerprint(),
    fonts: getFontFingerprint(),
    plugins: getPluginFingerprint()
};

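// Send with requests (/check expects a POST with a JSON body)
fetch('/check', {
    method: 'POST',
    headers: {
        'Content-Type': 'application/json',
        'X-Fingerprint': JSON.stringify(fingerprint)
    },
    body: JSON.stringify({
        user_agent: navigator.userAgent,
        endpoint: window.location.pathname
    })
});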

Troubleshooting 🔧

PyTorch Not Available

# Install PyTorch
pip install torch torchvision

# For CPU-only (smaller)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu

Database Locked Error

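# Increase timeout
import sqlite3
conn = sqlite3.connect("bot_detection.db", timeout=30)  # seconds to wait on a lock
conn.execute("PRAGMA busy_timeout = 30000")  # same limit, in milliseconds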

High False Positives

# Adjust config.yaml thresholds
detection:
  block_threshold: 95  # More lenient
  challenge_threshold: 80

Contributing 🤝

Contributions welcome! Areas for improvement:

Additional ML models (Random Forest, XGBoost)
More sophisticated fingerprinting
Real-time dashboard UI
Distributed deployment support
More CAPTCHA providers

License 📄

MIT License - See LICENSE file

Support 💬

Issues: GitHub Issues
Documentation: /docs
API Reference: http://localhost:8000/docs

Built with ❤️ for protecting web applications from automated abuse.
