melisasvr/Advanced-Bot-Detection-System

Advanced Bot Detection System 🤖🛡️

A production-ready, AI-powered bot detection system with deep learning, SQLite persistence, CAPTCHA integration, and FastAPI.

Features ✨

  • 🧠 Deep Learning: Neural network with 64→32→16 architecture for advanced pattern recognition
  • 💾 Persistent Storage: SQLite database for tracking, training data, and model checkpoints
  • 🔐 CAPTCHA Integration: Support for Google reCAPTCHA, hCaptcha, and Cloudflare Turnstile
  • ⚡ FastAPI Integration: RESTful API with built-in rate limiting
  • 📊 Real-time Detection: 7-feature analysis including timing, entropy, and burst detection
  • 🎯 Online Learning: Continuous model improvement from feedback
  • 🚦 Multi-level Actions: ALLOW, MONITOR, CHALLENGE, BLOCK
  • 📈 Analytics Dashboard: Track bot traffic, IP reputation, and model performance

Architecture

┌──────────────────────────────────────────────────────────────┐
│                      FastAPI Endpoints                       │
│      /check  /feedback  /captcha/verify  /stats  /train      │
└───────────────────────────────┬──────────────────────────────┘
                                │
┌───────────────────────────────▼──────────────────────────────┐
│                    Advanced Bot Detector                     │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐    │
│  │   Feature    │    │ Deep Learning│    │   CAPTCHA    │    │
│  │  Extraction  │    │    Model     │    │   Manager    │    │
│  └──────────────┘    └──────────────┘    └──────────────┘    │
└───────────────────────────────┬──────────────────────────────┘
                                │
┌───────────────────────────────▼──────────────────────────────┐
│                       SQLite Database                        │
│    Requests | IP Reputation | Training Data | Checkpoints    │
└──────────────────────────────────────────────────────────────┘

Installation 🚀

Prerequisites

  • Python 3.8+
  • pip

Quick Start

# 1. Clone or download the files
git clone <repository-url>
cd Advanced-Bot-Detection-System

# 2. Install dependencies
pip install -r requirements.txt

# 3. Run the demo
python bot_detector.py

# 4. Start the API server
uvicorn bot_detector:app --reload

# 5. Visit API documentation
# Open browser: http://localhost:8000/docs

Configuration ⚙️

Edit config.yaml to customize:

  • Detection thresholds
  • Rate limiting rules
  • CAPTCHA provider settings
  • Deep learning parameters
  • Database settings

Usage Examples

Python Integration

from bot_detector import AdvancedBotDetector

# Initialize
detector = AdvancedBotDetector()

# Check a request
result = detector.check_request(
    ip="192.168.1.100",
    user_agent="Mozilla/5.0...",
    endpoint="/api/products"
)

print(f"Is Bot: {result['is_bot']}")
print(f"Score: {result['score']:.2f}/100")
print(f"Action: {result['action']}")

# Provide feedback for learning
detector.provide_feedback(
    ip="192.168.1.100",
    user_agent="Mozilla/5.0...",
    endpoint="/api/products",
    is_bot=False  # Confirmed human
)

# Train model
detector.batch_train(epochs=10, batch_size=32)

API Usage

Check Request

curl -X POST "http://localhost:8000/check" \
  -H "Content-Type: application/json" \
  -d '{
    "user_agent": "python-requests/2.28.0",
    "endpoint": "/api/data"
  }'

Response:

{
  "is_bot": true,
  "score": 85.6,
  "ml_score": 89.2,
  "ml_probability": 0.892,
  "action": "CHALLENGE",
  "captcha_token": "abc123...",
  "features": [0.1, 0.8, 0.15, 0.2, 0.5, 0.0, 0.0]
}

Provide Feedback

curl -X POST "http://localhost:8000/feedback" \
  -H "Content-Type: application/json" \
  -d '{
    "user_agent": "curl/7.68.0",
    "endpoint": "/api/data",
    "is_bot": true
  }'

Verify CAPTCHA

curl -X POST "http://localhost:8000/captcha/verify" \
  -H "Content-Type: application/json" \
  -d '{
    "token": "captcha_token_here",
    "response": "user_captcha_response"
  }'

Get Statistics

curl http://localhost:8000/stats

Response:

{
  "total_requests": 1523,
  "bot_requests": 456,
  "human_requests": 1067,
  "bot_percentage": 29.9,
  "unique_ips": 234,
  "blocked_ips": 12,
  "training_examples": 567,
  "deep_learning_enabled": true
}

Trigger Training

curl -X POST "http://localhost:8000/train?epochs=10&batch_size=32"
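For scripted clients, the same calls can be made from Python without extra dependencies. This is a minimal sketch using only the standard library, assuming the server from Quick Start is listening on `http://localhost:8000` and accepts the JSON bodies shown in the curl examples; `build_json_post` and `post_json` are hypothetical helpers, not part of `bot_detector`:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: local server started as in Quick Start


def build_json_post(path: str, payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request carrying a JSON body, like the curl examples above."""
    return urllib.request.Request(
        url=base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload: dict, base_url: str = BASE_URL, timeout: float = 5.0) -> dict:
    """Send the request and decode the JSON response (needs a running server)."""
    request = build_json_post(path, payload, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `post_json("/check", {"user_agent": "curl/7.68.0", "endpoint": "/api/data"})["action"]` returns the action string. Note that `urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses.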

FastAPI Middleware Integration

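from fastapi import FastAPI, Request
from fastapi.concurrency import run_in_threadpool
from fastapi.responses import JSONResponse
from bot_detector import AdvancedBotDetector

app = FastAPI()
detector = AdvancedBotDetector()

# Paths a challenged client must still be able to reach
EXEMPT_PATHS = {"/captcha/verify"}

@app.middleware("http")
async def bot_detection_middleware(request: Request, call_next):
    if request.url.path in EXEMPT_PATHS:
        return await call_next(request)

    # request.client can be None (e.g. with some test clients)
    client_ip = request.client.host if request.client else ""

    # check_request is synchronous (SQLite + model inference), so run it
    # in a worker thread instead of blocking the event loop
    result = await run_in_threadpool(
        detector.check_request,
        ip=client_ip,
        user_agent=request.headers.get("user-agent", ""),
        endpoint=request.url.path,
    )

    # Block high-confidence bots
    if result["action"] == "BLOCK":
        return JSONResponse(
            status_code=403,
            content={"error": "Access denied"},
        )

    # Challenge suspicious requests with a CAPTCHA
    if result["action"] == "CHALLENGE":
        return JSONResponse(
            status_code=403,
            content={
                "error": "CAPTCHA required",
                "captcha_token": result["captcha_token"],
            },
        )

    # ALLOW and MONITOR continue to the endpoint
    return await call_next(request)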

Features Explained 📚

7-Feature Detection System

  1. Request Rate: Normalized request frequency per IP
  2. User Agent Score: Detection of bot signatures in UA strings
  3. Endpoint Pattern: Analysis of accessed endpoints
  4. Timing Regularity: Bots have consistent intervals, humans vary
  5. Request Entropy: Diversity of behavior patterns
  6. Failed Login Rate: Credential stuffing detection
  7. Burst Detection: Sudden spikes in activity
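The exact formulas live in `bot_detector.py`; as an illustration only, three of these signals could be computed as below. The function names, the coefficient-of-variation and normalized-entropy choices, and the 10 s / 300 s windows are assumptions for this sketch, not the project's implementation:

```python
import math
from collections import Counter
from statistics import mean, pstdev
from typing import Sequence


def timing_regularity(timestamps: Sequence[float]) -> float:
    """Score in [0, 1] for sorted timestamps; 1.0 means perfectly regular intervals (bot-like)."""
    if len(timestamps) < 3:
        return 0.0  # fewer than two intervals: not enough evidence
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    avg = mean(intervals)
    if avg <= 0:
        return 1.0  # all requests at the same instant
    cv = pstdev(intervals) / avg  # coefficient of variation of the gaps
    return 1.0 / (1.0 + cv)  # cv = 0 -> 1.0, large cv -> close to 0


def endpoint_entropy(endpoints: Sequence[str]) -> float:
    """Shannon entropy of visited endpoints, normalized to [0, 1]."""
    counts = Counter(endpoints)
    if len(counts) <= 1:
        return 0.0
    total = len(endpoints)
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return h / math.log2(len(counts))  # divide by the maximum possible entropy


def burst_ratio(timestamps: Sequence[float], now: float,
                window: float = 10.0, baseline: float = 300.0) -> float:
    """Fraction of the requests from the last `baseline` seconds that fall in the last `window` seconds."""
    recent = sum(1 for t in timestamps if now - t <= baseline)
    if recent == 0:
        return 0.0
    burst = sum(1 for t in timestamps if now - t <= window)
    return burst / recent
```

Each helper returns a value in [0, 1], the same range as the example `features` array in the `/check` response. Regular timing and a burst ratio near 1.0 both point toward automation.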

Deep Learning Model

  • Architecture: 7 → 64 → 32 → 16 → 1
  • Activation: ReLU + Sigmoid output
  • Regularization: Batch normalization + Dropout (0.3)
  • Training: Adam optimizer, BCE loss
  • Online Learning: Updates in real-time from feedback
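A minimal PyTorch sketch of the architecture listed above. The layer order (Linear, BatchNorm, ReLU, Dropout) is an assumption, and `build_model` and `online_update` are illustrative names, not the project's API:

```python
import torch
from torch import nn


def build_model(n_features: int = 7, dropout: float = 0.3) -> nn.Sequential:
    """MLP 7 -> 64 -> 32 -> 16 -> 1 with batch norm, ReLU, dropout and a sigmoid output."""
    sizes = [n_features, 64, 32, 16]
    layers = []
    for in_dim, out_dim in zip(sizes, sizes[1:]):
        layers += [
            nn.Linear(in_dim, out_dim),
            nn.BatchNorm1d(out_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
        ]
    layers += [nn.Linear(sizes[-1], 1), nn.Sigmoid()]  # output = P(bot)
    return nn.Sequential(*layers)


def online_update(model: nn.Module, optimizer: torch.optim.Optimizer,
                  features: torch.Tensor, labels: torch.Tensor) -> float:
    """One gradient step with binary cross-entropy on labeled feedback (1.0 = bot, 0.0 = human)."""
    model.train()
    optimizer.zero_grad()
    probs = model(features).squeeze(1)  # shape (batch,)
    loss = nn.BCELoss()(probs, labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Typical use is `optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)` (the learning rate is an assumption). Because `BatchNorm1d` cannot compute batch statistics from a single sample in training mode, online updates need batches of at least two examples, for instance buffered feedback. Call `model.eval()` before scoring individual requests.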

Action Levels

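| Score  | Action    | Description                          |
|--------|-----------|--------------------------------------|
| 0-49   | ALLOW     | Normal traffic, no intervention      |
| 50-69  | MONITOR   | Suspicious, logged for analysis      |
| 70-89  | CHALLENGE | Show a CAPTCHA to verify a human     |
| 90-100 | BLOCK     | High-confidence bot, deny access     |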
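Scores are floats (the example response shows 85.6), so the integer ranges above leave small gaps such as 49.5. A sketch of the mapping, assuming each threshold is inclusive at its lower bound (49.9 is ALLOW, 50.0 is MONITOR); `Thresholds` and `score_to_action` are hypothetical names, with defaults copied from the table:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Lower bounds (inclusive) for each non-ALLOW action; defaults follow the table."""
    monitor: float = 50.0
    challenge: float = 70.0
    block: float = 90.0

    def __post_init__(self) -> None:
        if not self.monitor <= self.challenge <= self.block:
            raise ValueError("thresholds must satisfy monitor <= challenge <= block")


def score_to_action(score: float, thresholds: Thresholds = Thresholds()) -> str:
    """Map a 0-100 bot score to ALLOW, MONITOR, CHALLENGE or BLOCK."""
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"score must be in [0, 100], got {score}")
    if score >= thresholds.block:
        return "BLOCK"
    if score >= thresholds.challenge:
        return "CHALLENGE"
    if score >= thresholds.monitor:
        return "MONITOR"
    return "ALLOW"
```

Raising the thresholds as in the "High False Positives" example under Troubleshooting would correspond to `Thresholds(challenge=80, block=95)`.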

Database Schema 🗄️

Tables

  • requests: All request logs with scores and features
  • ip_reputation: IP-level aggregated reputation scores
  • failed_logins: Failed authentication attempts
  • model_checkpoints: Saved model states
  • training_data: Labeled examples for training
  • captcha_challenges: CAPTCHA token tracking
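The real column definitions are in the source. The sketch below shows one plausible layout for two of these tables, with the IP and timestamp indexes mentioned under Performance Optimization and parameterized queries throughout. All column names and the upsert logic are assumptions, and the `ON CONFLICT ... DO UPDATE` syntax needs SQLite 3.24 or newer:

```python
import json
import sqlite3
from typing import Sequence

# Hypothetical layout for two of the tables listed above
SCHEMA = """
CREATE TABLE IF NOT EXISTS requests (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    ip         TEXT NOT NULL,
    user_agent TEXT,
    endpoint   TEXT,
    score      REAL,
    action     TEXT,
    features   TEXT,             -- JSON-encoded feature vector
    is_bot     INTEGER NOT NULL,
    timestamp  REAL NOT NULL     -- Unix epoch seconds
);
CREATE INDEX IF NOT EXISTS idx_requests_ip ON requests(ip);
CREATE INDEX IF NOT EXISTS idx_requests_timestamp ON requests(timestamp);

CREATE TABLE IF NOT EXISTS ip_reputation (
    ip             TEXT PRIMARY KEY,
    total_requests INTEGER NOT NULL DEFAULT 0,
    bot_requests   INTEGER NOT NULL DEFAULT 0,
    last_seen      REAL
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def log_request(conn: sqlite3.Connection, ip: str, user_agent: str, endpoint: str,
                score: float, action: str, features: Sequence[float],
                is_bot: bool, timestamp: float) -> None:
    """Insert one request row and update the per-IP aggregate in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO requests (ip, user_agent, endpoint, score, action, features, is_bot, timestamp) "
            "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",  # placeholders, never string formatting
            (ip, user_agent, endpoint, score, action, json.dumps(list(features)), int(is_bot), timestamp),
        )
        conn.execute(
            """INSERT INTO ip_reputation (ip, total_requests, bot_requests, last_seen)
               VALUES (?, 1, ?, ?)
               ON CONFLICT(ip) DO UPDATE SET
                   total_requests = total_requests + 1,
                   bot_requests = bot_requests + excluded.bot_requests,
                   last_seen = excluded.last_seen""",
            (ip, int(is_bot), timestamp),
        )
```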

Performance Optimization 🚄

  • In-memory caching: Recent requests cached for fast lookup
  • Batch processing: Database writes optimized
  • Async operations: Non-blocking I/O with FastAPI
  • Model inference: Optimized PyTorch forward pass
  • Index optimization: Database indexes on IP and timestamp
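One way to implement the in-memory cache of recent requests is a per-IP deque of timestamps trimmed to a sliding window. This is a sketch under assumptions (a 60 s window, at most 1,000 entries per IP, a hypothetical class name), not the project's cache:

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class RecentRequestCache:
    """Per-IP timestamps of recent requests, trimmed to a sliding time window."""

    def __init__(self, window_seconds: float = 60.0, max_per_ip: int = 1000) -> None:
        self.window = window_seconds
        # maxlen bounds memory for very noisy IPs (counts cap at max_per_ip)
        self._hits: Dict[str, Deque[float]] = defaultdict(lambda: deque(maxlen=max_per_ip))

    def _evict(self, hits: Deque[float], now: float) -> None:
        # Drop timestamps that have fallen out of the window
        while hits and now - hits[0] > self.window:
            hits.popleft()

    def record(self, ip: str, now: Optional[float] = None) -> int:
        """Record a request and return how many requests this IP made in the window."""
        now = time.time() if now is None else now
        hits = self._hits[ip]
        hits.append(now)
        self._evict(hits, now)
        return len(hits)

    def count(self, ip: str, now: Optional[float] = None) -> int:
        """Return the number of requests from this IP inside the window."""
        now = time.time() if now is None else now
        hits = self._hits.get(ip)
        if hits is None:
            return 0
        self._evict(hits, now)
        if not hits:
            del self._hits[ip]  # forget idle IPs so the dict does not grow forever
        return len(hits)
```

The cache lives inside one process, so when the API runs with several workers each worker only sees its own share of the traffic.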

Security Considerations 🔒

  • SQL Injection: Parameterized queries throughout
  • Rate Limiting: Per-endpoint configurable limits
  • Input Validation: Pydantic models validate all inputs
  • CAPTCHA Tokens: One-time use, expiring tokens
  • IP Whitelisting: Protect known good IPs
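  • Data Privacy: No account or profile data is stored, only behavioral data; note that IP addresses and user-agent strings are logged, and IP addresses can count as personal data under laws such as the GDPR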
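The one-time, expiring token property can be enforced with a single conditional `UPDATE` that marks a token as used only if it is still unused and unexpired, so two concurrent verifications cannot both succeed. The table layout, the 5-minute TTL and binding the token to the issuing IP are assumptions for this sketch:

```python
import secrets
import sqlite3
import time
from typing import Optional


def init_captcha_table(conn: sqlite3.Connection) -> None:
    with conn:
        conn.execute(
            """CREATE TABLE IF NOT EXISTS captcha_challenges (
                   token      TEXT PRIMARY KEY,
                   ip         TEXT NOT NULL,
                   expires_at REAL NOT NULL,  -- Unix epoch seconds
                   used       INTEGER NOT NULL DEFAULT 0
               )"""
        )


def issue_token(conn: sqlite3.Connection, ip: str, ttl_seconds: float = 300.0,
                now: Optional[float] = None) -> str:
    """Create an unguessable token that expires after ttl_seconds."""
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(32)
    with conn:
        conn.execute(
            "INSERT INTO captcha_challenges (token, ip, expires_at) VALUES (?, ?, ?)",
            (token, ip, now + ttl_seconds),
        )
    return token


def consume_token(conn: sqlite3.Connection, token: str, ip: str,
                  now: Optional[float] = None) -> bool:
    """Atomically mark the token used; True only if it was unused, unexpired and issued to this IP."""
    now = time.time() if now is None else now
    with conn:
        cur = conn.execute(
            "UPDATE captcha_challenges SET used = 1 "
            "WHERE token = ? AND ip = ? AND used = 0 AND expires_at > ?",
            (token, ip, now),
        )
    return cur.rowcount == 1
```

This only tracks the local challenge token; the user's CAPTCHA response still has to be verified server-side with the provider (reCAPTCHA, hCaptcha or Turnstile). Binding the token to an IP can reject legitimate users whose address changes mid-challenge, for example on mobile networks.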

Monitoring & Maintenance 📊

Check System Health

# Get statistics
stats = detector.db.get_stats()
print(f"Bot detection rate: {stats['bot_percentage']:.1f}%")

# Get IP reputation
rep = detector.db.get_ip_reputation("192.168.1.100")
print(f"Reputation score: {rep['reputation_score']}")

Clean Old Data

# Remove data older than 30 days
detector.db.cleanup_old_data(days=30)
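Conceptually, this cleanup is a time-bounded `DELETE`. A sketch, assuming a `requests` table with a Unix-epoch `timestamp` column (a hypothetical layout; the real method may also prune other tables):

```python
import sqlite3
import time
from typing import Optional

SECONDS_PER_DAY = 86_400


def cleanup_old_data(conn: sqlite3.Connection, days: int = 30,
                     now: Optional[float] = None) -> int:
    """Delete request rows older than `days` days; return how many rows were removed."""
    now = time.time() if now is None else now
    cutoff = now - days * SECONDS_PER_DAY
    with conn:  # one transaction, committed on success
        cur = conn.execute("DELETE FROM requests WHERE timestamp < ?", (cutoff,))
    return cur.rowcount
```

By default SQLite does not shrink the database file after deletes; running `VACUUM` afterwards reclaims the space.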

Model Retraining

# Retrain with latest data
detector.batch_train(epochs=20, batch_size=64)

# Save checkpoint
detector.save_model()

Production Deployment 🌐

Using Gunicorn

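pip install gunicorn
# 4 worker processes: each loads its own model and in-memory cache,
# and all of them share the same SQLite file
gunicorn -w 4 -k uvicorn.workers.UvicornWorker bot_detector:app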

Using Docker

FROM python:3.10-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD ["uvicorn", "bot_detector:app", "--host", "0.0.0.0", "--port", "8000"]

Environment Variables

export DB_PATH="/var/lib/bot_detection/bot_detection.db"
export CAPTCHA_SECRET="your_secret_key"
export API_HOST="0.0.0.0"
export API_PORT="8000"
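How these variables are consumed depends on the application code. A sketch of reading them with defaults, assuming the variable names above; the default values and the `Settings` / `load_settings` names are hypothetical:

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    db_path: str
    captcha_secret: Optional[str]
    api_host: str
    api_port: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from environment variables, falling back to local defaults."""
    return Settings(
        db_path=env.get("DB_PATH", "bot_detection.db"),
        captcha_secret=env.get("CAPTCHA_SECRET"),  # None means CAPTCHA is not configured
        api_host=env.get("API_HOST", "127.0.0.1"),
        api_port=int(env.get("API_PORT", "8000")),
    )
```

Uvicorn itself does not read `API_HOST` or `API_PORT`; the Dockerfile's `CMD` passes `--host` and `--port` explicitly, so the application or a launcher script has to forward these values.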

Advanced Features 🎯

IP Reputation API Integration

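# Integrate with AbuseIPDB, IPQualityScore, etc.
import requests

def check_ip_reputation(ip, api_key="YOUR_API_KEY"):
    response = requests.get(
        "https://api.abuseipdb.com/api/v2/check",
        params={"ipAddress": ip},
        headers={"Key": api_key, "Accept": "application/json"},
        timeout=5,  # never let a slow third-party API stall detection
    )
    response.raise_for_status()  # surface auth or quota errors instead of parsing an error body
    return response.json()  # score is under data.abuseConfidenceScore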

Fingerprinting (Browser-based)

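// Client-side fingerprinting. The get*Fingerprint helpers are placeholders
// you need to implement (or replace with a fingerprinting library).
const fingerprint = {
    canvas: getCanvasFingerprint(),
    webgl: getWebGLFingerprint(),
    fonts: getFontFingerprint(),
    plugins: getPluginFingerprint()
};

// Send with requests: /check expects a POST with a JSON body
fetch('/check', {
    method: 'POST',
    headers: {
        'Content-Type': 'application/json',
        'X-Fingerprint': JSON.stringify(fingerprint)
    },
    body: JSON.stringify({
        user_agent: navigator.userAgent,
        endpoint: window.location.pathname
    })
});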

Troubleshooting 🔧

PyTorch Not Available

# Install PyTorch
pip install torch torchvision

# For CPU-only (smaller)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu

Database Locked Error

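# Increase the lock timeout and enable WAL mode
import sqlite3

conn = sqlite3.connect("bot_detection.db")  # example path; use your DB_PATH
conn.execute("PRAGMA busy_timeout = 30000")  # wait up to 30 s for a lock
conn.execute("PRAGMA journal_mode=WAL")  # readers no longer block the writer, and vice versa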

High False Positives

# Adjust config.yaml thresholds
detection:
  block_threshold: 95  # More lenient
  challenge_threshold: 80

Contributing 🤝

Contributions welcome! Areas for improvement:

  • Additional ML models (Random Forest, XGBoost)
  • More sophisticated fingerprinting
  • Real-time dashboard UI
  • Distributed deployment support
  • More CAPTCHA providers

License 📄

  • MIT License - See LICENSE file

Support 💬


Built with ❤️ for protecting web applications from automated abuse.
