A production-ready, AI-powered bot detection system with deep learning, SQLite persistence, CAPTCHA integration, and FastAPI.
- Deep Learning: Neural network with 64→32→16 architecture for advanced pattern recognition
- Persistent Storage: SQLite database for tracking, training data, and model checkpoints
- CAPTCHA Integration: Support for Google reCAPTCHA, hCaptcha, and Cloudflare Turnstile
- FastAPI Integration: RESTful API with built-in rate limiting
- Real-time Detection: 7-feature analysis including timing, entropy, and burst detection
- Online Learning: Continuous model improvement from feedback
- Multi-level Actions: ALLOW, MONITOR, CHALLENGE, BLOCK
- Analytics Dashboard: Track bot traffic, IP reputation, and model performance
┌─────────────────────────────────────────────────────────────┐
│                      FastAPI Endpoints                      │
│     /check  /feedback  /captcha/verify  /stats  /train      │
└────────────────────┬────────────────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────────────────┐
│                    Advanced Bot Detector                    │
│    ┌──────────────┐  ┌──────────────┐  ┌──────────────┐     │
│    │   Feature    │  │ Deep Learning│  │   CAPTCHA    │     │
│    │  Extraction  │  │    Model     │  │   Manager    │     │
│    └──────────────┘  └──────────────┘  └──────────────┘     │
└────────────────────┬────────────────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────────────────┐
│                       SQLite Database                       │
│   Requests | IP Reputation | Training Data | Checkpoints    │
└─────────────────────────────────────────────────────────────┘
- Python 3.8+
- pip
# 1. Clone or download the files
git clone <repository-url>
cd bot-detection
# 2. Install dependencies
pip install -r requirements.txt
# 3. Run the demo
python bot_detector.py
# 4. Start the API server
uvicorn bot_detector:app --reload
# 5. Visit API documentation
# Open browser: http://localhost:8000/docs

Edit config.yaml to customize:
- Detection thresholds
- Rate limiting rules
- CAPTCHA provider settings
- Deep learning parameters
- Database settings
from bot_detector import AdvancedBotDetector
# Initialize
detector = AdvancedBotDetector()
# Check a request
result = detector.check_request(
    ip="192.168.1.100",
    user_agent="Mozilla/5.0...",
    endpoint="/api/products"
)
print(f"Is Bot: {result['is_bot']}")
print(f"Score: {result['score']:.2f}/100")
print(f"Action: {result['action']}")
# Provide feedback for learning
detector.provide_feedback(
    ip="192.168.1.100",
    user_agent="Mozilla/5.0...",
    endpoint="/api/products",
    is_bot=False  # Confirmed human
)
# Train model
detector.batch_train(epochs=10, batch_size=32)

curl -X POST "http://localhost:8000/check" \
  -H "Content-Type: application/json" \
  -d '{
    "user_agent": "python-requests/2.28.0",
    "endpoint": "/api/data"
  }'

Response:
{
  "is_bot": true,
  "score": 85.6,
  "ml_score": 89.2,
  "ml_probability": 0.892,
  "action": "CHALLENGE",
  "captcha_token": "abc123...",
  "features": [0.1, 0.8, 0.15, 0.2, 0.5, 0.0, 0.0]
}

curl -X POST "http://localhost:8000/feedback" \
  -H "Content-Type: application/json" \
  -d '{
    "user_agent": "curl/7.68.0",
    "endpoint": "/api/data",
    "is_bot": true
  }'

curl -X POST "http://localhost:8000/captcha/verify" \
  -H "Content-Type: application/json" \
  -d '{
    "token": "captcha_token_here",
    "response": "user_captcha_response"
  }'

curl http://localhost:8000/stats

Response:
{
  "total_requests": 1523,
  "bot_requests": 456,
  "human_requests": 1067,
  "bot_percentage": 29.9,
  "unique_ips": 234,
  "blocked_ips": 12,
  "training_examples": 567,
  "deep_learning_enabled": true
}

curl -X POST "http://localhost:8000/train?epochs=10&batch_size=32"

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse  # needed for the 403 responses below
from bot_detector import AdvancedBotDetector

app = FastAPI()
detector = AdvancedBotDetector()

@app.middleware("http")
async def bot_detection_middleware(request: Request, call_next):
    # Check request (request.client can be None, so fall back to an empty IP)
    result = detector.check_request(
        ip=request.client.host if request.client else "",
        user_agent=request.headers.get("user-agent", ""),
        endpoint=request.url.path
    )
    # Block bots
    if result["action"] == "BLOCK":
        return JSONResponse(
            status_code=403,
            content={"error": "Access denied"}
        )
    # Challenge suspicious requests
    if result["action"] == "CHALLENGE":
        return JSONResponse(
            status_code=403,
            content={
                "error": "CAPTCHA required",
                "captcha_token": result["captcha_token"]
            }
        )
    # Continue to endpoint
    response = await call_next(request)
    return response

- Request Rate: Normalized request frequency per IP
- User Agent Score: Detection of bot signatures in UA strings
- Endpoint Pattern: Analysis of accessed endpoints
- Timing Regularity: Bots have consistent intervals, humans vary
- Request Entropy: Diversity of behavior patterns
- Failed Login Rate: Credential stuffing detection
- Burst Detection: Sudden spikes in activity
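
To make three of the behavioural features above more concrete, here is a minimal standard-library sketch. The function names and the normalisation choices (coefficient of variation for timing, entropy divided by its maximum, a one-second burst window) are assumptions made for illustration; the detector's own feature extraction may differ.

```python
import math
from collections import Counter


def timing_regularity(timestamps):
    """Score in (0, 1]; values near 1 mean near-constant gaps between requests."""
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if len(intervals) < 2:
        return 0.0  # not enough data to judge
    mean = sum(intervals) / len(intervals)
    if mean <= 0:
        return 1.0  # every request arrived at the same instant
    variance = sum((x - mean) ** 2 for x in intervals) / len(intervals)
    cv = math.sqrt(variance) / mean  # coefficient of variation
    return 1.0 / (1.0 + cv)


def request_entropy(endpoints):
    """Shannon entropy of the visited endpoints, normalised to [0, 1]."""
    counts = Counter(endpoints)
    if len(counts) < 2:
        return 0.0  # a single endpoint carries no diversity
    total = len(endpoints)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return entropy / math.log2(len(counts))


def burst_score(timestamps, window=1.0, limit=10):
    """Largest request count inside any `window` seconds, scaled by `limit`, capped at 1."""
    ts = sorted(timestamps)
    best, start = 0, 0
    for end in range(len(ts)):
        # Shrink the window from the left until it spans at most `window` seconds
        while ts[end] - ts[start] > window:
            start += 1
        best = max(best, end - start + 1)
    return min(best / limit, 1.0)


bot_times = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]       # perfectly regular
human_times = [0.0, 0.4, 3.1, 3.9, 9.5, 10.2]    # irregular
print(timing_regularity(bot_times), timing_regularity(human_times))
```

Each function returns a value in the 0 to 1 range so the results can be fed into a model as a fixed-length feature vector.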
- Architecture: 7 → 64 → 32 → 16 → 1
- Activation: ReLU + Sigmoid output
- Regularization: Batch normalization + Dropout (0.3)
- Training: Adam optimizer, BCE loss
- Online Learning: Updates in real-time from feedback
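
The list above could be expressed in PyTorch roughly as follows. This is a sketch under assumptions: the class name `BotClassifier`, the layer ordering (Linear, BatchNorm, ReLU, Dropout), the learning rate, and the random training data are all illustrative and not taken from the project's code.

```python
import torch
from torch import nn


class BotClassifier(nn.Module):
    """7 -> 64 -> 32 -> 16 -> 1 feed-forward network with a sigmoid output."""

    def __init__(self, n_features=7, dropout=0.3):
        super().__init__()
        layers = []
        width_in = n_features
        for width_out in (64, 32, 16):
            layers += [
                nn.Linear(width_in, width_out),
                nn.BatchNorm1d(width_out),
                nn.ReLU(),
                nn.Dropout(dropout),
            ]
            width_in = width_out
        layers += [nn.Linear(width_in, 1), nn.Sigmoid()]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)


model = BotClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # lr is an assumed value
loss_fn = nn.BCELoss()

# One training step on random placeholder data (8 examples, 7 features each)
features = torch.rand(8, 7)
labels = torch.randint(0, 2, (8, 1)).float()
model.train()
optimizer.zero_grad()
loss = loss_fn(model(features), labels)
loss.backward()
optimizer.step()

# Inference: eval mode disables dropout and uses BatchNorm running statistics
model.eval()
with torch.no_grad():
    probs = model(features)
print(probs.shape)
```

Note that batch normalization in training mode needs more than one example per batch, so a true one-example online update would have to buffer feedback into small batches or run the normalization layers in eval mode.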
| Score | Action | Description |
|---|---|---|
| 0-49 | ALLOW | Normal traffic, no intervention |
| 50-69 | MONITOR | Suspicious, logged for analysis |
| 70-89 | CHALLENGE | Show CAPTCHA to verify human |
| 90-100 | BLOCK | High confidence bot, deny access |
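
The score bands in the table translate directly into a small lookup. The function below is a hypothetical helper, not part of the documented API; it assumes fractional scores fall into the band whose lower bound they have reached, so 49.9 is still ALLOW.

```python
def score_to_action(score, monitor=50, challenge=70, block=90):
    """Map a 0-100 bot score to one of the four actions from the table."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be within 0-100, got {score}")
    if score >= block:
        return "BLOCK"
    if score >= challenge:
        return "CHALLENGE"
    if score >= monitor:
        return "MONITOR"
    return "ALLOW"


for example in (12.0, 55.0, 85.6, 97.3):
    print(example, score_to_action(example))
```

Keeping the thresholds as parameters makes it straightforward to pass in different values, for instance when loosening the block and challenge thresholds to reduce false positives.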
- requests: All request logs with scores and features
- ip_reputation: IP-level aggregated reputation scores
- failed_logins: Failed authentication attempts
- model_checkpoints: Saved model states
- training_data: Labeled examples for training
- captcha_challenges: CAPTCHA token tracking
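
As an illustration of how a table such as captcha_challenges might track one-time, expiring tokens, here is a minimal `sqlite3` sketch. The column names, the five-minute lifetime, and the `issue_token` / `redeem_token` helpers are assumptions for this example; the project's real schema is not shown in this document.

```python
import secrets
import sqlite3
import time

conn = sqlite3.connect(":memory:")  # in-memory database for the example
conn.execute(
    """CREATE TABLE captcha_challenges (
           token      TEXT PRIMARY KEY,
           ip         TEXT NOT NULL,
           expires_at REAL NOT NULL,
           used       INTEGER NOT NULL DEFAULT 0
       )"""
)


def issue_token(ip, ttl_seconds=300, now=None):
    """Create an unguessable token bound to one IP with a limited lifetime."""
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(32)
    conn.execute(
        "INSERT INTO captcha_challenges (token, ip, expires_at) VALUES (?, ?, ?)",
        (token, ip, now + ttl_seconds),
    )
    conn.commit()
    return token


def redeem_token(token, ip, now=None):
    """Mark the token as used; succeeds at most once and only before expiry."""
    now = time.time() if now is None else now
    cur = conn.execute(
        "UPDATE captcha_challenges SET used = 1 "
        "WHERE token = ? AND ip = ? AND used = 0 AND expires_at > ?",
        (token, ip, now),
    )
    conn.commit()
    return cur.rowcount == 1


token = issue_token("192.168.1.100")
print(redeem_token(token, "192.168.1.100"))  # first use
print(redeem_token(token, "192.168.1.100"))  # replay attempt
```

Doing the check and the update in a single `UPDATE` statement avoids a race between reading the token state and marking it used. Verifying the user's answer with the CAPTCHA provider is a separate step not covered here.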
- In-memory caching: Recent requests cached for fast lookup
- Batch processing: Database writes are batched to reduce per-request overhead
- Async operations: Non-blocking I/O with FastAPI
- Model inference: Optimized PyTorch forward pass
- Index optimization: Database indexes on IP and timestamp
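
The batching and indexing points can be sketched with the standard `sqlite3` module. The table layout and the index name `idx_requests_ip_ts` are hypothetical; the example only shows the general technique of writing rows in one transaction and letting a composite index on IP and timestamp serve the lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE requests ("
    "id INTEGER PRIMARY KEY, ip TEXT NOT NULL, timestamp REAL NOT NULL, score REAL)"
)
# Composite index so per-IP, time-bounded lookups avoid a full table scan
conn.execute("CREATE INDEX idx_requests_ip_ts ON requests (ip, timestamp)")

rows = [("10.0.0.1", float(t), 20.0) for t in range(100)]
rows += [("10.0.0.2", float(t), 95.0) for t in range(100)]
with conn:  # one transaction for the whole batch instead of one per row
    conn.executemany(
        "INSERT INTO requests (ip, timestamp, score) VALUES (?, ?, ?)", rows
    )

# Parameterized query: values are bound, never interpolated into the SQL text
query = "SELECT COUNT(*) FROM requests WHERE ip = ? AND timestamp > ?"
recent = conn.execute(query, ("10.0.0.2", 89.0)).fetchone()[0]

# Ask SQLite how it would run the query; the last column holds the description
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("10.0.0.2", 89.0)).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)
print(recent)
print(plan_text)
```

The exact wording of the query plan varies between SQLite versions, but it should name the index when the index is being used.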
- SQL Injection: Parameterized queries throughout
- Rate Limiting: Per-endpoint configurable limits
- Input Validation: Pydantic models validate all inputs
- CAPTCHA Tokens: One-time use, expiring tokens
- IP Whitelisting: Protect known good IPs
- Data Privacy: PII not stored, only behavioral data
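
Per-endpoint rate limiting can be implemented in several ways; one common option is a sliding-window counter. The class below is a hypothetical sketch, not the project's limiter: the class name, the `(max_requests, window_seconds)` configuration shape, and the default limit are all assumed.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per `window_seconds` for each (ip, endpoint)."""

    def __init__(self, limits, default=(60, 60.0)):
        self.limits = limits            # endpoint -> (max_requests, window_seconds)
        self.default = default          # used for endpoints without their own rule
        self.hits = defaultdict(deque)  # (ip, endpoint) -> recent request times

    def allow(self, ip, endpoint, now=None):
        now = time.monotonic() if now is None else now
        max_requests, window = self.limits.get(endpoint, self.default)
        recent = self.hits[(ip, endpoint)]
        # Drop timestamps that have fallen out of the window
        while recent and now - recent[0] >= window:
            recent.popleft()
        if len(recent) >= max_requests:
            return False
        recent.append(now)
        return True


limiter = SlidingWindowLimiter({"/check": (3, 1.0)})
results = [
    limiter.allow("203.0.113.7", "/check", now=t)
    for t in (0.0, 0.1, 0.2, 0.3, 1.05)
]
print(results)
```

This version keeps state in process memory, so with several worker processes each worker would enforce its own independent limit.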
# Get statistics
stats = detector.db.get_stats()
print(f"Bot detection rate: {stats['bot_percentage']:.1f}%")
# Get IP reputation
rep = detector.db.get_ip_reputation("192.168.1.100")
print(f"Reputation score: {rep['reputation_score']}")

# Remove data older than 30 days
detector.db.cleanup_old_data(days=30)

# Retrain with latest data
detector.batch_train(epochs=20, batch_size=64)
# Save checkpoint
detector.save_model()

pip install gunicorn
gunicorn -w 4 -k uvicorn.workers.UvicornWorker bot_detector:app

FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["uvicorn", "bot_detector:app", "--host", "0.0.0.0", "--port", "8000"]

export DB_PATH="/var/lib/bot_detection/bot_detection.db"
export CAPTCHA_SECRET="your_secret_key"
export API_HOST="0.0.0.0"
export API_PORT="8000"

# Integrate with AbuseIPDB, IPQualityScore, etc.
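import requests

def check_ip_reputation(ip):
    # Query the AbuseIPDB v2 "check" endpoint for a single IP address
    response = requests.get(
        "https://api.abuseipdb.com/api/v2/check",
        params={"ipAddress": ip},
        headers={"Key": "YOUR_API_KEY", "Accept": "application/json"},
        timeout=10,  # do not hang the request path on a slow upstream API
    )
    response.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return response.json()

// Client-side fingerprinting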
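// The get*Fingerprint() helpers are placeholders you need to implement
const fingerprint = {
  canvas: getCanvasFingerprint(),
  webgl: getWebGLFingerprint(),
  fonts: getFontFingerprint(),
  plugins: getPluginFingerprint()
};
// Send with requests (/check is a POST endpoint; body fields mirror the curl example)
fetch('/check', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-Fingerprint': JSON.stringify(fingerprint)
  },
  body: JSON.stringify({
    user_agent: navigator.userAgent,
    endpoint: window.location.pathname
  })
});

# Install PyTorch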
pip install torch torchvision
# For CPU-only (smaller)
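pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu

# Increase timeout
import sqlite3
conn = sqlite3.connect("bot_detection.db")  # example path; use your own DB_PATH
conn.execute("PRAGMA busy_timeout = 30000")  # wait up to 30 s on a locked database

# Adjust config.yaml thresholds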
detection:
  block_threshold: 95  # More lenient
  challenge_threshold: 80

Contributions welcome! Areas for improvement:
- Additional ML models (Random Forest, XGBoost)
- More sophisticated fingerprinting
- Real-time dashboard UI
- Distributed deployment support
- More CAPTCHA providers
- MIT License - See LICENSE file
- Issues: GitHub Issues
- Documentation: /docs
- API Reference: http://localhost:8000/docs
Built with ❤️ for protecting web applications from automated abuse.