AI-powered parking management system with real-time object detection and license plate recognition.
User Browser
     │ Opens webpage
     ▼
┌──────────────────────────────┐
│  FRONTEND (Port 5169)        │  React App
│  - Displays UI               │
│  - Shows video stream        │
└──────────────┬───────────────┘
               │ Makes HTTP requests
               │ GET /stream/detect
               │ POST /api/plate-detect
               ▼
┌──────────────────────────────┐
│  BACKEND (Port 8069)         │  FastAPI Server
│  - Runs AI models (YOLO)     │
│  - Processes video frames    │
│  - Stores to Firebase        │
└──────────────┬───────────────┘
               │ Fetches stream
               │ GET /stream
               ▼
┌──────────────────────────────┐
│  ESP32 SERVER (Port 5069)    │  Video Source
│  - Provides MJPEG stream     │
│  - Dev: Video files          │
│  - Prod: Real camera         │
└──────────────────────────────┘
FRONTEND (Port 5169)
What it does:
- Shows web interface to user
- Displays video stream using <img> tag
- Lets user adjust detection settings
- Makes API calls to backend
What it DOESN'T do:
- No AI processing
- No video file handling
- No direct ESP32 connection (goes through backend)
Example calls:
// Display stream with detection
<img src="http://localhost:8069/stream/detect?conf=0.25" />
// Detect license plate
fetch('http://localhost:8069/api/plate-detect', {
method: 'POST',
body: JSON.stringify({ imageData: '...' })
})

BACKEND (Port 8069)
What it does:
- Runs YOLO AI model (with CUDA)
- Processes video frames in real-time
- Adds bounding boxes to detections
- Proxies stream from ESP32 server
- Saves detection results to Firebase
What it DOESN'T do:
- No user interface
- No video file storage
- No direct browser interaction
How it processes a frame:
1. Frontend requests: GET /stream/detect
2. Backend connects to: http://localhost:5069/stream
3. For each frame:
   a. Read JPEG frame from ESP32
   b. Run YOLO detection (GPU accelerated)
   c. Draw bounding boxes + labels
   d. Send annotated frame to frontend
4. Loop continuously (see the sketch after the API list below)

APIs it provides:
GET /stream               → Raw stream proxy (no AI)
GET /stream/detect        → Stream with AI detection
POST /api/plate-detect    → Detect license plates
POST /api/object-tracking → Track objects in video
GET /health               → Check if backend is alive
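A minimal sketch of what that detect loop can look like with FastAPI, OpenCV, and Ultralytics YOLO. The stream URL, model filename, and function names below are illustrative assumptions, not the actual code in main_fastapi.py:

```python
# Hypothetical sketch of the /stream/detect loop; names here are illustrative.
import cv2
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from ultralytics import YOLO

app = FastAPI()
ESP32_STREAM_URL = "http://localhost:5069/stream"   # dev mock server (assumption)
model = YOLO("yolov8s_car_custom.pt")               # custom model named in this README

def annotated_frames(conf: float):
    cap = cv2.VideoCapture(ESP32_STREAM_URL)        # OpenCV can read an MJPEG URL
    try:
        while True:
            ok, frame = cap.read()                  # a. read frame from ESP32
            if not ok:
                break
            results = model(frame, conf=conf, verbose=False)  # b. YOLO detection
            annotated = results[0].plot()           # c. draw boxes + labels
            ok, jpg = cv2.imencode(".jpg", annotated)
            if not ok:
                continue
            yield (b"--frame\r\nContent-Type: image/jpeg\r\n\r\n"
                   + jpg.tobytes() + b"\r\n")       # d. send annotated frame
    finally:
        cap.release()

@app.get("/stream/detect")
def stream_detect(conf: float = 0.25):
    return StreamingResponse(annotated_frames(conf),
                             media_type="multipart/x-mixed-replace; boundary=frame")
```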
ESP32 SERVER (Port 5069)
What it does:
- Provides raw video stream (MJPEG format)
- In development: Streams from video files
- In production: Streams from real ESP32-CAM camera
What it DOESN'T do:
- No AI processing
- No detection or tracking
- No data storage
- Just streams video
How to switch modes:
# Development (video files)
python start_mock.py --video parking.mp4 --port 5069
# Production (real hardware)
# Flash ESP32-CAM firmware, it runs on port 81
# Update config: frontend VITE_ESP32_URL=http://192.168.33.122:81
#                backend  ESP32_URL=http://192.168.33.122:81
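The backend's choice between the mock server and real hardware is driven by the USE_MOCK_ESP32 / MOCK_ESP32_URL / ESP32_URL variables documented in the configuration section below. A minimal illustrative sketch of that selection (not the project's actual main_fastapi.py code):

```python
# Illustrative sketch only: picking the video source from the env vars
# documented in this README; the code itself is an assumption.
import os

use_mock = os.getenv("USE_MOCK_ESP32", "true").lower() == "true"
esp32_url = (
    os.getenv("MOCK_ESP32_URL", "http://localhost:5069")     # development mock server
    if use_mock
    else os.getenv("ESP32_URL", "http://192.168.33.122:81")  # real ESP32-CAM
)
stream_url = f"{esp32_url}/stream"   # assumption: both sources expose /stream
print(f"Streaming from: {stream_url}")
```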
Live detection stream flow:

┌──────────┐
│   User   │  Opens browser → http://localhost:5169
└────┬─────┘
     │
     ▼
┌──────────────────────────────────────────┐
│  FRONTEND (React)                        │
│  StreamViewerPageESP32.tsx               │
│                                          │
│  <img src="http://localhost:8069/        │
│       stream/detect?conf=0.25" />        │
└────┬─────────────────────────────────────┘
     │  HTTP GET Request
     │  (Browser automatically requests image)
     ▼
┌──────────────────────────────────────────┐
│  BACKEND (FastAPI)                       │
│  main_fastapi.py                         │
│                                          │
│  @app.get("/stream/detect")              │
│  1. Connect to ESP32                     │
│  2. Read frame from ESP32                │
│  3. Run YOLO model (GPU)                 │
│  4. Draw bounding boxes                  │
│  5. Send back to frontend                │
│  6. Repeat for next frame                │
└────┬─────────────────────────────────────┘
     │
     │  HTTP GET /stream
     ▼
┌──────────────────────────┐
│  ESP32 SERVER            │
│  mock_esp32_server.py    │
│                          │
│  Reads video file        │
│  Sends MJPEG frames      │
└──────────────────────────┘
License plate detection flow:

┌──────────┐
│   User   │  Clicks "Detect Plate" button
└────┬─────┘
     │
     ▼
┌──────────────────────────────────────────┐
│  FRONTEND                                │
│  Captures current frame                  │
│  Converts to base64                      │
│  fetch('http://localhost:8069/           │
│    api/plate-detect', {                  │
│    body: { imageData: 'base64...' }      │
│  })                                      │
└────┬─────────────────────────────────────┘
     │  HTTP POST
     │  { imageData: "data:image/jpeg;base64,..." }
     ▼
┌──────────────────────────────────────────┐
│  BACKEND                                 │
│  1. Decode base64 image                  │
│  2. Run YOLO (detect vehicles)           │
│  3. Run ALPR (read plate text)           │
│  4. Save to Firebase                     │──→ Firebase
│  5. Return results                       │
└────┬─────────────────────────────────────┘
     │  Response
     │  { plates: [{ text: "ABC123", confidence: 0.95 }] }
     ▼
┌──────────────────────────────────────────┐
│  FRONTEND                                │
│  Displays plate number to user           │
└──────────────────────────────────────────┘
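A minimal sketch of the backend side of this flow. The base64 handling follows the request/response shapes shown above; run_alpr() is a hypothetical placeholder for the project's YOLO + Fast-ALPR pipeline in ai_service.py:

```python
# Hypothetical sketch of the /api/plate-detect handler; run_alpr() is a
# placeholder, not the project's actual ai_service code.
import base64
import cv2
import numpy as np
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class PlateRequest(BaseModel):
    imageData: str   # "data:image/jpeg;base64,..."

def run_alpr(img: np.ndarray) -> list:
    """Placeholder for vehicle detection + plate reading."""
    return []   # e.g. [{"text": "ABC123", "confidence": 0.95}]

@app.post("/api/plate-detect")
def plate_detect(req: PlateRequest):
    # 1. Decode the base64 image (strip the data-URL prefix first)
    b64 = req.imageData.split(",", 1)[-1]
    img = cv2.imdecode(np.frombuffer(base64.b64decode(b64), np.uint8),
                       cv2.IMREAD_COLOR)
    # 2./3. Detect vehicles and read plate text (placeholder call)
    plates = run_alpr(img)
    # 4./5. Saving to Firebase is omitted here; return the results
    return {"plates": plates}
```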
Why this architecture:
- Frontend: User interface only (React is good at this)
- Backend: Heavy AI processing (Python is good at this)
- ESP32: Video streaming only (cheap hardware)
- ✅ Frontend stays simple - No AI models to download
- ✅ Backend can use GPU - Fast CUDA processing
- ✅ ESP32 is lightweight - Just streams video
- ✅ Easy to scale - Add more backends for load balancing
- ✅ Development friendly - Can use video files instead of real hardware
- CORS: Browsers block direct camera connections
- Processing: Need server-side GPU for AI
- Security: Don't expose ESP32 directly to internet
- Flexibility: Can switch between dev/prod streams easily
| Connection | Protocol | Format | Purpose |
|---|---|---|---|
| Browser → Frontend | HTTP/HTTPS | HTML/JS/CSS | Load webpage |
| Frontend → Backend | HTTP REST | JSON | API calls, commands |
| Frontend → Backend | HTTP MJPEG | JPEG frames | Video stream |
| Backend → ESP32 | HTTP | MJPEG | Fetch video |
| Backend → Firebase | HTTPS | JSON | Store data |
localhost:5169 → Frontend (React dev server)
localhost:8069 → Backend (FastAPI + AI)
localhost:5069 → ESP32 Server (Video source)
Key Point: Frontend NEVER talks to ESP32 directly. Always goes through Backend.
┌─────────────┐
│ Video File  │  (parking.mp4)
│ or Camera   │
└──────┬──────┘
       │ 30 FPS
       ▼
┌─────────────────┐
│  ESP32 Server   │  Encodes frames → MJPEG
│  (Port 5069)    │  Sends continuous stream
└──────┬──────────┘
       │ MJPEG Stream (~30 FPS)
       ▼
┌─────────────────────────────────┐
│  Backend (Port 8069)            │
│                                 │
│  For each frame:                │
│  1. Decode JPEG                 │ ← 10ms (CPU)
│  2. Run YOLO detection          │ ← 10ms (GPU) ⚡
│  3. Draw bounding boxes         │ ← 2ms (CPU)
│  4. Encode back to JPEG         │ ← 5ms (CPU)
│  Total: ~27ms ≈ 37 FPS          │
└──────┬──────────────────────────┘
       │ MJPEG with annotations (~30 FPS)
       ▼
┌─────────────────┐
│  Frontend       │  Browser decodes & displays
│  (Port 5169)    │  User sees annotated video
└─────────────────┘
Note: the GPU can process ~100 FPS while the stream runs at 30 FPS, so detection keeps up in real time with headroom to spare.
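To reproduce these per-frame timings on your own hardware, a rough measurement sketch (file paths below are placeholders):

```python
# Rough per-frame latency measurement; file paths are placeholders.
import time
import cv2
from ultralytics import YOLO

model = YOLO("yolov8s_car_custom.pt")             # model named in this README
frame = cv2.imread("sample_frame.jpg")            # placeholder test frame

t0 = time.perf_counter()
results = model(frame, conf=0.25, verbose=False)  # detection (GPU if available)
t1 = time.perf_counter()
annotated = results[0].plot()                     # draw bounding boxes
ok, jpg = cv2.imencode(".jpg", annotated)         # re-encode to JPEG
t2 = time.perf_counter()

print(f"detect: {(t1 - t0) * 1e3:.1f} ms, draw+encode: {(t2 - t1) * 1e3:.1f} ms")
```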
To verify everything is connected correctly:
# 1. Check ESP32 is streaming
curl http://localhost:5069/status
# Expected: {"device":"ESP32-CAM Mock","status":"idle",...}
# 2. Check backend can reach ESP32
curl http://localhost:8069/stream | head -c 1000
# Expected: Binary JPEG data (should show bytes)
# 3. Check frontend can reach backend
curl http://localhost:8069/health
# Expected: {"status":"ok","models_loaded":true,...}
# 4. Check frontend is running
curl http://localhost:5169
# Expected: HTML content

If all 4 work → the architecture is correctly set up! ✅
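The same checks can be scripted; a small convenience sketch using requests (not part of the project's test utilities):

```python
# Convenience sketch: the same connectivity checks via Python instead of curl.
import requests

CHECKS = {
    "ESP32 server": "http://localhost:5069/status",
    "Backend":      "http://localhost:8069/health",
    "Frontend":     "http://localhost:5169",
}

for name, url in CHECKS.items():
    try:
        r = requests.get(url, timeout=3)
        print(f"{name:12s} {url} -> HTTP {r.status_code}")
    except requests.RequestException as exc:
        print(f"{name:12s} {url} -> FAILED ({exc})")
```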
| Service | Port | URL | Purpose |
|---|---|---|---|
| Frontend | 5169 | http://localhost:5169 | React Vite dev server |
| Backend | 8069 | http://localhost:8069 | FastAPI REST API + AI |
| ESP32 Dev | 5069 | http://localhost:5069 | Development streaming |
| ESP32 Prod | 81 | http://192.168.x.x:81 | Real hardware streaming |
SmartParking/
├── frontend/                      # React + TypeScript frontend
│   ├── src/
│   │   ├── pages/                 # Page components
│   │   │   └── StreamViewerPageESP32.tsx  # Main stream viewer
│   │   ├── components/            # Reusable components
│   │   ├── config/                # API configuration
│   │   └── services/              # API services
│   ├── .env                       # Environment variables
│   └── package.json
│
├── server/                        # FastAPI backend
│   ├── main_fastapi.py            # Main API server (CUDA enabled)
│   ├── services/
│   │   ├── ai_service.py          # YOLO + ALPR (GPU accelerated)
│   │   └── firebase_service.py    # Firebase integration
│   ├── yolov8s_car_custom.pt      # Custom trained model
│   ├── yolov8n.pt                 # Default YOLO model
│   └── requirements.txt
│
├── ESP32/                         # ESP32-CAM integration
│   ├── mock_esp32_server.py       # Development server
│   ├── esp32_cam_firmware.ino     # Real hardware firmware
│   ├── esp32_client.py            # Python client library
│   ├── start_mock.py              # Quick start script
│   ├── test_esp32_connection.py   # Testing utilities
│   ├── stream/                    # Video files (dev)
│   └── HARDWARE_SETUP.md
│
└── docs/                          # Documentation
    ├── QUICK_START_OBJECT_TRACKING.md
    ├── PORT_CONFIGURATION.md
    ├── ENVIRONMENT_VARIABLES.md
    └── ESP32_REFACTOR.md
- Python 3.10+ (conda environment: scheduler)
- Node.js 18+
- CUDA 11.8+ (for GPU acceleration)
- NVIDIA GPU with 4GB+ VRAM (recommended)
cd ESP32
python start_mock.py --video videos/parking_c.mp4 --port 5069

cd server
eval "$(conda shell.bash hook)" && conda activate scheduler
python main_fastapi.py

Expected output:
🚀 Starting FastAPI SmartParking Server...
📦 Loading AI models...
🔥 Using CUDA device: NVIDIA GeForce RTX 3090
✅ YOLO model loaded on cuda:0
✅ ALPR model loaded
📹 Connecting to ESP32: http://localhost:5069
✅ ESP32 connected
cd frontend
npm install # First time only
npm run dev

Navigate to: http://localhost:5169
Select viewing mode:
- 🎯 Object Detection - Real-time YOLO detection with bounding boxes
- 📹 Raw Stream - Original stream without processing
- ⚡ Direct Stream - Bypass backend proxy

- ✅ YOLOv8s Custom Model - Trained on parking lot dataset (mAP50: 99.49%)
- ✅ CUDA Acceleration - 10-30x faster inference on GPU
- ✅ Real-time Object Detection - Cars, motorcycles, persons
- ✅ Object Tracking - ByteTrack algorithm for consistent IDs
- ✅ License Plate Recognition - Fast-ALPR with ONNX runtime

- 🎯 Object Detection Mode - Annotated stream with bounding boxes
- 📹 Raw Stream Mode - Original video feed
- ⚡ Direct Stream Mode - Direct ESP32 connection
- ⚙️ Adjustable Settings - Confidence threshold, labels on/off
- 🔄 Hot Reload - Frontend and backend auto-reload on changes
- 🎬 Mock Streaming - Test without ESP32 hardware
- 📚 API Documentation - Auto-generated at /docs
- 📊 Health Checks - Monitor service status
# Raw stream proxy
GET http://localhost:8069/stream
# Stream with real-time detection
GET http://localhost:8069/stream/detect?conf=0.25&show_labels=true
# Parameters:
# conf: Confidence threshold (0.1-0.9, default: 0.25)
# show_labels: Show detection labels (true/false, default: true)

# License plate detection
POST http://localhost:8069/api/plate-detect
Body: { "imageData": "data:image/jpeg;base64,..." }
# Object tracking on video
POST http://localhost:8069/api/object-tracking
Body: {
"videoData": "data:video/mp4;base64,...",
"confThreshold": 0.25,
"iouThreshold": 0.45
}
# ESP32 snapshot
GET http://localhost:8069/api/esp32/snapshot
# ESP32 status
GET http://localhost:8069/api/esp32/status

# Backend health check
GET http://localhost:8069/health
# Test ESP32 connection
GET http://localhost:8069/test/esp32
# API documentation (Swagger)
GET http://localhost:8069/docs
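For scripted use of the detection APIs, a minimal client sketch (car.jpg is a placeholder path; the endpoint and body shape follow the examples above):

```python
# Minimal client sketch for POST /api/plate-detect; car.jpg is a placeholder path.
import base64
import requests

with open("car.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

resp = requests.post(
    "http://localhost:8069/api/plate-detect",
    json={"imageData": f"data:image/jpeg;base64,{image_b64}"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())   # e.g. {"plates": [{"text": "ABC123", "confidence": 0.95}]}
```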
# Backend API
VITE_BACKEND_URL=http://localhost:8069
# ESP32-CAM URL (development or production)
VITE_ESP32_URL=http://localhost:5069
# Firebase (optional)
VITE_FIREBASE_API_KEY=your_key
VITE_FIREBASE_AUTH_DOMAIN=your_domain
VITE_FIREBASE_PROJECT_ID=your_project_id

# ESP32 Configuration
USE_MOCK_ESP32=true # false for real hardware
MOCK_ESP32_URL=http://localhost:5069 # Development server
ESP32_URL=http://192.168.33.122:81 # Real ESP32-CAM IP
# CUDA Configuration (automatic detection)
# Set CUDA_VISIBLE_DEVICES=0 to select GPU
# Model automatically uses CUDA if available

- Model: YOLOv8s Custom (parking lot trained)
- mAP50: 99.49%
- Classes: Car, Motorcycle, Person, Truck
- Input Size: 640x640
- Framework: Ultralytics YOLO
| Hardware | FPS (Detection) | Latency | VRAM Usage |
|---|---|---|---|
| NVIDIA RTX 3090 | ~100 FPS | 10ms | 2.5GB |
| NVIDIA RTX 3080 | ~80 FPS | 12ms | 2.5GB |
| NVIDIA GTX 1080 | ~50 FPS | 20ms | 2.0GB |
| CPU (16 cores) | ~8 FPS | 125ms | N/A |
- Check ESP32 server: curl http://localhost:5069/status
- Check backend: curl http://localhost:8069/health
- Test stream: curl http://localhost:8069/stream | head -c 1000
- Restart services in order: ESP32 → Backend → Frontend
# Check CUDA availability
python -c "import torch; print(torch.cuda.is_available())"
# Check GPU
nvidia-smi
# Install CUDA-enabled PyTorch
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

- Verify CUDA is enabled (check backend startup logs)
- Lower confidence threshold
- Use smaller model (yolov8n.pt instead of yolov8s)
- Reduce input resolution
# Check what's using ports
lsof -i :5069 # ESP32
lsof -i :8069 # Backend
lsof -i :5169 # Frontend
# Kill process on port
kill -9 $(lsof -ti :5069)

Detailed guides available in project root:
- QUICK_START_OBJECT_TRACKING.md - Complete setup guide
- PORT_CONFIGURATION.md - Port management and troubleshooting
- ENVIRONMENT_VARIABLES.md - Configuration reference
- ESP32/README.md - ESP32 integration guide
- ESP32/HARDWARE_SETUP.md - Hardware wiring and setup
- ESP32_REFACTOR.md - Architecture overview
- Frontend .env variables are PUBLIC (embedded in JS bundle)
- Never put secrets in VITE_* variables
- Firebase config is safe to expose (protected by Security Rules)
- Backend environment variables are PRIVATE (server-only)
- Add .env to .gitignore
- Flash ESP32 firmware (see ESP32/HARDWARE_SETUP.md)
- Configure production URLs:

  # Frontend .env
  VITE_BACKEND_URL=https://api.yourserver.com
  VITE_ESP32_URL=http://192.168.33.122:81

  # Backend
  export USE_MOCK_ESP32=false
  export ESP32_URL=http://192.168.33.122:81

- Build frontend: npm run build
- Deploy frontend/dist/ to web server
- Run backend with production settings
- Fork the repository
- Create feature branch: git checkout -b feature/YourFeature
- Commit changes: git commit -m 'Add YourFeature'
- Push to branch: git push origin feature/YourFeature
- Open Pull Request
[Your License Here]
[Your Team Info]
Tech Stack: React · TypeScript · Vite · Python · FastAPI · YOLOv8 · PyTorch · CUDA · OpenCV · Firebase · ESP32-CAM