A comprehensive Django-based Telegram message crawler with automated GitHub URL archiving, modern web interface, and Celery task management.
- Telegram Message Crawling: Automated crawling of Telegram channels and groups
- GitHub URL Detection: Automatic detection and archiving of GitHub repositories
- Web Interface: Modern Bootstrap-based dashboard with message filtering and analytics
- Background Tasks: Celery-powered background processing with Redis broker
- Database Integration: PostgreSQL database with Django ORM
- Docker Support: Complete containerization with Docker Compose
- Monitoring: Built-in logging and task monitoring
- Admin Interface: Django admin panel for data management
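As a rough illustration of the GitHub URL detection feature listed above, the sketch below pulls repository references out of message text. The pattern `GITHUB_REPO_RE` and the helper `extract_github_repos` are hypothetical names chosen for this example; the project's actual detection logic may differ.

```python
import re

# Hypothetical pattern (an assumption, not the project's actual regex):
# matches http(s)://github.com/<owner>/<repo>, ignoring any trailing path.
GITHUB_REPO_RE = re.compile(
    r"https?://(?:www\.)?github\.com/([A-Za-z0-9-]+)/([A-Za-z0-9._-]+)"
)


def extract_github_repos(text):
    """Return unique (owner, repo) pairs found in text, in first-seen order."""
    seen = []
    for owner, repo in GITHUB_REPO_RE.findall(text):
        # Strip sentence punctuation and a ".git" suffix captured by the pattern.
        repo = repo.rstrip(".")
        if repo.endswith(".git"):
            repo = repo[:-4]
        if repo and (owner, repo) not in seen:
            seen.append((owner, repo))
    return seen


if __name__ == "__main__":
    sample = "Check https://github.com/django/django and https://github.com/celery/celery.git."
    print(extract_github_repos(sample))
```

Deduplicating on the `(owner, repo)` pair means that links to issues or files in the same repository collapse into a single archive candidate.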
├── Django Web Application
│ ├── Dashboard & Analytics
│ ├── Message List & Filtering
│ ├── Channel Management
│ └── Admin Interface
├── Celery Background Tasks
│ ├── Message Crawling
│ ├── URL Processing
│ ├── GitHub Archiving
│ └── Periodic Cleanup
├── Database (PostgreSQL)
│ ├── Telegram Channels
│ ├── Messages
│ ├── Archived URLs
│ └── Crawler Logs
└── External Services
  ├── Redis (Celery Broker)
  ├── Telegram API
- Python 3.8+
- PostgreSQL
- Redis
- Docker & Docker Compose (optional)
- Clone the repository
    git clone <repository-url>
    cd telecrawl
- Create virtual environment
    python -m venv venv
    # Windows
    venv\Scripts\activate
    # Linux/Mac
    source venv/bin/activate
- Install dependencies
    pip install -r requirements.txt
    python3 telegram_session_creater.py
  When prompted, enter the app ID and hash to create the Telegram session file.
- Environment configuration
  Copy env/.env.example to env/.env and configure:
    # Database
    DB_NAME=telecrawl
    DB_USER=postgres
    DB_PASSWORD=your_password
    DB_HOST=localhost
    DB_PORT=5432
    # Redis
    REDIS_URL=redis://localhost:6379/0
    # Telegram (do not fill these in; use the UI)
    TELEGRAM_API_ID=
    TELEGRAM_API_HASH=
    TELEGRAM_PHONE=
    # Django
    SECRET_KEY=your_secret_key
    DEBUG=True
    ALLOWED_HOSTS=localhost,127.0.0.1
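The variables above would typically be read in the Django settings module. The following is a minimal sketch of that mapping, assuming the variable names from the example file; the helper functions are hypothetical and the project's real settings code may be organized differently.

```python
import os


def database_settings(env=os.environ):
    """Build a Django-style DATABASES dict from the DB_* variables."""
    return {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": env.get("DB_NAME", "telecrawl"),
            "USER": env.get("DB_USER", "postgres"),
            "PASSWORD": env.get("DB_PASSWORD", ""),
            "HOST": env.get("DB_HOST", "localhost"),
            "PORT": env.get("DB_PORT", "5432"),
        }
    }


def allowed_hosts(env=os.environ):
    """Split the comma-separated ALLOWED_HOSTS value into a clean list."""
    raw = env.get("ALLOWED_HOSTS", "")
    return [host.strip() for host in raw.split(",") if host.strip()]


def debug_enabled(env=os.environ):
    """Treat only explicit truthy strings as DEBUG=True; default to False."""
    return env.get("DEBUG", "False").strip().lower() in ("1", "true", "yes")
```

Defaulting `DEBUG` to off when the variable is missing is a deliberate choice in this sketch, so that a misconfigured deployment does not expose debug pages.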
- Database setup
    python manage.py migrate
    python manage.py createsuperuser
- Start services
    # Terminal 1: Django
    python manage.py runserver
    # Terminal 2: Celery Worker
    celery -A telecrawl worker -l info
    # Terminal 3: Celery Beat (Scheduler)
    celery -A telecrawl beat -l info
- Environment configuration
  Configure env/.env as described above.
- Start all services
    docker-compose up -d --build
- Dashboard: http://localhost:8000/
- Admin Panel: http://localhost:8000/admin/
- Message List: http://localhost:8000/messages/
- Channel List: http://localhost:8000/channels/
- Run Telecrawl
    python manage.py run_telecrawl
- Run Web Archiver
    python manage.py run_web_archiver
- Clean Old Messages
    python manage.py clean_messages --days 30
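To make the `--days` option concrete, here is a small sketch of the retention logic such a cleanup command might apply. It is an assumption about the behavior, not the command's actual code; `cleanup_cutoff`, `select_expired`, and the `"date"` key are hypothetical names.

```python
from datetime import datetime, timedelta, timezone


def cleanup_cutoff(days, now=None):
    """Return the timestamp before which messages count as old."""
    if days < 0:
        raise ValueError("days must be non-negative")
    now = now or datetime.now(timezone.utc)
    return now - timedelta(days=days)


def select_expired(messages, days, now=None):
    """Pick the messages whose 'date' is strictly older than the cutoff."""
    cutoff = cleanup_cutoff(days, now)
    return [m for m in messages if m["date"] < cutoff]
```

In a Django command the same cutoff would usually feed a queryset filter on the message timestamp field, followed by a delete.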
- GET /api/messages/ - List messages with filtering
- GET /api/channels/ - List monitored channels
- GET /api/stats/ - Dashboard statistics
- POST /api/channels/ - Add new channel
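A client might build request URLs for these endpoints as sketched below. The endpoint paths come from the list above, but the filter names used in the example (`channel`, `type`) are assumptions; check the API views for the query parameters actually supported.

```python
from urllib.parse import urlencode, urljoin


def api_url(base, endpoint, **filters):
    """Join base and endpoint, appending any non-None filters as a query string."""
    query = urlencode({k: v for k, v in filters.items() if v is not None})
    url = urljoin(base.rstrip("/") + "/", endpoint.lstrip("/"))
    return f"{url}?{query}" if query else url


if __name__ == "__main__":
    # Hypothetical filter names, shown only to illustrate the query string.
    print(api_url("http://localhost:8000", "/api/messages/", channel="news", type="link"))
```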
- Get API credentials from https://my.telegram.org/
- Add credentials to the .env file
- Run the crawler to authenticate
Add channels through:
- Django Admin interface
- Web dashboard
- Direct database entry
Configure periodic tasks in telecrawl/celery.py:
from celery.schedules import crontab

# `app` is the Celery application defined earlier in telecrawl/celery.py.
app.conf.beat_schedule = {
    'crawl-messages': {
        'task': 'crawler.tasks.crawl_messages_task',
        'schedule': crontab(minute='*/10'),  # Every 10 minutes
    },
    'archive-urls': {
        'task': 'crawler.tasks.archive_urls_task',
        'schedule': crontab(minute=30),  # Every hour at minute 30
    },
}
- Channel metadata and monitoring settings
- Last crawl timestamps
- Active status
- Message content and metadata
- Classification (link, text, media)
- Timestamps and indexing
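The link/text/media classification mentioned above could be implemented along these lines. This is only a sketch: the function name, the `has_media` flag, and the precedence (media first, then link, then text) are assumptions rather than the project's documented rules.

```python
import re

# Simple URL matcher used only for this illustration.
URL_RE = re.compile(r"https?://\S+")


def classify_message(text, has_media=False):
    """Return 'media', 'link', or 'text' for a single message."""
    if has_media:
        return "media"
    if text and URL_RE.search(text):
        return "link"
    return "text"
```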
- GitHub repository information
- Archive status and metadata
- Processing timestamps
- System logs and error tracking
- Performance metrics
- Debug information
- Application logs: logs/telecrawl.log
- Celery logs: Console output
- Docker logs: docker-compose logs
- Database connectivity
- Redis connectivity
- Celery worker status
- Telegram API status
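For the database and Redis connectivity checks listed above, a basic TCP reachability probe is often a useful first signal. The sketch below uses only the standard library; `port_open` and `check_services` are hypothetical helpers, and a successful connection only shows that the port accepts connections, not that the service is healthy.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(services):
    """Map each service name to its reachability, given {name: (host, port)}."""
    return {name: port_open(host, port) for name, (host, port) in services.items()}


if __name__ == "__main__":
    # Default local ports for PostgreSQL and Redis, matching the example .env.
    print(check_services({"postgres": ("localhost", 5432), "redis": ("localhost", 6379)}))
```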
- Models: Add to crawler/models.py
- Tasks: Add to crawler/tasks.py
- Views: Add to crawler/views.py
- Templates: Add to crawler/templates/
- Tests: Add to crawler/tests.py
python manage.py test

python manage.py makemigrations
python manage.py migrate
- Set DEBUG=False
- Configure proper SECRET_KEY
- Set up SSL/HTTPS
- Configure static file serving
- Use PostgreSQL with connection pooling
- Use Redis with persistence
- Use Gunicorn with multiple workers
- Set up reverse proxy (Nginx)
- Set up logging aggregation
- Configure health checks
- Monitor Celery queues
- Track database performance
- Database Connection Error
  - Check PostgreSQL service
  - Verify connection credentials
  - Check network connectivity
- Celery Tasks Not Running
  - Check Redis connectivity
  - Verify Celery worker is running
  - Check task queue status
- Telegram API Errors
  - Verify API credentials
  - Check account status
  - Monitor rate limits
- Memory Issues
  - Monitor message processing batch sizes
  - Check database query optimization
  - Review log file rotation
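One common way to keep memory bounded, relevant to the batch-size item above, is to process messages in fixed-size chunks instead of loading everything at once. The helper below is a generic sketch, not code from this project.

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of at most `size` items from any iterable, lazily."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    for chunk in batched(range(5), 2):
        print(chunk)
```

Because only one chunk is held at a time, peak memory is bounded by the chunk size rather than by the total number of messages.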
# Check database status
python manage.py dbshell
# Monitor Celery
celery -A telecrawl inspect stats
# Check migrations
python manage.py showmigrations
# Clear cache
python manage.py shell -c "from django.core.cache import cache; cache.clear()"

This project is licensed under the MIT License.
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
For issues and questions:
- Create an issue on GitHub
- Check the troubleshooting section
- Review the logs for error details