CLI tool for Twitter/X scraping and semantic search. Scrape tweets, generate embeddings, ask questions, discover users.
bun install
cp .env.example .env # Add OPENAI_KEY, AUTH_TOKEN, CT0
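# Example .env values (placeholders; AUTH_TOKEN and CT0 are assumed to be the
# auth_token and ct0 cookies from a logged-in x.com browser session):
#   OPENAI_KEY=sk-...
#   AUTH_TOKEN=<auth_token cookie value>
#   CT0=<ct0 cookie value>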
bun run src/cli.ts db --init
bun dev # Start web UI at localhost:3002

# Development
bun dev # Start web UI at localhost:3002
bun cli # Run CLI directly
# Core Commands
xgpt interactive # Guided setup
xgpt scrape <username> # Scrape tweets from user
xgpt search "terms" # Search tweets by topic/phrase
xgpt users discover "query" # Find Twitter profiles by bio/name
xgpt embed # Generate embeddings
xgpt ask "question" # Semantic search + GPT answer
xgpt read <tweet> # Fetch a single tweet by ID or URL
xgpt thread <tweet> # Fetch the author thread for a tweet
xgpt replies <tweet> # Fetch replies to a tweet
xgpt user-tweets <username> # Fetch a user timeline
xgpt mentions --user <name> # Fetch tweets mentioning a user
xgpt serve # Start web UI
xgpt db --stats # Database stats
xgpt config list # Show config
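A typical end-to-end flow chains three of the commands above: scrape a user's tweets, embed them, then ask a question against the embedded corpus (the username and question below are placeholders):

# Example workflow (placeholder username and question)
xgpt scrape naval # Scrape tweets from @naval into the database
xgpt embed # Generate embeddings for the scraped tweets
xgpt ask "What does this account say about leverage?" # Semantic search + GPT answer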
Start a browser-based interface with all CLI functionality:
bun dev # http://localhost:3002
xgpt serve --port 8080 # Custom port

Features:
- Dashboard - Stats overview, quick actions
- Scrape - Scrape tweets from any user
- Search - Topic-based search with filters
- Discover - Find Twitter profiles by bio/keywords
- Ask - AI Q&A with relevant tweets
- Config - Edit settings inline
- Job Taskbar - Real-time progress for long operations
Find tweets by topic using Twitter's search API:
# Find AI startup discussions from last 7 days
xgpt search "building in public, indie hacker, shipped" --days 7
# Track trending tech topics
xgpt search "AGI, GPT-5, foundation models" --name "AI Trends" --max 1000
# Preview query without executing
xgpt search "rust lang, rustacean" --dry-run
# Search and auto-embed for semantic queries
xgpt search "YC demo day, fundraising" --mode top --embed
# Resume interrupted search
xgpt search --resume 42

Search and scrape operations count against your account's rate limits. Excessive usage may trigger Twitter's anti-bot detection.
Best Practices:
- Start with `--max 100` to test queries
- Use `--dry-run` to preview before executing
- Avoid running multiple concurrent searches
- Space out large searches (1000+ tweets) by several hours
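For large pulls, one way to keep that spacing consistent is to schedule bounded searches instead of running them by hand; a minimal cron sketch, assuming xgpt is on the cron user's PATH (query, limit, and schedule are placeholders):

# Run one bounded, auto-embedded search every 6 hours and log the output
0 */6 * * * xgpt search "AGI, GPT-5, foundation models" --max 500 --embed >> ~/xgpt-search.log 2>&1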
Rate Limit Handling:
- Searches automatically wait and retry when rate limited
- Use `--resume <session-id>` if you need to restart
- Wait at least 15 minutes before retrying manually
Find Twitter profiles by bio, name, or keywords:
# Find Google engineers
xgpt users discover "google engineer" --max 20 --save
# Find AI researchers
xgpt users discover "AI researcher" --max 50
# Output as JSON
xgpt users discover "indie hacker" --json
# Script-friendly output (stable JSON envelope)
xgpt search "AGI, GPT-5" --script

Discovered profiles can be saved to the database with `--save`, storing bio, location, follower counts, and verification status.
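Both `--json` and `--script` are intended for piping into other tools. A small sketch with jq (the envelope's field names aren't documented here, so pretty-print and inspect the output before scripting against specific fields):

# Save discovery results and inspect the JSON shape before relying on it
xgpt users discover "AI researcher" --max 50 --json | jq '.' > researchers.json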
- Scrape tweets from Twitter/X using session cookies
- Generate vector embeddings via OpenAI
- Query with natural language - finds relevant tweets via cosine similarity, generates answer with GPT
- Architecture - Project structure, data flow, dependencies
- Database - Schema, migrations, optimization
- Server - Web server architecture, routes, templates
- Error Handling - Error categories, recovery suggestions, API errors
- Job Tracking - Job lifecycle, cancellation, SSE updates
- Commands - Command runner pattern, execution flow
- Validation - Input validation with TypeBox
- API Reference - REST API endpoints
- Configuration - All config options and defaults
- Components - UI component library
- Utilities - Retry logic, formatting, helpers
- Testing - Unit, integration, and E2E testing
- Bun runtime
- SQLite + Drizzle ORM
- OpenAI API (embeddings + chat)
- @the-convocation/twitter-scraper v0.21.0
- Commander.js (CLI)
- Elysia + HTMX (Web UI)
bun dev # Start web UI (localhost:3002)
bun cli # Run CLI
bun test # Run tests
bun run typecheck # Type check

MIT