Darwin is a replay-first, LLM-assisted crypto trading strategy research system designed to answer questions like:
- When does an LLM add value over deterministic technical analysis?
- Which market regimes or playbooks benefit from LLM judgment?
- Can we learn a gate/budget policy that filters LLM decisions better than rules?
- How do strategy variants evolve over time, not just perform once?
Design principles:
- Replay First: Evaluation comes first; live trading comes much later
- Reproducible: Every run must be reproducible and explainable
- Versioned: Every artifact must be versioned, schematized, and auditable
- Learning After Observation: Learning happens after observing outcomes, not inline
- Comparative: The system must support many runs and comparisons over time
Key features:
- ✅ Deterministic Playbooks: Breakout and Pullback strategies with precise entry/exit rules
- ✅ LLM Evaluation: Use Claude/GPT to evaluate trade candidates with rich market context
- ✅ Reinforcement Learning System: Three RL agents that progressively learn from outcomes
- Gate Agent: Pre-LLM filtering (20-40% API cost reduction)
- Portfolio Agent: Position sizing optimization (Sharpe >1.5)
- Meta-Learner Agent: LLM decision override (+10-15% Sharpe improvement)
- ✅ Automated Graduation: Agents start in observe mode and are automatically promoted once they meet graduation criteria
- ✅ Comprehensive Feature Pipeline: 80+ features including price, volatility, trend, momentum, volume
- ✅ Advanced Exit Logic: Stop loss, take profit, time stops, and trailing stops
- ✅ Position Ledger: Single source of truth for all PnL and trade history
- ✅ Candidate Cache: Stores ALL opportunities (taken and skipped) for future learning
- ✅ Meta Analysis: Compare multiple runs, generate frontier plots, detect regressions
- ✅ Error Recovery: Rate limiting, retry logic, checkpointing for long runs
- ✅ Production Safety: Circuit breakers, performance monitoring, automatic fallback
- ✅ Comprehensive Testing: 115+ tests including RL agents, 35% RL module coverage
Market scope:
- Venue: Coinbase spot markets (USD pairs)
- Assets: BTC-USD, ETH-USD, SOL-USD
- Primary Timeframe: 15m (configurable)
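These defaults are set per run. For illustration, a minimal run configuration might look like the sketch below; the field names are hypothetical, not the canonical schema (see the files under examples/ for real configurations):

{
  "run_name": "btc_eth_sol_breakout_15m",
  "venue": "coinbase",
  "symbols": ["BTC-USD", "ETH-USD", "SOL-USD"],
  "timeframe": "15m",
  "playbooks": ["breakout"],
  "llm": { "model_id": "google/gemini-2.0-flash" }
}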
# 1. Clone repositories
git clone https://github.com/recallnet/darwin.git
# Optional: clone replay-lab for real market data
git clone https://github.com/recallnet/replay-lab.git
# 2. Install Darwin dependencies (requires Python 3.11+)
cd darwin
pip install -e ".[dev,ta]"
# 3. Set up Vercel AI Gateway
# Go to https://vercel.com/ai-gateway and create an API key
cp .env.example .env
# Edit .env and set:
# AI_GATEWAY_API_KEY=your-key-here
# MODEL_ID=google/gemini-2.0-flash (or anthropic/claude-sonnet-4-5, deepseek/deepseek-v3.2)
# 4. (Optional) Start replay-lab for real market data
cd ../replay-lab
pnpm install
docker-compose up -d
pnpm db:migrate:sql
pnpm dev --filter replay-lab
# Set REPLAY_LAB_URL=http://localhost:3301 in darwin/.env

# Run a simple backtest with mock LLM
darwin run examples/basic_breakout.json --mock-llm
# Run with real LLM
darwin run examples/basic_breakout.json
# View results
darwin report runs/<run_id>
# Compare multiple runs
darwin meta-report

To use the reinforcement learning system:
# 1. Install RL dependencies
pip install -e ".[rl]"
# 2. Run experiments to collect training data (1000+ candidates needed)
darwin run examples/basic_breakout.json
darwin run examples/basic_pullback.json
darwin run examples/multi_playbook.json
# 3. Train RL agents on historical data
python -m darwin.rl.training.offline_batch \
--agent-name gate \
--run-ids run_001,run_002,run_003 \
--total-timesteps 100000 \
--output-dir artifacts/rl_models/gate
python -m darwin.rl.training.offline_batch \
--agent-name portfolio \
--run-ids run_001,run_002,run_003 \
--total-timesteps 100000 \
--output-dir artifacts/rl_models/portfolio
python -m darwin.rl.training.offline_batch \
--agent-name meta_learner \
--run-ids run_001,run_002,run_003 \
--total-timesteps 100000 \
--output-dir artifacts/rl_models/meta_learner
# 4. Deploy in observe mode (agents predict but don't affect decisions)
darwin run examples/rl_enabled_run.json
# 5. Check graduation status after validation period
python -m darwin.rl.cli.graduation_status gate \
--db artifacts/rl_state/agent_state.sqlite
# 6. Activate agents (change "mode": "observe" → "active" in config)
# Do this gradually: gate → portfolio → meta_learner
# Monitor performance between each activation
# 7. Monitor and maintain
python -m darwin.rl.cli.evaluate_agent gate \
--db artifacts/rl_state/agent_state.sqlite \
--window-days 7

See the RL Quick Start Guide for detailed instructions.
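For step 6 above, activation is a per-agent switch in the run's RL configuration. The snippet below only illustrates that toggle; the key names are assumptions, so check examples/rl_enabled_run.json for the actual format:

{
  "rl": {
    "gate": { "mode": "active" },
    "portfolio": { "mode": "observe" },
    "meta_learner": { "mode": "observe" }
  }
}

Activating the gate agent first while the other agents remain in observe mode matches the gradual rollout order recommended above.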
Runner → Market Data → Feature Pipeline → Playbook Engine
   │                                           ↓
   │                                  ┌────────────────┐
   │                                  │   Gate Agent   │ (RL)
   │                                  │  (pre-filter)  │
   │                                  └───────┬────────┘
   │                                          ↓
   │                                     LLM Harness
   │                                          ↓
   │                                  ┌────────────────┐
   │                                  │  Meta-Learner  │ (RL)
   │                                  │   (override)   │
   │                                  └───────┬────────┘
   │                                          ↓
   │                                   Decision Parser
   │                                          ↓
   │                                  ┌────────────────┐
   │                                  │ Portfolio Agent│ (RL)
   │                                  │ (position size)│
   │                                  └───────┬────────┘
   │                                          ↓
   └────────────────────────────────> Position Manager
                                              ↓
                                   Position Ledger (SQLite)
                                              ↓
                                     Evaluation → Reports
                                              ↓
                                   Candidate Cache (SQLite)
                                              ↓
                                       Outcome Labels
                                              ↓
                                  ┌──────────────────────┐
                                  │     RL Training      │
                                  │ (offline, periodic)  │
                                  └──────────────────────┘
                                              ↓
                                  Model Store (versioned)
                                              ↓
                                  Graduation & Monitoring
Architecture principles:
- Global Runner: Runner code lives once, globally (not copied into run folders)
- Self-Contained Runs: Every run has its own directory with all decisions auditable
- Ledger is Source of Truth: All PnL, drawdown, exits come from position ledger
- Candidate Cache is Learning Substrate: Every opportunity cached, labels attached later
- Schemas are Law: All artifacts conform to versioned Pydantic schemas
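As a toy illustration of the "schemas are law" tenet, a cached candidate might be modeled with a versioned Pydantic schema along these lines (a sketch only; the real models live in darwin/schemas/ and will differ):

from datetime import datetime
from typing import Literal

from pydantic import BaseModel, Field


class CandidateRecord(BaseModel):
    """Illustrative shape of a cached trade candidate (not the actual Darwin schema)."""

    schema_version: str = "1.0"
    candidate_id: str
    symbol: Literal["BTC-USD", "ETH-USD", "SOL-USD"]
    playbook: Literal["breakout", "pullback"]
    created_at: datetime
    features: dict[str, float] = Field(default_factory=dict)
    decision: Literal["take", "skip"] | None = None  # filled after the LLM/meta-learner step
    outcome_r: float | None = None                    # label attached once the trade resolves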
Darwin calls Vercel AI Gateway directly from Python for unified multi-provider LLM access:
- Single API Key: One `AI_GATEWAY_API_KEY` routes to all providers (no per-provider keys needed)
- Multi-Provider: Supports Anthropic, OpenAI, Google Gemini, xAI, Mistral, Perplexity, DeepSeek, and more
- Model Format: `provider/model-name` (e.g., `anthropic/claude-sonnet-4-5`, `google/gemini-2.0-flash`)
- Swappable Models: Each run can specify a different model in its configuration
- No External Services: Direct HTTP calls from Python (no Node.js middleware)
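Assuming the gateway's OpenAI-compatible chat-completions route, a direct call from Python can be as small as this sketch (illustrative only, not Darwin's actual LLM harness):

import os

import requests


def ask_gateway(prompt: str) -> str:
    """Send one chat-completion request through the Vercel AI Gateway."""
    base_url = os.environ.get("AI_GATEWAY_BASE_URL", "https://ai-gateway.vercel.sh/v1")
    resp = requests.post(
        f"{base_url}/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['AI_GATEWAY_API_KEY']}"},
        json={
            "model": os.environ.get("MODEL_ID", "google/gemini-2.0-flash"),
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 1000,
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]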
# Set in .env
AI_GATEWAY_BASE_URL=https://ai-gateway.vercel.sh/v1
AI_GATEWAY_API_KEY=your-key-from-vercel
MODEL_ID=google/gemini-2.0-flash

Supported Models: Darwin supports all 17+ models available through Vercel AI Gateway with a 100% test pass rate:
- Anthropic: Claude Sonnet/Opus/Haiku 4.5
- OpenAI: GPT-4o, GPT-4o Mini, o1 (reasoning)
- Google: Gemini 2.0/2.5 Flash, Gemini 2.5 Pro, Gemini 3 Pro (reasoning)
- DeepSeek: DeepSeek v3.2, DeepSeek Reasoner
- xAI: Grok 2 Vision, Grok 4 Fast Reasoning
- Mistral: Pixtral Large, Ministral 8B
- Perplexity: Sonar Pro
Recommended Models:
- Production: `anthropic/claude-sonnet-4-5` (best quality)
- Default: `google/gemini-2.0-flash` (fast, cheap, good performance)
- High-volume: `deepseek/deepseek-v3.2` (very cheap)
- Reasoning: `openai/o1`, `google/gemini-3-pro-preview` (automatic 4x token boost)
Auto-Optimization: Reasoning models automatically get 4000 max_tokens (vs 1000 for standard models)
See SUPPORTED_MODELS.md for full details and test results.
Darwin integrates with replay-lab for real market data:
- Architecture: REST API client calls replay-lab's OHLCV endpoints
- Fallbacks: Gracefully falls back to CSV files or synthetic data
- Symbol Mapping: Converts `BTC-USD` → `COINBASE_SPOT_BTC_USD`
- Data Quality: Validates all OHLCV data before use
# Start replay-lab locally
cd replay-lab
pnpm dev --filter replay-lab
# Set in .env
REPLAY_LAB_URL=http://localhost:3301

Data Priority:
- Replay-Lab API (if `REPLAY_LAB_URL` is set and reachable)
- CSV files in the data directory (if `--data-dir` is specified)
- Synthetic data (GBM-based price generation for testing)
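A simplified sketch of that cascade is shown below; the /ohlcv endpoint path, helper name, and column layout are assumptions, not replay-lab's documented API:

import os
from pathlib import Path

import numpy as np
import pandas as pd
import requests


def load_ohlcv(symbol: str, timeframe: str = "15m", data_dir: str | None = None) -> pd.DataFrame:
    """Illustrative fallback chain: replay-lab API -> local CSV -> synthetic GBM data."""
    # 1. Replay-Lab API, using the venue-prefixed symbol (BTC-USD -> COINBASE_SPOT_BTC_USD)
    base_url = os.environ.get("REPLAY_LAB_URL")
    if base_url:
        mapped = f"COINBASE_SPOT_{symbol.replace('-', '_')}"
        try:
            resp = requests.get(
                f"{base_url}/ohlcv",  # endpoint path is an assumption
                params={"symbol": mapped, "timeframe": timeframe},
                timeout=10,
            )
            resp.raise_for_status()
            return pd.DataFrame(resp.json())
        except requests.RequestException:
            pass  # fall through to the next data source

    # 2. CSV files in the data directory
    if data_dir:
        csv_path = Path(data_dir) / f"{symbol}_{timeframe}.csv"
        if csv_path.exists():
            return pd.read_csv(csv_path, parse_dates=["timestamp"])

    # 3. Synthetic GBM prices as a last resort (testing only)
    steps = 500
    returns = np.random.normal(0.0, 0.002, size=steps)
    close = 100.0 * np.exp(np.cumsum(returns))
    return pd.DataFrame(
        {"open": close, "high": close * 1.001, "low": close * 0.999, "close": close, "volume": np.ones(steps)}
    )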
Breakout playbook:
- Entry: Price breaks 32-bar range with ADX ≥ 18, volume confirmation, trend alignment
- Stop Loss: 1.2 × ATR
- Take Profit: 2.4 × ATR (~2R)
- Time Stop: 32 bars (8 hours on 15m)
- Trailing: Activates at +1.0R, trails at 1.2 × ATR
Pullback playbook:
- Entry: Price tags EMA20 and reclaims in uptrend (EMA50 > EMA200, ADX ≥ 16)
- Stop Loss: 1.0 × ATR
- Take Profit: 1.8 × ATR (~1.8R)
- Time Stop: 48 bars (12 hours on 15m)
- Trailing: Activates at +0.8R, trails at 1.0 × ATR
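Both playbooks express risk purely in ATR multiples, so exit levels reduce to simple arithmetic. The sketch below illustrates that arithmetic for a long entry (names and structure are illustrative, not the simulator's exit engine):

from dataclasses import dataclass


@dataclass
class PlaybookRisk:
    stop_atr: float         # stop-loss distance in ATR multiples
    target_atr: float       # take-profit distance in ATR multiples
    time_stop_bars: int     # maximum holding period in bars
    trail_activate_r: float # trailing stop activates at this R multiple
    trail_atr: float        # trailing distance in ATR multiples


BREAKOUT = PlaybookRisk(stop_atr=1.2, target_atr=2.4, time_stop_bars=32, trail_activate_r=1.0, trail_atr=1.2)
PULLBACK = PlaybookRisk(stop_atr=1.0, target_atr=1.8, time_stop_bars=48, trail_activate_r=0.8, trail_atr=1.0)


def long_exit_levels(entry: float, atr: float, risk: PlaybookRisk) -> tuple[float, float]:
    """Return (stop_loss, take_profit) prices for a long entry."""
    stop = entry - risk.stop_atr * atr
    target = entry + risk.target_atr * atr
    return stop, target


# Example: breakout long at 50,000 with ATR = 400
# stop = 50,000 - 1.2 * 400 = 49,520; target = 50,000 + 2.4 * 400 = 50,960 (reward/risk = 2R)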
Darwin computes 80+ features per candidate across multiple categories:
- Price/Returns: Close, returns over multiple horizons, range
- Volatility: ATR, realized volatility, volatility z-scores
- Trend/Regime: EMAs (20/50/200), ADX, directional indicators
- Momentum: RSI, MACD, Stochastic (optional)
- Range/Levels: Donchian channels, Bollinger Bands, breakout/pullback distances
- Volume: Turnover, ADV, volume ratios and z-scores
- Microstructure: Spread estimates, slippage models
- Portfolio State: Open positions, exposure, drawdown
- Derivatives (optional): Funding rates, open interest
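To make a few of these categories concrete, here is a small pandas sketch of representative features (ATR, EMA trend alignment, a volume z-score); the real pipeline in darwin/features/ is incremental and far broader:

import pandas as pd


def sample_features(df: pd.DataFrame) -> pd.DataFrame:
    """Compute a handful of illustrative features from an OHLCV frame."""
    out = pd.DataFrame(index=df.index)

    # Volatility: 14-bar ATR from the true range
    prev_close = df["close"].shift(1)
    tr = pd.concat(
        [df["high"] - df["low"], (df["high"] - prev_close).abs(), (df["low"] - prev_close).abs()],
        axis=1,
    ).max(axis=1)
    out["atr_14"] = tr.rolling(14).mean()

    # Trend/regime: EMA stack and alignment flag
    ema20 = df["close"].ewm(span=20, adjust=False).mean()
    ema50 = df["close"].ewm(span=50, adjust=False).mean()
    ema200 = df["close"].ewm(span=200, adjust=False).mean()
    out["uptrend"] = ((ema20 > ema50) & (ema50 > ema200)).astype(int)

    # Volume: z-score of turnover versus a 96-bar (one day on 15m) window
    turnover = df["close"] * df["volume"]
    out["volume_z"] = (turnover - turnover.rolling(96).mean()) / turnover.rolling(96).std()

    return out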
Darwin sends structured JSON payloads to LLMs with:
- Global market regime (BTC 4h context)
- Asset-specific state (15m + 1h)
- Candidate setup details (playbook-specific)
- Policy constraints
LLMs return decisions with:
decision: "take" or "skip"setup_quality: "A+", "A", "B", or "C"confidence: 0.0 to 1.0risk_flags: Array of concernsnotes: Brief explanation
# Run all tests
pytest
# Run specific test categories
pytest tests/unit/
pytest tests/integration/
pytest tests/property/
# Run with coverage
pytest --cov=darwin --cov-report=html
# Run property-based tests
pytest tests/property/test_simulator_invariants.py

darwin/
├── schemas/ # Pydantic models with validation
├── storage/ # SQLite stores with abstract interfaces
├── features/ # Incremental feature pipeline
├── playbooks/ # Breakout and Pullback implementations
├── simulator/ # Position management and exit logic
├── llm/ # LLM harness with rate limiting
├── runner/ # Global runner with error recovery
├── evaluation/ # Ledger-driven evaluation and reporting
├── rl/ # Reinforcement learning system
│ ├── agents/ # Three RL agents (gate, portfolio, meta-learner)
│ ├── envs/ # Gymnasium environments
│ ├── training/ # Offline training and hyperparameters
│ ├── graduation/ # Automated graduation policies
│ ├── integration/ # Runner integration hooks
│ ├── storage/ # Model store and agent state
│ ├── monitoring/ # Alerts and safety mechanisms
│ └── utils/ # State encoding and reward shaping
└── utils/ # Logging, validation, helpers
tools/ # CLI entry points
tests/ # Comprehensive test suite (115+ tests)
docs/ # Documentation
examples/ # Example configurations
- RL System Architecture - Detailed RL system design
- RL Quick Start Guide - Step-by-step setup and deployment
- Deployment Checklist - Production deployment procedures
- Example RL Config - Complete RL configuration
# Install development dependencies
pip install -e ".[dev]"
# Set up pre-commit hooks
pre-commit install
# Run linters
black darwin/ tests/
ruff check darwin/ tests/
mypy darwin/
# Format code
black darwin/ tests/

Current (v0.2.0 - RL System):
- ✅ Replay-based evaluation
- ✅ Breakout and Pullback playbooks
- ✅ LLM-assisted decision-making
- ✅ Comprehensive testing and error recovery
- ✅ Three-agent RL system (gate, portfolio, meta-learner)
- ✅ Automated graduation policies (observe → active promotion)
- ✅ Production safety mechanisms (circuit breakers, monitoring, fallback)
- ✅ Offline training pipeline with PPO algorithm
- ✅ 115+ comprehensive tests including end-to-end RL workflow
Near Future (v0.3.0):
- Supervised learning for LLM decision prediction
- Additional playbooks (mean reversion, momentum)
- Multi-timeframe analysis
- Hyperparameter optimization (grid search, Bayesian)
- Real-time performance dashboards
Long Term (v1.0.0+):
- Paper trading mode
- Live trading support (with extensive safeguards)
- Multi-agent coordination strategies
- Real-time risk management
- Auto-scaling position sizes based on account growth
See CONTRIBUTING.md for development guidelines.
MIT License - see LICENSE for details.
If you use Darwin in your research, please cite:
@software{darwin2024,
  title  = {Darwin: LLM-Assisted Crypto Trading Research Platform},
  author = {Recall Net},
  year   = {2024},
  url    = {https://github.com/recallnet/darwin}
}

- Built on nullagent-tutorial for LLM harness
- Uses replay-lab for market data
- Inspired by research in LLM-assisted trading and reinforcement learning
- 📖 Documentation: docs/
- 🐛 Issues: GitHub Issues
- 💬 Discussions: GitHub Discussions