Drop a PDF or paste an arXiv ID. EpiSim reads the paper, extracts the complete mathematical model, generates an interactive simulator with parameter sliders, validates it against the paper's own results, and produces a downloadable standalone script. Six AI agents. One pipeline. Zero code written by the user.
Built for epidemiology. Works on any ODE paper.
Example output: SEIQR quarantine model extracted from a research paper and simulated automatically.
Problem Statement 2: Break the Barriers -- Mathematical modeling is locked behind programming expertise. EpiSim puts it in everyone's hands.
During COVID-19, thousands of epidemic modeling papers were published. Each contained a mathematical model that could inform public health decisions -- if anyone could run it. But reproducing a paper's model requires reading dense mathematics, implementing ODE systems in Python, calibrating parameters, and debugging numerical solvers. This takes a trained computational scientist days of work per paper.
The same barrier exists across science. Ecology papers describe predator-prey dynamics no one simulates. Pharmacology papers model drug interactions no clinician explores interactively. The knowledge exists in papers. The barrier is implementation.
EpiSim removes that barrier entirely. Upload the paper, get an interactive simulator and downloadable code.
Input: A research paper (PDF upload or arXiv ID like 2003.09861)
Output: Three things, automatically:
- **Plain-English Summary** -- Paper digest with key findings, methodology, limitations, and public health implications. Written for non-specialists.
- **Interactive Simulator** -- Real-time dynamic curves with parameter sliders. Drag a slider, watch the curves change instantly. See peak timing, peak magnitude, and key metrics update live.
- **Standalone Script** -- A clean, downloadable Python file (numpy/scipy/matplotlib only) that reproduces the paper's model. Runnable with `python script.py`; no dependencies beyond standard scientific Python.
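To make the third deliverable concrete, here is a minimal sketch of what a generated standalone script could look like, using a simple SIR model as a stand-in (the parameter values and model are hypothetical, not output from the pipeline):

```python
"""Sketch of a generated standalone script (hypothetical SIR example)."""
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical values; a real generated script embeds the paper's extracted
# parameters and initial conditions.
BETA, GAMMA = 0.3, 0.1          # transmission and recovery rates
Y0 = [0.99, 0.01, 0.0]          # initial S, I, R fractions

def sir(t, y):
    s, i, r = y
    return [-BETA * s * i, BETA * s * i - GAMMA * i, GAMMA * i]

sol = solve_ivp(sir, (0, 160), Y0, dense_output=True)
t = np.linspace(0, 160, 400)
s, i, r = sol.sol(t)
print(f"peak infected fraction: {i.max():.3f} at day {t[i.argmax()]:.0f}")
```

A real generated script additionally plots the curves with matplotlib, but the core is always this shape: parameters, an ODE right-hand side, and a `solve_ivp` call.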
1. Paste arXiv ID: 2003.09861
2. AI reads the 30-page paper (extended thinking, ~60 seconds)
3. Extracts: 8 compartments, 16 parameters, full ODE system
4. Three agents run IN PARALLEL: Summarizer + Builder + Coder
5. Validator checks results against the paper's reported metrics
6. If any metric deviates >5%: Debugger patches and retries (up to 3x)
7. Three-tab result: Summary | Simulation | Code
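Steps 5 and 6 (validate, then patch and retry) can be sketched as a small loop; the function names here are hypothetical stand-ins for the Validator and Debugger agents:

```python
# Sketch of the validate/patch loop (agent calls stubbed; names hypothetical).
TOLERANCE = 0.05    # metrics may deviate at most 5% from the paper's values
MAX_RETRIES = 3

def within_tolerance(simulated: dict, reported: dict) -> bool:
    """True if every simulated metric is within 5% of the paper's value."""
    return all(
        abs(simulated[k] - v) <= TOLERANCE * abs(v)
        for k, v in reported.items()
    )

def validate_and_heal(run_sim, patch, reported: dict) -> bool:
    for attempt in range(MAX_RETRIES + 1):
        if within_tolerance(run_sim(), reported):
            return True             # validated against the paper
        if attempt < MAX_RETRIES:
            patch()                 # Debugger rewrites the failing code
    return False                    # gave up after 3 patch attempts
```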
| Paper | Model | Domain | Compartments | Status |
|---|---|---|---|---|
| Giordano et al. 2020 | SIDARTHE | COVID-19 | 8 | Working |
| Ghosh & Bhattacharya 2020 | SEIQR | COVID-19 | 5 | Working |
| Hridoy & Mustaquim 2024 | SEIR Seasonal | Dengue | 4 | Working |
| Lotka-Volterra dynamics | Predator-Prey | Ecology | 2 | Working |
| Rachah & Torres 2017 | SEIR | Ebola | 4 | Working |
This section details how EpiSim uses capabilities exclusive to Claude Opus 4.6 -- features no other model offers.
Opus 4.6 introduced adaptive thinking -- the model decides how deeply to reason based on task complexity, guided by an effort parameter (low, medium, high, max). EpiSim is the first application to use different effort levels for different agents in the same pipeline:
| Agent | Effort | Why |
|---|---|---|
| Reader | max | Extracting ODEs from a 30-page paper requires deep mathematical reasoning. No shortcuts. |
| Summarizer | medium | Summarization needs clarity, not mathematical depth. Faster response. |
| Builder | high | Code generation benefits from reasoning but doesn't need max depth. |
| Coder | high | Standalone script needs good structure, not exhaustive analysis. |
| Debugger | high | Bug diagnosis needs reasoning about code + math simultaneously. |
This demonstrates fine-grained reasoning control: the same model, tuned per task, running in parallel. The Reader thinks for 60 seconds at max effort while the Summarizer finishes in 15 seconds at medium effort -- same API, same model, different cognitive allocation.
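The per-agent allocation above reduces to a small config. This sketch assumes a request shape with a `thinking` field carrying the effort level; the exact Anthropic SDK field names are an assumption, not the project's verified API usage:

```python
# Per-agent effort levels (values from the table above; the request dict
# shape is an assumption, not the exact Anthropic SDK signature).
AGENT_EFFORT = {
    "reader": "max",
    "summarizer": "medium",
    "builder": "high",
    "coder": "high",
    "debugger": "high",
}

def request_kwargs(agent: str, prompt: str) -> dict:
    """Build hypothetical request kwargs carrying the agent's effort level."""
    return {
        "model": "claude-opus-4-6",
        "thinking": {"type": "adaptive", "effort": AGENT_EFFORT[agent]},
        "messages": [{"role": "user", "content": prompt}],
    }
```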
The Reader Agent's extended thinking streams into the Streamlit UI in real-time. Users watch Opus 4.6 reason through the paper:
- Phase detection classifies thinking into 7 stages (Reading Paper, Identifying Compartments, Extracting Parameters, Formulating ODE System, Setting Initial Conditions, Cross-referencing Results, Synthesizing Model)
- A dark typewriter console shows the current reasoning with phase transitions highlighted
- During parallel execution, the thinking replays with rotating excerpts so users stay engaged
- After pipeline completes, full thinking is available in an expandable console on the results page
This builds trust. When the model writes "I see 8 compartments: S, I, D, A, R, T, H, E" and then produces code with exactly those compartments, users verify the reasoning chain.
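One simple way to implement the phase detection described above is keyword matching on streamed thinking text; this classifier is a hypothetical sketch, not the project's `core/thinking_stream.py` logic:

```python
# Hypothetical keyword-based classifier for streamed thinking chunks.
PHASES = [
    ("Identifying Compartments", ("compartment", "state variable")),
    ("Extracting Parameters", ("parameter", "table 2")),
    ("Formulating ODE System", ("ode", "equation", "d/dt")),
    ("Setting Initial Conditions", ("initial condition", "t=0")),
    ("Cross-referencing Results", ("figure", "reported", "validate")),
    ("Synthesizing Model", ("final model", "putting it together")),
]

def classify_phase(chunk: str) -> str:
    """Map a thinking excerpt to one of the 7 pipeline phases."""
    text = chunk.lower()
    for phase, keywords in PHASES:
        if any(k in text for k in keywords):
            return phase
    return "Reading Paper"   # default first phase
```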
The Reader Agent receives the entire paper plus a knowledge base of model formulations, parameter ranges, and ODE solver best practices -- all in a single prompt. No chunking. No retrieval. No information loss.
This matters because mathematical models are defined across multiple sections of a paper. The ODE system is in Section 3, but the parameter values are in Table 2, the initial conditions in the supplementary material, and the validation targets in Figure 5. Chunking would lose these cross-references. The 1M context window holds everything.
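The single-prompt approach amounts to concatenation rather than retrieval. A sketch of what such a context builder might look like (the section tags and final instruction are illustrative):

```python
# Sketch of single-prompt context assembly: no chunking, no retrieval.
def build_context(paper_text: str, knowledge_base: str) -> str:
    """Concatenate the full paper and the knowledge base into one prompt.
    The 1M-token window makes splitting unnecessary, so cross-references
    between sections, tables, and figures survive intact."""
    return (
        "<paper>\n" + paper_text + "\n</paper>\n\n"
        "<knowledge_base>\n" + knowledge_base + "\n</knowledge_base>\n\n"
        "Extract the complete ODE model as structured output."
    )
```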
The Builder Agent generates a complete Streamlit application -- model.py, solver.py, app.py, config.json, and requirements.txt -- in a single API call. No multi-turn generation, no template stitching.
This produces coherent code where the solver imports from the model, the app imports from the solver, and the config matches both. Fragmented generation produces fragmented code.
After the Reader completes, three agents run simultaneously via ThreadPoolExecutor:
```
Reader (max) --> [ Summarizer (medium) | Builder (high) | Coder (high) ] --> Validator --> Debugger
                 '------- PARALLEL via ThreadPoolExecutor -------'           '--- SEQUENTIAL ---'
```
Each agent creates its own Anthropic client, makes its own API call at its own effort level, and returns independently. The UI shows live agent status badges with effort labels, updating as each completes.
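The fan-out can be sketched with `concurrent.futures`; the three agent functions here are stand-ins for the real API-calling agents:

```python
# Sketch of the parallel fan-out after the Reader completes (agent
# functions are stand-ins; each real agent makes its own API call).
from concurrent.futures import ThreadPoolExecutor

def summarize(model): return f"summary of {model}"
def build(model):     return f"simulator for {model}"
def code(model):      return f"script for {model}"

def run_parallel(model: str) -> dict:
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = {
            "summary": pool.submit(summarize, model),
            "simulator": pool.submit(build, model),
            "script": pool.submit(code, model),
        }
        # .result() blocks until each agent finishes independently
        return {name: f.result() for name, f in futures.items()}
```

Threads (rather than processes) suffice because each agent spends its time waiting on network I/O, not CPU.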
Every agent returns data through Pydantic v2 schemas enforced via tool_use. The EpidemicModel schema is the contract between all agents -- 9 fields, strictly typed, validated on both send and receive. No free-text parsing. No regex extraction. Type-safe inter-agent communication.
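A trimmed sketch of what such a contract looks like in Pydantic v2 (the field names here are illustrative; the real `EpidemicModel` schema has 9 strictly typed fields):

```python
# Trimmed, illustrative version of an inter-agent Pydantic v2 contract.
from pydantic import BaseModel, Field

class EpidemicModel(BaseModel):
    name: str
    compartments: list[str] = Field(min_length=1)   # at least one compartment
    parameters: dict[str, float]
    equations: list[str]
    initial_conditions: dict[str, float]

m = EpidemicModel(
    name="SIR",
    compartments=["S", "I", "R"],
    parameters={"beta": 0.3, "gamma": 0.1},
    equations=["dS/dt = -beta*S*I", "dI/dt = beta*S*I - gamma*I", "dR/dt = gamma*I"],
    initial_conditions={"S": 0.99, "I": 0.01, "R": 0.0},
)
```

Because validation runs on construction, a malformed agent response fails loudly at the schema boundary instead of propagating into generated code.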
```
Paper (PDF/arXiv)
        |
        v
Paper Loader --> Context Builder --> Reader Agent (adaptive, max effort)
        |
        +----------------+----------------+
        |                |                |
   Summarizer         Builder           Coder
    (medium)           (high)           (high)
        |                |                |
        v                v                v
  PaperSummary    Simulator Files   Standalone Script
                         |
                     Validator
                         |
          Debugger (if needed, high effort)
                         |
                 3-Tab Streamlit UI
```
| Component | File | Opus 4.6 Feature |
|---|---|---|
| Reader | `agents/reader.py` | Adaptive thinking (max), 1M context, tool use |
| Summarizer | `agents/summarizer.py` | Adaptive thinking (medium), tool use |
| Builder | `agents/builder.py` | Adaptive thinking (high), 128K output, tool use |
| Validator | `agents/validator.py` | Pure Python -- subprocess execution, metric comparison |
| Debugger | `agents/debugger.py` | Adaptive thinking (high), code analysis |
| Coder | `agents/coder.py` | Adaptive thinking (high), tool use |
| Thinking Stream | `core/thinking_stream.py` | Real-time thinking display with phase classification |
v1 -- Basic Pipeline (commits 4746ce6 to f0ad239)
Scaffolding, schemas, paper loader, agents wired sequentially. Reader used fixed `budget_tokens`. No UI. CLI only. Worked for SIR but failed on complex models.
v2 -- Streamlit App + Validation Loop (commits 12c9b4a to fc4e0ba)
Added the Streamlit interface, the Validator + Debugger self-healing loop, and fixed R0 formula handling for complex models (SIDARTHE has 8 compartments -- the simple beta/gamma formula doesn't apply). 76 tests covering SIR, SEIR, pipeline integration, and edge cases.
v3 -- Three-Tab UI + New Agents (commit 8a8e3c1)
Added Summarizer and Coder agents. Rewrote the app with Summary | Simulation | Code tabs. Realized the demo needed more than just charts -- judges want to see AI understanding, not just AI generating.
v4 -- Dark Scientific Theme (commit 1882fd9)
Complete visual redesign. Custom typography (Fraunces + Outfit + JetBrains Mono), dark blue-black palette with amber accents, glass-morphism cards, staggered animations. The UI went from "default Streamlit" to "scientific intelligence platform."
v5 -- Thinking Out Loud + Parallel Agents (commits 4800bba to 51a0ba1)
The breakthrough iteration. Switched from post-hoc thinking display to real-time streaming of thinking blocks into the UI. Added adaptive thinking with per-agent effort levels. Parallelized Summarizer + Builder + Coder for ~40% speed improvement. Added the typewriter console with phase classification, replay mode during parallel execution, and persistent thinking display on results page.
v6 -- Beyond Epidemiology (commit 51a0ba1+)
Discovered the pipeline is domain-agnostic. Tested on a Lotka-Volterra predator-prey ecology paper -- the Reader extracted the ODE system, the Builder generated a working simulator, no code changes needed. The architecture already reads any differential equation paper, not just epidemic ones. EpiSim started as epidemic modeling. It became a universal paper-to-simulator engine.
The key insight: showing the AI's reasoning process isn't just a demo trick -- it's a trust mechanism. When users can watch the model identify variables, extract parameters, and formulate ODEs step by step, they trust the output regardless of the domain.
EpiSim's pipeline -- read paper, extract math, generate simulator, validate, self-heal -- is domain-agnostic. Any field that publishes differential equation models in academic papers works today, with zero code changes:
| Domain | Tested Paper | What the Simulator Does |
|---|---|---|
| Epidemiology | SIDARTHE COVID-19 | 8-compartment epidemic curves with intervention parameter sliders |
| Epidemiology | SEIR Dengue | Seasonal dengue dynamics with transmission rate controls |
| Ecology | Predator-Prey dynamics | Lotka-Volterra population cycles with predation rate sliders |
The Reader Agent doesn't know what domain it's reading. It knows it's reading differential equations. The same 1M context window that extracts a COVID-19 SIDARTHE model extracts a predator-prey Lotka-Volterra model -- same pipeline, same agents, same output format.
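The Lotka-Volterra system from the ecology row is exactly the kind of ODE model the pipeline extracts. A minimal version, with hypothetical parameter values (not the tested paper's):

```python
# Minimal Lotka-Volterra predator-prey system of the kind the pipeline
# extracts (parameter values hypothetical, not from the tested paper).
import numpy as np
from scipy.integrate import solve_ivp

ALPHA, BETA, DELTA, GAMMA = 1.1, 0.4, 0.1, 0.4   # growth/predation rates

def lotka_volterra(t, y):
    prey, pred = y
    return [ALPHA * prey - BETA * prey * pred,
            DELTA * prey * pred - GAMMA * pred]

sol = solve_ivp(lotka_volterra, (0, 50), [10.0, 5.0],
                t_eval=np.linspace(0, 50, 1000))
prey, pred = sol.y   # oscillating population cycles
```

Structurally this is indistinguishable from an epidemic model: named state variables, named rate parameters, a coupled ODE right-hand side. That is why the Reader handles both without code changes.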
Opus 4.6's extended thinking at max effort reasons through any dense academic paper -- epidemiology, ecology, pharmacokinetics, climate science, neuroscience. The architecture already supports it. The knowledge base is the only epidemic-specific component, and it's optional context, not a hard dependency.
```bash
git clone https://github.com/wpn10/episim.git
cd episim
python3 -m venv .venv && source .venv/bin/activate
pip install -e .
export ANTHROPIC_API_KEY=your_key_here
streamlit run app.py
```

Upload a PDF or paste an arXiv ID (e.g. 2003.09861) and click Generate Simulator.
CLI alternative:

```bash
python -m episim.core.orchestrator --paper 2003.09861
```

Suggested demo flow:
- Paste `2003.09861` (SIDARTHE COVID-19 model) into the sidebar
- Watch the thinking console stream the Reader's reasoning in real time
- See parallel agents execute with effort-level badges
- Explore the Summary tab (paper digest)
- Drag parameter sliders on the Simulation tab, watch curves update
- Download the standalone script from the Code tab
```bash
pytest tests/ -v
```

76 tests across 13 files: schema validation, PDF extraction, SIR/SEIR ODE solvers, pipeline integration, mocked agent tests, edge cases. All passing.
Python 3.10+ | Anthropic API (Opus 4.6) | scipy | Streamlit | Plotly | PyMuPDF | Pydantic v2
MIT
Built for the "Built with Opus 4.6" Claude Code Hackathon, February 2026.