vibecode arena

Competitive multiplayer coding game where players pick an AI model and race to build UI components. Prompt your AI, watch your code render live, and outscore your friends. This project was created over the weekend to play with E2B's sandboxing capabilities.

Note: This is still a work in progress - not where I want it to be yet! Check out Planned Features or open an issue with ideas.

Why?

I've always enjoyed playing casual multiplayer games like skribbl.io with friends to kill some time. Now that everyone's vibecoding, I thought - why not make a game out of it? Instead of drawing, you prompt. Instead of guessing, you watch code render in real-time. Same energy, new skills.

Why UI components? If the challenge were "implement quicksort", you could paste that sentence into the AI and be done, because the words already fully specify the algorithm. With UI, you only get a visual reference. You have to describe colors, spacing, animations, interactions - that's where prompting skill actually matters.

How it works

Create a room — Get a 6-character code to share with friends (or play solo to practice)
Pick your AI — Choose from Claude, GPT, Gemini, Llama, and more (each with different score multipliers - tougher models yield higher points)
Compete in rounds — See a reference UI component and prompt your AI to recreate it
Watch it render — Your code runs live in a sandboxed environment
Get scored — Points for accuracy, speed, and prompt efficiency
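
The scoring step above can be sketched roughly as follows. This is only an illustration: the weights, the prompt budget, the tier names in `MODEL_MULTIPLIERS`, and the `scoreRound` function are all assumptions, not the project's actual scoring code.

```typescript
// Hypothetical scoring sketch: none of these names or numbers come from the repo.
interface RoundResult {
  accuracy: number; // judge's similarity score, 0..1
  secondsUsed: number; // time the player took to submit
  timeLimit: number; // seconds allowed for the round
  promptChars: number; // total characters the player sent to the AI
  modelTier: string; // key into MODEL_MULTIPLIERS
}

// Assumed multipliers: tiers that are harder to win with earn more points.
const MODEL_MULTIPLIERS: Record<string, number> = {
  strong: 1.0,
  medium: 1.25,
  small: 1.5,
};

const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

function scoreRound(r: RoundResult): number {
  const accuracy = clamp01(r.accuracy);
  // Faster submissions score higher; using the whole time limit scores 0 here.
  const speed = clamp01(1 - r.secondsUsed / r.timeLimit);
  // Shorter prompts score higher; 2000 characters is an arbitrary budget.
  const efficiency = clamp01(1 - r.promptChars / 2000);
  const base = 700 * accuracy + 200 * speed + 100 * efficiency;
  const multiplier = MODEL_MULTIPLIERS[r.modelTier] ?? 1.0;
  return Math.round(base * multiplier);
}
```

Weighting accuracy most heavily keeps a fast but wrong submission from beating a slower, correct one; the exact split is a tuning decision.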

Tech Stack

Layer	Technology
Frontend	SvelteKit + Svelte 5
Styling	Tailwind CSS v4
Real-time	Cloudflare Durable Objects + WebSocket
AI	Vercel AI SDK with OpenRouter
Sandboxes	E2B for isolated code execution
Validation	Valibot

Why this stack?

SvelteKit — Single repo for frontend + API routes = fast prototyping. Svelte 5's runes make reactive state dead simple.
Cloudflare — Durable Objects give you stateful WebSockets at the edge without managing servers. Perfect for real-time multiplayer.
OpenRouter — One API to access all the models (Claude, GPT, Gemini, Llama). Players can pick their favorite.
E2B — Spin up sandboxes in seconds, run untrusted code safely, get a preview URL. Exactly what this game needs.

Project Structure

frontend/
├── src/
│   ├── routes/              # SvelteKit pages
│   │   ├── +page.svelte     # Home
│   │   ├── create/          # Create room flow
│   │   ├── join/            # Join room flow
│   │   ├── [code]/          # Game room (dynamic route)
│   │   └── api/             # API endpoints
│   ├── lib/
│   │   ├── components/      # Svelte components
│   │   │   ├── game/        # Game UI (Lobby, GameHeader, etc.)
│   │   │   └── challenges/  # Challenge display components
│   │   ├── hooks/           # Svelte 5 runes (useGame, useChat, etc.)
│   │   ├── config/          # Game settings, models, challenges
│   │   ├── game/            # Game logic (scoring)
│   │   ├── utils/           # Utility functions
│   │   ├── validation/      # Valibot schemas
│   │   ├── types/           # TypeScript types
│   │   └── server/          # Server-side logic
│   │       ├── ai/          # AI chat and prompts
│   │       │   ├── agents/  # Judge agents (CodeAnalyzer, VisualMatcher, etc.)
│   │       │   └── tools/   # AI tools (hints)
│   │       ├── e2b/         # E2B sandbox management
│   │       └── do-client.ts # Durable Object RPC client
│   └── app.html
├── worker/
│   └── src/
│       ├── index.ts         # Worker entry point
│       └── GameRoom.ts      # Durable Object (game state)
├── tests/
│   ├── unit/                # Unit tests (Vitest)
│   ├── integration/         # Integration tests
│   └── e2e/                 # E2E tests (Playwright)
├── sandbox/                 # E2B sandbox template files
├── wrangler.toml            # Cloudflare config
└── package.json

Development

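Prerequisites

Bun (used to install dependencies and run the scripts below)
API keys for OpenRouter and E2B (see Environment Variables)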

Setup

# Install dependencies
bun install

# Set up environment variables
cp .env.example .env
# Edit .env with your API keys

# Run both frontend and worker
bun run dev:all

This starts:

SvelteKit dev server on http://localhost:5173
Wrangler dev server on http://localhost:8788

Scripts

Command	Description
`bun run dev`	Start SvelteKit dev server
`bun run dev:worker`	Start Wrangler dev server
`bun run dev:all`	Start both in parallel
`bun run build`	Build for production
`bun run check`	TypeScript + Svelte checks
`bun run lint`	ESLint
`bun run format`	Prettier

Pre-commit Hook

The project uses husky + lint-staged to validate code before commits:

Type check — Runs bun run check
Lint & fix — ESLint with --fix on staged .ts, .js, .svelte files
Format — Prettier on all staged files

Commits are blocked if type errors or unfixable lint errors exist.

Testing

Recommended Commands for Development

Command	Description
`bun run test`	Run unit/integration tests (Vitest)
`bun run test:e2e:ui`	Playwright UI for non-sandbox tests only
`bun run test:e2e:ui:sandbox`	Playwright UI for sandbox tests only

Other Test Commands

Command	Description
`bun run test:watch`	Run tests in watch mode
`bun run test:coverage`	Run tests with coverage report
`bun run test:e2e:quick`	Run E2E tests excluding `@sandbox` tests (no E2B)
`bun run test:e2e:sandbox`	Run only `@sandbox` tests (shared sandbox, sequential)

Test Categories:

Quick tests (CI) (test:e2e:quick) — Lobby, forms, errors, navigation. No E2B API needed, runs fast.
Sandbox tests (CI) (test:e2e:sandbox) — Full game flow with real E2B sandboxes. Tests share ONE sandbox via worker-scoped fixture to avoid rate limits.

Tests tagged with @sandbox require E2B_API_KEY and spin up real sandboxes.
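
The sandbox tests avoid rate limits by creating one sandbox and reusing it. In the real suite a Playwright worker-scoped fixture does this; the plain TypeScript sketch below shows only the underlying idea (lazy, memoized creation) and uses neither Playwright nor the E2B SDK. `FakeSandbox`, `createSandbox`, and `shareOnce` are hypothetical names.

```typescript
// Hypothetical stand-in for a real sandbox handle.
interface FakeSandbox {
  id: string;
}

// Wraps an expensive async factory so it runs at most once;
// every caller receives the same promise.
function shareOnce<T>(factory: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      cached = factory();
    }
    return cached;
  };
}

let created = 0;

// Pretend sandbox creation; a real one would call the sandbox provider.
const createSandbox = async (): Promise<FakeSandbox> => {
  created += 1; // counts how many sandboxes were actually started
  return { id: `sandbox-${created}` };
};

const getSharedSandbox = shareOnce(createSandbox);
```

Caching the promise rather than the resolved value means two tests that ask for the sandbox at the same moment still trigger only one creation.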

CI/CD

The project uses GitHub Actions with two workflows:

CI (.github/workflows/ci.yml) — Runs on every push and PR:

Lint & type check
Unit & integration tests
Build verification
E2E tests:
- On push: Quick tests only (test:e2e:quick, no sandbox)
- On PR: Full tests including sandbox tests to verify full functionality

Deploy (.github/workflows/deploy.yml) — Runs after CI passes on main:

Worker deploys only if worker/ or wrangler.toml changed
App always deploys
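
The worker deploy rule above amounts to a path check on the changed files. The actual workflow most likely expresses this in YAML; the function below is just a hedged TypeScript restatement of the rule, and `shouldDeployWorker` is a hypothetical name.

```typescript
// Returns true when any changed path touches the worker directory
// or the Cloudflare config file named in the README.
function shouldDeployWorker(changedFiles: string[]): boolean {
  return changedFiles.some(
    (path) => path.startsWith("worker/") || path === "wrangler.toml",
  );
}
```

For example, a commit that only edits `src/routes/+page.svelte` would redeploy the app but skip the worker.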

Environment Variables

# Required
OPENROUTER_API_KEY=sk-or-...
E2B_API_KEY=e2b_...

# Optional
PUBLIC_DO_URL=http://localhost:8788  # Durable Object URL (default for dev)
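
A small startup check can surface a missing key early instead of failing mid-game. The sketch below assumes the variables are available as a plain key/value object; `AppEnv` and `loadEnv` are hypothetical names, and the app itself may read its environment differently.

```typescript
interface AppEnv {
  openRouterApiKey: string;
  e2bApiKey: string;
  doUrl: string;
}

// Validates the two required keys and applies the documented dev default
// for the optional Durable Object URL.
function loadEnv(env: Record<string, string | undefined>): AppEnv {
  const missing = ["OPENROUTER_API_KEY", "E2B_API_KEY"].filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  return {
    openRouterApiKey: env.OPENROUTER_API_KEY as string,
    e2bApiKey: env.E2B_API_KEY as string,
    doUrl: env.PUBLIC_DO_URL ?? "http://localhost:8788",
  };
}
```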

Deployment

Everything runs on Cloudflare:

# Deploy the Durable Object worker (api.vibecodearena.dev)
bun run deploy:worker

# Deploy the SvelteKit app (vibecodearena.dev)
bun run deploy:app

Environment Secrets (set via Wrangler):

# For the Pages app
bun x wrangler pages secret put E2B_API_KEY --project-name vibecode-arena
bun x wrangler pages secret put OPENROUTER_API_KEY --project-name vibecode-arena

Architecture

                              WebSocket (game events)
┌──────────────┐            ┌─────────────────────────┐
│    Browser   │◄──────────►│  Cloudflare Worker (DO) │
└──────┬───────┘            │  api.vibecodearena.dev  │
       │                    │  - Game state           │
       │ HTTP               │  - Room management      │
       ▼                    └───────────▲─────────────┘
┌──────────────┐                        │
│   SvelteKit  │────────────────────────┘ HTTP (RPC)
│  (Cloudflare │
│    Pages)    │───────────► OpenRouter (AI chat)
│              │───────────► E2B (sandboxes)
└──────────────┘

Durable Object maintains game state and broadcasts events via WebSocket
SvelteKit on Cloudflare Pages serves UI, proxies AI chat, manages sandboxes, and calls DO for game actions
E2B runs player code in isolated sandboxes with live preview
OpenRouter routes to Claude, GPT, Gemini, Llama, etc.
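
To make the first point concrete, here is a minimal sketch of the "hold state, broadcast events" pattern. It deliberately avoids the Cloudflare APIs: `Connection` stands in for a WebSocket, and `RoomState`, `GameEvent`, and the event names are hypothetical, not taken from `GameRoom.ts`.

```typescript
// Minimal stand-in for a WebSocket: anything with a send() method.
interface Connection {
  send(message: string): void;
}

type GameEvent =
  | { type: "player_joined"; playerId: string }
  | { type: "round_started"; round: number };

class RoomState {
  private connections = new Map<string, Connection>();
  round = 0;

  // Registers the player's connection, then tells everyone (including them).
  join(playerId: string, conn: Connection): void {
    this.connections.set(playerId, conn);
    this.broadcast({ type: "player_joined", playerId });
  }

  startRound(): void {
    this.round += 1;
    this.broadcast({ type: "round_started", round: this.round });
  }

  // Serializes the event once and sends it to every connected player.
  private broadcast(event: GameEvent): void {
    const payload = JSON.stringify(event);
    this.connections.forEach((conn) => conn.send(payload));
  }
}
```

Keeping all mutation inside one object per room is what makes the Durable Object model a natural fit: each room has a single owner for its state, so no cross-server coordination is needed.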

Note: Currently using one E2B sandbox per room (shared by all players). Ideally, each player would have their own sandbox for better isolation. My E2B plan allows only 20 concurrent sandboxes, limiting the app to ~20 simultaneous rooms (or fewer if using per-player sandboxes).
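
The capacity trade-off in that note is simple division. In the sketch below, the 20-sandbox limit comes from the note, while the four-player room size is an assumed example and `maxRooms` is a hypothetical helper.

```typescript
// How many rooms fit under a concurrent-sandbox limit.
function maxRooms(sandboxLimit: number, sandboxesPerRoom: number): number {
  if (sandboxesPerRoom < 1) {
    throw new RangeError("sandboxesPerRoom must be at least 1");
  }
  return Math.floor(sandboxLimit / sandboxesPerRoom);
}

const sharedSandboxRooms = maxRooms(20, 1); // one sandbox per room: 20 rooms
const perPlayerRooms = maxRooms(20, 4); // one sandbox per player, 4 players: 5 rooms
```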

Planned Features

Truly Agentic Judges

Currently, the judge "agents" (CodeAnalyzer, VisualMatcher, InteractionTester) are single-shot LLM evaluators. The plan is to make them genuinely agentic:

Tool use — Agents can interact with sandboxes, take screenshots, simulate user interactions, MCPs?
Observation loops — "I'm not confident about the hover state, let me check" → takes screenshot → adjusts score
Multi-step reasoning — Break down evaluation into steps, verify assumptions
Cross-agent communication — VisualMatcher can ask InteractionTester to verify a behavior

Advanced E2B Features

Explore more advanced E2B features - Research the E2B docs further and try out as many features as possible :D
Different sandbox types — Use React/Vue/Angular sandboxes for specific challenges

Game Modes

Shared LLM — Everyone uses the same model, pure prompting skill competition
Configurable rounds — Set number of rounds (3, 5, 10) or play until time runs out
Time limits — Per-challenge time (30s, 60s, 120s) or total game time
Difficulty levels — Controls how strict the AI judge is and complexity of challenges
No preview/code mode — Disable live rendering and code output for hardcore mode

AI-Generated Challenges

Dynamic challenge generation — LLM creates new UI challenges on the fly
Difficulty scaling — Generates easier/harder challenges based on player performance
Themed rounds — "Retro UI", "Glassmorphism", "Brutalist" themed challenge sets
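
Difficulty scaling could start as a simple threshold rule before any LLM is involved. In this sketch the level names, the 0.8 and 0.4 thresholds, and `nextLevel` are all assumptions.

```typescript
const LEVELS = ["easy", "normal", "hard"] as const;
type Level = (typeof LEVELS)[number];

// Steps difficulty up after strong rounds and down after weak ones,
// staying within the available levels.
function nextLevel(current: Level, recentAccuracy: number): Level {
  const index = LEVELS.indexOf(current);
  if (recentAccuracy >= 0.8) {
    return LEVELS[Math.min(index + 1, LEVELS.length - 1)];
  }
  if (recentAccuracy <= 0.4) {
    return LEVELS[Math.max(index - 1, 0)];
  }
  return current;
}
```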

Bigger Challenges

Full apps — Go beyond components. Build entire landing pages, dashboards, or mini-apps
Multi-file projects — Challenges that require multiple components working together
Longer time limits — 5-10 minute rounds for complex builds

Test Coverage

Server code tests — Unit tests for AI agents, E2B sandbox management, DO client
Hook tests — Svelte component tests for reactive hooks (useGame, useChat, etc.)
Integration tests — Test full game flow with mocked external services
E2E tests — Round review, leaderboard, AI chat interactions, multi-round games

Any further ideas?

Open to suggestions! Feel free to open issues or PRs with ideas.

License

MIT

