Inspiration: Developed from Gabriel Peyré's tweet demonstrating optimal transport concepts.
Collaboration: The scene design was reasoned through jointly by the #DeepSeek and #Google AI systems.
```latex
% Generate with:
% pdflatex Benamou-Brenier-Wasserstein.tex
\documentclass{article}
\usepackage{tikz}
\begin{document}
\begin{figure}[h]
\centering
\begin{tikzpicture}
% TikZ code for animation frames
\node at (0,0) {Frame 1: Initial Density};
\node at (4,0) {Frame 2: Intermediate Flow};
\node at (8,0) {Frame 3: Final Transport};
\end{tikzpicture}
\caption{Wasserstein geodesics visualization sequence}
\end{figure}
\end{document}
```
1. Clone & Setup

   ```bash
   git clone https://github.com/HarleyCoops/DeepSeek-Manim-Animation-Generator
   cd DeepSeek-Manim-Animation-Generator
   ```

2. API Configuration

   ```bash
   echo "DEEPSEEK_API_KEY=your_key_here" > .env
   ```

3. Install Dependencies

   ```bash
   pip install -r requirements.txt
   ```

4. Launch Interface

   ```bash
   python app.py
   ```

5. Use the Interface
   - Use the chat interface to ask DeepSeek to create any Manim animation
   - Copy the returned Python script to a new `.py` file
   - Run the animation using Manim's CLI (see the example after this list)
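For example, a returned script might look like this (a hypothetical minimal scene; the class name is illustrative):

```python
# Save as RotatingSquare.py -- a hypothetical example of a script
# the chat interface might return.
from manim import *

class RotatingSquare(Scene):
    def construct(self):
        square = Square(color=BLUE)
        self.play(Create(square))          # draw the square
        self.play(Rotate(square, PI / 2))  # rotate it 90 degrees
        self.wait()
```

Then render it with `python -m manim -pql RotatingSquare.py RotatingSquare`.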
The chat interface now shows the AI's reasoning process in real-time! As you interact with the model, you'll see:
- A gray box above each response showing the model's chain of thought
- The final response below the reasoning
- Both updating in real-time as the model thinks
This feature helps you understand how the AI arrives at its conclusions: the reasoning window reveals the intermediate steps and thought process before the final answer appears.
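Under the hood, this kind of streaming is straightforward to implement against DeepSeek's OpenAI-compatible API, whose `deepseek-reasoner` model returns chain-of-thought tokens in a separate `reasoning_content` field on each streamed delta. A minimal sketch (the app's actual implementation may differ):

```python
# Minimal sketch: stream reasoning and answer separately.
# Assumes DeepSeek's OpenAI-compatible endpoint and the
# `deepseek-reasoner` model; the app's real code may differ.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

stream = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "Animate a rotating cube in Manim."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta
    if getattr(delta, "reasoning_content", None):
        print(delta.reasoning_content, end="", flush=True)  # gray reasoning box
    elif getattr(delta, "content", None):
        print(delta.content, end="", flush=True)            # final response
```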
First, compile the LaTeX scene guide:
```bash
# Navigate to the project directory
cd DeepSeek-Manim-Animation-Generator

# Compile the LaTeX file
pdflatex Benamou-Brenier-Wasserstein.tex
```

This will generate `Benamou-Brenier-Wasserstein.pdf`, which contains the visual guide for the animation sequence.
For convenience, I've included a pre-rendered version of the scene guide: `Benamou-Brenier-Wasserstein.pdf`.
This comprehensive guide includes:
- Detailed explanations of each animation scene
- Mathematical concepts broken down into intuitive metaphors
- Visual descriptions of the cosmic probability distributions
- Step-by-step breakdowns of the optimal transport equations
- Inspiration credit to Gabriel Peyré's tweet
After reviewing the scene guide, you can render the animation using Manim:
```bash
# For development/preview (480p with preview)
python -m manim -pql CosmicProbabilityScene.py CosmicProbabilityScene

# For final render (1080p high quality)
python -m manim -qh CosmicProbabilityScene.py CosmicProbabilityScene

# For creating a shareable GIF
python -m manim -qm --format gif CosmicProbabilityScene.py CosmicProbabilityScene
```
Quality flags:

- `-ql` (480p, fastest, best for development)
- `-qm` (720p, good balance)
- `-qh` (1080p, high quality)
- `-qk` (4K, very high quality)

Other useful flags:

- `-p` Preview the animation when done
- `-f` Show the output file in the file browser
The rendered animation will be saved in:
`media/videos/CosmicProbabilityScene/[quality]/CosmicProbabilityScene.[format]`
- Use `-pql` during development for quick previews
- Use `-qh` for final renders
- Add `-f` to easily locate output files
- Use `--format gif` for easily shareable animations
For example:
```bash
# During development (preview the QEDJourney scene from QED.py in low quality)
python -m manim -pql QED.py QEDJourney

# Final render (render the QEDJourney scene from QED.py in high quality)
python -m manim -qh QED.py QEDJourney
```
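Manim can also be driven programmatically instead of through the CLI. A minimal sketch, assuming `CosmicProbabilityScene` is importable from `CosmicProbabilityScene.py`:

```python
# Programmatic equivalent of `manim -pql CosmicProbabilityScene.py CosmicProbabilityScene`.
from manim import tempconfig
from CosmicProbabilityScene import CosmicProbabilityScene

with tempconfig({"quality": "low_quality", "preview": True}):
    CosmicProbabilityScene().render()
```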
DeepSeek R1-Zero is a custom, instruction-tuned large language model (LLM) designed for advanced reasoning and knowledge completion tasks. Although it derives conceptual inspiration from Google's T5 framework, it features substantial architectural modifications allowing for an extended context window, refined attention mechanisms, and robust performance across zero-shot and few-shot paradigms.
- Introduction
- Philosophical & Theoretical Foundations
- Model Architecture
- Installation & Quickstart
- Quantization & Memory Footprint
- Implementation Details
- Performance Benchmarks
- Potential Limitations & Future Work
- Usage Examples
- Citation
- License & Usage Restrictions
DeepSeek R1-Zero represents the culmination of multi-year research at DeepSeek AI into transfer learning, instruction tuning, and long-context neural architectures. Its central objective is to provide a single, all-purpose encoder-decoder model that can handle:
- Complex reading comprehension (up to 8,192 tokens)
- Scenario-based instruction following (e.g., "Given a set of constraints, produce a short plan.")
- Technical and coding tasks (including code generation, transformation, and debugging assistance)
Though R1-Zero is a "descendant" of T5, the modifications to attention, context management, and parameter initialization distinguish it significantly from vanilla T5 implementations.
While standard Transformer models rely on the "Attention is All You Need" paradigm (Vaswani et al., 2017), DeepSeek R1-Zero extends this by:
1. Expanded Context Window
   - By employing distributed positional encodings and segment-based attention, R1-Zero handles sequences of up to 8,192 tokens.
   - The extended context window leverages blockwise local attention (in certain layers) to mitigate quadratic scaling in memory usage; see the sketch after this list.

2. Instruction Tuning
   - Similar to frameworks like FLAN-T5 or InstructGPT, R1-Zero was exposed to curated prompts (instructions, Q&A, conversation) to improve zero-shot and few-shot performance.
   - This approach helps the model produce more stable, context-aware answers and reduces "hallucination" events.

3. Semantic Compression
   - The encoder can compress textual segments into "semantic slots," enabling more efficient cross-attention in the decoder stage.
   - This is theoretically grounded in Manifold Hypothesis arguments, where the textual input can be seen as lying on a lower-dimensional manifold, thus amenable to a compressed representation.
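To make point 1 concrete, here is an illustrative construction of a blockwise local attention mask; this is an assumption for exposition, not R1-Zero's actual code:

```python
# Illustrative block-local attention mask -- an assumption for exposition,
# not R1-Zero's actual implementation. Each token attends to its own block
# and the previous one, so memory grows roughly linearly in sequence length.
import torch

def block_local_mask(seq_len: int, block_size: int) -> torch.Tensor:
    idx = torch.arange(seq_len)
    q_block = idx.unsqueeze(1) // block_size  # block index of each query
    k_block = idx.unsqueeze(0) // block_size  # block index of each key
    diff = q_block - k_block
    return (diff >= 0) & (diff <= 1)  # True where attention is allowed

mask = block_local_mask(seq_len=2048, block_size=256)
# Convert False entries to -inf and add to the attention logits before softmax.
```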
From a cognitive science perspective, R1-Zero aspires to mimic a layered approach to knowledge assimilation, balancing short-term "working memory" (sequence tokens) with long-term "knowledge representation" (model parameters).
- Parameter Count: ~6.7B
- Encoder-Decoder: Maintains T5's text-to-text approach but with specialized gating and partial reordering in cross-attention blocks.
- Context Window: 8,192 tokens (a 4× expansion over many standard T5 models).
- Layer Stacking: The modifications allow some dynamic scheduling of attention heads, facilitating better throughput in multi-GPU environments.
| Aspect | Specification |
|---|---|
| Architecture Type | Modified T5 (custom config named `deepseek_v3`) |
| Heads per Attention | 32 heads (in deeper layers) |
| Layer Count | 36 encoder blocks, 36 decoder blocks |
| Vocabulary Size | 32k tokens (SentencePiece-based) |
| Positional Encoding | Absolute + learned segment-based for 8k tokens |
| Training Paradigm | Instruction-tuned + additional domain tasks |
| Precision | FP32, FP16, 4-bit, 8-bit quantization (via bitsandbytes) |
Below are simplified instructions for installing DeepSeek R1-Zero:
- Python >= 3.8
- PyTorch >= 2.0
- Transformers >= 4.34.0
- Accelerate >= 0.24.0
- bitsandbytes >= 0.39.0 (if using 4-bit/8-bit)
- FFmpeg (required for video rendering)
FFmpeg is required for Manim to render animations. Here's how to install it:

Windows:
- Download from https://www.gyan.dev/ffmpeg/builds/ (recommended: "ffmpeg-release-essentials.7z")
- Extract the archive and add the `bin` folder to your system PATH
- Or install via package manager:

  ```bash
  choco install ffmpeg
  ```

Linux (Debian/Ubuntu):

```bash
sudo apt-get update
sudo apt-get install ffmpeg
```

macOS:

```bash
brew install ffmpeg
```
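After installation, confirm FFmpeg is on your PATH by running `ffmpeg -version` in a new terminal.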
```bash
pip install --upgrade torch transformers accelerate bitsandbytes
```
If your environment's default PyTorch is older than 2.0, consider updating or installing from PyPI/conda channels that provide a recent version.
After installing prerequisites, you can load the model from the Hugging Face Hub. For example:
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Zero",
    trust_remote_code=True
)

model = AutoModelForSeq2SeqLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Zero",
    trust_remote_code=True,
    torch_dtype=torch.float16,  # or torch.float32
    device_map="auto"           # automatically move model to GPU if available
)
```
Note:
- `trust_remote_code=True` is essential because R1-Zero uses custom code.
- Download times may be substantial (potentially hours) depending on your bandwidth and how Hugging Face shards large models.
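Once loaded, a quick smoke test confirms the setup end to end (the prompt is illustrative):

```python
# Quick smoke test (the prompt is illustrative).
inputs = tokenizer(
    "Summarize: Optimal transport finds the cheapest way to move probability mass.",
    return_tensors="pt",
).to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```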
DeepSeek R1-Zero supports multi-bit quantization to optimize memory usage:
1. 4-Bit Quantization
   - Pros: Minimizes VRAM usage (~8GB).
   - Cons: Potentially minor losses in numeric accuracy or generative quality.

2. 8-Bit Quantization
   - Pros: Still significantly reduces memory (~14GB VRAM).
   - Cons: Slight overhead vs. 4-bit but often better fidelity.

3. Full Precision (FP32)
   - Pros: The highest theoretical accuracy.
   - Cons: ~28GB VRAM usage, not feasible on smaller GPUs.
Sample quantized load (4-bit) with bitsandbytes:
```python
model_4bit = AutoModelForSeq2SeqLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Zero",
    trust_remote_code=True,
    device_map="auto",
    load_in_4bit=True
)
```
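On recent transformers versions, the same 4-bit load is expressed through `BitsAndBytesConfig` (the compute dtype below is a common choice, not a requirement):

```python
from transformers import AutoModelForSeq2SeqLM, BitsAndBytesConfig
import torch

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # dtype used for 4-bit matmuls
)

model_4bit = AutoModelForSeq2SeqLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Zero",
    trust_remote_code=True,
    device_map="auto",
    quantization_config=quant_config,
)
```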
- Sharded Checkpoints: The model is split into multiple shards; each shard is verified upon download. Large shards can be memory-mapped, so your system requirements also include disk I/O overhead.
- Accelerate Integration: By leveraging Accelerate, you can distribute model shards across multiple GPUs or perform CPU offloading if GPU memory is insufficient.
- Rotary & Segment Encodings: At large sequence lengths, standard absolute positions can degrade performance. R1-Zero's hybrid approach (inspired by [T5], [LongT5], and [RoFormer]) helps maintain stable gradients even at 8k tokens.
- Parallel Cross-Attention: The decoder employs a specialized parallel cross-attention mechanism in certain layers, which can reduce overhead in multi-GPU setups.
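As a sketch of the CPU offloading mentioned in the Accelerate Integration point above (the memory caps are placeholders to tune for your hardware):

```python
# Cap GPU 0 usage and spill remaining shards to CPU RAM via Accelerate.
# The limits below are placeholders, not recommendations.
model_offloaded = AutoModelForSeq2SeqLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Zero",
    trust_remote_code=True,
    device_map="auto",
    max_memory={0: "10GiB", "cpu": "30GiB"},
    torch_dtype=torch.float16,
)
```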
DeepSeek R1-Zero typically performs close to GPT-3.5 on standard generative benchmarks:
1. Inference Latency
   - 4-bit: ~100–200 ms per token (varies by GPU)
   - FP16: ~200–400 ms per token
   - FP32: ~400–800 ms per token

2. Quality Metrics
   - BLEU & ROUGE: On summarization tasks (CNN/DailyMail), R1-Zero hovers at ~1–2 points below GPT-3.5.
   - Open-Domain QA: On NaturalQuestions, R1-Zero closely matches strong baselines (e.g., T5-XXL) when properly instructed.
Keep in mind that your hardware setup and parallelism strategies can influence these benchmarks significantly.
Despite R1-Zero's strengths, several limitations persist:
- Token Context Limit: 8,192 tokens is high, but certain extreme use cases (e.g., full-text searching in large documents) may require bridging or chunking.
- Training Biases: While instruction-tuning reduces hallucinations, domain gaps remain. For heavily specialized or newly emerging knowledge, the model may produce uncertain or dated information.
- Interpretability: Like all Transformer-based LLMs, R1-Zero functions as a "black box." Advanced interpretability tools are still an active research area.
Future Directions:
- Integrating advanced memory systems to handle prompts beyond 8k tokens.
- Incorporating flash attention for further speed-ups.
- Investigating retrieval-augmented generation modules to reduce outdated knowledge reliance.
Below are a few quick examples to illustrate R1-Zero's capabilities:
prompt = "Write a short sci-fi story about artificial intelligence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(inputs["input_ids"], max_length=150)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
prompt = "Explain the concept of gradient descent as if speaking to a first-year PhD student."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(inputs["input_ids"], max_length=200)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
Feel free to refine these prompts and tune generation parameters (`num_beams`, `temperature`, `top_k`, etc.) to shape the style.
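For instance, continuing the snippet above (the parameter values are illustrative starting points, not tuned recommendations):

```python
# Tuned generation -- values are illustrative starting points.
output_ids = model.generate(
    inputs["input_ids"],
    max_length=200,
    do_sample=True,   # enable sampling so temperature/top_k take effect
    temperature=0.8,  # soften the token distribution
    top_k=50,         # restrict sampling to the 50 most likely tokens
    num_beams=4,      # beam-sample for more coherent long-form output
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```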
If you use this project in your research or work, please cite it as:
```bibtex
@misc{cooper2025deepseekmanim,
  title={DeepSeek-Manim Animation Generator: Automated Mathematical Animations using DeepSeek API},
  author={Cooper, Christian H.},
  year={2025},
  howpublished={\url{https://github.com/HarleyCoops/Deepseek-R1-Zero}},
  note={A tool for generating Manim animations using DeepSeek's API}
}
```