Stars
Super-Efficient RLHF Training of LLMs with Parameter Reallocation
This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning"
Official repository for paper: O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning
Scalable RL solution for advanced reasoning of language models
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
Clean, minimal, accessible reproduction of DeepSeek R1-Zero
Fully open reproduction of DeepSeek-R1
A series of technical reports on Slow Thinking with LLM
An Open Large Reasoning Model for Real-World Solutions
A reading list on LLM-based Synthetic Data Generation 🔥
(ICML 2024) Alphazero-like Tree-Search can guide large language model decoding and training
[ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator"
LLM Agora, debating between open-source LLMs to refine the answers
A complete computer science study plan to become a software engineer.
Official implementation for the paper "DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models"
We present the first systematic study on the scaling property of raw agents instantiated by LLMs. We find that performance scales with the increase in the number of agents, using the simple(st) way…
A compilation of the best multi-agent papers
From scratch implementation of a vision language model in pure PyTorch
An autoregressive character-level language model for making more things
ICML 2024: Improving Factuality and Reasoning in Language Models through Multiagent Debate