Hey, I'm jianzhnie. Thanks for stopping by!
I'm an AI engineer focused on LLMs, RLHF, reinforcement learning, and production-grade code.
Code Repo | About |
---|---|
LLMReasoning | Techniques and toolkit for reasoning with LLMs. |
LLMEval | A modular framework to evaluate LLMs across tasks and settings. |
LLMToolkit | A PyTorch toolkit for NLP and LLM development. |
LLamaTuner | Easy and efficient finetuning pipelines for LLMs. |
Open-R1 | Open-source training pipeline for DeepSeek-R1-style reasoning and RLHF. |
awesome-instruction-datasets | Curated instruction/prompt datasets for training ChatLLMs. |
Code Repo | About |
---|---|
Deep-RL-Toolkit | Single-agent RL toolkit (DQN, Rainbow, DDPG, PPO, SAC, TD3, …). |
Deep-MARL-Toolkit | Multi-agent RL toolkit (VDN, QMIX, MADDPG, MAPPO, …). |
RLZero | MCTS for general sequential decision-making (AlphaZero, MuZero, …). |
ScaleRL | Simple, scalable distributed RL (A3C, Ape-X, IMPALA, …). |
CyberAttackSimulator | RL environment for autonomous cyber attack and defense on simulated networks. |
- Diffuser Toolkit for image/audio generation in PyTorch: diffusion-toolkit
- AutoML for deep learning and tabular tasks: AutoTimm | AutoTabular
- Trying to reduce the Learning Machine Learning (LML) loss 😂
- Coding every day to become a better research engineer
- RL for reasoning and GRPO
- LLM systems and AGI
- Large-scale distributed RL systems
- Homepage: https://jianzhnie.github.io
- Blog: https://jianzhnie.github.io/llmtech/
- ZhiHu: https://www.zhihu.com/column/fengnie
- Hugging Face Org: https://huggingface.co/GaussianTech
- LinkedIn: https://www.linkedin.com/in/jianzheng-nie-2749b7156/
- Ask me about: statistics, machine learning, LLMs, and RL.
- ❤️ Sponsor me on GitHub
Have an awesome day!