Robin jianzhnie

Hi there, I'm Robin 👋

Hey, I'm jianzhnie. Thanks for stopping by!

I'm an AI engineer focused on LLMs, RLHF, reinforcement learning, and production-grade code.

Code Repo	About
LLMReasoning	Techniques and a toolkit for reasoning with LLMs.
LLMEval	A modular framework to evaluate LLMs across tasks and settings.
LLMToolkit	A PyTorch toolkit for NLP and LLM development.
LLamaTuner	Easy and efficient finetuning pipelines for LLMs.
Open-R1	An open-source training pipeline for DeepSeek-R1-style models and RLHF.
awesome-instruction-datasets	Curated instruction/prompt datasets for training chat LLMs.

Code Repo	About
Deep-RL-Toolkit	Single-agent RL toolkit (DQN, Rainbow, DDPG, PPO, SAC, TD3, …).
Deep-MARL-Toolkit	Multi-agent RL toolkit (VDN, QMIX, MADDPG, MAPPO, …).
RLZero	Monte Carlo Tree Search (MCTS) for general sequential decision-making (AlphaZero, MuZero, …).
ScaleRL	Simple, scalable distributed RL (A3C, Ape-X, IMPALA, …).
CyberAttackSimulator	RL environment for autonomous cyber attack and defense on simulated networks.

Have an awesome day!