jianzhnie/README.md

Hi there, I'm Robin 👋


Welcome 👋

Hey, I'm jianzhnie. Thanks for stopping by!

I'm an AI engineer focusing on LLMs, RLHF, Reinforcement Learning, and production-grade code.

What I’m working on 🔭

Large Language Models

| Code Repo | About |
| --- | --- |
| LLMReasoning | Techniques and toolkit for reasoning with LLMs. |
| LLMEval | A modular framework to evaluate LLMs across tasks and settings. |
| LLMToolkit | A PyTorch toolkit for NLP and LLM development. |
| LLamaTuner | Easy and efficient finetuning pipelines for LLMs. |
| Open-R1 | Open-source DeepSeek-R1-style and RLHF training pipeline. |
| awesome-instruction-datasets | Curated instruction/prompt datasets for training ChatLLMs. |

Reinforcement Learning

| Code Repo | About |
| --- | --- |
| Deep-RL-Toolkit | Single-agent RL toolkit (DQN, Rainbow, DDPG, PPO, SAC, TD3, …). |
| Deep-MARL-Toolkit | Multi-agent RL toolkit (VDN, QMIX, MADDPG, MAPPO, …). |
| RLZero | MCTS for general sequential decision making (AlphaZero, MuZero, …). |
| ScaleRL | Simple, scalable distributed RL (A3C, Ape-X, IMPALA, …). |
| CyberAttackSimulator | RL environment for autonomous cyber attack and defense on simulated networks. |

Others

  • Diffuser Toolkit for image/audio generation in PyTorch: diffusion-toolkit
  • AutoML for deep learning and tabular tasks: AutoTimm | AutoTabular
  • Trying to reduce the Learning Machine Learning (LML) loss 😂
  • Coding every day to become a better research engineer

I’m currently learning 🌱

  • RL for Reasoning and GRPO
  • LLM systems and AGI
  • Large-scale distributed RL systems

How to reach me 📫

Have an awesome day!

Pinned

  1. LLamaTuner Public

    Easy and efficient finetuning of LLMs (supports LLama, LLama2, LLama3, Qwen, Baichuan, GLM, Falcon). Efficient quantized training and deployment for large models.

    Python · 613 stars · 65 forks

  2. Open-R1 Public

    An open-source reproduction of DeepSeek-R1.

    Python · 270 stars · 52 forks

  3. deep-marl-toolkit Public

    MARLToolkit: the Multi-Agent Reinforcement Learning Toolkit. Includes implementations of MAPPO, MADDPG, QMIX, VDN, COMA, IPPO, QTRAN, MAT...

    Python · 143 stars · 19 forks

  4. deep-rl-toolkit Public

    RLToolkit is a flexible, highly efficient reinforcement learning framework. Includes implementations of DQN, AC, A2C, A3C, PG, DDPG, TRPO, PPO, SAC, TD3 and ...

    Python · 9 stars · 2 forks

  5. LLMToolkit Public

    LLMToolkit is a toolkit for NLP (Natural Language Processing) and LLMs (Large Language Models) using PyTorch.

    Python · 6 stars · 2 forks

  6. llmtech Public

    LLMTechSite, focused on the technology ecosystem of artificial general intelligence.

    Python · 11 stars · 4 forks