- Fudan Univ | BIT
- Shanghai, China
Stars
ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation
RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints
Public repository containing METR's DVC pipeline for eval data analysis
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
An Open-source RL System from ByteDance Seed and Tsinghua AIR
Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks.
An Easy-to-use, Scalable and High-performance RLHF Framework designed for Multimodal Models.
MM-EUREKA: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning
Explore the Multimodal “Aha Moment” on 2B Model
Reasoning-to-Defend: Safety-Aware Reasoning Can Defend Large Language Models from Jailbreaking
Ola: Pushing the Frontiers of Omni-Modal Language Model
[Official Repo] CrossEarth: Geospatial Vision Foundation Model for Cross-Domain Generalization in Remote Sensing Semantic Segmentation
Awesome Large Reasoning Model (LRM) Safety. This repository collects security-related research on currently popular large reasoning models such as DeepSeek-R1 and OpenAI o1.
Official implementation of paper: SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Official codebase for "STAIR: Improving Safety Alignment with Introspective Reasoning"
Awesome RL-based LLM Reasoning
This repository provides a valuable reference for researchers in the field of multimodality. Start your exploration of RL-based reasoning MLLMs here!
TrustEval: A modular and extensible toolkit for comprehensive trust evaluation of generative foundation models (GenFMs)
Code for Paper: Teaching Language Models to Critique via Reinforcement Learning
Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate"
Official Repo for Open-Reasoner-Zero
A fork to add multimodal model training to open-r1