Stars
A Unified Tokenizer for Visual Generation and Understanding
PyTorch implementation of FractalGen https://arxiv.org/abs/2502.17437
[CVPR 2025] DEIM: DETR with Improved Matching for Fast Convergence
Official implementation of the WACV 2025 (Oral) paper RT-DETRv3: Real-time End-to-End Object Detection with Hierarchical Dense Positive Supervision.
Code and models for the paper "The effectiveness of MAE pre-pretraining for billion-scale pretraining" https://arxiv.org/abs/2303.13496
Train InternViT-6B in MMSegmentation and MMDetection with DeepSpeed
EVE Series: Encoder-Free Vision-Language Models from BAAI
Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
[CVPR 2025] Video Depth Anything: Consistent Depth Estimation for Super-Long Videos
[NeurIPS 2023] HAP: Structure-Aware Masked Image Modeling for Human-Centric Perception
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.
PyTorch Implementation of "SMITE: Segment Me In TimE" (ICLR 2025)
Code for the AAAI 2025 paper "VIoTGPT: Learning to Schedule Vision Tools in LLMs towards Intelligent Video Internet of Things"
An official implementation of "Hulk: A Universal Knowledge Translator for Human-Centric Tasks"
DINO-X: The World's Top-Performing Vision Model for Open-World Object Detection and Understanding
High-resolution models for human tasks.
Free, simple, and intuitive online database diagram editor and SQL generator.
[COLM 2024] OpenAgents: An Open Platform for Language Agents in the Wild
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
A programming language exclusively designed for cybersecurity
RTMPose series (RTMPose, DWPose, RTMO, RTMW) without mmcv, mmpose, mmdet etc.
[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation