Qwen2.5-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.

Jupyter Notebook 9,279 642 Updated Mar 27, 2025

[ICLR 2025] Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.

Python 1,293 56 Updated Mar 24, 2025

Index of URLs to pdf files all over the internet and scripts

Shell 23 3 Updated May 2, 2023

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.

Python 5,045 527 Updated Mar 25, 2025

A high-performance inference system for large language models, designed for production environments.

C++ 426 34 Updated Mar 20, 2025
Python 3,617 334 Updated Feb 24, 2025

【CVPR 2024 Highlight】Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models

Python 1,735 121 Updated Mar 20, 2025

Dataset and Code for our ACL 2024 paper: "Multimodal Table Understanding". We propose the first large-scale Multimodal IFT and Pre-Train Dataset for table understanding and develop a generalist tab…

Python 190 7 Updated Sep 27, 2024

MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone

Python 19,083 1,374 Updated Mar 3, 2025

A Python wrapper for the Doc2X API that also comes with local text processing (to improve PDF recall in RAG).

Python 247 15 Updated Feb 19, 2025

LVBench: An Extreme Long Video Understanding Benchmark

Python 85 1 Updated Aug 30, 2024

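Streamer-Sales (销冠): a sales-streamer LLM 🛒🎁 that presents products based on their given features, from the angle of stimulating users' desire to buy. 🚀⭐ Includes a detailed data generation pipeline❗ 📦 Also integrates LMDeploy accelerated inference 🚀, RAG (retrieval-augmented generation) 📚, TTS (text-to-speech) 🔊, digital human generation 🦸, an Agent that queries the web for real-time information 🌐, ASR (speech-to-text) 🎙️, a Vue-based front end 🍍, FastAPI …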

Python 3,109 481 Updated Mar 8, 2025

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 14,738 1,592 Updated Dec 25, 2024

MuLan: Adapting Multilingual Diffusion Models for 110+ Languages (adds multilingual capability to any diffusion model without additional training)

Python 133 3 Updated Jan 24, 2025

On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)

Python 576 41 Updated Feb 14, 2025

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Python 5,698 433 Updated Aug 7, 2024

A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery (EMNLP'24)

553 31 Updated Feb 26, 2025

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022

Python 6,122 494 Updated Jul 11, 2024

A collection of OCR-related datasets

156 6 Updated Sep 7, 2022

Papers, Datasets, Algorithms, SOTA for STR. Maintained long-term.

100 10 Updated Feb 25, 2022

Collects and organizes OCR-related datasets and unifies their annotation format for use in experiments

Python 903 195 Updated Nov 28, 2023

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Python 2,033 353 Updated Mar 24, 2025

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Jupyter Notebook 4,017 334 Updated Jan 13, 2025

High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance

Python 2,284 197 Updated Sep 23, 2024

MusePose: a Pose-Driven Image-to-Video Framework for Virtual Human Generation

Python 2,497 181 Updated Mar 5, 2025

Large Language Model Text Generation Inference

Python 9,937 1,175 Updated Mar 26, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 42,923 6,514 Updated Mar 28, 2025

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Rust 9,538 870 Updated Mar 18, 2025

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python 37,643 4,320 Updated Mar 28, 2025

Materials for the Hugging Face Diffusion Models Course

Jupyter Notebook 3,939 431 Updated Feb 12, 2025