Releases · hiyouga/LLaMA-Factory
v0.9.3: Llama4, Gemma3, Qwen3, InternVL3, Qwen2.5-Omni
We will attend the AWS Summit Shanghai 2025 on June 20th! See you in Shanghai 👋
New features
- 🔥 InternVL2.5/InternVL3 model by @Kuangdd01 in #7258
- 🔥 Qwen2.5-Omni model by @Kuangdd01 in #7537
- 🔥 Llama 4 and Gemma 3 multimodal model by @hiyouga in #7273 and #7611
- 🔥 Official GPU docker image by @yzoaim in #8181
- 🔥 SGLang inference by @Qiaolin-Yu and @jhinpan in #7278
- GLM-4-0414 and GLM-Z1 model by @zRzRzRzRzRzRzR in #7695
- Kimi-VL model by @Kuangdd01 in #7719
- Qwen3 model by @hiyouga in #7885
- MiMo and MiMo-VL model by @Kuangdd01 in #7946 #8249
- SmolLM/SmolLM2 model by @akshatsehgal in #8050 #8220
- MiniCPM4 model by @LDLINGLINGLING in #8314
- Mistral-Small-3.1 model by @Kuangdd01 in #8335
- Add `scripts/eval_bleu_rouge.py` by @SnowFox4004 in #7419 (a minimal scoring sketch follows this list)
- Add Muon optimizer by @tianshijing in #7749
- Support video/audio inference with vLLM by @hiyouga in #7566
- Support S3/GCS cloud data by @erictang000 in #7567
- Support vLLM-ascend by @leo-pony in #7739
- Support OmegaConf by @hiyouga in #7793
- Support early-stopping by @hiyouga in #7797
- Add `enable_thinking` argument for reasoning models by @hiyouga in #7928
- PyTorch-elastic and fault-tolerant launch by @hubutui in #8286
- Length Desensitization DPO (LD-DPO) by @amangup in #8362
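
For reference, the new `scripts/eval_bleu_rouge.py` above scores generated predictions with BLEU and ROUGE. A minimal, hypothetical sketch of that kind of scoring (assuming `jieba`, `nltk`, and `rouge-chinese` are installed; this is not the script itself):

```python
# Minimal sketch of BLEU/ROUGE scoring in the spirit of scripts/eval_bleu_rouge.py (illustrative only).
import jieba
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_chinese import Rouge

def score(prediction: str, reference: str) -> dict:
    hyp = list(jieba.cut(prediction))  # tokenize with jieba so Chinese text is handled too
    ref = list(jieba.cut(reference))
    rouge = Rouge().get_scores(" ".join(hyp), " ".join(ref))[0]
    bleu4 = sentence_bleu([ref], hyp, smoothing_function=SmoothingFunction().method3)
    return {"bleu-4": bleu4, **{k: v["f"] for k, v in rouge.items()}}

print(score("今天天气很好", "今天天气不错"))
```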
New models
- Base models
- SmolLM/SmolLM2 (135M/360M/1.7B) 📄
- Qwen3 Base (0.6B/1.7B/4B/8B/14B/30B) 📄
- Gemma 3 (1B/4B/12B/27B) 📄🖼️
- MedGemma (4B) 📄🩺
- MiMo Base (7B) 📄
- Seed-Coder Base (8B) 📄⌨️
- Mistral-Small-3.1 Base (24B) 📄🖼️
- GLM-4-0414 Base (32B) 📄
- Llama 4 (109B/492B) 📄🖼️
- Instruct/Chat models
- SmolLM/SmolLM2 Instruct (135M/360M/1.7B) 📄🤖
- MiniCPM4 (0.5B/8B) 📄🤖
- Qwen3 (0.6B/1.7B/4B/8B/14B/32B/30B/235B) 📄🤖🧠
- Gemma 3 Instruct (1B/4B/12B/27B) 📄🤖🖼️
- InternVL2.5/3 Instruct/MPO (1B/2B/8B/14B/38B/78B) 📄🤖🖼️
- Qwen2.5-Omni (3B/7B) 📄🤖🖼️🔈
- MedGemma Instruct (4B/27B) 📄🤖🩺
- MiMo SFT/RL (7B) 📄🤖
- MiMo-VL SFT/RL (7B) 📄🤖🖼️
- Hunyuan Instruct (7B) 📄🤖
- Seed-Coder Instruct/Reasoning (8B) 📄🤖🧠⌨️
- GLM-4-0414/GLM-Z1 Instruct (9B/32B) 📄🤖🧠
- DeepSeek-R1-0528 (8B/671B) 📄🤖🧠
- Kimi-VL Instruct/Thinking (17B) 📄🤖🧠🖼️
- Mistral-Small-3.1 Instruct (24B) 📄🤖🖼️
- Qwen2.5-VL Instruct (32B) 📄🤖🖼️
- Llama 4 Instruct (109B/492B) 📄🤖🖼️
New datasets
- Preference datasets
- COIG-P (zh) 📄
Bug fix
- Fix add new tokens by @flashJd in #7253
- Fix ultrachat_200k dataset by @felladrin in #7259
- Add efficient 4D attention mask for neat packing by @BlackWingedKing in #7272
- Fix WSD lr scheduler by @x22x22 in #7304
- Fix position ids in neat packing by @BlackWingedKing in #7318
- Fix proxy setting in webui by @taoharry in #7332
- Improve entrypoint by @ENg-122 in #7345
- Fix ray destroy process group by @erictang000 in #7395
- Fix SGLang dependencies by @guoquan in #7432
- Upgrade docker package version by @rumichi2210 in #7442
- Update liger kernel for qwen2.5-vl by @xiaosu-zhu in #7453
- Fix lora on quant models by @GuoCoder in #7456
- Enable liger kernel for gemma3 by @kennylam777 in #7462
- Enable liger kernel for paligemma by @eljandoubi in #7466
- Add Swanlab lark notification by @Xu-pixel in #7481
- Fix gemma3 use cache attribute by @ysjprojects in #7500
- Fix pixtral plugin by @Kuangdd01 in #7505
- Fix KTO mismatch pair strategy by @himalalps in #7509
- Support `dataset_shards` by @aliencaocao in #7530
- Fix qwen2.5omni plugin by @Kuangdd01 in #7573 #7578 #7883
- Fix ppo trainer by @gechengze in #7576
- Fix workflow by @Shawn-Tao in #7635
- Support qwen2.5omni audio+video2text by @Kuangdd01 in #7638
- Upgrade deps for SGLang by @adarshxs in #7639
- Allow ray env setting by @erictang000 in #7647
- Fix CUDA warning on intel xpus by @jilongW in #7655
- Fix liger kernel patch by @danny980521 in #7660
- Fix rocm dockerfile by @fluidnumerics-joe in #7725
- Fix qwen2vl with neat packing by @GeoffreyChen777 in #7754
- Fix a constant by @AlphaBladez in #7765
- Fix autogptq for Gemma by @ddddng in #7786
- Fix internvl models by @Kuangdd01 in #7801 #7803 #7817 #8129
- Fix DeepSpeed ZeRO3 on moe models by @hiyouga in #7826 #7879
- Fix gradient checkpoint func for vit by @hiyouga in #7830
- Support S3 ray storage by @erictang000 in #7854
- Fix Kimi-VL attention by @Kuangdd01 in #7867
- Fix minicpm-o vllm inference by @hiyouga in #7870
- Unfreeze multimodal projector in freeze training by @zhaop-l in #7872
- Fix Qwen2.5-omni plugin by @hiyouga in #7875 #7962
- Add warp support link by @ericdachen in #7887
- Replace eos token for base model by @hiyouga in #7911
- Add `eval_on_each_dataset` arg by @hiyouga in #7912
- Fix qwen3 loss by @hiyouga in #7923 #8109
- Add repetition_penalty to api by @wangzhanxd in #7958
- Add graphgen to readme by @tpoisonooo in #7974
- Support video params in vllm batch infer by @Kuangdd01 in #7992
- Fix tool formatter by @yunhao-tech in #8000
- Fix kimi vl plugin by @hiyouga in #8015
- Support batch preprocess in vllm batch infer by @Shawn-Tao in #8051
- Support loading remote folder by @erictang000 in #8078
- Fix video utils import by @Kuangdd01 in #8077
- Fix SGLang LoRA inference by @Kiko-RWan in #8067
- Fix cli by @Wangbiao2 in #8095
- Fix pretrain workflow by @SunnyHaze in #8099
- Fix rope args for yarn by @piamo in #8101
- Add no build isolation in installing by @hiyouga in #8103
- Switch to GPTQModel and deprecate AutoGPTQ by @hiyouga in #8108
- Support llama3 parallel function call by @hiyouga in #8124
- Add `data_shared_file_system` by @hiyouga in #8179
- Fix load remote files by @youngwookim in #8183
- Fix dataset info by @Muqi1029 in #8197
- Fix qwen2.5 omni merge script by @Kuangdd01 in #8227 #8293
- Add unittest for VLM save load by @Kuangdd01 in #8248
- Add tag in swanlab by @Zeyi-Lin in #8258
- Support input video frames by @Kuangdd01 in #8264
- Fix empty template by @hiyouga in #8312
- Support full-finetuning with unsloth by @Remorax in #8325
- Add awesome work by @MING-ZCH in #8333
- Release v0.9.3 by @hiyouga in #8386
- Fix qwen2vl position ids by @hiyouga in #8387
- Fix vlm utils by @hiyouga in #8388
- Fix #3802 #4443 #5548 #6236 #6322 #6432 #6708 #6739 #6881 #6919 #7080 #7105 #7119 #7225 #7267 #7327 #7389 #7416 #7427 #7428 #7443 #7447 #7454 #7490 #7501 #7502 #7513 #7520 #7541 #7545 #7552 #7563 #7598 #7600 #7613 #7636 #7678 #7680 #7687 #7688 #7730 #7743 #7772 #7791 #7800 #7816 #7829 #7845 #7865 #7874 #7889 #7905 #7906 #7907 #7909 #7916 #7918 #7919 #7939 #7953 #7965 #7990 #8008 #8056 #8061 #8066 #8069 #8087 #8091 #8092 #8096 #8097 #8111 #8119 #8147 #8166 #8169 #8174 #8182 #8189 #8223 #8241 #8247 #8253 #8294 #8309 #8324 #8326 #8332
Full Changelog: v0.9.2...v0.9.3
v0.9.2: MiniCPM-o, SwanLab, APOLLO
We will attend the vLLM Beijing Meetup on Mar 16th! See you in Beijing 👋
New features
- 🔥 APOLLO optimizer by @zhuhanqing in #6617
- 🔥 SwanLab experiment tracker by @Zeyi-Lin in #6401
- 🔥 Ray Trainer by @erictang000 in #6542
- Batch inference with vLLM TP by @JieShenAI in #6190
- QLoRA on Ascend NPU by @codemayq in #6601
- YaRN and Llama3 rope scaling by @hiyouga in #6693 (a rough config sketch follows this list)
- Support `uv run` by @erictang000 in #6907
- Ollama modelfile auto-generation by @codemayq in #4686
- Mistral tool prompt by @AlongWY in #5473
- Llama3 and Qwen2 tool prompt by @hiyouga in #6367 and #6369
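
To illustrate roughly what the YaRN and Llama3 rope scaling options above translate to on the transformers side, here is a hedged sketch; the values and model config are illustrative, not LLaMA-Factory defaults:

```python
# Illustrative rope_scaling dicts accepted by recent transformers releases (>= 4.45).
from transformers import LlamaConfig

config = LlamaConfig(
    max_position_embeddings=32768,  # extended context window
    rope_scaling={
        "rope_type": "yarn",                       # or "llama3"
        "factor": 4.0,                             # context extension factor
        "original_max_position_embeddings": 8192,  # pretraining context length
    },
)
print(config.rope_scaling)
```

On the LLaMA-Factory side, these presumably surface through the existing `rope_scaling` option.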
New models
- Base models
- GPT2 (0.1B/0.4B/0.8B/1.5B) 📄
- Granite 3.0-3.1 (1B/2B/3B/8B) 📄
- PaliGemma2 (3B/10B/28B) 📄🖼️
- Moonlight (16B) 📄
- DeepSeek V2-V2.5 Base (236B) 📄
- DeepSeek V3 Base (671B) 📄
- Instruct/Chat models
- Granite 3.0-3.1 (1B/2B/3B/8B) by @Tuyohai in #5922 📄🤖
- DeepSeek R1 (1.5B/7B/8B/14B/32B/70B/671B) by @Qwtdgh in #6767 📄🤖
- TeleChat2 (3B/7B/12B/35B/115B) by @ge-xing in #6313 📄🤖
- Qwen2.5-VL (3B/7B/72B) by @hiyouga in #6779 📄🤖🖼️
- PaliGemma2-mix (3B/10B/28B) by @Kuangdd01 in #7060 📄🤖🖼️
- Qwen2 Audio (7B) by @BUAADreamer in #6701 📄🤖🔈
- MiniCPM-V/MiniCPM-o (8B) by @BUAADreamer in #6598 and #6631 📄🤖🖼️🔈
- InternLM3-Instruct (8B) by @hhaAndroid in #6640 📄🤖
- Marco-o1 (8B) 📄🤖
- Skywork-o1 (8B) 📄🤖
- Phi-4 (14B) 📄🤖
- Moonlight Instruct (16B) 📄
- Mistral Small (24B) 📄🤖
- QwQ (32B) 📄🤖
- Llama-3.3-Instruct (70B) 📄🤖
- QvQ (72B) 📄🤖🖼️
- DeepSeek V2-V2.5 (236B) 📄🤖
- DeepSeek V3 (671B) 📄🤖
New datasets
- Supervised fine-tuning datasets
- OpenO1 (en) 📄
- Open Thoughts (en) 📄
- Open-R1-Math (en) 📄
- Chinese-DeepSeek-R1-Distill (zh) 📄
Changes
- Refactor VLMs register by @hiyouga in #6600
- Refactor mm plugin by @hiyouga in #6895
- Refactor template by @hiyouga in #6896
- Refactor data pipeline by @hiyouga in #6901
- Update vlm arguments by @hiyouga in #6976
- We have cleaned large files from the git history using BFG Repo-Cleaner; the backup repo can be found here
Bug fix
- Add `trust_remote_code` option by @yafshar in #5819
- Fix mllama config by @hiyouga in #6137 and #6140
- Fix mllama pad by @hiyouga in #6151 and #6874
- Pin tokenizers version by @hiyouga in #6157
- Fix tokenized data loading by @village-way in #6160
- Show hostname in webui by @hykilpikonna in #6170
- Fix VLMs zero3 training by @hiyouga in #6233
- Add `skip_special_tokens` by @hiyouga in #6363
- Support non-reentrant gradient checkpointing by @hiyouga in #6364
- Add `disable_shuffling` option by @hiyouga in #6388
- Fix gen kwargs by @hiyouga in #6395
- Enable module run by @youkaichao in #6457
- Fix eval loss value by @hiyouga in #6465
- Fix paligemma inference by @hiyouga in #6483
- Add deepseek v3 template by @piamo in #5507
- Add http proxy argument in dockerfile by @shibingli in #6462
- Fix trainer generate by @hiyouga in #6512
- Fix pixtral DPO training by @hiyouga in #6547
- Fix ray args by @stephen-nju in #6564
- Fix minicpm template by @BUAADreamer in #6620
- Fix stop tokens for visual detection by @hiyouga in #6624
- Pin vllm version by @hiyouga in #6629
- Fix mllama any image by @hiyouga in #6637 and #7053
- Fix tokenizer max length by @xiaosu-zhu in #6632
- Fix webui locale by @steveepreston in #6653
- Fix MiniCPM-o DPO training by @BUAADreamer in #6657
- Fix Qwen2 MoE training by @hiyouga in #6684
- Upgrade to gradio 5 by @hiyouga in #6688
- Support Japanese local file by @engchina in #6698
- Fix DPO loss by @yinpu in #6722
- Webui thinking mode by @hiyouga in #6778
- Upgrade to transformers 4.48 by @hiyouga in #6628
- Fix ci by @hiyouga in #6787
- Fix instructions about installing fa2 on win platform in readme by @neavo in #6788
- Fix minicpmv plugin by @BUAADreamer in #6801, #6890, #6946 and #6998
- Fix qwen2 tool prompt by @yueqis in #6796
- Fix llama pro by @hiyouga in #6814
- Allow thought in function call by @yueqis in #6797
- Add `ALLOW_EXTRA_ARGS` by @hiyouga in #6831
- Fix Qwen2vl plugin by @hiyouga in #6855
- Upgrade vllm to 0.7.2 by @hiyouga in #6857
- Fix unit test for tool using by @hiyouga in #6865
- Skip broken data in sharegpt converter by @JJJYmmm in #6879
- Fix qwen2.5 plugin for video by @JJJYmmm in #6868
- Parsing chat template from tokenizer by @hiyouga in #6905 (experimental)
- Fix mllama KTO training by @marko1616 in #6904
- Fix grad checkpointing by @hiyouga in #6916 and #6931
- Fix ollama template by @hiyouga in #6902
- Fix ray example by @erictang000 in #6906
- Improve error handling for media by @noahc1510 in #6128
- Support split on each dataset by @SrWYG in #5522
- Fix gen kwargs in training by @aliencaocao in #5451
- Liger kernel for qwen2.5vl by @hiyouga in #6930
- Fix lora target modules by @hiyouga in #6944
- Add `ray_storage_path` by @erictang000 in #6920
- Fix trainer.predict by @hiyouga in #6972
- Add min resolution control by @hiyouga in #6975
- Upgrade transformers to 4.49 by @hiyouga in #6982
- Add seed in vllm batch predict by @JieShenAI in #7058
- Fix pyproject.toml by @hiyouga in #7067
- Upgrade CANN images by @leo-pony in #7061
- Display swanlab link by @Zeyi-Lin in #7089
- Fix hf engine by @hiyouga in #7120
- Add bailing chat template by @oldstree in #7117
- Use bicubic resampler instead of nearest by @hiyouga in #7143
- Fix Qwen2Audio plugin by @lsrami in #7166
- Destroy process group by @hiyouga in #7174
- Fix swanlab callback by @Zeyi-Lin in #7176
- Fix paligemma plugin by @hiyouga in #7181
- Escape html tag in webui by @hiyouga in #7190
- Upgrade vllm to 0.7.3 by @hiyouga in #7183 and #7193
- Fix parser by @hiyouga in #7204
- Fix function formatter by @zhangch-ss in #7201
- Fix deepspeed config by @hiyouga in #7205
- Fix dataloader by @hiyouga in #7207
- Fix export tokenizer by @hiyouga in #7230
- Update arguments by @hiyouga in #7231
- Add `swanlab_logdir` by @Zeyi-Lin in #7219
- Fix vllm batch prediction by @hiyouga in #7235
- Avoid exit after saving tokenized data by @hiyouga in #7244
- Support commit in env by @hiyouga in #7247
- Release v0.9.2 by @hiyouga in #7242
- Fix #1204 #3306 #3462 #5121 #5270 #5404 #5444 #5472 #5518 #5616 #5712 #5714 #5756 #5944 #5986 #6020 #6056 #6092 #6136 #6139 #6149 #6165 #6213 #6287 #6320 #6345 #6345 #6346 #6348 #6358 #6362 #6391 #6415 #6439 #6448 #6452 #6482 #6499 #6543 #6546 #6551 #6552 #6610 #6612 #6636 #6639 #6662 #6669 #6738 #6772 #6776 #6780 #6782 #6793 #6806 #6812 #6819 #6826 #6833 #6839 #6850 #6854 #6860 #6878 #6885 #6889 #6937 #6948 #6952 #6960 #6966 #6973 #6981 #7036 #7064 #7072 #7116 #7125 #7130 #7171 #7173 #7180 #7182 #7184 #7192 #7198 #7213 #7234 #7243
Full Changelog: v0.9.1...v0.9.2
v0.9.1: Many Vision Models, Qwen2.5 Coder, Gradient Fix
New features
- 🔥Support Llama-3.2 and Llama-3.2-Vision by @marko1616 in #5547 and #5555
- 🔥Support LLaVA-NeXT, LLaVA-NeXT-Video and Video-LLaVA by @BUAADreamer in #5574
- 🔥Support Pixtral model by @Kuangdd01 in #5581
- Support EXAONE3.0 by @shing100 in #5585
- Support Index-series models by @Cuiyn in #5910
- Support Liger-Kernel for Qwen2-VL by @aliencaocao in #5438
- Support download models from ModelHub by @huniu20 in #5642
- Fix abnormal loss values in transformers 4.46 by @hiyouga in #5852 #5871
- Support multi-image inference by @hiyouga in #5895
- Support calculating effective tokens for SFT and DPO by @wtmlon in #6078
Note: you can now install `transformers>=4.46.0,<=4.46.1` to enable the gradient accumulation fix.
New models
- Base models
- Qwen2.5 (0.5B/1.5B/3B/7B/14B/32B/72B) 📄
- Qwen2.5-Coder (0.5B/1.5B/3B/7B/14B/32B) 📄🖥️
- Llama-3.2 (1B/3B) 📄
- OpenCoder (1.5B/8B) 📄🖥️
- Index (1.9B) 📄
- Instruct/Chat models
- Qwen2.5-Instruct (0.5B/1.5B/3B/7B/14B/32B/72B) 📄🤖
- Qwen2.5-Coder-Instruct (0.5B/1.5B/3B/7B/14B/32B) 📄🤖🖥️
- Llama-3.2-Instruct (1B/3B) 📄🤖
- OpenCoder-Instruct (1.5B/8B) 📄🤖🖥️
- Index-Chat (1.9B) 📄🤖
- LLaVA-NeXT (7B/8B/13B/34B/72B/110B) 📄🤖🖼️
- LLaVA-NeXT-Video (7B/34B) 📄🤖🖼️
- Video-LLaVA (7B) 📄🤖🖼️
- Pixtral (12B) 📄🤖🖼️
- EXAONE-3.0-Instruct (8B) 📄🤖
Security fix
- Fix CVE-2024-52803 by @superboy-zjc in aa6a174
Bug fix
- Update version of rocm docker by @HardAndHeavy in #5427
- Fix Phi-3-small template by @menibrief in #5475
- Fix function call dataset process function by @whybeyoung in #5483
- Add docker args by @StrangeBytesDev in #5533
- Fix logger by @chengchengpei in #5546
- Fix Gemma2 flash attention warning by @amrear in #5580
- Update setup by @johnnynunez in #5615 #5665
- Add project by @NLPJCL in #5801
- Fix saving Qwen2-VL processor by @hiyouga in #5857
- Support change base image in dockerfile by @sd3ntato in #5880
- Fix template replace behaviour by @hiyouga in #5907
- Add `image_dir` argument by @hiyouga in #5909
- Add rank0 logger by @hiyouga in #5912
- Fix DPO metrics by @hiyouga in #5913 #6052
- Update datasets version by @hiyouga in #5926
- Fix chat engines by @hiyouga in #5927
- Fix vllm 0.6.3 by @hiyouga in #5970
- Fix extra args in llamaboard by @hiyouga in #5971
- Fix vllm input args by @JJJJerry in #5973
- Add `vllm_config` args by @hiyouga in #5982 #5990
- Add shm_size in docker compose config by @XYZliang in #6010
- Fix tyro version by @hiyouga in #6065
- Fix ci by @hiyouga in #6120
- Fix Qwen2-VL inference on vLLM by @hiyouga in #6123 #6126
- Release v0.9.1 by @hiyouga in #6124
- Fix #3881 #4712 #5411 #5542 #5549 #5611 #5668 #5705 #5747 #5749 #5768 #5796 #5797 #5883 #5904 #5966 #5988 #6050 #6061
Full Changelog: v0.9.0...v0.9.1
v0.9.0: Qwen2-VL, Liger-Kernel, Adam-mini
Congratulations on 30,000 stars 🎉 Follow us on X (Twitter)
New features
- 🔥Support fine-tuning Qwen2-VL model on multi-image datasets by @simonJJJ in #5290
- 🔥Support time&memory-efficient Liger-Kernel via the `enable_liger_kernel` argument by @hiyouga
- 🔥Support memory-efficient Adam-mini optimizer via the `use_adam_mini` argument by @relic-yuexi in #5095
- Support fine-tuning Qwen2-VL model on video datasets by @hiyouga in #5365 and @BUAADreamer in #4136 (needs patch huggingface/transformers#33307)
- Support fine-tuning vision language models (VLMs) using RLHF/DPO/ORPO/SimPO approaches by @hiyouga
- Support Unsloth's asynchronous activation offloading method via the `use_unsloth_gc` argument
- Support vLLM 0.6.0 version
- Support MFU calculation by @yzoaim in #5388
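
For context on the MFU metric: a back-of-envelope version of the usual estimate (the ~6 FLOPs per parameter per trained token approximation), which is not necessarily the exact formula used in #5388:

```python
# Back-of-envelope MFU: achieved training FLOP/s divided by peak hardware FLOP/s.
def model_flops_utilization(num_params: float, tokens_per_second: float, peak_flops: float) -> float:
    achieved_flops = 6.0 * num_params * tokens_per_second  # ~6 FLOPs per parameter per trained token
    return achieved_flops / peak_flops

# Example: a 7B model training at 2,500 tokens/s on a 312 TFLOP/s (BF16) accelerator.
print(f"MFU ≈ {model_flops_utilization(7e9, 2500, 312e12):.1%}")
```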
New models
- Base models
- Qwen2-Math (1.5B/7B/72B) 📄🔢
- Yi-Coder (1.5B/9B) 📄🖥️
- InternLM2.5 (1.8B/7B/20B) 📄
- Gemma-2-2B 📄
- Meta-Llama-3.1 (8B/70B) 📄
- Instruct/Chat models
- MiniCPM/MiniCPM3 (1B/2B/4B) by @LDLINGLINGLING in #4996 #5372 📄🤖
- Qwen2-Math-Instruct (1.5B/7B/72B) 📄🤖🔢
- Yi-Coder-Chat (1.5B/9B) 📄🤖🖥️
- InternLM2.5-Chat (1.8B/7B/20B) 📄🤖
- Qwen2-VL-Instruct (2B/7B) 📄🤖🖼️
- Gemma-2-2B-it by @codemayq in #5037 📄🤖
- Meta-Llama-3.1-Instruct (8B/70B) 📄🤖
- Mistral-Nemo-Instruct (12B) 📄🤖
New datasets
- Supervised fine-tuning datasets
- Magpie-ultra-v0.1 (en) 📄
- Pokemon-gpt4o-captions (en&zh) 📄🖼️
- Preference datasets
- RLHF-V (en) 📄🖼️
- VLFeedback (en) 📄🖼️
Changes
- Due to compatibility considerations, fine-tuning vision language models (VLMs) requires `transformers>=4.45.0.dev0`; try `pip install git+https://github.com/huggingface/transformers.git` to install it. `visual_inputs` has been deprecated, so you no longer need to specify this argument.
- LlamaFactory now adopts lazy loading for multimodal inputs, see #5346 for details. Please use `preprocessing_batch_size` to restrict the batch size in dataset pre-processing (supported by @naem1023 in #5323; a rough illustration follows this list).
- LlamaFactory now supports `lmf` (equivalent to `llamafactory-cli`) as a shortcut command.
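
As a rough illustration of the `preprocessing_batch_size` knob mentioned above (conceptual only, not the project's preprocessing code): it corresponds to the batch size used during batched dataset mapping, which bounds how many samples are materialized at once.

```python
# Conceptual sketch using the Hugging Face datasets library.
from datasets import Dataset

def preprocess(batch):
    # stand-in for tokenization / image processing of one batch of examples
    return {"n_chars": [len(t) for t in batch["text"]]}

ds = Dataset.from_dict({"text": ["hello", "world", "llama", "factory"]})
ds = ds.map(preprocess, batched=True, batch_size=2)  # a smaller batch_size lowers peak memory
print(ds[0])
```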
Bug fix
- Fix LlamaBoard export by @liuwwang in #4950
- Add ROCm dockerfiles by @HardAndHeavy in #4970
- Fix deepseek template by @piamo in #4892
- Fix pissa savecallback by @codemayq in #4995
- Add Korean display language in LlamaBoard by @Eruly in #5010
- Fix deepseekcoder template by @relic-yuexi in #5072
- Fix examples by @codemayq in #5109
- Fix `mask_history` truncate from last by @YeQiuO in #5115
- Fix jinja template by @YeQiuO in #5156
- Fix PPO optimizer and lr scheduler by @liu-zichen in #5163
- Add SailorLLM template by @chenhuiyu in #5185
- Fix XPU device count by @Zxilly in #5188
- Fix bf16 check in NPU by @Ricardo-L-C in #5193
- Update NPU docker image by @MengqingCao in #5230
- Fix image input api by @marko1616 in #5237
- Add liger-kernel link by @ByronHsu in #5317
- Fix #4684 #4696 #4917 #4925 #4928 #4944 #4959 #4992 #5035 #5048 #5060 #5092 #5228 #5252 #5292 #5295 #5305 #5307 #5308 #5324 #5331 #5334 #5338 #5344 #5366 #5384
v0.8.3: Neat Packing, Split Evaluation
New features
- 🔥Support contamination-free packing via the `neat_packing` argument by @chuan298 in #4224 (see the sketch after this list)
- 🔥Support split evaluation via the `eval_dataset` argument by @codemayq in #4691
- 🔥Support HQQ/EETQ quantization via the `quantization_method` argument by @hiyouga
- 🔥Support ZeRO-3 when using BAdam by @Ledzy in #4352
- Support training on the last turn via the `mask_history` argument by @aofengdaxia in #4878
- Add NPU Dockerfile by @MengqingCao in #4355
- Support building FlashAttention2 in Dockerfile by @hzhaoy in #4461
- Support `batch_eval_metrics` at evaluation by @hiyouga
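
To make the "contamination-free" idea concrete: with neat packing, several samples share one packed sequence, but each token should only attend within its own sample. A small illustrative sketch of such a block-diagonal causal mask (not the project's implementation):

```python
# Block-diagonal causal mask: tokens attend only to earlier tokens in the same packed sample.
import numpy as np

def neat_packing_mask(seq_lengths):
    total = sum(seq_lengths)
    mask = np.zeros((total, total), dtype=bool)
    start = 0
    for length in seq_lengths:
        mask[start:start + length, start:start + length] = np.tril(np.ones((length, length), dtype=bool))
        start += length
    return mask  # True = attention allowed

print(neat_packing_mask([2, 3]).astype(int))
```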
New models
- Base models
- InternLM2.5-7B 📄
- Gemma2 (9B/27B) 📄
- Instruct/Chat models
Changes
- Fix DPO cutoff len and deprecate the `reserved_label_len` argument
- Improve loss function for reward modeling
Bug fix
- Fix numpy version by @MengqingCao in #4382
- Improve cli by @kno10 in #4409
- Add `tool_format` parameter to control prompt by @mMrBun in #4417
- Automatically label npu issue by @MengqingCao in #4445
- Fix flash_attn args by @stceum in #4446
- Fix docker-compose path by @MengqingCao in #4544
- Fix torch-npu dependency by @hashstone in #4561
- Fix deepspeed + pissa by @hzhaoy in #4580
- Improve cli by @injet-zhou in #4590
- Add project by @wzh1994 in #4662
- Fix docstring by @hzhaoy in #4673
- Fix Windows command preview in WebUI by @marko1616 in #4700
- Fix vllm 0.5.1 by @T-Atlas in #4706
- Fix save value head model callback by @yzoaim in #4746
- Fix CUDA Dockerfile by @hzhaoy in #4781
- Fix examples by @codemayq in #4804
- Fix evaluation data split by @codemayq in #4821
- Fix CI by @codemayq in #4822
- Fix #2290 #3974 #4113 #4379 #4398 #4402 #4410 #4419 #4432 #4456 #4458 #4549 #4556 #4579 #4592 #4609 #4617 #4674 #4677 #4683 #4684 #4699 #4705 #4731 #4742 #4779 #4780 #4786 #4792 #4820 #4826
v0.8.2: PiSSA, Parallel Functions
New features
- Support GLM-4 tools and parallel function calling by @mMrBun in #4173
- Support PiSSA fine-tuning by @hiyouga in #4307
New models
- Base models
- DeepSeek-Coder-V2 (16B MoE/236B MoE) 📄
- Instruct/Chat models
- MiniCPM-2B 📄🤖
- DeepSeek-Coder-V2-Instruct (16B MoE/236B MoE) 📄🤖
New datasets
- Supervised fine-tuning datasets
- Neo-sft (zh)
- Magpie-Pro-300K-Filtered (en) by @EliMCosta in #4309
- WebInstruct (en) by @EliMCosta in #4309
Bug fix
- Fix DPO+ZeRO3 problem by @hiyouga
- Add MANIFEST.in by @iamthebot in #4191
- Fix eos_token in llama3 pretrain by @dignfei in #4204
- Fix vllm version by @kimdwkimdw and @hzhaoy in #4234 and #4246
- Fix Dockerfile by @EliMCosta in #4314
- Fix pandas version by @zzxzz12345 in #4334
- Fix #3162 #3196 #3778 #4198 #4209 #4221 #4227 #4238 #4242 #4271 #4292 #4295 #4326 #4346 #4357 #4362
v0.8.1: Patch release
v0.8.0: GLM-4, Qwen2, PaliGemma, KTO, SimPO
Stronger LlamaBoard 💪😀
- Support single-node distributed training in Web UI
- Add dropdown menu for easily resuming from checkpoints and picking saved configurations by @hiyouga and @hzhaoy in #4053
- Support selecting checkpoints of full/freeze tuning
- Add throughput metrics to LlamaBoard by @injet-zhou in #4066
- Faster UI loading
New features
- Add KTO algorithm by @enji-zhou in #3785
- Add SimPO algorithm by @hiyouga (a sketch of the objective follows this list)
- Support passing `max_lora_rank` to the vLLM backend by @jue-jue-zi in #3794
- Support preference datasets in sharegpt format and remove big files from git repo by @hiyouga in #3799
- Support setting system messages in CLI inference by @ycjcl868 in #3812
- Add `num_samples` option in `dataset_info.json` by @seanzhang-zhichen in #3829
- Add NPU docker image by @dongdongqiang2018 in #3876
- Improve NPU document by @MengqingCao in #3930
- Support SFT packing with greedy knapsack algorithm by @AlongWY in #4009
- Add `llamafactory-cli env` for bug report
- Support image input in the API mode
- Support random initialization via the `train_from_scratch` argument
- Initialize CI
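
For the SimPO item above: the SimPO paper's objective uses the length-normalized log-likelihood as an implicit reward together with a target margin. A hedged PyTorch sketch of that objective (not LLaMA-Factory's trainer code):

```python
# SimPO objective: -log sigmoid(beta/|y_w| * log p(y_w|x) - beta/|y_l| * log p(y_l|x) - gamma)
import torch
import torch.nn.functional as F

def simpo_loss(chosen_logps, rejected_logps, chosen_lengths, rejected_lengths,
               beta: float = 2.0, gamma: float = 0.5) -> torch.Tensor:
    reward_chosen = beta * chosen_logps / chosen_lengths        # length-normalized implicit reward
    reward_rejected = beta * rejected_logps / rejected_lengths
    return -F.logsigmoid(reward_chosen - reward_rejected - gamma).mean()

print(simpo_loss(torch.tensor([-40.0]), torch.tensor([-80.0]),
                 torch.tensor([20.0]), torch.tensor([25.0])))
```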
New models
- Base models
- Qwen2 (0.5B/1.5B/7B/72B/MoE) 📄
- PaliGemma-3B (pt/mix) 📄🖼️
- GLM-4-9B 📄
- Falcon-11B 📄
- DeepSeek-V2-Lite (16B) 📄
- Instruct/Chat models
New datasets
- Pre-training datasets
- FineWeb (en)
- FineWeb-Edu (en)
- Supervised fine-tuning datasets
- Ruozhiba-GPT4 (zh)
- STEM-Instruction (zh)
- Preference datasets
- Argilla-KTO-mix-15K (en)
- UltraFeedback (en)
Bug fix
- Fix RLHF for multimodal finetuning
- Fix LoRA target in multimodal finetuning by @BUAADreamer in #3835
- Fix `yi` template by @Yimi81 in #3925
- Fix abort issue in LlamaBoard by @injet-zhou in #3987
- Pass `scheduler_specific_kwargs` to `get_scheduler` by @Uminosachi in #4006
- Fix hyperparameter help messages by @xu-song in #4007
- Update issue template by @statelesshz in #4011
- Fix vllm dtype parameter
- Fix exporting hyperparameters by @MengqingCao in #4080
- Fix DeepSpeed ZeRO3 in PPO trainer
- Fix #3108 #3387 #3646 #3717 #3764 #3769 #3803 #3807 #3818 #3837 #3847 #3853 #3873 #3900 #3931 #3965 #3971 #3978 #3992 #4005 #4012 #4013 #4022 #4033 #4043 #4061 #4075 #4077 #4079 #4085 #4090 #4120 #4132 #4137 #4139
v0.7.1: Ascend NPU Support, Yi-VL Models
🚨🚨 Core refactor 🚨🚨
- Add CLI usage: we now recommend using `llamafactory-cli` to launch training and inference; the entry point is located at cli.py
- Rename files: `train_bash.py` -> `train.py`, `train_web.py` -> `webui.py`, `api_demo.py` -> `api.py`
- Remove files: `cli_demo.py`, `evaluate.py`, `export_model.py`, `web_demo.py`; use `llamafactory-cli chat/eval/export/webchat` instead
- Use YAML configs in examples instead of shell scripts for a cleaner view
- Remove the sha1 hash check when loading datasets
- Rename arguments: `num_layer_trainable` -> `freeze_trainable_layers`, `name_module_trainable` -> `freeze_trainable_modules`
The above changes are made by @hiyouga in #3596
REMINDER: Installation is now mandatory to use LLaMA Factory
New features
- Support training and inference on the Ascend NPU 910 devices by @zhou-wjjw and @statelesshz (docker images are also provided)
- Support the `stop` parameter in the vLLM engine by @zhaonx in #3527 (a usage sketch follows this list)
- Support fine-tuning token embeddings in freeze tuning via the `freeze_extra_modules` argument
- Add Llama3 quickstart to readme
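
The `stop` parameter above maps, roughly, to vLLM's stop strings. A hypothetical usage sketch (the model name is illustrative, not a project default; requires the `vllm` package and a GPU):

```python
# Sketch: passing stop strings to vLLM's sampling parameters.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen1.5-0.5B-Chat")  # small, ungated model used purely for illustration
params = SamplingParams(max_tokens=64, temperature=0.7, stop=["</s>", "User:"])
outputs = llm.generate(["Write a haiku about llamas."], params)
print(outputs[0].outputs[0].text)
```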
New models
- Base models
- Yi-1.5 (6B/9B/34B) 📄
- DeepSeek-V2 (236B) 📄
- Instruct/Chat models
- Yi-1.5-Chat (6B/9B/34B) 📄🤖
- Yi-VL-Chat (6B/34B) by @BUAADreamer in #3748 📄🖼️🤖
- Llama3-Chinese-Chat (8B/70B) 📄🤖
- DeepSeek-V2-Chat (236B) 📄🤖
Bug fix
- Add badam arguments to LlamaBoard by @codemayq in #3487
- Add openai data format to readme by @khazic in #3490
- Fix slow operation in dpo/orpo trainer by @hiyouga
- Fix badam examples by @pha123661 in #3578
- Fix download link of the nectar_rm dataset by @ZeyuTeng96 in #3588
- Add project by @Katehuuh in #3601
- Fix dockerfile by @gaussian8 in #3604
- Fix full tuning of MLLMs by @BUAADreamer in #3651
- Fix gradio environment variables by @cocktailpeanut in #3654
- Fix typo and add log in API by @Tendo33 in #3655
- Fix download link of the phi-3 model by @YUUUCC in #3683
- Fix #3559 #3560 #3602 #3603 #3606 #3625 #3650 #3658 #3674 #3694 #3702 #3724 #3728
v0.7.0: LLaVA Multimodal LLM Support
Congratulations on 20k stars 🎉 We were the No. 1 trending repository on GitHub on Apr. 23rd 🔥 Follow us on X
New features
- Support SFT/PPO/DPO/ORPO for the LLaVA-1.5 model by @BUAADreamer in #3450
- Support inferring the LLaVA-1.5 model with both native Transformers and vLLM by @hiyouga in #3454
- Support vLLM+LoRA inference for partial models (see support list)
- Support 2x faster generation of the QLoRA model based on UnslothAI's optimization
- Support adding new special tokens to the tokenizer via the `new_special_tokens` argument (a rough sketch of the underlying mechanics follows this list)
- Support choosing the device used to merge LoRA in LlamaBoard via the `export_device` argument
- Add a Colab notebook for getting started with fine-tuning the Llama-3 model on a free T4 GPU
- Automatically enable SDPA attention and fast tokenizer for higher performance
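
Roughly what the `new_special_tokens` feature involves under the hood on the transformers side (a hedged sketch, not LLaMA-Factory's exact code path): the tokenizer grows, so the embedding matrix must be resized to match.

```python
# Sketch: add special tokens, then resize embeddings so the new ids get embedding rows.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small, ungated model used purely for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

num_added = tokenizer.add_special_tokens({"additional_special_tokens": ["<tool_call>", "</tool_call>"]})
if num_added > 0:
    model.resize_token_embeddings(len(tokenizer))  # new rows are randomly initialized

print(num_added, len(tokenizer))
```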
New models
- Base models
- OLMo-1.7-7B
- Jamba-v0.1-51B
- Qwen1.5-110B
- DBRX-132B-Base
- Instruct/Chat models
- Phi-3-mini-3.8B-instruct (4k/128k)
- LLaVA-1.5-7B
- LLaVA-1.5-13B
- Qwen1.5-110B-Chat
- DBRX-132B-Instruct
New datasets
- Supervised fine-tuning datasets
- LLaVA mixed (en&zh) by @BUAADreamer in #3471
- Preference datasets
- DPO mixed (en&zh) by @hiyouga