Releases · hpcaitech/ColossalAI
Version v0.4.6 Release Today!
What's Changed
Release
- [release] update version (#6109) by Hongxin Liu
Pre-commit.ci
- [pre-commit.ci] pre-commit autoupdate (#6078) by pre-commit-ci[bot]
Checkpointio
- [checkpointio] fix hybrid plugin model save (#6106) by Hongxin Liu
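The checkpoint I/O fix above concerns saving models that were boosted with the hybrid parallel plugin. Below is a minimal, hedged sketch of the checkpoint round trip through the Booster API; the toy model and paths are placeholders, and the `shard=True` option reflects the Booster checkpoint interface in recent releases (worth double-checking against the version you run).

```python
# Minimal sketch of saving/loading a model boosted by HybridParallelPlugin,
# the code path touched by the checkpoint I/O fix above. Paths and the toy
# model are placeholders; run under torchrun so the launcher can read the env.
import torch
import torch.nn as nn

import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin

colossalai.launch_from_torch()

# tp_size/pp_size of 1 keeps the toy model valid; tp_size > 1 needs a model
# covered by a shardformer policy (e.g. a Hugging Face transformer).
plugin = HybridParallelPlugin(tp_size=1, pp_size=1, zero_stage=1, precision="bf16")
booster = Booster(plugin=plugin)

model = nn.Sequential(nn.Linear(1024, 1024), nn.GELU(), nn.Linear(1024, 1024))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
model, optimizer, *_ = booster.boost(model, optimizer)

# Sharded save: each rank writes its shards plus an index file.
booster.save_model(model, "ckpt/model", shard=True)
# After rebuilding the same boosted layout, the shards can be loaded back.
booster.load_model(model, "ckpt/model")
```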
Doc
- [doc] sora solution news (#6100) by binmakeswell
Extension
- [extension] hotfix compile check (#6099) by Hongxin Liu
Full Changelog: v0.4.6...v0.4.5
Version v0.4.5 Release Today!
What's Changed
Release
- [release] update version (#6094) by Hongxin Liu
Misc
- [misc] fit torch API upgrade and remove legacy import (#6093) by Hongxin Liu
Fp8
- [fp8] add fallback and make compile option configurable (#6092) by Hongxin Liu
Chore
- [chore] refactor by botbw
Ckpt
- [ckpt] add safetensors util by botbw
Pipeline
- [pipeline] hotfix backward for multiple outputs (#6090) by Hongxin Liu
Ring attention
- [Ring Attention] Improve comments (#6085) by Wenxuan Tan
- Merge pull request #6071 from wangbluo/ring_attention by Wang Binluo
Shardformer
- [shardformer] optimize seq parallelism (#6086) by Hongxin Liu
- [shardformer] fix linear 1d row and support uneven splits for fused qkv linear (#6084) by Hongxin Liu
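The sequence-parallelism entries above touch the shardformer path that is switched on through the hybrid parallel plugin. A hedged sketch of the typical setup follows; the argument names (`enable_sequence_parallelism`, `sequence_parallelism_mode`, `sp_size`) follow the plugin's options in recent releases, the `"all_to_all"` mode string and the tiny Llama model are illustrative choices, and exact names and supported modes should be checked against the installed version.

```python
# Hedged sketch: enabling sequence parallelism via HybridParallelPlugin.
# Argument names and the mode string are taken from recent plugin options and
# may differ between versions; the tiny Llama config is just a stand-in.
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin
from transformers import LlamaConfig, LlamaForCausalLM

colossalai.launch_from_torch()

plugin = HybridParallelPlugin(
    tp_size=1,
    pp_size=1,
    sp_size=2,                               # sequence-parallel group size
    enable_sequence_parallelism=True,
    sequence_parallelism_mode="all_to_all",  # Ulysses-style SP; other modes may exist
    precision="bf16",
)
booster = Booster(plugin=plugin)

model = LlamaForCausalLM(
    LlamaConfig(num_hidden_layers=2, hidden_size=256, intermediate_size=512,
                num_attention_heads=8, num_key_value_heads=8)
)
model, *_ = booster.boost(model)
```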
Full Changelog: v0.4.5...v0.4.4
Version v0.4.4 Release Today!
What's Changed
Release
- [release] update version (#6062) by Hongxin Liu
Colossaleval
- [ColossalEval] support for vllm (#6056) by Camille Zhong
Sp
- Merge pull request #6064 from wangbluo/fix_attn by Wang Binluo
- Merge pull request #6061 from wangbluo/sp_fix by Wang Binluo
Doc
- [doc] FP8 training and communication document (#6050) by Guangyao Zhang
- [doc] update sp doc (#6055) by flybird11111
Fp8
- [fp8] Disable all_gather intranode. Disable Redundant all_gather fp8 (#6059) by Guangyao Zhang
- [fp8] fix missing fp8_comm flag in mixtral (#6057) by botbw
- [fp8] hotfix backward hook (#6053) by Hongxin Liu
Pre-commit.ci
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
Feature
- [Feature] Split cross-entropy computation in SP (#5959) by Wenxuan Tan
Full Changelog: v0.4.4...v0.4.3
Version v0.4.3 Release Today!
What's Changed
Release
- [release] update version (#6041) by Hongxin Liu
Fp8
- [fp8] disable all_to_all_fp8 in intranode (#6045) by Hanks
- [fp8] fix linear hook (#6046) by Hongxin Liu
- [fp8] optimize all-gather (#6043) by Hongxin Liu
- [FP8] unsqueeze scale to make it compatible with torch.compile (#6040) by Guangyao Zhang
- Merge pull request #6012 from hpcaitech/feature/fp8_comm by Hongxin Liu
- Merge pull request #6033 from wangbluo/fix by Wang Binluo
- Merge pull request #6024 from wangbluo/fix_merge by Wang Binluo
- Merge pull request #6023 from wangbluo/fp8_merge by Wang Binluo
- [fp8] Merge feature/fp8_comm to main branch of Colossalai (#6016) by Wang Binluo
- [fp8] zero support fp8 linear. (#6006) by flybird11111
- [fp8] add use_fp8 option for MoeHybridParallelPlugin (#6009) by Wang Binluo
- [fp8]update reduce-scatter test (#6002) by flybird11111
- [fp8] linear perf enhancement by botbw
- [fp8] update torch.compile for linear_fp8 to >= 2.4.0 (#6004) by botbw
- [fp8] support asynchronous FP8 communication (#5997) by flybird11111
- [fp8] refactor fp8 linear with compile (#5993) by Hongxin Liu
- [fp8] support hybrid parallel plugin (#5982) by Wang Binluo
- [fp8]Moe support fp8 communication (#5977) by flybird11111
- [fp8] use torch compile (torch >= 2.3.0) (#5979) by botbw
- [fp8] support gemini plugin (#5978) by Hongxin Liu
- [fp8] support fp8 amp for hybrid parallel plugin (#5975) by Hongxin Liu
- [fp8] add fp8 linear (#5967) by Hongxin Liu
- [fp8]support all2all fp8 (#5953) by flybird11111
- [FP8] rebase main (#5963) by flybird11111
- Merge pull request #5961 from ver217/feature/zeor-fp8 by Hanks
- [fp8] add fp8 comm for low level zero by ver217
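The FP8 entries above add an FP8 linear layer (with a torch.compile path) and FP8 communication across the Gemini, hybrid parallel, MoE and low-level zero plugins. The sketch below shows how this could surface to users; the flag names `fp8_communication` and `use_fp8` are lifted from the commit titles, but which plugin accepts which flag, and the torch >= 2.3/2.4 requirement for the compiled FP8 linear, should be confirmed against the release you run.

```python
# Hedged sketch: turning on FP8 features in a plugin.
# Flag names mirror the commit titles above (fp8_communication, use_fp8);
# availability differs between plugins and versions, so treat this as illustrative.
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin

colossalai.launch_from_torch()

plugin = HybridParallelPlugin(
    tp_size=2,
    pp_size=1,
    precision="bf16",
    fp8_communication=True,  # cast collectives (all-gather, all-to-all, ...) to FP8
    use_fp8=True,            # use the compiled FP8 linear where supported (assumed flag)
)
booster = Booster(plugin=plugin)
```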
Hotfix
- [Hotfix] Remove deprecated install (#6042) by Tong Li
- [Hotfix] Fix llama fwd replacement bug (#6031) by Wenxuan Tan
- [Hotfix] Avoid fused RMSnorm import error without apex (#5985) by Edenzzzz
- [Hotfix] README link (#5966) by Tong Li
- [hotfix] Remove unused plan section (#5957) by Tong Li
Colossalai/checkpoint_io/...
- [colossalai/checkpoint_io/...] fix bug in load_state_dict_into_model; format error msg (#6020) by Gao, Ruiyuan
Plugin
- [plugin] hotfix zero plugin (#6036) by Hongxin Liu
- [plugin] add cast inputs option for zero (#6003) (#6022) by Hongxin Liu
- [plugin] add cast inputs option for zero (#6003) by Hongxin Liu
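The plugin entries above add an option to skip the automatic casting of inputs when using the zero plugin, which is useful when the caller wants to control input dtypes itself. A hedged sketch follows, assuming the option is exposed as a `cast_inputs` keyword on LowLevelZeroPlugin as the commit titles suggest.

```python
# Hedged sketch: disabling automatic input casting with the low-level zero plugin.
# `cast_inputs` is assumed from the entries above; check the plugin signature
# in your installed version before relying on it.
import torch
import torch.nn as nn

import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import LowLevelZeroPlugin

colossalai.launch_from_torch()

plugin = LowLevelZeroPlugin(stage=2, precision="bf16", cast_inputs=False)
booster = Booster(plugin=plugin)

model = nn.Linear(512, 512)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
model, optimizer, *_ = booster.boost(model, optimizer)

# With cast_inputs=False the caller controls the input dtype explicitly.
x = torch.randn(8, 512, device="cuda", dtype=torch.bfloat16)
loss = model(x).sum()
booster.backward(loss, optimizer)
optimizer.step()
```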
Ci
- [CI] Remove triton version for compatibility bug; update req torch >=2.2 (#6018) by Wenxuan Tan
Pre-commit.ci
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] pre-commit autoupdate (#5995) by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
Misc
- [misc] Use dist logger in plugins (#6011) by Edenzzzz
- [misc] update compatibility (#6008) by Hongxin Liu
- [misc] Bypass the huggingface bug to solve the mask mismatch problem (#5991) by Haze188
- [misc] remove useless condition by haze188
- [misc] fix ci failure: change default value to false in moe plugin by haze188
- [misc] remove incompatible test config by haze188
- [misc] remove debug/print code by haze188
- [misc] skip redundant test by haze188
- [misc] solve booster hang by rename the variable by haze188
Feature
- [Feature] Zigzag Ring attention (#5905) by Edenzzzz
- [Feature]: support FP8 communication in DDP, FSDP, Gemini (#5928) by Hanks
- [Feature] llama shardformer fp8 support (#5938) by Guangyao Zhang
- [Feature] MoE Ulysses Support (#5918) by Haze188
Chat
- [Chat] fix readme (#5989) by YeAnbang
- Merge pull request #5962 from hpcaitech/colossalchat by YeAnbang
- [Chat] Fix lora (#5946) by YeAnbang
Test ci
- [test ci]Feature/fp8 comm (#5981) by flybird11111
Docs
- [Docs] clarify launch port by Edenzzzz
Test
- [test] add zero fp8 test case by ver217
- [test] add check by hxwang
- [test] fix test: test_zero1_2 by hxwang
- [test] add mixtral modelling test by botbw
- [test] pass mixtral shardformer test by botbw
- [test] mixtral pp shard test by hxwang
- [test] add mixtral transformer test by hxwang
- [test] add mixtral for sequence classification by hxwang
Lora
- [lora] lora support hybrid parallel plugin (#5956) by Wang Binluo
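The LoRA entry above extends Booster's LoRA support (originally added with the qlora feature in an earlier release) to the hybrid parallel plugin. A rough sketch is given below; `Booster.enable_lora` and its acceptance of a peft-style `LoraConfig` are assumptions based on this entry and the earlier qlora work, and the tiny Llama model is a placeholder.

```python
# Hedged sketch: LoRA fine-tuning through Booster with the hybrid parallel plugin.
# Booster.enable_lora and its lora_config argument are assumptions inferred from
# the entries above; verify against your installed version. Requires `peft`.
import torch
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin
from peft import LoraConfig
from transformers import LlamaConfig, LlamaForCausalLM

colossalai.launch_from_torch()

plugin = HybridParallelPlugin(tp_size=1, pp_size=1, zero_stage=1, precision="bf16")
booster = Booster(plugin=plugin)

model = LlamaForCausalLM(
    LlamaConfig(num_hidden_layers=2, hidden_size=256, intermediate_size=512,
                num_attention_heads=8, num_key_value_heads=8)
)
lora_config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])
model = booster.enable_lora(model, lora_config=lora_config)  # inject adapters before boosting

optimizer = torch.optim.AdamW(filter(lambda p: p.requires_grad, model.parameters()), lr=1e-4)
model, optimizer, *_ = booster.boost(model, optimizer)
```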
Chore
- [chore] remove redundant test case, print string & reduce test tokens by botbw
- [chore] docstring by hxwang
- [chore] change moe_pg_mesh to private by hxwang
- [chore] solve moe ckpt test failure and some other arg pass failure by hxwang
- [chore] minor fix after rebase by hxwang
- [chore] minor fix by hxwang
- [chore] arg pass & remove drop token by hxwang
- [chore] trivial fix by botbw
- [chore] manually revert unintended commit by botbw
- [chore] handle non member group by hxwang
Moe
- [moe] solve dp axis issue by botbw
- [moe] remove force_overlap_comm flag and add warning instead by hxwang
- Revert "[moe] implement submesh initialization" by hxwang
- [moe] refactor mesh assignment by hxwang
- [moe] deepseek moe sp support by haze188
- [moe] remove ops by hxwang
- [moe] full test for deepseek and mixtral (pp + sp to fix) by hxwang
- [moe] finalize test (no pp) by hxwang
- [moe] init moe plugin comm setting with sp by hxwang
- [moe] clean legacy code by hxwang
- [moe] test deepseek by hxwang
- [moe] implement tp by botbw
- [moe] add mixtral dp grad scaling when not all experts are activated by botbw...
Version v0.4.2 Release Today!
What's Changed
Release
- [release] update version (#5952) by Hongxin Liu
Zero
- [zero] hotfix update master params (#5951) by Hongxin Liu
Shardformer
- [shardformer] hotfix attn mask (#5947) by Hongxin Liu
- [shardformer] hotfix attn mask (#5945) by Hongxin Liu
Feature
- [Feature] Add a switch to control whether the model checkpoint needs to be saved after each epoch ends (#5941) by zhurunhua
Fix bug
- [FIX BUG] convert env param to int in (#5934) by Gao, Ruiyuan
- [FIX BUG] UnboundLocalError: cannot access local variable 'default_conversation' where it is not associated with a value (#5931) by zhurunhua
Plugin
- [plugin] support all-gather overlap for hybrid parallel (#5919) by Hongxin Liu
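The entry above adds the option to overlap the ZeRO parameter all-gather with computation when the hybrid parallel plugin runs with ZeRO. A hedged sketch follows, assuming the switch is a plugin keyword along the lines of `overlap_allgather`; the exact name is an assumption based on this entry.

```python
# Hedged sketch: overlapping the ZeRO parameter all-gather with compute.
# `overlap_allgather` is an assumed keyword name inferred from the entry above;
# check the plugin signature in your release.
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin

colossalai.launch_from_torch()

plugin = HybridParallelPlugin(
    tp_size=1,
    pp_size=1,
    zero_stage=1,            # the overlap applies to the ZeRO all-gather path
    precision="bf16",
    overlap_allgather=True,  # assumed flag
)
booster = Booster(plugin=plugin)
```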
Full Changelog: v0.4.2...v0.4.1
Version v0.4.1 Release Today!
What's Changed
Release
- [release] update version (#5912) by Hongxin Liu
Misc
- [misc] support torch2.3 (#5893) by Hongxin Liu
Compatibility
- [compatibility] support torch 2.2 (#5875) by Guangyao Zhang
Chat
- Merge pull request #5901 from hpcaitech/colossalchat by YeAnbang
- Merge pull request #5850 from hpcaitech/rlhf_SimPO by YeAnbang
Shardformer
- [ShardFormer] fix qwen2 sp (#5903) by Guangyao Zhang
- [ShardFormer] Add Ulysses Sequence Parallelism support for Command-R, Qwen2 and ChatGLM (#5897) by Guangyao Zhang
- [shardformer] DeepseekMoE support (#5871) by Haze188
- [shardformer] fix the moe (#5883) by Wang Binluo
- [Shardformer] change qwen2 modeling into gradient checkpointing style (#5874) by Jianghai
- [shardformer]delete xformers (#5859) by flybird11111
Auto parallel
- [Auto Parallel]: Speed up intra-op plan generation by 44% (#5446) by Stephan Kö
Zero
- [zero] support all-gather overlap (#5898) by Hongxin Liu
Pre-commit.ci
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] pre-commit autoupdate (#5878) by pre-commit-ci[bot]
- [pre-commit.ci] pre-commit autoupdate (#5572) by pre-commit-ci[bot]
Hotfix
- [HotFix] CI,import,requirements-test for #5838 (#5892) by Runyu Lu
- [Hotfix] Fix OPT gradient checkpointing forward by Edenzzzz
- [hotfix] fix the bug that large tensor exceed the maximum capacity of TensorBucket (#5879) by Haze188
- [Hotfix] Fix CUDA_DEVICE_MAX_CONNECTIONS for comm overlap by Edenzzzz
Quant
- [quant] fix bitsandbytes version check (#5882) by Hongxin Liu
Doc
- [doc] Update llama + sp compatibility; fix dist optim table by Edenzzzz
Full Changelog: v0.4.1...v0.4.0
Version v0.4.0 Release Today!
What's Changed
Release
- [release] update version (#5864) by Hongxin Liu
Shardformer
- [shardformer] Support the T5ForTokenClassification model (#5816) by Guangyao Zhang
Zero
- [zero] use bucket during allgather (#5860) by Hongxin Liu
Doc
- [doc] add GPU cloud playground (#5851) by binmakeswell
- [doc] fix open sora model weight link (#5848) by binmakeswell
- [doc] opensora v1.2 news (#5846) by binmakeswell
Full Changelog: v0.4.0...v0.3.9
Version v0.3.9 Release Today!
What's Changed
Release
- [release] update version (#5833) by Hongxin Liu
Fix
- [Fix] Fix spec-dec Glide LlamaModel for compatibility with transformers (#5837) by Yuanheng Zhao
Shardformer
- [shardformer] Change atol in test command-r weight-check to pass pytest (#5835) by Guangyao Zhang
- Merge pull request #5818 from GuangyaoZhang/command-r by Guangyao Zhang
- [shardformer] upgrade transformers to 4.39.3 (#5815) by flybird11111
- [shardformer] fix modeling of bloom and falcon (#5796) by Hongxin Liu
- [shardformer] fix import (#5788) by Hongxin Liu
Devops
- [devops] Remove building on PR when edited to avoid skip issue (#5836) by Guangyao Zhang
- [devops] fix docker ci (#5780) by Hongxin Liu
Misc
- [misc] Add dist optim to doc sidebar (#5806) by Edenzzzz
- [misc] update requirements (#5787) by Hongxin Liu
- [misc] fix dist logger (#5782) by Hongxin Liu
- [misc] Accelerate CI for zero and dist optim (#5758) by Edenzzzz
- [misc] update dockerfile (#5776) by Hongxin Liu
Pre-commit.ci
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
Gemini
- [gemini] quick fix on possible async operation (#5803) by botbw
- [Gemini] Use async stream to prefetch and h2d data moving (#5781) by Haze188
- [gemini] optimize reduce scatter d2h copy (#5760) by botbw
Inference
- [Inference] Fix flash-attn import and add model test (#5794) by Li Xingjian
- [Inference]refactor baichuan (#5791) by Runyu Lu
- Merge pull request #5771 from char-1ee/refactor/modeling by Li Xingjian
- [Inference]Add Streaming LLM (#5745) by yuehuayingxueluo
Test
- [test] fix qwen2 pytest distLarge (#5797) by Guangyao Zhang
- [test] fix chatglm test kit (#5793) by Hongxin Liu
- [test] Fix/fix testcase (#5770) by duanjunwen
Install
- [install]fix setup (#5786) by flybird11111
Hotfix
- [hotfix] fix testcase in test_fx/test_tracer (#5779) by duanjunwen
- [hotfix] fix llama flash attention forward (#5777) by flybird11111
- [Hotfix] Add missing init file in inference.executor (#5774) by Yuanheng Zhao
Full Changelog: v0.3.9...v0.3.8
Version v0.3.8 Release Today!
What's Changed
Release
- [release] update version (#5752) by Hongxin Liu
Fix/example
- [Fix/Example] Fix Llama Inference Loading Data Type (#5763) by Yuanheng Zhao
Gemini
- Merge pull request #5749 from hpcaitech/prefetch by botbw
- Merge pull request #5754 from Hz188/prefetch by botbw
- [Gemini] add some code for reduce-scatter overlap, chunk prefetch in llama benchmark. (#5751) by Haze188
- [gemini] async grad chunk reduce (all-reduce&reduce-scatter) (#5713) by botbw
- Merge pull request #5733 from Hz188/feature/prefetch by botbw
- Merge pull request #5731 from botbw/prefetch by botbw
- [gemini] init auto policy prefetch by hxwang
- Merge pull request #5722 from botbw/prefetch by botbw
- [gemini] maxprefetch means maximum work to keep by hxwang
- [gemini] use compute_chunk to find next chunk by hxwang
- [gemini] prefetch chunks by hxwang
- [gemini]remove registered gradients hooks (#5696) by flybird11111
Chore
- [chore] refactor profiler utils by hxwang
- [chore] remove unnecessary assert since compute list might not be recorded by hxwang
- [chore] remove unnecessary test & changes by hxwang
- Merge pull request #5738 from botbw/prefetch by Haze188
- [chore] fix init error by hxwang
- [chore] Update placement_policy.py by botbw
- [chore] remove debugging info by hxwang
- [chore] remove print by hxwang
- [chore] refactor & sync by hxwang
- [chore] sync by hxwang
Bug
- [bug] continue fix by hxwang
- [bug] workaround for idx fix by hxwang
- [bug] fix early return (#5740) by botbw
Bugs
- [bugs] fix args.profile=False DummyProfiler error by genghaozhe
Inference
- [inference] Fix running time of test_continuous_batching (#5750) by Yuanheng Zhao
- [Inference]Fix readme and example for API server (#5742) by Jianghai
- [inference] release (#5747) by binmakeswell
- [Inference] Fix Inference Generation Config and Sampling (#5710) by Yuanheng Zhao
- [Inference] Fix API server, test and example (#5712) by Jianghai
- [Inference] Delete duplicated copy_vector (#5716) by 傅剑寒
- [Inference]Adapt repetition_penalty and no_repeat_ngram_size (#5708) by yuehuayingxueluo
- [Inference] Add example test_ci script by CjhHa1
- [Inference] Fix bugs and docs for feat/online-server (#5598) by Jianghai
- [Inference] resolve rebase conflicts by CjhHa1
- [Inference] Finish Online Serving Test, add streaming output api, continuous batching test and example (#5432) by Jianghai
- [Inference] ADD async and sync Api server using FastAPI (#5396) by Jianghai
- [Inference] Support the logic related to ignoring EOS token (#5693) by yuehuayingxueluo
- [Inference]Adapt temperature processing logic (#5689) by yuehuayingxueluo
- [Inference] Remove unnecessary float4_ and rename float8_ to float8 (#5679) by Steve Luo
- [Inference] Fix quant bits order (#5681) by 傅剑寒
- [inference]Add alibi to flash attn function (#5678) by yuehuayingxueluo
- [Inference] Adapt Baichuan2-13B TP (#5659) by yuehuayingxueluo
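The inference entries above track the Colossal-Inference engine merged into main in this cycle (continuous batching, an async/sync FastAPI server, sampling and repetition-penalty handling, Baichuan2-13B TP, and more). A rough usage sketch is below; the `InferenceEngine`/`InferenceConfig` import path, constructor arguments, and the model path are assumptions based on the inference README of that period and should be checked against the current module layout.

```python
# Hedged sketch of the Colossal-Inference engine flow referenced above.
# Import paths, argument names, and the model path are assumptions; consult
# the inference README of your installed release.
import colossalai
from colossalai.inference import InferenceConfig, InferenceEngine
from transformers import AutoModelForCausalLM, AutoTokenizer

colossalai.launch_from_torch()

model_path = "path/to/your/llama-checkpoint"  # placeholder
model = AutoModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

inference_config = InferenceConfig(
    max_batch_size=8,    # continuous-batching upper bound
    max_input_len=256,
    max_output_len=128,
    dtype="fp16",
)
engine = InferenceEngine(model, tokenizer, inference_config, verbose=True)

outputs = engine.generate(prompts=["Introduce some landmarks in Beijing."])
print(outputs)
```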
Feature
- [Feature] auto-cast optimizers to distributed version (#5746) by Edenzzzz
- [Feature] Distributed optimizers: Lamb, Galore, CAME and Adafactor (#5694) by Edenzzzz
- Merge pull request #5588 from hpcaitech/feat/online-serving by Jianghai
- [Feature] qlora support (#5586) by linsj20
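Per the first two feature entries above, common optimizers (Lamb, GaLore, CAME, Adafactor) gain distributed implementations and supported optimizers are cast to their distributed versions automatically when boosted, so the user-facing flow stays the plain Booster pattern. A minimal sketch of that pattern follows; the `Lamb` import path is taken from the project's optimizer module and may have moved, and the auto-cast itself happens inside `booster.boost` according to the entry, with nothing extra required from the caller.

```python
# Minimal sketch: the usual Booster flow. Per the entries above, a supported
# optimizer is swapped for its distributed variant inside booster.boost when
# the plugin's parallelism requires it; user code does not change.
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin
from colossalai.nn.optimizer import Lamb  # import path may differ by version
from transformers import LlamaConfig, LlamaForCausalLM

colossalai.launch_from_torch()

plugin = HybridParallelPlugin(tp_size=2, pp_size=1, precision="bf16")
booster = Booster(plugin=plugin)

model = LlamaForCausalLM(
    LlamaConfig(num_hidden_layers=2, hidden_size=256, intermediate_size=512,
                num_attention_heads=8, num_key_value_heads=8)
)
optimizer = Lamb(model.parameters(), lr=1e-3)
model, optimizer, *_ = booster.boost(model, optimizer)  # optimizer may come back as a distributed variant
```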
Example
- [example] add profile util for llama by hxwang
- [example] Update Inference Example (#5725) by Yuanheng Zhao
Colossal-inference
- [Colossal-Inference] (v0.1.0) Merge pull request #5739 from hpcaitech/feature/colossal-infer by Yuanheng Zhao
Nfc
- [NFC] fix requirements (#5744) by Yuanheng Zhao
- [NFC] Fix code factors on inference triton kernels (#5743) by Yuanheng Zhao
Ci
- [ci] Temporary fix for build on pr (#5741) by Yuanheng Zhao
- [ci] Fix example tests (#5714) by Yuanheng Zhao
Sync
- Merge pull request #5737 from yuanheng-zhao/inference/sync/main by Yuanheng Zhao
- [sync] Sync feature/colossal-infer with main by Yuanheng Zhao
- [Sync] Update from main to feature/colossal-infer (Merge pull request #5685) by Yuanheng Zhao
- [sync] resolve conflicts of merging main by Yuanheng Zhao
Shardformer
- [Shardformer] Add parallel output for shardformer models(bloom, falcon) (#5702) by Haze188
- [Shardformer]fix the num_heads assert for llama model and qwen model (#5704) by Wang Binluo
- [Shardformer] Support the Qwen2 model (#5699) by Wang Binluo
- Merge pull request #5684 from wangbluo/parallel_output by Wang Binluo
- [Shardformer] add assert for num of attention heads divisible by tp_size (#5670) by Wang Binluo
- [shardformer] support bias_gelu_jit_fused for models (#5647) by flybird11111
Pre-commit.ci
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
Doc
- [doc] Update Inference Readme (#5736) by Yuanheng Zhao
Fix/inference
- [Fix/Inference] Add unsupported auto-policy error message (#5730) by Yuanheng Zhao
Lazy
- [lazy] fix lazy cls init (#5720) by flybird11111
Misc
- [misc] Update PyTorch version in docs (#5724) by binmakeswell
- [misc] Update PyTorch version in docs (#5711) by Edenzzzz
- [misc] Add an existing issue checkbox in bug report (#5691) by Edenzzzz
- [misc] refactor launch API and tensor constructor (#5666) by Hongxin Liu
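The last Misc entry refactors the launch API: since that change, `colossalai.launch_from_torch` simply reads the distributed environment set up by `torchrun` (older examples passed a `config={}` argument). A small sketch of the post-refactor launch plus the standard boost loop; the model and data here are placeholders.

```python
# Sketch of the post-refactor launch flow (per the entry above, launch_from_torch
# no longer takes the old config dict). Model and data are placeholders; run with
# e.g. `torchrun --nproc_per_node=2 train.py`.
import torch
import torch.nn as nn

import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchDDPPlugin

colossalai.launch_from_torch()  # reads RANK/WORLD_SIZE/etc. set by torchrun
booster = Booster(plugin=TorchDDPPlugin())

model = nn.Linear(128, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()
model, optimizer, criterion, *_ = booster.boost(model, optimizer, criterion)

x = torch.randn(32, 128, device="cuda")
y = torch.randn(32, 1, device="cuda")
loss = criterion(model(x), y)
booster.backward(loss, optimizer)
optimizer.step()
```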
Fix
- [Fix] Llama3 Load/Omit CheckpointIO Temporarily (#5717) by Runyu Lu
- [Fix] Fix Inference Example, Tests, and Requirements (#5688) by Yuanheng Zhao
- [Fix] Fix & Update Inference Tests (compatibility w/ main) by Yuanheng Zhao
Hotfix
- [hotfix] fix inference typo (#5438) by hugo-syn
- [hotfix] fix OpenMOE example import path (#5697) by Yuanheng Zhao
- [hotfix] Fix KV He...
Version v0.3.7 Release Today!
What's Changed
Release
- [release] update version (#5654) by Hongxin Liu
- [release] grok-1 inference benchmark (#5500) by binmakeswell
- [release] grok-1 314b inference (#5490) by binmakeswell
Hotfix
- [hotfix] add soft link to support required files (#5661) by Tong Li
- [hotfix] Fixed fused layernorm bug without apex (#5609) by Edenzzzz
- [hotfix] Fix examples no pad token & auto parallel codegen bug; (#5606) by Edenzzzz
- [hotfix] fix typo s/get_defualt_parser /get_default_parser (#5548) by digger yu
- [hotfix] quick fixes to make legacy tutorials runnable (#5559) by Edenzzzz
- [hotfix] set return_outputs=False in examples and polish code (#5404) by Wenhao Chen
- [hotfix] fix typo s/keywrods/keywords etc. (#5429) by digger yu
News
- [news] llama3 and open-sora v1.1 (#5655) by binmakeswell
Lazyinit
- [lazyinit] skip whisper test (#5653) by Hongxin Liu
Shardformer
- [shardformer] refactor pipeline grad ckpt config (#5646) by Hongxin Liu
- [shardformer] fix chatglm implementation (#5644) by Hongxin Liu
- [shardformer] remove useless code (#5645) by flybird11111
- [shardformer] update transformers (#5583) by Wang Binluo
- [shardformer] fix pipeline grad ckpt (#5620) by Hongxin Liu
- [shardformer] refactor embedding resize (#5603) by flybird11111
- [shardformer] Sequence Parallelism Optimization (#5533) by Zhongkai Zhao
- [shardformer] fix pipeline forward error if custom layer distribution is used (#5189) by Insu Jang
- [shardformer] update colo attention to support custom mask (#5510) by Hongxin Liu
- [shardformer]Fix lm parallel. (#5480) by flybird11111
- [shardformer] fix gathering output when using tensor parallelism (#5431) by flybird11111
Fix
- [Fix]: implement thread-safety singleton to avoid deadlock for very large-scale training scenarios (#5625) by Season
- [fix] fix typo s/muiti-node /multi-node etc. (#5448) by digger yu
- [Fix] Grok-1 use tokenizer from the same pretrained path (#5532) by Yuanheng Zhao
- [fix] fix grok-1 example typo (#5506) by Yuanheng Zhao
Coloattention
- [coloattention]modify coloattention (#5627) by flybird11111
Example
- [example] llama3 (#5631) by binmakeswell
- [example] update Grok-1 inference (#5495) by Yuanheng Zhao
- [example] add grok-1 inference (#5485) by Hongxin Liu
- [example] update llama example (#5626) by Hongxin Liu
Zero
- [zero] support multiple (partial) backward passes (#5596) by Hongxin Liu
Doc
- [doc] fix ColossalMoE readme (#5599) by Camille Zhong
- [doc] update open-sora demo (#5479) by binmakeswell
- [doc] release Open-Sora 1.0 with model weights (#5468) by binmakeswell
Devops
- [devops] remove post commit ci (#5566) by Hongxin Liu
- [devops] fix example test ci (#5504) by Hongxin Liu
- [devops] fix compatibility (#5444) by Hongxin Liu
Shardformer, pipeline
- [shardformer, pipeline] add gradient_checkpointing_ratio and heterogeneous shard policy for llama (#5508) by Wenhao Chen
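The entry above introduces a gradient_checkpointing_ratio knob so that only part of a pipeline stage's layers are checkpointed, together with a heterogeneous shard policy for Llama. A heavily hedged sketch of how such a config could be passed to the hybrid parallel plugin follows; the `PipelineGradientCheckpointConfig` import path and the `gradient_checkpoint_config` keyword are assumptions based on this entry and the later "[shardformer] refactor pipeline grad ckpt config" commit.

```python
# Hedged sketch: partial gradient checkpointing for pipeline stages.
# The import path and keyword below are assumptions inferred from the entries above;
# check colossalai.shardformer in your installed version for the actual names.
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin
from colossalai.shardformer import PipelineGradientCheckpointConfig  # assumed location

colossalai.launch_from_torch()

ckpt_config = PipelineGradientCheckpointConfig(gradient_checkpointing_ratio=0.5)
plugin = HybridParallelPlugin(
    tp_size=1,
    pp_size=2,
    num_microbatches=4,
    precision="bf16",
    gradient_checkpoint_config=ckpt_config,  # assumed keyword
)
booster = Booster(plugin=plugin)
```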
Format
- [format] applied code formatting on changed files in pull request 5510 (#5517) by github-actions[bot]
Full Changelog: v0.3.7...v0.3.6