Releases · hpcaitech/ColossalAI
Version v0.4.6 Release Today!
What's Changed
Release
- [release] update version (#6109) by Hongxin Liu
Pre-commit.ci
- [pre-commit.ci] pre-commit autoupdate (#6078) by pre-commit-ci[bot]
Checkpointio
- [checkpointio] fix hybrid plugin model save (#6106) by Hongxin Liu
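The checkpoint I/O fix above concerns saving models that were boosted with the hybrid parallel plugin. Below is a minimal, hedged sketch of the checkpoint round trip through the Booster API; the toy model and paths are placeholders, and the `shard=True` option reflects the Booster checkpoint interface in recent releases (worth double-checking against the version you run).

```python
# Minimal sketch of saving/loading a model boosted by HybridParallelPlugin,
# the code path touched by the checkpoint I/O fix above. Paths and the toy
# model are placeholders; run under torchrun so the launcher can read the env.
import torch
import torch.nn as nn

import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin

colossalai.launch_from_torch()

# tp_size/pp_size of 1 keeps the toy model valid; tp_size > 1 needs a model
# covered by a shardformer policy (e.g. a Hugging Face transformer).
plugin = HybridParallelPlugin(tp_size=1, pp_size=1, zero_stage=1, precision="bf16")
booster = Booster(plugin=plugin)

model = nn.Sequential(nn.Linear(1024, 1024), nn.GELU(), nn.Linear(1024, 1024))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
model, optimizer, *_ = booster.boost(model, optimizer)

# Sharded save: each rank writes its shards plus an index file.
booster.save_model(model, "ckpt/model", shard=True)
# After rebuilding the same boosted layout, the shards can be loaded back.
booster.load_model(model, "ckpt/model")
```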
Doc
- [doc] sora solution news (#6100) by binmakeswell
Extension
- [extension] hotfix compile check (#6099) by Hongxin Liu
Full Changelog: v0.4.6...v0.4.5
Version v0.4.5 Release Today!
What's Changed
Release
- [release] update version (#6094) by Hongxin Liu
Misc
- [misc] fit torch API upgrade and remove legacy import (#6093) by Hongxin Liu
Fp8
- [fp8] add fallback and make compile option configurable (#6092) by Hongxin Liu
Chore
- [chore] refactor by botbw
Ckpt
- [ckpt] add safetensors util by botbw
Pipeline
- [pipeline] hotfix backward for multiple outputs (#6090) by Hongxin Liu
Ring attention
- [Ring Attention] Improve comments (#6085) by Wenxuan Tan
- Merge pull request #6071 from wangbluo/ring_attention by Wang Binluo
Shardformer
- [shardformer] optimize seq parallelism (#6086) by Hongxin Liu
- [shardformer] fix linear 1d row and support uneven splits for fused qkv linear (#6084) by Hongxin Liu
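The sequence-parallelism entries above touch the shardformer path that is switched on through the hybrid parallel plugin. A hedged sketch of the typical setup follows; the argument names (`enable_sequence_parallelism`, `sequence_parallelism_mode`, `sp_size`) follow the plugin's options in recent releases, the `"all_to_all"` mode string and the tiny Llama model are illustrative choices, and exact names and supported modes should be checked against the installed version.

```python
# Hedged sketch: enabling sequence parallelism via HybridParallelPlugin.
# Argument names and the mode string are taken from recent plugin options and
# may differ between versions; the tiny Llama config is just a stand-in.
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin
from transformers import LlamaConfig, LlamaForCausalLM

colossalai.launch_from_torch()

plugin = HybridParallelPlugin(
    tp_size=1,
    pp_size=1,
    sp_size=2,                               # sequence-parallel group size
    enable_sequence_parallelism=True,
    sequence_parallelism_mode="all_to_all",  # Ulysses-style SP; other modes may exist
    precision="bf16",
)
booster = Booster(plugin=plugin)

model = LlamaForCausalLM(
    LlamaConfig(num_hidden_layers=2, hidden_size=256, intermediate_size=512,
                num_attention_heads=8, num_key_value_heads=8)
)
model, *_ = booster.boost(model)
```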
Full Changelog: v0.4.5...v0.4.4
Version v0.4.4 Release Today!
What's Changed
Release
- [release] update version (#6062) by Hongxin Liu
Colossaleval
- [ColossalEval] support for vllm (#6056) by Camille Zhong
Sp
- Merge pull request #6064 from wangbluo/fix_attn by Wang Binluo
- Merge pull request #6061 from wangbluo/sp_fix by Wang Binluo
Doc
- [doc] FP8 training and communication document (#6050) by Guangyao Zhang
- [doc] update sp doc (#6055) by flybird11111
Fp8
- [fp8] Disable all_gather intranode. Disable Redundant all_gather fp8 (#6059) by Guangyao Zhang
- [fp8] fix missing fp8_comm flag in mixtral (#6057) by botbw
- [fp8] hotfix backward hook (#6053) by Hongxin Liu
Pre-commit.ci
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
Feature
- [Feature] Split cross-entropy computation in SP (#5959) by Wenxuan Tan
Full Changelog: v0.4.4...v0.4.3
Version v0.4.3 Release Today!
What's Changed
Release
- [release] update version (#6041) by Hongxin Liu
Fp8
- [fp8] disable all_to_all_fp8 in intranode (#6045) by Hanks
- [fp8] fix linear hook (#6046) by Hongxin Liu
- [fp8] optimize all-gather (#6043) by Hongxin Liu
- [FP8] unsqueeze scale to make it compatible with torch.compile (#6040) by Guangyao Zhang
- Merge pull request #6012 from hpcaitech/feature/fp8_comm by Hongxin Liu
- Merge pull request #6033 from wangbluo/fix by Wang Binluo
- Merge pull request #6024 from wangbluo/fix_merge by Wang Binluo
- Merge pull request #6023 from wangbluo/fp8_merge by Wang Binluo
- [fp8] Merge feature/fp8_comm to main branch of Colossalai (#6016) by Wang Binluo
- [fp8] zero support fp8 linear. (#6006) by flybird11111
- [fp8] add use_fp8 option for MoeHybridParallelPlugin (#6009) by Wang Binluo
- [fp8]update reduce-scatter test (#6002) by flybird11111
- [fp8] linear perf enhancement by botbw
- [fp8] update torch.compile for linear_fp8 to >= 2.4.0 (#6004) by botbw
- [fp8] support asynchronous FP8 communication (#5997) by flybird11111
- [fp8] refactor fp8 linear with compile (#5993) by Hongxin Liu
- [fp8] support hybrid parallel plugin (#5982) by Wang Binluo
- [fp8]Moe support fp8 communication (#5977) by flybird11111
- [fp8] use torch compile (torch >= 2.3.0) (#5979) by botbw
- [fp8] support gemini plugin (#5978) by Hongxin Liu
- [fp8] support fp8 amp for hybrid parallel plugin (#5975) by Hongxin Liu
- [fp8] add fp8 linear (#5967) by Hongxin Liu
- [fp8]support all2all fp8 (#5953) by flybird11111
- [FP8] rebase main (#5963) by flybird11111
- Merge pull request #5961 from ver217/feature/zeor-fp8 by Hanks
- [fp8] add fp8 comm for low level zero by ver217
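The FP8 entries above add an FP8 linear layer (with a torch.compile path) and FP8 communication across the Gemini, hybrid parallel, MoE and low-level zero plugins. The sketch below shows how this could surface to users; the flag names `fp8_communication` and `use_fp8` are lifted from the commit titles, but which plugin accepts which flag, and the torch >= 2.3/2.4 requirement for the compiled FP8 linear, should be confirmed against the release you run.

```python
# Hedged sketch: turning on FP8 features in a plugin.
# Flag names mirror the commit titles above (fp8_communication, use_fp8);
# availability differs between plugins and versions, so treat this as illustrative.
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin

colossalai.launch_from_torch()

plugin = HybridParallelPlugin(
    tp_size=2,
    pp_size=1,
    precision="bf16",
    fp8_communication=True,  # cast collectives (all-gather, all-to-all, ...) to FP8
    use_fp8=True,            # use the compiled FP8 linear where supported (assumed flag)
)
booster = Booster(plugin=plugin)
```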
Hotfix
- [Hotfix] Remove deprecated install (#6042) by Tong Li
- [Hotfix] Fix llama fwd replacement bug (#6031) by Wenxuan Tan
- [Hotfix] Avoid fused RMSnorm import error without apex (#5985) by Edenzzzz
- [Hotfix] README link (#5966) by Tong Li
- [hotfix] Remove unused plan section (#5957) by Tong Li
Colossalai/checkpoint_io/...
- [colossalai/checkpoint_io/...] fix bug in load_state_dict_into_model; format error msg (#6020) by Gao, Ruiyuan
Plugin
- [plugin] hotfix zero plugin (#6036) by Hongxin Liu
- [plugin] add cast inputs option for zero (#6003) (#6022) by Hongxin Liu
- [plugin] add cast inputs option for zero (#6003) by Hongxin Liu
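The plugin entries above add an option to skip the automatic casting of inputs when using the zero plugin, which is useful when the caller wants to control input dtypes itself. A hedged sketch follows, assuming the option is exposed as a `cast_inputs` keyword on LowLevelZeroPlugin as the commit titles suggest.

```python
# Hedged sketch: disabling automatic input casting with the low-level zero plugin.
# `cast_inputs` is assumed from the entries above; check the plugin signature
# in your installed version before relying on it.
import torch
import torch.nn as nn

import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import LowLevelZeroPlugin

colossalai.launch_from_torch()

plugin = LowLevelZeroPlugin(stage=2, precision="bf16", cast_inputs=False)
booster = Booster(plugin=plugin)

model = nn.Linear(512, 512)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
model, optimizer, *_ = booster.boost(model, optimizer)

# With cast_inputs=False the caller controls the input dtype explicitly.
x = torch.randn(8, 512, device="cuda", dtype=torch.bfloat16)
loss = model(x).sum()
booster.backward(loss, optimizer)
optimizer.step()
```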
Ci
- [CI] Remove triton version for compatibility bug; update req torch >=2.2 (#6018) by Wenxuan Tan
Pre-commit.ci
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] pre-commit autoupdate (#5995) by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
Misc
- [misc] Use dist logger in plugins (#6011) by Edenzzzz
- [misc] update compatibility (#6008) by Hongxin Liu
- [misc] Bypass the huggingface bug to solve the mask mismatch problem (#5991) by Haze188
- [misc] remove useless condition by haze188
- [misc] fix ci failure: change default value to false in moe plugin by haze188
- [misc] remove incompatible test config by haze188
- [misc] remove debug/print code by haze188
- [misc] skip redundant test by haze188
- [misc] solve booster hang by rename the variable by haze188
Feature
- [Feature] Zigzag Ring attention (#5905) by Edenzzzz
- [Feature]: support FP8 communication in DDP, FSDP, Gemini (#5928) by Hanks
- [Feature] llama shardformer fp8 support (#5938) by Guangyao Zhang
- [Feature] MoE Ulysses Support (#5918) by Haze188
Chat
- [Chat] fix readme (#5989) by YeAnbang
- Merge pull request #5962 from hpcaitech/colossalchat by YeAnbang
- [Chat] Fix lora (#5946) by YeAnbang
Test ci
- [test ci]Feature/fp8 comm (#5981) by flybird11111
Docs
- [Docs] clarify launch port by Edenzzzz
Test
- [test] add zero fp8 test case by ver217
- [test] add check by hxwang
- [test] fix test: test_zero1_2 by hxwang
- [test] add mixtral modelling test by botbw
- [test] pass mixtral shardformer test by botbw
- [test] mixtral pp shard test by hxwang
- [test] add mixtral transformer test by hxwang
- [test] add mixtral for sequence classification by hxwang
Lora
- [lora] lora support hybrid parallel plugin (#5956) by Wang Binluo
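The LoRA entry above extends Booster's LoRA support (originally added with the qlora feature in an earlier release) to the hybrid parallel plugin. A rough sketch is given below; `Booster.enable_lora` and its acceptance of a peft-style `LoraConfig` are assumptions based on this entry and the earlier qlora work, and the tiny Llama model is a placeholder.

```python
# Hedged sketch: LoRA fine-tuning through Booster with the hybrid parallel plugin.
# Booster.enable_lora and its lora_config argument are assumptions inferred from
# the entries above; verify against your installed version. Requires `peft`.
import torch
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin
from peft import LoraConfig
from transformers import LlamaConfig, LlamaForCausalLM

colossalai.launch_from_torch()

plugin = HybridParallelPlugin(tp_size=1, pp_size=1, zero_stage=1, precision="bf16")
booster = Booster(plugin=plugin)

model = LlamaForCausalLM(
    LlamaConfig(num_hidden_layers=2, hidden_size=256, intermediate_size=512,
                num_attention_heads=8, num_key_value_heads=8)
)
lora_config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])
model = booster.enable_lora(model, lora_config=lora_config)  # inject adapters before boosting

optimizer = torch.optim.AdamW(filter(lambda p: p.requires_grad, model.parameters()), lr=1e-4)
model, optimizer, *_ = booster.boost(model, optimizer)
```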
Chore
- [chore] remove redundant test case, print string & reduce test tokens by botbw
- [chore] docstring by hxwang
- [chore] change moe_pg_mesh to private by hxwang
- [chore] solve moe ckpt test failure and some other arg pass failure by hxwang
- [chore] minor fix after rebase by hxwang
- [chore] minor fix by hxwang
- [chore] arg pass & remove drop token by hxwang
- [chore] trivial fix by botbw
- [chore] manually revert unintended commit by botbw
- [chore] handle non member group by hxwang
Moe
- [moe] solve dp axis issue by botbw
- [moe] remove force_overlap_comm flag and add warning instead by hxwang
- Revert "[moe] implement submesh initialization" by hxwang
- [moe] refactor mesh assignment by hxwang
- [moe] deepseek moe sp support by haze188
- [moe] remove ops by hxwang
- [moe] full test for deepseek and mixtral (pp + sp to fix) by hxwang
- [moe] finalize test (no pp) by hxwang
- [moe] init moe plugin comm setting with sp by hxwang
- [moe] clean legacy code by hxwang
- [moe] test deepseek by hxwang
- [moe] implement tp by botbw
- [moe] add mixtral dp grad scaling when not all experts are activated by botbw...
Version v0.4.2 Release Today!
What's Changed
Release
- [release] update version (#5952) by Hongxin Liu
Zero
- [zero] hotfix update master params (#5951) by Hongxin Liu
Shardformer
- [shardformer] hotfix attn mask (#5947) by Hongxin Liu
- [shardformer] hotfix attn mask (#5945) by Hongxin Liu
Feature
- [Feature] Add a switch to control whether the model checkpoint needs to be saved after each epoch ends (#5941) by zhurunhua
Fix bug
- [FIX BUG] convert env param to int in (#5934) by Gao, Ruiyuan
- [FIX BUG] UnboundLocalError: cannot access local variable 'default_conversation' where it is not associated with a value (#5931) by zhurunhua
Plugin
- [plugin] support all-gather overlap for hybrid parallel (#5919) by Hongxin Liu
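The entry above adds the option to overlap the ZeRO parameter all-gather with computation when the hybrid parallel plugin runs with ZeRO. A hedged sketch follows, assuming the switch is a plugin keyword along the lines of `overlap_allgather`; the exact name is an assumption based on this entry.

```python
# Hedged sketch: overlapping the ZeRO parameter all-gather with compute.
# `overlap_allgather` is an assumed keyword name inferred from the entry above;
# check the plugin signature in your release.
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin

colossalai.launch_from_torch()

plugin = HybridParallelPlugin(
    tp_size=1,
    pp_size=1,
    zero_stage=1,            # the overlap applies to the ZeRO all-gather path
    precision="bf16",
    overlap_allgather=True,  # assumed flag
)
booster = Booster(plugin=plugin)
```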
Full Changelog: v0.4.2...v0.4.1
Version v0.4.1 Release Today!
What's Changed
Release
- [release] update version (#5912) by Hongxin Liu
Misc
- [misc] support torch2.3 (#5893) by Hongxin Liu
Compatibility
- [compatibility] support torch 2.2 (#5875) by Guangyao Zhang
Chat
- Merge pull request #5901 from hpcaitech/colossalchat by YeAnbang
- Merge pull request #5850 from hpcaitech/rlhf_SimPO by YeAnbang
Shardformer
- [ShardFormer] fix qwen2 sp (#5903) by Guangyao Zhang
- [ShardFormer] Add Ulysses Sequence Parallelism support for Command-R, Qwen2 and ChatGLM (#5897) by Guangyao Zhang
- [shardformer] DeepseekMoE support (#5871) by Haze188
- [shardformer] fix the moe (#5883) by Wang Binluo
- [Shardformer] change qwen2 modeling into gradient checkpointing style (#5874) by Jianghai
- [shardformer]delete xformers (#5859) by flybird11111
Auto parallel
- [Auto Parallel]: Speed up intra-op plan generation by 44% (#5446) by Stephan Kö
Zero
- [zero] support all-gather overlap (#5898) by Hongxin Liu
Pre-commit.ci
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] pre-commit autoupdate (#5878) by pre-commit-ci[bot]
- [pre-commit.ci] pre-commit autoupdate (#5572) by pre-commit-ci[bot]
Hotfix
- [HotFix] CI,import,requirements-test for #5838 (#5892) by Runyu Lu
- [Hotfix] Fix OPT gradient checkpointing forward by Edenzzzz
- [hotfix] fix the bug that large tensor exceed the maximum capacity of TensorBucket (#5879) by Haze188
- [Hotfix] Fix CUDA_DEVICE_MAX_CONNECTIONS for comm overlap by Edenzzzz
Quant
- [quant] fix bitsandbytes version check (#5882) by Hongxin Liu
Doc
- [doc] Update llama + sp compatibility; fix dist optim table by Edenzzzz
Full Changelog: v0.4.1...v0.4.0
Version v0.4.0 Release Today!
What's Changed
Release
- [release] update version (#5864) by Hongxin Liu
Shardformer
- [shardformer] Support the T5ForTokenClassification model (#5816) by Guangyao Zhang
Zero
- [zero] use bucket during allgather (#5860) by Hongxin Liu
Doc
- [doc] add GPU cloud playground (#5851) by binmakeswell
- [doc] fix open sora model weight link (#5848) by binmakeswell
- [doc] opensora v1.2 news (#5846) by binmakeswell
Full Changelog: v0.4.0...v0.3.9
Version v0.3.9 Release Today!
What's Changed
Release
- [release] update version (#5833) by Hongxin Liu
Fix
- [Fix] Fix spec-dec Glide LlamaModel for compatibility with transformers (#5837) by Yuanheng Zhao
Shardformer
- [shardformer] Change atol in test command-r weight-check to pass pytest (#5835) by Guangyao Zhang
- Merge pull request #5818 from GuangyaoZhang/command-r by Guangyao Zhang
- [shardformer] upgrade transformers to 4.39.3 (#5815) by flybird11111
- [shardformer] fix modeling of bloom and falcon (#5796) by Hongxin Liu
- [shardformer] fix import (#5788) by Hongxin Liu
Devops
- [devops] Remove building on PR when edited to avoid skip issue (#5836) by Guangyao Zhang
- [devops] fix docker ci (#5780) by Hongxin Liu
Misc
- [misc] Add dist optim to doc sidebar (#5806) by Edenzzzz
- [misc] update requirements (#5787) by Hongxin Liu
- [misc] fix dist logger (#5782) by Hongxin Liu
- [misc] Accelerate CI for zero and dist optim (#5758) by Edenzzzz
- [misc] update dockerfile (#5776) by Hongxin Liu
Pre-commit.ci
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
Gemini
- [gemini] quick fix on possible async operation (#5803) by botbw
- [Gemini] Use async stream to prefetch and h2d data moving (#5781) by Haze188
- [gemini] optimize reduce scatter d2h copy (#5760) by botbw
Inference
- [Inference] Fix flash-attn import and add model test (#5794) by Li Xingjian
- [Inference]refactor baichuan (#5791) by Runyu Lu
- Merge pull request #5771 from char-1ee/refactor/modeling by Li Xingjian
- [Inference]Add Streaming LLM (#5745) by yuehuayingxueluo
Test
- [test] fix qwen2 pytest distLarge (#5797) by Guangyao Zhang
- [test] fix chatglm test kit (#5793) by Hongxin Liu
- [test] Fix/fix testcase (#5770) by duanjunwen
Install
- [install]fix setup (#5786) by flybird11111
Hotfix
- [hotfix] fix testcase in test_fx/test_tracer (#5779) by duanjunwen
- [hotfix] fix llama flash attention forward (#5777) by flybird11111
- [Hotfix] Add missing init file in inference.executor (#5774) by Yuanheng Zhao
Full Changelog: v0.3.9...v0.3.8
Version v0.3.8 Release Today!
What's Changed
Release
- [release] update version (#5752) by Hongxin Liu
Fix/example
- [Fix/Example] Fix Llama Inference Loading Data Type (#5763) by Yuanheng Zhao
Gemini
- Merge pull request #5749 from hpcaitech/prefetch by botbw
- Merge pull request #5754 from Hz188/prefetch by botbw
- [Gemini] add some code for reduce-scatter overlap, chunk prefetch in llama benchmark. (#5751) by Haze188
- [gemini] async grad chunk reduce (all-reduce&reduce-scatter) (#5713) by botbw
- Merge pull request #5733 from Hz188/feature/prefetch by botbw
- Merge pull request #5731 from botbw/prefetch by botbw
- [gemini] init auto policy prefetch by hxwang
- Merge pull request #5722 from botbw/prefetch by botbw
- [gemini] maxprefetch means maximum work to keep by hxwang
- [gemini] use compute_chunk to find next chunk by hxwang
- [gemini] prefetch chunks by hxwang
- [gemini]remove registered gradients hooks (#5696) by flybird11111
Chore
- [chore] refactor profiler utils by hxwang
- [chore] remove unnecessary assert since compute list might not be recorded by hxwang
- [chore] remove unnecessary test & changes by hxwang
- Merge pull request #5738 from botbw/prefetch by Haze188
- [chore] fix init error by hxwang
- [chore] Update placement_policy.py by botbw
- [chore] remove debugging info by hxwang
- [chore] remove print by hxwang
- [chore] refactor & sync by hxwang
- [chore] sync by hxwang
Bug
- [bug] continue fix by hxwang
- [bug] workaround for idx fix by hxwang
- [bug] fix early return (#5740) by botbw
Bugs
- [bugs] fix args.profile=False DummyProfiler error by genghaozhe
Inference
- [inference] Fix running time of test_continuous_batching (#5750) by Yuanheng Zhao
- [Inference]Fix readme and example for API server (#5742) by Jianghai
- [inference] release (#5747) by binmakeswell
- [Inference] Fix Inference Generation Config and Sampling (#5710) by Yuanheng Zhao
- [Inference] Fix API server, test and example (#5712) by Jianghai
- [Inference] Delete duplicated copy_vector (#5716) by 傅剑寒
- [Inference]Adapt repetition_penalty and no_repeat_ngram_size (#5708) by yuehuayingxueluo
- [Inference] Add example test_ci script by CjhHa1
- [Inference] Fix bugs and docs for feat/online-server (#5598) by Jianghai
- [Inference] resolve rebase conflicts by CjhHa1
- [Inference] Finish Online Serving Test, add streaming output api, continuous batching test and example (#5432) by Jianghai
- [Inference] ADD async and sync Api server using FastAPI (#5396) by Jianghai
- [Inference] Support the logic related to ignoring EOS token (#5693) by yuehuayingxueluo
- [Inference]Adapt temperature processing logic (#5689) by yuehuayingxueluo
- [Inference] Remove unnecessary float4_ and rename float8_ to float8 (#5679) by Steve Luo
- [Inference] Fix quant bits order (#5681) by 傅剑寒
- [inference]Add alibi to flash attn function (#5678) by yuehuayingxueluo
- [Inference] Adapt Baichuan2-13B TP (#5659) by yuehuayingxueluo
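The inference entries above track the Colossal-Inference engine merged into main in this cycle (continuous batching, an async/sync FastAPI server, sampling and repetition-penalty handling, Baichuan2-13B TP, and more). A rough usage sketch is below; the `InferenceEngine`/`InferenceConfig` import path, constructor arguments, and the model path are assumptions based on the inference README of that period and should be checked against the current module layout.

```python
# Hedged sketch of the Colossal-Inference engine flow referenced above.
# Import paths, argument names, and the model path are assumptions; consult
# the inference README of your installed release.
import colossalai
from colossalai.inference import InferenceConfig, InferenceEngine
from transformers import AutoModelForCausalLM, AutoTokenizer

colossalai.launch_from_torch()

model_path = "path/to/your/llama-checkpoint"  # placeholder
model = AutoModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

inference_config = InferenceConfig(
    max_batch_size=8,    # continuous-batching upper bound
    max_input_len=256,
    max_output_len=128,
    dtype="fp16",
)
engine = InferenceEngine(model, tokenizer, inference_config, verbose=True)

outputs = engine.generate(prompts=["Introduce some landmarks in Beijing."])
print(outputs)
```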
Feature
- [Feature] auto-cast optimizers to distributed version (#5746) by Edenzzzz
- [Feature] Distributed optimizers: Lamb, Galore, CAME and Adafactor (#5694) by Edenzzzz
- Merge pull request #5588 from hpcaitech/feat/online-serving by Jianghai
- [Feature] qlora support (#5586) by linsj20
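Per the first two feature entries above, common optimizers (Lamb, GaLore, CAME, Adafactor) gain distributed implementations and supported optimizers are cast to their distributed versions automatically when boosted, so the user-facing flow stays the plain Booster pattern. A minimal sketch of that pattern follows; the `Lamb` import path is taken from the project's optimizer module and may have moved, and the auto-cast itself happens inside `booster.boost` according to the entry, with nothing extra required from the caller.

```python
# Minimal sketch: the usual Booster flow. Per the entries above, a supported
# optimizer is swapped for its distributed variant inside booster.boost when
# the plugin's parallelism requires it; user code does not change.
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin
from colossalai.nn.optimizer import Lamb  # import path may differ by version
from transformers import LlamaConfig, LlamaForCausalLM

colossalai.launch_from_torch()

plugin = HybridParallelPlugin(tp_size=2, pp_size=1, precision="bf16")
booster = Booster(plugin=plugin)

model = LlamaForCausalLM(
    LlamaConfig(num_hidden_layers=2, hidden_size=256, intermediate_size=512,
                num_attention_heads=8, num_key_value_heads=8)
)
optimizer = Lamb(model.parameters(), lr=1e-3)
model, optimizer, *_ = booster.boost(model, optimizer)  # optimizer may come back as a distributed variant
```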
Example
- [example] add profile util for llama by hxwang
- [example] Update Inference Example (#5725) by Yuanheng Zhao
Colossal-inference
- [Colossal-Inference] (v0.1.0) Merge pull request #5739 from hpcaitech/feature/colossal-infer by Yuanheng Zhao
Nfc
- [NFC] fix requirements (#5744) by Yuanheng Zhao
- [NFC] Fix code factors on inference triton kernels (#5743) by Yuanheng Zhao
Ci
- [ci] Temporary fix for build on pr (#5741) by Yuanheng Zhao
- [ci] Fix example tests (#5714) by Yuanheng Zhao
Sync
- Merge pull request #5737 from yuanheng-zhao/inference/sync/main by Yuanheng Zhao
- [sync] Sync feature/colossal-infer with main by Yuanheng Zhao
- [Sync] Update from main to feature/colossal-infer (Merge pull request #5685) by Yuanheng Zhao
- [sync] resolve conflicts of merging main by Yuanheng Zhao
Shardformer
- [Shardformer] Add parallel output for shardformer models(bloom, falcon) (#5702) by Haze188
- [Shardformer]fix the num_heads assert for llama model and qwen model (#5704) by Wang Binluo
- [Shardformer] Support the Qwen2 model (#5699) by Wang Binluo
- Merge pull request #5684 from wangbluo/parallel_output by Wang Binluo
- [Shardformer] add assert for num of attention heads divisible by tp_size (#5670) by Wang Binluo
- [shardformer] support bias_gelu_jit_fused for models (#5647) by flybird11111
Pre-commit.ci
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
Doc
- [doc] Update Inference Readme (#5736) by Yuanheng Zhao
Fix/inference
- [Fix/Inference] Add unsupported auto-policy error message (#5730) by Yuanheng Zhao
Lazy
- [lazy] fix lazy cls init (#5720) by flybird11111
Misc
- [misc] Update PyTorch version in docs (#5724) by binmakeswell
- [misc] Update PyTorch version in docs (#5711) by Edenzzzz
- [misc] Add an existing issue checkbox in bug report (#5691) by Edenzzzz
- [misc] refactor launch API and tensor constructor (#5666) by Hongxin Liu
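The last Misc entry refactors the launch API: since that change, `colossalai.launch_from_torch` simply reads the distributed environment set up by `torchrun` (older examples passed a `config={}` argument). A small sketch of the post-refactor launch plus the standard boost loop; the model and data here are placeholders.

```python
# Sketch of the post-refactor launch flow (per the entry above, launch_from_torch
# no longer takes the old config dict). Model and data are placeholders; run with
# e.g. `torchrun --nproc_per_node=2 train.py`.
import torch
import torch.nn as nn

import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchDDPPlugin

colossalai.launch_from_torch()  # reads RANK/WORLD_SIZE/etc. set by torchrun
booster = Booster(plugin=TorchDDPPlugin())

model = nn.Linear(128, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()
model, optimizer, criterion, *_ = booster.boost(model, optimizer, criterion)

x = torch.randn(32, 128, device="cuda")
y = torch.randn(32, 1, device="cuda")
loss = criterion(model(x), y)
booster.backward(loss, optimizer)
optimizer.step()
```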
Fix
- [Fix] Llama3 Load/Omit CheckpointIO Temporarily (#5717) by Runyu Lu
- [Fix] Fix Inference Example, Tests, and Requirements (#5688) by Yuanheng Zhao
- [Fix] Fix & Update Inference Tests (compatibility w/ main) by Yuanheng Zhao
Hotfix
- [hotfix] fix inference typo (#5438) by hugo-syn
- [hotfix] fix OpenMOE example import path (#5697) by Yuanheng Zhao
- [hotfix] Fix KV He...
Version v0.3.7 Release Today!
What's Changed
Release
- [release] update version (#5654) by Hongxin Liu
- [release] grok-1 inference benchmark (#5500) by binmakeswell
- [release] grok-1 314b inference (#5490) by binmakeswell
Hotfix
- [hotfix] add soft link to support required files (#5661) by Tong Li
- [hotfix] Fixed fused layernorm bug without apex (#5609) by Edenzzzz
- [hotfix] Fix examples no pad token & auto parallel codegen bug; (#5606) by Edenzzzz
- [hotfix] fix typo s/get_defualt_parser /get_default_parser (#5548) by digger yu
- [hotfix] quick fixes to make legacy tutorials runnable (#5559) by Edenzzzz
- [hotfix] set return_outputs=False in examples and polish code (#5404) by Wenhao Chen
- [hotfix] fix typo s/keywrods/keywords etc. (#5429) by digger yu
News
- [news] llama3 and open-sora v1.1 (#5655) by binmakeswell
Lazyinit
- [lazyinit] skip whisper test (#5653) by Hongxin Liu
Shardformer
- [shardformer] refactor pipeline grad ckpt config (#5646) by Hongxin Liu
- [shardformer] fix chatglm implementation (#5644) by Hongxin Liu
- [shardformer] remove useless code (#5645) by flybird11111
- [shardformer] update transformers (#5583) by Wang Binluo
- [shardformer] fix pipeline grad ckpt (#5620) by Hongxin Liu
- [shardformer] refactor embedding resize (#5603) by flybird11111
- [shardformer] Sequence Parallelism Optimization (#5533) by Zhongkai Zhao
- [shardformer] fix pipeline forward error if custom layer distribution is used (#5189) by Insu Jang
- [shardformer] update colo attention to support custom mask (#5510) by Hongxin Liu
- [shardformer]Fix lm parallel. (#5480) by flybird11111
- [shardformer] fix gathering output when using tensor parallelism (#5431) by flybird11111
Fix
- [Fix]: implement thread-safety singleton to avoid deadlock for very large-scale training scenarios (#5625) by Season
- [fix] fix typo s/muiti-node /multi-node etc. (#5448) by digger yu
- [Fix] Grok-1 use tokenizer from the same pretrained path (#5532) by Yuanheng Zhao
- [fix] fix grok-1 example typo (#5506) by Yuanheng Zhao
Coloattention
- [coloattention]modify coloattention (#5627) by flybird11111
Example
- [example] llama3 (#5631) by binmakeswell
- [example] update Grok-1 inference (#5495) by Yuanheng Zhao
- [example] add grok-1 inference (#5485) by Hongxin Liu
- [example] update llama example (#5626) by Hongxin Liu
Zero
- [zero] support multiple (partial) backward passes (#5596) by Hongxin Liu
Doc
- [doc] fix ColossalMoE readme (#5599) by Camille Zhong
- [doc] update open-sora demo (#5479) by binmakeswell
- [doc] release Open-Sora 1.0 with model weights (#5468) by binmakeswell
Devops
- [devops] remove post commit ci (#5566) by Hongxin Liu
- [devops] fix example test ci (#5504) by Hongxin Liu
- [devops] fix compatibility (#5444) by Hongxin Liu
Shardformer, pipeline
- [shardformer, pipeline] add gradient_checkpointing_ratio and heterogeneous shard policy for llama (#5508) by Wenhao Chen
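The entry above introduces a gradient_checkpointing_ratio knob so that only part of a pipeline stage's layers are checkpointed, together with a heterogeneous shard policy for Llama. A heavily hedged sketch of how such a config could be passed to the hybrid parallel plugin follows; the `PipelineGradientCheckpointConfig` import path and the `gradient_checkpoint_config` keyword are assumptions based on this entry and the later "[shardformer] refactor pipeline grad ckpt config" commit.

```python
# Hedged sketch: partial gradient checkpointing for pipeline stages.
# The import path and keyword below are assumptions inferred from the entries above;
# check colossalai.shardformer in your installed version for the actual names.
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin
from colossalai.shardformer import PipelineGradientCheckpointConfig  # assumed location

colossalai.launch_from_torch()

ckpt_config = PipelineGradientCheckpointConfig(gradient_checkpointing_ratio=0.5)
plugin = HybridParallelPlugin(
    tp_size=1,
    pp_size=2,
    num_microbatches=4,
    precision="bf16",
    gradient_checkpoint_config=ckpt_config,  # assumed keyword
)
booster = Booster(plugin=plugin)
```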
Format
- [format] applied code formatting on changed files in pull request 5510 (#5517) by github-actions[bot]
Full Changelog: v0.3.7...v0.3.6