v0.3.0
What's Changed
- Fix sharegpt type in doc by @NanoCode012 in #202
- add support for optimum bettertransformers by @winglian in #92
- Use AutoTokenizer for redpajama example by @sroecker in #209
- issue #205 bugfix by @MaciejKarasek in #206
- Fix tokenizing labels by @winglian in #214
- add float16 docs and tweak typehints by @winglian in #212
- support adamw and grad norm hyperparams by @winglian in #215
- Fixing Data Readme by @msinha251 in #235
- don't fail fast by @winglian in #218
- better py3 support w pre-commit by @winglian in #241
- optionally define whether to use_fast tokenizer by @winglian in #240
- skip the system prompt by @winglian in #243
- push intermediate model checkpoints to hub by @winglian in #244
- System prompt data by @winglian in #224
- Add cfg.push_to_hub_model_id to readme by @NanoCode012 in #252
- Fix typing list in prompt tokenizer by @NanoCode012 in #249
- add option for instruct w sys prompts by @winglian in #246
- open orca support by @winglian in #255
- update pip install command for apex by @winglian in #247
- Fix future deprecation push_to_hub_model_id by @NanoCode012 in #258
- [WIP] Support loading data files from a local directory by @utensil in #221
- Fix(readme): local path loading and custom strategy type by @NanoCode012 in #264
- don't use llama if trust_remote_code is set since that needs to use AutoModel path by @winglian in #266
- params are adam_, not adamw_ by @winglian in #268
- Quadratic warmup by @winglian in #271
- support for loading a model by git revision by @winglian in #272
- Feat(docs): Add model_revision arg by @NanoCode012 in #273
- Feat: Add save_safetensors by @NanoCode012 in #275
- Feat: Set push to hub as private by default by @NanoCode012 in #274
- Allow non-default dataset configurations by @cg123 in #277
- Feat(readme): improve docs on multi-gpu by @NanoCode012 in #279
- Update requirements.txt by @teknium1 in #280
- Logging update: added PID and formatting by @theobjectivedad in #276
- git fetch fix for docker by @winglian in #283
- misc fixes by @winglian in #286
- fix axolotl training args dataclass annotation by @winglian in #287
- fix(readme): remove accelerate config by @NanoCode012 in #288
- add hf_transfer to requirements for faster hf upload by @winglian in #289
- Fix(tokenizing): Use multi-core by @NanoCode012 in #293
- Pytorch 2.0.1 by @winglian in #300
- Fix(readme): Improve wording for push model by @NanoCode012 in #304
- add apache 2.0 license by @winglian in #308
- Flash attention 2 by @winglian in #299
- don't resize embeddings to multiples of 32x by default by @winglian in #313
- Add XGen info to README and example config by @ethanhs in #306
- better handling since xgen tokenizer breaks with convert_tokens_to_ids by @winglian in #307
- add runpod envs to .bashrc, fix bnb env by @winglian in #316
- update prompts for open orca to match the paper by @winglian in #317
- latest HEAD of accelerate causes 0 loss immediately w FSDP by @winglian in #321
- Prune cuda117 by @winglian in #327
- update README for updated docker images by @winglian in #328
- fix FSDP save of final model by @winglian in #329
- pin accelerate so it works with llama2 by @winglian in #330
- add peft install back since it doesn't get installed by setup.py by @winglian in #331
- lora/qlora w flash attention fixes by @winglian in #333
- feat/llama-2 examples by @mhenrichsen in #319
- update README by @tmm1 in #337
- Fix flash-attn + qlora not working with llama models by @tmm1 in #336
- optimize the iteration when tokenizing large datasets by @winglian in #332
- Added Orca Mini prompt strategy by @jphme in #263
- Update XFormers Attention Monkeypatch to handle Llama-2 70B (GQA) by @ssmi153 in #339
- add a basic ds zero3 config by @winglian in #347
- experimental llama 2 chat support by @jphme in #296
- ensure enable_input_require_grads is called on model before getting the peft model by @winglian in #345
- set `group_by_length` to false in all examples by @tmm1 in #350
- GPU memory usage logging by @tmm1 in #354
- simplify `load_model` signature by @tmm1 in #356
- Clarify pre-tokenize before multigpu by @NanoCode012 in #359
- Update README.md on pretraining_dataset by @NanoCode012 in #360
- bump to latest bitsandbytes release with major bug fixes by @tmm1 in #355
- feat(merge): save tokenizer on merge by @NanoCode012 in #362
- Feat: Add rope scaling by @NanoCode012 in #343
- Fix(message): Improve error message for bad format by @NanoCode012 in #365
- fix(model loading): warn when model revision is passed to gptq by @NanoCode012 in #364
- Add wandb_entity to wandb options, update example configs, update README by @morganmcg1 in #361
- fix(save): save as safetensors by @NanoCode012 in #363
- Attention mask and position id fixes for packing by @winglian in #285
- attempt to run non-base docker builds on regular cpu hosts by @winglian in #369
- revert previous change and build ax images w docker on gpu by @winglian in #371
- extract module for working with cfg by @tmm1 in #372
- quiet noise from llama tokenizer by setting pad token earlier by @tmm1 in #374
- improve GPU logging to break out pytorch cache and system mem by @tmm1 in #376
- simplify `load_tokenizer` by @tmm1 in #375
- fix check for flash attn branching by @winglian in #377
- fix for models loading on cpu when not using accelerate launch by @tmm1 in #373
- save tokenizer before training starts by @winglian in #380
- Feat(doc): Improve sharegpt doc by @NanoCode012 in #378
- Fix crash when running without CUDA by @cg123 in #384
- bump flash-attn to 2.0.4 for the base docker image by @winglian in #382
- don't pass rope_scaling kwarg if it's None by @winglian in #383
- new llama-2 default settings by @mhenrichsen in #370
- Error msg for sharegpt if conv has less than 2 msg by @flotos in #379
- Feat(config): Add hub_strategy by @NanoCode012 in #386
- Added "epoch" evaluation_strategy by @flotos in #388
- Feat(config): add max steps by @ittailup in #387
- Feat(doc): Add max_steps to readme by @NanoCode012 in #389
- don't use mask expansion for inference by @winglian in #392
- use context manager to run things on rank0 before others by @winglian in #397
- Feat(doc): Add how to save by epochs by @NanoCode012 in #396
- add `utils.data.prepare_dataset` by @tmm1 in #398
- better handling of empty input ids when tokenizing by @winglian in #395
- fix eval steps and strategy by @winglian in #403
- add templates, CoC and contributing guide by @lightningRalf in #126
- Ax art by @winglian in #405
- Fix(template): Remove iPhone/android from Issue template by @NanoCode012 in #407
- update docs for tokenizer_legacy by @winglian in #401
- Fix(docs): Update flash attn requirements by @NanoCode012 in #409
- Fix(config): Update handling of deepspeed config by @NanoCode012 in #404
- Feat(doc): Add lr_quadratic_warmup to readme by @NanoCode012 in #412
- update path to align with fsdp example by @mhenrichsen in #413
- tag with latest as well for axolotl-runpod by @winglian in #418
- hopefully improve the README by @winglian in #419
- use inputs for image rather than outputs for docker metadata by @winglian in #420
- Fix(template): Inform to place stack trace to Issue by @NanoCode012 in #417
- just resort to tags and use main-latest by @winglian in #424
- Fix(docs): Remove gptq+lora and fix xformer compat list by @NanoCode012 in #423
- fix orca prompts by @winglian in #422
- fix fixture for new tokenizer handling in transformers by @winglian in #428
- remove extra accelerate in requirements by @winglian in #430
- adds color by @mhenrichsen in #425
- standardize attn hijack patches by @tmm1 in #381
- flash attn pip install by @mhenrichsen in #426
- set env for FSDP offload params by @winglian in #433
- use save_strategy from config if available by @winglian in #434
- fix comma, not a tuple by @winglian in #436
- disable eval using multipack for now by @winglian in #437
- docs(readme): add `cd axolotl` by @philpax in #440
- support user defined prompters, pretokenized datasets in config, local parquet, local arrow files by @winglian in #348
- gracefully handle empty input by @winglian in #442
- fix evals by @winglian in #447
- feat(doc): add pillow to lambda instructions by @NanoCode012 in #445
- feat(docs): improve user customized prompts by @NanoCode012 in #443
- add missing positional arg by @winglian in #450
- is_causal fix for evals? by @winglian in #451
- set env var for FSDP layer to wrap by @winglian in #453
- always drop samples that are too long by @winglian in #452
- recast loralayer, norm, lmhead + embed token weights per original qlora by @winglian in #393
- feat: add Metharme prompt strategy by @TearGosling in #446
- fix test fixture b/c hf trainer tokenization changed by @winglian in #464
- workaround so training doesn't hang when packed dataloader batches aren't even by @winglian in #461
- Fix(doc): Clarify config by @NanoCode012 in #466
- ReLoRA implementation (with quantization) by @cg123 in #322
- improve llama pad token handling by @winglian in #475
- Fix(tokenizer): Fix condition to add pad token by @NanoCode012 in #477
- fix types w lora by @winglian in #478
- allow newer deps in requirements.txt by @tmm1 in #484
- fix checkpoints on multigpu by @winglian in #481
- Fix missing 'packaging' wheel by @maximegmd in #482
- fix: inference did not move the model to the correct device by @maximegmd in #483
- let transformers handle adamw_bnb_8bit by @tmm1 in #486
- Add example Llama 2 ReLoRA config by @cg123 in #471
- Feat(doc): Update eval_steps doc by @NanoCode012 in #487
- zero2 config by @mhenrichsen in #476
- Feat(cfg): Add code-llama configs for all sizes by @mhenrichsen in #479
- fix: finetune model inference needs the dtype fix to work with flash-attn by @maximegmd in #485
- Fix(tokenizer): Make sure to add pad for CodeLlamaTokenizer by @NanoCode012 in #489
- fsdp requires params be the same type too by @winglian in #493
- simplify linear layer locator by @tmm1 in #495
- `pad_to_sequence_len`, for reduced VRAM peak usage due to memory fragmentation by @Birch-san in #498
- Refactor train cfg cli by @winglian in #499
- tweak: use default config file when only one file is present by @maximegmd in #501
- Fix(doc): Clarify no amp to full yaml docs by @NanoCode012 in #496
- remove --force-reinstall from Dockerfile to ensure correct pytorch version by @tmm1 in #492
- support for datasets with multiple names by @winglian in #480
- customizable ascii art by @winglian in #506
- add eval benchmark callback by @winglian in #441
- set zero3 optimizer betas to auto so they inherit from HF trainer config by @tmm1 in #507
- drop empty tokenized rows too by @winglian in #509
- Changed bench eval to report metrics correctly by split; added total accuracy and renamed bench_accuracy to bench_average_accuracy by @alpayariyak in #512
- split train from other cli options by @winglian in #503
- Added advanced DDP args by @jphme in #515
- Debug tokenization output: Add ability to output text only (no tokens), and/or specify num samples to see by @TheBloke in #511
- log supervised token count by @winglian in #448
- Fix(doc): Inform Windows users to use WSL/docker by @NanoCode012 in #518
- fix: bad dtype for full finetune by @maximegmd in #504
- No gather single gpu by @winglian in #523
- move is_llama_derived_model into normalize_config by @tmm1 in #524
- use flash_attn xentropy when available by @tmm1 in #525
- use flash_attn rmsnorm when available by @tmm1 in #526
- Allow for custom system prompts with ShareGPT by @bdashore3 in #520
- Add support for GPTQ using native transformers/peft by @winglian in #468
- misc fixes/improvements by @winglian in #513
- log rank by @winglian in #527
- recommend padding when using sample packing by @winglian in #531
- Early stopping metric by @winglian in #537
- Adding NCCL Timeout Guide by @theobjectivedad in #536
- update readme to point to direct link to runpod template, cleanup install instructions by @winglian in #532
- add git environment variables to compose: avoid checkout failure erro… by @SlapDrone in #534
- workaround for md5 variations by @winglian in #533
- Update requirements.txt by @dongxiaolong in #543
- fix for quant config from model by @winglian in #540
- publish to pypi workflow on tagged release by @winglian in #549
- remove with section, doesn't seem to work by @winglian in #551
- pypi on tag push by @winglian in #552
- Ergonomic update to optimizer config documentation by @theobjectivedad in #548
- replace tags, build dist for pypi publish by @winglian in #553
- add long_description for pypi push by @winglian in #555
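
Several of the changes above introduce new config options (e.g. `save_safetensors`, `hub_strategy`, `max_steps`, `pad_to_sequence_len`, `lr_quadratic_warmup`, `wandb_entity`). A minimal sketch of how these might appear together in an axolotl YAML config follows; the option names come from the PR titles above, but the values and exact schema shown are illustrative assumptions, not recommended settings, so consult the README for the authoritative documentation:

```yaml
# Illustrative sketch only: option names are taken from the PR titles
# above; values and types are assumptions, see the README for details.
save_safetensors: true       # save checkpoints as safetensors (#275)
hub_strategy: checkpoint     # how checkpoints are pushed to the hub (#386)
max_steps: 1000              # cap the total number of training steps (#387)
pad_to_sequence_len: true    # reduce VRAM peaks from fragmentation (#498)
lr_quadratic_warmup: true    # quadratic LR warmup schedule (#271, #412)
wandb_entity: my-team        # hypothetical W&B entity to log under (#361)
```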
New Contributors
- @sroecker made their first contribution in #209
- @MaciejKarasek made their first contribution in #206
- @msinha251 made their first contribution in #235
- @cg123 made their first contribution in #277
- @teknium1 made their first contribution in #280
- @theobjectivedad made their first contribution in #276
- @ethanhs made their first contribution in #306
- @tmm1 made their first contribution in #337
- @ssmi153 made their first contribution in #339
- @morganmcg1 made their first contribution in #361
- @flotos made their first contribution in #379
- @ittailup made their first contribution in #387
- @lightningRalf made their first contribution in #126
- @philpax made their first contribution in #440
- @TearGosling made their first contribution in #446
- @maximegmd made their first contribution in #482
- @Birch-san made their first contribution in #498
- @alpayariyak made their first contribution in #512
- @TheBloke made their first contribution in #511
- @bdashore3 made their first contribution in #520
- @SlapDrone made their first contribution in #534
- @dongxiaolong made their first contribution in #543
Full Changelog: v0.2.1...v0.3.0