v0.3.0
What's Changed
- Fix sharegpt type in doc by @NanoCode012 in #202
- add support for optimum bettertransformers by @winglian in #92
- Use AutoTokenizer for redpajama example by @sroecker in #209
- issue #205 bugfix by @MaciejKarasek in #206
- Fix tokenizing labels by @winglian in #214
- add float16 docs and tweak typehints by @winglian in #212
- support adamw and grad norm hyperparams by @winglian in #215
- Fixing Data Readme by @msinha251 in #235
- don't fail fast by @winglian in #218
- better py3 support w pre-commit by @winglian in #241
- optionally define whether to use_fast tokenizer by @winglian in #240
- skip the system prompt by @winglian in #243
- push intermediate model checkpoints to hub by @winglian in #244
- System prompt data by @winglian in #224
- Add cfg.push_to_hub_model_id to readme by @NanoCode012 in #252
- Fix typing list in prompt tokenizer by @NanoCode012 in #249
- add option for instruct w sys prompts by @winglian in #246
- open orca support by @winglian in #255
- update pip install command for apex by @winglian in #247
- Fix future deprecation push_to_hub_model_id by @NanoCode012 in #258
- [WIP] Support loading data files from a local directory by @utensil in #221
- Fix(readme): local path loading and custom strategy type by @NanoCode012 in #264
- don't use llama if trust_remote_code is set since that needs to use AutoModel path by @winglian in #266
- params are adam_, not adamw_ by @winglian in #268
- Quadratic warmup by @winglian in #271
- support for loading a model by git revision by @winglian in #272
- Feat(docs): Add model_revision arg by @NanoCode012 in #273
- Feat: Add save_safetensors by @NanoCode012 in #275
- Feat: Set push to hub as private by default by @NanoCode012 in #274
- Allow non-default dataset configurations by @cg123 in #277
- Feat(readme): improve docs on multi-gpu by @NanoCode012 in #279
- Update requirements.txt by @teknium1 in #280
- Logging update: added PID and formatting by @theobjectivedad in #276
- git fetch fix for docker by @winglian in #283
- misc fixes by @winglian in #286
- fix axolotl training args dataclass annotation by @winglian in #287
- fix(readme): remove accelerate config by @NanoCode012 in #288
- add hf_transfer to requirements for faster hf upload by @winglian in #289
- Fix(tokenizing): Use multi-core by @NanoCode012 in #293
- Pytorch 2.0.1 by @winglian in #300
- Fix(readme): Improve wording for push model by @NanoCode012 in #304
- add apache 2.0 license by @winglian in #308
- Flash attention 2 by @winglian in #299
- don't resize embeddings to multiples of 32x by default by @winglian in #313
- Add XGen info to README and example config by @ethanhs in #306
- better handling since xgen tokenizer breaks with convert_tokens_to_ids by @winglian in #307
- add runpod envs to .bashrc, fix bnb env by @winglian in #316
- update prompts for open orca to match the paper by @winglian in #317
- latest HEAD of accelerate causes 0 loss immediately w FSDP by @winglian in #321
- Prune cuda117 by @winglian in #327
- update README for updated docker images by @winglian in #328
- fix FSDP save of final model by @winglian in #329
- pin accelerate so it works with llama2 by @winglian in #330
- add peft install back since it doesn't get installed by setup.py by @winglian in #331
- lora/qlora w flash attention fixes by @winglian in #333
- feat/llama-2 examples by @mhenrichsen in #319
- update README by @tmm1 in #337
- Fix flash-attn + qlora not working with llama models by @tmm1 in #336
- optimize the iteration when tokenizing large datasets by @winglian in #332
- Added Orca Mini prompt strategy by @jphme in #263
- Update XFormers Attention Monkeypatch to handle Llama-2 70B (GQA) by @ssmi153 in #339
- add a basic ds zero3 config by @winglian in #347
- experimental llama 2 chat support by @jphme in #296
- ensure enable_input_require_grads is called on model before getting the peft model by @winglian in #345
- set `group_by_length` to false in all examples by @tmm1 in #350
- GPU memory usage logging by @tmm1 in #354
- simplify `load_model` signature by @tmm1 in #356
- Clarify pre-tokenize before multigpu by @NanoCode012 in #359
- Update README.md on pretraining_dataset by @NanoCode012 in #360
- bump to latest bitsandbytes release with major bug fixes by @tmm1 in #355
- feat(merge): save tokenizer on merge by @NanoCode012 in #362
- Feat: Add rope scaling by @NanoCode012 in #343
- Fix(message): Improve error message for bad format by @NanoCode012 in #365
- fix(model loading): warn when model revision is passed to gptq by @NanoCode012 in #364
- Add wandb_entity to wandb options, update example configs, update README by @morganmcg1 in #361
- fix(save): save as safetensors by @NanoCode012 in #363
- Attention mask and position id fixes for packing by @winglian in #285
- attempt to run non-base docker builds on regular cpu hosts by @winglian in #369
- revert previous change and build ax images w docker on gpu by @winglian in #371
- extract module for working with cfg by @tmm1 in #372
- quiet noise from llama tokenizer by setting pad token earlier by @tmm1 in #374
- improve GPU logging to break out pytorch cache and system mem by @tmm1 in #376
- simplify `load_tokenizer` by @tmm1 in #375
- fix check for flash attn branching by @winglian in #377
- fix for models loading on cpu when not using accelerate launch by @tmm1 in #373
- save tokenizer before training starts by @winglian in #380
- Feat(doc): Improve sharegpt doc by @NanoCode012 in #378
- Fix crash when running without CUDA by @cg123 in #384
- bump flash-attn to 2.0.4 for the base docker image by @winglian in #382
- don't pass rope_scaling kwarg if it's None by @winglian in #383
- new llama-2 default settings by @mhenrichsen in #370
- Error msg for sharegpt if conv has less than 2 msg by @flotos in #379
- Feat(config): Add hub_strategy by @NanoCode012 in #386
- Added "epoch" evaluation_strategy by @flotos in #388
- Feat(config): add max steps by @ittailup in #387
- Feat(doc): Add max_steps to readme by @NanoCode012 in #389
- don't use mask expansion for inference by @winglian in #392
- use context manager to run things on rank0 before others by @winglian in #397
- Feat(doc): Add how to save by epochs by @NanoCode012 in #396
- add `utils.data.prepare_dataset` by @tmm1 in #398
- better handling of empty input ids when tokenizing by @winglian in #395
- fix eval steps and strategy by @winglian in #403
- add templates, CoC and contributing guide by @lightningRalf in #126
- Ax art by @winglian in #405
- Fix(template): Remove iPhone/android from Issue template by @NanoCode012 in #407
- update docs for tokenizer_legacy by @winglian in #401
- Fix(docs): Update flash attn requirements by @NanoCode012 in #409
- Fix(config): Update handling of deepspeed config by @NanoCode012 in #404
- Feat(doc): Add lr_quadratic_warmup to readme by @NanoCode012 in #412
- update path to align with fsdp example by @mhenrichsen in #413
- tag with latest as well for axolotl-runpod by @winglian in #418
- hopefully improve the README by @winglian in #419
- use inputs for image rather than outputs for docker metadata by @winglian in #420
- Fix(template): Inform to place stack trace to Issue by @NanoCode012 in #417
- just resort to tags and use main-latest by @winglian in #424
- Fix(docs): Remove gptq+lora and fix xformer compat list by @NanoCode012 in #423
- fix orca prompts by @winglian in #422
- fix fixture for new tokenizer handling in transformers by @winglian in #428
- remove extra accelerate in requirements by @winglian in #430
- adds color by @mhenrichsen in #425
- standardize attn hijack patches by @tmm1 in #381
- flash attn pip install by @mhenrichsen in #426
- set env for FSDP offload params by @winglian in #433
- use save_strategy from config if available by @winglian in #434
- fix comma, not a tuple by @winglian in #436
- disable eval using multipack for now by @winglian in #437
- docs(readme): add `cd axolotl` by @philpax in #440
- support user defined prompters, pretokenized datasets in config, local parquet, local arrow files by @winglian in #348
- gracefully handle empty input by @winglian in #442
- fix evals by @winglian in #447
- feat(doc): add pillow to lambda instructions by @NanoCode012 in #445
- feat(docs): improve user customized prompts by @NanoCode012 in #443
- add missing positional arg by @winglian in #450
- is_causal fix for evals? by @winglian in #451
- set env var for FSDP layer to wrap by @winglian in #453
- always drop samples that are too long by @winglian in #452
- recast loralayer, norm, lmhead + embed token weights per original qlora by @winglian in #393
- feat: add Metharme prompt strategy by @TearGosling in #446
- fix test fixture b/c hf trainer tokenization changed by @winglian in #464
- workaround so training doesn't hang when packed dataloader batches aren't even by @winglian in #461
- Fix(doc): Clarify config by @NanoCode012 in #466
- ReLoRA implementation (with quantization) by @cg123 in #322
- improve llama pad token handling by @winglian in #475
- Fix(tokenizer): Fix condition to add pad token by @NanoCode012 in #477
- fix types w lora by @winglian in #478
- allow newer deps in requirements.txt by @tmm1 in #484
- fix checkpoints on multigpu by @winglian in #481
- Fix missing 'packaging' wheel by @maximegmd in #482
- fix: inference did not move the model to the correct device by @maximegmd in #483
- let transformers handle adamw_bnb_8bit by @tmm1 in #486
- Add example Llama 2 ReLoRA config by @cg123 in #471
- Feat(doc): Update eval_steps doc by @NanoCode012 in #487
- zero2 config by @mhenrichsen in #476
- Feat(cfg): Add code-llama configs for all sizes by @mhenrichsen in #479
- fix: finetune model inference needs the dtype fix to work with flash-attn by @maximegmd in #485
- Fix(tokenizer): Make sure to add pad for CodeLlamaTokenizer by @NanoCode012 in #489
- fsdp requires params be the same type too by @winglian in #493
- simplify linear layer locator by @tmm1 in #495
- `pad_to_sequence_len`, for reduced VRAM peak usage due to memory fragmentation by @Birch-san in #498
- Refactor train cfg cli by @winglian in #499
- tweak: use default config file when only one file is present by @maximegmd in #501
- Fix(doc): Clarify no amp to full yaml docs by @NanoCode012 in #496
- remove --force-reinstall from Dockerfile to ensure correct pytorch version by @tmm1 in #492
- support for datasets with multiple names by @winglian in #480
- customizable ascii art by @winglian in #506
- add eval benchmark callback by @winglian in #441
- set zero3 optimizer betas to auto so they inherit from HF trainer config by @tmm1 in #507
- drop empty tokenized rows too by @winglian in #509
- Changed bench eval to report metrics correctly by split; added total accuracy and renamed bench_accuracy to bench_average_accuracy by @alpayariyak in #512
- split train from other cli options by @winglian in #503
- Added advanced DDP args by @jphme in #515
- Debug tokenization output: Add ability to output text only (no tokens), and/or specify num samples to see by @TheBloke in #511
- log supervised token count by @winglian in #448
- Fix(doc): Inform Windows users to use WSL/docker by @NanoCode012 in #518
- fix: bad dtype for full finetune by @maximegmd in #504
- No gather single gpu by @winglian in #523
- move is_llama_derived_model into normalize_config by @tmm1 in #524
- use flash_attn xentropy when available by @tmm1 in #525
- use flash_attn rmsnorm when available by @tmm1 in #526
- Allow for custom system prompts with ShareGPT by @bdashore3 in #520
- Add support for GPTQ using native transformers/peft by @winglian in #468
- misc fixes/improvements by @winglian in #513
- log rank by @winglian in #527
- recommend padding when using sample packing by @winglian in #531
- Early stopping metric by @winglian in #537
- Adding NCCL Timeout Guide by @theobjectivedad in #536
- update readme to point to direct link to runpod template, cleanup install instructions by @winglian in #532
- add git environment variables to compose: avoid checkout failure erro… by @SlapDrone in #534
- workaround for md5 variations by @winglian in #533
- Update requirements.txt by @dongxiaolong in #543
- fix for quant config from model by @winglian in #540
- publish to pypi workflow on tagged release by @winglian in #549
- remove with section, doesn't seem to work by @winglian in #551
- pypi on tag push by @winglian in #552
- Ergonomic update to optimizer config documentation by @theobjectivedad in #548
- replace tags, build dist for pypi publish by @winglian in #553
- add long_description for pypi push by @winglian in #555
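
Several of the changes above introduce new config options (e.g. `save_safetensors`, `hub_strategy`, `max_steps`, `pad_to_sequence_len`, `lr_quadratic_warmup`, `wandb_entity`). A minimal sketch of how these might appear together in an axolotl YAML config follows; the option names come from the PR titles above, but the values and exact schema shown are illustrative assumptions, not recommended settings, so consult the README for the authoritative documentation:

```yaml
# Illustrative sketch only: option names are taken from the PR titles
# above; values and types are assumptions, see the README for details.
save_safetensors: true       # save checkpoints as safetensors (#275)
hub_strategy: checkpoint     # how checkpoints are pushed to the hub (#386)
max_steps: 1000              # cap the total number of training steps (#387)
pad_to_sequence_len: true    # reduce VRAM peaks from fragmentation (#498)
lr_quadratic_warmup: true    # quadratic LR warmup schedule (#271, #412)
wandb_entity: my-team        # hypothetical W&B entity to log under (#361)
```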
New Contributors
- @sroecker made their first contribution in #209
- @MaciejKarasek made their first contribution in #206
- @msinha251 made their first contribution in #235
- @cg123 made their first contribution in #277
- @teknium1 made their first contribution in #280
- @theobjectivedad made their first contribution in #276
- @ethanhs made their first contribution in #306
- @tmm1 made their first contribution in #337
- @ssmi153 made their first contribution in #339
- @morganmcg1 made their first contribution in #361
- @flotos made their first contribution in #379
- @ittailup made their first contribution in #387
- @lightningRalf made their first contribution in #126
- @philpax made their first contribution in #440
- @TearGosling made their first contribution in #446
- @maximegmd made their first contribution in #482
- @Birch-san made their first contribution in #498
- @alpayariyak made their first contribution in #512
- @TheBloke made their first contribution in #511
- @bdashore3 made their first contribution in #520
- @SlapDrone made their first contribution in #534
- @dongxiaolong made their first contribution in #543
Full Changelog: v0.2.1...v0.3.0