Releases: EricLBuehler/mistral.rs
v0.1.24
Patch release; please update.
What's Changed
- Bump version to 0.1.24 by @EricLBuehler in #497
Full Changelog: v0.1.23...v0.1.24
v0.1.23
What's Changed
- Improve and update docs by @EricLBuehler in #477
- Progress bar and logging when loading repeating layers by @EricLBuehler in #479
- Update deps by @EricLBuehler in #483
- Optimize decoding by removing redundant qkv transpose by @EricLBuehler in #487
- Fixes and tweak docs, logging for local loading by @EricLBuehler in #489
- Add the Gemma 2 model by @EricLBuehler in #490
- Update demo video by @EricLBuehler in #491
- Utilize new quantize_onto qtensor api by @EricLBuehler in #492
- Update deps by @EricLBuehler in #493
- Bump version to 0.1.23 by @EricLBuehler in #495
Full Changelog: v0.1.22...v0.1.23
v0.1.22
What's Changed
- Remove erroneously flaky CI test by @EricLBuehler in #466
- NVCC flags support for mistralrs_core build by @EricLBuehler in #469
- Prevent divide by zero in cuda kernel by @joshpopelka20 in #471
- Better cuda build.rs linking of stdc++ by @EricLBuehler in #472
- Remove some unnecessary &muts by @EricLBuehler in #473
- Fix arg order for pdoc by @EricLBuehler in #474
- Bump version to 0.1.22 by @EricLBuehler in #475
Full Changelog: v0.1.21...v0.1.22
v0.1.21
What's Changed
- Expose idefics2 loader by @EricLBuehler in #450
- Try auto dtypes based on compute cap by @EricLBuehler in #453
- Fix dtype error for logit bias by @EricLBuehler in #454
- Fix sequence prompt len for Phi3-V by @EricLBuehler in #455
- Tune threshold for matmul via f16 by @EricLBuehler in #457
- Improve short/long scaling precision for LongRope by @EricLBuehler in #458
- Fix LongRope models position ids calculation by @EricLBuehler in #459
- Update deps by @EricLBuehler in #460
- Improve handling of errors in auto dtype selection by @EricLBuehler in #461
- Add support for cross-gpu device mapping by @EricLBuehler in #462
- Bump version to 0.1.21 by @EricLBuehler in #463
Full Changelog: v0.1.20...v0.1.21
v0.1.20
What's Changed
- Fix with docker images by fixing use of pyo3 by @EricLBuehler in #440
- Update readme with docker info by @EricLBuehler in #441
- Add Cargo.lock file by @EricLBuehler in #442
- Fix causal masks dtype by @EricLBuehler in #443
- Add support for Idefics 2 by @EricLBuehler in #309
- Bump to version 0.1.20 by @EricLBuehler in #449
Full Changelog: v0.1.19...v0.1.20
v0.1.19
What's Changed
- Format readme by @EricLBuehler in #427
- Remove multiple tracing initializations and init outside of mistralrs-core by @EricLBuehler in #428
- Run clippy by @EricLBuehler in #429
- adding reboot functionality by @gregszumel in #378
- Lower memory spike when loading with ISQ on CUDA by @EricLBuehler in #433
- Fix failing docs workflow by @EricLBuehler in #435
- Remove unused line in dockerignore by @EricLBuehler in #436
- Improve Auto dtype determination by @EricLBuehler in #438
- Bump version to 0.1.19 by @EricLBuehler in #439
Full Changelog: v0.1.18...v0.1.19
v0.1.18
What's Changed
- Switch to minijinja's pycompat mode by @mitsuhiko in #421
- chore: update speculative.rs by @eltociear in #423
- Bump to new commit of candle with cudarc 0.11.5 by @EricLBuehler in #424
- Use rev key instead of commit to get rid of warning by @EricLBuehler in #425
- Add nonzero and bitwise operators by @chenwanqq in #422
- Fix Python deps, base64 impl, add examples by @EricLBuehler in #426
New Contributors
- @mitsuhiko made their first contribution in #421
- @chenwanqq made their first contribution in #422
Full Changelog: v0.1.17...v0.1.18
v0.1.17
What's Changed
- Add and update template READMEs by @EricLBuehler in #405
- Improve Rust crates docs by @EricLBuehler in #406
- Expose phi3v loader and remove unused deps by @EricLBuehler in #408
- Support GGUF Mixtral format where experts are in one tensor by @EricLBuehler in #355
- Refactor with normal loading metadata for vision models by @EricLBuehler in #409
- Phi 3 vision ISQ support by @EricLBuehler in #410
- Remove causal masks cache by @EricLBuehler in #412
- Fix: use new slice_assign by @EricLBuehler in #415
- Fix Phi-3 GGUF by @EricLBuehler in #414
- Implement gpt2 (BPE) GGUF tokenizer conversion by @EricLBuehler in #397
- Support chat template from GGUF by @EricLBuehler in #416
- Expose API to specify dtype during loading by @EricLBuehler in #417
- Lock candle version to commit by @EricLBuehler in #419
- Bump version to 0.1.17 by @EricLBuehler in #420
Full Changelog: v0.1.16...v0.1.17
v0.1.16
Summary
- Various fixes
- Excellent work on refactoring by @polarathene
- First vision model: Phi 3 vision
What's Changed
- Implement the Phi 3 vision model by @EricLBuehler in #351
- Bump version again to 0.1.15 by @EricLBuehler in #390
- Add docs for installing huggingface-cli by @EricLBuehler in #391
- Fix metal loading issue by loading sequentially by @EricLBuehler in #394
- Fix logging in gguf and ggml by @EricLBuehler in #399
- Add fused bias linear layer with cublaslt by @EricLBuehler in #400
- docs: Resolve CI lints on docs by @polarathene in #401
- Refactor: GGUF metadata tokenizer by @polarathene in #389
- Add Nonzero layer by @EricLBuehler in #402
- Bump version to 0.1.16 by @EricLBuehler in #404
Full Changelog: v0.1.15...v0.1.16
v0.1.15
What's Changed
- Patch incorrect unwrap and bump version by @EricLBuehler in #383
Full Changelog: v0.1.14...v0.1.15