Releases: LLukas22/llm-rs-python
GGML quantization update
Huggingface Hub integration in AutoModel
`AutoModel` can now automatically download GGML-converted models as well as normal Transformer models from the Huggingface Hub.
AutoConverter, AutoQuantizer and AutoModel
Added the ability to automatically convert any supported model from the Huggingface Hub via the `AutoConverter`. Models converted this way can easily be quantized or loaded via the `AutoQuantizer` or `AutoModel` without the need to specify the architecture.
Added quantization support
The ability to quantize models is now available for every architecture via `quantize`.
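As a rough illustration of the idea behind GGML-style quantization (a sketch in the spirit of 4-bit block quantization, not the library's actual implementation), each block of float weights is stored as one float scale plus small integers:

```python
# Illustrative sketch of block-wise 4-bit quantization: a block of floats
# becomes one f32 scale plus signed 4-bit integers in [-8, 7]. This mirrors
# the general idea of GGML's quantized formats, not their exact layout.

def quantize_block(values):
    """Quantize one block of floats to (scale, list of ints in [-8, 7])."""
    scale = max(abs(v) for v in values) / 7 or 1.0  # avoid division by zero
    return scale, [max(-8, min(7, round(v / scale))) for v in values]

def dequantize_block(scale, quants):
    """Recover approximate floats from the scale and quantized values."""
    return [q * scale for q in quants]

scale, quants = quantize_block([1.0, -1.0, 0.5, 0.25])
restored = dequantize_block(scale, quants)  # close to, not equal to, the input
```

The lossy round-trip is the price paid for shrinking each weight from 32 bits to roughly 4 bits plus a shared scale per block.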
LoRA & MPT Support
Tokenization & GIL free generation
Added the `tokenize` and `decode` functions to each model to enable access to the internal tokenizer.
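The shape of this interface can be illustrated with a toy tokenizer (a whitespace tokenizer for illustration only; the wrapped models use their own real tokenizers):

```python
# Toy tokenizer illustrating the tokenize/decode interface shape:
# tokenize maps text to token ids, decode maps ids back to text.
class ToyTokenizer:
    def __init__(self, vocab):
        self.id_of = {word: i for i, word in enumerate(vocab)}
        self.word_of = {i: word for i, word in enumerate(vocab)}

    def tokenize(self, text):
        """Return a list of token ids for the given text."""
        return [self.id_of[word] for word in text.split()]

    def decode(self, ids):
        """Map token ids back to a string."""
        return " ".join(self.word_of[i] for i in ids)

tok = ToyTokenizer(["hello", "world"])
ids = tok.tokenize("hello world")
text = tok.decode(ids)
```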
The generation of tokens is now GIL-free, meaning other background threads can run at the same time.
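The practical benefit can be sketched with a background generation thread. Here `fake_generate` is a stand-in for a model's generation loop (the `time.sleep` call releases the GIL, playing the role the Rust inference loop now plays in the library; none of the names below are the library's API):

```python
import queue
import threading
import time

# Stand-in for a model's token generator. In the real library the Rust
# generation loop runs without holding the GIL; here time.sleep simulates
# that, letting other Python threads proceed during "inference".
def fake_generate(n_tokens):
    for i in range(n_tokens):
        time.sleep(0.01)  # "inference" happening outside the GIL
        yield f"tok{i}"

token_queue = queue.Queue()

def generation_worker():
    for token in fake_generate(5):
        token_queue.put(token)
    token_queue.put(None)  # sentinel: generation finished

thread = threading.Thread(target=generation_worker)
thread.start()

# The main thread stays responsive and streams tokens as they arrive.
received = []
while True:
    token = token_queue.get()
    if token is None:
        break
    received.append(token)
thread.join()
```

This producer/consumer pattern is useful for streaming tokens to a UI or web response while generation runs in the background.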
Support Multiple Model Architectures
Since `llama-rs` was renamed to `llm` and now supports multiple model architectures, this wrapper was also expanded to support the new trait system and library structure.
Supported architectures for now:
- Llama
- GPT2
- GPTJ
- GPT-NeoX
- Bloom
The loader was also reworked and now supports the mmap-able `ggjt` format. To support this, the `SessionConfig` was expanded with the `prefer_mmap` field.
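A sketch of what such a session configuration might look like on the Python side (the field names besides `prefer_mmap` are illustrative assumptions, not the library's exact API):

```python
from dataclasses import dataclass

# Sketch of an expanded session configuration. Only prefer_mmap is taken
# from the release notes; the other fields are hypothetical examples of
# typical session options.
@dataclass
class SessionConfig:
    threads: int = 8              # hypothetical: CPU threads for inference
    context_length: int = 2048    # hypothetical: prompt + generation window
    prefer_mmap: bool = True      # memory-map mmap-able ggjt files when possible

config = SessionConfig(prefer_mmap=False)  # force a full read instead of mmap
```

Memory-mapping lets the OS page model weights in lazily, so large ggjt files load faster and can be shared between processes.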
Added SessionConfig options for Model
0.0.2: Added SessionConfig
Basic Functionality
0.0.1: Update Cargo.toml