Support in-situ conversion of GPTQ model to marlin format (naive GPTQ kernel also supported). #92

guoqingbao · 2024-10-21T10:21:40Z

For a quantized 4-bit GPTQ model:

cargo run --release -- --port 2000 --weight-path /home/mistral_7b-int4/ mistral --quant marlin

This converts the GPTQ model to Marlin format in situ, during model loading.
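GPTQ checkpoints store 4-bit weights packed eight to a 32-bit word (in the AutoGPTQ layout, element 0 occupies the lowest nibble), so a conversion to another kernel layout has to start by unpacking those nibbles. The sketch below assumes that AutoGPTQ packing and shows only the pack/unpack step. It does not reproduce Marlin's tile permutation, and `pack_int4`/`unpack_int4` are hypothetical helpers, not functions from this repository.

```rust
// Hypothetical helpers illustrating AutoGPTQ-style int4 packing; not taken
// from this repository, and not the Marlin tile layout.

/// Packs eight 4-bit values into one u32, element 0 in the lowest nibble.
fn pack_int4(vals: &[u8; 8]) -> u32 {
    vals.iter().enumerate().fold(0u32, |acc, (i, &v)| {
        // Mask to 4 bits so an out-of-range input cannot corrupt neighbours.
        acc | ((u32::from(v) & 0xF) << (4 * i))
    })
}

/// Inverse of `pack_int4`: extracts the eight nibbles, lowest first.
fn unpack_int4(word: u32) -> [u8; 8] {
    let mut out = [0u8; 8];
    for (i, slot) in out.iter_mut().enumerate() {
        *slot = ((word >> (4 * i)) & 0xF) as u8;
    }
    out
}

fn main() {
    let vals: [u8; 8] = [1, 2, 3, 4, 5, 6, 7, 15];
    let word = pack_int4(&vals);
    // Round-tripping must recover the original values.
    assert_eq!(unpack_int4(word), vals);
    println!("packed = {word:#010x}");
}
```

Running it prints `packed = 0xf7654321`: element 7 (value 15) lands in the top nibble. A real converter would then reorder the unpacked values into the target kernel's layout before repacking.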

Please note:

In-situ conversion to Marlin format only supports 4-bit GPTQ models (with sym=True, groupsize=128 or -1, desc_act=False).
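The constraints above amount to a simple eligibility check on the checkpoint's quantization config. The sketch below is an illustration, not the loader in this PR. It assumes a hypothetical `GptqConfig` struct whose fields mirror the common AutoGPTQ keys (`bits`, `group_size`, `sym`, `desc_act`; note that AutoGPTQ spells the group key `group_size`). It also shows what the two allowed group sizes mean: one scale per 128 input rows, or a single per-output-channel scale when the value is -1.

```rust
/// Hypothetical subset of a GPTQ quantization config (illustrative only).
#[derive(Debug, Clone, Copy)]
struct GptqConfig {
    bits: u32,
    group_size: i32,
    sym: bool,
    desc_act: bool,
}

/// Ok(()) if the config meets the stated in-situ Marlin constraints,
/// otherwise an error naming the first violated constraint.
fn check_marlin_compatible(cfg: &GptqConfig) -> Result<(), String> {
    if cfg.bits != 4 {
        return Err(format!("bits={} (only 4-bit is supported)", cfg.bits));
    }
    if !cfg.sym {
        return Err("sym=false (only symmetric quantization is supported)".into());
    }
    if cfg.group_size != 128 && cfg.group_size != -1 {
        return Err(format!("group_size={} (must be 128 or -1)", cfg.group_size));
    }
    if cfg.desc_act {
        return Err("desc_act=true (act-order is not supported)".into());
    }
    Ok(())
}

/// Scale-group index for input row `k`. Assumes the config already passed
/// `check_marlin_compatible`, so `group_size` is 128 or -1 (never zero).
fn scale_group(k: usize, group_size: i32) -> usize {
    if group_size == -1 {
        0 // one group spanning all rows: a per-output-channel scale
    } else {
        k / group_size as usize
    }
}

fn main() {
    let ok = GptqConfig { bits: 4, group_size: 128, sym: true, desc_act: false };
    let act_order = GptqConfig { desc_act: true, ..ok };
    assert!(check_marlin_compatible(&ok).is_ok());
    println!("{:?}", check_marlin_compatible(&act_order));
    println!("row 200 -> scale group {}", scale_group(200, ok.group_size));
}
```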

guoqingbao added 27 commits August 13, 2024 14:04

Support in-situ quantization

899515a

Typo fix

6e791f5

Cargo fmt

504398d

Optimize quantized matmul in batch processing & update Q4K results

a3e1fc4

Merge branch 'master' into develop

7309f55

Fix bug for non-stream response

80f56ae

Ask users to provide huggingface token if no token cached and passed to the program.

bd476d3

No crash when both hidden_act and hidden_activation are set for gemma model

afb50f3

Print the number of decoded tokens for each request

616ffc6

Merge branch 'master' into develop

573a61a

Restore previous bug fix

360a227

Support softcapping (Gemma-2 models)

a33884f

Merge branch 'master' into develop

f3b1a7d

Update lib.rs

761067e

Fix Gemma-2 multiple eos/bos ids

ff84499

Custom benchmark with parameters

2c81291

Mention arguments for benchmark.py

221eace

Tweak

08f9491

Support GPTQ/Marlin format quantization (4bit weight, f16 input)

e23d8ae

Merge branch 'master' into develop

d4239ef

Support bf16 inputs for GPTQ/Marlin format quantization

d40c2b0

Merge branch 'develop' into develop

2d5b452

Merge remote-tracking branch 'eric/master' into develop

b7c6cd1

Merge remote-tracking branch master into develop

4b2fe7d

Add an example for marlin format conversion & update results

6078814

Typo fix

08b5e8a

Support in-situ conversion of GPTQ model to marlin format (naive GPTQ kernel also supported).

30a5040

guoqingbao merged commit 3ebf3b8 into master Oct 21, 2024
6 checks passed
