Replies: 2 comments 1 reply
- We already use quantization, although we have not tried much with INT4 (we usually use something larger). Using lookups instead of arithmetic is also worth investigating. My guess is that performance would be worse right now, but that could change with faster lookup arguments.
- Yeah, I was surprised to see ONNX not supporting INT4... I heard some people are working on integrating Lasso into Halo2. If that PR gets merged, it would be worth benchmarking.
- The problem with zkML right now is that it is slow to prove big models. One way to mitigate this is weight quantization: LLMs such as LLaMA can do inference fine with INT4 instead of FP32. https://arxiv.org/abs/2210.17323
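  For context, a minimal sketch of what symmetric INT4 weight quantization looks like (this is an illustrative example, not EZKL's or GPTQ's actual implementation; the function names are made up):

  ```python
  import numpy as np

  def quantize_int4(w: np.ndarray):
      """Symmetric per-tensor quantization of FP32 weights to signed INT4 (-8..7)."""
      # Map the largest magnitude to 7; guard against an all-zero tensor.
      scale = max(np.abs(w).max() / 7.0, 1e-12)
      q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
      return q, scale

  def dequantize_int4(q: np.ndarray, scale: float) -> np.ndarray:
      """Recover approximate FP32 weights from INT4 values and the scale."""
      return q.astype(np.float32) * scale
  ```

  Each weight then needs only 4 bits plus one shared scale per tensor, at the cost of rounding error bounded by half the scale.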
  How do we leverage this to speed up EZKL? Can we put all 4-bit arithmetic results into a lookup table and stop using arithmetic gates for most parts? Or should we use Plonky2 / Starky instead of Halo2, since they can do 32-bit uint arithmetic faster? What do you guys think?
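  To make the lookup idea concrete, here is a rough sketch (plain Python, not circuit code; names are illustrative): the full signed INT4 x INT4 product table has only 16 x 16 = 256 entries, so every multiplication could in principle become one table lookup instead of an arithmetic gate.

  ```python
  def build_int4_mul_table():
      """Precompute every signed INT4 (-8..7) product, keyed by (a, b)."""
      return {(a, b): a * b for a in range(-8, 8) for b in range(-8, 8)}

  MUL_TABLE = build_int4_mul_table()  # 256 entries total

  def int4_mul(a: int, b: int) -> int:
      """'Multiply' two INT4 values by a single table lookup."""
      return MUL_TABLE[(a, b)]
  ```

  Whether this wins in-circuit depends on the cost of the lookup argument relative to a multiplication gate, which is exactly the benchmarking question raised above.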