Replies: 2 comments 1 reply
- We already use quantization, although we have not tried much with INT4 (we usually use something larger). Using lookups instead of arithmetic is also worth investigating. My guess is that performance would be worse right now, but that could change with faster lookup arguments.
- Yeah, I was surprised to see ONNX not supporting INT4... I heard some people are working on integrating Lasso into Halo2. If that PR gets merged, it would be worth benchmarking.
- The problem with zkML right now is that it is slow to prove big models. One way to mitigate this is weight quantization: LLMs such as LLaMA can do inference fine with INT4 instead of FP32. https://arxiv.org/abs/2210.17323
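  For context, a minimal sketch of what symmetric INT4 weight quantization looks like (this is an illustrative example, not EZKL's or GPTQ's actual implementation; the function names are made up):

  ```python
  import numpy as np

  def quantize_int4(w: np.ndarray):
      """Symmetric per-tensor quantization of FP32 weights to signed INT4 (-8..7)."""
      # Map the largest magnitude to 7; guard against an all-zero tensor.
      scale = max(np.abs(w).max() / 7.0, 1e-12)
      q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
      return q, scale

  def dequantize_int4(q: np.ndarray, scale: float) -> np.ndarray:
      """Recover approximate FP32 weights from INT4 values and the scale."""
      return q.astype(np.float32) * scale
  ```

  Each weight then needs only 4 bits plus one shared scale per tensor, at the cost of rounding error bounded by half the scale.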
  How do we leverage this to speed up EZKL? Can we put all 4-bit arithmetic results into a lookup table and stop using arithmetic gates for most parts? Or should we use Plonky2 / Starky instead of Halo2, since they can do 32-bit uint arithmetic faster? What do you guys think?
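  To make the lookup idea concrete, here is a rough sketch (plain Python, not circuit code; names are illustrative): the full signed INT4 x INT4 product table has only 16 x 16 = 256 entries, so every multiplication could in principle become one table lookup instead of an arithmetic gate.

  ```python
  def build_int4_mul_table():
      """Precompute every signed INT4 (-8..7) product, keyed by (a, b)."""
      return {(a, b): a * b for a in range(-8, 8) for b in range(-8, 8)}

  MUL_TABLE = build_int4_mul_table()  # 256 entries total

  def int4_mul(a: int, b: int) -> int:
      """'Multiply' two INT4 values by a single table lookup."""
      return MUL_TABLE[(a, b)]
  ```

  Whether this wins in-circuit depends on the cost of the lookup argument relative to a multiplication gate, which is exactly the benchmarking question raised above.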