optimize `infer_tower_product_witness` with less traverse #780

hero78119 · 2024-12-20T15:35:56Z

design rationale

A simple approach: iterate over `next_layer` in chunks of size 2, so that each pass over the evaluations multiplies in two vectors at once.
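To illustrate the idea, here is a minimal sketch, not the actual ceno code. It assumes a toy prime modulus `P` in place of the extension field `E`, plain sequential loops in place of the rayon parallel iterators, and hypothetical function names. It shows that multiplying two input vectors per pass gives the same accumulator as multiplying one per pass, while traversing the accumulator half as many times.

```rust
/// Toy prime modulus standing in for the extension field (assumption).
const P: u64 = 2_147_483_647; // 2^31 - 1, so products of reduced values fit in u64

/// Baseline: the accumulator is traversed once per input vector.
/// Inputs are assumed to be already reduced modulo `P`.
fn product_one_at_a_time(vectors: &[Vec<u64>], len: usize) -> Vec<u64> {
    let mut acc = vec![1u64; len];
    for v in vectors {
        for (a, x) in acc.iter_mut().zip(v) {
            *a = *a * x % P;
        }
    }
    acc
}

/// Chunked: the accumulator is traversed once per *pair* of input vectors.
/// Panics if the number of vectors is odd (explicit check, see assert below).
fn product_two_at_a_time(vectors: &[Vec<u64>], len: usize) -> Vec<u64> {
    assert!(vectors.len() % 2 == 0, "expected an even number of vectors");
    let mut acc = vec![1u64; len];
    for pair in vectors.chunks_exact(2) {
        let (f1, f2) = (&pair[0], &pair[1]);
        for (i, a) in acc.iter_mut().enumerate() {
            // Multiply both vectors into the accumulator in a single pass.
            *a = *a * (f1[i] * f2[i] % P) % P;
        }
    }
    acc
}
```

With four input vectors, the baseline walks the accumulator four times and the chunked version walks it twice; the results are identical.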

benchmark

This change speeds up witness inference for the opcode read/write product arguments.

with command

cargo bench --bench fibonacci --package ceno_zkvm -- --baseline baseline

On the ceno server it shows an improvement of around 3-4% in Fibonacci end-to-end performance (the run below measured a change of about -4.0% to -5.5%):

fibonacci_max_steps_1048576/prove_fibonacci/fibonacci_max_steps_1048576
                        time:   [3.9635 s 3.9860 s 4.0097 s]
                        change: [-5.4827% -4.7524% -3.9562%] (p = 0.00 < 0.05)
                        Performance has improved.

matthiasgoergens · 2024-12-24T03:51:21Z

ceno_zkvm/src/scheme/utils.rs

-                                .with_min_len(MIN_PAR_SIZE)
-                                .map(|(v, evaluations)| *evaluations *= *v)
-                                .collect()
+                    next_layer.chunks_exact(2).for_each(|f| {


I would suggest using .tuples so that the compiler can help you more.

Something a bit like this:

diff --git a/ceno_zkvm/src/scheme/utils.rs b/ceno_zkvm/src/scheme/utils.rs
index 26a67c3e..7bac7044 100644
--- a/ceno_zkvm/src/scheme/utils.rs
+++ b/ceno_zkvm/src/scheme/utils.rs
@@ -211,21 +211,21 @@ pub(crate) fn infer_tower_product_witness<E: ExtensionField>(
             let cur_layer: Vec<ArcMultilinearExtension<E>> = (0..num_product_fanin)
                 .map(|index| {
                     let mut evaluations = vec![E::ONE; cur_len];
-                    next_layer.chunks_exact(2).for_each(|f| {
-                        match (f[0].evaluations(), f[1].evaluations()) {
+                    next_layer
+                        .iter()
+                        .map(|f| f.evaluations())
+                        .tuples()
+                        .for_each(|(f1, f2)| match (f1, f2) {
                             (FieldType::Ext(f1), FieldType::Ext(f2)) => {
                                 let start: usize = index * cur_len;
-                                (start..(start + cur_len))
+                                (start..start + cur_len)
                                     .into_par_iter()
-                                    .zip(evaluations.par_iter_mut())
+                                    .zip(&mut evaluations)
                                     .with_min_len(MIN_PAR_SIZE)
-                                    .map(|(index, evaluations)| {
-                                        *evaluations *= f1[index] * f2[index]
-                                    })
+                                    .map(|(index, evaluation)| *evaluation *= f1[index] * f2[index])
                                     .collect()
                             }
                             _ => unreachable!("must be extension field"),
-                        }
                     });
                     evaluations.into_mle().into()
                 })

hero78119 · 2024-12-24 (edited)

Thanks for the suggestion, but `tuples` silently skips the remainder when the number of elements is odd, without any error hint. I would prefer to stick with the current `chunks_exact(2)` approach, since it is self-explanatory and terminates with a precise error at runtime.
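One caveat worth checking against the standard library docs: `slice::chunks_exact(2)` does not panic on odd-length input by itself. It yields only the full chunks and exposes the leftover through `ChunksExact::remainder()`. If a precise runtime failure is the goal, an explicit check on the remainder is needed. The sketch below uses plain `u64` values and hypothetical function names (`pair_products_lenient`, `pair_products_strict`) rather than the real witness types.

```rust
/// Multiplies adjacent pairs. A trailing odd element is silently ignored,
/// which is what `chunks_exact(2)` does when used on its own.
fn pair_products_lenient(xs: &[u64]) -> Vec<u64> {
    xs.chunks_exact(2).map(|c| c[0] * c[1]).collect()
}

/// Multiplies adjacent pairs, but panics with a clear message when the
/// input length is odd, by inspecting the leftover slice explicitly.
fn pair_products_strict(xs: &[u64]) -> Vec<u64> {
    let chunks = xs.chunks_exact(2);
    assert!(
        chunks.remainder().is_empty(),
        "expected an even number of elements, got {}",
        xs.len()
    );
    chunks.map(|c| c[0] * c[1]).collect()
}
```

On a five-element input the lenient version returns two products and drops the last element, while the strict version panics before doing any work.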

iterate 2 vector at once

4dad4b4

matthiasgoergens reviewed Dec 24, 2024
