[Cairo 1 Run] Refactor integration tests + check that return values taken from output segment are correct #1741

fmoletta · 2024-04-29T13:56:09Z

After PR #1686, the return values are fetched from the output segment instead of the execution segment when either the `append_return_values` or `proof_mode` flag is enabled. This makes our `check_append_ret_values_to_output_segment` test less useful, as it no longer checks that the correct values are being copied to the output segment.
One way to fix this would be to compare the values from the execution segment with the ones in the output segment. However, after PR #1721, when the segment arena is used, the execution segment contains the values produced by the arena validation where the return values used to be found, so we can no longer use it for comparison.
As a result of these two changes, the best way to test that the correct values are copied to the output segment is to (1) test that the output segment indeed contains the return values and no more values after them (current behaviour), and (2) test that the return values output when running with `--append_return_values` match the ones output by a normal run. The latter can be accomplished by adding a third case with `--append_return_values` enabled to our integration tests.
This PR adds this third test case and also refactors the integration tests into one test with multiple cases and values, so adding new checks and argument combinations is easier.
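To illustrate the shape of such a test, here is a minimal sketch, not the code from this PR. `RunResult`, `mock_run` and `check_program` are hypothetical names, and `mock_run` is a stand-in that only imitates the flag handling described above (the `--proof_mode` spelling is assumed); a real test would invoke the Cairo 1 runner instead.

```rust
/// Hypothetical result of one run: the return values the runner reports and
/// the raw contents of the output segment.
#[derive(Debug, Clone, PartialEq)]
pub struct RunResult {
    pub return_values: Vec<u64>,
    pub output_segment: Vec<u64>,
}

/// Mock runner (assumption: stands in for invoking the real runner).
/// It copies the return values to the output segment only when one of the
/// two flags is present, which is the behaviour the description relies on.
pub fn mock_run(program_result: &[u64], args: &[&str]) -> RunResult {
    let copies_to_output = args
        .iter()
        .any(|a| *a == "--append_return_values" || *a == "--proof_mode");
    let output_segment = if copies_to_output {
        program_result.to_vec()
    } else {
        Vec::new()
    };
    RunResult {
        return_values: program_result.to_vec(),
        output_segment,
    }
}

/// Runs one program under every argument combination and checks both
/// properties from the description.
pub fn check_program(expected: &[u64], arg_sets: &[&[&str]]) {
    for args in arg_sets {
        let result = mock_run(expected, args);
        // Check (2): every mode reports the same return values as a normal run.
        assert_eq!(result.return_values, expected, "args: {:?}", args);
        // Check (1): with the flag enabled, the output segment holds exactly
        // the return values and nothing after them.
        if args.contains(&"--append_return_values") {
            assert_eq!(result.output_segment, expected, "args: {:?}", args);
        }
    }
}
```

With this layout, covering a new argument combination amounts to adding one more entry to the `arg_sets` slice passed to `check_program`, for example `&[&[], &["--append_return_values"], &["--proof_mode"]]`.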

github-actions · 2024-04-29T14:06:01Z

**Hyper Threading Benchmark results**




hyperfine -r 2 -n "hyper_threading_main threads: 1" 'RAYON_NUM_THREADS=1 ./hyper_threading_main' -n "hyper_threading_pr threads: 1" 'RAYON_NUM_THREADS=1 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 1
  Time (mean ± σ):     30.347 s ±  0.165 s    [User: 29.631 s, System: 0.714 s]
  Range (min … max):   30.230 s … 30.463 s    2 runs
 
Benchmark 2: hyper_threading_pr threads: 1
  Time (mean ± σ):     30.497 s ±  0.063 s    [User: 29.734 s, System: 0.762 s]
  Range (min … max):   30.452 s … 30.542 s    2 runs
 
Summary
  'hyper_threading_main threads: 1' ran
    1.00 ± 0.01 times faster than 'hyper_threading_pr threads: 1'




hyperfine -r 2 -n "hyper_threading_main threads: 2" 'RAYON_NUM_THREADS=2 ./hyper_threading_main' -n "hyper_threading_pr threads: 2" 'RAYON_NUM_THREADS=2 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 2
  Time (mean ± σ):     16.224 s ±  0.037 s    [User: 29.951 s, System: 0.718 s]
  Range (min … max):   16.198 s … 16.250 s    2 runs
 
Benchmark 2: hyper_threading_pr threads: 2
  Time (mean ± σ):     16.249 s ±  0.011 s    [User: 30.011 s, System: 0.712 s]
  Range (min … max):   16.241 s … 16.257 s    2 runs
 
Summary
  'hyper_threading_main threads: 2' ran
    1.00 ± 0.00 times faster than 'hyper_threading_pr threads: 2'




hyperfine -r 2 -n "hyper_threading_main threads: 4" 'RAYON_NUM_THREADS=4 ./hyper_threading_main' -n "hyper_threading_pr threads: 4" 'RAYON_NUM_THREADS=4 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 4
  Time (mean ± σ):     12.047 s ±  0.010 s    [User: 42.072 s, System: 0.941 s]
  Range (min … max):   12.040 s … 12.054 s    2 runs
 
Benchmark 2: hyper_threading_pr threads: 4
  Time (mean ± σ):     11.569 s ±  0.391 s    [User: 42.293 s, System: 0.916 s]
  Range (min … max):   11.292 s … 11.845 s    2 runs
 
Summary
  'hyper_threading_pr threads: 4' ran
    1.04 ± 0.04 times faster than 'hyper_threading_main threads: 4'




hyperfine -r 2 -n "hyper_threading_main threads: 6" 'RAYON_NUM_THREADS=6 ./hyper_threading_main' -n "hyper_threading_pr threads: 6" 'RAYON_NUM_THREADS=6 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 6
  Time (mean ± σ):     11.634 s ±  0.156 s    [User: 41.785 s, System: 0.933 s]
  Range (min … max):   11.523 s … 11.744 s    2 runs
 
Benchmark 2: hyper_threading_pr threads: 6
  Time (mean ± σ):     11.702 s ±  0.050 s    [User: 41.669 s, System: 1.012 s]
  Range (min … max):   11.667 s … 11.738 s    2 runs
 
Summary
  'hyper_threading_main threads: 6' ran
    1.01 ± 0.01 times faster than 'hyper_threading_pr threads: 6'




hyperfine -r 2 -n "hyper_threading_main threads: 8" 'RAYON_NUM_THREADS=8 ./hyper_threading_main' -n "hyper_threading_pr threads: 8" 'RAYON_NUM_THREADS=8 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 8
  Time (mean ± σ):     11.277 s ±  0.151 s    [User: 42.107 s, System: 0.974 s]
  Range (min … max):   11.171 s … 11.384 s    2 runs
 
Benchmark 2: hyper_threading_pr threads: 8
  Time (mean ± σ):     11.402 s ±  0.135 s    [User: 42.155 s, System: 0.984 s]
  Range (min … max):   11.306 s … 11.497 s    2 runs
 
Summary
  'hyper_threading_main threads: 8' ran
    1.01 ± 0.02 times faster than 'hyper_threading_pr threads: 8'




hyperfine -r 2 -n "hyper_threading_main threads: 16" 'RAYON_NUM_THREADS=16 ./hyper_threading_main' -n "hyper_threading_pr threads: 16" 'RAYON_NUM_THREADS=16 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 16
  Time (mean ± σ):     11.349 s ±  0.294 s    [User: 42.798 s, System: 0.989 s]
  Range (min … max):   11.141 s … 11.558 s    2 runs
 
Benchmark 2: hyper_threading_pr threads: 16
  Time (mean ± σ):     11.334 s ±  0.178 s    [User: 42.446 s, System: 1.075 s]
  Range (min … max):   11.208 s … 11.460 s    2 runs
 
Summary
  'hyper_threading_pr threads: 16' ran
    1.00 ± 0.03 times faster than 'hyper_threading_main threads: 16'

codecov · 2024-04-29T14:11:24Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.80%. Comparing base (0df3f34) to head (3c1ad7c).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1741      +/-   ##
==========================================
- Coverage   94.81%   94.80%   -0.01%     
==========================================
  Files         101      101              
  Lines       38720    38689      -31     
==========================================
- Hits        36711    36680      -31     
  Misses       2009     2009


github-actions · 2024-04-29T14:15:17Z

Benchmark Results for unmodified programs 🚀

Command	Mean [s]	Min [s]	Max [s]	Relative
`base big_factorial`	2.371 ± 0.021	2.355	2.425	1.00 ± 0.01
`head big_factorial`	2.365 ± 0.015	2.349	2.390	1.00

Command	Mean [s]	Min [s]	Max [s]	Relative
`base big_fibonacci`	2.358 ± 0.013	2.341	2.389	1.02 ± 0.01
`head big_fibonacci`	2.320 ± 0.014	2.296	2.341	1.00

Command	Mean [s]	Min [s]	Max [s]	Relative
`base blake2s_integration_benchmark`	8.752 ± 0.111	8.590	8.868	1.00 ± 0.02
`head blake2s_integration_benchmark`	8.737 ± 0.119	8.571	8.977	1.00

Command	Mean [s]	Min [s]	Max [s]	Relative
`base compare_arrays_200000`	2.422 ± 0.029	2.388	2.478	1.01 ± 0.02
`head compare_arrays_200000`	2.404 ± 0.022	2.368	2.432	1.00

Command	Mean [s]	Min [s]	Max [s]	Relative
`base dict_integration_benchmark`	1.556 ± 0.003	1.549	1.560	1.01 ± 0.01
`head dict_integration_benchmark`	1.544 ± 0.016	1.528	1.583	1.00

Command	Mean [s]	Min [s]	Max [s]	Relative
`base field_arithmetic_get_square_benchmark`	1.432 ± 0.012	1.417	1.448	1.00
`head field_arithmetic_get_square_benchmark`	1.444 ± 0.032	1.413	1.505	1.01 ± 0.02

Command	Mean [s]	Min [s]	Max [s]	Relative
`base integration_builtins`	8.683 ± 0.106	8.562	8.854	1.00 ± 0.02
`head integration_builtins`	8.657 ± 0.089	8.552	8.796	1.00

Command	Mean [s]	Min [s]	Max [s]	Relative
`base keccak_integration_benchmark`	8.941 ± 0.119	8.806	9.070	1.00 ± 0.02
`head keccak_integration_benchmark`	8.909 ± 0.096	8.779	9.038	1.00

Command	Mean [s]	Min [s]	Max [s]	Relative
`base linear_search`	2.437 ± 0.021	2.409	2.462	1.01 ± 0.01
`head linear_search`	2.417 ± 0.007	2.408	2.429	1.00

Command	Mean [s]	Min [s]	Max [s]	Relative
`base math_cmp_and_pow_integration_benchmark`	1.914 ± 0.021	1.892	1.967	1.00 ± 0.01
`head math_cmp_and_pow_integration_benchmark`	1.906 ± 0.014	1.891	1.936	1.00

Command	Mean [s]	Min [s]	Max [s]	Relative
`base math_integration_benchmark`	1.712 ± 0.009	1.703	1.727	1.01 ± 0.01
`head math_integration_benchmark`	1.700 ± 0.009	1.685	1.715	1.00

Command	Mean [s]	Min [s]	Max [s]	Relative
`base memory_integration_benchmark`	1.340 ± 0.007	1.329	1.352	1.00 ± 0.01
`head memory_integration_benchmark`	1.333 ± 0.007	1.325	1.346	1.00

Command	Mean [s]	Min [s]	Max [s]	Relative
`base operations_with_data_structures_benchmarks`	1.998 ± 0.012	1.982	2.024	1.01 ± 0.01
`head operations_with_data_structures_benchmarks`	1.975 ± 0.008	1.963	1.985	1.00

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`base pedersen`	562.2 ± 2.4	557.4	567.1	1.00 ± 0.01
`head pedersen`	562.1 ± 2.9	559.4	568.3	1.00

Command	Mean [s]	Min [s]	Max [s]	Relative
`base poseidon_integration_benchmark`	1.014 ± 0.023	1.005	1.079	1.01 ± 0.02
`head poseidon_integration_benchmark`	1.003 ± 0.005	0.994	1.010	1.00

Command	Mean [s]	Min [s]	Max [s]	Relative
`base secp_integration_benchmark`	2.001 ± 0.020	1.980	2.044	1.00 ± 0.02
`head secp_integration_benchmark`	1.997 ± 0.023	1.976	2.055	1.00

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`base set_integration_benchmark`	751.2 ± 4.9	746.7	760.0	1.01 ± 0.01
`head set_integration_benchmark`	745.7 ± 2.8	741.6	749.4	1.00

Command	Mean [s]	Min [s]	Max [s]	Relative
`base uint256_integration_benchmark`	4.844 ± 0.069	4.780	4.946	1.00
`head uint256_integration_benchmark`	4.847 ± 0.039	4.775	4.902	1.00 ± 0.02

pefontana

nice one!

Fix test

ef59dd8

fmoletta added the tests Implementation of tests label Apr 29, 2024

fmoletta added 5 commits April 29, 2024 11:15

Revert "Fix test"

024dbf3

This reverts commit ef59dd8.

Start test refactor

b23ece6

Add args

138babd

refactor all tests

bb3555e

Add disclaimer

c76fcac

fmoletta changed the title ~~[WIP][Cairo 1 Run] Fix check_append_ret_values_to_output_segment test~~ [Cairo 1 Run] Refactor integration tests + check that return values taken from output segment are correct Apr 29, 2024

fmoletta added 3 commits April 29, 2024 12:36

clippy

65392c3

Add comments

46525f0

use slice

ad8fcce

fmoletta marked this pull request as ready for review April 29, 2024 15:55

fmoletta requested review from igaray, Oppen, juanbono and pefontana as code owners April 29, 2024 15:55

pefontana and others added 2 commits April 29, 2024 14:49

Merge branch 'main' into fix-append-return-values-test

9e0544a

Merge branch 'main' into fix-append-return-values-test

3c1ad7c

pefontana approved these changes Apr 30, 2024

igaray approved these changes Apr 30, 2024

fmoletta added this pull request to the merge queue Apr 30, 2024

Merged via the queue into main with commit 73e188d Apr 30, 2024
72 checks passed

fmoletta deleted the fix-append-return-values-test branch April 30, 2024 21:09
