-
Notifications
You must be signed in to change notification settings - Fork 396
Description
This issue is intended as "tracker issue" for implementing benchmarks which are Fusaka-compatible which cannot create incorrect benchmark results due to optimizations (parallel execution).
The problem
With Fusaka, we now have a protocol limit of ~16.7M gas per transaction. This means that for benchmarks which target gas limits higher than this limit, we have to split up transactions. This can give skewed results: if these transactions can be parallelized, this means that the execution time could be similar to running a single transaction. The reason is that two transactions are independent, and an optimized client can now run the two transactions on the same state, only to "merge" the results later on.
For simplicity let's assume we have a benchmark test for the KECCAK256 (0x20) opcode. This reads CALLDATA(0) and then keeps hashing this until there is only X amount gas left, where it will SSTORE the topmost stack item at slot 0 (this slot item is already initialized to some nonzero value).
This can be parallelized if we would split up the transactions into multiple, and each transaction has the same gas limit. The end result for all transactions are the same: the gas used and the final item in slot 0. There are multiple ways to make this test non-parallelizable: changing the calldata of each tx is one, setting different gas limits is another (this yields a different value in the final slot), but likely the most effective one is changing this test such that it does not start hashing from the calldata, but instead read the start value from slot 0 and also store this value to slot 0. The result is that we have to calculate a hash chain and there is no way to parallelize this. As bonus: if this is set up like this, we can immediately see the benefits of Block Level Access lists , because this allows it to be parallelize it, as it saves the intermediate values of the hash chain to the BAL.
Problems to solve
- If the benchmark, once split up, can be parallelized, this will skew the results if clients run these transactions on multiple cores, making it look like twice the gas limit runs about the same time as on the previous gas limit (thus making it look twice at fast)
- We should update the benchmarks (or maybe write a benchmark "mode" or flag for this) to make these non-parallelizable
- This immediately unlocks the benefits of BALs: these tests without BALs and with BALs should show these tests are parallelizable
Note that if we add this logic into the benchmark to run, this is thus overhead for the test, and the opcode-under-test will thus be ran less times than previously. It is likely helpful to introduce a test helper to make these items sequential. For instance, this is how it could look:
We have a test A, which for instance just tests ADD a lot of times (in a loop).
In order to make this sequential, we have to make new txs depend on the previous ones. To do this, we could:
Have a contract which has the code of test A. We now introduce root contract B which calls into A (this is the overhead). Root contract B calls A with current GAS (note: this will thus send at most 63/64 of current gas). Since we have 1/64 of current gas left, if the transaction has a high enough gas limit (it will do in most cases), we can now SSTORE(0, ADD(SLOAD(0), 1)). This will simply increment the value of slot 0 by 1 and will thus make this unparallelizable. (Of course a smart optimized could detect this logic and could still run this sequentially). To make this "better" unparallelizable some input value of previous tx should be fed to the target code and we should store the result of this target code.