-
Notifications
You must be signed in to change notification settings - Fork 170
test(benchmark): implement CREATE2 addressing for bloatnet tests #2090
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Guillaume Ballet <[email protected]>
Signed-off-by: Guillaume Ballet <[email protected]>
Signed-off-by: Guillaume Ballet <[email protected]>
Signed-off-by: Guillaume Ballet <[email protected]>
Signed-off-by: Guillaume Ballet <[email protected]>
Signed-off-by: Guillaume Ballet <[email protected]> remove leftover single whitespace :|
Signed-off-by: Guillaume Ballet <[email protected]>
- Add CREATE2 deterministic address calculation to overcome 24KB bytecode limit - Fix While loop condition to properly iterate through contracts - Account for memory expansion costs in gas calculations - Add safety margins (50k gas reserve, 98% utilization) for stability - Tests now scale to any gas limit without bytecode constraints - Achieve 98% gas utilization with 10M and 20M gas limits
@CPerezz Thanks for this PR, I review the last commit, for the
Based on the description, I suggest some optimization below. I compare the two versions with the command, using 10M as gas limit:
Based on current implementation, the Your current approach is (1) create a lot of contract via CREATE2 (2) calling these contract with On the other side, you mention the difficulty of consuming all the gas in the block here. You can pass this value to Please let me know what do you think! |
tests/benchmark/test_bloatnet.py
Outdated
deploy_bytecode = Bytecode(Op.STOP) # First byte is STOP for safety | ||
pattern_count = 0 | ||
while len(deploy_bytecode) < max_contract_size - 100: | ||
unique_value = Hash(pattern_count) | ||
deploy_bytecode += Op.PUSH32[unique_value] + Op.POP | ||
pattern_count += 1 | ||
while len(deploy_bytecode) < max_contract_size: | ||
deploy_bytecode += Op.JUMPDEST | ||
|
||
assert len(deploy_bytecode) == max_contract_size, ( | ||
f"Contract size mismatch: {len(deploy_bytecode)}" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Hash(pattern)
, do you expect a keccak256
hashing result? If so, i think this method does not work as expected here. I print out Hash(1)
and it turns out to be 0x0000000000000000000000000000000000000000000000000000000000000001
.
If this is what you expected, what about simplify this as the following:
deploy_bytecode = Op.STOP + sum(
(Op.POP(Hash(i)) for i in range(max_contract_size // 4)), Bytecode()
)
deploy_bytecode += Op.JUMPDEST * (max_contract_size - len(deploy_bytecode)) # For padding to max contract size
Although Hash(i)
is 32 bytes here, but when pushing to the stack, the immediate bytes would be packed in 1-2 bytes.
PUSH(Hash(0))
Hash(0)
->0x0000000000000000000000000000000000000000000000000000000000000001
.PUSH(Hash(0))
- >PUSH1 0x01
(1 byte)
PUSH(Hash(1000))
Hash(1000)
->0x00000000000000000000000000000000000000000000000000000000000003E8
PUSH(Hash(1000))
->PUSH2 0x03E8
(2 bytes)
And due to the max contract size limit (0x6000
), each iteration could be packed into 4 bytes sequence: PUSH32 values POP
If you want a keccak256
-like hashing result, consider something like this (You might need to resolve some typing issue)
deploy_bytecode = Op.STOP + sum(
(Op.POP(keccak256(i)) for i in range(max_contract_size // len(Op.POP(keccak256(0))))), Bytecode()
)
deploy_bytecode += Op.JUMPDEST * (max_contract_size - len(deploy_bytecode)) # For padding to max contract size
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e70132b should bring some light on what I'm trying to do with the Hash usage.
If it's not clear still, I can expand more on it.
) | ||
attack_code += Op.POP # Clean up counter | ||
# Store success flag for validation | ||
attack_code += Op.SSTORE(SUCCESS_FLAG_SLOT, SUCCESS_FLAG_VALUE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess your intention here is to make sure all the operation works as expected, no OOG, invalid JUMPDEST
or other issue during execution, and the program process until the end.
Using SSTORE
is good, but might be too expensive here. What we do now is to calculated the expected usage and pass it to expected_benchmark_gas_used
params in blockchain_test
.
More explanation can be found in Mattermost:
For instance, if you are benchmarking a block with a 45M gas limit but the execution only consumes 44.9M, then expected_benchmark_gas_used should be set to 44.9M.
This acts as a safeguard to verify the benchmark behaves as intended. For example, if we create a benchmark test using the JUMP operation but the destination is invalid, each transaction would terminate early with an “invalid jump destination” error. In that case, the gas consumed would fall short of what we expected, and the check would catch this problem.
If you remove this SSTORE operation, please also remove the following variable, and update the post
storage
# Storage slot for success flag
SUCCESS_FLAG_SLOT = 0
SUCCESS_FLAG_VALUE = 42
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to avoid using expected benchmark gas used as depending on the gas limit set, it needs to be precisely computed before.
Whereas, with this, we don't have this issue and also:
- We have a nice way to make sure the whole test run correctly and completed execution & state modifications.
- It only accounts for 0.06% of the total gas (in a 10M gas block) so it's negligible.
- Post-state validation passes (empty)
memory_expansion_cost = fork.memory_expansion_gas_calculator()( | ||
new_bytes=CREATE2_MEMORY_OFFSET + 85 | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the comment above works by hardcoding the address, we no longer need to calculate CREATE2
address, thus no memory expansion needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned previously. We can't hardcode addresses as this would force us to use CALL chains that would be super deep and consume most of the gas.
The 63/64 Rule:
- EIP-150 also formalized that CALL operations can only forward 63/64 of remaining gas
- This prevents deep call stack attacks
Also, on top of that we would have a TON of contract deployments to do that just hold addresses making the tests significantly slower.
pre[create2_addr] = Account(code=deploy_bytecode) | ||
deployed_contracts.append(create2_addr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test run as expected, but I am not very whether this a good approach of using preAlloc
. cc @marioevz
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We try to never do pre[address] = account
approach because it basically breaks our assumptions and we cannot run it in execute
mode.
By the name of the file and test function, I assume running in bloatnet with execute is the goal, so there's a couple of questions:
- Should the factory contract be already in the state of bloatnet?
- Should the contracts deployed by the factory contract also already be deployed in bloatnet before the test starts?
Both cases I feel should be stubbed contracts.
For the factory contract it's already doable and we simply need to add the appropriate label.
For the contracts deployed by the factory contract it's a bit tricky because we: (a) label them all and pre-deploy them to the bloatnet, or (b) find a more automatic way of pre-deploy if these are not already there.
I think this is all pretty new and the discussion is still open on how to handle this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the issue.
We can definitely deploy the factory on Bloatnet and hardcode the address here. Though it would be a bit odd. It will be self-contained within this module. So it's not like it pollutes the whole codebase.
I guess this is the best shot we have. But notice that we will need to use stubbed contracts for every single test we do most likely. So we will have a lot of hardcoded addressess.
If that's ok. Then I'm happy to pre-deploy them in Bloatnet and add some pre-test validations that make sure that the factory is available.
Does that work @LouisTsai-Csie @marioevz ?
Otherwise, the alternative is to deploy a factory and all the sub-contracts on each benchmark run (which for bloatnet purposes is benefitial (more load on I/O).
But I'm not sure if that's ok with you guys.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a convo with @gballet :
So my point is: We need to somehow fix this issue. We have various avenues:
- We move the pre-state filling into the main benchmark body (counts, this will enforce all the deployment for each benchmark which can take a huge amount of time - tons of deployments per benchmark to run -.
- We generate the pre-state and deploy it in bloatnet and use stubbed contracts as a way to access it. (It's cleaner, but we should just deploy factories, which brigs us again to the deploy problem).
2.1 We can deploy (all the contracts). But this would mean a lot of hardcoded addresses in our benchmarks. (Maybe not an issue, but any modification will require a re-deployment of the whole stuff).
Hey @LouisTsai-Csie thanks for the comments.
As per this, notice that there's a constraint to what you propose which is that a contract is capped at 24kB atm. Thus, we can only fit ~945 addresses in it. Therefore, unless I'm missing something, this should be the best way to do this (otherwise we can have a chain of contracts that calls each one another as a linkedlist, but it's quite complex for the benefit (arround 6%). |
- Remove gas reserve and 98% utilization logic for contract calculations - Directly calculate the number of contracts based on available gas - Introduce precise expected gas usage calculations for better accuracy - Ensure tests scale effectively without unnecessary constraints
…mization - Update tests to generate unique bytecode for each contract, maximizing I/O reads during benchmarks. - Clarify comments regarding bytecode generation and its impact on gas costs. - Ensure CREATE2 addresses are calculated consistently using a base bytecode template. - Improve test descriptions to reflect the changes in contract deployment strategy.
…utility function - Remove the custom `calculate_create2_address` function in favor of the `compute_create2_address` utility. - Update tests to utilize the new utility for consistent CREATE2 address calculations. - Simplify code by eliminating unnecessary complexity in address calculation logic. - Ensure that the CREATE2 prefix is directly set to 0xFF in the memory operation for clarity.
@CPerezz Thanks for clarification, after revisit the suggested changes and re-run the code again, it indeed exceeds the max code size - I did not notice this since it was successful on my computer, but i later realized that our framework does not catch the max code size limit issue for Since this PR is still based on previous PR, so i could not help you run the CI, but I hope you could verify the test locally through |
🗒️ Description
🔗 Related Issues or PRs
#1986
✅ Checklist
tox
checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:uvx --with=tox-uv tox -e lint,typecheck,spellcheck,markdownlint
type(scope):
.mkdocs serve
locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.