Speeding Up Geth
Below are notes that mostly transcribe George Hotz's ideas on how to make Geth 10x faster.
George would actually claim he can make Geth 100x faster, but 10x of that comes from requiring faster hardware and larger disks in order to increase the gas limit.
Approximate figures at the time of writing (February 2022):
- The full Ethereum state takes ~40GB.
- This only stores the key-value pairs, and not the intermediate state trie nodes.
- Storing the whole trie including the intermediate nodes takes this figure to ~150GB.
- These two figures assume a fully pruned state!
- Storing all the full blocks in the chain takes ~150GB.
An anecdote from George is that he recently ran a full archive node sync. It took him ~2 days with Geth and ~4 days with Erigon. This is (a) incredibly fast and (b) the opposite of the commonly expected result. He also noted that during the sync the process was often CPU-bound rather than IO-bound (i.e. Geth's main process sat at 100% CPU utilization).
How did George make the sync so fast? He had a few tricks:
- He increased Geth's (and Erigon's) database cache size to 64GB (up from 16MB).
- LevelDB was writing to a distributed disk made up of multiple NVMe SSDs (to be clarified: he did mention four 1TB NVMe drives in RAID in one of our conversations).
- He ran the sync over a GB ethernet connection.
- However, unclear to me (norswap) if this was a bottleneck at all.
Erigon's degraded performance can be explained by two facts:
- It separates the sync into phases: first download all headers, then validate all headers, then download all blocks, etc.
- Unclear to me (norswap) why this speeds up the sync; I would have expected the disk to be the clear bottleneck. Does improving cache locality and making codepaths "warmer" when running the blocks really make such a difference? I do trust the Erigon team that this does work in general.
- It writes more data to disk (related to Erigon's DB architecture which, as far as my limited understanding goes, allows reading key-value pairs without traversing the state Merkle trie).
In general, George's solution to making Geth 10x faster comes down to:
- Improving the database.
- Optimizing Geth under the assumption that the disk is not the bottleneck, but CPU usage is.
Let's examine those in turn.
The biggest thing in speeding up geth is to speed up the database.
- Refactor geth to use a pure hash → preimage interface to the database.
- It mostly works like this today, but not entirely, which is a limiting factor.
- Such a refactor will also make it easy to implement beam sync in Geth.
- This might be George's EthDenver hackathon project.
- Beam sync allows a geth node to pull trie nodes on demand from the p2p network in order to validate new blocks while the sync is ongoing. This also makes it possible to sync "backwards" (from the most recent block to the oldest), though that is not mandatory.
- It also enables light clients that don't store the state at all.
- Beam sync is orthogonal to Snap sync, which allows syncing from a state snapshot.
- Beam sync has been mostly rejected on L1 because it would cause too much strain on the p2p network.
- On L2, beam sync would be useful to add additional L2 nodes without imposing the full disk requirements.
- Having full nodes as beam sync data providers is a way to incentivize having more nodes validating L2.
- If needed, we can require the clients to store/cache the contracts (since they are the largest preimages, and don't take up a majority of the blockspace).
- Add a performant and scalable caching layer, or switch to a database that has one built-in.
- George's improvement in sync speed came mostly from increasing the cache size.
- To avoid latency, it's preferable to have the caching layer in-process.
- The database could be on its own machine and shared between multiple geth processes (each with their own cache).
- One of the big advantages of the caching layer is removing the need to wait after issuing a write.
- A background thread can be in charge of periodically flushing things to the database.
- Change the backend to something more performant than LevelDB.
- It would also be good to be able to prune the database "online" (i.e. while the node is running).
- norswap: I think geth has been building towards something like this, but not sure where it's at, and how compatible it is with a pure hash → preimage interface.
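To make the caching idea above more concrete, here is a minimal sketch of an in-process write-back cache sitting on top of a pure hash → preimage store. None of these names come from Geth: `PreimageStore`, `WriteBackCache`, `RunFlusher` and `mapStore` are hypothetical, SHA-256 stands in for Keccak-256 to stay within the standard library, and the cache never evicts anything.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sync"
	"time"
)

// Hash is the content address of a preimage. SHA-256 stands in for
// Keccak-256 only so the sketch needs nothing beyond the standard library.
type Hash [32]byte

// PreimageStore is a hypothetical pure hash → preimage interface.
type PreimageStore interface {
	Get(h Hash) ([]byte, bool)
	Put(h Hash, preimage []byte)
}

// mapStore stands in for the slow on-disk (or remote) database.
type mapStore struct {
	mu   sync.Mutex
	data map[Hash][]byte
}

func newMapStore() *mapStore { return &mapStore{data: make(map[Hash][]byte)} }

func (s *mapStore) Get(h Hash) ([]byte, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.data[h]
	return v, ok
}

func (s *mapStore) Put(h Hash, p []byte) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[h] = p
}

// WriteBackCache is an unbounded in-process cache (no eviction, for brevity).
type WriteBackCache struct {
	mu      sync.Mutex
	clean   map[Hash][]byte // entries already handed to the backend
	dirty   map[Hash][]byte // entries not yet flushed
	backend PreimageStore
}

func NewWriteBackCache(b PreimageStore) *WriteBackCache {
	return &WriteBackCache{clean: map[Hash][]byte{}, dirty: map[Hash][]byte{}, backend: b}
}

// Write records the preimage under its hash and returns without touching the backend.
func (c *WriteBackCache) Write(preimage []byte) Hash {
	h := Hash(sha256.Sum256(preimage))
	c.mu.Lock()
	c.dirty[h] = preimage
	c.mu.Unlock()
	return h
}

// Get looks in the cache first and falls back to the backend on a miss.
func (c *WriteBackCache) Get(h Hash) ([]byte, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if v, ok := c.dirty[h]; ok {
		return v, true
	}
	if v, ok := c.clean[h]; ok {
		return v, true
	}
	v, ok := c.backend.Get(h)
	if ok {
		c.clean[h] = v
	}
	return v, ok
}

// Flush pushes all dirty entries to the backend and returns how many were written.
// Entries move to the clean map first, so readers still hit the cache mid-flush.
func (c *WriteBackCache) Flush() int {
	c.mu.Lock()
	batch := c.dirty
	c.dirty = make(map[Hash][]byte)
	for h, p := range batch {
		c.clean[h] = p
	}
	c.mu.Unlock()
	for h, p := range batch {
		c.backend.Put(h, p)
	}
	return len(batch)
}

// RunFlusher flushes periodically until stop is closed, then flushes once more.
func (c *WriteBackCache) RunFlusher(interval time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			c.Flush()
		case <-stop:
			c.Flush()
			return
		}
	}
}

func main() {
	store := newMapStore()
	cache := NewWriteBackCache(store)
	stop, done := make(chan struct{}), make(chan struct{})
	go func() { cache.RunFlusher(10*time.Millisecond, stop); close(done) }()

	h := cache.Write([]byte("trie node")) // returns without waiting for the database
	v, _ := cache.Get(h)
	fmt.Printf("read back %q\n", v)

	close(stop)
	<-done
	_, persisted := store.Get(h)
	fmt.Println("persisted after flush:", persisted)
}
```

Because keys are content hashes, writing the same preimage twice is harmless, which is what makes it safe to flush lazily and to share the backend between processes.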
The EVM implementation in Geth is a fairly naive interpreter. It could be sped up considerably by using JIT compilation techniques.
In particular, a minimal but highly impactful change would be to maintain a counter for each contract, and compile it from EVM bytecode to machine code when hitting a certain threshold.
Further (finer-grained) JIT compilation techniques can be applied, but they're significantly more complex and would bring a smaller speedup than compiling contracts.
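The per-contract counter could look roughly like the sketch below. Everything here is hypothetical (`Tiering`, `Compiled`, the `compile` and `interpret` callbacks) and the actual compiler is stubbed out; the point is only the tiering logic: interpret a contract until it has been called often enough, then compile it once and use the compiled version from then on.

```go
package main

import "fmt"

// CodeHash identifies a contract by the hash of its bytecode.
type CodeHash [32]byte

// Compiled is a placeholder for the native code a JIT would produce.
type Compiled func(input []byte) []byte

// Tiering keeps one call counter per contract and a table of compiled contracts.
type Tiering struct {
	threshold uint64
	calls     map[CodeHash]uint64
	compiled  map[CodeHash]Compiled
	compile   func(code []byte) Compiled      // hypothetical bytecode → native compiler
	interpret func(code, input []byte) []byte // the existing interpreter
}

func NewTiering(threshold uint64, compile func([]byte) Compiled,
	interpret func(code, input []byte) []byte) *Tiering {
	return &Tiering{
		threshold: threshold,
		calls:     make(map[CodeHash]uint64),
		compiled:  make(map[CodeHash]Compiled),
		compile:   compile,
		interpret: interpret,
	}
}

// Run interprets cold contracts and compiles a contract on the call that
// reaches the threshold; that call and all later ones use the compiled code.
func (t *Tiering) Run(h CodeHash, code, input []byte) []byte {
	if fn, ok := t.compiled[h]; ok {
		return fn(input)
	}
	t.calls[h]++
	if t.calls[h] < t.threshold {
		return t.interpret(code, input)
	}
	fn := t.compile(code)
	t.compiled[h] = fn
	delete(t.calls, h) // the counter is no longer needed
	return fn(input)
}

func main() {
	interpreted, native := 0, 0
	tier := NewTiering(3,
		func(code []byte) Compiled {
			return func(in []byte) []byte { native++; return in }
		},
		func(code, in []byte) []byte { interpreted++; return in },
	)
	var h CodeHash
	for i := 0; i < 5; i++ {
		tier.Run(h, nil, nil)
	}
	fmt.Println("interpreted:", interpreted, "native:", native)
}
```

With a threshold of 3 and five calls, the first two calls are interpreted and the remaining three run the compiled stub.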
A common objection is that gas metering implies a large overhead on the execution. While this is not untrue, George wasn't very worried about that (& I think he was right): it's just about adding an ADD instruction to a hot variable (or even better, dedicated register) after every instruction.
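To illustrate how small the metering overhead is, here is a toy interpreter loop for a four-opcode subset of the EVM. It is a sketch under simplifying assumptions: words are `uint64` rather than 256-bit, only `STOP`, `ADD`, `MUL` and `PUSH1` exist, and the names (`Run`, `gasCost`) are made up for this example. Metering is one addition and one comparison per instruction.

```go
package main

import (
	"errors"
	"fmt"
)

// A tiny subset of EVM opcodes.
const (
	opStop  = 0x00
	opAdd   = 0x01
	opMul   = 0x02
	opPush1 = 0x60
)

// gasCost is a per-opcode cost table (ADD and PUSH1 cost 3, MUL costs 5).
var gasCost = [256]uint64{opAdd: 3, opMul: 5, opPush1: 3}

var (
	ErrOutOfGas = errors.New("out of gas")
	ErrInvalid  = errors.New("invalid program")
)

// Run executes code and returns the top of the stack and the gas used.
func Run(code []byte, gasLimit uint64) (top uint64, gasUsed uint64, err error) {
	var stack []uint64
loop:
	for pc := 0; pc < len(code); pc++ {
		op := code[pc]

		// Gas metering: one add on a hot variable plus a limit check.
		gasUsed += gasCost[op]
		if gasUsed > gasLimit {
			return 0, gasUsed, ErrOutOfGas
		}

		switch op {
		case opStop:
			break loop
		case opPush1:
			if pc+1 >= len(code) {
				return 0, gasUsed, ErrInvalid
			}
			pc++ // the byte after PUSH1 is the immediate operand
			stack = append(stack, uint64(code[pc]))
		case opAdd, opMul:
			if len(stack) < 2 {
				return 0, gasUsed, ErrInvalid
			}
			a, b := stack[len(stack)-1], stack[len(stack)-2]
			stack = stack[:len(stack)-2]
			if op == opAdd {
				stack = append(stack, a+b)
			} else {
				stack = append(stack, a*b)
			}
		default:
			return 0, gasUsed, ErrInvalid
		}
	}
	if len(stack) == 0 {
		return 0, gasUsed, nil
	}
	return stack[len(stack)-1], gasUsed, nil
}

func main() {
	// (2 + 3) * 4
	code := []byte{opPush1, 2, opPush1, 3, opAdd, opPush1, 4, opMul, opStop}
	top, used, err := Run(code, 100)
	fmt.Println(top, used, err) // 20 17 <nil>

	_, used, err = Run(code, 10)
	fmt.Println(used, err) // 12 out of gas
}
```

A JIT would emit the same add-and-compare inline in the generated machine code, or batch it per basic block, which is why the overhead is not a fundamental obstacle.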
(norswap: This section has a little bit less of George's thinking and a bit more of my own thinking, though it is something we discussed and pretty much agreed on.)
Parallelizing the EVM is notoriously tricky, though very little has actually been attempted.
A "simple" way to do parallelization is to "optimistically" execute transactions over multiple threads, and have a main thread responsible for merging pre-executed transactions into the block one at a time if there is no state conflict.
- Each optimistic execution would have its own caching layer on top of the main cache.
- If we use our pure [hash → preimage] database, this layer can even be written down to the main cache (since hashes won't conflict), though this might be wasteful.
- This can get a bit messy if MEV extraction is involved and multiple orderings must be tried.
- Assuming the ordering of transactions is fixed, if a transaction's state conflicts with the state of the block so far (reading an already written variable), the transaction needs to be re-run by the main thread.
- ... but because the variables read were pulled up in the upper cache through the main cache, many values read might still be in the cache.
- This doesn't improve CPU performance, but maximizes CPU utilization by minimizing the time where the CPU is stalled waiting for I/O.
- In fact, even if we don't implement state merging logic, just running executions in parallel to warm up the cache might be worth it.
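The optimistic scheme above can be sketched as follows, assuming a fixed transaction ordering. This is a toy model, not Geth code: state is a plain `map[string]uint64`, a transaction is a Go function, and `View`, `ExecuteBlock` and `transfer` are names invented for the example. Each speculative run gets its own layer that records reads and buffers writes; the main thread then merges results in order and re-runs any transaction that read a key written earlier in the block.

```go
package main

import (
	"fmt"
	"sync"
)

// State is a toy stand-in for the account/storage state.
type State map[string]uint64

// View is the per-execution layer: it records which keys were read from
// the base state and buffers writes instead of applying them.
type View struct {
	base   State
	reads  map[string]struct{}
	writes map[string]uint64
}

func newView(base State) *View {
	return &View{base: base, reads: map[string]struct{}{}, writes: map[string]uint64{}}
}

func (v *View) Get(k string) uint64 {
	if val, ok := v.writes[k]; ok {
		return val // the transaction reads its own write
	}
	v.reads[k] = struct{}{}
	return v.base[k]
}

func (v *View) Set(k string, val uint64) { v.writes[k] = val }

// Tx is a transaction expressed as a function over a View.
type Tx func(v *View)

// ExecuteBlock runs all txs speculatively in parallel against the pre-block
// state, then merges them in order on the calling goroutine. It returns the
// number of transactions that had to be re-run because of a conflict.
func ExecuteBlock(state State, txs []Tx) (reruns int) {
	views := make([]*View, len(txs))
	var wg sync.WaitGroup
	for i, tx := range txs {
		views[i] = newView(state)
		wg.Add(1)
		go func(v *View, tx Tx) {
			defer wg.Done()
			tx(v) // only reads the shared state; writes stay in the view
		}(views[i], tx)
	}
	wg.Wait()

	written := map[string]struct{}{} // keys written so far in this block
	for i, tx := range txs {
		v := views[i]
		conflict := false
		for k := range v.reads {
			if _, ok := written[k]; ok {
				conflict = true
				break
			}
		}
		if conflict {
			v = newView(state) // re-run against the up-to-date state
			tx(v)
			reruns++
		}
		for k, val := range v.writes {
			state[k] = val
			written[k] = struct{}{}
		}
	}
	return reruns
}

// transfer moves amount between two balances (no balance check, for brevity).
func transfer(from, to string, amount uint64) Tx {
	return func(v *View) {
		v.Set(from, v.Get(from)-amount)
		v.Set(to, v.Get(to)+amount)
	}
}

func main() {
	state := State{"alice": 100, "bob": 50, "dave": 10}
	reruns := ExecuteBlock(state, []Tx{
		transfer("alice", "bob", 10),
		transfer("bob", "carol", 20), // reads bob, which the first tx writes
		transfer("dave", "erin", 5),
	})
	fmt.Println("reruns:", reruns)
	fmt.Println(state["alice"], state["bob"], state["carol"], state["dave"], state["erin"])
}
```

In this example only the second transfer is re-run, and the final balances match a purely sequential execution (90, 40, 20, 5, 5). In a real client the speculative reads would also pull the touched values into the shared cache, so even a re-run would mostly hit memory rather than disk.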