This repository has been archived by the owner on Nov 6, 2020. It is now read-only.
ethcore/client: fix deadlock caused by double-read lock and conflicting lock order #11766
Fixes #11176
This PR fixes deadlocks in ethcore/client caused by double read locks and conflicting lock order.
The first commit fixes the double read locks:
As explained in #11176, two read locks interleaved by a write lock with higher priority lead to a deadlock.
Four such cases are caused by `chain.read()` in `build_last_hashes()`, e.g.:
https://github.com/openethereum/openethereum/blob/1b0bbd50a4b75d25b52961f34dcccd5836a07c97/ethcore/src/client/client.rs#L2132-L2134
https://github.com/openethereum/openethereum/blob/1b0bbd50a4b75d25b52961f34dcccd5836a07c97/ethcore/src/client/client.rs#L933
https://github.com/openethereum/openethereum/blob/1b0bbd50a4b75d25b52961f34dcccd5836a07c97/ethcore/src/client/client.rs#L1299-L1304
Two cases are caused by `chain.read()` in `block_number_ref()`:
https://github.com/openethereum/openethereum/blob/1b0bbd50a4b75d25b52961f34dcccd5836a07c97/ethcore/src/client/client.rs#L1979-L1980
https://github.com/openethereum/openethereum/blob/1b0bbd50a4b75d25b52961f34dcccd5836a07c97/ethcore/src/client/client.rs#L2000
https://github.com/openethereum/openethereum/blob/1b0bbd50a4b75d25b52961f34dcccd5836a07c97/ethcore/src/client/client.rs#L1271-L1276
https://github.com/openethereum/openethereum/blob/1b0bbd50a4b75d25b52961f34dcccd5836a07c97/ethcore/src/client/client.rs#L1299-L1304
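A minimal sketch of the pattern behind these call sites, and of borrowing the caller's guard instead of locking again. The `Chain` and `Client` types and the method names here are hypothetical stand-ins, not the real `client.rs` code, and `std::sync::RwLock` is assumed only to keep the example dependency-free:

```rust
use std::sync::{RwLock, RwLockReadGuard};

// Hypothetical stand-in for the real chain type.
pub struct Chain {
    hashes: Vec<u64>,
}

// Hypothetical stand-in for the client that owns the chain lock.
pub struct Client {
    chain: RwLock<Chain>,
}

impl Client {
    pub fn new(hashes: Vec<u64>) -> Self {
        Client { chain: RwLock::new(Chain { hashes }) }
    }

    // Risky shape: the helper takes its own read lock. If the caller already
    // holds `chain.read()` and a writer queues up in between, this second
    // read may wait behind the writer while the writer waits for the caller.
    pub fn last_hash_relocking(&self) -> Option<u64> {
        let chain = self.chain.read().unwrap();
        chain.hashes.last().copied()
    }

    // Lock-propagating shape: the helper borrows the guard the caller
    // already holds, so no second lock acquisition happens here.
    pub fn last_hash_with_guard(&self, chain: &RwLockReadGuard<'_, Chain>) -> Option<u64> {
        chain.hashes.last().copied()
    }

    pub fn describe(&self) -> (usize, Option<u64>) {
        let chain = self.chain.read().unwrap();
        let last = self.last_hash_with_guard(&chain);
        // The guard was only borrowed, so it is still usable after the call.
        (chain.hashes.len(), last)
    }
}
```

Calling `last_hash_relocking()` on its own is fine; the hazard only appears when it is called while the same thread still holds a read guard on `chain`.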
The fix is to use lock propagation, as suggested by @niklasad1. I append `&RwLockReadGuard` to the parameter lists of `build_last_hashes()` and `block_number_ref()`. Note that the `&` is necessary: if we moved the guard instead, we could not access it after calling `build_last_hashes()`.
But after that, I found a call chain:
`check_and_lock_block` -> `chain.read()` -> `check_epoch_end_signal` -> `build_last_hashes`
So I simply `drop(chain)` before calling `check_epoch_end_signal` to avoid changing its function signature. This is in the second commit.
The second commit deals with conflicting lock order:
Most of them are related to the wrong order of locks involving `state_db` in `restore_db`, e.g.:
Some are related to `import_lock`, e.g.:
The fix is to pass the lock guard as a parameter to `import_old_block()`, which enforces the order between `import_lock` and `chain`/`db`.
An alternative is to add a comment requiring callers to hold `import_lock` before calling `import_old_block()`, and to remove the lock inside.