[NONEVM-984][solana] - Reorg Detection + lighter rpc call #951
base: develop
@@ -112,11 +130,8 @@ func (c *pendingTxContext) New(tx pendingTx, sig solana.Signature, cancel contex
return err
}

- // upgrade to write lock if sig or id do not exist
+ // upgrade to write lock if id do not exist
nit:
- // upgrade to write lock if id do not exist
+ // upgrade to write lock if id does not exist
return err
}

var pTx pendingTx
nit: I think we can move this declaration into the write-lock block, since we don't need it outside.
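For reference, a minimal sketch of the pattern these two nits touch, assuming a store guarded by a `sync.RWMutex`. The `txStore` type, its fields, and `ErrIDExists` are hypothetical stand-ins for illustration, not the code in this PR.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrIDExists is a hypothetical sentinel error used only in this sketch.
var ErrIDExists = errors.New("id already exists")

// txStore is a hypothetical stand-in for the pending-tx storage.
type txStore struct {
	mu  sync.RWMutex
	txs map[string]string // id -> payload
}

// Add inserts payload under id unless the id is already present.
func (s *txStore) Add(id, payload string) error {
	// Cheap existence check under the read lock.
	s.mu.RLock()
	_, exists := s.txs[id]
	s.mu.RUnlock()
	if exists {
		return ErrIDExists
	}

	// sync.RWMutex cannot be upgraded in place, so the read lock is
	// released first and the check is repeated under the write lock.
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, exists := s.txs[id]; exists {
		return ErrIDExists
	}

	// Values needed only for the write are declared here, inside the
	// write-locked section, rather than in the outer scope.
	entry := payload
	s.txs[id] = entry
	return nil
}

func main() {
	s := &txStore{txs: make(map[string]string)}
	fmt.Println(s.Add("tx-1", "payload")) // first insert succeeds
	fmt.Println(s.Add("tx-1", "payload")) // duplicate id is rejected
}
```

Repeating the existence check after taking the write lock matters because another goroutine may insert the same id between `RUnlock` and `Lock`.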
return
}

txm.lggr.Warnw("re-org detected for transaction", "txID", txInfo.id, "signature", sig, "previousStatus", txInfo.state, "currentStatus", currentTxState)
I think we can consider re-orgs part of the normal TXM processes, since they are an expected occurrence eventually. Warn might be too alarming here. I think Info or even Debug might be a better log level here.
Actually, let's do Debug to keep this the same as our expiration log.
}

// For regressions from "Confirmed, we'll need to rebroadcast the tx.
if regressionType == FromConfirmed {
IIRC, you mentioned that if we detect a re-org, we'd have to replace the blockhash in all cases, since the re-org implies that the blockhash we were using is invalid. Rebroadcasting with the same blockhash would then fail anyway. If that's the case, I think we can ditch this regressionType and just always assign a new blockhash. We could then also skip the specific confirmed re-orged tx log below.
Even if that's not an issue, we can maybe still consider keeping the same logic for both cases if it helps simplify things. What do you think?
Hey! Great catch with this one. This approach simplifies the design a lot. Let me test it first to ensure we don't unintentionally introduce races on Processed ones, though I have a good feeling about this.
To back up our thoughts, here are some relevant quotes from the Solana Docs:
What is a blockhash?
A blockhash refers to the last Proof of History (PoH) hash for a slot. Since Solana uses PoH as a trusted clock, a transaction's recent blockhash can be thought of as a timestamp.
Proof of History refresher
Solana's Proof of History mechanism uses a very long chain of recursive SHA-256 hashes to build a trusted clock. The “history” part of the name comes from the fact that block producers hash transaction id's into the stream to record which transactions were processed in their block.
PoH can be used as a trusted clock because each hash must be produced sequentially. Each produced block contains a blockhash and a list of hash checkpoints called “ticks” so that validators can verify the full chain of hashes in parallel and prove that some amount of time has actually passed.
How does transaction expiration work?
Each transaction includes a “recent blockhash” which is used as a PoH clock timestamp and expires when that blockhash is no longer “recent enough”.
As each block is finalized, the final hash of the block is added to the BlockhashQueue which stores a maximum of the 300 most recent blockhashes. During transaction processing, Solana Validators will check if each transaction's recent blockhash is recorded within the most recent 151 stored hashes (aka "max processing age"). If the transaction's recent blockhash is older than this max processing age, the transaction is not processed.
In conclusion, I have a good feeling this approach will be viable. If a block gets reorged, my understanding is that its blockhash will no longer be present in the mentioned BlockhashQueue. Therefore, any retry attempt using the previous blockhash would not be accepted, mitigating the risk of duplicate transactions. The only transaction that should go through would be the new one created with a fresh (and hopefully valid) blockhash.
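To make the argument above concrete, here is a toy model of the expiration check described in the quoted docs. It is only a sketch: `blockhashQueue`, `push`, `drop`, and `isRecent` are hypothetical names, and `drop` encodes the assumption stated above (that a re-orged block's hash leaves the queue), not confirmed validator behavior.

```go
package main

import "fmt"

const (
	maxQueueSize     = 300 // most recent blockhashes retained, per the quoted docs
	maxProcessingAge = 151 // most recent hashes a transaction may reference
)

// blockhashQueue is a simplified, hypothetical model of the queue described
// in the quoted docs. It is not the validator's implementation.
type blockhashQueue struct {
	hashes []string // oldest first
}

// push records a new blockhash and evicts the oldest one past the size cap.
func (q *blockhashQueue) push(hash string) {
	q.hashes = append(q.hashes, hash)
	if len(q.hashes) > maxQueueSize {
		q.hashes = q.hashes[1:]
	}
}

// drop removes a hash. It models the assumption that a re-orged block's
// hash is no longer present in the queue.
func (q *blockhashQueue) drop(hash string) {
	for i, h := range q.hashes {
		if h == hash {
			q.hashes = append(q.hashes[:i], q.hashes[i+1:]...)
			return
		}
	}
}

// isRecent reports whether hash is among the newest maxProcessingAge entries.
func (q *blockhashQueue) isRecent(hash string) bool {
	start := len(q.hashes) - maxProcessingAge
	if start < 0 {
		start = 0
	}
	for _, h := range q.hashes[start:] {
		if h == hash {
			return true
		}
	}
	return false
}

func main() {
	q := &blockhashQueue{}
	for i := 0; i < 200; i++ {
		q.push(fmt.Sprintf("hash-%d", i))
	}
	fmt.Println("newest hash accepted:", q.isRecent("hash-199"))
	fmt.Println("old hash accepted:", q.isRecent("hash-10"))

	// Under the assumption above, a re-org removes the block's hash, so a
	// retry that reuses it would be rejected.
	q.drop("hash-199")
	fmt.Println("re-orged hash accepted:", q.isRecent("hash-199"))
}
```

In this model, a rebroadcast that reuses the old blockhash fails the recency check, which is why only the transaction rebuilt with a fresh blockhash would be expected to land.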
// - Confirmed -> Processed || Broadcasted || Not Found
// - Processed -> Broadcasted || Not Found
currentTxState := convertStatus(status)
if regressionType, isRegressed := isStatusRegression(txInfo.state, currentTxState); isRegressed {
If we don't need regressionType anymore based on my other comment, I think we can just use TxHasReorg for this and eliminate GetSignatureInfo and UpdateSignatureStatus to minimize the number of storage methods we use.
- We could rework TxHasReorg to take in a sig and a current state and return the id and a bool. That way we can do the isStatusRegression check on the storage side and drop GetSignatureInfo.
- We could also update TxHasReorg to look up the id in the sig map and use it to fetch the tx. Since we'd have access to the current sig and its new state, I don't think we'd have to call UpdateSignatureStatus. We'd be dropping all old sigs anyway.
I like this
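A rough sketch of what the reworked storage method suggested above could look like. The names `TxHasReorg` and `isStatusRegression` come from this thread, but the signatures, the `txState` ordering, the `pendingTxStore` type, and the use of plain strings for signatures and ids are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// txState ordering is an assumption of this sketch: a higher value means
// the transaction is further along.
type txState int

const (
	NotFound txState = iota
	Broadcasted
	Processed
	Confirmed
	Finalized
)

// isStatusRegression reports the two regressions listed in the diff:
//   - Confirmed -> Processed || Broadcasted || Not Found
//   - Processed -> Broadcasted || Not Found
func isStatusRegression(previous, current txState) bool {
	switch previous {
	case Confirmed, Processed:
		return current < previous
	default:
		return false
	}
}

// sigInfo is a hypothetical record kept per signature.
type sigInfo struct {
	id    string
	state txState
}

// pendingTxStore is a hypothetical stand-in for the storage layer.
type pendingTxStore struct {
	mu        sync.RWMutex
	sigToInfo map[string]sigInfo
}

// TxHasReorg looks up sig, compares its stored state with the current
// on-chain state, and returns the tx id plus whether a regression occurred.
// The caller needs no separate getter or status-update call.
func (s *pendingTxStore) TxHasReorg(sig string, current txState) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	info, ok := s.sigToInfo[sig]
	if !ok {
		return "", false
	}
	return info.id, isStatusRegression(info.state, current)
}

func main() {
	store := &pendingTxStore{sigToInfo: map[string]sigInfo{
		"sigA": {id: "tx-1", state: Confirmed},
	}}
	id, reorged := store.TxHasReorg("sigA", Processed)
	fmt.Println(id, reorged)
}
```

With the comparison done on the storage side, the caller only branches on the returned bool and can always rebuild the transaction with a fresh blockhash.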
status, errGetStatus = txmInstance.GetTransactionStatus(ctx, txID)
require.NoError(t, errGetStatus)
require.Equal(t, types.Finalized, status, "tx should be finalized after reorg")
})
Could you also include a test case where we detect a re-org from the Processed state?
I couldn’t achieve this because the local live validator transitions to Confirmed instantly. I can try spamming a large number of transactions to overwhelm it, which might delay the transition and provide a window to stop it.