[NONEVM-984][solana] - Reorg Detection + lighter rpc call #951

Farber98 · 2024-11-28T22:34:06Z

Description

Track transaction statuses on a per-signature basis so we can identify which of a transaction's signatures was included and detect re-orgs for that specific signature
Update the confirmation logic to detect re-orgs by identifying a regression in a signature's transaction status when no status or a processed status is received
- A transaction can revert from confirmed to processed or to no status
- A transaction can revert from processed to no status
If a re-org is detected
- from confirmed, restart the retry/bump loop
- from processed, we do nothing; it is handled by the expiration rebroadcast later if needed
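
The regression rules above can be sketched in Go as follows. This is a minimal illustration, not the PR's implementation: `isStatusRegression` and `FromConfirmed` appear in the diff under review, but their bodies are not shown there, so the state ordering, the `TxState` and `RegressionType` types, and `FromProcessed` are assumptions made for this example.

```go
package main

import "fmt"

// TxState is a hypothetical ordering of statuses, from least to most advanced.
type TxState int

const (
	NotFound TxState = iota
	Broadcasted
	Processed
	Confirmed
	Finalized
)

// RegressionType records which status a signature regressed from.
// FromProcessed is an assumed name; only FromConfirmed appears in the diff.
type RegressionType int

const (
	NoRegression RegressionType = iota
	FromProcessed
	FromConfirmed
)

// isStatusRegression reports whether current is a step backwards from previous.
// Only Confirmed and Processed can regress; Finalized is treated as terminal.
func isStatusRegression(previous, current TxState) (RegressionType, bool) {
	switch {
	case previous == Confirmed && current < Confirmed:
		return FromConfirmed, true
	case previous == Processed && current < Processed:
		return FromProcessed, true
	default:
		return NoRegression, false
	}
}

func main() {
	if kind, ok := isStatusRegression(Confirmed, NotFound); ok && kind == FromConfirmed {
		fmt.Println("re-org from confirmed: restart the retry/bump loop")
	}
	if kind, ok := isStatusRegression(Processed, NotFound); ok && kind == FromProcessed {
		fmt.Println("re-org from processed: leave it to the expiration rebroadcast")
	}
}
```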

Tickets

Soak Testing

[NONEVM-984][SOAK] - Reorg Detection #969


cl-sonarqube-production · 2025-01-02T19:33:56Z

Quality Gate passed

Issues
2 New issues
0 Fixed issues
0 Accepted issues

Measures
0 Security Hotspots
79.9% Coverage on New Code
1.7% Duplication on New Code

See analysis details on SonarQube

amit-momin · 2025-01-02T20:43:54Z

pkg/solana/txm/pendingtx.go

@@ -112,11 +130,8 @@ func (c *pendingTxContext) New(tx pendingTx, sig solana.Signature, cancel contex
 		return err
 	}

-	// upgrade to write lock if sig or id do not exist
+	// upgrade to write lock if id do not exist


nit:

Suggested change

-	// upgrade to write lock if id do not exist
+	// upgrade to write lock if id does not exist

amit-momin · 2025-01-02T20:50:13Z

pkg/solana/txm/pendingtx.go

+		return err
+	}
+
+	var pTx pendingTx


nit: think we can move this into the write lock block since we don't really need it outside
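
For context, the "upgrade to write lock" pattern these two comments refer to can be sketched as below. This is a hedged illustration only: `txStore`, `newTxStore`, and `errIDExists` are hypothetical names, and the real `pendingTxContext` holds more state than this. The point is that `sync.RWMutex` has no in-place upgrade, so the check is repeated after taking the write lock, and anything needed only for the write can be declared inside that section.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errIDExists is a hypothetical error for this sketch.
var errIDExists = errors.New("id already exists")

type pendingTx struct{ id string }

// txStore is a simplified stand-in for the PR's pending transaction storage.
type txStore struct {
	lock sync.RWMutex
	txs  map[string]pendingTx
}

func newTxStore() *txStore {
	return &txStore{txs: make(map[string]pendingTx)}
}

// New stores tx if its id is not already present.
func (s *txStore) New(tx pendingTx) error {
	// Cheap existence check under the read lock.
	s.lock.RLock()
	_, exists := s.txs[tx.id]
	s.lock.RUnlock()
	if exists {
		return errIDExists
	}

	// RWMutex cannot be upgraded in place: release the read lock, take the
	// write lock, and re-check, since another goroutine may have inserted
	// the same id in between.
	s.lock.Lock()
	defer s.lock.Unlock()
	// Anything needed only for the write belongs in this section.
	if _, exists := s.txs[tx.id]; exists {
		return errIDExists
	}
	s.txs[tx.id] = tx
	return nil
}

func main() {
	s := newTxStore()
	fmt.Println(s.New(pendingTx{id: "tx-1"}))
	fmt.Println(s.New(pendingTx{id: "tx-1"}))
}
```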

amit-momin · 2025-01-02T21:03:29Z

pkg/solana/txm/txm.go

+			return
+		}
+
+		txm.lggr.Warnw("re-org detected for transaction", "txID", txInfo.id, "signature", sig, "previousStatus", txInfo.state, "currentStatus", currentTxState)


I think we can treat re-orgs as part of normal TXM processing, since they are expected to occur eventually. Warn might be too alarming here; Info or even Debug might be a better log level.

Actually, let's do Debug to keep this the same as our expiration log.

amit-momin · 2025-01-02T21:19:24Z

pkg/solana/txm/txm.go

+		}
+
+		// For regressions from "Confirmed, we'll need to rebroadcast the tx.
+		if regressionType == FromConfirmed {


IIRC, you mentioned that if we detect a re-org, we'd have to replace the blockhash in all cases, since the re-org implies that the blockhash we were using is invalid. Rebroadcasting with the same blockhash would then fail anyway. If that's the case, I think we can ditch this regressionType and always assign a new blockhash. We could then also skip the specific confirmed re-orged tx log below.

Even if that's not an issue, we could still consider using the same logic for both cases if it helps simplify things. What do you think?

Hey! Great catch with this one. This approach simplifies the design a lot. Let me test it first to ensure we don’t unintentionally introduce races on Processed ones, though I have a good feeling about this.

To back up our thoughts, here are some relevant quotes from the Solana Docs:

What is a blockhash?

A blockhash refers to the last Proof of History (PoH) hash for a slot. Since Solana uses PoH as a trusted clock, a transaction's recent blockhash can be thought of as a timestamp.

Proof of History refresher

Solana's Proof of History mechanism uses a very long chain of recursive SHA-256 hashes to build a trusted clock. The “history” part of the name comes from the fact that block producers hash transaction id's into the stream to record which transactions were processed in their block.

PoH can be used as a trusted clock because each hash must be produced sequentially. Each produced block contains a blockhash and a list of hash checkpoints called “ticks” so that validators can verify the full chain of hashes in parallel and prove that some amount of time has actually passed.

How does transaction expiration work?

Each transaction includes a “recent blockhash” which is used as a PoH clock timestamp and expires when that blockhash is no longer “recent enough”.

As each block is finalized, the final hash of the block is added to the BlockhashQueue which stores a maximum of the 300 most recent blockhashes. During transaction processing, Solana Validators will check if each transaction's recent blockhash is recorded within the most recent 151 stored hashes (aka "max processing age"). If the transaction's recent blockhash is older than this max processing age, the transaction is not processed.

In conclusion, I have a good feeling this approach will be viable. If a block gets reorged, my understanding is that its blockhash will no longer be present in the mentioned BlockhashQueue. Therefore, any retry attempt using the previous blockhash would not be accepted, mitigating the risk of duplicate transactions. The only transaction that should go through would be the new one created with a fresh (and hopefully valid) blockhash.
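
The expiration rule quoted above can be illustrated with a toy model. This is a sketch of the quoted description only (a queue capped at 300 hashes, of which the most recent 151 are accepted), not validator code; `blockhashQueue`, `isProcessable`, and `filledQueue` are hypothetical names. Whether a re-orged block's hash really disappears from the queue is the understanding stated in the comment above and is not verified here; the sketch only shows that a hash absent from the queue is rejected.

```go
package main

import "fmt"

const (
	maxQueueSize     = 300 // blockhashes retained, per the quoted docs
	maxProcessingAge = 151 // most recent hashes that are still accepted
)

// blockhashQueue is a toy stand-in for the BlockhashQueue, oldest hash first.
type blockhashQueue struct {
	hashes []string
}

// add appends a new blockhash and evicts the oldest ones beyond the cap.
func (q *blockhashQueue) add(hash string) {
	q.hashes = append(q.hashes, hash)
	if len(q.hashes) > maxQueueSize {
		q.hashes = q.hashes[len(q.hashes)-maxQueueSize:]
	}
}

// isProcessable reports whether hash is among the most recent
// maxProcessingAge entries of the queue.
func (q *blockhashQueue) isProcessable(hash string) bool {
	start := len(q.hashes) - maxProcessingAge
	if start < 0 {
		start = 0
	}
	for _, h := range q.hashes[start:] {
		if h == hash {
			return true
		}
	}
	return false
}

// filledQueue builds a queue containing hash-0 .. hash-(n-1).
func filledQueue(n int) *blockhashQueue {
	q := &blockhashQueue{}
	for i := 0; i < n; i++ {
		q.add(fmt.Sprintf("hash-%d", i))
	}
	return q
}

func main() {
	q := filledQueue(400)
	fmt.Println("newest hash accepted:", q.isProcessable("hash-399"))
	fmt.Println("old hash accepted:", q.isProcessable("hash-200"))
	fmt.Println("hash from another fork accepted:", q.isProcessable("hash-reorged"))
}
```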

amit-momin · 2025-01-02T21:36:26Z

pkg/solana/txm/txm.go

+	// - Confirmed -> Processed || Broadcasted || Not Found
+	// - Processed -> Broadcasted || Not Found
+	currentTxState := convertStatus(status)
+	if regressionType, isRegressed := isStatusRegression(txInfo.state, currentTxState); isRegressed {


If we don't need regressionType anymore based on my other comment, I think we can just use TxHasReorg for this and eliminate GetSignatureInfo and UpdateSignatureStatus to minimize the number of storage methods we use.

We could rework TxHasReorg to take in a sig and a current state and return the id and bool. That way we can do the isStatusRegression on the storage side and we can drop GetSignatureInfo

We could also update TxHasReorg to look up the id in the sig map and use it to fetch the tx. Since we'd have access to the current sig and its new state, I don't think we'd have to call UpdateSignatureStatus. We'd be dropping all old sigs anyway.

I like this
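
One possible shape for the reworked `TxHasReorg` proposed in this thread is sketched below. The signature (a sig and a current state in, an id and a bool out) follows the comment above, but everything else is an assumption: `pendingTxStore`, `sigInfo`, the `TxState` ordering, and the choice not to update the stored status are all hypothetical and do not reflect the merged code.

```go
package main

import (
	"fmt"
	"sync"
)

// TxState is a hypothetical ordering of statuses, from least to most advanced.
type TxState int

const (
	NotFound TxState = iota
	Broadcasted
	Processed
	Confirmed
	Finalized
)

// sigInfo is the per-signature record: owning tx id and last seen state.
type sigInfo struct {
	id    string
	state TxState
}

// pendingTxStore is a simplified stand-in for the PR's storage layer.
type pendingTxStore struct {
	lock      sync.RWMutex
	sigToInfo map[string]sigInfo
}

// TxHasReorg looks up sig, compares its stored state with current, and
// returns the owning tx id plus whether the status regressed. The regression
// check lives on the storage side, so the caller needs no separate getter.
func (s *pendingTxStore) TxHasReorg(sig string, current TxState) (string, bool) {
	s.lock.RLock()
	defer s.lock.RUnlock()
	info, ok := s.sigToInfo[sig]
	if !ok {
		return "", false
	}
	// Only Processed and Confirmed can regress in this sketch.
	regressed := (info.state == Processed || info.state == Confirmed) && current < info.state
	return info.id, regressed
}

func main() {
	store := &pendingTxStore{sigToInfo: map[string]sigInfo{
		"sig-1": {id: "tx-1", state: Confirmed},
	}}
	id, reorged := store.TxHasReorg("sig-1", NotFound)
	fmt.Println(id, reorged)
}
```

The stored status is deliberately left untouched here, matching the suggestion that old signatures are dropped once a re-org is handled.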

amit-momin · 2025-01-02T21:41:36Z

pkg/solana/txm/txm_integration_test.go

+		status, errGetStatus = txmInstance.GetTransactionStatus(ctx, txID)
+		require.NoError(t, errGetStatus)
+		require.Equal(t, types.Finalized, status, "tx should be finalized after reorg")
+	})


Could you also include a test case where we detect a re-org from Processed state?

I couldn’t achieve this because the local live validator transitions to Confirmed instantly. I can try spamming a large number of transactions to overwhelm it, which might delay the transition and provide a window to stop it.

Farber98 added 30 commits November 26, 2024 17:30

refactor so txm owns blockhash assignment

2d1a82d

lastValidBlockHeight shouldn't be exported

50dfef0

better comment

4e545e2

refactor sendWithRetry to make it clearer

4ded53c

confirm loop refactor

9e1be6d

fix infinite loop

7dd2028

move accountID inside msg

6c675f2

lint fix

b0d9426

base58 does not contain lower l

1b38665

fix hash errors

6923ddf

fix generate random hash

462844b

remove blockhash as we only need block height

fd785d0

expired tx changes without tests

cf958a4

add maybe to mocks

c5e957b

expiration tests

a505993

send txes through queue

adc8b1c

revert pendingtx leakage of information. overwrite blockhash

7d77f99

fix order of confirm loop and not found signature check

92a280b

fix mocks

2598e19

prevent confirmation loop to mark tx as errored when it needs to be rebroadcasted

42b3da1

fix test

89af1f3

fix pointer

5e8a0da

add comments

75c1dcd

reduce rpc calls + refactors

4ff2d23

tests + check to save rpc calls

84e423e

address feedback + remove redundant impl

7d8319e

iface comment

68f3a3e

address feedback on compute unit limit and lastValidBlockHeight assignment

780179f

blockhash assignment inside txm.sendWithRetry

98f0246

address feedback

cbf55f6

Farber98 had a problem deploying to integration December 23, 2024 18:51 — with GitHub Actions Error

Farber98 added 2 commits December 23, 2024 15:53

fix integration tests

d94a2c9

Merge branch 'develop' into nonevm-984-reorg

da1ee68

Farber98 temporarily deployed to integration December 23, 2024 18:55 — with GitHub Actions Inactive

Farber98 had a problem deploying to integration December 23, 2024 19:00 — with GitHub Actions Failure

Farber98 temporarily deployed to integration December 23, 2024 19:12 — with GitHub Actions Inactive

Merge branch 'develop' into nonevm-984-reorg

6c08a75

Farber98 temporarily deployed to integration January 2, 2025 18:38 — with GitHub Actions Inactive

Farber98 temporarily deployed to integration January 2, 2025 18:39 — with GitHub Actions Inactive

Farber98 temporarily deployed to integration January 2, 2025 18:45 — with GitHub Actions Inactive

remove unused params and better comments

cc50224

Farber98 temporarily deployed to integration January 2, 2025 19:22 — with GitHub Actions Inactive

Farber98 temporarily deployed to integration January 2, 2025 19:27 — with GitHub Actions Inactive

Farber98 marked this pull request as ready for review January 2, 2025 19:40

Farber98 requested a review from a team as a code owner January 2, 2025 19:40

amit-momin reviewed Jan 2, 2025


