Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: missing signers' votes #1243

Open
wants to merge 10 commits into
base: main
Choose a base branch
from
Open

fix: missing signers' votes #1243

wants to merge 10 commits into from

Conversation

Jiloc
Copy link
Collaborator

@Jiloc Jiloc commented Jan 20, 2025

Description

Closes: #1237

Changes

  • add new deposit_decisions_retry_window param to the signers' settings (defaults to 3)
  • add query to get the all the DepositSigner entries from the DB from the last N blocks
  • add resend mechanism in the request_decider to resend all the Requests of the last 3 blocks when a new bitcoin block arrives

Testing Information

  • add query integration tests

Checklist:

  • I have performed a self-review of my code
  • My changes generate no new warnings
  • New and existing unit tests pass locally with my changes

@Jiloc Jiloc self-assigned this Jan 20, 2025
Comment on lines 1925 to 1971
async fn get_deposit_signer_decisions(
&self,
chain_tip: &model::BitcoinBlockHash,
context_window: u16,
signer_public_key: &PublicKey,
) -> Result<Vec<model::DepositSigner>, Error> {
sqlx::query_as::<_, model::DepositSigner>(
r#"
WITH RECURSIVE context_window AS (
-- Anchor member: Initialize the recursion with the chain tip
SELECT block_hash, block_height, parent_hash, created_at, 1 AS depth
FROM sbtc_signer.bitcoin_blocks
WHERE block_hash = $1

UNION ALL

-- Recursive member: Fetch the parent block using the last block's parent_hash
SELECT parent.block_hash, parent.block_height, parent.parent_hash,
parent.created_at, last.depth + 1
FROM sbtc_signer.bitcoin_blocks parent
JOIN context_window last ON parent.block_hash = last.parent_hash
WHERE last.depth < $2
),
transactions_in_window AS (
SELECT transactions.txid
FROM context_window blocks_in_window
JOIN sbtc_signer.bitcoin_transactions transactions ON
transactions.block_hash = blocks_in_window.block_hash
)
SELECT
deposit_signers.txid
, deposit_signers.output_index
, deposit_signers.signer_pub_key
, deposit_signers.can_sign
, deposit_signers.can_accept
FROM transactions_in_window transactions
JOIN sbtc_signer.deposit_signers USING (txid)
WHERE deposit_signers.signer_pub_key = $3
"#,
)
.bind(chain_tip)
.bind(i32::from(context_window))
.bind(signer_public_key)
.fetch_all(&self.0)
.await
.map_err(Error::SqlxQuery)
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A simple optimization could be to specialize a bit the query and only get decision of the deposits which have less than the total number of signers entries in the table.

Copy link
Collaborator

@matteojug matteojug Jan 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is doing something slightly different from what we initially talked about: here we pick the deposits that were confirmed (on the canonical chain) in the last three blocks, but this means that if we fetch from Emily a deposit that was already confirmed but submitted (to emily) only recently we may not retry it.
This is fine wrt the original issue (signers getting a vote for a viewed-as-unconfirmed deposit), but if we were to have a real retry 3 times approach we would also add redundancy to lost votes messages (for other, unrelated reasons) -- but it would require adding more info somewhere else in the db, so maybe it's not really worth it.

Also a possible simplification here could be to ignore the canonical chain, but given the context window is so small the recursive part shouldn't be an issue.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great thinking on the edge case of missing older transactions! A simple approach that avoids modifying the database could be to use the created_at column of the bitcoin block in the bitcoin_blocks table relative to the selected context_window. With each new block, we can retrieve the creation time of the block received N-updates ago and fetch all decisions with a created_at greater than that block's creation time. The only downside is that if blocks arrive very close to each other, we might end up retrying a vote an additional time. What do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh that's a nice way to avoid extra stuff! And an extra retry shouldn't really be an issue, we would just send 4x more votes message but I don't think that's overwhelming -- but maybe we should double check, just in case.
I'm okay with leveraging created_at, but maybe let's check also with @djordon or @cylewitruk , just to be sure it's not an overkill or we are missing some other corner case.

chain_tip: &BitcoinBlockHash,
) -> Result<(), Error> {
for decision in decisions.into_iter().map(SignerDepositDecision::from) {
self.send_message(decision, chain_tip).await?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this retry is a "best effort" type of thing, should we just log the error in here instead of bailing at the first one?

Comment on lines 1925 to 1971
async fn get_deposit_signer_decisions(
&self,
chain_tip: &model::BitcoinBlockHash,
context_window: u16,
signer_public_key: &PublicKey,
) -> Result<Vec<model::DepositSigner>, Error> {
sqlx::query_as::<_, model::DepositSigner>(
r#"
WITH RECURSIVE context_window AS (
-- Anchor member: Initialize the recursion with the chain tip
SELECT block_hash, block_height, parent_hash, created_at, 1 AS depth
FROM sbtc_signer.bitcoin_blocks
WHERE block_hash = $1

UNION ALL

-- Recursive member: Fetch the parent block using the last block's parent_hash
SELECT parent.block_hash, parent.block_height, parent.parent_hash,
parent.created_at, last.depth + 1
FROM sbtc_signer.bitcoin_blocks parent
JOIN context_window last ON parent.block_hash = last.parent_hash
WHERE last.depth < $2
),
transactions_in_window AS (
SELECT transactions.txid
FROM context_window blocks_in_window
JOIN sbtc_signer.bitcoin_transactions transactions ON
transactions.block_hash = blocks_in_window.block_hash
)
SELECT
deposit_signers.txid
, deposit_signers.output_index
, deposit_signers.signer_pub_key
, deposit_signers.can_sign
, deposit_signers.can_accept
FROM transactions_in_window transactions
JOIN sbtc_signer.deposit_signers USING (txid)
WHERE deposit_signers.signer_pub_key = $3
"#,
)
.bind(chain_tip)
.bind(i32::from(context_window))
.bind(signer_public_key)
.fetch_all(&self.0)
.await
.map_err(Error::SqlxQuery)
}
Copy link
Collaborator

@matteojug matteojug Jan 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is doing something slightly different from what we initially talked about: here we pick the deposits that were confirmed (on the canonical chain) in the last three blocks, but this means that if we fetch from Emily a deposit that was already confirmed but submitted (to emily) only recently we may not retry it.
This is fine wrt the original issue (signers getting a vote for a viewed-as-unconfirmed deposit), but if we were to have a real retry 3 times approach we would also add redundancy to lost votes messages (for other, unrelated reasons) -- but it would require adding more info somewhere else in the db, so maybe it's not really worth it.

Also a possible simplification here could be to ignore the canonical chain, but given the context window is so small the recursive part shouldn't be an issue.

Comment on lines 1933 to 1947
WITH RECURSIVE context_window AS (
-- Anchor member: Initialize the recursion with the chain tip
SELECT block_hash, block_height, parent_hash, created_at, 1 AS depth
FROM sbtc_signer.bitcoin_blocks
WHERE block_hash = $1

UNION ALL

-- Recursive member: Fetch the parent block using the last block's parent_hash
SELECT parent.block_hash, parent.block_height, parent.parent_hash,
parent.created_at, last.depth + 1
FROM sbtc_signer.bitcoin_blocks parent
JOIN context_window last ON parent.block_hash = last.parent_hash
WHERE last.depth < $2
),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use bitcoin_blockchain_of here?

signer/src/request_decider.rs Show resolved Hide resolved
signer/tests/integration/postgres.rs Outdated Show resolved Hide resolved
signer/tests/integration/postgres.rs Outdated Show resolved Hide resolved
signer/tests/integration/postgres.rs Outdated Show resolved Hide resolved
signer/tests/integration/postgres.rs Outdated Show resolved Hide resolved
signer/tests/integration/postgres.rs Outdated Show resolved Hide resolved
Comment on lines 1440 to 1448
// Test data contains 5 deposit requests, we should get decisions for
// the last 3.
for deposit in test_data.deposit_requests[2..].iter() {
assert!(deposit_decisions.iter().any(|decision| {
decision.txid == deposit.txid
&& decision.output_index == deposit.output_index
&& decision.signer_pub_key == *signer_pub_key
}));
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't always work: TestData generates a chain with forks, so picking the last 3 deposits requests doesn't guarantee that those 3 are all on the canonical chain; it does work with 51 as random seed, but if you change that to 52 the test fails.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: Missing Signer Votes
2 participants