-
Notifications
You must be signed in to change notification settings - Fork 3
Open
Description
Summary
The AccountManager in the onchain publisher has a race condition that causes publish-nodes tasks to fail with TypeError: Cannot read properties of undefined (reading 'address') when all accounts are removed from the pool during high-concurrency scenarios.
Problem
When multiple transactions fail simultaneously, the account pool can become empty, causing subsequent transaction registrations to fail.
Error Observed
TypeError: Cannot read properties of undefined (reading 'address')
at accounts.js:41:46
Root Cause
In apps/backend/src/infrastructure/services/upload/onchainPublisher/accounts.ts:
const getNextAccount = () => {
const account = accounts[trxCounter % accounts.length] // undefined when accounts.length === 0
trxCounter++
return account
}
const removeAccount = (address: string) => {
const index = accounts.findIndex((account) => account.address === address)
if (index !== -1) {
accounts.splice(index, 1)
}
if (accounts.length === 0) {
return initAccounts() // async, but not awaited by callers
}
}Issues:
getNextAccount()returnsundefinedwhenaccounts.length === 0(division by zero edge case)removeAccount()callsinitAccounts()asynchronously but doesn't block new registrations while accounts are being reloaded- Multiple concurrent
removeAccount()calls can race to empty the array
Trigger Conditions
- High message throughput (
RABBITMQ_PREFETCH=100) - High transaction concurrency (
MAX_CONCURRENT_UPLOADS=40) - Substrate RPC returns nonce collision errors (
1014: Priority is too low) - Failed transactions trigger
removeAccount(), eventually depleting the pool
Proposed Solution
export const createAccountManager = async (api: ApiPromise) => {
let accounts = getAccounts()
let isReinitializing = false
const uniqueExec = pLimit(1)
const nonceByAccount: Record<string, number> = {}
let trxCounter = 0
const initAccounts = () =>
uniqueExec(async () => {
isReinitializing = true
try {
accounts = getAccounts()
const promises = accounts.map(async (account) => {
const nonce = await getOnChainNonce(api, account.address)
nonceByAccount[account.address] = nonce
})
await Promise.all(promises)
} finally {
isReinitializing = false
}
})
const getNextAccount = () => {
if (accounts.length === 0) {
throw new Error('No accounts available')
}
const account = accounts[trxCounter % accounts.length]
trxCounter++
return account
}
const registerTransaction = () =>
uniqueExec(async () => {
// Wait for reinitialization if in progress
if (accounts.length === 0 || isReinitializing) {
await initAccounts()
}
const account = getNextAccount()
const nonce = nonceByAccount[account.address]
nonceByAccount[account.address] = nonce + 1
return { account, nonce }
})
const removeAccount = async (address: string) => {
return uniqueExec(async () => {
const index = accounts.findIndex((account) => account.address === address)
if (index !== -1) {
accounts.splice(index, 1)
}
if (accounts.length === 0) {
await initAccounts()
}
})
}
await initAccounts()
return { registerTransaction, removeAccount }
}Key Changes
- Guard against empty array:
getNextAccount()throws explicitly when no accounts available - Track reinitialization state:
isReinitializingflag prevents new transactions during account reload - Make
removeAccountasync: Ensure callers can await the reinitialization - Serialize all operations: Use
uniqueExecconsistently to prevent race conditions
Alternative Considerations
- Circuit breaker pattern: Stop accepting new tasks when error rate exceeds threshold
- Retry with backoff: Add exponential backoff before reinitializing accounts
- Separate nonce tracking: Query fresh nonce from chain instead of local tracking (slower but more reliable)
- Account balance monitoring: Alert when any publishing account balance falls below threshold
- Never fully deplete pool: Keep at least 1 account active even on failures, or pause processing
Testing
- Unit test:
registerTransactionwhen accounts array is empty - Unit test: concurrent
removeAccountcalls - Unit test:
registerTransactionduringinitAccountsexecution - Integration test: high-concurrency publish-nodes with simulated RPC failures
Metadata
Metadata
Assignees
Labels
No labels