Don't cascade reconnect requests on cascading failures

The way SerialFallbackProvider's fallback logic was written, if 8 requests came in while a provider had recently closed, every one of them would trigger a reconnect, cascading through any fallback providers and ultimately creating a storm of reconnects. Across multiple serial fallback providers the problem was magnified, creating a tremendous volume of connection/reconnection loops whenever one provider failed.

Some very light tweaks now check whether a retry/reconnect is already in progress. If one is, the request is retried on the new provider. If one has not yet been initiated, it is triggered. This ensures that only one reconnect is attempted at a time.

Still missing is any sort of backoff if all providers fail.
tahowallet · Dec 23, 2024 · 8490364
1 parent 9b914b2
Showing 1 changed file with 17 additions and 1 deletion.
diff --git a/background/services/chain/serial-fallback-provider.ts b/background/services/chain/serial-fallback-provider.ts
@@ -383,6 +383,7 @@ export default class SerialFallbackProvider extends JsonRpcProvider {
           (this.currentProvider._pendingBatch as { length: number } | undefined)
         : undefined
     const pendingBatchSize = pendingBatch?.length
+    const existingProviderIndex = this.currentProviderIndex
 
     if (
       pendingBatch &&
@@ -514,6 +515,19 @@ export default class SerialFallbackProvider extends JsonRpcProvider {
           /WebSocket is already in CLOSING|bad response|missing response|we can't execute this request|failed response|TIMEOUT|NETWORK_ERROR/,
         )
       ) {
+        // If a new provider is already in the process of being tried, go ahead
+        // and fire off into the new provider.
+        if (this.currentProviderIndex !== existingProviderIndex) {
+          logger.debug(
+            "Retrying on newly connected provider on chain",
+            this.chainID,
+            ": ",
+            method,
+            params,
+          )
+          return await this.routeRpcCall(messageId)
+        }
+
         // If there is another provider to try - try to send the message on that provider
         if (this.currentProviderIndex + 1 < this.providerCreators.length) {
           return await this.attemptToSendMessageOnNewProvider(messageId)
@@ -534,6 +548,8 @@ export default class SerialFallbackProvider extends JsonRpcProvider {
         stringifiedError.match(/bad result from backend/)
       ) {
         if (
+          // If the current provider is the one we tried with initially.
+          this.currentProviderIndex === existingProviderIndex &&
           // If there is another provider to try and we have exceeded the
           // number of retries try to send the message on that provider
           this.currentProviderIndex + 1 < this.providerCreators.length &&
@@ -701,7 +717,7 @@ export default class SerialFallbackProvider extends JsonRpcProvider {
       // If every other provider failed and we're on the alchemy provider,
       // reconnect to the first provider once we've handled this request
       // as we should limit relying on alchemy as a fallback
-      if (isAlchemyFallback) {
+      if (isAlchemyFallback && this.currentProviderIndex !== 0) {
         this.currentProviderIndex = 0
         this.reconnectProvider()
       }
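
The guard described in the commit message can be sketched in isolation as follows. This is a simplified, synchronous model rather than the actual SerialFallbackProvider code: `FallbackIndexGuard`, `snapshot`, `onFailure`, and the `FailureAction` values are hypothetical names introduced for illustration, and the "give-up" branch stands in for whatever the real provider does once no further fallback remains. The idea it shows is the one in the diff: remember the provider index when a request is sent, and on failure only advance to the next provider if nobody else has advanced in the meantime.

```typescript
// Hypothetical names throughout; this is not the SerialFallbackProvider API.
type FailureAction = "retry-on-current" | "advance-to-next" | "give-up"

class FallbackIndexGuard {
  private currentIndex = 0
  private readonly providerCount: number
  // Counts how many times a switch to the next provider was initiated.
  reconnects = 0

  constructor(providerCount: number) {
    this.providerCount = providerCount
  }

  // Called when a request is sent: remember which provider it went to.
  snapshot(): number {
    return this.currentIndex
  }

  // Called when that request fails with a connection-level error.
  onFailure(indexAtSendTime: number): FailureAction {
    // Another request already moved us to a different provider, so
    // retry there instead of starting a second reconnect.
    if (this.currentIndex !== indexAtSendTime) {
      return "retry-on-current"
    }
    // We are the first to notice the failure: advance exactly once.
    if (this.currentIndex + 1 < this.providerCount) {
      this.currentIndex += 1
      this.reconnects += 1
      return "advance-to-next"
    }
    // Simplification: no further provider to fall back to.
    return "give-up"
  }
}

// Eight requests are sent to provider 0, which then fails for all of them.
const guard = new FallbackIndexGuard(3)
const snapshots = Array.from({ length: 8 }, () => guard.snapshot())
const actions = snapshots.map((index) => guard.onFailure(index))
```

In this run the first failure advances to the next provider and the remaining seven are routed to it as retries, so `guard.reconnects` ends at 1 rather than growing with the number of failed requests.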