fix: various reader failover fixes #1227

aaron-congo · 2024-12-18T18:44:48Z

Description

failover1 fixes:
- Use a query to verify the host role instead of relying on topology, since the server-side topology may not be updated yet.
- Include the original writer as a candidate when failoverMode=strict-reader, since it is likely a reader now. The host role is verified after connecting.
- ClusterAwareReaderFailoverHandler receives a flag indicating whether failoverMode=strict-reader. Before these changes the flag was passed before failoverMode had been initialized, so the handler received the wrong value.
failover2 fixes:
- Fixed a bug where reader failover failed because the plugin assumed that the topology returned by forceRefreshHostList was up to date. The topology may be current, but it is likely stale because we do not wait for it to update before starting reader failover. Reader failover now tries to connect to all nodes in the topology. If failoverMode=strict-reader, the plugin queries the server to verify the host's role.
- Fixed a bug where the original writer was attempted only once. This caused failover failures for 2-instance clusters with failoverMode=strict-reader because the original writer takes some time to recover. All nodes are now attempted repeatedly until the failover timeout is reached.
telemetry fixes (failover1 and failover2):
- Fixed a bug where telemetryContext.setSuccess and telemetryContext.setException were not called when failover succeeded.
- Fixed a bug where failoverReaderFailedCounter was incremented twice instead of once when failover failed.
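
For reviewers who want the reader failover behavior in one place, here is a minimal sketch of the retry loop described above. It is not the plugin code: `ReaderFailoverSketch`, `RoleProbe`, `connectAndQueryRole`, and the parameter names are hypothetical stand-ins, and hosts are plain strings. It only illustrates the idea of treating the topology as a candidate list, verifying the role by asking the server, and retrying every node until the timeout.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; all names here are hypothetical, not the wrapper's API.
final class ReaderFailoverSketch {

  enum Role { READER, WRITER }

  /** Hypothetical probe: connects to a host and returns the role the server reports, or null if unreachable. */
  interface RoleProbe {
    Role connectAndQueryRole(String host);
  }

  /** Returns the first acceptable host, or null if the timeout elapses first. */
  static String failoverToReader(
      List<String> topologyHosts,
      String originalWriter,
      boolean strictReader,
      RoleProbe probe,
      long timeoutMs,
      long retryDelayMs) {

    // The topology may be stale, so use it only as a list of candidates
    // and make sure the original writer is among them.
    List<String> candidates = new ArrayList<>(topologyHosts);
    if (!candidates.contains(originalWriter)) {
      candidates.add(originalWriter);
    }

    final long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    do {
      for (String host : candidates) {
        Role role = probe.connectAndQueryRole(host);
        if (role == null) {
          continue; // not reachable yet; it is retried on the next pass
        }
        if (strictReader && role != Role.READER) {
          continue; // role comes from the server, not from the topology
        }
        return host;
      }
      try {
        Thread.sleep(retryDelayMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return null;
      }
    } while (deadline - System.nanoTime() > 0);
    return null;
  }
}
```

In the 2-instance case, the new writer is skipped on every pass when `strictReader` is true, and the original writer is selected as soon as it recovers and reports the reader role.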

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

github-actions · 2024-12-18T18:50:59Z

Qodana Community for JVM

It seems all right 👌

No new problems were found according to the checks applied

💡 Qodana analysis was run in the pull request mode: only the changed files were checked


aaron-congo · 2024-12-20T22:16:29Z

wrapper/src/main/java/software/amazon/jdbc/plugin/failover/FailoverConnectionPlugin.java

    initHostProviderFunc.call();
  }

-  @Override
-  public OldConnectionSuggestedAction notifyConnectionChanged(final EnumSet<NodeChangeOptions> changes) {


This function has the same implementation in the superclass, so I removed it.

aaron-congo · 2024-12-20T22:19:13Z

wrapper/src/main/java/software/amazon/jdbc/plugin/failover/FailoverConnectionPlugin.java

@@ -578,22 +567,6 @@ protected void failover(final HostSpec failedHost) throws SQLException {
    } else {
      failoverReader(failedHost);
    }
-
-    if (isInTransaction || this.pluginService.isInTransaction()) {


This was moved to throwFailoverSuccessException so that the exception is thrown in failoverReader/failoverWriter instead. This was needed because failoverReader/failoverWriter catch the exception and then update the telemetry context, but there was a bug where the exception was thrown here instead, outside of those methods.
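
To make the ordering issue concrete, here is a rough sketch of the pattern. The types are simplified stand-ins, not the plugin's classes: `FailoverTelemetrySketch`, its nested `TelemetryContext`, and the unchecked `FailoverSuccessException` are assumptions for illustration. The point is only that the success exception has to be thrown inside the same try block whose catch clauses update the telemetry context.

```java
// Illustrative sketch only; these are simplified stand-ins, not the plugin's classes.
final class FailoverTelemetrySketch {

  /** Hypothetical unchecked stand-in for the exception that signals a successful failover. */
  static final class FailoverSuccessException extends RuntimeException {
    FailoverSuccessException(String message) {
      super(message);
    }
  }

  /** Hypothetical minimal telemetry context that just records what it was told. */
  static final class TelemetryContext {
    boolean success;
    Exception exception;

    void setSuccess(boolean success) {
      this.success = success;
    }

    void setException(Exception exception) {
      this.exception = exception;
    }
  }

  static void throwFailoverSuccessException() {
    throw new FailoverSuccessException("The active connection has changed due to failover.");
  }

  static void failoverReader(TelemetryContext ctx, boolean connected) {
    try {
      if (!connected) {
        throw new IllegalStateException("Unable to connect to a reader.");
      }
      // Thrown inside the try block, so the catch clause below sees it
      // and the telemetry context is updated before it propagates.
      throwFailoverSuccessException();
    } catch (FailoverSuccessException e) {
      ctx.setSuccess(true);
      ctx.setException(e);
      throw e;
    } catch (RuntimeException e) {
      ctx.setSuccess(false);
      ctx.setException(e);
      throw e;
    }
  }
}
```

If the success exception were thrown by the caller after `failoverReader` returned, neither catch clause would run on the success path and the context would stay untouched.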

aaron-congo · 2024-12-20T22:21:10Z

wrapper/src/main/java/software/amazon/jdbc/plugin/failover/FailoverConnectionPlugin.java

@@ -621,28 +593,24 @@ protected void failoverReader(final HostSpec failedHostSpec) throws SQLException
      }

      if (result == null || !result.isConnected()) {
-        // "Unable to establish SQL connection to reader instance"
-        processFailoverFailure(Messages.get("Failover.unableToConnectToReader"));
-        this.failoverReaderFailedCounter.inc();


There was a bug here and in other places where the failover failed counter was incremented twice: once here and once in the catch block for this code.
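
A small sketch of the intended shape after the fix, assuming a single increment point in the catch block. `FailedCounterSketch` is a hypothetical class, and an `AtomicLong` stands in for the telemetry counter; only the counter's name is taken from the diff.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch only; AtomicLong stands in for the telemetry counter.
final class FailedCounterSketch {

  final AtomicLong failoverReaderFailedCounter = new AtomicLong();

  void failoverReader(boolean connected) {
    try {
      if (!connected) {
        // Only signal the failure here. Incrementing the counter at this
        // point as well would count the same failure twice.
        throw new IllegalStateException("Unable to establish SQL connection to reader instance");
      }
    } catch (IllegalStateException e) {
      // Single place where a failed reader failover is counted.
      failoverReaderFailedCounter.incrementAndGet();
      throw e;
    }
  }
}
```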

aaron-congo · 2024-12-20T22:43:17Z

wrapper/src/test/java/software/amazon/jdbc/plugin/failover/FailoverConnectionPluginTest.java

@@ -442,43 +430,7 @@ void test_execute_withDirectExecute() throws SQLException {

  private void initializePlugin() {
    plugin = new FailoverConnectionPlugin(mockPluginService, properties);
-  }
-
-  private static class FooHostListProvider implements HostListProvider, DynamicHostListProvider {


This class was not being used, so I removed it.

fix: reader failover logic

ab110c3

aaron-congo added 3 commits December 18, 2024 17:11

Fix bug in reader failover for failover1 plugin

0906502

Fix unit test

90fbc09

Don't attempt original writer if it is the verified writer and failoverMode=strict-reader

2696be9

aaron-congo changed the title ~~fix: reader failover logic~~ fix: various reader failover fixes Dec 19, 2024

sergiyvamz reviewed Dec 19, 2024

wrapper/src/main/java/software/amazon/jdbc/plugin/failover/FailoverConnectionPlugin.java (outdated, resolved)

sergiyvamz reviewed Dec 19, 2024

wrapper/src/main/java/software/amazon/jdbc/plugin/failover2/FailoverConnectionPlugin.java (outdated, resolved)

aaron-congo added 6 commits December 19, 2024 17:22

failover2: cleanup reader failover, fix telemetry

bfd9631

failover1: fix telemetry

9316886

Cleanup

9d69dc1

failover1: create failover handlers in connectInternal

710e13e

Fix bug where original readers were not being selected after verifying that they have READER role

f1b4835

Update host role if it has changed after reader failover

4ec5a6f
