The figure that follows shows a simplified Enhanced Failure Monitoring (EFM) workflow. Enhanced Failure Monitoring is a feature available from the Host Monitoring Connection Plugin. The Host Monitoring Connection Plugin periodically checks the connected database node's health or availability. If a database node is determined to be unhealthy, the connection will be aborted. The Host Monitoring Connection Plugin uses the Enhanced Failure Monitoring Parameters and a database node's responsiveness to determine whether a node is healthy.
Enhanced Failure Monitoring helps user applications detect failures earlier. When a user application executes a query, EFM may detect that the connected database node is unavailable. When this happens, the query is cancelled and the connection will be aborted. This allows queries to fail fast instead of waiting indefinitely or failing due to a timeout.
One use case is to pair EFM with the Failover Connection Plugin. When EFM discovers a database node failure, the connection will be aborted. Without the Failover Connection Plugin, the failure propagates up to the user application. With the Failover Connection Plugin, the JDBC wrapper can attempt to fail over to a different, healthy database node where the query can be executed.
Not all applications will have a need for Enhanced Failure Monitoring. If an application's query times are predictable and short, and the application does not execute any long-running SQL queries, Enhanced Failure Monitoring may be replaced with one of the following alternatives that consume fewer resources and are simpler to configure.
The alternatives are:
- setting a simple network timeout, or
- using TCP Keepalive.
Although these alternatives are available, EFM is more configurable than simple network timeouts, and easier to configure than TCP Keepalive. Users should keep these advantages and disadvantages in mind when deciding whether Enhanced Failure Monitoring is suitable for their application.
A simple network timeout is useful when a user application executes quick statements that run for predictable lengths of time. In this case, the network timeout should be set to a value such as the 95th to 99th percentile of the observed statement execution times. One way to do this is with the `setNetworkTimeout` method.
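As a minimal sketch of the percentile approach, the helper below derives a timeout from a set of measured statement durations and applies it with the standard JDBC `Connection.setNetworkTimeout(Executor, int)` method. The class name, method names, and the nearest-rank percentile choice are assumptions made for illustration, and whether the call is supported depends on the underlying driver.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.Arrays;
import java.util.concurrent.Executor;

// Hypothetical helper, not part of the driver.
final class NetworkTimeoutSketch {

  // Nearest-rank percentile (1..100) of observed statement durations, in milliseconds.
  static int percentileMillis(long[] durationsMillis, int percentile) {
    if (durationsMillis.length == 0 || percentile < 1 || percentile > 100) {
      throw new IllegalArgumentException("need samples and 1 <= percentile <= 100");
    }
    long[] sorted = durationsMillis.clone();
    Arrays.sort(sorted);
    // Integer ceiling of (percentile / 100) * sampleCount.
    int rank = (percentile * sorted.length + 99) / 100;
    return Math.toIntExact(sorted[rank - 1]);
  }

  // Applies the timeout through the standard JDBC call; the executor is used
  // by the driver to abort the connection when the timeout elapses.
  static void applyNetworkTimeout(Connection conn, Executor executor, int timeoutMillis)
      throws SQLException {
    conn.setNetworkTimeout(executor, timeoutMillis);
  }
}
```

A caller might pass `Executors.newSingleThreadExecutor()` as the executor and a value such as `percentileMillis(samples, 99)` as the timeout.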
TCP Keepalive is useful because it is built into the TCP protocol. How you enable it depends on the underlying driver provided to the AWS JDBC Driver. For example, to enable TCP Keepalive with an underlying PostgreSQL driver, set the property `tcpKeepAlive` to `true`. TCP Keepalive settings, which are similar to some Enhanced Failure Monitoring parameters, are all configurable. However, how they are configured is specific to the operating system, so you should verify what your system needs before configuring TCP Keepalive. See this page for more information on how to set TCP Keepalive parameters.
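For the PostgreSQL case mentioned above, the property can be supplied through the connection `Properties`. This is only a sketch: the class and method names are hypothetical, and the keepalive timing itself is still governed by operating system settings that this code does not touch.

```java
import java.util.Properties;

// Hypothetical helper, not part of the driver.
final class TcpKeepAliveSketch {

  // Builds connection properties that ask an underlying PostgreSQL driver
  // to enable TCP Keepalive on its socket.
  static Properties keepAliveProperties() {
    Properties props = new Properties();
    props.setProperty("tcpKeepAlive", "true");
    return props;
  }
}
```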
Enhanced Failure Monitoring will NOT be enabled unless the Host Monitoring Connection Plugin is explicitly loaded, either by adding the plugin code `efm` to the `wrapperPlugins` value or by adding it to the current driver profile. Enhanced Failure Monitoring is enabled by default when the Host Monitoring Connection Plugin is loaded, but it can be disabled by setting the parameter `failureDetectionEnabled` to `false`.
⚠️ Note: When loading the Host Monitoring Connection Plugin, the order in which plugins are loaded matters. We recommend that you load the Host Monitoring Connection Plugin at the end (or as close to the end as possible). When used in conjunction with the Failover Connection Plugin, the Host Monitoring Connection Plugin must be loaded after the Failover Connection Plugin. For example, when loading plugins with the `wrapperPlugins` parameter, the parameter value should be `failover,...,efm`.
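The ordering rule can be sketched as follows. The plugin codes `failover` and `efm` and the parameter names come from the text; the class, the method names, and the ordering check are hypothetical illustrations rather than anything the driver provides.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not part of the driver.
final class PluginOrderSketch {

  // Properties that load the Failover plugin first and EFM last.
  static Properties efmWithFailover() {
    Properties props = new Properties();
    props.setProperty("wrapperPlugins", "failover,efm");
    // Already the default once the plugin is loaded; set here only for clarity.
    props.setProperty("failureDetectionEnabled", "true");
    return props;
  }

  // Returns true when "efm" is present and, if "failover" is also present,
  // "efm" appears after it in the comma-separated list.
  static boolean efmLoadedAfterFailover(String wrapperPlugins) {
    List<String> codes = Arrays.asList(wrapperPlugins.trim().split("\\s*,\\s*"));
    int efm = codes.indexOf("efm");
    int failover = codes.indexOf("failover");
    return efm >= 0 && (failover < 0 || failover < efm);
  }
}
```

A check like `efmLoadedAfterFailover` could run at application startup to catch a misordered plugin list before any connection is opened.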
The parameters `failureDetectionTime`, `failureDetectionInterval`, and `failureDetectionCount` are similar to TCP Keepalive parameters. Each connection has its own set of parameters. The `failureDetectionTime` is how long the monitor waits after a SQL query is started to send a probe to a database node. The `failureDetectionInterval` is how often the monitor sends a probe to a database node. The `failureDetectionCount` is how many times a monitor probe can go unacknowledged before the database node is deemed unhealthy.
To determine the health of a database node:
- The monitor will first wait for a time equivalent to the `failureDetectionTime`.
- Then, every `failureDetectionInterval`, the monitor will send a probe to the database node.
- If the probe is not acknowledged by the database node, a counter is incremented.
- If the counter reaches the `failureDetectionCount`, the database node will be deemed unhealthy and the connection will be aborted.
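The steps above can be sketched as a small simulation. It is not the plugin's implementation, and it rests on three assumptions that the text does not confirm: the first probe is sent one interval after the initial wait, a probe's outcome is known immediately, and an acknowledged probe resets the counter. All names are hypothetical.

```java
// Hypothetical simulation, not part of the driver.
final class FailureDetectionSketch {

  // Returns the elapsed milliseconds (since the query started) at which the
  // node is deemed unhealthy, or -1 if the probes run out before that happens.
  // probeAcknowledged[i] is the outcome of the i-th probe.
  static long unhealthyAtMillis(
      long failureDetectionTime,
      long failureDetectionInterval,
      int failureDetectionCount,
      boolean[] probeAcknowledged) {
    long elapsed = failureDetectionTime;    // step 1: initial wait
    int failures = 0;
    for (boolean acknowledged : probeAcknowledged) {
      elapsed += failureDetectionInterval;  // step 2: one probe per interval (ASSUMED timing)
      if (acknowledged) {
        failures = 0;                       // ASSUMPTION: a success resets the counter
      } else {
        failures++;                         // step 3: count the unacknowledged probe
        if (failures >= failureDetectionCount) {
          return elapsed;                   // step 4: node deemed unhealthy
        }
      }
    }
    return -1;
  }
}
```

Under these assumptions and the default parameter values, three consecutive unacknowledged probes would mark the node unhealthy roughly 45 seconds after the query started.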
If a more aggressive approach to failure checking is necessary, all of these parameters can be reduced to reflect that. However, increased failure checking may also lead to an increase in false positives. For example, if the `failureDetectionInterval` were shortened, the plugin may complete several connection checks that all fail. The database node would then be considered unhealthy, even though it may have been about to recover and the connection checks were completed before that could happen.
| Parameter | Value | Required | Description | Default Value |
|---|---|---|---|---|
| `failureDetectionCount` | Integer | No | Number of failed connection checks before considering the database node unhealthy. | `3` |
| `failureDetectionEnabled` | Boolean | No | Set to `true` to enable Enhanced Failure Monitoring. Set to `false` to disable it. | `true` |
| `failureDetectionInterval` | Integer | No | Interval in milliseconds between probes to the database node. | `5000` |
| `failureDetectionTime` | Integer | No | Interval in milliseconds between sending a SQL query to the server and the first probe to the database node. | `30000` |
| `monitorDisposalTime` | Integer | No | Interval in milliseconds for a monitor to be considered inactive and to be disposed. | `60000` |
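These parameters are passed as ordinary connection properties. The sketch below sets a more aggressive configuration and reads values back with the table's defaults as fallbacks. The specific numbers are illustrative only, not recommendations, and the class and method names are hypothetical.

```java
import java.util.Properties;

// Hypothetical helper, not part of the driver.
final class EfmParametersSketch {

  // Defaults copied from the parameter table.
  static final int DEFAULT_FAILURE_DETECTION_TIME_MS = 30000;
  static final int DEFAULT_FAILURE_DETECTION_INTERVAL_MS = 5000;
  static final int DEFAULT_FAILURE_DETECTION_COUNT = 3;

  // Shorter wait and probe interval than the defaults (illustrative values).
  static Properties moreAggressive() {
    Properties props = new Properties();
    props.setProperty("wrapperPlugins", "efm");
    props.setProperty("failureDetectionTime", "10000");
    props.setProperty("failureDetectionInterval", "2000");
    props.setProperty("failureDetectionCount", "3");
    return props;
  }

  // Reads an integer parameter, falling back to the given default when unset.
  static int intParam(Properties props, String name, int defaultValue) {
    return Integer.parseInt(props.getProperty(name, String.valueOf(defaultValue)));
  }
}
```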
The Host Monitoring Connection Plugin may create new monitoring connections to check the database node's availability. You can configure these connections with driver-specific configurations by adding the `monitoring-` prefix to the configuration parameters, as in the following example:
final Properties properties = new Properties();
// Configure the timeout values for all non-monitoring connections.
properties.setProperty("connectTimeout", "30");
properties.setProperty("socketTimeout", "30");
// Configure different timeout values for the monitoring connections.
properties.setProperty("monitoring-connectTimeout", "10");
properties.setProperty("monitoring-socketTimeout", "10");
❗ Always ensure you provide a non-zero socket timeout value or a connect timeout value to the Host Monitoring Connection Plugin.
The Host Monitoring Connection Plugin does not have default timeout values such as `connectTimeout` or `socketTimeout` since these values are driver specific. Most JDBC drivers use 0 as the default timeout value. If you do not override the default timeout value, the Host Monitoring Connection Plugin may wait forever to establish a monitoring connection if the database node is unavailable.
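One way to guard against a missing timeout is to check the properties before connecting. The helper below is a hypothetical sketch, not a driver feature. It deliberately inspects only the `monitoring-` prefixed keys used in the earlier example, which is a stricter check than the plugin itself may require.

```java
import java.util.Properties;

// Hypothetical pre-connection check, not part of the driver.
final class MonitoringTimeoutCheck {

  // True when at least one monitoring timeout is set to a positive integer.
  static boolean hasNonZeroMonitoringTimeout(Properties props) {
    return isPositive(props.getProperty("monitoring-connectTimeout"))
        || isPositive(props.getProperty("monitoring-socketTimeout"));
  }

  // Missing, non-numeric, zero, and negative values all count as "not set".
  private static boolean isPositive(String value) {
    if (value == null) {
      return false;
    }
    try {
      return Long.parseLong(value.trim()) > 0;
    } catch (NumberFormatException e) {
      return false;
    }
  }
}
```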
We recommend you either disable the Host Monitoring Connection Plugin or avoid using RDS Proxy endpoints when the Host Monitoring Connection Plugin is active.
Although using RDS Proxy endpoints with the AWS Advanced JDBC Driver with Enhanced Failure Monitoring doesn't cause any critical issues, we don't recommend this approach. The main reason is that RDS Proxy transparently re-routes requests to a single database instance. RDS Proxy decides which database instance is used based on many criteria (on a per-request basis). Switching between different instances makes the Host Monitoring Connection Plugin useless in terms of instance health monitoring because the plugin will be unable to identify which instance it's connected to, and which one it's monitoring. This could result in false positive failure detections. At the same time, the plugin will still proactively monitor network connectivity to RDS Proxy endpoints and report outages back to a user application if they occur.