Enable unattended recovery from zero replicas #30
Conversation
mariadb-0 crashlooped while there were no mariadb-1 or mariadb-2: `WSREP: gcs/src/gcs.cpp:gcs_open():1458: Failed to open channel 'my_wsrep_cluster' at 'gcomm://mariadb-0.mariadb,mariadb-1.mariadb,mariadb-2.mariadb': -110 (Connection timed out)`
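For anyone hitting the same thing, a quick way to pull that WSREP error out of a crashlooping pod is to read the previous container instance's logs; a minimal sketch, assuming the MariaDB container in the pod is named `mariadb`:

```sh
# Inspect the previous (crashed) container instance and filter for WSREP errors.
# The container name "mariadb" is an assumption about this pod spec.
kubectl logs mariadb-0 -c mariadb --previous | grep -iE 'wsrep|error'
```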
We've been running in very stable and very ephemeral clusters - nothing in between - for a long time, so I had forgotten about the DNS-based peer detection in the init script. Because the mariadb service tolerates unready endpoints, that logic will actually only trigger if mariadb-0 is the only scheduled pod. This PR now takes a more optimistic approach to recovery: it assumes that partition is rare, and that if it does happen we're OK with selecting a random replica as the recovery point. If the MariaDB cluster goes down without partition, the replicas all have sufficiently accurate state, so we can bootstrap on the first pod. If there was a partition, some data is likely lost. The branch name is misleading now.
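In rough terms, the optimistic recovery described above could look like the sketch below. It is only an illustration: the hostname convention, the `mariadb-N.mariadb` peer names (taken from the log line above) and how the bootstrap flag is handed to mysqld are assumptions, not the actual init script.

```sh
#!/bin/sh
# Optimistic recovery sketch: the first pod bootstraps a new Galera cluster
# when none of its peers resolve in DNS. Accepts possible data loss after a partition.
ordinal="$(hostname | awk -F- '{print $NF}')"
extra_args=""
if [ "$ordinal" = "0" ]; then
  if ! getent ahosts mariadb-1.mariadb >/dev/null 2>&1 && \
     ! getent ahosts mariadb-2.mariadb >/dev/null 2>&1; then
    echo "No peers in DNS; bootstrapping a new cluster from mariadb-0"
    extra_args="--wsrep-new-cluster"
  fi
fi
# Hand the flag over to the mysqld container by whatever mechanism the real setup uses.
echo "$extra_args" > /tmp/mysqld-extra-args
```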
I suspect that the reliability of the
At fresh apply, and when scaling from 0 directly to >1, we'll handle pod churn better now. To reduce Init:Error events we could (see the sketch after this list):

- Try to reduce time to readiness
- Add waits upon detection of missing peers, before exiting
- Retry within the script for pod indexes >0
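A sketch of the wait/retry ideas for higher-ordinal pods; not the actual init script, and the hostname convention and service name (`mariadb`) are assumptions:

```sh
# Higher-ordinal pods wait for mariadb-0 to show up in DNS before giving up,
# which should cut down on Init:Error events during pod churn.
ordinal="$(hostname | awk -F- '{print $NF}')"
if [ "$ordinal" != "0" ]; then
  tries=0
  until getent ahosts mariadb-0.mariadb >/dev/null 2>&1; do
    tries=$((tries + 1))
    if [ "$tries" -ge 30 ]; then
      echo "mariadb-0.mariadb still not resolvable after $tries attempts, giving up" >&2
      exit 1
    fi
    echo "Waiting for mariadb-0.mariadb to resolve ($tries/30)..."
    sleep 5
  done
fi
```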
The init script actually depended on finding Ready members of the statefulset, not all existing members. With a new service that reports such entries we can now use
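To make the distinction concrete: a headless Service that tolerates unready endpoints publishes DNS records for every scheduled pod, while one that does not publishes only Ready pods. A minimal sketch of counting both, where `mariadb` is the existing unready-tolerating service and `mariadb-ready` is a placeholder name for the new Ready-only service:

```sh
# Count all scheduled peers vs. Ready peers via DNS.
# "mariadb-ready" is a hypothetical name for the new Ready-only headless service.
all_peers=$(getent ahosts mariadb | awk '{print $1}' | sort -u | wc -l)
ready_peers=$(getent ahosts mariadb-ready 2>/dev/null | awk '{print $1}' | sort -u | wc -l)
echo "scheduled peers: $all_peers, ready peers: $ready_peers"
```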
There's a slightly outdated piece of guidance now at kubernetes-mysql-cluster/10conf-d.yml line 137 in fedf5f0, regarding `AUTO_RECOVERY_MODE`.
Running a cluster on preemptible nodes now. It's not a production-critical cluster, so when mariadb-0 goes unready for some reason and we don't attend to it and fix it immediately, `OrderedReady` means any evictions of the higher-ordinal pods leave them unscheduled. In this case mariadb-0 crashlooped on:
Unfortunately I didn't investigate why the init container completed, when normally it should have complained about having data but not knowing how to bootstrap. Instead I deleted the statefulset and recreated it with `Parallel`.

When scaling up from 0 to 3 replicas now, mariadb-0 waits in the init container as expected. The other two crashloop after failing to join WSREP. I think that's pretty useful behavior. I simply ran the `kubectl exec -c init-config mariadb-0 -- touch /tmp/confirm-force-bootstrap` option logged by the init script and the cluster recovered.

Theory: to automate recovery, while accepting a risk of lost writes, we could default to "confirm-force-bootstrap" on mariadb-0. Under the assumption that partitioned clusters are rare, the risk of lost writes should be low. Alternatively we could introduce a manual start mode for all pods, so that admins can follow the MariaDB recovery docs and decide to bootstrap the one with the newest data.
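A minimal sketch of that theory, assuming a hypothetical opt-in environment variable (here called `UNATTENDED_BOOTSTRAP`) rather than the repo's actual configuration:

```sh
# Sketch only: automatically confirm force-bootstrap on the first pod when the
# operator has opted in, accepting that writes on a partitioned minority may be lost.
# UNATTENDED_BOOTSTRAP is a hypothetical variable, not an existing option in this repo.
ordinal="$(hostname | awk -F- '{print $NF}')"
if [ "$ordinal" = "0" ] && [ "${UNATTENDED_BOOTSTRAP:-false}" = "true" ]; then
  echo "Unattended recovery enabled; confirming force bootstrap on mariadb-0"
  touch /tmp/confirm-force-bootstrap
fi
```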
While mariadb-0 was waiting for my choice of start mode, the other two pods logged the following: