
Large Deployments - Failover fails to initialize the security index #518

Open
skourta opened this issue Dec 5, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@skourta
Contributor

skourta commented Dec 5, 2024

Steps to reproduce

1. Deploy a large deployment:
   - Main App: `cluster_manager` role only
   - Failover: `cluster_manager` and `data`
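The deployment above can be sketched with Juju commands like the following. This is only a sketch: the `roles`, `init_hold`, and `cluster_name` config options and the `peer-cluster`/`peer-cluster-orchestrator` endpoint names are assumptions based on the charm's large-deployment setup and may differ from the actual charm interface.

```shell
# Sketch of the large-deployment reproduction (assumed option/endpoint names).
# Main orchestrator: cluster_manager role only.
juju deploy opensearch main --channel 2/edge \
  --config cluster_name=app --config roles=cluster_manager

# Failover orchestrator: cluster_manager and data roles, held until related.
juju deploy opensearch failover --channel 2/edge \
  --config cluster_name=app --config init_hold=true \
  --config roles=cluster_manager,data

# Wire the failover app to the main orchestrator.
juju integrate main:peer-cluster-orchestrator failover:peer-cluster
```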

Expected behavior

The large deployment boots up correctly.

Actual behavior

Failover is stuck at initializing the security index.
[screenshot: failover unit stuck at initializing the security index]

Versions

Operating system: Ubuntu 24.04.1 LTS

Juju CLI: 3.6.0-genericlinux-amd64

Juju agent: 3.5.3

Charm revision: 2/edge branch

LXD: 5.21.2 LTS

@skourta skourta added the bug Something isn't working label Dec 5, 2024
@reneradoi
Contributor

I assume this issue happens because the failover unit has itself configured in the unicast_host file (as a cluster manager), but does not get initial_cluster_manager_nodes configured for itself:

```
[opensearch-failover-2.890] cluster-manager not discovered yet, this node has not previously joined a bootstrapped cluster, and [cluster.initial_cluster_manager_nodes] is empty on this node: have discovered [{opensearch-failover-2.890}{IHiqih9TSdysSYpXudpl3A}{VfR2BJfCTuOiiUSSjFZeeA}{10.54.237.136}{10.54.237.136:9300}{dm}{shard_indexing_pressure_enabled=true, app_id=ecd37465-df44-4398-866f-ec3a6877af2d/opensearch-failover}]; discovery will continue using [127.0.0.1:9300, 127.0.0.1:9301, 127.0.0.1:9302, 127.0.0.1:9303, 127.0.0.1:9304, 127.0.0.1:9305, [::1]:9300, [::1]:9301, [::1]:9302, [::1]:9303, [::1]:9304, [::1]:9305] from hosts providers and [{opensearch-failover-2.890}{IHiqih9TSdysSYpXudpl3A}{VfR2BJfCTuOiiUSSjFZeeA}{10.54.237.136}{10.54.237.136:9300}{dm}{shard_indexing_pressure_enabled=true, app_id=ecd37465-df44-4398-866f-ec3a6877af2d/opensearch-failover}] from last-known cluster state; node term 0, last-accepted version 0 in term 0
```
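For context, the log says the node never received a bootstrap list. On a node that is allowed to bootstrap, `opensearch.yml` would carry something like the following (illustrative values taken from the log above, not the charm's actual rendered config):

```yaml
node.roles: [cluster_manager, data]
cluster.initial_cluster_manager_nodes:
  - opensearch-failover-2.890
```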

This is configured in OpenSearchConfig.set_node() here: a unit only gets initial_cluster_manager_nodes if it has the cluster_manager role (which this unit does) and contributes to bootstrap (which it does not, because only MAIN_ORCHESTRATORS do, see here).

In order to solve this issue, a decision needs to be made on:
a) whether this kind of deployment setup is considered valid (Main App: cluster_manager role only; Failover: cluster_manager and data) and, if so,
b) whether FAILOVER_ORCHESTRATORS should also contribute to bootstrapping in this kind of deployment.
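The gating described above can be sketched as follows. This is a minimal model, not the charm's actual code: the function name, the `contribute_failover` flag, and the orchestrator-type strings are all invented for illustration; only the rule (cluster_manager role AND bootstrap contribution, with only MAIN orchestrators contributing today) comes from the comment above.

```python
# Hypothetical sketch of the bootstrap-contribution rule described above.
# A unit writes cluster.initial_cluster_manager_nodes only when it both
# holds the cluster_manager role AND contributes to bootstrap. Today only
# MAIN orchestrators contribute, so the failover unit gets an empty list.

MAIN = "main"
FAILOVER = "failover"

def initial_cluster_manager_nodes(roles, orchestrator, node_name,
                                  contribute_failover=False):
    """Return the bootstrap node list a unit would write to opensearch.yml."""
    contributes = orchestrator == MAIN or (
        contribute_failover and orchestrator == FAILOVER
    )
    if "cluster_manager" in roles and contributes:
        return [node_name]
    return []

# Reproduces the bug: failover unit with cluster_manager role, no bootstrap list.
assert initial_cluster_manager_nodes(
    ["cluster_manager", "data"], FAILOVER, "opensearch-failover-2") == []

# Option (b) above: letting FAILOVER orchestrators contribute fixes it.
assert initial_cluster_manager_nodes(
    ["cluster_manager", "data"], FAILOVER, "opensearch-failover-2",
    contribute_failover=True) == ["opensearch-failover-2"]
```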


Thank you for reporting your feedback!

The internal ticket has been created: https://warthogs.atlassian.net/browse/DPE-6158.

This message was autogenerated
