**OLD** [DPE-4575] Add voting settle logic at start and stop service #345

Closed
wants to merge 19 commits into from

@phvalguima phvalguima commented Jul 2, 2024

Currently, we are hitting several issues with 2-node clusters, as OpenSearch no longer manages voting automatically. This PR adds logic to manage the 2-node cluster scenario manually using the voting_exclusions API. Whenever only two nodes are active and registered to the cluster as voting units, _settle_voting_exclusions excludes the non-leader unit from voting. It also excludes the leaving unit, to cover the scenario where that unit is the cluster_manager: when moving from 3->2 units, we may otherwise end up with stale metadata.
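
For readers unfamiliar with the API, here is a minimal sketch of the two requests involved. It is not the charm's code: the helper names are hypothetical, and the endpoint path and the `node_names` / `wait_for_removal` parameters reflect my understanding of the OpenSearch cluster voting configuration exclusions API, so please verify them against the version in use. The helpers only build the method and path; sending the request is left to whatever HTTP client the charm already uses.

```python
from urllib.parse import urlencode

# Assumption: endpoint path as documented for the OpenSearch cluster API.
VOTING_EXCLUSIONS_PATH = "/_cluster/voting_config_exclusions"


def add_exclusions_request(node_names):
    """Return (method, path) asking the cluster to exclude nodes from voting.

    Hypothetical helper; node names are sorted so the request is deterministic.
    """
    if not node_names:
        raise ValueError("at least one node name is required")
    query = urlencode({"node_names": ",".join(sorted(node_names))}, safe=",")
    return "POST", f"{VOTING_EXCLUSIONS_PATH}?{query}"


def clear_exclusions_request(wait_for_removal=False):
    """Return (method, path) that removes all current voting exclusions.

    Hypothetical helper; wait_for_removal=False asks the cluster not to wait
    for the excluded nodes to leave before clearing the list.
    """
    query = urlencode({"wait_for_removal": str(wait_for_removal).lower()})
    return "DELETE", f"{VOTING_EXCLUSIONS_PATH}?{query}"
```

Keeping the request construction separate from the HTTP call makes the behavior easy to unit test without a running cluster.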

This PR also makes the exclusion step mandatory at start / stop, so we can be sure the voting count is correct at each stage.

For example, moving from 3->2->1 units results in:
3->2) The cluster sets two voting exclusions: one for the leaving unit (if it is the cluster manager, that role moves to another unit) and one for one of the two remaining units, chosen from a sorted list of node names
2->1) The voting exclusions are removed

Likewise, on scaling up:
1->2) One voting exclusion is added, following the same sorted list of all node names
2->3) Voting exclusions are removed
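
The scale-down and scale-up rules above can be sketched as a pure function. This is an illustration under stated assumptions, not the actual `_settle_voting_exclusions` implementation: `plan_voting_exclusions` and its parameters are hypothetical names, and picking the first non-manager node in sorted order is my reading of "following a sorted list".

```python
def plan_voting_exclusions(voting_nodes, leaving=None, cluster_manager=None):
    """Return the sorted list of node names that should be excluded from voting.

    Hypothetical sketch of the rules described in this PR:
    - exclusions only apply when exactly two voting nodes will remain;
    - the leaving node (if any) is excluded;
    - one of the two remaining nodes is excluded, preferring a node that
      is not the current cluster manager (assumption: first in sorted order).
    """
    remaining = sorted(n for n in set(voting_nodes) if n != leaving)
    if len(remaining) != 2:
        # 1 node, or 3 and more: no manual exclusions are needed.
        return []

    exclusions = []
    if leaving is not None:
        exclusions.append(leaving)

    # Keep the cluster manager voting when it stays in the cluster.
    candidates = [n for n in remaining if n != cluster_manager]
    exclusions.append(candidates[0])
    return sorted(exclusions)
```

With units `os-0`, `os-1`, `os-2` and `os-0` as cluster manager, removing `os-2` (3->2) plans exclusions for `os-1` and `os-2`; removing a further unit (2->1) plans none, which matches the "exclusions are removed" step.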

This PR touches #324, #326 and #327; the same behavior is also observed in #325. It is also linked to issues in our ha/test_storage.py, as one can see in this run.

Closes: #324, #326, #327

@phvalguima phvalguima marked this pull request as ready for review July 2, 2024 17:02
@phvalguima phvalguima changed the title [DPE-4507] Add voting settle logic at stop service [DPE-4507] Add voting settle logic at start and stop service Jul 2, 2024

@reneradoi reneradoi left a comment

Hey @phvalguima, I've tested this locally and ran into issues when removing the application with only two units left. It fails with:

unit-opensearch-3: 09:41:16 ERROR unit.opensearch/3.juju-log Uncaught exception while in charm code:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-opensearch-3/charm/./src/charm.py", line 267, in <module>
    main(OpenSearchOperatorCharm)
  File "/var/lib/juju/agents/unit-opensearch-3/charm/venv/ops/main.py", line 544, in main
    manager.run()
  File "/var/lib/juju/agents/unit-opensearch-3/charm/venv/ops/main.py", line 520, in run
    self._emit()
  File "/var/lib/juju/agents/unit-opensearch-3/charm/venv/ops/main.py", line 509, in _emit
    _emit_charm_event(self.charm, self.dispatcher.event_name)
  File "/var/lib/juju/agents/unit-opensearch-3/charm/venv/ops/main.py", line 143, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-opensearch-3/charm/venv/ops/framework.py", line 350, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-opensearch-3/charm/venv/ops/framework.py", line 849, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-opensearch-3/charm/venv/ops/framework.py", line 939, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-opensearch-3/charm/lib/charms/opensearch/v0/opensearch_base_charm.py", line 523, in _on_opensearch_data_storage_detaching
    raise Exception("Unable to acquire lock: Another unit is starting or stopping.")
Exception: Unable to acquire lock: Another unit is starting or stopping.

and the unit can't get out of this state anymore. Furthermore, I've seen some failures in the integration tests, so it looks like it does not fully work yet.

Two review threads on lib/charms/opensearch/v0/opensearch_base_charm.py (outdated, resolved)
@phvalguima phvalguima force-pushed the DPE-4057-voting-exclusion-2-units branch from 8ac22f7 to d37b334 Compare July 8, 2024 11:28
@phvalguima phvalguima requested a review from reneradoi July 9, 2024 15:34
@phvalguima phvalguima force-pushed the DPE-4057-voting-exclusion-2-units branch from 9f168d6 to 1bc6555 Compare July 10, 2024 07:34
reneradoi previously approved these changes Jul 10, 2024

@reneradoi reneradoi left a comment

Looks good!

@phvalguima phvalguima changed the title [DPE-4507] Add voting settle logic at start and stop service [DPE-575] Add voting settle logic at start and stop service Jul 11, 2024
@phvalguima phvalguima changed the title [DPE-575] Add voting settle logic at start and stop service [DPE-4575] Add voting settle logic at start and stop service Jul 11, 2024
@phvalguima phvalguima changed the title [DPE-4575] Add voting settle logic at start and stop service **OLD** [DPE-4575] Add voting settle logic at start and stop service Jul 16, 2024
@phvalguima phvalguima closed this Jul 19, 2024
Successfully merging this pull request may close these issues.

One or more replica shards...
2 participants