Wait until after peer relation joined before acquiring lock
## Issue
During initial startup (i.e. scale-up), a new unit requests the lock via the peer databag until it receives a peer-relation-joined event and learns that other OpenSearch units are online.

## Solution
Wait until after the first peer-relation-joined event before trying to acquire the lock, so that if [enough] OpenSearch units are online, we request the OpenSearch lock instead of the peer databag lock.
carlcsaposs-canonical committed Apr 30, 2024
1 parent 2071f19 commit fe96c1a
Showing 1 changed file with 21 additions and 0 deletions.
21 changes: 21 additions & 0 deletions lib/charms/opensearch/v0/opensearch_locking.py
@@ -8,6 +8,7 @@
import typing

import ops
from charms.opensearch.v0.constants_charm import PeerRelationName
from charms.opensearch.v0.helper_cluster import ClusterTopology
from charms.opensearch.v0.opensearch_exceptions import OpenSearchHttpError

@@ -221,6 +222,26 @@ def acquired(self) -> bool:  # noqa: C901
            host = self._charm.unit_ip
        else:
            host = None
        if (
            self._charm.app.planned_units() > 1
            and (relation := self._charm.model.get_relation(PeerRelationName))
            and not relation.units
        ):
            # On initial startup (e.g. scaling up, on the new unit), `self._charm.alt_hosts` will
            # be empty since it uses `Relation.units` on the `PeerRelationName`.
            # Initial startup event sequence (some events omitted for brevity):
            # - install
            # - peer-relation-created
            # - start
            # - peer-relation-joined (e.g. for unit 2)
            # - peer-relation-changed
            # - peer-relation-joined (e.g. for unit 0)
            # Until the peer relation joined event, `Relation.units` will be empty
            # Therefore, before the first peer relation joined event, we should avoid acquiring the
            # lock since otherwise we would fall back to the peer databag lock even if OpenSearch
            # nodes were online.
            logger.debug("[Node lock] Waiting for peer units before acquiring lock")
            return False
        alt_hosts = [host for host in self._charm.alt_hosts if self._opensearch.is_node_up(host)]
        if host or alt_hosts:
            logger.debug("[Node lock] 1+ opensearch nodes online")
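
The guard added in this hunk can be read as a three-part condition. The following is a minimal standalone sketch of that decision, not the charm's actual code: `FakeRelation` and `should_wait_for_peers` are hypothetical names introduced for illustration, and `FakeRelation` only mimics the `units` attribute of `ops.Relation`. It assumes that a missing peer relation is represented as `None`, which matches how the walrus expression in the diff is used.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class FakeRelation:
    """Hypothetical stand-in for `ops.Relation`; only `units` is modelled."""

    # Names of the remote peer units that have joined so far.
    units: Set[str] = field(default_factory=set)


def should_wait_for_peers(planned_units: int, relation: Optional[FakeRelation]) -> bool:
    """Return True while the lock attempt should be deferred.

    Mirrors the condition in the diff: more than one unit is planned,
    the peer relation exists, and no peer unit has joined yet.
    """
    return planned_units > 1 and relation is not None and not relation.units


# New unit during scale-up, before the first peer-relation-joined event.
print(should_wait_for_peers(3, FakeRelation()))
# After a peer has joined, the normal lock logic can proceed.
print(should_wait_for_peers(3, FakeRelation({"opensearch/0"})))
# Single-unit deployment: there are no peers to wait for.
print(should_wait_for_peers(1, FakeRelation()))
```

In this sketch, only the first call defers the lock attempt. A single-unit deployment, or a unit whose peer relation does not exist yet, falls through to the existing lock logic, as it did before this commit.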