[DPE-4115] Performance Profile Support #466

Merged on Oct 21, 2024 (79 commits)
Changes from 12 commits
Commits
a62f180
Sync docs from Discourse (#451)
github-actions[bot] Sep 26, 2024
c9edade
[DPE-5558] Break CA rotation into integration test groups (#458)
phvalguima Sep 27, 2024
3845e38
Adding first batch of changes to the charm on how to process profiles…
phvalguima Sep 27, 2024
7b42f47
Add unit tests for profile management
phvalguima Oct 1, 2024
a3d033c
Extend replace() to cover multilines and look into all possible options
phvalguima Oct 1, 2024
776c3f4
Add index/component template APIs
phvalguima Oct 1, 2024
4b52e33
Add support for profile option in the integration tests
phvalguima Oct 1, 2024
50ea12b
lint fixes
phvalguima Oct 1, 2024
51c7cd5
Fix first batch of unit tests
phvalguima Oct 2, 2024
73cd84b
Remove some dead LoCs because of commenting out lines of code
phvalguima Oct 2, 2024
7c47f20
Update to 1 instead of 1-all
phvalguima Oct 2, 2024
64f9bc9
Update the changes following feedback
phvalguima Oct 11, 2024
c856d3a
lint fix
phvalguima Oct 11, 2024
9cecd5f
Update more tests and fixes
phvalguima Oct 12, 2024
6a85319
Rollback the original internal_users.yml
phvalguima Oct 12, 2024
2d9ce65
Merge remote-tracking branch 'origin' into DPE-4115-performance-profiles
phvalguima Oct 12, 2024
8712a84
Merge branch '2/edge' into DPE-4115-performance-profiles
phvalguima Oct 13, 2024
5363de7
Simplify test_charm as changing to production profile causes a lot of…
phvalguima Oct 13, 2024
60292ed
Remove service started + add set_watermark to small deployment on plu…
phvalguima Oct 13, 2024
b63d02d
Moved test HA to set profile=staging
phvalguima Oct 13, 2024
51d92e7
Remove any profile change from int. tests; fix integration tests
phvalguima Oct 13, 2024
9fab851
Move to the Deployment Description
phvalguima Oct 15, 2024
bfe940e
Remove refs to template apply event
phvalguima Oct 15, 2024
9b50a3d
Fixes following review
phvalguima Oct 15, 2024
42f464d
Roll back internal_users
phvalguima Oct 15, 2024
a6ee6db
Add peer relation listener
phvalguima Oct 15, 2024
d53a4f7
Add _on_install hook
phvalguima Oct 15, 2024
b1e77b5
Add perf profile to track install event
phvalguima Oct 15, 2024
db0d13e
Check file exists on _on_install
phvalguima Oct 15, 2024
86331be
Fix peer cluster relation
phvalguima Oct 16, 2024
22fbe66
Fix the config-changed routine
phvalguima Oct 16, 2024
13082f0
Update the e.response_code on apply template routine
phvalguima Oct 16, 2024
b8c3c59
Fix the append scenarios in replace()
phvalguima Oct 16, 2024
7670d8e
Add multiline replace
phvalguima Oct 16, 2024
37d94d5
Merge remote-tracking branch 'origin/main' into DPE-4115-performance-…
phvalguima Oct 16, 2024
78deee7
Merge remote-tracking branch 'origin/DPE-5677-fix-replace-file-persis…
phvalguima Oct 16, 2024
59790ae
Add profile change in test_charm.py
phvalguima Oct 16, 2024
1038a9b
Remove repeated code
phvalguima Oct 16, 2024
1450e56
Fix path in helper_conf_setter.replace()
phvalguima Oct 16, 2024
5e12ca8
Update lib/charms/opensearch/v0/opensearch_base_charm.py
phvalguima Oct 16, 2024
b6af43e
Remove perf_profile from opensearch_distro
phvalguima Oct 16, 2024
4f46a44
Add post merge with upstream branch
phvalguima Oct 16, 2024
a4ae60f
Merge remote-tracking branch 'origin' into DPE-4115-performance-profiles
phvalguima Oct 16, 2024
7dbd1e3
Move profile to PeerClusterConfig
phvalguima Oct 17, 2024
ebc0e58
Add minor fixes for performance profile
phvalguima Oct 17, 2024
a001dd2
Set correct order in config-changed
phvalguima Oct 17, 2024
07f99a1
Add config-changed
phvalguima Oct 17, 2024
b2c27d4
Add peer relation support and restart logic
phvalguima Oct 17, 2024
e59a628
Merge remote-tracking branch 'origin' into DPE-4115-performance-profiles
phvalguima Oct 18, 2024
deae3c6
Remove non-testing profile
phvalguima Oct 18, 2024
345385b
Add support for upgrade from older versions
phvalguima Oct 18, 2024
6c72f33
Rollback unit test + fix refresh command
phvalguima Oct 18, 2024
7c6df55
Rollback to return None if peer relation not set
phvalguima Oct 18, 2024
cea3c29
Rollback the config-changed to use refresh_relation_data
phvalguima Oct 18, 2024
cd6b419
Fix return empty after upgrade
phvalguima Oct 18, 2024
7d5f232
Update profiles
phvalguima Oct 18, 2024
ebae15a
Add upgrade charm check
phvalguima Oct 18, 2024
0cd057d
Minor fixes for the upgrade + reviews
phvalguima Oct 18, 2024
6a076da
Simplify the peer-cluster event
phvalguima Oct 18, 2024
b6a2989
Remove the ^
phvalguima Oct 18, 2024
44014ff
Fix unit tests
phvalguima Oct 18, 2024
9e60288
Move away from refresh_relation_data
phvalguima Oct 18, 2024
0754ce6
Move to run()
phvalguima Oct 18, 2024
1b69290
Simplify current()
phvalguima Oct 18, 2024
0dcdcb8
fix lint
phvalguima Oct 18, 2024
960c1ef
Fix upgrade assert
phvalguima Oct 18, 2024
c1f6b32
fix lint
phvalguima Oct 18, 2024
60e829a
Update helpers.py
phvalguima Oct 19, 2024
9c20cd3
Add logging; update upgrade helper to dispatch arguments in the corre…
Oct 19, 2024
fc2146c
Update test_ha_multi_clusters.py
phvalguima Oct 20, 2024
4a8f5f0
Update test_ha_multi_clusters.py
phvalguima Oct 20, 2024
f481297
Update test_tls.py
phvalguima Oct 20, 2024
72b8517
Update test_ca_rotation.py
phvalguima Oct 20, 2024
a67e876
Fix: update config options across CI
phvalguima Oct 20, 2024
047579d
Move away from main orch. leadership
phvalguima Oct 21, 2024
cc12e1a
Readd the apply index-template bool
phvalguima Oct 21, 2024
e7af36d
Add custom event with property
phvalguima Oct 21, 2024
4419a66
Reorder self.current + apply perf. templates
phvalguima Oct 21, 2024
1df45a7
Update helper_conf_setter.py
phvalguima Oct 21, 2024
10 changes: 10 additions & 0 deletions config.yaml
@@ -36,3 +36,13 @@ options:
default: true
type: boolean
description: Enable opensearch-knn

profile:
type: string
default: "production"
description: |
Profile representing the scope of deployment, and used to tune resource allocation.
Allowed values are: "production", "staging" or "testing".
"production" tunes OpenSearch for maximum performance, while "testing" tunes for
the minimum resources needed to run.
Performance tuning is described at: https://opensearch.org/docs/latest/tuning-your-cluster/performance/
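
The option accepts one of three strings. A minimal sketch of how such a value could be validated before use, assuming a plain `Enum` as a stand-in for the charm's `PerformanceType`; the `Profile` class and the `parse_profile` helper are hypothetical and not part of this PR:

```python
from enum import Enum
from typing import Optional


class Profile(str, Enum):
    """Stand-in for the charm's PerformanceType enum (assumption)."""

    PRODUCTION = "production"
    STAGING = "staging"
    TESTING = "testing"


def parse_profile(raw: Optional[str]) -> Profile:
    """Map a config value to a Profile, defaulting to production when unset."""
    if raw is None:
        return Profile.PRODUCTION
    try:
        return Profile(raw)
    except ValueError:
        allowed = ", ".join(p.value for p in Profile)
        raise ValueError(f"invalid profile {raw!r}; allowed values: {allowed}") from None
```

Rejecting unknown strings early keeps a typo in `juju config` from silently falling back to another profile.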
2 changes: 2 additions & 0 deletions lib/charms/opensearch/v0/constants_charm.py
@@ -118,3 +118,5 @@

# User-face Backup ID format
OPENSEARCH_BACKUP_ID_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

PERFORMANCE_PROFILE = "profile"
3 changes: 1 addition & 2 deletions lib/charms/opensearch/v0/helper_conf_setter.py
@@ -272,14 +272,13 @@ def replace(
output_file: Target file for the result config, by default same as config_file
"""
path = f"{self.base_path}{config_file}"

if not exists(path):
raise FileNotFoundError(f"{path} not found.")

with open(path, "r+") as f:
data = f.read()

- if regex and old_val and re.compile(old_val).match(data):
if regex and old_val and re.compile(old_val, re.MULTILINE).findall(data):
data = re.sub(r"{}".format(old_val), f"{new_val}", data)
elif old_val and old_val in data:
data = data.replace(old_val, new_val)
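
The switch from `match()` to `findall()` matters because `match()` only tests the start of the string, so a flag further down the file is never found. A small sketch of the difference, using a made-up `jvm.options` snippet (the file content and the new heap value are assumptions for illustration):

```python
import re

# Hypothetical jvm.options content; the heap flags are not on the first line.
data = "## JVM configuration\n-Xms1g\n-Xmx1g\n"
pattern = "-Xms[0-9]+[kmgKMG]"

# match() only tests the very start of the string, so it misses the flag.
old_check = re.compile(pattern).match(data)

# findall() scans the whole text; MULTILINE additionally makes ^ and $ anchor per line.
new_check = re.compile(pattern, re.MULTILINE).findall(data)

# Substitute only when the pattern was found, as replace() does.
updated = re.sub(pattern, "-Xms2097152k", data) if new_check else data
```

One thing to keep in mind: the `re.sub` call in `replace()` does not pass `re.MULTILINE`, so a pattern anchored with `^` could pass the check yet only substitute at the start of the file.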
85 changes: 85 additions & 0 deletions lib/charms/opensearch/v0/models.py
@@ -23,6 +23,10 @@
LIBPATCH = 1


MIN_HEAP_SIZE = 1024 * 1024 # 1GB in KB
MAX_HEAP_SIZE = 32 * MIN_HEAP_SIZE # 32GB in KB


class Model(ABC, BaseModel):
"""Base model class."""

@@ -153,6 +157,14 @@ class DeploymentType(BaseStrEnum):
OTHER = "other"


class PerformanceType(BaseStrEnum):
"""Performance types available."""

PRODUCTION = "production"
STAGING = "staging"
TESTING = "testing"


class StartMode(BaseStrEnum):
"""Mode of start of units in this deployment."""

@@ -346,3 +358,76 @@ def promote_failover(self) -> None:
self.main_app = self.failover_app
self.main_rel_id = self.failover_rel_id
self.delete("failover")


class OpenSearchPerfProfile(Model):
"""Generates an immutable description of the performance profile."""

class Config:
"""Pydantic config for this model."""

arbitrary_types_allowed = True

typ: PerformanceType
heap_size_in_kb: int = MIN_HEAP_SIZE
opensearch_yml: Dict[str, str] = {}
charmed_index_template: Dict[str, str] = {}
charmed_component_templates: Dict[str, str] = {}

@classmethod
def from_str(cls, input_str: str):
"""Create a new instance of this class from a profile name such as "production"."""
return cls(typ=PerformanceType(input_str))

@root_validator
def set_options(cls, values): # noqa: N805
"""Generate the attributes depending on the input."""
val = values["typ"]
if isinstance(val, str):
val = PerformanceType(val)

if val == PerformanceType.PRODUCTION:
values["heap_size_in_kb"] = min(
int(0.50 * OpenSearchPerfProfile.meminfo()["MemTotal"]),
MAX_HEAP_SIZE,
)

if val == PerformanceType.STAGING:
values["heap_size_in_kb"] = min(
int(0.25 * OpenSearchPerfProfile.meminfo()["MemTotal"]),
MAX_HEAP_SIZE,
)

if val == PerformanceType.TESTING:
values["heap_size_in_kb"] = MIN_HEAP_SIZE

if val != PerformanceType.TESTING:
values["opensearch_yml"] = {"indices.memory.index_buffer_size": "25%"}

values["charmed_index_template"] = {
"charmed-index-tpl": {
"index_patterns": ["*"],
"template": {
"settings": {
"number_of_replicas": "2",
},
},
},
}

return values

@staticmethod
def meminfo() -> dict[str, float]:
"""Read the /proc/meminfo file and return the values.

According to the kernel source code, the values are always in kB:
https://github.com/torvalds/linux/blob/
2a130b7e1fcdd83633c4aa70998c314d7c38b476/fs/proc/meminfo.c#L31
"""
with open("/proc/meminfo") as f:
meminfo = f.read()

return {
line.split()[0][:-1]: float(line.split()[1]) for line in meminfo.split("\n") if line
}
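
The heap sizing rule in `set_options` can be read in isolation. A minimal sketch under stated assumptions: `parse_meminfo` and `heap_size_kb` are hypothetical helper names, profiles are passed as plain strings, and the sample text stands in for `/proc/meminfo`:

```python
MIN_HEAP_KB = 1024 * 1024        # 1 GiB expressed in kB
MAX_HEAP_KB = 32 * MIN_HEAP_KB   # 32 GiB expressed in kB


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    return {
        line.split()[0][:-1]: float(line.split()[1])
        for line in text.split("\n")
        if line
    }


def heap_size_kb(profile: str, mem_total_kb: float) -> int:
    """Heap size in kB for a profile name, following the rules in the diff."""
    if profile == "production":
        return min(int(0.50 * mem_total_kb), MAX_HEAP_KB)
    if profile == "staging":
        return min(int(0.25 * mem_total_kb), MAX_HEAP_KB)
    return MIN_HEAP_KB  # testing


# Made-up sample standing in for the real /proc/meminfo content.
sample = "MemTotal:       16384000 kB\nMemFree:         1024000 kB\n"
mem_total = parse_meminfo(sample)["MemTotal"]
```

Note that the rule only applies an upper bound: on a host with less than 2 GiB of RAM, the production heap would come out below `MIN_HEAP_SIZE`, which may or may not be intended.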
89 changes: 81 additions & 8 deletions lib/charms/opensearch/v0/opensearch_base_charm.py
@@ -11,6 +11,7 @@

from charms.grafana_agent.v0.cos_agent import COSAgentProvider
from charms.opensearch.v0.constants_charm import (
PERFORMANCE_PROFILE,
AdminUser,
AdminUserInitProgress,
AdminUserNotConfigured,
@@ -49,7 +50,12 @@
generate_hashed_password,
generate_password,
)
- from charms.opensearch.v0.models import DeploymentDescription, DeploymentType
from charms.opensearch.v0.models import (
DeploymentDescription,
DeploymentType,
OpenSearchPerfProfile,
PerformanceType,
)
from charms.opensearch.v0.opensearch_backups import backup
from charms.opensearch.v0.opensearch_config import OpenSearchConfig
from charms.opensearch.v0.opensearch_distro import OpenSearchDistribution
@@ -166,12 +172,21 @@ def __init__(self, handle, *, ignore_lock=False):
super().__init__(handle, ignore_lock=ignore_lock)


class _ApplyProfileTemplatesOpenSearch(EventBase):
"""Attempt to apply the performance profile templates.

A separate event lets the charm wait for the cluster: if the cluster is not ready,
the event is deferred, and only this particular task is delayed.
"""


class OpenSearchBaseCharm(CharmBase, abc.ABC):
"""Base class for OpenSearch charms."""

_start_opensearch_event = EventSource(_StartOpenSearch)
_restart_opensearch_event = EventSource(_RestartOpenSearch)
_upgrade_opensearch_event = EventSource(_UpgradeOpenSearch)
_apply_profile_templates_event = EventSource(_ApplyProfileTemplatesOpenSearch)

def __init__(self, *args, distro: Type[OpenSearchDistribution] = None):
super().__init__(*args)
@@ -208,6 +223,10 @@ def __init__(self, *args, distro: Type[OpenSearchDistribution] = None):
self.framework.observe(self._restart_opensearch_event, self._restart_opensearch)
self.framework.observe(self._upgrade_opensearch_event, self._upgrade_opensearch)

self.framework.observe(
self._apply_profile_templates_event, self._on_apply_profile_templates
)

self.framework.observe(self.on.leader_elected, self._on_leader_elected)
self.framework.observe(self.on.start, self._on_start)
self.framework.observe(self.on.update_status, self._on_update_status)
@@ -250,6 +269,24 @@ def __init__(self, *args, distro: Type[OpenSearchDistribution] = None):
# in the deferred event queue
self._is_peer_rel_changed_deferred = False

def _on_apply_profile_templates(self, event: EventBase):
"""Apply the profile templates.

A separate event lets the charm wait for the cluster: if the cluster is not ready,
the event is deferred, and only this particular task is delayed.
"""
if not self.opensearch_peer_cm.deployment_desc():
logger.info("Applying profile templates but cluster not ready yet.")
event.defer()
return

if self.opensearch_peer_cm.deployment_desc().typ != DeploymentType.MAIN_ORCHESTRATOR:
logger.info("Applying profile templates but not the main orchestrator.")
return

# Configure templates if needed
self.opensearch.apply_perf_templates_if_needed()

@property
@abc.abstractmethod
def _upgrade(self) -> typing.Optional[upgrade.Upgrade]:
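
The gating in `_on_apply_profile_templates` boils down to a three-way decision. A rough sketch as a pure function, without the ops framework; `template_action` is a hypothetical name, and the string `"main-orchestrator"` is an assumed value for `DeploymentType.MAIN_ORCHESTRATOR`:

```python
from typing import Optional


def template_action(deployment_type: Optional[str]) -> str:
    """Return "defer", "skip" or "apply" for the template-apply event.

    deployment_type is None while the deployment description is not yet
    available; "main-orchestrator" is an assumed enum value.
    """
    if deployment_type is None:
        return "defer"  # cluster not ready: retry this event later
    if deployment_type != "main-orchestrator":
        return "skip"  # only the main orchestrator applies templates
    return "apply"
```

Only the "defer" branch causes the event to be re-queued; the "skip" branch returns without retrying.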
@@ -355,6 +392,15 @@ def cleanup():
logger.error("Service previously started but now misses the snap.")
return

# Store the current perf. profile we are applying
if self.unit.is_leader():
self.peers_data.put(
Scope.APP,
PERFORMANCE_PROFILE,
PerformanceType(self.config.get(PERFORMANCE_PROFILE, PerformanceType.PRODUCTION)),
)
self._apply_profile_templates_event.emit()

# apply the directives computed and emitted by the peer cluster manager
if not self._apply_peer_cm_directives_and_check_if_can_start():
event.defer()
@@ -676,22 +722,49 @@ def _on_config_changed(self, event: ConfigChangedEvent): # noqa C901
if not self.plugin_manager.check_plugin_manager_ready():
return

if self.upgrade_in_progress:
# The following changes in _on_config_changed are not supported during an upgrade
# Therefore, we leave now
logger.warning(
"Changing config during an upgrade is not supported. The charm may be in a broken, "
"unrecoverable state"
)
event.defer()
return

try:
if not self.plugin_manager.check_plugin_manager_ready():
raise OpenSearchNotFullyReadyError()

if self.unit.is_leader():
self.status.set(MaintenanceStatus(PluginConfigCheck), app=True)

- if self.plugin_manager.run():
- if self.upgrade_in_progress:
- logger.warning(
- "Changing config during an upgrade is not supported. The charm may be in a broken, "
- "unrecoverable state"
restart_requested = self.plugin_manager.run()
if (
self.peers_data.get(Scope.APP, PERFORMANCE_PROFILE)
and PerformanceType(self.peers_data.get(Scope.APP, PERFORMANCE_PROFILE))
!= self.opensearch.perf_profile.typ
):
self.opensearch.perf_profile = OpenSearchPerfProfile.from_str(
self.config.get(PERFORMANCE_PROFILE)
)
# If we have a running service, and our profile changed
# then we need a restart to apply the new profile
self.opensearch_config.apply_performance_profile(self.opensearch.perf_profile)
self._apply_profile_templates_event.emit()
if self.unit.is_leader():
self.peers_data.put(
Scope.APP,
PERFORMANCE_PROFILE,
PerformanceType(
self.config.get(PERFORMANCE_PROFILE, PerformanceType.PRODUCTION)
),
)
- event.defer()
- return
restart_requested = True

# Finally, we only need to restart in this case if we have already requested a restart
# and the service is actually running
if restart_requested and self.opensearch.is_service_started():
self._restart_opensearch_event.emit()
except (OpenSearchNotFullyReadyError, OpenSearchPluginError) as e:
if isinstance(e, OpenSearchNotFullyReadyError):
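
The restart decision in the new `_on_config_changed` flow can be summarised as a small predicate. A sketch only: `restart_needed` and its parameter names are hypothetical, and profiles are passed as plain strings rather than `PerformanceType` values:

```python
from typing import Optional


def restart_needed(
    plugin_restart: bool,
    stored_profile: Optional[str],
    active_profile: str,
    service_started: bool,
) -> bool:
    """Decide whether to emit a restart, following the config-changed logic.

    stored_profile: profile recorded in the peer databag (None if never set).
    active_profile: profile the unit currently has loaded.
    """
    # A profile change counts only when a stored profile exists and differs.
    profile_changed = bool(stored_profile) and stored_profile != active_profile
    restart_requested = plugin_restart or profile_changed
    # Nothing to restart if the service is not running yet.
    return restart_requested and service_started
```

For example, a profile change on a unit whose service has not started yet applies the new configuration files but does not trigger a restart.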
21 changes: 20 additions & 1 deletion lib/charms/opensearch/v0/opensearch_config.py
@@ -8,7 +8,7 @@

from charms.opensearch.v0.constants_tls import CertType
from charms.opensearch.v0.helper_security import normalized_tls_subject
- from charms.opensearch.v0.models import App
from charms.opensearch.v0.models import App, OpenSearchPerfProfile
from charms.opensearch.v0.opensearch_distro import OpenSearchDistribution

# The unique Charmhub library identifier, never change it
@@ -69,6 +69,25 @@ def set_client_auth(self):
"-Djdk.tls.client.protocols=TLSv1.2",
)

def apply_performance_profile(self, profile: OpenSearchPerfProfile):
"""Apply the performance profile to the opensearch config."""
self._opensearch.config.replace(
self.JVM_OPTIONS,
"-Xms[0-9]+[kmgKMG]",
f"-Xms{str(profile.heap_size_in_kb)}k",
regex=True,
)

self._opensearch.config.replace(
self.JVM_OPTIONS,
"-Xmx[0-9]+[kmgKMG]",
f"-Xmx{str(profile.heap_size_in_kb)}k",
regex=True,
)

for key, val in profile.opensearch_yml.items():
self._opensearch.config.put(self.CONFIG_YML, key, val)

def set_admin_tls_conf(self, secrets: Dict[str, any]):
"""Configures the admin certificate."""
self._opensearch.config.put(
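
To see what `apply_performance_profile` does to the JVM options, here is a sketch that applies the same two patterns to an in-memory string instead of a file. `set_heap` is a hypothetical helper, and the sample options text is made up:

```python
import re


def set_heap(jvm_options: str, heap_kb: int) -> str:
    """Rewrite the -Xms and -Xmx flags in jvm.options text to heap_kb kilobytes."""
    for flag in ("-Xms", "-Xmx"):
        jvm_options = re.sub(f"{flag}[0-9]+[kmgKMG]", f"{flag}{heap_kb}k", jvm_options)
    return jvm_options


# Made-up jvm.options content for illustration.
before = "-Xms1g\n-Xmx1g\n-XX:+UseG1GC\n"
after = set_heap(before, 4096000)
```

Setting `-Xms` and `-Xmx` to the same value keeps the heap at a fixed size, and unrelated flags such as `-XX:+UseG1GC` are left untouched.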
38 changes: 36 additions & 2 deletions lib/charms/opensearch/v0/opensearch_distro.py
@@ -18,7 +18,7 @@

import requests
import urllib3.exceptions
- from charms.opensearch.v0.constants_charm import GeneratedRoles
from charms.opensearch.v0.constants_charm import PERFORMANCE_PROFILE, GeneratedRoles
from charms.opensearch.v0.helper_charm import (
format_unit_name,
mask_sensitive_information,
@@ -27,7 +27,12 @@
from charms.opensearch.v0.helper_conf_setter import YamlConfigSetter
from charms.opensearch.v0.helper_http import error_http_retry_log
from charms.opensearch.v0.helper_networking import get_host_ip, is_reachable
- from charms.opensearch.v0.models import App, StartMode
from charms.opensearch.v0.models import (
App,
OpenSearchPerfProfile,
PerformanceType,
StartMode,
)
from charms.opensearch.v0.opensearch_exceptions import (
OpenSearchCmdError,
OpenSearchError,
@@ -98,6 +103,9 @@ def __init__(self, charm, peer_relation_name: str):
self.config = YamlConfigSetter(base_path=self.paths.conf)
self._charm = charm
self._peer_relation_name = peer_relation_name
self.perf_profile = OpenSearchPerfProfile.from_str(
self._charm.config.get(PERFORMANCE_PROFILE, "production")
)

def install(self):
"""Install the package."""
@@ -398,6 +406,32 @@ def _build_paths(self) -> Paths:
"""Build the Paths object."""
pass

def apply_perf_templates_if_needed(self): # noqa: C901
"""Apply performance templates if needed."""
if self.perf_profile.typ == PerformanceType.TESTING:
# We try to remove the index and components' templates
for endpoint in [
"/_index_template/charmed-index-tpl",
]:
try:
self.request("DELETE", endpoint)
except OpenSearchHttpError:
pass
# Nothing to do anymore
return

for idx, template in self.perf_profile.charmed_index_template.items():
try:
self.request("POST", f"/_index_template/{idx}", template)
except OpenSearchHttpError as e:
logger.error(f"Failed to apply index template: {e}")

for idx, template in self.perf_profile.charmed_component_templates.items():
try:
self.request("POST", f"/_component_template/{idx}", template)
except OpenSearchHttpError as e:
logger.error(f"Failed to apply component template: {e}")

def _set_env_variables(self):
"""Set the necessary environment variables."""
os.environ["OPENSEARCH_HOME"] = self.paths.home
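
The calls issued by `apply_perf_templates_if_needed` can be listed without a running cluster. A sketch that returns the planned requests instead of sending them; `template_requests` is a hypothetical helper, profiles are plain strings, and error handling is left out:

```python
from typing import Any, Dict, List, Optional, Tuple

Request = Tuple[str, str, Optional[Dict[str, Any]]]


def template_requests(
    profile: str,
    index_templates: Dict[str, Dict[str, Any]],
    component_templates: Dict[str, Dict[str, Any]],
) -> List[Request]:
    """List the (method, endpoint, body) calls implied by a profile."""
    if profile == "testing":
        # Testing removes the charm-managed index template and stops there.
        return [("DELETE", "/_index_template/charmed-index-tpl", None)]
    calls: List[Request] = []
    for name, body in index_templates.items():
        calls.append(("POST", f"/_index_template/{name}", body))
    for name, body in component_templates.items():
        calls.append(("POST", f"/_component_template/{name}", body))
    return calls


# Same payload shape as the charmed_index_template set in models.py.
index_tpl = {
    "charmed-index-tpl": {
        "index_patterns": ["*"],
        "template": {"settings": {"number_of_replicas": "2"}},
    }
}
```

Separating the plan from the HTTP calls makes this logic easy to unit test; the charm itself sends each request and logs failures instead of raising.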