Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DPE-4115] Performance Profile Support #466

Merged
merged 79 commits into from
Oct 21, 2024
Merged
Show file tree
Hide file tree
Changes from 78 commits
Commits
Show all changes
79 commits
Select commit Hold shift + click to select a range
a62f180
Sync docs from Discourse (#451)
github-actions[bot] Sep 26, 2024
c9edade
[DPE-5558] Break CA rotation into integration test groups (#458)
phvalguima Sep 27, 2024
3845e38
Adding first batch of changes to the charm on how to process profiles…
phvalguima Sep 27, 2024
7b42f47
Add unit tests for profile management
phvalguima Oct 1, 2024
a3d033c
Extend replace() to cover multilines and look into all possible options
phvalguima Oct 1, 2024
776c3f4
Add index/component template APIs
phvalguima Oct 1, 2024
4b52e33
Add support for profile option in the integration tests
phvalguima Oct 1, 2024
50ea12b
lint fixes
phvalguima Oct 1, 2024
51c7cd5
Fix first batch of unit tests
phvalguima Oct 2, 2024
73cd84b
Remove some dead LoCs because of commenting out lines of code
phvalguima Oct 2, 2024
7c47f20
Update to 1 instead of 1-all
phvalguima Oct 2, 2024
64f9bc9
Update the changes following feedback
phvalguima Oct 11, 2024
c856d3a
lint fix
phvalguima Oct 11, 2024
9cecd5f
Update more tests and fixes
phvalguima Oct 12, 2024
6a85319
Rollback the original internal_users.yml
phvalguima Oct 12, 2024
2d9ce65
Merge remote-tracking branch 'origin' into DPE-4115-performance-profiles
phvalguima Oct 12, 2024
8712a84
Merge branch '2/edge' into DPE-4115-performance-profiles
phvalguima Oct 13, 2024
5363de7
Simplify test_charm as changing to production profile causes a lot of…
phvalguima Oct 13, 2024
60292ed
Remove service started + add set_watermark to small deployment on plu…
phvalguima Oct 13, 2024
b63d02d
Moved test HA to set profile=staging
phvalguima Oct 13, 2024
51d92e7
Remove any profile change from int. tests; fix integration tests
phvalguima Oct 13, 2024
9fab851
Move to the Deployment Description
phvalguima Oct 15, 2024
bfe940e
Remove refs to template apply event
phvalguima Oct 15, 2024
9b50a3d
Fixes following review
phvalguima Oct 15, 2024
42f464d
Roll back internal_users
phvalguima Oct 15, 2024
a6ee6db
Add peer relation listener
phvalguima Oct 15, 2024
d53a4f7
Add _on_install hook
phvalguima Oct 15, 2024
b1e77b5
Add perf profile to track install event
phvalguima Oct 15, 2024
db0d13e
Check file exists on _on_install
phvalguima Oct 15, 2024
86331be
Fix peer cluster relation
phvalguima Oct 16, 2024
22fbe66
Fix the config-changed routine
phvalguima Oct 16, 2024
13082f0
Update the e.response_code on apply template routine
phvalguima Oct 16, 2024
b8c3c59
Fix the append scenarios in replace()
phvalguima Oct 16, 2024
7670d8e
Add multiline replace
phvalguima Oct 16, 2024
37d94d5
Merge remote-tracking branch 'origin/main' into DPE-4115-performance-…
phvalguima Oct 16, 2024
78deee7
Merge remote-tracking branch 'origin/DPE-5677-fix-replace-file-persis…
phvalguima Oct 16, 2024
59790ae
Add profile change in test_charm.py
phvalguima Oct 16, 2024
1038a9b
Remove repeated code
phvalguima Oct 16, 2024
1450e56
Fix path in helper_conf_setter.replace()
phvalguima Oct 16, 2024
5e12ca8
Update lib/charms/opensearch/v0/opensearch_base_charm.py
phvalguima Oct 16, 2024
b6af43e
Remove perf_profile from opensearch_distro
phvalguima Oct 16, 2024
4f46a44
Add post merge with upstream branch
phvalguima Oct 16, 2024
a4ae60f
Merge remote-tracking branch 'origin' into DPE-4115-performance-profiles
phvalguima Oct 16, 2024
7dbd1e3
Move profile to PeerClusterConfig
phvalguima Oct 17, 2024
ebc0e58
Add minor fixes for performance profile
phvalguima Oct 17, 2024
a001dd2
Set correct order in config-changed
phvalguima Oct 17, 2024
07f99a1
Add config-changed
phvalguima Oct 17, 2024
b2c27d4
Add peer relation support and restart logic
phvalguima Oct 17, 2024
e59a628
Merge remote-tracking branch 'origin' into DPE-4115-performance-profiles
phvalguima Oct 18, 2024
deae3c6
Remove non-testing profile
phvalguima Oct 18, 2024
345385b
Add support for upgrade from older versions
phvalguima Oct 18, 2024
6c72f33
Rollback unit test + fix refresh command
phvalguima Oct 18, 2024
7c6df55
Rollback to return None if peer relation not set
phvalguima Oct 18, 2024
cea3c29
Rollback the config-changed to use refresh_relation_data
phvalguima Oct 18, 2024
cd6b419
Fix return empty after upgrade
phvalguima Oct 18, 2024
7d5f232
Update profiles
phvalguima Oct 18, 2024
ebae15a
Add upgrade charm check
phvalguima Oct 18, 2024
0cd057d
Minor fixes for the upgrade + reviews
phvalguima Oct 18, 2024
6a076da
Simplify the peer-cluster event
phvalguima Oct 18, 2024
b6a2989
Remove the ^
phvalguima Oct 18, 2024
44014ff
Fix unit tests
phvalguima Oct 18, 2024
9e60288
Move away from refresh_relation_data
phvalguima Oct 18, 2024
0754ce6
Move to run()
phvalguima Oct 18, 2024
1b69290
Simplify current()
phvalguima Oct 18, 2024
0dcdcb8
fix lint
phvalguima Oct 18, 2024
960c1ef
Fix upgrade assert
phvalguima Oct 18, 2024
c1f6b32
fix lint
phvalguima Oct 18, 2024
60e829a
Update helpers.py
phvalguima Oct 19, 2024
9c20cd3
Add logging; update upgrade helper to dispatch arguments in the corre…
Oct 19, 2024
fc2146c
Update test_ha_multi_clusters.py
phvalguima Oct 20, 2024
4a8f5f0
Update test_ha_multi_clusters.py
phvalguima Oct 20, 2024
f481297
Update test_tls.py
phvalguima Oct 20, 2024
72b8517
Update test_ca_rotation.py
phvalguima Oct 20, 2024
a67e876
Fix: update config options across CI
phvalguima Oct 20, 2024
047579d
Move away from main orch. leadership
phvalguima Oct 21, 2024
cc12e1a
Readd the apply index-template bool
phvalguima Oct 21, 2024
e7af36d
Add custom event with property
phvalguima Oct 21, 2024
4419a66
Reorder self.current + apply perf. templates
phvalguima Oct 21, 2024
1df45a7
Update helper_conf_setter.py
phvalguima Oct 21, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 10 additions & 0 deletions config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -36,3 +36,13 @@ options:
default: true
type: boolean
description: Enable opensearch-knn

profile:
type: string
default: "production"
Mehdi-Bendriss marked this conversation as resolved.
Show resolved Hide resolved
description: |
Profile representing the scope of deployment, and used to tune resource allocation.
Allowed values are: "production", "staging" or "testing"
Production will tune opensearch for maximum performance while default will tune for
minimal running performance.
Performance tuning is described on: https://opensearch.org/docs/latest/tuning-your-cluster/performance/
2 changes: 2 additions & 0 deletions lib/charms/opensearch/v0/constants_charm.py
Original file line number Diff line number Diff line change
Expand Up @@ -118,3 +118,5 @@

# User-face Backup ID format
OPENSEARCH_BACKUP_ID_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

PERFORMANCE_PROFILE = "profile"
5 changes: 3 additions & 2 deletions lib/charms/opensearch/v0/helper_conf_setter.py
Original file line number Diff line number Diff line change
Expand Up @@ -272,7 +272,6 @@ def replace(
output_file: Target file for the result config, by default same as config_file
"""
path = f"{self.base_path}{config_file}"

if not exists(path):
raise FileNotFoundError(f"{path} not found.")

Expand All @@ -291,8 +290,10 @@ def replace(

if output_file is None:
output_file = config_file
elif output_file != config_file:
path = f"{self.base_path}{output_file}"

with open(output_file, "w") as f:
with open(path, "w") as f:
f.write(data)

@override
Expand Down
73 changes: 73 additions & 0 deletions lib/charms/opensearch/v0/models.py
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,10 @@
LIBPATCH = 1


MIN_HEAP_SIZE = 1024 * 1024 # 1GB in KB
MAX_HEAP_SIZE = 32 * MIN_HEAP_SIZE # 32GB in KB


class Model(ABC, BaseModel):
"""Base model class."""

Expand Down Expand Up @@ -153,6 +157,14 @@ class DeploymentType(BaseStrEnum):
OTHER = "other"


class PerformanceType(BaseStrEnum):
"""Performance types available."""

PRODUCTION = "production"
STAGING = "staging"
TESTING = "testing"


class StartMode(BaseStrEnum):
"""Mode of start of units in this deployment."""

Expand Down Expand Up @@ -204,6 +216,10 @@ class PeerClusterConfig(Model):
cluster_name: str
init_hold: bool
roles: List[str]
# We have a breaking change in the model
# For older charms, this field will not exist and they will be set in the
# profile called "testing".
profile: Optional[PerformanceType] = PerformanceType.TESTING
data_temperature: Optional[str] = None

@root_validator
Expand Down Expand Up @@ -346,3 +362,60 @@ def promote_failover(self) -> None:
self.main_app = self.failover_app
self.main_rel_id = self.failover_rel_id
self.delete("failover")


class OpenSearchPerfProfile(Model):
"""Generates an immutable description of the performance profile."""

typ: PerformanceType
heap_size_in_kb: int = MIN_HEAP_SIZE
opensearch_yml: Dict[str, str] = {}
charmed_index_template: Dict[str, str] = {}
charmed_component_templates: Dict[str, str] = {}

@root_validator
def set_options(cls, values): # noqa: N805
"""Generate the attributes depending on the input."""
# Check if PerformanceType has been rendered correctly
# if an user creates the OpenSearchPerfProfile
if "typ" not in values:
raise AttributeError("Missing 'typ' attribute.")

if values["typ"] == PerformanceType.TESTING:
values["heap_size_in_kb"] = MIN_HEAP_SIZE
return values

mem_total = OpenSearchPerfProfile.meminfo()["MemTotal"]
mem_percent = 0.50 if values["typ"] == PerformanceType.PRODUCTION else 0.25

values["heap_size_in_kb"] = min(int(mem_percent * mem_total), MAX_HEAP_SIZE)

if values["typ"] != PerformanceType.TESTING:
values["opensearch_yml"] = {"indices.memory.index_buffer_size": "25%"}

values["charmed_index_template"] = {
"charmed-index-tpl": {
"index_patterns": ["*"],
phvalguima marked this conversation as resolved.
Show resolved Hide resolved
"template": {
"settings": {
"number_of_replicas": "1",
},
},
},
}

return values

@staticmethod
def meminfo() -> dict[str, float]:
"""Read the /proc/meminfo file and return the values.

According to the kernel source code, the values are always in kB:
https://github.com/torvalds/linux/blob/
2a130b7e1fcdd83633c4aa70998c314d7c38b476/fs/proc/meminfo.c#L31
"""
with open("/proc/meminfo") as f:
meminfo = f.read().split("\n")
meminfo = [line.split() for line in meminfo if line.strip()]

return {line[0][:-1]: float(line[1]) for line in meminfo}
58 changes: 41 additions & 17 deletions lib/charms/opensearch/v0/opensearch_base_charm.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@

from charms.grafana_agent.v0.cos_agent import COSAgentProvider
from charms.opensearch.v0.constants_charm import (
PERFORMANCE_PROFILE,
AdminUser,
AdminUserInitProgress,
AdminUserNotConfigured,
Expand Down Expand Up @@ -49,7 +50,11 @@
generate_hashed_password,
generate_password,
)
from charms.opensearch.v0.models import DeploymentDescription, DeploymentType
from charms.opensearch.v0.models import (
DeploymentDescription,
DeploymentType,
PerformanceType,
)
from charms.opensearch.v0.opensearch_backups import backup
from charms.opensearch.v0.opensearch_config import OpenSearchConfig
from charms.opensearch.v0.opensearch_distro import OpenSearchDistribution
Expand All @@ -74,6 +79,7 @@
OpenSearchProvidedRolesException,
StartMode,
)
from charms.opensearch.v0.opensearch_performance_profile import OpenSearchPerformance
from charms.opensearch.v0.opensearch_plugin_manager import OpenSearchPluginManager
from charms.opensearch.v0.opensearch_plugins import OpenSearchPluginError
from charms.opensearch.v0.opensearch_relation_peer_cluster import (
Expand Down Expand Up @@ -246,6 +252,8 @@ def __init__(self, *args, distro: Type[OpenSearchDistribution] = None):
metrics_rules_dir="./src/alert_rules/prometheus",
log_slots=["opensearch:logs"],
)

self.performance_profile = OpenSearchPerformance(self)
# Ensure that only one instance of the `_on_peer_relation_changed` handler exists
# in the deferred event queue
self._is_peer_rel_changed_deferred = False
Expand Down Expand Up @@ -657,8 +665,19 @@ def _on_update_status(self, event: UpdateStatusEvent): # noqa: C901
# handle when/if certificates are expired
self._check_certs_expiration(event)

def trigger_restart(self):
"""Trigger a restart of the service."""
self._restart_opensearch_event.emit()

def _on_config_changed(self, event: ConfigChangedEvent): # noqa C901
"""On config changed event. Useful for IP changes or for user provided config changes."""
if not self.performance_profile.current:
# We are running (1) install or (2) an upgrade on instance that pre-dates profile
# First, we set this unit's effective profile -> 1G heap and no index templates.
# Our goal is to make sure this value exists once the refresh is finished
# and it represents the accurate value for this unit.
self.performance_profile.current = PerformanceType.TESTING

if self.opensearch_config.update_host_if_needed():
self.status.set(MaintenanceStatus(TLSNewCertsRequested))
self.tls.delete_stored_tls_resources()
Expand All @@ -680,27 +699,27 @@ def _on_config_changed(self, event: ConfigChangedEvent): # noqa C901
# handle cluster change to main-orchestrator (i.e: init_hold: true -> false)
self._handle_change_to_main_orchestrator_if_needed(event, previous_deployment_desc)

# todo: handle gracefully configuration setting at start of the charm
if not self.plugin_manager.check_plugin_manager_ready():
if self.upgrade_in_progress:
phvalguima marked this conversation as resolved.
Show resolved Hide resolved
# The following changes in _on_config_changed are not supported during an upgrade
# Therefore, we leave now
logger.warning(
"Changing config during an upgrade is not supported. The charm may be in a broken, "
"unrecoverable state"
)
event.defer()
return

perf_profile_needs_restart = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be removed, it will be overwritten with True or False in line 738.

plugin_needs_restart = False

try:
if not self.plugin_manager.check_plugin_manager_ready():
raise OpenSearchNotFullyReadyError()

if self.unit.is_leader():
self.status.set(MaintenanceStatus(PluginConfigCheck), app=True)

if self.plugin_manager.run():
if self.upgrade_in_progress:
logger.warning(
"Changing config during an upgrade is not supported. The charm may be in a broken, "
"unrecoverable state"
)
event.defer()
return

self._restart_opensearch_event.emit()
plugin_needs_restart = self.plugin_manager.run()
except (OpenSearchNotFullyReadyError, OpenSearchPluginError) as e:
if isinstance(e, OpenSearchNotFullyReadyError):
logger.warning("Plugin management: cluster not ready yet at config changed")
Expand All @@ -711,11 +730,16 @@ def _on_config_changed(self, event: ConfigChangedEvent): # noqa C901
# config-changed is called again.
if self.unit.is_leader():
self.status.clear(PluginConfigCheck, app=True)
return
else:
if self.unit.is_leader():
self.status.clear(PluginConfigCheck, app=True)
self.status.clear(PluginConfigChangeError, app=True)

if self.unit.is_leader():
self.status.clear(PluginConfigCheck, app=True)
self.status.clear(PluginConfigChangeError, app=True)
perf_profile_needs_restart = self.performance_profile.apply(
self.config.get(PERFORMANCE_PROFILE)
)
if plugin_needs_restart or perf_profile_needs_restart:
self._restart_opensearch_event.emit()

def _on_set_password_action(self, event: ActionEvent):
"""Set new admin password from user input or generate if not passed."""
Expand Down
21 changes: 20 additions & 1 deletion lib/charms/opensearch/v0/opensearch_config.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@

from charms.opensearch.v0.constants_tls import CertType
from charms.opensearch.v0.helper_security import normalized_tls_subject
from charms.opensearch.v0.models import App
from charms.opensearch.v0.models import App, OpenSearchPerfProfile
from charms.opensearch.v0.opensearch_distro import OpenSearchDistribution

# The unique Charmhub library identifier, never change it
Expand Down Expand Up @@ -69,6 +69,25 @@ def set_client_auth(self):
"-Djdk.tls.client.protocols=TLSv1.2",
)

def apply_performance_profile(self, profile: OpenSearchPerfProfile):
"""Apply the performance profile to the opensearch config."""
self._opensearch.config.replace(
self.JVM_OPTIONS,
"-Xms[0-9]+[kmgKMG]",
f"-Xms{str(profile.heap_size_in_kb)}k",
regex=True,
)

self._opensearch.config.replace(
self.JVM_OPTIONS,
"-Xmx[0-9]+[kmgKMG]",
f"-Xmx{str(profile.heap_size_in_kb)}k",
regex=True,
)

for key, val in profile.opensearch_yml.items():
self._opensearch.config.put(self.CONFIG_YML, key, val)

def set_admin_tls_conf(self, secrets: Dict[str, any]):
"""Configures the admin certificate."""
self._opensearch.config.put(
Expand Down
3 changes: 3 additions & 0 deletions lib/charms/opensearch/v0/opensearch_peer_clusters.py
Original file line number Diff line number Diff line change
Expand Up @@ -165,6 +165,7 @@ def _user_config(self):
for option in self._charm.config.get("roles", "").split(",")
if option
],
profile=self._charm.performance_profile.current.typ.value,
)

def _new_cluster_setup(self, config: PeerClusterConfig) -> DeploymentDescription:
Expand Down Expand Up @@ -222,6 +223,7 @@ def _new_cluster_setup(self, config: PeerClusterConfig) -> DeploymentDescription
init_hold=config.init_hold,
roles=config.roles,
data_temperature=config.data_temperature,
profile=self._charm.performance_profile.current.typ.value,
),
start=start_mode,
pending_directives=directives,
Expand Down Expand Up @@ -270,6 +272,7 @@ def _existing_cluster_setup(
init_hold=prev_deployment.config.init_hold,
roles=config.roles,
data_temperature=config.data_temperature,
profile=self._charm.performance_profile.current.typ.value,
),
start=start_mode,
state=deployment_state,
Expand Down
Loading
Loading