chore: patch the heat liveness probes #973
Merged
The heat liveness probes are failing too fast, especially when the cluster is busy. This change extends the liveness probes so that they're not creating a problem when the cluster is under load.

Signed-off-by: Kevin Carter <[email protected]>
Pull Request Overview
This PR updates the liveness probe settings for the heat-api and heat-cfn deployments to prevent premature probe failures when the cluster is busy.
- Extends the liveness probe timeout and delay settings for heat-api.
- Extends the liveness probe timeout and delay settings for heat-cfn.
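As a rough illustration of the kind of change described above, the fragment below sketches a strategic-merge patch that lengthens the liveness probe delay and timeout on the heat-api Deployment. The field names (`initialDelaySeconds`, `timeoutSeconds`) are standard Kubernetes probe settings, but the container name and every numeric value here are assumptions chosen for the example; the values actually merged in this PR are not shown on this page.

```yaml
# Hypothetical patch, for illustration only.
# The container name and all numbers are assumptions, not the PR's values.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: heat-api            # deployment named in the overview above
spec:
  template:
    spec:
      containers:
        - name: heat-api    # assumed container name
          livenessProbe:
            # Wait longer after container start before the first probe.
            initialDelaySeconds: 60
            # Allow each probe more time to answer before it counts as failed.
            timeoutSeconds: 15
```

An equivalent patch would presumably target the heat-cfn Deployment. Because a strategic-merge patch matches containers by name, only the listed probe fields change and the existing probe handler is left as it is.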
aedan approved these changes on May 7, 2025
awfabian-rs pushed a commit that referenced this pull request on May 14, 2025
The heat liveness probes are failing too fast, especially when the cluster is busy. This change extends the liveness probes so that they're not creating a problem when the cluster is under load.
Signed-off-by: Kevin Carter <[email protected]>
(cherry picked from commit f9d39ba)
rackerchris pushed a commit to rackerchris/genestack that referenced this pull request on Jun 10, 2025
The heat liveness probes are failing too fast, especially when the cluster is busy. This change extends the liveness probes so that they're not creating a problem when the cluster is under load.
Signed-off-by: Kevin Carter <[email protected]>
(cherry picked from commit f9d39ba)
the2hill added a commit that referenced this pull request on Jun 16, 2025
* OSPC-1087: compress OVN backups before uploading to swift container (#964) (cherry picked from commit 7c629c6)
* fix: Updating mysql exporter to better monitor slave status (#965) (cherry picked from commit 54f8606)
* fix: Reverting prometheus override pruning to resolve issues (#966) The pruned overrides were installing grafana and not allowing monitoring to be discovered. Revert that until we shake it down throughly. (cherry picked from commit 87e6b2f)
* chore: Adding initial mariadb alerts focused on replication status (#967) (cherry picked from commit 4391bdb)
* docs: Add tip about creating the creator role (#963) The default example mapping includes the "creator" role, but it does not exist by default after completing most of the OpenStack setup. To avoid errors when logging in with a valid Rackspace username/password, manually create the "creator" role. Without it, Keystone will return the following error: ERROR keystone.auth.plugins.mapped [None-4e390453-680f-4a5d-a315-2a0ac7693033 - - - - - -] Role creator was specified in the mapping but does not exist. All roles specified in a mapping must exist before assignment. (cherry picked from commit ece447d)
* OSPC-1285: Disable local log storage for amphora (#972) (cherry picked from commit 8d6918d)
* chore: Adding pod state alerts to better track restart failures (#969) (cherry picked from commit 2212522)
* chore: patch the heat liveness probes (#973) The heat liveness probes are failing too fast, especially when the cluster is busy. This change extends the liveness probes so that they're not creating a problem when the cluster is under load. Signed-off-by: Kevin Carter <[email protected]> (cherry picked from commit f9d39ba)
* fix: add random mac to all physical interfaces created (#971) This change will ensure that our OVN setup is consistent and functional at scale when running with multiple provider networks. This change allows the ovn setup to generate a unique mac address per-physical interface name using the hostname + interface name as the seed. By setting the hostname and interface name as the seed, we'll ensure that the mac generated is unique per-host, but consistent should the setup tools ever be rerun. Signed-off-by: Kevin Carter <[email protected]> (cherry picked from commit 2b6b1d9)
* chore: Docs status page autorefresh (#974) (cherry picked from commit ef18ca1)
* OSPC-1046: Prune OVNDB backups in Swift (#968) (cherry picked from commit a7db40b)
* build(deps): bump the pip group across 1 directory with 2 updates (#948)
  * build(deps): bump the pip group across 1 directory with 2 updates Bumps the pip group with 2 updates in the / directory: [cryptography](https://github.com/pyca/cryptography) and [python-openstackclient](https://docs.openstack.org/python-openstackclient/latest/). Updates `cryptography` from 43.0.1 to 44.0.1 - [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst) - [Commits](pyca/cryptography@43.0.1...44.0.1) Updates `python-openstackclient` from 6.2.0 to 6.3.0 --- updated-dependencies: - dependency-name: cryptography dependency-version: 44.0.1 dependency-type: direct:production dependency-group: pip - dependency-name: python-openstackclient dependency-version: 6.3.0 dependency-type: direct:production dependency-group: pip ... Signed-off-by: dependabot[bot] <[email protected]>
  * Update requirements.txt only rev python-openstackclient.
  --------- Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Kevin Carter <[email protected]> (cherry picked from commit 9737b8e)
* feat: add per-pr smoke tests (#935) This change will build a simple three node cloud environment on-top of openstack flex in dfw. The job will build, evaluate, and return the state of the build upon completion. The hyperconverged lab script is used to run the base test. The new job will also ensure that it cleans up resources built no matter the state of the build. Signed-off-by: Kevin Carter <[email protected]> (cherry picked from commit fc99a98)
* FIX (ansible): cinder deployment playbooks need to use inventory_hostname (#977) This fix replace ansible_fqdn that relies on the order of the 127.0.0.1 line in the /etc/hosts file and is prone to using 'localhost.localdomain' rather than the actual hostname in cinder.conf. Using inventory_hostname ensures that the actual hostname is used. Tested in new Flex deploy--correct behavior observed with change. (cherry picked from commit 9ed2914)
* feat: Added policies for rabbitmq (#975)
  * Added policies for rabbitmq These policies limit quorum queues replication to 3.
  * Rabbit Policies distributed (cherry picked from commit 02309b0)
* OSPC-908 Standardize subdirectory structure for services in base-kustomize (#970) It updates the directory structure for remaining base-kustomize services to follow standardized pattern where each service contains a subdirectory named - base. Also, changes the docs, workflows and references to reflect the new structure. (cherry picked from commit 78ca512)
* chore: (python-openstackclient) update minimium version (#980) (cherry picked from commit 61b0814)
* feat: Add endpoints.yaml to hyperconverged lab script to enable external OpenStack Access (#986) (cherry picked from commit 165d081)
* feat: Update .original-images.json (#983) Added fluent-bit image (cherry picked from commit 3019741)
* fix: Pin helm version and update keystone image (#988)
  * fix(helm): Pin helm version to 3.17.3 Helm version 3.18.0 was released a few days ago and leads to (at the very least) malformatted DB_CONNECTION strings which was noticed when the db-sync job was failing with a traceback for keystone. Jira: OSPC-1330
  * fix(keystone): Update to latest keystone-rxt image This fixes 84977a9 which adds a new image for keystone that supports rbac. Though it also includes another bump to the image to address a few minor changes to the rbac mapping. Note: After this is merged, we will need to update keystone-helm-overrides to include the new image from Quay after it has built and uploaded successfully. (cherry picked from commit a9caa99)
* Add barbican-exporter (#990) (cherry picked from commit 80b6254)
* Update release-glance.yml (#994) correct workflow name for glance image (cherry picked from commit 5552e70)
* Update release-nova-oslodb.yaml (#995) (cherry picked from commit 8b58aa6)
* Update release-octavia-ovn.yml (#999) (cherry picked from commit 7ecddc2)
* Update release-horizon-rxt.yml (#998) (cherry picked from commit 895adf4)
* Update release-neutron-oslodb.yaml (#996) (cherry picked from commit 95c30fa)
* feat: add ability to run adhoc builds (#976) Signed-off-by: Kevin Carter <[email protected]> (cherry picked from commit c22cafe)
* feat: (hpa) convert to using cpu metrics with mem metrics (#982) (cherry picked from commit 5f020eb)
* Limit memcached to control plane placement (#1002) The memcached helm chart now places memcached on the openstack controlplane using the label `openstack-control-plane=enabled`. Also statefulset are removed to primarily serve from memory and limit the min replica set to 1 as default setting. (cherry picked from commit 4a5c78b)
* Update network config example (#1001)
  - The netplan config is updated to use standard bridges as common example
  - Add clarifying comments of what to configure for kubeovn (cherry picked from commit 5fb6c01)
* Documentation updates to increase readability (#1004)
  - Adding full paths for executables
  - Cleanup secrets generation Depends-On: #1001 (cherry picked from commit a1be3b3)
* Add Masakari to Genestack (#1007)
  * Add Masakari to Genestack
  * Add Masakari to Genestack (cherry picked from commit 439524f)
* Docs: Add Decommission Cinder Block Node Process doc to Op Guide (#1013) Added Decom process doc FIXED: mkdocs.yml spacing. Line 275 list items needed indented with two more spaces (cherry picked from commit 12421c7)
* fix: (hpa) rework hpa values based on minReplicas=2 (#1014)
  * fix: (hpa) rework hpa values based on minReplicas=1
  * fix: (hpa) increase minReplicas to 2 for fault tolerence
  * fix: (hpa) clean up some yaml lint errors
  * fix: (hpa) remove extra metrics
  * fix: (hpa) remove commented out pre-committ testing config
  --------- Co-authored-by: root <[email protected]> (cherry picked from commit 9770188)
* Feat: Add predictable iscsi initiator name to Ansible playbooks (#1016) This update protects against the default ISO install initiator name being used. It also protects against any cloud image type installs that set GenerateName=yes prior to a restart of iscsid service. It predictably sets a unique initiator name. IF the initator name is already set correctly, no action is taken. (cherry picked from commit ae6a18c)
* feat: add SAML federation support (#989) This change documents the process of setting up federation support within Keystone and Skyline. Documentation has been added highlighting how to setup SAML using Auth-0 as an example. Additional examples to be added later. Depends-On: #992 Signed-off-by: Kevin Carter <[email protected]> (cherry picked from commit 27e3a7d)
* feat: add new keystone images to support federation (#992) Signed-off-by: Kevin Carter <[email protected]> (cherry picked from commit 633c68d)
* fix: restore prometheus-helm-overrides.yaml to original

---------
Signed-off-by: Kevin Carter <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: niti6869 <[email protected]>
Co-authored-by: phillip.toohill <[email protected]>
Co-authored-by: Luke Repko <[email protected]>
Co-authored-by: Kevin Carter <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Dan With <[email protected]>
Co-authored-by: Jake Briggs <[email protected]>
Co-authored-by: Gaurav-t <[email protected]>
Co-authored-by: Ken Crandall <[email protected]>
Co-authored-by: Pratik Bandarkar <[email protected]>
Co-authored-by: ALEXIS CARBILLET <[email protected]>
Co-authored-by: Bjoern Teipel <[email protected]>
Co-authored-by: Zain <[email protected]>
Co-authored-by: root <[email protected]>
the2hill pushed a commit that referenced this pull request on Jun 17, 2025
The heat liveness probes are failing too fast, especially when the cluster is busy. This change extends the liveness probes so that they're not creating a problem when the cluster is under load.
Signed-off-by: Kevin Carter <[email protected]>
(cherry picked from commit f9d39ba)
the2hill added a commit that referenced this pull request on Jun 17, 2025