Releases: rcbops/rpc-maas
1.7.2
Release Notes
New Features
- If maas_rally is configured to write to an influxdb endpoint, a new metric (influxdb_success) and alarm will be created to generate alerts if writing to influxdb fails. A failure to write to influxdb is no longer fatal, which allows performance metrics to still be reported via the MaaS API even if the influxdb endpoint is unavailable.
- Add ability to set the port used by the Ceph RADOS Gateway service. Use the radosgw_civetweb_port variable to set the port. This defaults to 8080 to match the ceph-ansible default, but the radosgw_civetweb_port variable must be set to the same value in your Ceph and MaaS configurations.
- maas_rally now adds an 'influxdb_database' tag to influxdb datapoints, which allows for granular routing to different backend influxdb databases using telegraf.
- The maas_rally task arguments are now read from the plugin's configuration file (/etc/rally/maas_rally.yml by default). This eliminates the need to look up things such as network uuids when running a performance scenario manually for troubleshooting purposes.
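The non-fatal influxdb write described in the first item above could be sketched roughly as follows. This is an illustration only, not the plugin's actual code: report_metrics and the two writer functions are hypothetical names, and the real plugin's client call and metric output format are not shown in these notes.

```python
def report_metrics(datapoints, write_to_influxdb):
    """Attempt the influxdb write and record the outcome as a metric.

    `write_to_influxdb` is a hypothetical callable standing in for whatever
    client call the plugin makes. Any exception it raises is caught so the
    remaining metrics can still be reported.
    """
    metrics = dict(datapoints)
    try:
        write_to_influxdb(datapoints)
        metrics["influxdb_success"] = 1
    except Exception:
        # A failed write is not fatal; it only flips the metric,
        # which an alarm can then act on.
        metrics["influxdb_success"] = 0
    return metrics


def failing_writer(_datapoints):
    """Simulate an unavailable influxdb endpoint."""
    raise ConnectionError("influxdb endpoint unavailable")


def working_writer(_datapoints):
    """Simulate a successful write."""
    return None
```

With this shape, the performance datapoints are returned in both cases, and only the influxdb_success value differs.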
Upgrade Notes
- Any custom scenarios or overrides setting non-default times and/or concurrency values will need to move these settings to the task_args dictionary.
- Any configuration overrides of the extra_vars dictionary will need to rename the dictionary to task_args.
- After running the maas-openstack-rally.yml playbook, the rally_* checks in MaaS will fail until the agent is restarted and check definitions are updated.
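The first two upgrade notes amount to a rename plus a move. A rough sketch, assuming an override is a plain dictionary and that times and concurrency previously sat at the top level of a scenario entry; the helper name migrate_override, the flavor_name key, and this layout are assumptions for illustration, not the actual maas_rally schema:

```python
def migrate_override(override):
    """Return a copy of a scenario override using the task_args layout.

    Hypothetical helper: assumes `times` and `concurrency` were top-level
    keys and that `extra_vars` held the remaining task arguments.
    """
    migrated = dict(override)
    # Rename extra_vars to task_args, keeping values already under task_args.
    task_args = dict(migrated.pop("extra_vars", {}))
    task_args.update(migrated.pop("task_args", {}))
    # Move non-default times/concurrency settings into task_args.
    for key in ("times", "concurrency"):
        if key in migrated:
            task_args[key] = migrated.pop(key)
    migrated["task_args"] = task_args
    return migrated


# Illustrative input only; flavor_name is a made-up task argument.
old = {"times": 5, "concurrency": 2, "extra_vars": {"flavor_name": "m1.tiny"}}
new = migrate_override(old)
```

The input dictionary is left unchanged, so the old and new layouts can be compared before an override file is edited.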
Bug Fixes
- Revert limiting enablement of rgw checks to first node of each group. This was an incorrect assumption.
- Fixes endpoint handling to better support deployment in Kilo environments
- Adjusts rabbitmq_status check to better handle missing RabbitMQ API data
- Raises MaaS check timeout to 59 seconds, codifying a de facto default
- Properly validate logical volume status if HP volume is encrypted
- Fix gating issue introduced by pip 10
- Fix rate functions for swift_account_replication_check, swift_container_replication_check, and swift_object_replication_check.
- openstacksdk has been temporarily pinned to <0.12.0 to work around changes that break maas_rally's resource cleanup
Other Notes
- Improvements were made to maas_rally to allow running the maas-openstack-rally.yml playbook without installing the MaaS agent. This supports use cases where rally performance scenarios need to be run without shipping metrics to the MaaS API.
1.7.1
Release Notes
New Features
- Detailed logging was added to the maas_rally performance monitoring plugin.
- Automatic stale lock and resource cleanup was added to maas_rally. This makes the plugin more robust and resilient to transitory environmental problems.
- A configurable quota factor was added to the maas_rally plugin. This allows resource cleanup and performance polling to run asynchronously.
- The maas_rally plugin will now generate an alarm event when too many consecutive intervals (default=3) required cleanup of stale resources.
- The maas_rally plugin will now generate an alarm event when too many consecutive intervals (default=3) were aborted waiting for immature locks.
- A rally_diag.sh script is now deployed to all utility containers. This script helps support staff quickly identify resources (instances, images, etc.) that were created by maas_rally.
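The two "too many consecutive intervals" alarms above can be pictured as a streak counter that resets whenever an interval completes cleanly. The sketch below is an assumption about the general technique, not the plugin's implementation: the class name is hypothetical, and it assumes the alarm fires once the streak reaches the limit (the notes do not say whether the comparison is inclusive).

```python
class ConsecutiveCounter:
    """Track how many polling intervals in a row hit a condition."""

    def __init__(self, limit=3):
        self.limit = limit  # default=3, as in the notes above
        self.count = 0

    def record(self, condition_hit):
        """Update the streak and return True when an alarm should fire."""
        # A clean interval resets the streak to zero.
        self.count = self.count + 1 if condition_hit else 0
        return self.count >= self.limit


# Example: two cleanups, one clean interval, then three cleanups in a row.
cleanup_streak = ConsecutiveCounter()
results = [
    cleanup_streak.record(hit)
    for hit in (True, True, False, True, True, True)
]
```

Only the final interval in this sequence would raise the alarm, because the clean interval in the middle resets the count.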
Bug Fixes
- Limit ceph_cluster_stats and ceph_mons_stats checks to groups['mons'][0] and ceph_rgw_stats to groups['rgws'][0] to prevent duplicate alarms on ceph clusters.
- Properly configure agent.plugin timeout value in plugin arguments.
- Add override to swift-recon checks and include a parser for timeout in swift-recon.py.
- Added more meaningful process info in neutron_ovs_agent alarm exception message.
- Added a new status_err_no_exit function call to allow plugins like neutron_ovs_agent_check.py to run their course and report correct metrics.
- Fixed a premature exit caused by an exotic KeyError in the rabbitmq_status.py _get_node_metrics check path. The fix uses the new status_err_no_exit function call so the plugin can run its course and report correct metrics. (See https://core.rackspace.com/ticket/180307-12728 for reference)
- Fixed a premature exit caused by an exotic CalledProcessError in the swift quarantine check path. The fix uses the new status_err_no_exit function call so the plugin can run its course and report correct metrics. (See https://core.rackspace.com/ticket/180307-05355 for reference)
- Fixed a premature exit caused by an exotic KeyError in the rabbitmq_status check path.
- Disable capacity-related checks: cinder_vg_check, ironic_capacity_check, and nova_cloud_stats_check.
- Disable alarms for CDM checks on all hosts except groups['shared-infra_hosts']. This includes cpu_check, disk_utilisation, and memory_check.
- Disable alarms for network_throughput across all hosts.
- Changes to galera_check:
  - Limit enablement to groups['galera_all'][0].
  - Remove alarm for aborted_clients.
- Changes to rabbitmq_status:
  - Limit enablement to groups['rabbitmq_all'][0].
  - Modify metric msgs_excl_notifications to sum messages from consumed queues only.
  - Add metric msgs_without_consumers to sum messages from unconsumed queues only.
  - Fix bug in rabbitmq_qgrowth_excl_notifications alarm by removing the division by check period. This is automatically handled by the rate() function.
  - Restructure rabbitmq_queues_without_consumers alarm with rabbitmq_msgs_without_consumers. This will alarm if unconsumed messages reach the default threshold of 20000.
  - Remove default var for unused maas_rabbitmq_queues_without_consumers_limit.
  - Update maas_rabbitmq_queued_messages_excluding_notifications_threshold to 5000.
  - Add maas_rabbitmq_messages_without_consumers_threshold, defaulting to 20000.
- Update maas_swift_container_replication_avg_time_threshold from 50 to 300.
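The split between msgs_excl_notifications and msgs_without_consumers in the rabbitmq_status changes can be illustrated with a short sketch. It assumes each queue is a dictionary carrying messages and consumers counts, modelled on the RabbitMQ management API queue listing. The function name and sample queue names are hypothetical, and any filtering of notification queues by name that the real check performs is left out.

```python
def sum_queue_messages(queues):
    """Split queued-message totals by whether a queue has consumers."""
    totals = {"msgs_excl_notifications": 0, "msgs_without_consumers": 0}
    for queue in queues:
        messages = queue.get("messages", 0)
        if queue.get("consumers", 0) > 0:
            # Consumed queues count toward msgs_excl_notifications.
            totals["msgs_excl_notifications"] += messages
        else:
            # Unconsumed queues count toward msgs_without_consumers.
            totals["msgs_without_consumers"] += messages
    return totals


# Made-up sample data for illustration.
sample_queues = [
    {"name": "conductor", "messages": 12, "consumers": 3},
    {"name": "scheduler", "messages": 4, "consumers": 1},
    {"name": "orphaned", "messages": 250, "consumers": 0},
]
totals = sum_queue_messages(sample_queues)
```

Keeping the two totals separate lets a backlog on an abandoned queue trip its own threshold without inflating the figure for queues that are actively being drained.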
Other Notes
- The user configured in openrc_os_username (admin by default) will be granted the admin role on each project created for maas_rally scenarios. This facilitates listing swift containers in the rally_diag.sh script.
1.7.0
Release Notes
- MaaS for Designate (initial stage)
- Some maas_rally improvements: the plugin is now loaded from a class, alarm checks are now parameterized (critical and warning), and the config file location was adjusted
- Switches check now uses Octavia V2 API
- Ceph gateways are more stable now.
1.6.0
Release Notes
No release notes
1.5.0
Release Notes
No release notes
1.4.0
Release Notes
No release notes
1.3.1
Release Notes
No release notes
1.3.0
Release Notes
No release notes
1.2.2: Merge pull request #379 from npawelek/add_poller_proxy_config
Add monitoring_proxy_url to poller configuration
Release 1.2.1
f685fba Fix ternary logic for setting holland_venv_bin
e66cd5c Allow holland to deploy on all rpc versions
924ae22 Fix typos in plugin and template
00e70d1 Fix issue with template population