
Releases: rcbops/rpc-maas

1.7.2

01 May 15:55
101b2a7

Release Notes

1.7.2

New Features

  • If maas_rally is configured to write to an influxdb endpoint, a new metric (influxdb_success) and alarm will be created to generate alerts if writing to influxdb fails. A failure to write to influxdb is no longer fatal, which allows performance metrics to still be reported via the MaaS API even if the influxdb endpoint is unavailable.
  • Added the ability to set the port used by the Ceph RADOS Gateway service through the radosgw_civetweb_port variable. It defaults to 8080 to match the ceph-ansible default. Whatever value you choose, radosgw_civetweb_port must be set identically in your Ceph and MaaS configurations.
  • maas_rally now adds an 'influxdb_database' tag to influxdb datapoints, which allows for granular routing to different backend influxdb databases using telegraf.
  • The maas_rally task arguments are now read from the plugin's configuration file (/etc/rally/maas_rally.yml by default). This removes the need to look up values such as network UUIDs when running a performance scenario manually for troubleshooting.
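The two influxdb changes above (the influxdb_database tag and the non-fatal write tracked by influxdb_success) can be pictured with a short Python sketch. It assumes that datapoints are serialized as InfluxDB line protocol and that the writer is a callable that raises on failure. The function names format_point and ship_metrics, the example measurement name, and the shape of the returned metrics dict are hypothetical and are not taken from maas_rally itself.

```python
from typing import Callable, Dict, Iterable, Mapping


def _escape_measurement(name: str) -> str:
    # Line protocol: measurement names escape commas and spaces.
    return name.replace(",", r"\,").replace(" ", r"\ ")


def _escape_key_or_tag(text: str) -> str:
    # Line protocol: tag keys, tag values and field keys also escape equals signs.
    return text.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def format_point(measurement: str, tags: Mapping[str, str],
                 fields: Mapping[str, float], database: str) -> str:
    """Serialize one datapoint, adding an influxdb_database tag telegraf can route on."""
    all_tags = {**tags, "influxdb_database": database}
    tag_str = ",".join(f"{_escape_key_or_tag(k)}={_escape_key_or_tag(v)}"
                       for k, v in sorted(all_tags.items()))
    field_str = ",".join(f"{_escape_key_or_tag(k)}={float(v)}"
                         for k, v in sorted(fields.items()))
    return f"{_escape_measurement(measurement)},{tag_str} {field_str}"


def ship_metrics(lines: Iterable[str], write: Callable[[str], None]) -> Dict[str, int]:
    """Attempt the influxdb write without letting a failure abort the run."""
    try:
        write("\n".join(lines))
        success = 1
    except Exception:  # any write error is recorded instead of raised
        success = 0
    # The caller still reports its performance metrics to the MaaS API;
    # influxdb_success gives an alarm something to fire on when the write failed.
    return {"influxdb_success": success}
```

For example, format_point("rally_duration", {"scenario": "boot server"}, {"seconds": 12.5}, "maas") produces `rally_duration,influxdb_database=maas,scenario=boot\ server seconds=12.5`. A telegraf instance could then route the point on the influxdb_database tag.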

Upgrade Notes

  • Any custom scenarios or overrides setting non-default times and/or concurrency values will need to move these settings to the task_args dictionary.
  • Any configuration overrides of the extra_vars dictionary will need to rename the dictionary to task_args.
  • After the maas-openstack-rally.yml playbook runs, the rally_* checks in MaaS will fail until the agent is restarted and the check definitions are updated.
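The first two upgrade notes amount to a small data migration. The following is a minimal Python sketch of that migration. It assumes a scenario override is a plain dict in which times and concurrency previously sat at the top level alongside an extra_vars dict. That layout and the migrate_scenario_override helper are hypothetical, so check the actual variable structure in your deployment before adapting the sketch.

```python
import copy
from typing import Any, Dict

# Settings the upgrade notes say must now live under task_args.
MOVED_KEYS = ("times", "concurrency")


def migrate_scenario_override(old: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a hypothetical pre-1.7.2 override rewritten for task_args."""
    new = copy.deepcopy(old)  # leave the caller's dict untouched
    task_args = new.pop("task_args", {})
    # extra_vars is renamed to task_args; its entries are carried over as-is.
    task_args.update(new.pop("extra_vars", {}))
    # times / concurrency move from the (assumed) top level into task_args.
    for key in MOVED_KEYS:
        if key in new:
            task_args[key] = new.pop(key)
    new["task_args"] = task_args
    return new
```

Under these assumptions, {"times": 5, "concurrency": 2, "extra_vars": {"flavor": "m1.small"}} becomes {"task_args": {"flavor": "m1.small", "times": 5, "concurrency": 2}}.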

Bug Fixes

  • Reverted the change that limited rgw checks to the first node of each group; that limitation was based on an incorrect assumption.
  • Fixed endpoint handling to better support deployment in Kilo environments.
  • Adjusted the rabbitmq_status check to better handle missing RabbitMQ API data.
  • Raised the MaaS check timeout to 59 seconds, formalizing a de facto default.
  • Logical volume status is now properly validated when the HP volume is encrypted.
  • Fixed a gating issue introduced by pip 10.
  • Fix rate functions for swift_account_replication_check, swift_container_replication_check, and swift_object_replication_check.
  • openstacksdk has been temporarily pinned to <0.12.0 to work around changes that break maas_rally's resource cleanup.

Other Notes

  • Improvements were made to maas_rally to allow running the maas-openstack-rally.yml playbook without installing the MaaS agent. This supports use cases where rally performance scenarios need to run without shipping metrics to the MaaS API.

1.7.1

28 Mar 15:29
765c204

Release Notes

1.7.1

New Features

  • Detailed logging was added to the maas_rally performance monitoring plugin.
  • Automatic stale lock and resource cleanup was added to maas_rally. This makes the plugin more robust and resilient to transient environmental problems.
  • A configurable quota factor was added to the maas_rally plugin. This allows resource cleanup and performance polling to run asynchronously.
  • The maas_rally plugin will now generate an alarm event when too many consecutive intervals (default=3) required cleanup of stale resources.
  • The maas_rally plugin will now generate an alarm event when too many consecutive intervals (default=3) were aborted waiting for immature locks.
  • A rally_diag.sh script is now deployed to all utility containers. This script helps support staff quickly identify resources (instances, images, etc.) that were created by maas_rally.
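The consecutive-interval alarms above follow a common pattern. A counter increments on each interval that needed stale-resource cleanup (or was aborted on an immature lock), resets on a clean interval, and trips once it reaches the threshold, which defaults to 3. The Python below is a hypothetical illustration of that pattern; ConsecutiveCounter and lock_action are not names from maas_rally. The stale-versus-immature rule in lock_action is an assumption about what "immature" means: a lock that is too young to be treated as stale.

```python
from dataclasses import dataclass


@dataclass
class ConsecutiveCounter:
    """Tracks how many intervals in a row hit a condition (hypothetical helper)."""
    threshold: int = 3  # the release notes give default=3
    count: int = 0

    def record(self, hit: bool) -> bool:
        """Record one interval; return True when the alarm condition is met."""
        self.count = self.count + 1 if hit else 0
        return self.count >= self.threshold


def lock_action(lock_age_s: float, stale_after_s: float) -> str:
    """Assumed policy: old locks are cleaned up as stale, young locks abort the interval."""
    return "cleanup" if lock_age_s >= stale_after_s else "abort"
```

Recording the interval outcomes True, True, False, True, True, True yields an alarm only on the last one, because the clean third interval resets the count.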

Bug Fixes

  • Limit ceph_cluster_stats and ceph_mons_stats checks to groups['mons'][0] and ceph_rgw_stats to groups['rgws'][0] to prevent duplicate alarms on ceph clusters.
  • Properly configure the agent.plugin timeout value in plugin arguments.
  • Add an override to the swift-recon checks and include a parser for the timeout in swift-recon.py.
  • Added more meaningful process information to the neutron_ovs_agent alarm exception message.

  • Added a new status_err_no_exit function so that plugins such as neutron_ovs_agent_check.py can run their course and report correct metrics.

  • Fixed a premature exit of the _get_node_metrics path in rabbitmq_status.py caused by a rare KeyError. (See https://core.rackspace.com/ticket/180307-12728 for reference.)

  • Used the new status_err_no_exit function so that plugins can run their course and report correct metrics.

  • Fixed a premature exit of the swift quarantine check path caused by a rare CalledProcessError. (See https://core.rackspace.com/ticket/180307-05355 for reference.)

  • Fixed a premature exit of the rabbitmq_status check path caused by a rare KeyError.

  • Disable capacity-related checks: cinder_vg_check, ironic_capacity_check, and nova_cloud_stats_check.
  • Disable alarms for CDM checks on all hosts except groups['shared-infra_hosts']. This includes cpu_check, disk_utilisation, and memory_check.
  • Disable alarms for network_throughput across all hosts.
  • Changes to galera_check:
    • Limit enablement to groups['galera_all'][0].
    • Remove the alarm for aborted_clients.
  • Changes to rabbitmq_status:
    • Limit enablement to groups['rabbitmq_all'][0].
    • Modify the metric msgs_excl_notifications to sum messages from consumed queues only.
    • Add the metric msgs_without_consumers to sum messages from unconsumed queues only.
    • Fix a bug in the rabbitmq_qgrowth_excl_notifications alarm by removing the division by the check period. This is handled automatically by the rate() function.
    • Restructure the rabbitmq_queues_without_consumers alarm around rabbitmq_msgs_without_consumers. It now alarms when unconsumed messages reach the default threshold of 20000.
    • Remove the default var for the unused maas_rabbitmq_queues_without_consumers_limit.
    • Update maas_rabbitmq_queued_messages_excluding_notifications_threshold to 5000.
    • Add maas_rabbitmq_messages_without_consumers_threshold, defaulting to 20000.
  • Update maas_swift_container_replication_avg_time_threshold from 50 to 300.
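The rabbitmq_qgrowth_excl_notifications fix depends on what rate() returns. Assuming, as the note implies, that rate() already yields a per-second change between two samples (current minus previous, divided by the elapsed seconds), dividing its result by the check period again understates growth by a factor equal to the period. The Python below mirrors that arithmetic only. The rate function here is a stand-in, not the MaaS alarm-language implementation, and the queue depths and 60-second period are made-up illustration values.

```python
def rate(previous: float, current: float, elapsed_s: float) -> float:
    """Per-second rate of change between two samples (stand-in for the alarm rate())."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return (current - previous) / elapsed_s


CHECK_PERIOD_S = 60  # hypothetical check period in seconds

# Made-up queue depths from two consecutive check runs.
prev_msgs, curr_msgs = 1000.0, 4000.0

fixed = rate(prev_msgs, curr_msgs, CHECK_PERIOD_S)  # 3000 messages / 60 s = 50.0 per second
buggy = fixed / CHECK_PERIOD_S                      # divided by the period a second time
```

With these numbers the corrected expression gives 50 messages per second, while the old one reports under 1. A growth threshold would therefore almost never fire.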

Other Notes

  • The user configured in openrc_os_username (admin by default) will be granted the admin role on each project created for maas_rally scenarios. This facilitates listing swift containers in the rally_diag.sh script.

1.7.0

01 Mar 14:10
200ee30

Release Notes

1.7.0

  • MaaS for Designate (initial stage)
  • Several maas_rally improvements: the plugin is loaded from a class, alarm checks are now parameterized by critical and warning thresholds, and the config file location was adjusted.
  • Checks now use the Octavia V2 API.
  • Ceph gateways are more stable now.

1.6.0

01 Feb 11:24

Release Notes

1.6.0

No release notes

1.5.0

02 Jan 19:19

Release Notes

1.5.0

No release notes

1.4.0

01 Dec 09:57

Release Notes

1.4.0

No release notes

1.3.1

14 Nov 21:20

Release Notes

1.3.1

No release notes

1.3.0

01 Nov 09:17

Release Notes

1.3.0

No release notes

1.2.2: Merge pull request #379 from npawelek/add_poller_proxy_config

27 Oct 21:11
aaa614e
Add monitoring_proxy_url to poller configuration

Release 1.2.1

14 Sep 15:44
f685fba Fix ternary logic for setting holland_venv_bin
e66cd5c Allow holland to deploy on all rpc versions
924ae22 Fix typos in plugin and template
00e70d1 Fix issue with template population