Releases: aerospike/aerospike-monitoring
Aerospike Monitoring v2.8.0
Description
NOTE: The v2.8.0 Grafana dashboards are not backwards compatible with servers older than 6.0.0.0.
- This release includes 1 major feature - Connector Dashboard, Alerts and topology.
- Aerospike Monitoring Stack version 2.8.0 adds 2 dashboards, alerts and bug fixes:
- 2 dashboards to monitor connectors and connector JVM metrics.
- Enhanced alerts to cover various aspects of Connector key metric thresholds and JVM health.
NOTE:
- Aerospike Prometheus exporter 1.13.0 or greater must be used to get the Aerospike 6.4 metrics.
- The Multi-Cluster View dashboard now requires the Diagram Panel plugin.
Features
- [OM-64] - Create predefined Prometheus alert rules for Connectors.
- This release includes 6 alerts covering the mandatory functional and process/health aspects of the Connectors.
- Key alerts covered are connector-status, connector-request-lag, connector-request-errors, jvm heap, jvm cpu and jvm gc.
- [OM-56] - Connectors alerts & Dashboards
- Connector view dashboard which helps to monitor 6 connectors.
- Connectors supported are - xdr-proxy, kafka-outbound, pulsar-outbound, esp-outbound, elastic-search and jms-outbound.
- Key metrics covered are - request lag, request error, success, skipped, connections, xdr record byte size, etc.
- [OM-107] - Create a dashboard for a Connector(s)
- Connector JVM view dashboard which helps to monitor JVM health of 6 Connectors.
- Connectors supported are - xdr-proxy, kafka-outbound, pulsar-outbound, esp-outbound, elastic-search and jms-outbound.
- Key metrics covered are - uptime, cpu, memory, threads, files, classes and buffers.
- Multi-cluster view dashboard is enhanced to display Aerospike Server topology using the cluster-name and xdr dc configurations.
NOTE: To view the data replication topology in the Multi-cluster view:
- The cluster-name is mandatory, and each dc name in the xdr section of the Aerospike Server configuration must be set to the cluster-name of the destination cluster.
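The topology note above can be illustrated with a small sketch. It is not the dashboard's implementation; it only shows, under the stated naming rule, how replication edges could be derived when each xdr dc name equals the destination's cluster-name. The function name, the input shape and the cluster names are hypothetical.

```python
def build_topology(xdr_dcs_by_cluster):
    """Derive directed replication edges (source, destination).

    xdr_dcs_by_cluster maps a cluster-name to the dc names listed in the
    xdr section of that cluster's configuration (hypothetical input shape).
    Assumption from the release note: a dc name equals the cluster-name of
    the destination cluster. dc names that match no known cluster are
    returned separately because they cannot be drawn as an edge.
    """
    known = set(xdr_dcs_by_cluster)
    edges, unresolved = [], []
    for source, dcs in sorted(xdr_dcs_by_cluster.items()):
        for dc in dcs:
            if dc in known:
                edges.append((source, dc))
            else:
                unresolved.append((source, dc))
    return edges, unresolved


# Hypothetical clusters: "backup" is a dc name with no matching cluster-name.
clusters = {
    "us-east": ["eu-west"],
    "eu-west": ["us-east", "backup"],
    "ap-south": [],
}
edges, unresolved = build_topology(clusters)
```

Under this assumption, a dc whose name does not match any cluster-name ends up in `unresolved`, which suggests why the naming rule matters for the diagram.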
Fixes
- [OM-122] - Avoid duplicate defrag metric values on the namespace dashboard.
- [OM-113] - Namespace view dashboard - average objects per sprig stat.
- [OM-120] - Add high-water mark breached to the Rolling Restart dashboard.
Aerospike Monitoring v2.7.0
Description
NOTE: The v2.7.0 Grafana dashboards are not backwards compatible with servers older than 6.0.0.0.
- This release includes 2 major features - Enhanced Alerts and All Flash use-case dashboard.
- Aerospike Monitoring Stack version 2.7.0 adds new dashboard and bug fixes:
- All Flash dashboard: key metrics to monitor when using flash storage for both the primary index and the secondary index (sindex).
- Enhanced alerts covering various aspects of server metrics. This release covers alerts on Namespaces, XDR, Latencies, best-practice checks, Node-exporter, etc.
NOTE:
- Aerospike Prometheus exporter 1.13.0 or greater must be used to get the Aerospike 6.4 metrics.
Features
- [OM-104] - Add new XDR bytes-shipped metrics to dashboards.
- Display bytes-shipped both as stat and time-series which can help monitoring the replication progress.
- [OM-98] - Observability & Management Alerts - Enhance / enrich prometheus alerts from ACMS.
- This release includes 40 alerts covering various metrics of Aerospike Server, some key areas are:
- Namespaces, latencies, data replication (XDR), sets, node-exporter, flash, best-practice checks, etc.
- [OM-93] - Use-case Dashboard: all-flash.
- A new use-case dashboard is introduced in this release. It focuses mainly on key metrics and alerts related to flash usage.
- Some key metrics are average objects per sprig, index-pressure, primary index flash and secondary index flash.
- [OM-48] - Use-case Dashboard Organization & Naming.
- Added brief descriptions on each dashboard and updated tags to identify each dashboard easily.
- [OM-111] - Observability dashboard unit tests.
- Created a framework to test our dashboards automatically, including panels, expressions / queries, layout and expression results.
- [OM-103] - Add user stat related alerts.
- Added user stat specific alerts covering connections, connection churn etc...
- [OM-101] - Add warning for best practice failures.
- Alerts if best practices are not followed while setting up the Aerospike server. This flag is sent by the server after a series of checks.
- [OM-102] - Add warning for node-exporter not being present.
- As a precursor to integrating node-exporter metrics into the Aerospike Monitoring stack, this alert raises a warning in the Alerts View dashboard if node-exporter is not configured.
Aerospike Monitoring v2.6.1
Description
- The v2.6.1 Grafana dashboards are not backwards compatible with servers older than 6.0.0.0.
- Aerospike Monitoring Stack version 2.6.1 adds bug fixes.
NOTE:
- Aerospike Prometheus exporter 1.12.0 or greater must be used to get the Aerospike 6.3 metrics.
- Deprecated
- Existing Alerts dashboard is deprecated and will be removed in future releases.
- Existing Jobs dashboard is deprecated and will be removed in future releases.
Fixes
- [OM-100] - Issues in Multi-cluster view dashboard
-- Corrected label and unit in XDR panel.
-- Corrected links from XDR and Latencies to respective dashboards (instead of cluster-view).
-- Added an alert-severity based filter.
- Issues in Alerts view
-- Panel colors are corrected according to the severity types.
- Issues in Unique Data view
-- Unique data bytes are not shown correctly when custom labels are enabled in configuration.
-- Added historical time-series for unique data-bytes data point.
Aerospike Monitoring v2.6.0
Description
NOTE: The v2.6.0 Grafana dashboards are not backwards compatible with servers older than 6.0.0.0.
- This release eliminates instances of hard coded values for variables. As a result, the user needs to ensure that the Aerospike Prometheus data source is selected as a default in order for dashboard data to populate correctly.
- Aerospike Monitoring Stack version 2.6.0 adds new dashboards and bug fixes:
- Rolling restarts dashboard: key metrics which should be monitored during specific use cases.
- Alerts View dashboard: adopts more meaningful alert severity levels.
NOTE:
- Aerospike Prometheus exporter 1.12.0 or greater must be used to get the Aerospike 6.3 metrics.
- Deprecated
- Existing Alerts dashboard is deprecated and will be removed in future releases.
- Existing Jobs dashboard is deprecated and will be removed in future releases.
Features
- OM-79 - Rolling Restarts dashboard; data is shown in groups such as stats, errors and resources.
- This dashboard curates various key metrics which should be monitored during specific use cases, such as:
- Node restart
- Software upgrade
- Investigation
- Resource utilization is displayed for the TopK major consumers at a service and namespace level.
- OM-85 - Added the new Alerts view dashboard. This visualizes alerts by severity, both as counts and as individual alerts.
- Newly adopted alert levels, in decreasing order: critical, error, warn and info.
- This dashboard replaces the existing Alerts dashboard.
- OM-82 - All Aerospike dashboards and panel visualizations are modified according to the Grafana 9.x version.
- OM-49 - Improved and reorganized Aerospike Monitoring stack examples
- Reorganized docker compose file in relevant folder.
- Added examples on how to use AeroLab which can spin up Aerospike clusters per Proof of Concept (POC) needs.
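The alert levels introduced with OM-85 (critical, error, warn, info, in decreasing order) imply both a per-severity count and an ordering of individual alerts. The following is a minimal sketch of that idea, not the dashboard's own logic; the alert dictionaries and the alert names are hypothetical.

```python
from collections import Counter

# Severity levels in decreasing order, as listed in the release notes.
SEVERITY_ORDER = ["critical", "error", "warn", "info"]


def count_by_severity(alerts):
    """Return (level, count) pairs in decreasing severity order."""
    counts = Counter(alert["severity"] for alert in alerts)
    return [(level, counts.get(level, 0)) for level in SEVERITY_ORDER]


def sort_by_severity(alerts):
    """Sort alerts so the most severe come first; unknown levels go last."""
    rank = {level: i for i, level in enumerate(SEVERITY_ORDER)}
    return sorted(alerts, key=lambda a: rank.get(a["severity"], len(SEVERITY_ORDER)))


# Hypothetical firing alerts.
firing = [
    {"name": "ExampleInfoAlert", "severity": "info"},
    {"name": "ExampleCriticalAlert", "severity": "critical"},
    {"name": "ExampleWarnAlert", "severity": "warn"},
    {"name": "AnotherWarnAlert", "severity": "warn"},
]
summary = count_by_severity(firing)
ordered = sort_by_severity(firing)
```

Python's `sorted` is stable, so alerts of the same severity keep their original relative order.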
Fixes
- OM-82 - Includes bug fixes related to queries and visualizations
- All queries now include a proper regex pattern to honor single-value or multi-value template variable selection.
- All time-series panels are adjusted to use range vectors.
- All dashboards have standardized template variables in the same order.
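As a rough illustration of the two query fixes above, the sketch below builds a PromQL selector that uses a regex matcher (so one or several selected values both work) and wraps it in a range-vector function. It assumes that a multi-value selection is rendered as an `a|b` alternation, that the values contain no regex metacharacters, and that `aerospike_xdr_bytes_shipped` behaves as a counter. The label names `cluster_name` and `ns` and the 5m window are also assumptions, not taken from the dashboards.

```python
def label_matcher(label, selected):
    """Build a PromQL regex matcher that works for one or many values.

    Assumes the selected values are plain strings without regex
    metacharacters, so joining them with "|" is sufficient.
    """
    return f'{label}=~"{"|".join(selected)}"'


def rate_query(metric, matchers, window="5m"):
    """Wrap a selector in rate() over a range vector such as [5m]."""
    selector = ", ".join(matchers)
    return "rate(" + metric + "{" + selector + "}[" + window + "])"


# Hypothetical template variable selections.
query = rate_query(
    "aerospike_xdr_bytes_shipped",
    [
        label_matcher("cluster_name", ["prod"]),
        label_matcher("ns", ["test", "bar"]),
    ],
)
```

An equality matcher (`ns="test|bar"`) would match nothing when two values are selected, which is the kind of problem a regex matcher avoids.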
Aerospike Monitoring v2.5.0
Description
NOTE: The v2.5.0 Grafana dashboards are not backwards compatible with servers older than 6.0.0.0.
- Aerospike Monitoring Stack version 2.5.0 adds the new Multi Cluster view dashboard, Otel integration examples and bug fixes.
NOTE: Aerospike Prometheus exporter 1.11.0 or greater must be used to get the Aerospike 6.3 metrics.
Features
- OM-45 - Added the new Multi cluster view dashboard. This visualizes multiple clusters across regions and data centers with a focus on health. This dashboard consists of 4 panels.
- Geomap panel - displays multiple cluster view.
- Cluster panel - displays key metrics like size, alerts, XDR lag, Read & Write latencies.
- Node panel - uses the Polystat plugin and displays nodes in Green or Red indicating the health.
- Namespace panel - displays namespaces in Green or Red indicating the health.
Key metrics used in this dashboard:
- aerospike_node_up
- aerospike_namespace_objects
- aerospike_node_stats_cluster_size
- aerospike_xdr_lag
- aerospike_latencies_write_ms_bucket
- aerospike_latencies_read_ms_bucket
- OM-60 - Added new examples on how to integrate the Aerospike Prometheus exporter with the Otel collector and export metrics to a partner solution.
- Partner integration examples are provided for NewRelic, Datadog and Cloudwatch.
Fixes
- OM-76 - In the Namespace dashboard, the Defrag row hides anomalies as a result of aggregation.
- Removed the Defrag row and its aggregation. The defrag panels moved to the Namespace row, where they display defrag metrics for each namespace.
Aerospike Monitoring v2.4.0
Description
NOTE: the v2.4.0 Grafana dashboards are not backwards compatible with servers older than 6.0.0.0.
- Aerospike Monitoring Stack version 2.4.0 adds support for metrics introduced in Aerospike 6.3.
NOTE: Aerospike Prometheus exporter 1.11.0 or greater must be used to get the Aerospike 6.3 metrics.
Features
- OM-62 - Added defrag metrics to the namespace view dashboard.
- Adds aerospike_namespace_storage_engine_defrag_lwm_pct.
- Adds aerospike_namespace_storage_engine_file_defrag_q.
- Adds aerospike_namespace_storage_engine_device_defrag_q.
- Adds aerospike_namespace_storage_engine_file_defrag_reads.
- Adds aerospike_namespace_storage_engine_device_defrag_reads.
- Adds aerospike_namespace_storage_engine_file_defrag_writes.
- Adds aerospike_namespace_storage_engine_device_defrag_writes.
Fixes
- OM-61 - In namespace view dashboard NSUP Cycle is summed, instead of showing max/average.
- Fixed NSUP metrics panel to show maximum and average aerospike_namespace_nsup_cycle_duration.
- Fixed NSUP metrics panel to show maximum and average aerospike_namespace_nsup_cycle_deleted_pct.
- OM-22 - Migration summary doubles up in cluster view dashboard.
- Fixed migration metrics panel to show aerospike_namespace_migrate_rx_partitions_remaining and aerospike_namespace_migrate_tx_partitions_remaining separately.
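To see why the OM-61 fix replaces a sum with maximum and average, consider a small sketch with made-up per-node values for aerospike_namespace_nsup_cycle_duration. The node names and numbers are hypothetical, and the unit is whatever the exporter reports.

```python
from statistics import mean


def summarize_nsup(duration_by_node):
    """Compare a summed NSUP cycle duration with max and average.

    duration_by_node maps a node name to its most recent NSUP cycle
    duration (hypothetical values). The sum grows with the number of
    nodes, so it does not describe how long any single cycle took;
    max and average do.
    """
    values = list(duration_by_node.values())
    return {"sum": sum(values), "max": max(values), "avg": mean(values)}


# Hypothetical three-node cluster.
summary = summarize_nsup({"node-a": 1200, "node-b": 900, "node-c": 1500})
```

With these sample values the sum is 3600 while no node took longer than 1500, which is the distortion the fixed panel avoids.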
Aerospike Monitoring v2.3.1
Fixes
- [OM-37] - Issues in Set view, Unique data view, Sindex view, Namespace view and Node view:
- Fixed issue in "Set view" dashboard to remove hardcoded datasource.
- Re-exported Set view, Unique data view, Sindex view, Namespace view and Node view dashboards with right configurations so they are suitable to be made available in Grafana Cloud.
Aerospike Monitoring v2.3.0
Description
NOTE: the v2.3.0 Grafana dashboards are not backwards compatible with servers older than 6.0.0.0.
- Aerospike Monitoring Stack version 2.3.0 adds support for metrics introduced in Aerospike 6.3.
NOTE: Aerospike Prometheus exporter 1.10.0 or greater must be used to get the Aerospike 6.3 metrics.
Features
- Added 6.3 metrics:
- Adds aerospike_sindex_used_bytes secondary index metric.
- Adds aerospike_namespace_nsup_cycle_deleted_pct NSUP metric.
- Adds aerospike_sets_stop_writes_size set level configuration.
- Updated memory used panel in secondary index to consider aerospike_sindex_used_bytes or aerospike_sindex_memory_used, as aerospike_sindex_memory_used is deprecated in Aerospike 6.3.
- Added nsup metrics panel to Namespace view dashboard.
- Added set level quotas panel to Namespace view dashboard.
- Added a new dashboard displaying set level metrics.
- Added a new dashboard displaying unique data usage.
- Added 4 new Prometheus alerts:
- NamespaceSupervisorFallingBehind when NSUP is falling behind and/or display the length of time the most recent NSUP cycle lasted.
- NamespaceFreeMemoryCloseToStopWrites when the memory of one of your Aerospike nodes is close to the stop writes limit configured for a namespace.
- NamespaceSetQuotaWarning when one of your Aerospike nodes is at 80% of the quota you have configured on a set.
- NamespaceSetQuotaAlert when one of your Aerospike nodes is at 99% of the quota you have configured on a set.
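The two set quota alerts above differ only in their threshold. The sketch below shows that threshold logic in isolation; it is not the shipped Prometheus rule. The function name, its parameters, the use of "greater than or equal" comparisons, and the treatment of a zero quota as "no quota configured" are all assumptions.

```python
def set_quota_alert(used_bytes, quota_bytes, warn_ratio=0.80, alert_ratio=0.99):
    """Return the name of the alert that would apply, or None.

    Thresholds follow the release notes: warning at 80% of the set
    quota, alert at 99%. A non-positive quota is assumed to mean that
    no quota is configured, so nothing fires.
    """
    if quota_bytes <= 0:
        return None
    ratio = used_bytes / quota_bytes
    if ratio >= alert_ratio:
        return "NamespaceSetQuotaAlert"
    if ratio >= warn_ratio:
        return "NamespaceSetQuotaWarning"
    return None


# Hypothetical set using 850 MB of a 1000 MB quota.
state = set_quota_alert(850_000_000, 1_000_000_000)
```

Checking the higher threshold first ensures that a set at 99% reports the alert rather than the warning.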
Aerospike Monitoring v2.2.0
Description
NOTE: the v2.2.0 Grafana dashboards are not backwards compatible with servers older than 6.0.0.0.
- Aerospike Monitoring Stack version 2.2.0 adds support for metrics introduced in Aerospike 6.1.
NOTE: Aerospike Prometheus exporter 1.8.0 or greater must be used to get the Aerospike 6.1 metrics.
Features
- [TOOLS-2087] - Add server 6.1 metrics.
- Adds aerospike_xdr_bytes_shipped.
- Adds aerospike_sindex_entries_per_bval.
- Adds aerospike_sindex_entries_per_rec.
- [TOOLS-2132] Replace latency panels with heat map and percentiles.
Aerospike Monitoring v2.1.0
Description
NOTE: the v2.1.0 Grafana dashboards are not backwards compatible with servers older than 6.0.0.0.
- Add support for the batch-index latency metrics aerospike_latencies_batch_index_us_bucket and aerospike_latencies_batch_index_us_count.
NOTE: Aerospike Prometheus exporter 1.7.0 or greater must be used to get the batch-index latency metrics.
Features
- [TOOLS-2069] - Add batch-index latency panels.