Releases: Netflix/atlas

v1.5.3-rc.1

05 Jun 22:25
Pre-release

Primary changes:

  • #613, fix possible deadlock on startup (thanks to @childofsoong for reporting).

A comprehensive list of changes can be found in the commit log: v1.5.2...v1.5.3-rc.1

v1.5.2

03 May 14:51

Primary changes:

  • #482, fix bug with equality check on SmallHashMap
  • #487, fix possible IAE if NaN values are in sorted set
  • #498, honor the order setting for default sort
  • #515, fix bounds exception for Integer.MIN_VALUE hash
  • #526, improve performance of SmallHashMap.{+,-}
  • #527, additional build metadata in jar manifest
  • #532, fix prefix for cloudwatch sqs metrics

A comprehensive list of changes can be found in the commit log: v1.5.1...v1.5.2

v1.6.0-rc.4

03 May 20:10
Pre-release

Not recommended for general use. First RC with streaming eval lib.

A comprehensive list of changes can be found in the commit log: v1.6.0-rc.3...v1.6.0-rc.4

v1.6.0-rc.3

26 Feb 22:39
Pre-release

Not recommended for general use. First RC after switching from Spray to Akka-HTTP.

A comprehensive list of changes can be found in the commit log: v1.6.0-rc.2...v1.6.0-rc.3

v1.6.0-rc.2

01 Feb 04:54
Pre-release

Not recommended for general use. Checkpoint before #490.

A comprehensive list of changes can be found in the commit log: v1.6.0-rc.1...v1.6.0-rc.2

v1.6.0-rc.1

05 Jan 17:11
Pre-release

Not recommended for general use.

A comprehensive list of changes can be found in the commit log: v1.5.0...v1.6.0-rc.1

v1.5.1

15 Nov 21:33

Primary changes:

  • #450, update to spray 1.3.4 for security fix
  • #451, fix port setting in sample memory.conf
  • #453, fix regex matching when using end anchor

A comprehensive list of changes can be found in the commit log: v1.5.0...v1.5.1

v1.5.0

29 Oct 16:09

Primary changes:

Query Enhancements

There have been several additions to the stack language.

Percentiles

There is now a :percentiles operator (#291). It provides a fairly accurate estimate of the percentile for timers and distribution summaries. If the data is reported correctly, the backend can estimate arbitrary percentiles while still being able to slice and dice by the tags.

Prior to this we would mostly use percentiles computed locally and then have to apply simple aggregates on the percentiles, such as average or max of the 99th percentile per node. That leads to results that are dubious at best.
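To illustrate why aggregating locally computed percentiles is unreliable, here is a minimal sketch that compares the average of per-node 99th percentiles with an estimate computed from merged bucket counts. The bucket boundaries, function names, and sample data are all hypothetical; this is not the scheme Atlas uses internally.

```python
from bisect import bisect_left

# Hypothetical bucket upper bounds in milliseconds (an assumption for this sketch).
BUCKETS = [1, 2, 5, 10, 20, 50, 100, 200, 500, 1000]

def to_counts(samples):
    """Count how many samples fall into each bucket, keyed by upper bound."""
    counts = [0] * len(BUCKETS)
    for s in samples:
        idx = min(bisect_left(BUCKETS, s), len(BUCKETS) - 1)
        counts[idx] += 1
    return counts

def estimate_percentile(counts, p):
    """Return the upper bound of the bucket that contains the p-th percentile."""
    threshold = sum(counts) * p / 100.0
    running = 0
    for bound, count in zip(BUCKETS, counts):
        running += count
        if running >= threshold:
            return bound
    return BUCKETS[-1]

def merge(*per_node_counts):
    """Bucket counts are plain counters, so they can simply be summed."""
    return [sum(column) for column in zip(*per_node_counts)]

# Node A serves 1000 fast requests; node B serves 10 slow ones.
node_a = to_counts([5] * 1000)
node_b = to_counts([900] * 10)

p99_a = estimate_percentile(node_a, 99)
p99_b = estimate_percentile(node_b, 99)
avg_of_p99 = (p99_a + p99_b) / 2
combined_p99 = estimate_percentile(merge(node_a, node_b), 99)

print(avg_of_p99, combined_p99)
```

In this sketch the average of the per-node values is dominated by the lightly loaded node, while the merged counts reflect the fact that nearly all requests were fast.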

Sliding DES

Sliding DES is a windowed variant of DES that is deterministic (#222). In particular, it makes it easier to use historical data to understand:

  1. Why did an alarm fire?
  2. When would an alarm have fired?

See the :sdes docs for more information.

Multi-Level Group By

Additional groupings can now be performed (#265). The primary use-case is to allow mixed aggregations, such as finding the average CPU by cluster and asg, then finding the max of those averages by cluster. Sample usage:

name,ssCpuUser,:eq,:avg,(,nf.cluster,nf.asg,),:by,
:max,(,nf.cluster,),:by

The key list for the subsequent group by must be a subset of the previous.
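The two-stage aggregation can be sketched outside the stack language as follows. This is only an illustration of the semantics for a single timestamp; the sample rows and helper names are hypothetical and do not reflect how the backend evaluates the expression.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical samples: (nf.cluster, nf.asg, CPU value for one node).
samples = [
    ("api", "api-v001", 10.0),
    ("api", "api-v001", 30.0),
    ("api", "api-v002", 50.0),
    ("web", "web-v001", 40.0),
]

def avg_by_cluster_and_asg(rows):
    """First level: average over nodes, grouped by (cluster, asg)."""
    groups = defaultdict(list)
    for cluster, asg, value in rows:
        groups[(cluster, asg)].append(value)
    return {key: mean(values) for key, values in groups.items()}

def max_by_cluster(averages):
    """Second level: max of the averages, grouped by cluster only.

    The second key list (cluster) is a subset of the first (cluster, asg).
    """
    result = {}
    for (cluster, _asg), value in averages.items():
        result[cluster] = max(value, result.get(cluster, float("-inf")))
    return result

averages = avg_by_cluster_and_asg(samples)
maxima = max_by_cluster(averages)
print(maxima)
```

The second grouping only works because every first-level group carries the cluster key, which is why the key list must be a subset of the previous one.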

Head

The :head operator keeps only the first N lines resulting from a group by (#138). This will likely be further optimized in later versions to allow for efficient sampling over large group by results.

Graph Rendering

New image rendering (#160, #191). Replaced the fork of RRD4j with a custom rendering engine for the PNG images. This allowed us to fix a number of long-standing issues and gives us flexibility to make future usability improvements since compatibility with RRD is not a requirement for us. Key changes:

Multiple Time Zones

The X-axis can now show multiple time zones (#198).

[Image: graph with multiple time zones on the X-axis]

Sorting Graph Legend

The graph legend can now be sorted (#349). Previously this was only available via the dynamic charts in the internal UI.

[Images: legend in default order; legend sorted by max, descending]

Exact Size Images

In previous versions, the width and height params applied to the plot canvas area. The actual image size was calculated automatically to fit the other details like legends without squishing them and making the graph unusable. Exact sizes can now be used, and the layout is responsive, so some elements can be suppressed if the provided dimensions are too small (#203).

For more details see the graph layout docs.

Warnings

Warnings are now shown to indicate when options cannot be satisfied or when there was some other degradation.

[Image: graph showing a warning]

Anonymization

In prior versions there was only_graph, but sometimes that takes it too far. In particular, it also removes the time axis which might be useful in some contexts where you don't want to share other precise numbers. There is now improved support for anonymizing the data in images (#202). This can be useful for presentations or communicating with external support. Example:

[Image: anonymized graph]

Zoom

There is now a zoom parameter that can be used on images (#161). This is primarily used to get clearer images on retina displays by specifying zoom=2 and scaling to half the size in the browser.
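As a rough sketch of the retina workflow, the snippet below builds a graph URL with zoom=2 and computes the size at which the browser should display the returned image. The host, port, and query are taken from the sample output elsewhere in these notes; the helper names and the pixel dimensions are hypothetical.

```python
from urllib.parse import urlencode

def graph_url(base, query, zoom=1):
    """Build a graph URL; stack-language punctuation is left unescaped for readability."""
    params = {"q": query, "zoom": zoom}
    return f"{base}/api/v1/graph?{urlencode(params, safe=',:()')}"

def display_size(pixel_width, pixel_height, zoom):
    """Size to use in the browser so the zoomed image is scaled back down."""
    return pixel_width // zoom, pixel_height // zoom

url = graph_url("http://localhost:7101", "name,sps,:eq,(,nf.cluster,),:by", zoom=2)
size = display_size(1400, 600, 2)  # hypothetical pixel dimensions of the zoomed image
print(url)
print(size)
```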

Y-Axis Offset Labels

Tick labels on the Y-axis will now use an offset format if there is a small delta with a large base value. In 1.4.x this would result in duplicate tick labels.

[Image: Y-axis with offset tick labels]
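The idea behind offset labels can be sketched as follows: if formatting the ticks at a fixed precision produces duplicates, show a base value once and label each tick with its delta from that base. This is only an illustration under assumed formatting rules, not the algorithm used by the rendering engine.

```python
def tick_labels(ticks, sig_digits=3):
    """Return (base, labels); base is None when plain labels are already unique."""
    plain = [f"{t:.{sig_digits}g}" for t in ticks]
    if len(set(plain)) == len(ticks):
        return None, plain
    # Small delta on a large base: fall back to offsets from the first tick.
    base = ticks[0]
    return base, [f"+{t - base:.{sig_digits}g}" for t in ticks]

small_base, small_labels = tick_labels([1.0, 2.0, 3.0])
large_base, large_labels = tick_labels([1e9, 1e9 + 1, 1e9 + 2])
print(small_labels)
print(large_base, large_labels)
```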

Image Metadata

If enabled, images will contain metadata so tools can retrieve the original graph URL and, if the graph uses a relative time range, the exact times (#307). The primary use-case for Netflix is providing information to bots when graph images are put into chat during an incident. The metadata allows links to be generated so others can get the same view and investigate further. It also helps with archiving and capturing the relevant data for the incident for later review. To enable it, the following property needs to be set:

atlas.webapi.graph.png-metadata-enabled=true

Then the graph URL will be added as the Source field and the exact times will be specified in the Description field. These can then be accessed with libraries or tools for fetching image metadata such as exiftool:

$ exiftool example.png | grep -E 'Source|Description'
Source        : http://localhost:7101/api/v1/graph?q=name,sps,:eq,(,nf.cluster,),:by
Description   : start=2016-11-16T10:52:00Z, end=2016-11-16T13:52:00Z
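Once a tool has extracted the Description field, turning it into timestamps is straightforward. The sketch below assumes the field always has the `start=..., end=...` shape shown in the sample output above; the function name is hypothetical.

```python
from datetime import datetime, timezone

def parse_description(description):
    """Parse 'start=<ISO time>Z, end=<ISO time>Z' into two UTC datetimes."""
    fields = dict(part.strip().split("=", 1) for part in description.split(","))

    def parse(ts):
        return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

    return parse(fields["start"]), parse(fields["end"])

start, end = parse_description("start=2016-11-16T10:52:00Z, end=2016-11-16T13:52:00Z")
print(start.isoformat(), end.isoformat(), end - start)
```

A bot could combine these exact times with the Source URL to generate a link that always shows the same window, even if the original graph used relative times.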

Standard fields are used so the metadata can also be seen using common viewers such as the inspector in Preview:

[Image: metadata shown in the Preview inspector]

Misc

JSON With Rendering Metadata

There is now a v2.json output format that includes enough metadata to precisely recreate the image (#298). This allows dynamic client side rendering to match the images so that transitions are much less jarring for the user. This helps to minimize confusion and delay when working on production issues.

Query Index

Adds an index for efficiently checking a datapoint against many queries (#258). The primary index for the backend makes it fast to find all of the data points or time series that match a given query based on the associated tags. For a number of use-cases we need to go in the other direction, such as:

  • Streaming expression evaluation. To reduce mean time to detect (MTTD) many alerting use-cases are now moving to run the query expressions online as the data flows through rather than against the main backend after it has been indexed. We need to quickly find the matching expressions for each data point as it comes in.
  • Unused data reports. To help keep costs under control we generate reports of data that is unused so we can identify large volumes of data that are not being queried. For this use case we generate a query index based on access logs and then use map reduce jobs to check each data point for matching queries in the index.
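The reverse lookup can be sketched with a deliberately simplified index that only supports queries made of exact-match conditions combined with AND. The class and method names are hypothetical, and the actual index in Atlas handles the full query language; this only shows the direction of the lookup (data point to queries rather than query to data points).

```python
from collections import defaultdict

class SimpleQueryIndex:
    """Maps exact-match tag conditions to the queries that require them."""

    def __init__(self):
        self._by_condition = defaultdict(set)  # (key, value) -> query ids
        self._condition_count = {}             # query id -> number of conditions

    def add(self, query_id, conditions):
        """Register a query as a dict of required tag key/value pairs."""
        self._condition_count[query_id] = len(conditions)
        for key, value in conditions.items():
            self._by_condition[(key, value)].add(query_id)

    def matching_queries(self, tags):
        """Return ids of queries whose conditions are all satisfied by the tags."""
        hits = defaultdict(int)
        for item in tags.items():
            for query_id in self._by_condition.get(item, ()):
                hits[query_id] += 1
        return {q for q, n in hits.items() if n == self._condition_count[q]}

index = SimpleQueryIndex()
index.add("cpu", {"name": "ssCpuUser"})
index.add("api-sps", {"name": "sps", "nf.cluster": "api"})

matched = index.matching_queries({"name": "sps", "nf.cluster": "api", "nf.node": "i-123"})
unmatched = index.matching_queries({"name": "sps", "nf.cluster": "web"})
print(matched, unmatched)
```

The cost of a lookup depends on the number of tags on the data point rather than on the total number of registered queries, which is the property both use cases above rely on.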

Configure Reserved Keys

Reserved keys are tag prefixes restricted for use by the infrastructure rather than the user providing metrics. At Netflix we restrict two prefixes:

  • atlas: used for tags that Atlas uses to determine behavior.
  • nf: common infrastructure tags automatically applied by the internal plugin based on the deployment context.

The set of reserved keys can now be configured via the atlas.webapi.publish.rules setting rather than being forced to use the default set we use at Netflix. See the reference.conf for an example.
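Conceptually, a reserved-key rule checks each tag key against a set of restricted prefixes. The sketch below is an illustration only: the function name, the trailing dot on the prefixes, and the allow-list parameter are assumptions, and the real configuration format should be taken from the reference.conf.

```python
def reserved_key_violations(tags, prefixes=("atlas.", "nf."), allowed=frozenset()):
    """Return the tag keys that use a reserved prefix and are not explicitly allowed."""
    return sorted(
        key for key in tags
        if key.startswith(tuple(prefixes)) and key not in allowed
    )

# Illustrative tag map for a user-supplied metric.
tags = {"name": "requests", "nf.custom": "x", "atlas.dstype": "rate"}

violations = reserved_key_violations(tags)
with_allowed = reserved_key_violations(tags, allowed={"nf.custom"})
no_rules = reserved_key_violations(tags, prefixes=())
print(violations, with_allowed, no_rules)
```

Passing a different prefix tuple models a deployment that does not want the Netflix defaults.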

Grafana

There has been some community work on integration with Grafana (#289).

Others

v1.4.7

01 Sep 16:14

Primary changes:

  • Fix response content type of partial failures to publish endpoint (#431).

A comprehensive list of changes can be found in the commit log: v1.4.6...v1.4.7

v1.5.0-rc.9

30 Jul 21:50
Pre-release

Verify that release publishing still works after build changes. Not recommended for general use.

A comprehensive list of changes can be found in the commit log: v1.5.0-rc.8...v1.5.0-rc.9