druid-0.10.0

Druid 0.10.0 contains hundreds of performance improvements, stability improvements, and bug fixes from over 40 contributors. Major new features include a built-in SQL layer, numeric dimensions, Kerberos authentication support, a revamp of the "index" task, a new "like" filter, large columns, ability to run the coordinator and overlord as a single service, better performing defaults, and eight new extensions.

If you are upgrading from a previous version of Druid, please see "Updating from 0.9.2 and earlier" below for upgrade notes, including some backwards incompatible changes.

The full list of changes is here: https://github.com/druid-io/druid/pulls?utf8=%E2%9C%93&q=is%3Apr%20is%3Aclosed%20milestone%3A0.10.0

Documentation for this release is at: http://druid.io/docs/0.10.0/

Highlights

Built-in SQL

Druid now includes a built-in SQL server powered by Apache Calcite, with two SQL APIs: HTTP POST and JDBC. This provides an alternative to Druid's native JSON API that is more familiar to new developers, and it makes it easier to integrate pre-existing applications that natively speak SQL. Not all Druid and SQL features are supported by the SQL layer in this initial release, but we intend to expand both in future releases.

SQL support can be enabled by setting druid.sql.enable=true in your configuration. See http://druid.io/docs/0.10.0/querying/sql.html for details and documentation.
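
For illustration, a SQL query can be POSTed as a JSON body to the broker's SQL endpoint (see the documentation above for the exact URL; the datasource and columns here are hypothetical):

{
  "query" : "SELECT page, COUNT(*) AS edits FROM wikipedia GROUP BY page ORDER BY edits DESC LIMIT 10"
}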

Added in #3682 by @gianm.

Numeric dimensions

Druid now supports numeric dimensions at ingestion and query time. Users can ingest long and float columns as dimensions (i.e., treating the numeric columns as part of the ingestion-time grouping key instead of as aggregators, if rollup is enabled). Additionally, Druid queries can now accept any long or float column as a dimension for grouping or for filtering.

There are performance tradeoffs between string and numeric columns. Numeric columns are generally faster to group on than string columns. Numeric columns don't have indexes, so they are generally slower to filter on than string columns.

See http://druid.io/docs/0.10.0/ingestion/schema-design.html#numeric-dimensions for ingestion documentation and http://druid.io/docs/0.10.0/querying/dimensionspecs.html for query documentation.
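
For example, an ingestion-spec dimensionsSpec might mix string and numeric dimensions like this (a sketch; the field names are hypothetical):

"dimensionsSpec" : {
  "dimensions" : [
    "page",
    { "type" : "long", "name" : "userId" },
    { "type" : "float", "name" : "latencyMs" }
  ]
}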

Added in #3838, #3966, and other patches by @jon-wei.

Kerberos authentication support

Added a new extension named 'druid-kerberos' which adds support for user authentication of Druid nodes via Kerberos. It uses the Simple and Protected GSSAPI Negotiation Mechanism (SPNEGO, https://en.wikipedia.org/wiki/SPNEGO) for authentication over HTTP.

See http://druid.io/docs/0.10.0/development/extensions-core/druid-kerberos.html for documentation on how to configure Kerberos authentication.

Added in #3853 by @nishantmonu51.

Index task revamp

The indexing task was rewritten to improve performance, particularly for jobs spanning multiple intervals that generate many shards. The segmentGranularity intervals can now be determined automatically and no longer need to be specified, although ingestion time is reduced if both intervals and numShards are provided.

Additionally, the indexing task now supports an appendToExisting flag which causes the data to be indexed as an additional shard of the current version rather than as a new version overshadowing the previous version.

See http://druid.io/docs/0.10.0/ingestion/tasks.html#index-task for documentation.
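
As a sketch, the appendToExisting flag lives in the task's ioConfig (other fields elided here):

"ioConfig" : {
  "type" : "index",
  "firehose" : { ... },
  "appendToExisting" : true
}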

Added in #3611 by @dclim.

Like filter

Druid now includes a "like" filter that enables SQL LIKE-style filtering, such as foo LIKE 'bar%'. The implementation is generally faster than regex filters, and is encouraged over regex filters when possible. In particular, like filters on prefixes such as bar% are significantly faster than equivalent regex filters such as ^bar.*.

See http://druid.io/docs/0.10.0/querying/filters.html#like-filter for documentation.
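
For example, a like filter equivalent to foo LIKE 'bar%' looks like this:

"filter" : {
  "type" : "like",
  "dimension" : "foo",
  "pattern" : "bar%"
}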

Added in #3642 by @gianm.

Large columns

Druid now supports individual columns larger than 2GB. This feature is not typically required, since the general guidance is that segments should be 500MB–1GB in size, but it is useful in situations where one column is much larger than all the others (for example, large sketches).

This functionality is available to all Druid users and no special configuration is necessary when using the built-in column types. If you have developed a custom metric column type as a Druid extension, you can enable large column support by overriding getSerializer in your ComplexMetricSerde.

Added in #3743 by @akashdw.

Coordinator/Overlord combination option

Druid deployments can now be simplified by combining the Coordinator and Overlord functions into the Coordinator process. To do this, set druid.coordinator.asOverlord.enabled and druid.coordinator.asOverlord.overlordService appropriately on your Coordinators and then stop your Overlords. The Overlord console will then be available at http://coordinator-host:port/console.html.
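
A minimal runtime.properties sketch for the Coordinator, assuming your Overlords ran under the service name druid/overlord (adjust to match your cluster):

druid.coordinator.asOverlord.enabled=true
druid.coordinator.asOverlord.overlordService=druid/overlord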

This is currently an experimental feature and is off by default. We intend to consider making this the default in a future version of Druid.

See http://druid.io/docs/0.10.0/configuration/coordinator.html for documentation on this feature and configuration options.

Added in #3711 by @himanshug.

Better performing defaults

This release changes two default settings to improve out-of-the-box performance:

  • The buildV9Directly option introduced in Druid 0.9.0 is now enabled by default. This option improves performance of indexing by creating the v9 data format directly rather than creating v8 first and then converting to v9. If necessary, you can roll back to the old code by setting "buildV9Directly" to false in your indexing tasks.

  • The v2 groupBy engine introduced in Druid 0.9.2 is now enabled by default. This new groupBy engine was rewritten from the ground up for better performance and memory management. If necessary, you can roll back to the old engine by setting either "druid.groupBy.query.defaultStrategy" in your runtime.properties, or "groupByStrategy" in your query context, to "v1". See http://druid.io/docs/0.10.0/querying/groupbyquery.html for details on the differences between groupBy v1 and v2.
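
For example, rolling an individual query back to the v1 engine through the query context would look like this (only the context field of the query is shown):

"context" : {
  "groupByStrategy" : "v1"
}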

Other performance improvements

In addition to better performing defaults, Druid 0.10.0 has a number of other performance improvements, including:

  • Concise bitset union, intersection, and iteration optimization (#3883) by @leventov
  • DimensionSelector-based value matching optimization (#3858) by @leventov
  • Search query strategy for choosing index-based vs. cursor-based execution (#3792) by @jihoonson
  • Bitset iteration optimization (#3753) by @leventov
  • GroupBy optimization for granularity "all" (#3740) by @gianm
  • Disable flush after every DefaultObjectMapper write (#3748) by @jon-wei
  • Short-circuiting AND filter (#3676) by @gianm
  • Improved performance of IndexMergerV9 (#3440) by @leventov

New extensions

Druid 0.10.0 also ships eight new extensions, including the druid-kerberos authentication extension described above.

And much more!

The full list of changes is here: https://github.com/druid-io/druid/pulls?utf8=%E2%9C%93&q=is%3Apr%20is%3Aclosed%20milestone%3A0.10.0

Updating from 0.9.2 and earlier

Please see below for changes between 0.9.2 and 0.10.0 that you should be aware of before upgrading. If you're updating from an earlier version than 0.9.2, please see release notes of the relevant intermediate versions for additional notes.

Rolling updates

The standard Druid update process described by http://druid.io/docs/0.10.0/operations/rolling-updates.html should be followed for rolling updates.

Query API changes

Please note the following backwards-incompatible query API changes when updating. Some queries may need to be adjusted to continue to behave as expected.

  • JavaScript query features are now disabled by default for security reasons (#3818). If you use these features, you can re-enable them by setting druid.javascript.enabled=true in your runtime properties. See http://druid.io/docs/0.10.0/development/javascript.html for details, including security considerations.

  • GroupBy queries no longer allow __time as the output name of a dimension, aggregator, or post-aggregator (#3967).

  • Select query pagingSpecs now default to fromNext: true behavior when fromNext is not specified (#3986). Behavior is unchanged for Select queries that did have fromNext specified. If you prefer the old default, you can change it through the druid.query.select.enableFromNextDefault runtime property; see the sketch after this list and http://druid.io/docs/0.10.0/querying/select-query.html for details.

  • SegmentMetadata queries no longer include "size" analysis by default (#3773). You can still request "size" analysis by adding "size" to "analysisTypes" at query time.
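
As a sketch of the pagingSpec change above, a Select query that wants the pre-0.10.0 behavior can request it explicitly (the threshold value is arbitrary):

"pagingSpec" : {
  "pagingIdentifiers" : {},
  "threshold" : 100,
  "fromNext" : false
}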

Deployment and configuration changes

Please note the following deployment-related changes when updating.

  • Druid now requires Java 8 to run (#3914). If you are currently running on Java 7, we suggest upgrading Java first and then Druid.

  • Druid now defaults to the "v2" engine for groupBy rather than the legacy "v1" engine. As part of this, memory usage limits have changed from row-based to byte-based limits, so it is possible that some queries which met resource limits before will now exceed them and fail. You can avoid this by tuning the new groupBy engine appropriately. If necessary, you can roll back to the old engine by setting either "druid.groupBy.query.defaultStrategy" in your runtime.properties, or "groupByStrategy" in your query context, to "v1". See http://druid.io/docs/0.10.0/querying/groupbyquery.html for more details on the differences between groupBy v1 and v2.

    This new groupBy engine was rewritten from the ground up for better performance and memory management. The query API and results are compatible between the two engines; however, there are some differences from a cluster configuration perspective: (1) groupBy v2 uses primarily off-heap memory instead of on-heap memory; (2) it requires one merge buffer for each concurrent query; (3) if you've configured caching for groupBy, it must be on historicals; and (4) it doesn't support chunkPeriod.

  • Druid now has a non-zero default value for druid.processing.numMergeBuffers (#3953). This will increase the amount of direct memory used in its default configuration. This change was made in connection with changing the default groupBy engine, since the new "v2" engine requires merge buffers. If you were formerly using groupBy v1, you should be able to offset this by reducing your JVM heap, since groupBy v2 uses off-heap memory rather than on-heap memory.

  • Druid query-related processes (broker, historical, indexing tasks) now eagerly allocate druid.processing.numThreads DirectByteBuffer instances of size druid.processing.buffer.sizeBytes for query processing at startup (#3628). This allocation was lazy in earlier versions of Druid. The change does not affect the memory use of a long-running Druid cluster, but it means that memory for processing buffers must be available when Druid starts up. A rough sizing sketch follows at the end of this list.

  • When running in non-UTC timezones, behavior of predefined segmentGranularity constants such as "day" will change when upgrading. Formerly, this would generate segments using local days. In Druid 0.10.0, this will generate segments using UTC days. If you were running Druid in the UTC timezone, this change has no effect. Please note that running Druid in a non-UTC timezone is not officially supported (see http://druid.io/docs/0.10.0/configuration/index.html) and we recommend always running all processes in UTC timezones. To create local-day segments using Druid 0.10.0, you can use a local-time segmentGranularity such as:

"segmentGranularity" : {
     "type" : "period",
     "period" : "P1D",
     "timeZone" : "America/Los_Angeles"
}
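
Since processing buffers are now allocated eagerly and merge buffers default to a non-zero count, it is worth budgeting direct memory up front. A rough sizing sketch, using the guideline from the Druid configuration docs (the property values below are hypothetical; verify the formula against your version):

druid.processing.numThreads=7
druid.processing.numMergeBuffers=2
druid.processing.buffer.sizeBytes=1073741824

# Direct memory needed is roughly
#   buffer.sizeBytes * (numThreads + numMergeBuffers + 1) = 1GB * (7 + 2 + 1) = 10GB,
# so set the JVM flag to about -XX:MaxDirectMemorySize=10g.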

Extension changes

  • AggregatorFactory, BufferAggregator, and PostAggregator interfaces have changed (#3894, #3899, #3957, #4071). If you have deployed a custom extension that includes an AggregatorFactory or PostAggregator, it will need to be recompiled. Druid's built-in extensions have all been updated for this change.

  • Extensions targeting Druid 0.10.x must be compiled with JDK 8.

Credits

Thanks to everyone who contributed to this release!

@akashdw
@Aveplatter
@b-slim
@baruchz
@cheddar
@clintropolis
@DaimonPl
@dclim
@dpenas
@drcrallen
@du00cs
@erikdubbelboer
@Fokko
@freakyzoidberg
@GabrielPage
@gianm
@gvsmirnov
@himanshug
@hland
@hzy001
@jaehc
@jihoonson
@jisookim0513
@jkukul
@joanvr
@jon-wei
@kaijianding
@leventov
@mark1900
@michaelschiff
@navis
@ncolomer
@niketh
@nishantmonu51
@pjain1
@praveev
@sirpkt
@tranv94
@xiaoyao1991
@yuusaku-t
@zhxiaogg