ARROW-1252: [Website] Updates for 0.5.0 and short blog post summarizing the release
Also updated the CHANGELOG.md
Author: Wes McKinney <[email protected]>
Closes apache#885 from wesm/ARROW-1252 and squashes the following commits:
e603f388 [Wes McKinney] Fix up markdown formatting of underscores
797215b2 [Wes McKinney] Release announcement blog post
3babc7b4 [Wes McKinney] Add release page
3fd41e11 [Wes McKinney] First cut revising install page
b8416ee5 [Wes McKinney] Add changelog to CHANGELOG.md
3f9dec05 [Wes McKinney] Start on 0.5.0 website updates
Showing 7 changed files with 553 additions and 93 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,114 @@ | ||
--- | ||
layout: post | ||
title: "Apache Arrow 0.5.0 Release" | ||
date: "2017-07-25 00:00:00 -0400" | ||
author: wesm | ||
categories: [release] | ||
--- | ||
<!-- | ||
{% comment %} | ||
Licensed to the Apache Software Foundation (ASF) under one or more | ||
contributor license agreements. See the NOTICE file distributed with | ||
this work for additional information regarding copyright ownership. | ||
The ASF licenses this file to you under the Apache License, Version 2.0 | ||
(the "License"); you may not use this file except in compliance with | ||
the License. You may obtain a copy of the License at | ||
http://www.apache.org/licenses/LICENSE-2.0 | ||
Unless required by applicable law or agreed to in writing, software | ||
distributed under the License is distributed on an "AS IS" BASIS, | ||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
See the License for the specific language governing permissions and | ||
limitations under the License. | ||
{% endcomment %} | ||
--> | ||
|
||
The Apache Arrow team is pleased to announce the 0.5.0 release. It includes | ||
[**130 resolved JIRAs**][1] with some new features, expanded integration | ||
testing between implementations, and bug fixes. The Arrow memory format has | ||
remained stable since the 0.3.x and 0.4.x releases. | ||
|
||
See the [Install Page][2] to learn how to get the libraries for your | ||
platform. The [complete changelog][5] is also available. | ||
|
||
## Expanded Integration Testing | ||
|
||
In this release, we added compatibility tests for dictionary-encoded data | ||
between Java and C++. Dictionary encoding allows the distinct values (the | ||
*dictionary*) in a vector to be transmitted as part of an Arrow schema, while | ||
the record batches contain integer indices that refer to entries in the dictionary. | ||
|
||
For example, we might have: | ||
|
||
``` | ||
data (string): ['foo', 'bar', 'foo', 'bar'] | ||
``` | ||
|
||
In dictionary-encoded form, this could be represented as: | ||
|
||
``` | ||
indices (int8): [0, 1, 0, 1] | ||
dictionary (string): ['foo', 'bar'] | ||
``` | ||
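As a rough illustration of the encoding shown above, the following pure-Python sketch maps a list of strings to indices plus a dictionary, and back again. The function names `dictionary_encode` and `dictionary_decode` are hypothetical and are not part of any Arrow library; the Arrow implementations operate on columnar buffers rather than Python lists.

```python
def dictionary_encode(values):
    """Return (indices, dictionary) for a sequence of hashable values."""
    dictionary = []   # distinct values, in order of first appearance
    positions = {}    # value -> index into `dictionary`
    indices = []
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return indices, dictionary


def dictionary_decode(indices, dictionary):
    """Rebuild the original values by looking each index up in the dictionary."""
    return [dictionary[i] for i in indices]


indices, dictionary = dictionary_encode(['foo', 'bar', 'foo', 'bar'])
print(indices)      # [0, 1, 0, 1]
print(dictionary)   # ['foo', 'bar']
```

Decoding with `dictionary_decode(indices, dictionary)` returns the original list, which is the round trip that a cross-language compatibility test would need to preserve.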
|
||
In upcoming releases, we plan to complete integration testing for the remaining | ||
data types (including some more complicated types like unions and decimals) on | ||
the road to a future 1.0.0 release. | ||
|
||
## C++ Activity | ||
|
||
We completed a number of significant pieces of work in the C++ part of Apache | ||
Arrow. | ||
|
||
### Using jemalloc as default memory allocator | ||
|
||
We decided to use [jemalloc][4] as the default memory allocator unless it is | ||
explicitly disabled. This memory allocator has significant performance | ||
advantages in Arrow workloads over the default `malloc` implementation. We will | ||
publish a blog post going into more detail about this and why you might care. | ||
|
||
### Sharing more C++ code with Apache Parquet | ||
|
||
We imported the compression library interfaces and dictionary encoding | ||
algorithms from the [Apache Parquet C++ library][3]. The Parquet library now | ||
depends on this code in Arrow, and we will be able to use it more easily for | ||
data compression in Arrow use cases. | ||
|
||
As part of incorporating Parquet's dictionary encoding utilities, we have | ||
developed an `arrow::DictionaryBuilder` class to enable building | ||
dictionary-encoded arrays iteratively. This can help save memory and yield | ||
better performance when interacting with databases, Parquet files, or other | ||
sources whose columns may contain many duplicate values. | ||
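To make the idea of building a dictionary-encoded array iteratively more concrete, here is a minimal Python sketch. It is a conceptual illustration only: `SketchDictionaryBuilder` and its methods are hypothetical names and do not mirror the actual `arrow::DictionaryBuilder` C++ interface.

```python
class SketchDictionaryBuilder:
    """Hypothetical illustration only; not the arrow::DictionaryBuilder API."""

    def __init__(self):
        self._positions = {}   # value -> dictionary index
        self._dictionary = []  # distinct values, in first-seen order
        self._indices = []     # one integer per appended value

    def append(self, value):
        """Append one value, adding it to the dictionary if it is new."""
        index = self._positions.get(value)
        if index is None:
            index = len(self._dictionary)
            self._positions[value] = index
            self._dictionary.append(value)
        self._indices.append(index)

    def finish(self):
        """Return (indices, dictionary) for everything appended so far."""
        return list(self._indices), list(self._dictionary)


builder = SketchDictionaryBuilder()
# Values arrive in chunks, as they might from a database cursor or file reader.
for chunk in (['NY', 'CA', 'NY'], ['CA', 'CA', 'TX']):
    for value in chunk:
        builder.append(value)

indices, dictionary = builder.finish()
print(indices)      # [0, 1, 0, 1, 1, 2]
print(dictionary)   # ['NY', 'CA', 'TX']
```

Each repeated value is stored once in the dictionary and referenced by an integer afterwards, which is where the memory savings for highly duplicated columns come from.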
|
||
### Support for LZ4 and ZSTD compressors | ||
|
||
We added LZ4 and ZSTD compression library support. In ARROW-300 and other | ||
planned work, we intend to add some compression features for data sent via RPC. | ||
|
||
## Python Activity | ||
|
||
We fixed many bugs affecting Parquet and Feather users and smoothed several | ||
other rough edges in everyday Arrow use. We also added more Arrow type | ||
conversions: structs, lists embedded in pandas objects, and Arrow | ||
time types (which deserialize to the `datetime.time` type). | ||
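Arrow time types store a time of day as an integer count of units since midnight. The sketch below shows what converting such a value to `datetime.time` could look like, assuming a microsecond unit. The helper `time_from_micros` is a hypothetical name used for illustration and is not a pyarrow function; pyarrow performs this conversion internally.

```python
import datetime

MICROS_PER_DAY = 86_400 * 1_000_000


def time_from_micros(micros_since_midnight):
    """Convert microseconds since midnight to a datetime.time."""
    if not 0 <= micros_since_midnight < MICROS_PER_DAY:
        raise ValueError("value is outside a single day")
    seconds, microsecond = divmod(micros_since_midnight, 1_000_000)
    minutes, second = divmod(seconds, 60)
    hour, minute = divmod(minutes, 60)
    return datetime.time(hour, minute, second, microsecond)


print(time_from_micros(45_296_000_123))  # 12:34:56.000123
```

Units other than microseconds would need a different divisor, and nanosecond values carry more precision than `datetime.time` can represent.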
|
||
In upcoming releases we plan to continue to improve [Dask][7] support and | ||
performance for distributed processing of Apache Parquet files with pyarrow. | ||
|
||
## The Road Ahead | ||
|
||
We have much work ahead of us to build out Arrow integrations in other data | ||
systems to improve their processing performance and interoperability with other | ||
systems. | ||
|
||
We are discussing the roadmap to a future 1.0.0 release on the [developer | ||
mailing list][6]. Please join the discussion there. | ||
|
||
[1]: https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.5.0 | ||
[2]: http://arrow.apache.org/install | ||
[3]: http://github.com/apache/parquet-cpp | ||
[4]: https://github.com/jemalloc/jemalloc | ||
[5]: http://arrow.apache.org/release/0.5.0.html | ||
[6]: http://mail-archives.apache.org/mod_mbox/arrow-dev/ | ||
[7]: http://github.com/dask/dask |