ARROW-1252: [Website] Updates for 0.5.0 and short blog post summarizing the release

Also updated the CHANGELOG.md

Author: Wes McKinney <[email protected]>

Closes apache#885 from wesm/ARROW-1252 and squashes the following commits:

e603f388 [Wes McKinney] Fix up markdown formatting of underscores
797215b2 [Wes McKinney] Release announcement blog post
3babc7b4 [Wes McKinney] Add release page
3fd41e11 [Wes McKinney] First cut revising install page
b8416ee5 [Wes McKinney] Add changelog to CHANGELOG.md
3f9dec05 [Wes McKinney] Start on 0.5.0 website updates
wesm committed Jul 25, 2017
1 parent ad8db9d commit 540ca92
Showing 7 changed files with 553 additions and 93 deletions.
274 changes: 206 additions & 68 deletions CHANGELOG.md

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions dev/make_changelog.py
@@ -74,6 +74,7 @@ def format_changelog_website(issues, out):
CATEGORIES = {
'New Feature': NEW_FEATURE,
'Improvement': NEW_FEATURE,
'Wish': NEW_FEATURE,
'Task': NEW_FEATURE,
'Test': NEW_FEATURE,
'Bug': BUGFIX
114 changes: 114 additions & 0 deletions site/_posts/2017-07-24-0.5.0-release.md
@@ -0,0 +1,114 @@
---
layout: post
title: "Apache Arrow 0.5.0 Release"
date: "2017-07-25 00:00:00 -0400"
author: wesm
categories: [release]
---
<!--
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to you under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
{% endcomment %}
-->

The Apache Arrow team is pleased to announce the 0.5.0 release. It includes
[**130 resolved JIRAs**][1] with some new features, expanded integration
testing between implementations, and bug fixes. The Arrow memory format has
remained stable since the 0.3.x and 0.4.x releases.

See the [Install Page][2] to learn how to get the libraries for your
platform. The [complete changelog][5] is also available.

## Expanded Integration Testing

In this release, we added compatibility tests for dictionary-encoded data
between Java and C++. This enables the distinct values (the *dictionary*) in a
vector to be transmitted as part of an Arrow schema, while the record batches
contain integer indices that refer into that dictionary.

So we might have:

```
data (string): ['foo', 'bar', 'foo', 'bar']
```

In dictionary-encoded form, this could be represented as:

```
indices (int8): [0, 1, 0, 1]
dictionary (string): ['foo', 'bar']
```
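
The same idea can be seen from Python. The sketch below uses pyarrow's
`dictionary_encode()` method; the call names reflect later pyarrow releases and
are shown only as an illustration, not the exact 0.5.0 API.

```
import pyarrow as pa

# A plain string array with repeated values
data = pa.array(['foo', 'bar', 'foo', 'bar'])

# Dictionary-encode it: the distinct values become the dictionary,
# and the array itself becomes small integer indices into it
encoded = data.dictionary_encode()

print(encoded.indices)     # [0, 1, 0, 1]
print(encoded.dictionary)  # ['foo', 'bar']
```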

In upcoming releases, we plan to complete integration testing for the remaining
data types (including more complicated types like unions and decimals) on the
road to a future 1.0.0 release.

## C++ Activity

We completed a number of significant pieces of work in the C++ part of Apache
Arrow.

### Using jemalloc as default memory allocator

We decided to use [jemalloc][4] as the default memory allocator unless it is
explicitly disabled. This memory allocator has significant performance
advantages in Arrow workloads over the default `malloc` implementation. We will
publish a blog post going into more detail about this and why you might care.
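
From Python, you can inspect which allocator backs the default memory pool. The
sketch below assumes a recent pyarrow build; `backend_name` and the jemalloc
pool are only present in builds compiled with that support and may not exist in
0.5.0.

```
import pyarrow as pa

# The default pool backs allocations made through pyarrow unless a
# different pool is passed explicitly.
pool = pa.default_memory_pool()
print(pool.backend_name)           # e.g. 'jemalloc' when it is enabled

# Total bytes currently allocated through Arrow's memory pools
print(pa.total_allocated_bytes())
```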

### Sharing more C++ code with Apache Parquet

We imported the compression library interfaces and dictionary encoding
algorithms from the [Apache Parquet C++ library][3]. The Parquet library now
depends on this code in Arrow, and we will be able to use it more easily for
data compression in Arrow use cases.

As part of incorporating Parquet's dictionary encoding utilities, we have
developed an `arrow::DictionaryBuilder` class to enable building
dictionary-encoded arrays iteratively. This can help save memory and yield
better performance when interacting with databases, Parquet files, or other
sources whose columns may contain many duplicate values.
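
The C++ class itself is not shown here, but the idea behind an incremental
dictionary builder is easy to sketch: keep a value-to-index map and emit an
index for each incoming value. The plain-Python toy below is a conceptual
illustration only, not the `arrow::DictionaryBuilder` API.

```
class ToyDictionaryBuilder:
    """Conceptual stand-in for an incremental dictionary builder."""

    def __init__(self):
        self.dictionary = []   # distinct values, in first-seen order
        self.indices = []      # encoded output
        self._positions = {}   # value -> position in self.dictionary

    def append(self, value):
        pos = self._positions.get(value)
        if pos is None:
            pos = len(self.dictionary)
            self._positions[value] = pos
            self.dictionary.append(value)
        self.indices.append(pos)


builder = ToyDictionaryBuilder()
for value in ['foo', 'bar', 'foo', 'bar']:
    builder.append(value)

print(builder.indices)     # [0, 1, 0, 1]
print(builder.dictionary)  # ['foo', 'bar']
```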

### Support for LZ4 and ZSTD compressors

We added LZ4 and ZSTD compression library support. In ARROW-300 and other
planned work, we intend to add some compression features for data sent via RPC.
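
For the curious, the codecs are also reachable from Python. The snippet below
is a rough sketch assuming a pyarrow build with LZ4 and ZSTD enabled; the
`compress`/`decompress` helpers shown here come from later pyarrow releases.

```
import pyarrow as pa

payload = b'arrow' * 1000

for codec in ('lz4', 'zstd'):
    compressed = pa.compress(payload, codec=codec, asbytes=True)
    # decompress() needs the original size when it is not stored in the frame
    restored = pa.decompress(compressed, decompressed_size=len(payload),
                             codec=codec, asbytes=True)
    assert restored == payload
    print(codec, len(payload), '->', len(compressed))
```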

## Python Activity

We fixed many bugs affecting Parquet and Feather users and smoothed several
other rough edges in normal Arrow use. We also added some additional
Arrow type conversions: structs, lists embedded in pandas objects, and Arrow
time types (which deserialize to the `datetime.time` type).
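
A quick illustration of two of these conversions, as they behave in current
pyarrow (shown as a sketch rather than the exact 0.5.0 surface):

```
import datetime
import pandas as pd
import pyarrow as pa

# Lists embedded in a pandas column convert to an Arrow list type
lists = pa.array(pd.Series([[1, 2], [3], None]))
print(lists.type)         # list<item: int64>

# Python time objects map to an Arrow time type and round-trip back
times = pa.array([datetime.time(12, 30), datetime.time(23, 59)])
print(times.type)         # time64[us]
print(times.to_pylist())  # [datetime.time(12, 30), datetime.time(23, 59)]
```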

In upcoming releases we plan to continue to improve [Dask][7] support and
performance for distributed processing of Apache Parquet files with pyarrow.

## The Road Ahead

We have much work ahead of us to build out Arrow integrations in other data
systems, improving their processing performance and their interoperability with
one another.

We are discussing the roadmap to a future 1.0.0 release on the [developer
mailing list][6]. Please join the discussion there.

[1]: https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.5.0
[2]: http://arrow.apache.org/install
[3]: http://github.com/apache/parquet-cpp
[4]: https://github.com/jemalloc/jemalloc
[5]: http://arrow.apache.org/release/0.5.0.html
[6]: http://mail-archives.apache.org/mod_mbox/arrow-dev/
[7]: http://github.com/dask/dask