ARROW-1353: [Website] Update website for 0.6.0 release and add short release blog post

Author: Wes McKinney <[email protected]>

Closes apache#967 from wesm/ARROW-1353 and squashes the following commits:

804fe35 [Wes McKinney] Escape underscores in CHANGELOG.md
1b7c4b6 [Wes McKinney] Finish 0.6.0 blog post
a78cb94 [Wes McKinney] Some updates for 0.6.0 site update
milevin · Aug 16, 2017 · 5bf07cf
1 parent 0faa17c
Showing 6 changed files with 399 additions and 21 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -17,13 +17,119 @@
   under the License.
 -->
 
+# Apache Arrow 0.6.0 (14 August 2017)
+
+## Bug
+
+* ARROW-1192 - [JAVA] Improve splitAndTransfer performance for List and Union vectors
+* ARROW-1195 - [C++] CpuInfo doesn't get cache size on Windows
+* ARROW-1204 - [C++] lz4 ExternalProject fails in Visual Studio 2015
+* ARROW-1225 - [Python] pyarrow.array does not attempt to convert bytes to UTF8 when passed a StringType
+* ARROW-1237 - [JAVA] Expose the ability to set lastSet
+* ARROW-1239 - issue with current version of git-commit-id-plugin
+* ARROW-1240 - security: upgrade logback to address CVE-2017-5929
+* ARROW-1242 - [Java] security - upgrade Jackson to mitigate 3 CVE vulnerabilities
+* ARROW-1245 - [Integration] Java Integration Tests Disabled
+* ARROW-1248 - [Python] C linkage warnings in Clang with public Cython API
+* ARROW-1249 - [JAVA] Expose the fillEmpties function from Nullable<Varlength>Vector.mutator
+* ARROW-1263 - [C++] CpuInfo should be able to get CPU features on Windows
+* ARROW-1265 - [Plasma] Plasma store memory leak warnings in Python test suite
+* ARROW-1267 - [Java] Handle zero length case in BitVector.splitAndTransfer
+* ARROW-1269 - [Packaging] Add Windows wheel build scripts from ARROW-1068 to arrow-dist
+* ARROW-1275 - [C++] Default static library prefix for Snappy should be "\_static"
+* ARROW-1276 - Cannot serialize empty DataFrame to parquet
+* ARROW-1283 - [Java] VectorSchemaRoot should be able to be closed() more than once
+* ARROW-1285 - PYTHON: NotImplemented exception creates empty parquet file
+* ARROW-1287 - [Python] Emulate "whence" argument of seek in NativeFile
+* ARROW-1290 - [C++] Use array capacity doubling in arrow::BufferBuilder
+* ARROW-1291 - [Python] `pa.RecordBatch.from_pandas` doesn't accept DataFrame with numeric column names
+* ARROW-1294 - [C++] New Appveyor build failures
+* ARROW-1296 - [Java] templates/FixValueVectors reset() method doesn't set allocationSizeInBytes correctly
+* ARROW-1300 - [JAVA] Fix ListVector Tests
+* ARROW-1306 - [Python] Encoding? issue with error reporting for `parquet.read_table`
+* ARROW-1308 - [C++] ld tries to link `arrow_static` even when -DARROW_BUILD_STATIC=off
+* ARROW-1309 - [Python] Error inferring List type in `Array.from_pandas` when inner values are all None
+* ARROW-1310 - [JAVA] Revert ARROW-886
+* ARROW-1312 - [C++] Set default value of `ARROW_JEMALLOC` to OFF until ARROW-1282 is resolved
+* ARROW-1326 - [Python] Fix Sphinx build in Travis CI
+* ARROW-1327 - [Python] Failing to release GIL in `MemoryMappedFile._open` causes deadlock
+* ARROW-1328 - [Python] `pyarrow.Table.from_pandas` option `timestamps_to_ms` changes column values
+* ARROW-1330 - [Plasma] Turn on plasma tests on manylinux1
+* ARROW-1335 - [C++] `PrimitiveArray::raw_values` has inconsistent semantics re: offsets compared with subclasses
+* ARROW-1338 - [Python] Investigate non-deterministic core dump on Python 2.7, Travis CI builds
+* ARROW-1340 - [Java] NullableMapVector field doesn't maintain metadata
+* ARROW-1342 - [Python] Support strided array of lists
+* ARROW-1343 - [Format/Java/C++] Ensuring encapsulated stream / IPC message sizes are always a multiple of 8
+* ARROW-1350 - [C++] Include Plasma source tree in source distribution
+* ARROW-187 - [C++] Decide on how pedantic we want to be about exceptions
+* ARROW-276 - [JAVA] Nullable Value Vectors should extend BaseValueVector instead of BaseDataValueVector
+* ARROW-573 - [Python/C++] Support ordered dictionaries data, pandas Categorical
+* ARROW-884 - [C++] Exclude internal classes from documentation
+* ARROW-932 - [Python] Fix compiler warnings on MSVC
+* ARROW-968 - [Python] RecordBatch [i:j] syntax is incomplete
+
+## Improvement
+
+* ARROW-1093 - [Python] Fail Python builds if flake8 yields warnings
+* ARROW-1121 - [C++] Improve error message when opening OS file fails
+* ARROW-1140 - [C++] Allow optional build of plasma
+* ARROW-1149 - [Plasma] Create Cython client library for Plasma
+* ARROW-1173 - [Plasma] Blog post for Plasma
+* ARROW-1211 - [C++] Consider making `default_memory_pool()` the default for builder classes
+* ARROW-1213 - [Python] Enable s3fs to be used with ParquetDataset and reader/writer functions
+* ARROW-1219 - [C++] Use more vanilla Google C++ formatting
+* ARROW-1224 - [Format] Clarify language around buffer padding and alignment in IPC
+* ARROW-1230 - [Plasma] Install libraries and headers
+* ARROW-1243 - [Java] security: upgrade all libraries to latest stable versions
+* ARROW-1251 - [Python/C++] Revise build documentation to account for latest build toolchain
+* ARROW-1253 - [C++] Use pre-built toolchain libraries where prudent to speed up CI builds
+* ARROW-1255 - [Plasma] Check plasma flatbuffer messages with the flatbuffer verifier
+* ARROW-1257 - [Plasma] Plasma documentation
+* ARROW-1258 - [C++] Suppress dlmalloc warnings on Clang
+* ARROW-1259 - [Plasma] Speed up Plasma tests
+* ARROW-1260 - [Plasma] Use factory method to create Python PlasmaClient
+* ARROW-1264 - [Plasma] Don't exit the Python interpreter if the plasma client can't connect to the store
+* ARROW-1274 - [C++] `add_compiler_export_flags()` throws warning with CMake >= 3.3
+* ARROW-1288 - Clean up many ASF license headers
+* ARROW-1289 - [Python] Add `PYARROW_BUILD_PLASMA` option like Parquet
+* ARROW-1301 - [C++/Python] Add remaining supported libhdfs UNIX-like filesystem APIs
+* ARROW-1303 - [C++] Support downloading Boost
+* ARROW-1315 - [GLib] Status check of arrow::ArrayBuilder::Finish() is missing
+* ARROW-1323 - [GLib] Add `garrow_boolean_array_get_values()`
+* ARROW-1333 - [Plasma] Sorting example for DataFrames in plasma
+* ARROW-1334 - [C++] Instantiate arrow::Table from vector of Array objects (instead of Columns)
+
+## New Feature
+
+* ARROW-1076 - [Python] Handle nanosecond timestamps more gracefully when writing to Parquet format
+* ARROW-1104 - Integrate in-memory object store from Ray
+* ARROW-1246 - [Format] Add Map logical type to metadata
+* ARROW-1268 - [Website] Blog post on Arrow integration with Spark
+* ARROW-1281 - [C++/Python] Add Docker setup for running HDFS tests and other tests we may not run in Travis CI
+* ARROW-1305 - [GLib] Add GArrowIntArrayBuilder
+* ARROW-1336 - [C++] Add arrow::schema factory function
+* ARROW-439 - [Python] Add option in `to_pandas` conversions to yield Categorical from String/Binary arrays
+* ARROW-622 - [Python] Investigate alternatives to `timestamps_to_ms` argument in pandas conversion
+
+## Task
+
+* ARROW-1270 - [Packaging] Add Python wheel build scripts for macOS to arrow-dist
+* ARROW-1272 - [Python] Add script to arrow-dist to generate and upload manylinux1 Python wheels
+* ARROW-1273 - [Python] Add convenience functions for reading only Parquet metadata or effective Arrow schema from a particular Parquet file
+* ARROW-1297 - 0.6.0 Release
+* ARROW-1304 - [Java] Fix checkstyle checks warning
+
+## Test
+
+* ARROW-1241 - [C++] Visual Studio 2017 Appveyor build job
+
 # Apache Arrow 0.5.0 (23 July 2017)
 
 ## Bug
 
-* ARROW-1074 - from_pandas doesnt convert ndarray to list
+* ARROW-1074 - `from_pandas` doesnt convert ndarray to list
 * ARROW-1079 - [Python] Empty "private" directories should be ignored by Parquet interface
-* ARROW-1081 - C++: arrow::test::TestBase::MakePrimitive doesn't fill null_bitmap
+* ARROW-1081 - C++: arrow::test::TestBase::MakePrimitive doesn't fill `null_bitmap`
 * ARROW-1096 - [C++] Memory mapping file over 4GB fails on Windows
 * ARROW-1097 - Reading tensor needs file to be opened in writeable mode
 * ARROW-1098 - Document Error?

diff --git a/site/_posts/2017-08-16-0.6.0-release.md b/site/_posts/2017-08-16-0.6.0-release.md
@@ -0,0 +1,112 @@
+---
+layout: post
+title: "Apache Arrow 0.6.0 Release"
+date: "2017-08-16 00:00:00 -0400"
+author: wesm
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+The Apache Arrow team is pleased to announce the 0.6.0 release. It includes
+[**90 resolved JIRAs**][1], among them the new Plasma shared memory object store
+as well as improvements and bug fixes to the various language implementations.
+The Arrow memory format has remained stable since the 0.3.x release.
+
+See the [Install Page][2] to learn how to get the libraries for your
+platform. The [complete changelog][5] is also available.
+
+## Plasma Shared Memory Object Store
+
+This release includes the [Plasma Store][7], which you can read more about in
+the linked blog post. This system was originally developed as part of the [Ray
+Project][8] at the [UC Berkeley RISELab][9]. We recognized that Plasma would be
+highly valuable to the Arrow community as a tool for shared memory management
+and zero-copy deserialization. Additionally, we believe we will be able to
+develop a stronger software stack by sharing IO and buffer management
+code.
+
+The Plasma store is a server application that runs as a separate process. A
+reference C++ client, with Python bindings, is available in this release.
+Clients could be developed in Java or other languages in the future to
+enable simple sharing of complex datasets through shared memory.
+
+## Arrow Format Addition: Map type
+
+We added a Map logical type to represent ordered and unordered maps
+in memory. This corresponds to the `MAP` logical type annotation in the Parquet
+format (where maps are represented as repeated structs).
+
+Map is represented as a list of structs. It is the first example of a logical
+type whose physical representation is a nested type. We have not yet
+implemented Map containers in any of the language implementations, but this
+can be done in a future release.
+
+As an example, the Python data:
+
+```python
+data = [{'a': 1, 'bb': 2, 'cc': 3}, {'dddd': 4}]
+```
+
+could be represented in an Arrow `Map<String, Int32>` as:
+
+```
+Map<String, Int32> = List<Struct<keys: String, values: Int32>>
+  is_valid: [true, true]
+  offsets: [0, 3, 4]
+  values: Struct<keys: String, values: Int32>
+    children:
+      - keys: String
+          is_valid: [true, true, true, true]
+          offsets: [0, 1, 3, 5, 9]
+          data: abbccdddd
+      - values: Int32
+          is_valid: [true, true, true, true]
+          data: [1, 2, 3, 4]
+```
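
A minimal pure-Python sketch of how the buffers in the layout above can be derived from the dicts, assuming non-null keys and values (so the child validity bitmaps, all `true` in the example, are omitted). The helper `map_layout` and the names of the entries it returns are hypothetical and are not part of any Arrow API; the function just walks each dict in insertion order and appends to the list offsets, key offsets, key bytes, and value array.

```python
from typing import Dict, List, Optional


def map_layout(rows: List[Optional[Dict[str, int]]]) -> dict:
    """Hypothetical helper: flatten dicts into the buffers of
    Map<String, Int32> = List<Struct<keys: String, values: Int32>>.
    Assumes keys and values are never null, so child validity is omitted."""
    list_valid: List[bool] = []
    list_offsets: List[int] = [0]  # one entry per row, plus the leading 0
    key_offsets: List[int] = [0]   # byte offsets into key_data
    key_data = bytearray()         # concatenated UTF-8 key bytes
    values: List[int] = []         # flattened Int32 values

    for row in rows:
        list_valid.append(row is not None)
        for key, value in (row or {}).items():
            key_data += key.encode("utf-8")
            key_offsets.append(len(key_data))
            values.append(value)
        # Each row ends where the flattened struct child currently ends.
        list_offsets.append(len(values))

    return {
        "is_valid": list_valid,
        "offsets": list_offsets,
        "keys.offsets": key_offsets,
        "keys.data": bytes(key_data),
        "values.data": values,
    }


data = [{'a': 1, 'bb': 2, 'cc': 3}, {'dddd': 4}]
for name, buf in map_layout(data).items():
    print(name, buf)
```

Row `i` owns struct slots `offsets[i]` up to `offsets[i + 1]`, which is why the example's list offsets are `[0, 3, 4]`. A null row in this sketch records `False` in `is_valid` and repeats the previous offset.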
+
+## Python Changes
+
+Some highlights of Python development outside of bug fixes and general API
+improvements include:
+
+* A new `strings_to_categorical=True` option in `Table.to_pandas` yields pandas
+  `Categorical` types from Arrow binary and string columns.
+* Expanded Hadoop Distributed File System (HDFS) functionality to improve
+  compatibility with Dask and other HDFS-aware Python libraries.
+* s3fs and other Dask-oriented filesystems can now be used with
+  `pyarrow.parquet.ParquetDataset`.
+* More graceful handling of pandas's nanosecond timestamps when writing to
+  Parquet format. You can now pass `coerce_timestamps='ms'` to cast to
+  milliseconds, or `'us'` for microseconds.
+
+## Toward Arrow 1.0.0 and Beyond
+
+We are still discussing the roadmap to the 1.0.0 release on the [developer mailing
+list][6]. The focus of the 1.0.0 release will likely be memory format stability
+and hardening integration tests across the remaining data types implemented in
+Java and C++. Please join the discussion there.
+
+[1]: https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.6.0
+[2]: http://arrow.apache.org/install
+[3]: http://github.com/apache/parquet-cpp
+[5]: http://arrow.apache.org/release/0.6.0.html
+[6]: http://mail-archives.apache.org/mod_mbox/arrow-dev/
+[7]: http://arrow.apache.org/blog/2017/08/08/plasma-in-memory-object-store/
+[8]: https://ray-project.github.io/ray/
+[9]: https://rise.cs.berkeley.edu/