Version 0.18.0
Multi-index columns support
We continue improving multi-index columns support (#793, #776). We made the following APIs support multi-index columns:
Series and Index names can now also be set to a tuple or None. (#776)
>>> import databricks.koalas as ks
>>> kser = ks.Series([1, 2, 3])
>>> kser.name = ('a', 'b')
>>> kser
0    1
1    2
2    3
Name: (a, b), dtype: int64
Plots
We also continue adding plot APIs as follows:
For Series:
- plot.kde() (#767)
For DataFrame:
- plot.hist() (#780)
Options
In addition, options can now be read and set through namespace (attribute) access (#785).
>>> import databricks.koalas as ks
>>> ks.options.display.max_rows
1000
>>> ks.options.display.max_rows = 10
>>> ks.options.display.max_rows
10
See also the User Guide in our project docs.
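To make the idea of namespace access concrete, here is a minimal pure-Python sketch of how dotted option keys such as display.max_rows can be exposed as nested attributes. It is not Koalas' implementation: the OptionNamespace class and the store dictionary are hypothetical names introduced only for illustration, and only the key and its default value are taken from the session above.

```python
class OptionNamespace:
    """Hypothetical helper: exposes dotted option keys as nested attributes."""

    def __init__(self, store, prefix=""):
        # Bypass our own __setattr__ while storing internal state.
        object.__setattr__(self, "_store", store)
        object.__setattr__(self, "_prefix", prefix)

    def _key(self, name):
        # Build the full dotted key for an attribute under this namespace.
        return f"{self._prefix}.{name}" if self._prefix else name

    def __getattr__(self, name):
        key = self._key(name)
        if key in self._store:
            return self._store[key]
        # Descend one level if some option lives under this prefix.
        if any(k.startswith(key + ".") for k in self._store):
            return OptionNamespace(self._store, key)
        raise AttributeError(f"No such option: {key!r}")

    def __setattr__(self, name, value):
        key = self._key(name)
        # Only existing options may be assigned; typos fail loudly.
        if key not in self._store:
            raise AttributeError(f"No such option: {key!r}")
        self._store[key] = value


# Illustrative store; the key and default mirror the session above.
store = {"display.max_rows": 1000}
options = OptionNamespace(store)

before = options.display.max_rows
options.display.max_rows = 10
after = options.display.max_rows
print(before, after)
```

Rejecting assignment to unknown keys is a design choice of this sketch, so that a misspelled option name raises an error instead of silently creating a new entry.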
Other new features and improvements
We added the following new features:
koalas.DataFrame:
koalas.indexes.Index/MultiIndex:
- is_boolean (#795)
- is_categorical (#795)
- is_floating (#795)
- is_integer (#795)
- is_interval (#795)
- is_numeric (#795)
- is_object (#795)
Along with the following improvements:
- Add index_col for read_json (#797)
- Add index_col for spark IO reads (#769, #775)
- Add "sep" parameter for read_csv (#777)
- Add axis parameter to dataframe.diff (#774)
- Add read_json and let to_json use spark.write.json (#753)
- Use spark.write.csv in to_csv of Series and DataFrame (#749)
- Handle TimestampType separately when converting to pandas' dtype. (#798)
- Fix spark_df when set_index(.., drop=False). (#792)