Version 0.25.0
loc and iloc indexers improvement
We improved the loc and iloc indexers. loc now supports scalar values as indexers (#1172).
>>> import databricks.koalas as ks
>>>
>>> df = ks.DataFrame([[1, 2], [4, 5], [7, 8]],
...                   index=['cobra', 'viper', 'sidewinder'],
...                   columns=['max_speed', 'shield'])
>>> df.loc['sidewinder']
max_speed    7
shield       8
Name: sidewinder, dtype: int64
>>> df.loc['sidewinder', 'max_speed']
7
In addition, a Series derived from a different DataFrame can be used as an indexer (#1155). This requires the compute.ops_on_diff_frames option to be enabled, as in the example.
>>> import databricks.koalas as ks
>>>
>>> ks.options.compute.ops_on_diff_frames = True
>>>
>>> df1 = ks.DataFrame({'A': [0, 1, 2, 3, 4], 'B': [100, 200, 300, 400, 500]},
...                    index=[20, 10, 30, 0, 50])
>>> df2 = ks.DataFrame({'A': [0, -1, -2, -3, -4], 'B': [-100, -200, -300, -400, -500]},
...                    index=[20, 10, 30, 0, 50])
>>> df1.A.loc[df2.A > -3].sort_index()
10    1
20    0
30    2
Lastly, when a slice is used, loc now returns rows in the natural order of the index, the same as pandas does (#1159, #1174, #1179). See the example below.
>>> df = ks.DataFrame([[1, 2], [4, 5], [7, 8]],
...                   index=['cobra', 'viper', 'sidewinder'],
...                   columns=['max_speed', 'shield'])
>>> df.loc['cobra':'viper', 'max_speed']
cobra    1
viper    4
Name: max_speed, dtype: int64
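For comparison, here is a minimal sketch of the pandas behavior that loc is described as matching. It uses plain pandas rather than Koalas, with the same data as the example above. A label slice includes both endpoints and keeps rows in the order they appear in the index, not in sorted order.

```python
import pandas as pd

# Same data as the Koalas example; the index is deliberately not sorted.
pdf = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                   index=['cobra', 'viper', 'sidewinder'],
                   columns=['max_speed', 'shield'])

# Label slices include both endpoints and keep the index's own order.
speeds = pdf.loc['cobra':'viper', 'max_speed']
print(list(speeds.index))   # ['cobra', 'viper']
print(speeds.tolist())      # [1, 4]

# 'sidewinder' sorts between 'cobra' and 'viper' alphabetically, but it
# comes after 'viper' in the index, so it is not part of the slice.
print('sidewinder' in speeds.index)  # False
```

The variable names pdf and speeds are illustrative only.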
Other new features and improvements
We added the following new features:
koalas.Series:
- get (#1153)
koalas.Index
koalas.MultiIndex:
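As a rough illustration of the Series.get entry listed above, the sketch below uses pandas as a stand-in. It assumes that Koalas' Series.get follows the pandas signature get(key, default=None), which these notes do not spell out. The labels and values are made up for the example.

```python
import pandas as pd

# pandas stand-in for a Koalas Series; labels and values are illustrative.
s = pd.Series([7, 8], index=['max_speed', 'shield'])

present = s.get('max_speed')          # label exists, so its value is returned
missing = s.get('armor')              # label is absent, so the result is None
fallback = s.get('armor', default=0)  # an explicit default replaces None

print(present, missing, fallback)
```

Unlike s['armor'], which raises KeyError in pandas, get returns the default when the label is not found.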
Other improvements
- Add from_pandas support for Index/MultiIndex. (#1170)
- Add a hidden column __natural_order__. (#1146)
- Introduce _LocIndexerLike and consolidate some logic. (#1149)
- Refactor LocIndexerLike.__getitem__. (#1152)
- Remove sort in GroupBy._reduce_for_stat_function. (#1147)
- Randomize index in tests and fix some window-like functions. (#1151)
- Explicitly don't support Index.duplicated. (#1131)
- Fix DataFrame._repr_html_(). (#1177)