Implemented GroupBy.tail #1949
databricks/koalas/groupby.py
    sdf = kdf._internal.spark_frame
    tmp_col = verify_temp_column_name(sdf, "__row_number__")
    window = Window.partitionBy(groupkey_scols).orderBy(F.col(NATURAL_ORDER_COLUMN_NAME).desc())
This implementation is basically the same as GroupBy.head(), except for this line, which uses descending order.
Then, shall we combine those two? Like:

    def _limit(self, n, asc: bool):
        ...
        window = ... orderBy(F.col(NATURAL_ORDER_COLUMN_NAME).asc() if asc else F.col(NATURAL_ORDER_COLUMN_NAME).desc())
        ...

    def head(self, n):
        return self._limit(n, asc=True)

    def tail(self, n):
        return self._limit(n, asc=False)
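For readers without a Spark session at hand, the shared-helper idea can be sketched in plain Python. This is only an illustration under stated assumptions, not the Koalas code: rows are hypothetical `(key, value)` tuples, list position stands in for `NATURAL_ORDER_COLUMN_NAME`, and `n` is assumed to be non-negative.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, List, Tuple

# Hypothetical row shape: (group key, value). List position plays the role
# of the natural-order column used in the Spark window.
Row = Tuple[Hashable, Any]


def _limit(rows: List[Row], n: int, asc: bool) -> List[Row]:
    """Keep at most n rows per group, counted from the start (asc) or the end."""
    # Partition row positions by group key (the role of Window.partitionBy).
    positions: Dict[Hashable, List[int]] = defaultdict(list)
    for pos, (key, _) in enumerate(rows):
        positions[key].append(pos)

    keep = set()
    for group_positions in positions.values():
        # Order within the group ascending or descending (the role of orderBy),
        # then keep the first n positions (row number <= n).
        ordered = group_positions if asc else list(reversed(group_positions))
        keep.update(ordered[:max(n, 0)])  # assumption: negative n keeps nothing

    # Emit the surviving rows in their original order.
    return [row for pos, row in enumerate(rows) if pos in keep]


def head(rows: List[Row], n: int = 5) -> List[Row]:
    return _limit(rows, n, asc=True)


def tail(rows: List[Row], n: int = 5) -> List[Row]:
    return _limit(rows, n, asc=False)


rows = [("a", 1), ("a", 2), ("b", 3), ("a", 4), ("b", 5)]
print(head(rows, 2))
print(tail(rows, 2))
```

The only difference between the two public functions is the sort direction, which is why a single helper with an `asc` flag covers both.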
Cool! Let me address it. Thanks for the suggestion :)
Codecov Report
@@            Coverage Diff             @@
##           master    #1949      +/-   ##
==========================================
- Coverage   94.64%   93.74%   -0.91%
==========================================
  Files          49       49
  Lines       10818    10839      +21
==========================================
- Hits        10239    10161      -78
- Misses        579      678      +99
Otherwise, LGTM.
ref #1929
Great 👍 !
Thanks @ueshin @xinrong-databricks , I'd merge this now.
This PR proposes GroupBy.tail() for DataFrameGroupBy and SeriesGroupBy.