Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented dateframe.between_time #2111

Merged
merged 8 commits into from
Mar 20, 2021

Conversation

LSturtew
Copy link
Contributor

@LSturtew LSturtew commented Mar 19, 2021

ref #1929

Implement DataFrame.between_time

>>> i = pd.date_range('2018-04-09', periods=4, freq='1D20min')
>>> ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i)
>>> kts = ks.from_pandas(ts)
>>> kts
                     A
2018-04-09 00:00:00  1
2018-04-10 00:20:00  2
2018-04-11 00:40:00  3
2018-04-12 01:00:00  4

>>> kts.between_time('0:15', '0:45')
                     A
2018-04-10 00:20:00  2
2018-04-11 00:40:00  3

You get the times that are *not* between two times by setting
``start_time`` later than ``end_time``:

>>> kts.between_time('0:45', '0:15')
                     A
2018-04-09 00:00:00  1
2018-04-12 01:00:00  4

@codecov-io
Copy link

codecov-io commented Mar 19, 2021

Codecov Report

Merging #2111 (37c4dbc) into master (c8a3c88) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #2111   +/-   ##
=======================================
  Coverage   95.35%   95.36%           
=======================================
  Files          60       60           
  Lines       13505    13514    +9     
=======================================
+ Hits        12878    12887    +9     
  Misses        627      627           
Impacted Files Coverage Δ
databricks/koalas/missing/frame.py 100.00% <ø> (ø)
databricks/koalas/frame.py 96.54% <100.00%> (+0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c8a3c88...37c4dbc. Read the comment docs.

@xinrong-meng
Copy link
Contributor

Thanks for your PR!

Would you add DataFrame.between_time to https://github.com/databricks/koalas/blob/master/docs/source/reference/frame.rst?

@@ -2977,6 +2977,86 @@ class locomotion
).resolved_copy
return DataFrame(internal)

def between_time(
self, start_time, end_time, include_start=True, include_end=True, axis: Union[int, str] = 0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you complete type annotations of parameters?

        start_time: Union[datetime.time, str],
        end_time: Union[datetime.time, str],
        include_start: bool = True,
        include_end: bool = True,

Examples
--------
>>> i = pd.date_range('2018-04-09', periods=4, freq='1D20min')
>>> ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

        >>> ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i)
        >>> kts = ks.from_pandas(ts)

can be

        >>> kts = ks.DataFrame({'A': [1, 2, 3, 4]}, index=i)

indexer = index.indexer_between_time(
start_time, end_time, include_start=include_start, include_end=include_end
).to_numpy()
return self.copy().take(indexer)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would return self.iloc[indexer] be better?


indexer = index.indexer_between_time(
start_time, end_time, include_start=include_start, include_end=include_end
).to_numpy()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure to_numpy() is a good idea since all the data will be loaded into the driver's memory.
Let me think about this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I used to_numpy() is because without it I got this not implemented error.

databricks.koalas.exceptions.PandasNotImplementedError: The method pd.Index.iter() is not implemented. If you want to collect your data as an NumPy array, use 'to_numpy()' instead.

Copy link
Contributor

@xinrong-meng xinrong-meng Mar 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about

        ...
        def pandas_between_time(pdf):
            return pdf.between_time(start_time, end_time, include_start, include_end)

        return self.koalas.apply_batch(pandas_between_time)

?
Then we might also want to add a test

    def test_between_time_no_shortcut(self):
        with ks.option_context("compute.shortcut_limit", 0):
            i = pd.date_range("2018-04-09", periods=4, freq="1D20min")
            ts = pd.DataFrame({"A": [1, 2, 3, 4]}, index=i)
            kts = ks.DataFrame({"A": [1, 2, 3, 4]}, index=i)
            self.assert_eq(
                ts.between_time("0:15", "0:45"), kts.between_time("0:15", "0:45"), almost=True
            )

Copy link
Collaborator

@ueshin ueshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LSturtew Thanks for working on this!

I left comments, most of them are nits. Otherwise, LGTM so far.

databricks/koalas/frame.py Show resolved Hide resolved
databricks/koalas/frame.py Outdated Show resolved Hide resolved
databricks/koalas/frame.py Outdated Show resolved Hide resolved
databricks/koalas/frame.py Outdated Show resolved Hide resolved
databricks/koalas/frame.py Outdated Show resolved Hide resolved
databricks/koalas/tests/test_dataframe.py Outdated Show resolved Hide resolved
databricks/koalas/tests/test_dataframe.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@ueshin ueshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, pending tests.

@ueshin
Copy link
Collaborator

ueshin commented Mar 20, 2021

@LSturtew Thanks! Let me merge this now.
@xinrong-databricks Please feel free to leave additional comments or submit a follow-up PR if any.

@ueshin ueshin merged commit 583e03d into databricks:master Mar 20, 2021
@xinrong-meng
Copy link
Contributor

Thanks @LSturtew @ueshin !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants