Commit 77ea64e

Merge branch 'master' into release
2 parents 26be802 + 7b69907 commit 77ea64e

File tree

20 files changed

+1175
-44
lines changed

docs/getting_started/index.rst

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@ The domain of the intervals can be either numerical, :class:`pandas.Timestamp` o
3636
- have a finite length
3737
- are left-closed right-open, or right-closed left-open
3838

39-
A small :ref:`case study <user_guide.calendar_example>` using :mod:`piso` can be found in the :ref:`user guide <user_guide>`. Further examples, and a detailed explanation of functionality, are provided in the :ref:`api`.
39+
Several :ref:`case studies <case_studies>` using :mod:`piso` can be found in the :ref:`user guide <user_guide>`. Further examples, and a detailed explanation of functionality, are provided in the :ref:`api`.
4040

4141

4242
Versioning

docs/reference/accessors.rst

Lines changed: 1 addition & 0 deletions
@@ -18,4 +18,5 @@ Accessors
1818
ArrayAccessor.issubset
1919
ArrayAccessor.coverage
2020
ArrayAccessor.complement
21+
ArrayAccessor.contains
2122
ArrayAccessor.get_indexer

docs/reference/package.rst

Lines changed: 3 additions & 1 deletion
@@ -20,5 +20,7 @@ Top level functions
2020
issubset
2121
coverage
2222
complement
23+
contains
2324
get_indexer
24-
lookup
25+
lookup
26+
join

docs/release_notes/index.rst

Lines changed: 32 additions & 16 deletions
@@ -5,21 +5,37 @@ Release notes
55
========================
66

77

8+
ADD UNRELEASED CHANGES ABOVE THIS LINE
9+
10+
**v0.5.0 2021-11-02**
11+
12+
Added the following methods
13+
14+
- :func:`piso.join` for *join operations* with interval indexes
15+
- :func:`piso.contains`
16+
- :meth:`ArrayAccessor.contains() <piso.accessor.ArrayAccessor.contains>`
17+
18+
Performance improvements for
19+
20+
- :func:`piso.lookup`
21+
- :func:`piso.get_indexer`
22+
23+
824
**v0.4.0 2021-10-30**
925

1026
Added the following methods
1127

12-
- :meth:`piso.lookup`
13-
- :meth:`piso.get_indexer`
28+
- :func:`piso.lookup`
29+
- :func:`piso.get_indexer`
1430
- :meth:`ArrayAccessor.get_indexer() <piso.accessor.ArrayAccessor.get_indexer>`
1531

1632

1733
**v0.3.0 2021-10-23**
1834

1935
Added the following methods
2036

21-
- :meth:`piso.coverage`
22-
- :meth:`piso.complement`
37+
- :func:`piso.coverage`
38+
- :func:`piso.complement`
2339
- :meth:`ArrayAccessor.coverage() <piso.accessor.ArrayAccessor.coverage>`
2440
- :meth:`ArrayAccessor.complement() <piso.accessor.ArrayAccessor.complement>`
2541

@@ -28,9 +44,9 @@ Added the following methods
2844

2945
Added the following methods
3046

31-
- :meth:`piso.isdisjoint`
32-
- :meth:`piso.issuperset`
33-
- :meth:`piso.issubset`
47+
- :func:`piso.isdisjoint`
48+
- :func:`piso.issuperset`
49+
- :func:`piso.issubset`
3450
- :meth:`ArrayAccessor.isdisjoint() <piso.accessor.ArrayAccessor.isdisjoint>`
3551
- :meth:`ArrayAccessor.issuperset() <piso.accessor.ArrayAccessor.issuperset>`
3652
- :meth:`ArrayAccessor.issubset() <piso.accessor.ArrayAccessor.issubset>`
@@ -42,17 +58,17 @@ Added the following methods
4258

4359
The following methods are included in the initial release of `piso`
4460

45-
- :meth:`piso.register_accessors`
46-
- :meth:`piso.union`
47-
- :meth:`piso.intersection`
48-
- :meth:`piso.difference`
49-
- :meth:`piso.symmetric_difference`
61+
- :func:`piso.register_accessors`
62+
- :func:`piso.union`
63+
- :func:`piso.intersection`
64+
- :func:`piso.difference`
65+
- :func:`piso.symmetric_difference`
5066
- :meth:`ArrayAccessor.union() <piso.accessor.ArrayAccessor.union>`
5167
- :meth:`ArrayAccessor.intersection() <piso.accessor.ArrayAccessor.intersection>`
5268
- :meth:`ArrayAccessor.difference() <piso.accessor.ArrayAccessor.difference>`
5369
- :meth:`ArrayAccessor.symmetric_difference() <piso.accessor.ArrayAccessor.symmetric_difference>`
54-
- :meth:`piso.interval.union`
55-
- :meth:`piso.interval.intersection`
56-
- :meth:`piso.interval.difference`
57-
- :meth:`piso.interval.symmetric_difference`
70+
- :func:`piso.interval.union`
71+
- :func:`piso.interval.intersection`
72+
- :func:`piso.interval.difference`
73+
- :func:`piso.interval.symmetric_difference`
5874

docs/requirements.txt

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@ ipykernel
22
sphinx == 4.0.2
33
nbsphinx == 0.8.6
44
sphinx-panels
5-
staircase
5+
staircase >= 2.1
66
pandas
77
numpy
88
Pygments
Lines changed: 112 additions & 0 deletions
@@ -0,0 +1,112 @@
1+
.. _user_guide.football_example:
2+
3+
4+
Analysis of scores in a football match
5+
=======================================
6+
7+
In this example we will look at a football match from 2009:
8+
9+
The Champions League quarter-final between Chelsea and Liverpool
10+
in 2009 is recognised as among the best games of all time.
11+
Liverpool scored twice in the first half in the 19th and 28th minute.
12+
Chelsea then opened their account in the second half with three
13+
unanswered goals in the 51st, 57th and 76th minute. Liverpool
14+
responded with two goals in the 81st and 83rd minute to put themselves
15+
ahead; however, Chelsea drew level with a goal in the 89th minute and advanced
16+
to the next stage on aggregate.
17+
18+
19+
We start by importing :mod:`pandas` and :mod:`piso`
20+
21+
.. ipython:: python
22+
23+
import pandas as pd
24+
import piso
25+
26+
27+
For the analysis we will create, for each team, a :class:`pandas.Series` indexed by a :class:`pandas.IntervalIndex`. The values of each Series are the team's score, and the interval index, built from :class:`pandas.Timedelta` values, gives the period of the match during which each score held. The following function creates such a Series from the minute marks of a team's goals.
28+
29+
.. ipython:: python
30+
31+
def make_series(goal_time_mins):
32+
    breaks = pd.to_timedelta([0] + goal_time_mins + [90], unit="min")
33+
    ii = pd.IntervalIndex.from_breaks(breaks)
34+
    return pd.Series(range(len(ii)), index=ii, name="score")
35+
36+
We can now create each Series.
37+
38+
.. ipython:: python
39+
40+
chelsea = make_series([51,57,76,89])
41+
liverpool = make_series([19,28,81,83])
42+
43+
For reference, the Series corresponding to `chelsea` is
44+
45+
.. ipython:: python
46+
47+
chelsea
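
As a quick check on what ``make_series`` builds, the following standalone sketch (plain pandas, not part of the original example) repeats the construction and inspects the result. The quantities printed are chosen for illustration only.

```python
import pandas as pd


def make_series(goal_time_mins):
    # Breakpoints at kick-off, at each goal, and at full time.
    breaks = pd.to_timedelta([0] + goal_time_mins + [90], unit="min")
    # Consecutive breakpoints become intervals, right-closed by default.
    ii = pd.IntervalIndex.from_breaks(breaks)
    # The k-th interval is the period during which the team has k goals.
    return pd.Series(range(len(ii)), index=ii, name="score")


chelsea = make_series([51, 57, 76, 89])

print(chelsea.index.closed)        # "right"
print(list(chelsea))               # [0, 1, 2, 3, 4]
print(chelsea.index.length.sum())  # the intervals tile the full 90 minutes
```

Because the intervals are right-closed, a goal at the 51 minute mark changes the score for times strictly after 51 minutes.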
48+
49+
To enable analysis of each half of the game separately, we define a similar Series whose interval index gives the time span of each half.
50+
51+
.. ipython:: python
52+
53+
halves = pd.Series(
54+
["1st", "2nd"],
55+
pd.IntervalIndex.from_breaks(pd.to_timedelta([0, 45, 90], unit="min")),
56+
name="half",
57+
)
58+
halves
59+
60+
We can now perform a join on these three Series. Since the `chelsea` and `liverpool` Series share the same name, suffixes are needed to differentiate the columns in the result. The `halves` Series has a different name, but if any names overlap then a suffix must be supplied for every join operand, so `halves` is given an empty string.
61+
62+
.. ipython:: python
63+
64+
CvsL = piso.join(chelsea, liverpool, halves, suffixes=["_chelsea", "_liverpool", ""])
65+
CvsL
66+
67+
By default, the :func:`piso.join` function performs a left-join. Since every interval index represents the same domain, that is `(0', 90']`, all join types - *left*, *right*, *inner*, *outer* - will give the same result.
68+
69+
Using this dataframe we will now provide answers for miscellaneous questions. In particular we will filter the dataframe based on values in the columns, then sum the lengths of the intervals in the filtered index.
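
To make the filter-then-sum pattern concrete without requiring :mod:`piso`, the sketch below builds a comparable table by hand with pandas alone. It is an illustration only, not a description of how :func:`piso.join` is implemented; the helper ``values_on`` and the names ``pieces`` and ``by_hand`` are hypothetical.

```python
import pandas as pd


def make_series(goal_time_mins):
    breaks = pd.to_timedelta([0] + goal_time_mins + [90], unit="min")
    ii = pd.IntervalIndex.from_breaks(breaks)
    return pd.Series(range(len(ii)), index=ii, name="score")


chelsea = make_series([51, 57, 76, 89])
liverpool = make_series([19, 28, 81, 83])
halves = pd.Series(
    ["1st", "2nd"],
    index=pd.IntervalIndex.from_breaks(pd.to_timedelta([0, 45, 90], unit="min")),
    name="half",
)

# Pool every endpoint so that no operand changes value inside a piece.
endpoints = set()
for s in (chelsea, liverpool, halves):
    endpoints.update(s.index.left)
    endpoints.update(s.index.right)
pieces = pd.IntervalIndex.from_breaks(pd.TimedeltaIndex(sorted(endpoints)))


def values_on(series, pieces):
    # For each piece, find the interval of `series` containing its midpoint.
    positions = [series.index.get_loc(mid) for mid in pieces.mid]
    return series.to_numpy()[positions]


by_hand = pd.DataFrame(
    {
        "score_chelsea": values_on(chelsea, pieces),
        "score_liverpool": values_on(liverpool, pieces),
        "half": values_on(halves, pieces),
    },
    index=pieces,
)

# Filter on column values, then sum the lengths of the surviving intervals.
chelsea_lead = by_hand.query("score_chelsea > score_liverpool").index.length.sum()
liverpool_lead = by_hand.query("score_liverpool > score_chelsea").index.length.sum()
print(chelsea_lead, liverpool_lead)
```

With the goal times above, this hand-built table has ten rows, and the two printed durations are 5 minutes and 44 minutes respectively.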
70+
71+
72+
**How much game time did Chelsea lead for?**
73+
74+
.. ipython:: python
75+
76+
CvsL.query("score_chelsea > score_liverpool").index.length.sum()
77+
78+
79+
**How much game time did Liverpool lead for?**
80+
81+
.. ipython:: python
82+
83+
CvsL.query("score_liverpool > score_chelsea").index.length.sum()
84+
85+
**How much game time were the teams tied for?**
86+
87+
.. ipython:: python
88+
89+
CvsL.query("score_liverpool == score_chelsea").index.length.sum()
90+
91+
**How much game time in the first half were the teams tied for?**
92+
93+
.. ipython:: python
94+
95+
CvsL.query("score_chelsea == score_liverpool and half == '1st'").index.length.sum()
96+
97+
**For how long did Liverpool lead Chelsea by exactly one goal (split by half)?**
98+
99+
.. ipython:: python
100+
101+
CvsL.groupby("half").apply(
102+
lambda df: df.query("score_liverpool - score_chelsea == 1").index.length.sum()
103+
)
104+
105+
**What was the score at the 80 minute mark?**
106+
107+
.. ipython:: python
108+
109+
piso.lookup(CvsL, pd.Timedelta(80, unit="min"))
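
For a single Series, a comparable point lookup can be sketched with pandas alone by locating the interval that contains the time of interest. This is an illustration under that assumption and says nothing about the internals of :func:`piso.lookup`.

```python
import pandas as pd

# Chelsea's score over time, as right-closed intervals between goal times.
breaks = pd.to_timedelta([0, 51, 57, 76, 89, 90], unit="min")
chelsea = pd.Series(range(5), index=pd.IntervalIndex.from_breaks(breaks), name="score")

t = pd.Timedelta(80, unit="min")
position = chelsea.index.get_loc(t)  # position of the interval containing t
score_at_80 = chelsea.iloc[position]
print(score_at_80)  # 3

# Right-closed intervals: exactly at 76 minutes the earlier score still applies.
score_at_76 = chelsea.iloc[chelsea.index.get_loc(pd.Timedelta(76, unit="min"))]
print(score_at_76)  # 2
```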
110+
111+
112+
This analysis is also straightforward using :mod:`staircase`. For more information, please see the :ref:`corresponding example with staircase <user_guide.football_staircase_example>`.
Lines changed: 128 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,128 @@
1+
.. _user_guide.football_staircase_example:
2+
3+
4+
Analysis of scores in a football match (using staircase)
5+
===========================================================
6+
7+
.. ipython:: python
8+
:suppress:
9+
10+
import matplotlib.pyplot as plt
11+
import matplotlib.ticker as ticker
12+
plt.style.use('seaborn')
13+
14+
This example demonstrates how :mod:`staircase` can be used to mirror the functionality
15+
and analysis presented in the :ref:`corresponding example with piso <user_guide.football_example>`.
16+
17+
The Champions League quarter-final between Chelsea and Liverpool
18+
in 2009 is recognised as among the best games of all time.
19+
Liverpool scored twice in the first half in the 19th and 28th minute.
20+
Chelsea then opened their account in the second half with three
21+
unanswered goals in the 51st, 57th and 76th minute. Liverpool
22+
responded with two goals in the 81st and 83rd minute to put themselves
23+
ahead; however, Chelsea drew level with a goal in the 89th minute and advanced
24+
to the next stage on aggregate.
25+
26+
27+
We start by importing :mod:`pandas` and :mod:`staircase`
28+
29+
.. ipython:: python
30+
31+
import pandas as pd
32+
import staircase as sc
33+
34+
35+
For the analysis we will create a :class:`staircase.Stairs` for each team, and wrap them up in a :class:`pandas.Series` which is indexed by the club names. Using a Series in this way is by no means necessary but can be useful. We'll create a function `make_stairs` which takes the minute marks of the goals and returns a :class:`staircase.Stairs`. Each step function will be monotonically non-decreasing.
36+
37+
.. ipython:: python
38+
39+
def make_stairs(goal_time_mins):
40+
    breaks = pd.to_timedelta(goal_time_mins, unit="min")
41+
    return sc.Stairs(start=breaks).clip(pd.Timedelta(0), pd.Timedelta("90m"))
42+
43+
scores = pd.Series(
44+
{
45+
"chelsea":make_stairs([51,57,76,89]),
46+
"liverpool":make_stairs([19,28,81,83]),
47+
}
48+
)
49+
scores
50+
51+
52+
To clarify, we plot these step functions below.
53+
54+
.. ipython:: python
55+
:suppress:
56+
57+
fig, axes = plt.subplots(ncols=2, figsize=(8,3), sharey=True)
58+
vals = scores["chelsea"].step_values
59+
vals.index = vals.index/pd.Timedelta("1min")
60+
sc.Stairs.from_values(0, vals).plot(axes[0])
61+
axes[0].set_title("Chelsea")
62+
axes[0].set_xlabel("time (mins)")
63+
axes[0].set_ylabel("score")
64+
axes[0].yaxis.set_major_locator(ticker.MultipleLocator())
65+
axes[0].set_xlim(0,90)
66+
vals = scores["liverpool"].step_values
67+
vals.index = vals.index/pd.Timedelta("1min")
68+
sc.Stairs.from_values(0, vals).plot(axes[1])
69+
axes[1].set_title("Liverpool")
70+
axes[1].set_xlabel("time (mins)")
71+
axes[1].set_ylabel("score")
72+
@savefig case_study_football_staircase.png
73+
plt.tight_layout();
74+
75+
76+
To enable analysis of each half of the game separately, we define a similar Series which gives the time interval for each half as a tuple of :class:`pandas.Timedelta` objects.
77+
78+
.. ipython:: python
79+
80+
halves = pd.Series(
81+
{
82+
"1st":(pd.Timedelta(0), pd.Timedelta("45m")),
83+
"2nd":(pd.Timedelta("45m"), pd.Timedelta("90m")),
84+
}
85+
)
86+
halves
87+
88+
89+
We can now use our *scores* and *halves* Series to provide answers for miscellaneous questions. Note that comparing :class:`staircase.Stairs` objects with relational operators produces boolean-valued step functions (Stairs objects). Finding the integral of these boolean step functions is equivalent to summing up lengths of intervals in the domain where the step function is equal to one.
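
The equivalence between integrating a boolean step function and summing interval lengths can be checked by hand. The following sketch uses only numpy and pandas, with minutes as plain integers; it illustrates the idea and does not reflect how :mod:`staircase` computes integrals.

```python
import numpy as np
import pandas as pd

chelsea_goals = np.array([51, 57, 76, 89])
liverpool_goals = np.array([19, 28, 81, 83])

# Between consecutive breakpoints both scores are constant.
breaks = np.unique(np.concatenate([[0, 90], chelsea_goals, liverpool_goals]))
lengths = np.diff(breaks)
mids = breaks[:-1] + lengths / 2

# Score on each piece = number of goals scored before its midpoint.
chelsea_score = np.searchsorted(chelsea_goals, mids)
liverpool_score = np.searchsorted(liverpool_goals, mids)

# Boolean step function (as 0/1) saying where Chelsea lead.
chelsea_leads = (chelsea_score > liverpool_score).astype(int)

# Integral of a 0/1 step function = total length of the pieces where it is 1.
integral_minutes = int((chelsea_leads * lengths).sum())
lead_time = pd.Timedelta(minutes=integral_minutes)
print(lead_time)
```

Here the only piece on which Chelsea lead lies between the 76 and 81 minute marks, so the integral is 5 minutes.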
90+
91+
**How much game time did Chelsea lead for?**
92+
93+
.. ipython:: python
94+
95+
(scores["chelsea"] > scores["liverpool"]).integral()
96+
97+
98+
**How much game time did Liverpool lead for?**
99+
100+
.. ipython:: python
101+
102+
(scores["chelsea"] < scores["liverpool"]).integral()
103+
104+
**How much game time were the teams tied for?**
105+
106+
.. ipython:: python
107+
108+
(scores["chelsea"] == scores["liverpool"]).integral()
109+
110+
**How much game time in the first half were the teams tied for?**
111+
112+
.. ipython:: python
113+
114+
(scores["chelsea"] == scores["liverpool"]).where(halves["1st"]).integral()
115+
116+
**For how long did Liverpool lead Chelsea by exactly one goal (split by half)?**
117+
118+
.. ipython:: python
119+
120+
halves.apply(lambda x:
121+
(scores["liverpool"]==scores["chelsea"]+1).where(x).integral()
122+
)
123+
124+
**What was the score at the 80 minute mark?**
125+
126+
.. ipython:: python
127+
128+
sc.sample(scores, pd.Timedelta("80m"))
