-
-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding New Rootogram Plot #81
base: main
Are you sure you want to change the base?
Conversation
# use get_bins func from arviz-stats on observed data and then use those bins for | ||
# computing histograms for predictive data as well | ||
# WIP: currently only the bins for one variable (without any facetting) is retrieved and used | ||
bins = array_stats.get_bins(obs_distribution["home_points"].values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reminder for later: Use xarray aware get_bins
function once available from Arviz-Stats
Todo: Shift rug lower, below the bottom of the bars |
Rugplots now render below the bottom of the rootogram bars. To allow for some tolerance gap for visibility, I've currently set this algorithm:
(It may be that picking a specific number to subtract from min_histogram is better as that'd keep this gap a constant number of units, but then the chart size may vary and so would the perceivable gap) Output looks like this now when the rugplot is plotted: |
pc_kwargs["aes"] = pc_kwargs.get("aes", {}).copy() | ||
pc_kwargs["aes"].setdefault("overlay", sample_dims) # setting overlay dim |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be removed, we only want facetting in rootgram
observed=None, | ||
observed_rug=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would remove these two. Not including the observed data doesn't really make sense for the rootgram. We could skip the y shift in such cases but then the resulting plot would be a histogram directly, if users want a "regular" histogram they should use plot_dist
.
Moreover, as you might have noticed, the rug plot doesn't really give much information because the data is discrete, so there are many lines overlapping at 1, many lines overlapping at 2... so the rugplot doesn't really show how many observations do we have at 1 or at 2, it seems each has only a single observation. And playing with alpha can only help so much, at most you'd get a qualitative information about the number of variables at each value which is less info than the observed line/scatter provide. So I would remove the argument and the rugplot.
for group_name in (predictive_data_group, "observed_data"): | ||
if group_name not in dt.children: | ||
raise TypeError(f'`data` argument must have the group "{group_name}" for ppcplot') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After removing observed
just keep this outside the if.
# new_obs_hist with histogram->y and left_edge/right_edge midpoint->x | ||
new_obs_hist = xr.Dataset() | ||
|
||
for var_name in list(obs_hist.keys()): | ||
left_edges = obs_hist[var_name].sel(plot_axis="left_edges").values | ||
right_edges = obs_hist[var_name].sel(plot_axis="right_edges").values | ||
|
||
left_edges = np.array(left_edges) | ||
right_edges = np.array(right_edges) | ||
|
||
x = (left_edges + right_edges) / 2 | ||
y = obs_hist[var_name].sel(plot_axis="histogram").values | ||
|
||
stacked_data = np.stack((x, y), axis=-1) | ||
new_var = xr.DataArray( | ||
stacked_data, dims=["hist_dim", "plot_axis"], coords={"plot_axis": ["x", "y"]} | ||
) | ||
|
||
new_obs_hist[var_name] = new_var |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# new_obs_hist with histogram->y and left_edge/right_edge midpoint->x | |
new_obs_hist = xr.Dataset() | |
for var_name in list(obs_hist.keys()): | |
left_edges = obs_hist[var_name].sel(plot_axis="left_edges").values | |
right_edges = obs_hist[var_name].sel(plot_axis="right_edges").values | |
left_edges = np.array(left_edges) | |
right_edges = np.array(right_edges) | |
x = (left_edges + right_edges) / 2 | |
y = obs_hist[var_name].sel(plot_axis="histogram").values | |
stacked_data = np.stack((x, y), axis=-1) | |
new_var = xr.DataArray( | |
stacked_data, dims=["hist_dim", "plot_axis"], coords={"plot_axis": ["x", "y"]} | |
) | |
new_obs_hist[var_name] = new_var | |
new_obs_hist = xr.concat( | |
(ds.sel(plot_axis=["right_edges", "left_edges"]).sum("plot_axis") / 2, obs_hist.sel(plot_axis="histogram", drop=True)), | |
dim="plot_axis", | |
).assign_coords(plot_axis=["x", "y"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also note that the code in the PR would only have worked when all variables get the same number of bins (which doesn't always happen) because the dimension name is hardcoded to hist_dim
in all variables of the dataset. As I mentioned in slack, the behaviour has changed in the get_bins
PR to support this effect. Now each variable gets independent dimensions hist_dim_mu
, hist_dim_tau
...
# new_pp_hist dataset | ||
new_pp_hist = xr.Dataset() | ||
|
||
for var_name in list(pp_hist.keys()): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also update and get rid of the loop
…ooping logic and removed 'none' backend hist element duplication
Tests are failing due to |
…dded tests for plot_rootogram
Added tests for |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #81 +/- ##
==========================================
+ Coverage 84.80% 85.39% +0.59%
==========================================
Files 21 22 +1
Lines 2336 2451 +115
==========================================
+ Hits 1981 2093 +112
- Misses 355 358 +3 ☔ View full report in Codecov by Sentry. |
An example of a "hanging" rootogram and a "suspended" rootogram, including the uncertainty for the predictions |
If we follow those examples, the
And instead pass a
For the uncertainty representation of 94% HDI, what would the process be? |
Pushed draft of the new Rootogram implementation code (#52 ) .
The existing hist functions from the currently still-open plot_dist hist addition PR [WIP] Histogram support addition to distplot.py #47 were used here as well as some code from the plot_ppc PR [WIP] Adding PPC plot to Arviz-Plots #55.
The hist visual element backend interface docstring was modified with 'y' now referring to the 'top' y-coordinate explicitly and not height of the bars (as suggested by @OriolAbril). The matplotlib implementation of the hist visual element (which uses
bar
behind the scenes) was also updated internally withy=y-bottom
to reflect this. (This has to be updated in [WIP] Histogram support addition to distplot.py #47 now as well). The bars were not being plotted in the expected positions otherwise.WIP work includes appropriate binning for each facetted subset of the data to be plotted.
The observed data is used for binning for each subset and the bin heights here are required for setting the 'top's of the predictive bars so this is computed always, regardless of whether
observed
is passed as True or False (The True/False condition only determines whether it is plotted or not)Current plot output when the
rugby
default Arviz datatree is passed (should work with any datatree with posterior predictive and observed groups like plot_ppc though):azp.plot_rootogram(data)
azp.plot_rootogram(data, plot_kwargs={"predictive": False})
azp.plot_rootogram(data, observed=False)
azp.plot_rootogram(data, observed_rug=True)
^If we want a rugplot for rootograms as well like plot_ppc.
📚 Documentation preview 📚: https://arviz-plots--81.org.readthedocs.build/en/81/