Pandas-friendly column names #443

remrama · 2024-10-07T20:08:52Z

This PR addresses #208 (part of the v0.6.0 Roadmap layed out in #279). It removes characters that restrict column access to the bracket format (df["p_val"]) rather than dot(?) method (df.p_val). It includes the following column-name changes:

Replaced dashes with underscores in column names (e.g., p-val --> p_val)
Replaced parentheses with underscores in column names (e.g., mean(A) --> mean_A)
Removed percent signs and brackets in column names (e.g., CI95% --> CI95, CI[97.5%] --> CI97.5)

Here is a specific list of most (if not all) of the changes:

p-adjust --> p_adjust
p-approx --> p_approx
p-corr --> p_corr
p-exact --> p_exact
p-mid --> p_mid
p-spher --> p_sphere
p-tukey --> p_tukey
p-unc --> p_unc
p-val --> p_val
U-val --> U_val
W-spher --> W_spher
W-val --> W_val
p-GG-corr --> p_GG_corr
cohen-d --> cohen_d
eta-square --> eta_square
odds-ratio --> odds_ratio
CI95% --> CI95
CI90% --> CI90
CI[2.5%] --> CI2.5
CI[97.5%] --> CI97.5
mean(A) --> mean_A
mean(B) --> mean_B
std(A) --> std_A
std(B) --> std_B

Future column-name considerations

In doing this, I noted some other column-name considerations for future PRs that feel less immediate:

Standardizing the use of "val". Modules differentially name p-value columns as p, pval, or p_val. This should be standardized across modules.
Related to (1), the norm for p-values might correspond with other "values", like the current column names T, F, W, r, etc. I think optimally this would either be pval and Wval/wval or p and W/w. Perhaps an argument for keeping "val" is that .T blocks the pandas transpose method.
Standardize some effect size namings across modules. Cohen's d is differentially referred to as cohen or cohen_d, and eta-squared is differentially referred to as eta_square, eta-squared, or n2.
Standardize the degrees of freedom column names. These are differentially named DF, dof, ddof.
Standardize abbreviation capitalizations. Bayes factors are BF, confidence intervals are CI, Wilcoxon stats are W, then there's dof and r.
Standardize Sentence case vs camelCase vs snake_case. Column capitalization, like Parametric vs alternative are inconsistent. I think snake_case is the pandas norm, but consistency seems most important.
I think including the numerical values in confidence interval column names is not necessary. Eg, CI95 could just be CI, and, more importantly, CI97.5 and CI2.5 could just be CI_upper and CI_lower, respectively.

Notes

The jupyter notebooks should run fine with the new code, but I did not re-run them, so the output of those tutorials still have the old column names. Let me know if you want me to rerun them before merging.
I updated the docstrings as needed, but did not build them locally to test. Let me know if you want me to build and inspect before merging.

raphaelvallat · 2024-10-08T06:23:05Z

@remrama this is absolutely fantastic. Thank you for doing such a thorough investigation. My thoughts on what you described in the "Future column-name considerations":

Overall, I think we should merge the current PR and then implement most, if not all of the changes that you describe, and then release a new 0.6.0 version. The other items in the roadmap feel less important and could be included as part of the 6.x minor version updates.
Agreed for:
- using *val throughout
- use cohen_d and eta_squared (and replace np2 with partial_eta_squared?)
- use ddof to be consistent with pandas and scipy
- use snake_case throughout
- use ci, ci_upper, ci_lower throughout
- perhaps even worth adding a new documentation page with Pingouin's standard naming conventions...?

Laslty, yes to both your questions in the "Notes" section. :) I'll do a final review once you have re-generated the notebook. I had a brief look and I think it's good to go. I'm assuming you just did a search and replace of the relevant names in VSCode or similar editor, right?

remrama · 2024-10-08T14:03:04Z

Great, thanks @raphaelvallat. I think the current PR is now ready for a quick review. I updated the notebooks and tested the docs locally (passed build and visual inspection). I'm a bit confused as to why the Python tests aren't running after a PR (Lint is, but not others). I ran tests locally and they passed, but I can't speak for alternate systems and Python versions.

And yes, correct, I used search/replace within VSCode. It was mostly iterations of regex to find all the things that needed replacing, and then more regex to simplify replacements. There were only a few snags, like the CI column namings and catching all the docstrings.

Happy to push for another PR that adds to the column-name conventions after this one :)

raphaelvallat · 2024-10-08T15:21:48Z

Fix the Github Action CI:

Replace

name: Python tests

on:
  push:
    branches: [master, develop]
  pull_request:
    branches: [master, develop]

with:

name: Python tests

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

(Yes, I should have caught this before... 😬 )

raphaelvallat · 2024-10-08T15:23:10Z

I'll wait for the unit tests and then do a final review and approval. Hopefully we don't get failing unit tests from previous PRs 🤦 Feel free to re-enable the CI in a separate PR, which we'll merge before this one

* GH Action on main branch instead of master and develop * bump actions/upload-artifact@v2 to v4 * install doc requirements from .toml * bump actions/checkout@v2 --> v4 * bump actions/setup-python@v1 --> v5 * move pip-install-docs back to only run during docs build * typo 3.8 --> 3.9 * bump codecov/codecov-action@v1 --> v4 * remove platform specification from docs-artifact

…into remrama/issue208

codecov · 2024-10-09T14:07:19Z

Codecov Report

Attention: Patch coverage is 98.38710% with 1 line in your changes missing coverage. Please review.

Please upload report for BASE (main@c93ec4b). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
src/pingouin/correlation.py	80.00%	1 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #443   +/-   ##
=======================================
  Coverage        ?   98.54%           
=======================================
  Files           ?       19           
  Lines           ?     3360           
  Branches        ?      492           
=======================================
  Hits            ?     3311           
  Misses          ?       26           
  Partials        ?       23

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

remrama · 2024-10-09T14:14:05Z

@raphaelvallat all checks passed here, you should be good to review/merge. We should be good to go for the immediate column-name fixes, then we can set up a separate PR for more convention-specific namings.

I had a last-minute thought about this PR: It really doesn't impact at all how the user interfaces with Pingouin (only the output). But there is one exception with the compute_effsize function, where a few parameters won't work anymore. Passing in eta-square and odds-ratio now will not work, user needs to pass eta_square or odds_ratio. Maybe just leave it, but I was thinking we could add a simple "-".replace("_") and a FutureWarning. The only reason these were changed is because they end up as column names in some functions, so the other option would be to just name the columns something like effsize and not carry the actual stat name over.

raphaelvallat

Thank you so much @remrama , approved!

I had a last-minute thought about this PR: It really doesn't impact at all how the user interfaces with Pingouin (only the output). But there is one exception with the compute_effsize function, where a few parameters won't work anymore. Passing in eta-square and odds-ratio now will not work, user needs to pass eta_square or odds_ratio. Maybe just leave it, but I was thinking we could add a simple "-".replace("_") and a FutureWarning. The only reason these were changed is because they end up as column names in some functions, so the other option would be to just name the columns something like effsize and not carry the actual stat name over.

I think let's not worry about the FutureWarning and replace. We should just leverage this major update to do a clean slate.

remrama added 6 commits October 6, 2024 18:18

CI95% --> CI95

fe1e564

p-* --> p_* (also U-* and W-*)

b930822

CI[97.5%] --> CI97.5

1c8694e

mean(A) --> mean_A (also std)

88cafad

T-test --> T_test (index)

c6f0b26

effect sizes

ef8b121

raphaelvallat self-requested a review October 8, 2024 06:13

raphaelvallat mentioned this pull request Oct 8, 2024

Roadmap for release 0.6.0 #279

Open

11 tasks

updated notebooks

5621253

remrama mentioned this pull request Oct 8, 2024

Fix the Github Action CI for Python tests #445

Merged

raphaelvallat assigned remrama Oct 8, 2024

raphaelvallat added the IMPORTANT❗ label Oct 8, 2024

remrama added 9 commits October 9, 2024 09:05

CI95% --> CI95

2e6a48c

p-* --> p_* (also U-* and W-*)

b244732

CI[97.5%] --> CI97.5

5c56bff

mean(A) --> mean_A (also std)

fffab87

T-test --> T_test (index)

d55a3db

effect sizes

8416be5

updated notebooks

6a88cd1

Merge branch 'remrama/issue208' of https://github.com/remrama/pingouin …

df3b65b

…into remrama/issue208

raphaelvallat approved these changes Oct 11, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pandas-friendly column names #443

Pandas-friendly column names #443

remrama commented Oct 7, 2024 •

edited

Loading

raphaelvallat commented Oct 8, 2024 •

edited

Loading

remrama commented Oct 8, 2024 •

edited

Loading

raphaelvallat commented Oct 8, 2024 •

edited

Loading

raphaelvallat commented Oct 8, 2024

codecov bot commented Oct 9, 2024

remrama commented Oct 9, 2024

raphaelvallat left a comment

Pandas-friendly column names #443

Are you sure you want to change the base?

Pandas-friendly column names #443

Conversation

remrama commented Oct 7, 2024 • edited Loading

Future column-name considerations

Notes

raphaelvallat commented Oct 8, 2024 • edited Loading

remrama commented Oct 8, 2024 • edited Loading

raphaelvallat commented Oct 8, 2024 • edited Loading

raphaelvallat commented Oct 8, 2024

codecov bot commented Oct 9, 2024

Codecov Report

remrama commented Oct 9, 2024

raphaelvallat left a comment

Choose a reason for hiding this comment

remrama commented Oct 7, 2024 •

edited

Loading

raphaelvallat commented Oct 8, 2024 •

edited

Loading

remrama commented Oct 8, 2024 •

edited

Loading

raphaelvallat commented Oct 8, 2024 •

edited

Loading