Pandas-friendly column names #443
@remrama this is absolutely fantastic. Thank you for doing such a thorough investigation. My thoughts on what you described in the "Future column-name considerations":
Lastly, yes to both your questions in the "Notes" section. :) I'll do a final review once you have re-generated the notebook. I had a brief look and I think it's good to go. I'm assuming you just did a search and replace of the relevant names in VSCode or a similar editor, right?
Great, thanks @raphaelvallat. I think the current PR is now ready for a quick review. I updated the notebooks and tested the docs locally (passed build and visual inspection). I'm a bit confused as to why the Python tests aren't running after a PR (Lint is, but not others). I ran tests locally and they passed, but I can't speak for alternate systems and Python versions. And yes, correct, I used search/replace within VSCode. It was mostly iterations of regex to find all the things that needed replacing, and then more regex to simplify replacements. There were only a few snags, like the CI column namings and catching all the docstrings. Happy to push for another PR that adds to the column-name conventions after this one :)
Fix the GitHub Action CI. Replace:

    name: Python tests
    on:
      push:
        branches: [master, develop]
      pull_request:
        branches: [master, develop]

with:

    name: Python tests
    on:
      push:
        branches: [main]
      pull_request:
        branches: [main]

(Yes, I should have caught this before... 😬)
I'll wait for the unit tests and then do a final review and approval. Hopefully we don't get failing unit tests from previous PRs 🤦 Feel free to re-enable the CI in a separate PR, which we'll merge before this one.
* GH Action on main branch instead of master and develop
* bump actions/upload-artifact@v2 to v4
* install doc requirements from .toml
* bump actions/checkout@v2 --> v4
* bump actions/setup-python@v1 --> v5
* move pip-install-docs back to only run during docs build
* typo 3.8 --> 3.9
* bump codecov/codecov-action@v1 --> v4
* remove platform specification from docs-artifact
…into remrama/issue208
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #443 +/- ##
=======================================
Coverage ? 98.54%
=======================================
Files ? 19
Lines ? 3360
Branches ? 492
=======================================
Hits ? 3311
Misses ? 26
Partials ? 23
☔ View full report in Codecov by Sentry.
@raphaelvallat all checks passed here, you should be good to review/merge. We should be good to go for the immediate column-name fixes, then we can set up a separate PR for more convention-specific namings. I had a last-minute thought about this PR: It really doesn't impact at all how the user interfaces with Pingouin (only the output). But there is one exception with the |
Thank you so much @remrama, approved!
I had a last-minute thought about this PR: It really doesn't impact at all how the user interfaces with Pingouin (only the output). But there is one exception with the compute_effsize function, where a few parameters won't work anymore. Passing in eta-square and odds-ratio will no longer work; the user needs to pass eta_square or odds_ratio. Maybe just leave it, but I was thinking we could add a simple .replace("-", "_") and a FutureWarning. The only reason these were changed is because they end up as column names in some functions, so the other option would be to just name the columns something like effsize and not carry the actual stat name over.
I think let's not worry about the FutureWarning and replace. We should just leverage this major update to do a clean slate.
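For reference, the option that was discussed and then set aside (accept the old hyphenated names, convert them, and emit a FutureWarning) could look roughly like the sketch below. The helper name normalize_eftype is hypothetical and not part of Pingouin; nothing in this thread suggests compute_effsize does this.

```python
import warnings


def normalize_eftype(eftype: str) -> str:
    """Hypothetical helper (not in Pingouin): map legacy hyphenated
    effect-size names such as "eta-square" to the underscore spelling."""
    if "-" in eftype:
        new_name = eftype.replace("-", "_")
        # Tell the caller the old spelling still works for now but will go away.
        warnings.warn(
            f"'{eftype}' is deprecated, use '{new_name}' instead.",
            FutureWarning,
            stacklevel=2,
        )
        return new_name
    # Already in the new style (or has no hyphen): return unchanged.
    return eftype
```

The trade-off is the one named in the thread: a shim like this keeps old calls working for a release or two, whereas dropping it gives the clean break the maintainers preferred for a major update.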
This PR addresses #208 (part of the v0.6.0 Roadmap laid out in #279). It removes characters that restrict column access to the bracket format (df["p_val"]) rather than dot(?) method (df.p_val). It includes the following column-name changes:
* p-val --> p_val
* mean(A) --> mean_A
* CI95% --> CI95, CI[97.5%] --> CI97.5
Here is a specific list of most (if not all) of the changes:
p-adjust --> p_adjust
p-approx --> p_approx
p-corr --> p_corr
p-exact --> p_exact
p-mid --> p_mid
p-spher --> p_sphere
p-tukey --> p_tukey
p-unc --> p_unc
p-val --> p_val
U-val --> U_val
W-spher --> W_spher
W-val --> W_val
p-GG-corr --> p_GG_corr
cohen-d --> cohen_d
eta-square --> eta_square
odds-ratio --> odds_ratio
CI95% --> CI95
CI90% --> CI90
CI[2.5%] --> CI2.5
CI[97.5%] --> CI97.5
mean(A) --> mean_A
mean(B) --> mean_B
std(A) --> std_A
std(B) --> std_B
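As a rough illustration of what these renames change for the user, the sketch below uses plain pandas with a made-up one-row table, not actual Pingouin output. It applies a handful of the renames listed above and checks which names qualify for dot access. pandas only exposes a column as an attribute when its name is a valid Python identifier and does not collide with an existing DataFrame attribute.

```python
import pandas as pd

# A few of the renames from the list above, as an old -> new mapping.
RENAMES = {
    "p-val": "p_val",
    "cohen-d": "cohen_d",
    "CI95%": "CI95",
    "CI[97.5%]": "CI97.5",
    "mean(A)": "mean_A",
}

# Made-up numbers; only the column names matter for this illustration.
old = pd.DataFrame(
    [[2.1, 0.04, 0.5, 0.1, 0.9, 3.2]],
    columns=["T", "p-val", "cohen-d", "CI95%", "CI[97.5%]", "mean(A)"],
)
new = old.rename(columns=RENAMES)

# Which of the new names are valid Python identifiers?
is_identifier = {name: name.isidentifier() for name in new.columns}

# Underscore names now work with dot access.
p_value = new.p_val.iloc[0]

# "CI97.5" still contains a dot, so it remains bracket-only.
upper = new["CI97.5"].iloc[0]

# "T" is a valid identifier, but DataFrame.T is the transpose,
# so dot access does not return the T column.
transposed = new.T
```

Two names in this small example stay bracket-only after the rename: CI97.5 because of the dot, and T because the attribute is already taken by the transpose.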
Future column-name considerations
In doing this, I noted some other column-name considerations for future PRs that feel less immediate:
* p, pval, or p_val. This should be standardized across modules.
* T, F, W, r, etc. I think optimally this would either be pval and Wval/wval or p and W/w. Perhaps an argument for keeping "val" is that .T blocks the pandas transpose method.
* cohen or cohen_d, and eta-squared is differentially referred to as eta_square, eta-squared, or n2.
* DF, dof, ddof.
* BF, confidence intervals are CI, Wilcoxon stats are W, then there's dof and r.
* Parametric vs alternative are inconsistent. I think snake_case is the pandas norm, but consistency seems most important.
* CI95 could just be CI, and, more importantly, CI97.5 and CI2.5 could just be CI_upper and CI_lower, respectively.
Notes