-
Notifications
You must be signed in to change notification settings - Fork 4
affirm_dupe_free updates
#46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Important @shannonpileggi, please review this update to Per discussion in #42, I've modified Key Files Changed:
Implementation Notes
ValidationExamples
|
shannonpileggi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hey @Meghansaha! this is looking better. however, it seems that the columns argument is used inconsistently across functions .
affirm_true, columns = represents "string of column names used in affirmation"
- it isn't super clear to me what this means and we dont have a lot of documentation/testing on this, would be good to flesh out
affirm_dupe_free, columns = represents "columns to check duplicates among"
affirm_class, columns = represents "columns to check class"
it is probably best to discuss
- what should be expected behavior across functions
- what can/should go in this PR, vs go in another PR
Example 1
df_test <- mtcars |>
tibble::rownames_to_column() |>
dplyr::arrange(cyl, hp)
options(
'affirm.id_cols' = c(
"rowname", "mpg"
)
)
affirm_init(replace = TRUE)
df_test |>
affirm_dupe_free(
label = "No duplicates in the number of cylinders & hp",
columns = c(cyl,hp),
id = 1,
data_frames = "mtcars"
)
df_test |>
affirm_true(
label = "mpg <= 30",
condition = mpg <= 30,
id = 2,
data_frames = "mtcars",
# these do not work
#columns = c("rowname", "mpg"),
#columns = everything()
)
affirm_report_excel("test.xlsx")
Affirmation 1 (affirm_dupe_free) - checks cyl & hp for duplicates; ignores affirm.id_cols in output creation; outputs flag_duplicate which is inconsistent with affirmation 2 (no flag output)
Affirmation 2 (affirm_true) - columns argument doesn't work; affirm.id_cols respected; specification of affirm.id_cols differs from columns specification in affirm_dupe_free (bare vs quoted variable names)
Example 2
affirm_init(replace = TRUE)
affirm_class(
dplyr::as_tibble(iris),
label = "all cols are numeric (but Species really isn't)",
columns = everything(),
data_frames = "iris",
id = 1,
class = "numeric"
)
affirm_report_excel("test_iris.xlsx")
affirm_class() behaves more like affirm_dupe_free(). all columns are tested but not output.
| .Ruserdata | ||
| docs | ||
| inst/doc | ||
| affirm.Rproj |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did this get added here? this is something we push
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had updated RStudio around the time I was working on this. That update added a projectID field to the .Rproj file that our current .gitignore isn't catching:
Lines 1 to 6 in 7b328b2
| .Rproj.user | |
| .Rhistory | |
| .RData | |
| .Ruserdata | |
| docs | |
| inst/doc |
Should we commit this projectID or update the gitignore to exclude .Rproj files entirely? Tried to find some info on this, : rstudio/rstudio#15524
But still fuzzy on what the implications of projectID are.
Let's discuss.



What changes are proposed in this pull request?
Updated
affirm_no_dupes()to addrecord_idandflag_duplicatecolumns to the output, providing better visibility into duplicate detection logic.If there is an GitHub issue associated with this pull request, please provide link.
Closes #42
Reviewer Checklist (if item does not apply, mark is as complete)
renv::install()_pkgdown.ymlpkgdown::build_site(). Check the R console for errors, and review the rendered website.withr::with_envvar(new = c("NOT_CRAN" = "true"), covr::report()). Begin in a fresh R session without any packages loaded.usethis::use_spell_check()runs with no spelling errors in documentationWhen the branch is ready to be merged into master:
NEWS.mdwith the changes from this pull request under the heading "# pcctc (development version)". If there is an issue associated with the pull request, reference it in parentheses at the end update (seeNEWS.mdfor examples).usethis::use_version(which = "dev")usethis::use_spell_check()againpkgdown::deploy_to_branch()to refresh website to latest version.