-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature request: Add +
for rulesets
#9
Conversation
Hi Ben, thanks for the PR (and on a related note congrats to the sucess of zoomerjoin <3). I like the idea of having an option to combine rulesets. I am at the moment unsure whether I would want to overload the Also, I would like to have a couple of tests that cover some use-cases. Best, |
I very much like the suggestion to add a
Here's a snippet showing off this new behavior. rule_1 <- rule(mpg > 10 & mpg < 30) # mpg goes up to 34
rule_2 <- rule(cyl %in% c(4, 8)) # missing 6 cyl
rule_3 <- rule(vs %in% c(0, 1), allow_na = TRUE)
# combine rules with rules
rule_1 + rule_2
#> <Verification Ruleset with 2 elements>
#> [1] 'Rule for: mpg' matching `mpg > 10 & mpg < 30` (allow_na: FALSE)
#> [2] 'Rule for: cyl' matching `cyl %in% c(4, 8)` (allow_na: FALSE)
# combine rulesets with rules
ruleset(rule_1, rule_2) + rule_3
#> <Verification Ruleset with 3 elements>
#> [1] 'Rule for: mpg' matching `mpg > 10 & mpg < 30` (allow_na: FALSE)
#> [2] 'Rule for: cyl' matching `cyl %in% c(4, 8)` (allow_na: FALSE)
#> [3] 'Rule for: vs' matching `vs %in% c(0, 1)` (allow_na: TRUE)
# combine rules with rulesets
rule_1 + ruleset(rule_2, rule_3)
#> <Verification Ruleset with 3 elements>
#> [1] 'Rule for: mpg' matching `mpg > 10 & mpg < 30` (allow_na: FALSE)
#> [2] 'Rule for: cyl' matching `cyl %in% c(4, 8)` (allow_na: FALSE)
#> [3] 'Rule for: vs' matching `vs %in% c(0, 1)` (allow_na: TRUE)
# rulesets with rulesets
ruleset(rule_1) + ruleset(rule_2, rule_3)
#> <Verification Ruleset with 3 elements>
#> [1] 'Rule for: mpg' matching `mpg > 10 & mpg < 30` (allow_na: FALSE)
#> [2] 'Rule for: cyl' matching `cyl %in% c(4, 8)` (allow_na: FALSE)
#> [3] 'Rule for: vs' matching `vs %in% c(0, 1)` (allow_na: TRUE)
# combine a list of rulsets and rules with `bind_rules`
bind_rules(list(rule_1, rule_2, ruleset(rule_3)))
#> <Verification Ruleset with 3 elements>
#> [1] 'Rule for: mpg' matching `mpg > 10 & mpg < 30` (allow_na: FALSE)
#> [2] 'Rule for: cyl' matching `cyl %in% c(4, 8)` (allow_na: FALSE)
#> [3] 'Rule for: vs' matching `vs %in% c(0, 1)` (allow_na: TRUE) Created on 2023-10-16 with reprex v2.0.2 If you think this is a promising direction, I would love to spruce up the documentation, and can add some examples. I've also gone ahead and added some tests to verify that the |
R/ruleset_construction.R
Outdated
# [double dispatch](https://yutani.rbind.io/post/double-dispatch-of-s3-method/) | ||
# semantics. | ||
|
||
if (class(a) == "ruleset" & class(b)=="ruleset") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about something like this?
if (class(a) != "ruleset") a <- list(a)
if (class(b) != "ruleset") b <- list(b)
out <- list(a, b)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't work in the case that both a and b are rules, but I like the suggestion - this could have been more concise.
This should work:
if (class(a) == "rule" & class(b) == "rule") {
out <- list(a,b)
} else {
if (class(a) == "rule") a <- list(a)
if (class(b) == "rule") b <- list(b)
out <- c(a, b)
}
rule_1 <- rule(mpg >10) | ||
rule_2 <- rule(hp >10) | ||
expect_equal( | ||
rule_1 + rule_2, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you reindent the lines? (If you are using RStudio: CTRL + A then CTRL + I)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing! Assuming we want what's inside the tests indented another time.
I have added two comments to your code. I like where this is going, the only thing that I think is missing are duplications in rules. |
Just made both of those changes - thanks for taking a look at them. How to handle duplicates is a good question. I think that keeping or removing duplicates makes sense as long as the documentation is very explicit about it. I have just made a change to remove duplicates when adding two sets together. This makes the behavior consistent with what a user might expect when taking the union of two sets - duplicates will automatically be removed, because sets don't contain duplicates. |
Looks good to me. I'll run the tests and then merge the PR. |
There is one error in the automated tests |
Thanks for flagging this. I think the github runner is set to be more strict than the one on my PC. Just fixed the warning and the note it raised. Edit: I think the issue was I was using devtools::test() not devtools::check() because I did not have duckdb installed on the machine I am using. |
Sorry, realized there was another issue. Just tested with devtools::check_man() and devtools::check() and it all looks good now. |
Hi there, thanks for writing such a fantastic package!
I'm writing to ask if you can implement the
+
operator to combine several rulesets. This would allow users to combine several rulesets programatically and concisely.I often find I would like to do this if I have sets of rules to check several smaller datasets which then get combined into a larger dataset . I would like to be able to easily define checks on the smaller datasets and then combine them all and run them on the larger datasets using the
+
operator.Here's a quick example on how this could work.
Threw this together in about 10 mins, so I am ofc happy to go back and forth and make iterations until it's something you are happy with, if you think this is a good direction to go in.
Best,
Ben