-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: ✨ add include_gld_purchases()
#138
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! I added a couple of comments to handle a few details.
dplyr::rename(date = eksd) |> | ||
dplyr::relocate(atc, .after = date) | ||
|
||
test_that("dataset needs expected variables", { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The algorithm needs either the "name" or "vnr" variable to work, so we should add this to the tests and the inclusion function itself (e.g. check for any of the variables in the input data and use the one that is available (and use "name" by default if both are available).
R/include-gld-purchases.R
Outdated
"name", | ||
"vnr" | ||
) |> | ||
# TODO: Need to add this column? We did for hba1c. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For GLD we don't need to deduplicate multiple purchases on the same date. They should be valid.
column_names_to_lower() |> | ||
# Use !! to inject the expression into filter. | ||
dplyr::filter(!!criteria) |> | ||
# Keep only the columns we need. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned in the comment on the test below: the function only needs either of "name" or "vnr" to work, so we should identify which one is present in the input and use that one (and a default if both are present, probably "name"). The exclude_wld_purchases() function that does the actual censoring/exclusion of weightloss drugs based on "name" or "vnr" also needs to assert which one is passed to it and handle it accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, could you explain, why does it need one or the other? Can we decide which we want and require that? Does LMDB not always have these variables?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it needs one or the other, and I think both are equally accessible in LMDB, and it only depends on if the user has requested the variables. E.g. In my PhD, I had "name", but not "vnr", but on DARTER, we have "vnr", but not "name".
If we had to choose, "name" is much easier to maintain than "vnr" (vnr numbers get added over the years when packagings change, but brand names stay the same).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just require they have name
then? If they don't have it, to request it, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea, but if we do that, it won't work on DARTER (and I'll get asked to fix it). So, let's go with "vnr" only for now. I'll look into if we still need to include any of the variables: they're only used to discern GLP1-RAs, and since we no longer use the most popular GLP1-RA (semaglutid) for inclusion (since even the non-weightloss brand names are commonly used for off-label treatment of people without diabetes), it might be feasible to just not use any of the GLP1-RAs for inclusion, in which case we won't need to discern between weightloss vs. diabetes brand names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, but Anders' comments will need to be addressed before this can be merged :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some comments to @Aastedet.
column_names_to_lower() |> | ||
# Use !! to inject the expression into filter. | ||
dplyr::filter(!!criteria) |> | ||
# Keep only the columns we need. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, could you explain, why does it need one or the other? Can we decide which we want and require that? Does LMDB not always have these variables?
Co-authored-by: Luke W. Johnston <[email protected]>
Co-authored-by: Signe Kirk Brødbæk <[email protected]>
/document |
Closes #92.
Adding the
include_gld_purchases()
function. For adding the logic to thealgorithm
dataframe, I used the symbol=~
to indicate it is a regex, since that's how it was done in Perl and it's the only other symbol I could find to represent regexes.Several files were updated, but mainly because I re-ran the targets pipeline☺️
Please, focus mainly on the tests and the function itself 🥳