Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: ✨ add include_gld_purchases() #138

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

lwjohnst86
Copy link
Member

Closes #92.

Adding the include_gld_purchases() function. For adding the logic to the algorithm dataframe, I used the symbol =~ to indicate it is a regex, since that's how it was done in Perl and it's the only other symbol I could find to represent regexes.

Several files were updated, but mainly because I re-ran the targets pipeline ☺️

Please, focus mainly on the tests and the function itself 🥳

Copy link
Collaborator

@Aastedet Aastedet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! I added a couple of comments to handle a few details.

dplyr::rename(date = eksd) |>
dplyr::relocate(atc, .after = date)

test_that("dataset needs expected variables", {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The algorithm needs either the "name" or "vnr" variable to work, so we should add this to the tests and the inclusion function itself (e.g. check for any of the variables in the input data and use the one that is available (and use "name" by default if both are available).

"name",
"vnr"
) |>
# TODO: Need to add this column? We did for hba1c.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For GLD we don't need to deduplicate multiple purchases on the same date. They should be valid.

R/include-hba1c.R Show resolved Hide resolved
column_names_to_lower() |>
# Use !! to inject the expression into filter.
dplyr::filter(!!criteria) |>
# Keep only the columns we need.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in the comment on the test below: the function only needs either of "name" or "vnr" to work, so we should identify which one is present in the input and use that one (and a default if both are present, probably "name"). The exclude_wld_purchases() function that does the actual censoring/exclusion of weightloss drugs based on "name" or "vnr" also needs to assert which one is passed to it and handle it accordingly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, could you explain, why does it need one or the other? Can we decide which we want and require that? Does LMDB not always have these variables?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it needs one or the other, and I think both are equally accessible in LMDB, and it only depends on if the user has requested the variables. E.g. In my PhD, I had "name", but not "vnr", but on DARTER, we have "vnr", but not "name".
If we had to choose, "name" is much easier to maintain than "vnr" (vnr numbers get added over the years when packagings change, but brand names stay the same).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just require they have name then? If they don't have it, to request it, what do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea, but if we do that, it won't work on DARTER (and I'll get asked to fix it). So, let's go with "vnr" only for now. I'll look into if we still need to include any of the variables: they're only used to discern GLP1-RAs, and since we no longer use the most popular GLP1-RA (semaglutid) for inclusion (since even the non-weightloss brand names are commonly used for off-label treatment of people without diabetes), it might be feasible to just not use any of the GLP1-RAs for inclusion, in which case we won't need to discern between weightloss vs. diabetes brand names.

Copy link
Contributor

@signekb signekb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, but Anders' comments will need to be addressed before this can be merged :)

Copy link
Member Author

@lwjohnst86 lwjohnst86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some comments to @Aastedet.

column_names_to_lower() |>
# Use !! to inject the expression into filter.
dplyr::filter(!!criteria) |>
# Keep only the columns we need.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, could you explain, why does it need one or the other? Can we decide which we want and require that? Does LMDB not always have these variables?

R/include-gld-purchases.R Outdated Show resolved Hide resolved
R/include-gld-purchases.R Outdated Show resolved Hide resolved
R/include-gld-purchases.R Outdated Show resolved Hide resolved
@lwjohnst86
Copy link
Member Author

/document

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In Review
Development

Successfully merging this pull request may close these issues.

Create function include_gld_purchases()
3 participants