Do you have a definition of metadata for TADA? #550

Open
cristinamullin opened this issue Dec 3, 2024 · 0 comments
Labels: documentation; Future Improvement (Minimum viable function complete, issue includes potential future improvements)

cristinamullin (Collaborator) commented Dec 3, 2024

Do you have a definition of metadata for TADA?

Draft answer:
We don't have a definition specific to TADA, but metadata is any information that helps describe the data (that is, the original and TADA result values). This includes all of the additional columns that come with the WQX/WQP profiles, as well as the columns TADA adds to describe how the TADA version of the result value may have changed during processing.

Different kinds of metadata describe the result value. For example, there are series of columns describing the monitoring location, detection limits, project, monitoring activity, and more. Each of these metadata elements is either required or optional, as defined by the WQX schema. TADA differs only in that it creates TADA versions of the original WQX columns that can be edited during the process (for example, to handle detection limit information, convert units, or handle synonyms). The TADA versions of the columns should be considered the final versions that are eventually used for analyses. The original columns are retained in the data frame by default, but we have a function that can remove the originals if desired (once everything has been reviewed and finalized) to reduce the number of columns before moving on to the analysis step.
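To make the original-versus-TADA column relationship concrete, here is a minimal sketch. It is written in plain Python purely for illustration; the same idea applies to an R data frame. It is not the TADA implementation: the `TADA.` prefix convention, the column names, the values, and the `drop_originals` helper are all assumptions made up for this example.

```python
# Hypothetical sketch only: column names, values, and the "TADA." prefix
# convention are assumptions for illustration, not the package's real API.
TADA_PREFIX = "TADA."


def drop_originals(row):
    """Return a copy of row without original columns that have a TADA version."""
    # Base names of every TADA-prefixed column, e.g. "CharacteristicName".
    tada_bases = {
        name[len(TADA_PREFIX):] for name in row if name.startswith(TADA_PREFIX)
    }
    # Keep TADA columns and any original column with no TADA counterpart.
    return {name: value for name, value in row.items() if name not in tada_bases}


row = {
    "MonitoringLocationIdentifier": "SITE-1",  # no TADA version, so it is kept
    "CharacteristicName": "Nitrate",           # original value
    "TADA.CharacteristicName": "NITRATE",      # edited version (capitalized)
    "ResultMeasureValue": "500",               # original value
    "TADA.ResultMeasureValue": 0.5,            # edited version (units converted)
}

reduced = drop_originals(row)
```

In this sketch, `reduced` keeps `MonitoringLocationIdentifier` plus the two `TADA.` columns, which mirrors the idea of trimming the originals only after review is complete.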

Issue
This may be confusing to package users, and the information is not currently included in the package documentation. The question does not apply to a specific function; it concerns how the TADA R package is designed and works more generally. This information should be easy for users to find. Consider adding it, along with other information about the recommended package workflow, to the README.

Details
WQX/WQP has definitions for each column (that is, each metadata element) in the schema. For TADA, the definitions for each new column (metadata element) that TADA adds are not documented in a single spreadsheet that would be accessible to TADA Shiny app users. The definitions appear only in the package documentation, which works for package users but leaves that level of detail harder to reach for Shiny app users.

It would be helpful to add BOTH the WQX/WQP and TADA column definitions to the TADA Shiny app Excel file output. The same definitions should also be made available to package users.

Potential solution
Create a reference file with all TADA profile columns (both WQX/WQP and TADA versions) and include a definition for each. This will help users understand the difference between the original and TADA versions. One challenge is that the definitions for the TADA versions of the columns may change over time as the data are edited. For example, the TADA version of the result value changes multiple times in the TADA workflow (unit conversion, copying the detection limit value over to the result value where needed, taking half the detection limit). Should the definition be updated each time an individual function runs, to reflect what TADA has done to alter the value? The same question applies to characteristic, fraction, and speciation: they are first capitalized (autoclean function), then synonyms are handled (harmonization function). In these cases, would the definition need to be a row added to the TADA profile at the start of the workflow that is updated each time a function runs that affects a column's value?
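One possible way to approach the question above is to keep each column's definition fixed and record what the workflow has done to it in a separate history field. The sketch below shows that idea in plain Python, purely for illustration; the structure would translate to an R data frame or a spreadsheet tab. Everything here is an assumption: the field names (`column`, `source`, `definition`, `history`), the helper functions, and the placeholder definitions, which are not official WQX/WQP or TADA wording.

```python
import csv
import io


def new_dictionary():
    """Build a tiny reference table; definitions are placeholders only."""
    return [
        {"column": "ResultMeasureValue", "source": "WQX/WQP",
         "definition": "Result value as originally reported.", "history": []},
        {"column": "TADA.ResultMeasureValue", "source": "TADA",
         "definition": "Result value as edited by the TADA workflow.", "history": []},
    ]


def record_step(dictionary, column, step):
    """Append a processing note to one column's history."""
    for entry in dictionary:
        if entry["column"] == column:
            entry["history"].append(step)
            return
    raise KeyError(column)


def to_csv(dictionary):
    """Serialize the reference table to CSV text, one row per column."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["column", "source", "definition", "history"]
    )
    writer.writeheader()
    for entry in dictionary:
        # Flatten the history list into one cell so it fits a spreadsheet.
        writer.writerow({**entry, "history": "; ".join(entry["history"])})
    return buffer.getvalue()


dictionary = new_dictionary()
# Hypothetical workflow steps, named after the examples in the issue text.
record_step(dictionary, "TADA.ResultMeasureValue", "units converted")
record_step(dictionary, "TADA.ResultMeasureValue",
            "detection limit value copied to result value where needed")
reference_csv = to_csv(dictionary)
```

With this layout the definition text never needs rewriting. Each function that alters a column would only append a note, and the resulting table could be exported alongside the data for both Shiny app and package users.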
