Do you have a definition of metadata for TADA? #550
Labels: documentation, Future Improvement (minimum viable function complete; issue includes potential future improvements)
Do you have a definition of metadata for TADA?
Draft answer:
We don't have a definition specific to TADA, but metadata is any information that helps describe the data (that is, the original and TADA result values). This includes all the additional columns that come with the WQX/WQP profiles, as well as the columns TADA adds to describe how the TADA version of the result value may have changed during the workflow.
There are different kinds of metadata that describe the result value. For example, there are series of columns describing the monitoring location, detection limits, project, monitoring activity, and more. Each of these metadata elements is either required or optional, as defined by the WQX schema. TADA differs only in that it creates TADA versions of the original WQX columns that can be edited during the process (for example, to handle detection limit information, convert units, or handle synonyms). The TADA versions of the columns should be considered the final versions that are eventually used for analyses. The original columns are retained in the data frame by default, but we have a function that can remove the originals if desired (once everything has been reviewed and finalized) to reduce the number of columns before moving on to the analysis step.
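To illustrate the relationship between original and TADA columns described above, here is a minimal, language-neutral sketch in Python (not the package's R code). It assumes that TADA versions share the original column name with a `TADA.` prefix; the column names, sample values, and the helper names `split_columns` and `drop_shadowed_originals` are hypothetical and are not TADA functions.

```python
# Assumption for illustration: a TADA column is named "TADA." + original name.
TADA_PREFIX = "TADA."


def split_columns(columns):
    """Return (TADA columns, original columns that have a TADA version)."""
    tada = [c for c in columns if c.startswith(TADA_PREFIX)]
    shadowed = {c[len(TADA_PREFIX):] for c in tada}
    originals = [c for c in columns if c in shadowed]
    return tada, originals


def drop_shadowed_originals(rows):
    """Return copies of the rows without originals that have a TADA version.

    Columns with no TADA counterpart (other metadata) are left in place.
    """
    if not rows:
        return []
    _, originals = split_columns(list(rows[0].keys()))
    return [{k: v for k, v in row.items() if k not in originals} for row in rows]


# Hypothetical single-row profile with two original/TADA column pairs.
rows = [
    {
        "MonitoringLocationIdentifier": "SITE-1",
        "CharacteristicName": "pH",
        "TADA.CharacteristicName": "PH",
        "ResultMeasureValue": "7.2",
        "TADA.ResultMeasureValue": 7.2,
    }
]
reduced = drop_shadowed_originals(rows)
```

The input rows are left unchanged, so the originals stay available for review until the reduced copy is accepted.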
Issue
This may be confusing to package users, and the information above is not currently included in the package documentation. The question does not apply to a specific function, but to how the TADA R package is designed and works more generally. This information should be easy for users to find. Consider adding it, along with other information about the recommended package workflow, to the README.
Details
WQX/WQP does have definitions for each column (that is, each metadata element) in the schema. For TADA, the definitions of the new columns (metadata elements) that TADA adds are not collected in a single spreadsheet that TADA Shiny app users could access. They appear only in the package documentation, which works for package users but leaves that level of detail harder to reach for Shiny app users.
It would be helpful to add BOTH the WQX/WQP and TADA column definitions to the Excel file output of the TADA Shiny app. The same definitions should also be made available to package users.
Potential solution
Create a reference file with all TADA profile columns (both WQX/WQP and TADA versions) and include a definition for each. This will help users understand the difference between the original and TADA versions. One challenge is that the definitions of the TADA versions of the columns may change over time as the data are edited. For example, the TADA version of the result value changes multiple times in the TADA workflow (unit conversion, copying the detection limit value over to the result value where needed, taking half the detection limit). Should the definition be updated each time an individual function runs to reflect what TADA has done to alter the value? The same question applies to characteristic, fraction, and speciation: they are first capitalized (autoclean function), and then synonyms are handled (harmonization function). In these cases, would the definition need to be a row added to the TADA profile at the start of the workflow that gets updated each time a function runs that affects a column's value?
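One possible way to approach the question above is to keep a fixed base definition for each column and append a short log entry whenever a workflow step alters it, rather than rewriting the definition itself. The sketch below is in Python for illustration only; the class `ColumnDefinition`, its fields, the step labels, and the definition text are all hypothetical and are not part of TADA.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnDefinition:
    """One row of a hypothetical column reference file."""

    name: str
    base_definition: str
    # Ordered (step, note) pairs, one per workflow step that altered the column.
    history: list = field(default_factory=list)

    def record(self, step, note):
        """Log that a workflow step changed this column's values."""
        self.history.append((step, note))

    def current_definition(self):
        """Base definition plus a summary of the changes applied so far."""
        if not self.history:
            return self.base_definition
        steps = "; ".join(f"{step}: {note}" for step, note in self.history)
        return f"{self.base_definition} Changes applied so far: {steps}."


# Hypothetical entry for the TADA version of the result value.
result_value = ColumnDefinition(
    name="TADA.ResultMeasureValue",
    base_definition="TADA version of the result value.",
)
result_value.record("unit conversion", "converted to target units")
result_value.record("detection limit handling", "detection limit copied to result")
```

With this layout the reference file can be exported at any point in the workflow, and the definition shown reflects only the steps that have actually run.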