WQX v3.0 Testing, Development and Updates #529

Open
wokenny13 opened this issue Sep 27, 2024 · 5 comments
Labels
Future Improvement Minimum viable function complete, issue includes potential future improvements Module 1 Top Priority Usability WQP Team Discussion WQX Team Discussion

Comments

@wokenny13
Collaborator

Is your feature request related to a problem? Please describe:
Services are still under development and testing in the WQP 3.0 beta. There is interest in hearing from us if we notice issues with TADA workflows when we start our updates, especially if we have example workflows that might be useful for testing.

Describe the solution you'd like:
Pushing the WQP 3.0 format changes through in TADA will likely be held back for at least half a year (until around March 2025). Before this push, we will need to validate any impacts on TADA workflows and processes while the USGS dataRetrieval package is being updated, so that the downstream process is smooth, quick, and efficient. dataRetrieval has a stable CRAN version as well as a development version on GitHub; testing should be done with the GitHub version.

Additional context:

Please see the following GitHub links from USGS for updates, plans, and documentation on the dataRetrieval functions.

dataRetrieval status updates: https://doi-usgs.github.io/dataRetrieval/articles/Status.html
dataRetrieval development plan: https://doi-usgs.github.io/dataRetrieval/articles/wqx3_development_plan.html
Lee Stanish said the dev version is using the 3.0 profiles but is still very much in development.

@cristinamullin
Collaborator

cristinamullin commented Dec 6, 2024

@cefergus I added some details here from our emails for tracking:

If you run:
schema <- readr::read_csv("https://www.epa.gov/system/files/other-files/2024-07/schema_outbound_wqx3.0.csv")
you get a table that can be used to find what needs replacing. You could use that to create a function that converts the new data to the old names so you don't need to update your code. BUT, we're generally thinking over here that it would usually be better to update the code to work with the "modern" outputs.

Let's start by just testing whether all our functions can still run if we use the new beta services/profiles but the old names. Eventually we’ll need to switch the code base and function outputs to use/reference the new names too. For this, it will still be helpful to have a function that can easily convert the columns back and forth between the “legacy” and the new names.
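A minimal sketch of such a back-and-forth rename helper follows. It assumes a two-column lookup table of name pairs; `rename_with_lookup`, `wqx3_name`, `legacy_name`, and the toy name pairs are hypothetical placeholders, and the real pairs would need to be taken from the schema CSV above, whose column headers are not shown in this thread.

```r
# Rename the columns of `df` using a lookup table of name pairs.
# `from` and `to` are the lookup columns holding the current and desired names.
# Columns of `df` that do not appear in the lookup are left unchanged.
rename_with_lookup <- function(df, lookup, from, to) {
  idx <- match(names(df), lookup[[from]])
  matched <- !is.na(idx)
  names(df)[matched] <- lookup[[to]][idx[matched]]
  df
}

# Toy lookup table (illustrative pairs only, not read from the real schema).
lookup <- data.frame(
  wqx3_name = c("Location_Identifier", "Result_Measure"),
  legacy_name = c("MonitoringLocationIdentifier", "ResultMeasureValue"),
  stringsAsFactors = FALSE
)

# Toy data frame standing in for a WQX 3.0 download.
wqx3_data <- data.frame(
  Location_Identifier = "SITE-1",
  Result_Measure = 7.2,
  ExtraColumn = "kept as-is",
  stringsAsFactors = FALSE
)

# New names -> legacy names, then back again.
legacy_data <- rename_with_lookup(wqx3_data, lookup,
  from = "wqx3_name", to = "legacy_name"
)
round_trip <- rename_with_lookup(legacy_data, lookup,
  from = "legacy_name", to = "wqx3_name"
)
```

Because the direction is set only by the `from` and `to` arguments, one lookup table covers both conversions, which fits the plan of running on the old names first and switching to the new names later.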

Note on performance: The services have been working really well lately, but folks doing big queries (like our group) still might need to rejigger the expectations of what can come back from a single query. Let's wait to update our big data retrieval functions (automatic chunking of pulls for users if needed) after we switch to the new services.
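For reference when we get to chunking, here is a rough sketch of one way to split a large pull by date range. It is only an illustration under assumptions: `split_date_range`, `chunk_days`, `chunk_start`, and `chunk_end` are hypothetical names, and the sketch does not reflect how the existing TADA big data retrieval functions work.

```r
# Split a date range into consecutive, non-overlapping chunks of at most
# `chunk_days` days. Returns one row per chunk.
split_date_range <- function(start_date, end_date, chunk_days = 365) {
  start_date <- as.Date(start_date)
  end_date <- as.Date(end_date)
  stopifnot(start_date <= end_date, chunk_days >= 1)

  # Chunk start dates, spaced `chunk_days` apart.
  starts <- seq(start_date, end_date, by = chunk_days)
  # Each chunk ends the day before the next one starts, capped at end_date.
  ends <- pmin(starts + chunk_days - 1, end_date)

  data.frame(chunk_start = starts, chunk_end = ends)
}

chunks <- split_date_range("2020-01-01", "2022-12-31", chunk_days = 365)
```

Each row of `chunks` could then be used as the date filter for one query, with the per-chunk results row-bound together afterwards.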

@cristinamullin
Collaborator

cristinamullin commented Dec 6, 2024

Here are the lines in TADA that we will need to update to use the new profiles:
See lines 282-305 in DataDiscoveryRetrieval.R
https://github.com/USEPA/EPATADA/blob/develop/R/DataDiscoveryRetrieval.R

# Retrieve all 3 profiles
  print("Downloading WQP query results. This may take some time depending upon the query size.")
  print(WQPquery)
  results.DR <- dataRetrieval::readWQPdata(WQPquery,
    dataProfile = "resultPhysChem",
    ignore_attributes = TRUE
  )
  # check if any results are available
  if ((nrow(results.DR) > 0) == FALSE) {
    print("Returning empty results dataframe: Your WQP query returned no results (no data available). Try a different query. Removing some of your query filters OR broadening your search area may help.")
    TADAprofile.clean <- results.DR
  } else {
    sites.DR <- dataRetrieval::whatWQPsites(WQPquery)
 
    projects.DR <- dataRetrieval::readWQPdata(WQPquery,
      ignore_attributes = TRUE,
      service = "Project"
    )
 
    TADAprofile <- TADA_JoinWQPProfiles(
      FullPhysChem = results.DR,
      Sites = sites.DR,
      Projects = projects.DR
    )

More from Laura D on USGS dataRetrieval:

Just to let you all know on my personal vocabulary - I usually now refer to our classic WQP calls as "legacy". At the moment, that's probably not the best term since it is still the production version of the Portal (WQP). But, we're working towards a release where the beta services on WQP become production. When that happens, the current system will be considered legacy and the new system will just be the default. I have NO idea when that might be.

I'll check on what's going on with the missing columns. Most of my tests for "legacy" don't usually specify the resultPhysChem profile (they are even more legacy-y dating back to when WQP didn't even have profiles). I'll let you know if I push up a fix on GitHub.

If you want to see how the new profiles look, you can try this:

  results.DR <- dataRetrieval::readWQPdata(WQPquery,
    service = "ResultWQX3",
    dataProfile = "basicPhysChem",
    ignore_attributes = TRUE
  )

  sites.DR <- dataRetrieval::whatWQPsites(WQPquery,
    legacy = TRUE
  )

I ran some tests and simple queries, and at the moment they do seem to be completing. There's currently no WQX3 version of the "Project" dataProfile, so that's something that you would need to wait for anyway.

@cristinamullin
Collaborator

cristinamullin commented Dec 11, 2024

@cefergus I created a branch for these edits. See: https://github.com/USEPA/EPATADA/tree/WQX3.0betatesting

I started editing the TADA_DataRetrieval function to use the new 3.0 full phys chem profile.

@cefergus
Collaborator

Pushed a function to rename WQX3.0 column names back to WQX2.0 legacy names referencing the online schema. However, we noticed that there are differences in special characters between the 2.0 names and names used in TADA. Next steps are to identify which column names need to be changed to match TADA. Once this is fixed we can test how well TADA autoclean works with the data uploaded using the new service.
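A quick sketch of how the mismatched names could be listed. It assumes the differences are of the slash-versus-dot kind (for example, a schema name written with "/" where the R data frame column uses "."), which is an assumption and not something confirmed above; `normalize_names` and both name vectors are illustrative only.

```r
# Replace any character that is not a letter, digit, underscore, or dot
# with a dot, so that "A/B" and "A.B" compare as equal.
normalize_names <- function(x) {
  gsub("[^A-Za-z0-9_.]", ".", x)
}

# Illustrative name vectors (assumed examples, not the full lists).
schema_names <- c(
  "ActivityDepthHeightMeasure/MeasureValue",
  "ResultMeasure/MeasureUnitCode",
  "CharacteristicName"
)
tada_names <- c(
  "ActivityDepthHeightMeasure.MeasureValue",
  "ResultMeasure.MeasureUnitCode",
  "CharacteristicName"
)

# Names that differ before normalization.
mismatched <- setdiff(schema_names, tada_names)

# Names that still differ after normalization and need manual mapping.
still_mismatched <- setdiff(normalize_names(schema_names), tada_names)
```

Anything left in `still_mismatched` would be a column that needs an explicit entry in the rename function.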

@cefergus
Collaborator

cefergus commented Jan 2, 2025

Updated the TADA_RenameColumn function to rename WQX3.0 columns to the legacy names and/or the names used in TADA_AutoClean. Calling TADA_DataRetrieval with applyautoclean = TRUE is working.

Will test it on other queries to see if there are other misalignment issues to work through.
