Curation of records, data, processes from original CARS #20
See here for data locations and code information from Jeff: |
Good chat today @BecCowley & @ChrisC28 |
I've made a first pass at producing the "CODA" form of the output from the WOD. The data is organised in yearly directories. Each directory contains daily files following the naming convention CODA_WOD_<platform_type>.nc, where platform_type = ctd, pfl, xbt, etc. So, for each variable and platform type, there are 365 or 366 files per year. The files themselves are two-dimensional (cast, depth_index). Each variable includes the data, the depth levels (WOD data seems to be on depth and NOT pressure) and the WOD flags. There are a few quirks that I'm working through to make the data a little easier to read and work with. For example, the length of the depth dimension varies from file to file, which isn't ideal for reading the data. Exactly which data/metadata to carry through is also something we should all discuss. |
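A minimal sketch of one way to handle the varying depth dimension, assuming each daily file yields a 2-D float array shaped (cast, depth_index): pad every array to the longest depth length with NaN and stack along the cast axis. `pad_depth` is a hypothetical helper, not part of wodpy or wodpync.

```python
import numpy as np


def pad_depth(arrays, fill=np.nan):
    """Pad (cast, depth_index) arrays to a common depth length and stack them.

    Hypothetical helper: assumes each input is a 2-D array whose second axis
    is the depth index; levels that a file does not have are set to `fill`.
    """
    n_depth = max(a.shape[1] for a in arrays)
    padded = []
    for a in arrays:
        out = np.full((a.shape[0], n_depth), fill, dtype=float)
        out[:, : a.shape[1]] = a  # copy the real levels, leave the rest as fill
        padded.append(out)
    # Stack all casts along the cast axis
    return np.concatenate(padded, axis=0)


# Usage: two days with different depth lengths
day1 = np.ones((2, 3))   # 2 casts, 3 levels
day2 = np.zeros((1, 5))  # 1 cast, 5 levels
combined = pad_depth([day1, day2])  # shape (3, 5)
```

Padding to a fixed maximum length per file (rather than per batch) would give every file the same depth dimension, at the cost of more NaN storage.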
The first pass of the data is here: /oa-decadal-climate/work/observations/CARSv2_ancillary/CODA/WOD |
@ChrisC28, looking at the pressure files: is the Pressure_depth variable meant to be the pressure converted to depth? It looks erroneous (looking at one of the CTD files). |
I've pushed the example notebook to the main branch |
@BecCowley The "Pressure_depth" variable is simply the pressure as read in the WOD data on the depth levels. I treat pressure as any other variable. I did notice some strangeness myself. Could you let me know which profile you looked at? |
Let's try this again. Using the wodpync module, I've created some test CODA files. Not all the metadata is there, as I had some boring problems processing strings that I still haven't worked out. Additionally, it seems that not all metadata is carried through in WOD (for example, Salinity doesn't have units). You can find the test dataset on tube at /oa-decadal-climate/work/observations/CARSv2_ancillary/CODA/CODA_test (currently CTD only, although I've run some tests on XBT and profiling floats without issues). |
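If the string problem is the usual netCDF one, where text metadata is stored as fixed-width character arrays (dtype 'S1', shape (n_casts, strlen)), a sketch like the one below turns each row into a Python string. This is an assumption about the cause; `chars_to_strings` is hypothetical, and netCDF4-python's `chartostring` does a similar job.

```python
import numpy as np


def chars_to_strings(char_array, encoding="utf-8"):
    """Convert a 2-D 'S1' character array (n_casts, strlen) to a list of str.

    Hypothetical helper: each row is reinterpreted as one fixed-width byte
    string, decoded, and stripped of trailing spaces and NUL padding.
    """
    arr = np.ascontiguousarray(char_array, dtype="S1")
    n_casts, strlen = arr.shape
    # View each row of single bytes as a single 'S{strlen}' element
    joined = arr.view(f"S{strlen}").reshape(n_casts)
    return [s.decode(encoding).rstrip(" \x00") for s in joined]


# Usage: two padded country codes
codes = np.array([[b"A", b"U", b"S", b" "], [b"U", b"S", b"A", b"\x00"]], dtype="S1")
names = chars_to_strings(codes)  # ["AUS", "USA"]
```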
@ChrisC28 I had a quick look at the files. Certainly there needs to be some transfer of variable attributes, fixes to fill values etc. I did some very basic plotting using the WOD flags and temperature and salinity. There are some strange out of range numbers in the WOD_flag variable (-127) for the one file I looked at. The data itself looks reasonable. Happy to work on tidying up when back next year! |
@ChrisC28 accession_number |
NOTE: wodpy uses masked arrays, which are extremely slow. I've found places where I think masked arrays can be replaced by regular arrays. I'm running some tests now. This should hopefully speed things up. |
I've now modified WODpy to make use of standard numpy arrays rather than masked arrays. I'm still checking things to make sure that what I've done is sensible, but it speeds things up by an order of magnitude. |
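A sketch of the general idea (not the actual change in the fork): fill masked elements with NaN once, then work with a plain ndarray and NaN-aware NumPy functions. `to_plain` is a hypothetical name.

```python
import numpy as np


def to_plain(masked, fill=np.nan):
    """Return a plain float ndarray with masked elements replaced by `fill`."""
    return np.ma.filled(np.ma.asarray(masked).astype(float), fill)


# Usage: the mask becomes NaN, and NaN-aware functions replace mask logic
m = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])
plain = to_plain(m)
mean = np.nanmean(plain)  # ignores the former masked value
```

The trade-off is that NaN only works for float data, so integer variables such as flags need a separate sentinel value.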
Hi @BecCowley The files have the metadata above, which should hopefully help the duplicate checker work. The -127 WODFlag values still appear, and I've traced these back to the original data files. Could you run your sceptical eye over these files and let me know if they are fit for purpose? I can now regenerate the files quickly, so fixes should be pretty easy to implement. |
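One possible explanation, offered as an assumption to check rather than a diagnosis: -127 is the netCDF default fill value for signed byte variables, so these may be unset flag slots rather than real flag values. A quick sketch for tallying flag values while counting the fill separately (`summarise_flags` is hypothetical):

```python
import numpy as np

NC_FILL_BYTE = -127  # netCDF default fill value for signed byte (NC_BYTE) variables


def summarise_flags(flags, fill=NC_FILL_BYTE):
    """Count each flag value, excluding the fill value, which is counted separately.

    Assumes integer-valued flags (no NaN), as stored in byte variables.
    """
    flags = np.asarray(flags).astype(np.int8)
    is_fill = flags == fill
    values, counts = np.unique(flags[~is_fill], return_counts=True)
    return dict(zip(values.tolist(), counts.tolist())), int(is_fill.sum())


# Usage
counts, n_fill = summarise_flags([0, 0, 1, -127, -127, 3])
```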
@ChrisC28 Some comments:
I will do some testing on the data itself and let you know what I find. |
@ChrisC28 some comments/queries on the data:
|
I'm moving the WOD_2018 over to: |
I've pushed my changes to wodpy to my fork (https://github.com/ChrisC28/wodpy). Being lazy, I've left the original code in the file, commented out. |
Fixed issue:
Working through the remainder of @BecCowley 's list |
Found the issue: wodpync creates a datetime output from the "date" and "GMT_time" variables. The latter is occasionally missing. wodpy tests for this, but the test failed when I changed from masked arrays to regular NumPy arrays. I've reverse-engineered the test with a bit of a hack, but it seems to catch those cases: when GMT time is missing, it takes the time as midnight (as in the original wodpy). |
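A sketch of the midnight fallback described above, assuming `date` is an integer in YYYYMMDD form and `GMT_time` is decimal hours (worth double-checking against the WOD files). `cast_datetime` is hypothetical and is not the wodpy code.

```python
import math
from datetime import datetime, timedelta


def _is_missing(value):
    """Treat None, NaN, or non-numeric values as a missing GMT time."""
    if value is None:
        return True
    try:
        return math.isnan(float(value))
    except (TypeError, ValueError):
        return True


def cast_datetime(date, gmt_time):
    """Combine a YYYYMMDD date with GMT time in decimal hours.

    Falls back to midnight when gmt_time is missing.
    """
    year, rest = divmod(int(date), 10000)
    month, day = divmod(rest, 100)
    base = datetime(year, month, day)
    if _is_missing(gmt_time):
        return base  # midnight fallback
    return base + timedelta(hours=float(gmt_time))


# Usage
t1 = cast_datetime(20170315, 6.5)   # 2017-03-15 06:30
t2 = cast_datetime(20170315, None)  # 2017-03-15 00:00
```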
Hi @BecCowley A new batch of CODA files to check. I've placed them here
were |
@ChrisC28 the WOD_unique id contains latitude, needs correcting as discussed. |
Fixed... regenerating the WOD derived CODA files |
Added the first batch of MNF -> CODA files:
Note that, as discussed, I haven't included a lot of the metadata (things like COUNTRY, etc.). It's probably worth discussing what we need for the QC/duplicate checking and making sure that we include what's required. Next step: repeat with the AIMS data. |
@ChrisC28
We will absolutely need the following information for XBT files:
For profiling floats and glider files (when we get there):
The WOD code tables are available here, and we should use these values if possible: https://www.ncei.noaa.gov/access/world-ocean-database/wod-codes.html Can you finagle that? Happy to help. We also still need the CODA WOD files updated to fix the wod_unique_id issue. |
I've fixed a few bugs and placed the newly created files in a new directory: |
Another one for your to-do list. A couple of things to note:
Please have a quick squizz when you get a moment. |
@ChrisC28 the attributes in the variables for the AIMS files have an issue (there is a long string in there - ncdump it to see). |
Hi @BecCowley Path for the new test dataset is: I'll push the converters next week. It would be good to refactor the code into a common set of functions (the MNF and RAN code is very similar). Note: I found that nutrient (and other) profiles are actually in the ocean station files and not, as I had suspected, in the CTD files in WOD (there are also a few in the profiling float data). I had been ignoring ocean stations, but it turns out that they are important. |
@ChrisC28, @Thomas-Moore-Creative, I see the WOD files are only there from 2000 to 2017. I think Thomas was going to download the latest WOD from 2000 to now - has this been done yet and can we then complete the conversion? |
It has not, apologies. I'll start this now. |
I'll do this over in #19 |
Hi all, In my haste to get this out on a Friday evening, I neglected to mention that I've downloaded WOD2018 from the OpenDAP server. It turned out to be very easy (took me less than 30 minutes): |
@BecCowley - given the above diligence from @ChrisC28 I assume that is as much as we can grab for now from WOD? .... I note it goes up to 2022 |
Have a look at the notebook here: I've only downloaded a subset of WOD2018. However, it might be worth having nearly the whole thing? Not sure how useful data from Captain Cook might be, but you never know.... |
Yes, looks like it's downloaded. Thanks for doing this @ChrisC28 |
@BecCowley I'm running the script now! Converting a bunch of other variables (nutrients, CO2, etc...). |
@ChrisC28 here are a list of format issues in the CODAv1 files:
|
@ChrisC28 another issue to fix: the WOD and originator flag values are float type in the WOD CODA files but double in the MNF versions. I think they should be byte types. Also, the originator flags in the WOD files depend on the origflagset variable, which isn't carried through to the CODA files. I suggest converting the *origflag values to a consistent scheme and adding flag_values and flag_meanings attributes to the *origflag variables. Then the data type can be made byte. |
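A sketch of the suggested conversion, assuming flag values are small integers stored as floats with NaN for missing: cast to int8 and build CF-style `flag_values` and `flag_meanings` attributes. The meanings mapping is supplied by the caller (for example from the WOD code tables or the per-cast origflagset); the entries in the usage line are placeholders, not WOD definitions. `encode_flags` is hypothetical, and using -127 as the fill simply matches the netCDF default for byte variables.

```python
import numpy as np


def encode_flags(flags, meanings, fill=-127):
    """Convert float flags to int8 and build CF flag attributes.

    `meanings` maps integer flag value -> single-word meaning (no spaces,
    as CF flag_meanings is a blank-separated list). NaN becomes `fill`.
    Assumes all flag values fit in the int8 range.
    """
    flags = np.asarray(flags, dtype=float)
    out = np.where(np.isnan(flags), fill, flags).astype(np.int8)
    values = sorted(meanings)
    attrs = {
        "flag_values": np.array(values, dtype=np.int8),
        "flag_meanings": " ".join(meanings[v] for v in values),
    }
    return out, attrs


# Usage with placeholder meanings
out, attrs = encode_flags([0.0, 1.0, float("nan")], {0: "good", 1: "range_outlier"})
```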
@BecCowley: I've updated the MNF and RAN CODA files to fix the negative z issue. This came from using the TEOS10 package to convert from pressure to depth: TEOS10 defines z as negative below the surface. Now looking into the WOD files. |
@BecCowley |
Collect the notes and data (and perhaps some of the code) used by Jeff to create the CARS2009 product.
It would be good to understand how Jeff did:
Also, rescue the data he already collected and reuse it in the new product.
Where to put this information? Locally it is available in the datalib location, in Jeff's folders.
Maybe we need to replicate it somewhere useful for the new CARS, or just make sure we can identify where the important parts are.
Also, thinking about the final format for the new product, it should match the original so users can easily slot the new product into their existing applications.