Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix flagging logic for missing and unknown data in QARTOD spike test #129

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Sakshamgupta90
Copy link
Contributor

Related to issue #106
As per the observations:

1. Diff results:
Here are the diff results calculated using the average method:

[35.54251562117366 -- -- 0.0009809673153213794 0.0004743669231928038
 0.0022776381135756196 -- -- -- 4.972008905212988e-05 0.007755656380723508
 0.008936353381891138 0.001787683444078425]

2. Flags Assigned:
The flags based on these diff results are:
[2 9 9 1 1 1 9 9 9 1 1 1 1]
It appears that the flags are assigned correctly.

3. Code for Examining Flags:
examining the spike test flag arrays

wf = pd.DataFrame(x, flag_arr, ['salinity'])
print('right column KEYS: ', 'Missing', QartodFlags.MISSING, 'UNKNOWN', QartodFlags.UNKNOWN, '\n')
print(wf)

4. Data and Flags Analysis:
original dataset x

[35.54251562117366 -- 35.5424538572037 35.54400000534482 35.54750808811659
 35.55196490473474 35.55186644512574 -- 35.550779557924486
 35.550198525366476 35.54951805263036 35.5643488926557 35.56130702591726]
Index 0 : The diff value is valid and meets the criteria for flag 2.
Index 1 : Marked as missing (9), as expected.
Index 2 : Shows a flag of 9 because the diff is masked, indicating missing data.

The flags appear to be correctly applied based on the diff results and thresholds.

So as discussed, another check that implemented:

compare the values of `diff` and the original dataset `X`. And if `diff has a `nan` value, but on the same index the value is not `nan` in `x` dataset, in this case, flag will be UNKNOWN.
And if both the values are masked, then MISSING flag will be assigned.

@ocefpaf @leilabbb

@ocefpaf
Copy link
Member

ocefpaf commented Aug 13, 2024

@Sakshamgupta90 can you come up with a small and reproducible code example and data so we can test this? The values you posted above are from Leila's notebook, right? I guess I'm missing the context here.

@Sakshamgupta90
Copy link
Contributor Author

@Sakshamgupta90 can you come up with a small and reproducible code example and data so we can test this?

@ocefpaf can you elaborate this please, I didn't understand.

The values you posted above are from Leila's notebook, right? I guess I'm missing the context here.

Yes, the values are from the Leila's notebook.

@ocefpaf
Copy link
Member

ocefpaf commented Aug 14, 2024

@Sakshamgupta90 in order for us to review this we need more context. Your message above has diffs and flags but we don't know the inputs. If you can, try to create a notebook with the smallest dataset possible to reproduce that. You may start with Leila's notebook, and add it here. It is confusing to chase those code snippets in other PRs and comments to check what you did.

@Sakshamgupta90
Copy link
Contributor Author

@Sakshamgupta90 in order for us to review this we need more context. Your message above has diffs and flags but we don't know the inputs. If you can, try to create a notebook with the smallest dataset possible to reproduce that. You may start with Leila's notebook, and add it here. It is confusing to chase those code snippets in other PRs and comments to check what you did.

Hi @ocefpaf, I have uploaded a notebook on gist for your review.
https://gist.github.com/Sakshamgupta90/a867ec5727be04598f0fd8516d0b52ec


# Check if either inp or diff is masked
elif inp.mask[i] or diff.mask[i]:
flag_arr[i] = QartodFlags.UNKNOWN
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi there, I was looped into the discussion in issue 106, and @ocefpaf recommended I take a look at this PR if I had any thoughts.

This looks good!

Even though the resulting flag_arr is the same, I would think we'd want the conditionals to look like this, no?

       # Check if inp is masked (original data missing)
        if inp.mask[i]:
            flag_arr[i] = QartodFlags.MISSING
        
       # Check if diff is masked but not in inp (this indicates that original data is not missing, 
       # but the data point got masked in diff lines 575-580 due to trying to calculate a value 
       # using a valid value and a missing value; and because of that, we are not able to apply QARTOD
       # thus the UNKNOWN flag)
        elif (diff.mask[i] and not inp.mask[i]):
            flag_arr[i] = QartodFlags.UNKNOWN

@Sakshamgupta90

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the corrected behavior to what is expected, we will also want to change the expected list in test_qartod.QartodSpikeTest.test_spike_masked() (line ~1021) to this:

expected = [2, 4, 4, 4, 1, 3, 1, 2, 9, 2, 4, 4, 2, 9]

Previously, it was applying the MISSING FLAG 9 to the values surrounding actual missing values. Your suggested change fixes this and we should see UNKNOWN FLAG 2 surrounding missing values instead

Copy link
Contributor

@iwensu0313 iwensu0313 Sep 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(similar updates need to be made to the expected outputs in test_spike_methods() and test_spike_test_inputs(). after which I believe all the tests should pass - I was able to pass them locally w/ the minor updates)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @iwensu0313 ,
I completely agree with you. The conditionals should indeed be structured as you've suggested to accurately reflect the intended behavior.
Additionally, updating the expected results list is necessary to ensure the test remains valid.

Thank you for pointing out the required changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants