-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix flagging logic for missing and unknown data in QARTOD spike test #129
base: main
Are you sure you want to change the base?
Conversation
@Sakshamgupta90 can you come up with a small and reproducible code example and data so we can test this? The values you posted above are from Leila's notebook, right? I guess I'm missing the context here. |
@ocefpaf can you elaborate this please, I didn't understand.
Yes, the values are from the Leila's notebook. |
@Sakshamgupta90 in order for us to review this we need more context. Your message above has diffs and flags but we don't know the inputs. If you can, try to create a notebook with the smallest dataset possible to reproduce that. You may start with Leila's notebook, and add it here. It is confusing to chase those code snippets in other PRs and comments to check what you did. |
Hi @ocefpaf, I have uploaded a notebook on gist for your review. |
|
||
# Check if either inp or diff is masked | ||
elif inp.mask[i] or diff.mask[i]: | ||
flag_arr[i] = QartodFlags.UNKNOWN |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi there, I was looped into the discussion in issue 106, and @ocefpaf recommended I take a look at this PR if I had any thoughts.
This looks good!
Even though the resulting flag_arr
is the same, I would think we'd want the conditionals to look like this, no?
# Check if inp is masked (original data missing)
if inp.mask[i]:
flag_arr[i] = QartodFlags.MISSING
# Check if diff is masked but not in inp (this indicates that original data is not missing,
# but the data point got masked in diff lines 575-580 due to trying to calculate a value
# using a valid value and a missing value; and because of that, we are not able to apply QARTOD
# thus the UNKNOWN flag)
elif (diff.mask[i] and not inp.mask[i]):
flag_arr[i] = QartodFlags.UNKNOWN
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the corrected behavior to what is expected, we will also want to change the expected
list in test_qartod.QartodSpikeTest.test_spike_masked()
(line ~1021) to this:
expected = [2, 4, 4, 4, 1, 3, 1, 2, 9, 2, 4, 4, 2, 9]
Previously, it was applying the MISSING FLAG 9 to the values surrounding actual missing values. Your suggested change fixes this and we should see UNKNOWN FLAG 2 surrounding missing values instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(similar updates need to be made to the expected outputs in test_spike_methods()
and test_spike_test_inputs()
. after which I believe all the tests should pass - I was able to pass them locally w/ the minor updates)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @iwensu0313 ,
I completely agree with you. The conditionals should indeed be structured as you've suggested to accurately reflect the intended behavior.
Additionally, updating the expected
results list is necessary to ensure the test remains valid.
Thank you for pointing out the required changes.
Related to issue #106
As per the observations:
1. Diff results:
Here are the diff results calculated using the average method:
2. Flags Assigned:
The flags based on these diff results are:
[2 9 9 1 1 1 9 9 9 1 1 1 1]
It appears that the flags are assigned correctly.
3. Code for Examining Flags:
examining the spike test flag arrays
4. Data and Flags Analysis:
original dataset x
The flags appear to be correctly applied based on the diff results and thresholds.
So as discussed, another check that implemented:
@ocefpaf @leilabbb