Fix flagging logic for missing and unknown data in QARTOD spike test #129

Sakshamgupta90 · 2024-08-13T04:26:05Z

Related to issue #106
As per the observations:

1. Diff results:
Here are the diff results calculated using the average method:

[35.54251562117366 -- -- 0.0009809673153213794 0.0004743669231928038
 0.0022776381135756196 -- -- -- 4.972008905212988e-05 0.007755656380723508
 0.008936353381891138 0.001787683444078425]

2. Flags Assigned:
The flags based on these diff results are:
[2 9 9 1 1 1 9 9 9 1 1 1 1]
It appears that the flags are assigned correctly.

3. Code for Examining Flags:
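Examining the spike test flag arrays:

import pandas as pd
from ioos_qc.qartod import QartodFlags

# x: masked salinity values; flag_arr: flags returned by the spike test
wf = pd.DataFrame(data=x, index=flag_arr, columns=['salinity'])
print('right column KEYS: ', 'Missing', QartodFlags.MISSING, 'UNKNOWN', QartodFlags.UNKNOWN, '\n')
print(wf)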

4. Data and Flags Analysis:
original dataset x

[35.54251562117366 -- 35.5424538572037 35.54400000534482 35.54750808811659
 35.55196490473474 35.55186644512574 -- 35.550779557924486
 35.550198525366476 35.54951805263036 35.5643488926557 35.56130702591726]

Index 0 : The diff value is valid and meets the criteria for flag 2.
Index 1 : Marked as missing (9), as expected.
Index 2 : Shows a flag of 9 because the diff is masked, indicating missing data.

The flags appear to be correctly applied based on the diff results and thresholds.
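
To illustrate why index 2 ends up with a masked diff even though its value is present, here is a minimal sketch of an average-style diff on a masked array. It is not the ioos_qc implementation: the data values are made up, and leaving the endpoints masked is an assumption of this sketch.

```python
import numpy as np

# Illustrative series with one missing value at index 1 (not the notebook's data).
x = np.ma.masked_invalid([35.0, np.nan, 35.2, 35.3, 35.4])

# Reference for each interior point: mean of its two neighbours.
ref = (x[:-2] + x[2:]) / 2

# Start fully masked; only interior points receive a diff (sketch assumption).
diff = np.ma.masked_all(x.shape, dtype=float)
diff[1:-1] = np.abs(x[1:-1] - ref)

x_mask = np.ma.getmaskarray(x)
diff_mask = np.ma.getmaskarray(diff)

# Points whose value is present but whose diff could not be computed.
valid_but_no_diff = diff_mask & ~x_mask
print(valid_but_no_diff[1:-1])  # interior points only
```

In the interior, index 2 has a valid value but a masked diff because one of its neighbours is missing, which is the case the thread is about.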

As discussed, I implemented another check:

Compare the values of `diff` and the original dataset `x`. If `diff` has a `nan` value at an index where the value in `x` is not `nan`, the flag will be UNKNOWN.
If both values are masked, the MISSING flag will be assigned.
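
A minimal sketch of that check, written with NumPy masks rather than the loop in `ioos_qc/qartod.py`. The helper name `flag_masked` and the placeholder data values are hypothetical; the mask positions and starting flags mirror the arrays posted above, and 2 and 9 are the UNKNOWN and MISSING codes referred to in this thread.

```python
import numpy as np

UNKNOWN, MISSING = 2, 9  # flag codes as referenced in this thread


def flag_masked(x, diff, flag_arr):
    """Return a copy of flag_arr with UNKNOWN/MISSING set from the masks."""
    x_mask = np.ma.getmaskarray(x)
    diff_mask = np.ma.getmaskarray(diff)
    out = np.array(flag_arr, copy=True)
    out[diff_mask & ~x_mask] = UNKNOWN  # value present, diff not computable
    out[diff_mask & x_mask] = MISSING   # value itself is missing
    return out


# Mask positions taken from the arrays above; data values are placeholders.
x_mask = np.zeros(13, dtype=bool)
x_mask[[1, 7]] = True
diff_mask = np.zeros(13, dtype=bool)
diff_mask[[1, 2, 6, 7, 8]] = True

x = np.ma.masked_array(np.full(13, 35.5), mask=x_mask)
diff = np.ma.masked_array(np.zeros(13), mask=diff_mask)
flags = np.array([2, 9, 9, 1, 1, 1, 9, 9, 9, 1, 1, 1, 1])

result = flag_masked(x, diff, flags)
print(result)  # [2 9 2 1 1 1 2 9 2 1 1 1 1]
```

With this check, indices 2, 6 and 8 (valid values next to a gap) become UNKNOWN, while indices 1 and 7 stay MISSING.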

@ocefpaf @leilabbb

ocefpaf · 2024-08-13T16:52:41Z

@Sakshamgupta90 can you come up with a small and reproducible code example and data so we can test this? The values you posted above are from Leila's notebook, right? I guess I'm missing the context here.

Sakshamgupta90 · 2024-08-14T04:21:27Z

> @Sakshamgupta90 can you come up with a small and reproducible code example and data so we can test this?

@ocefpaf can you elaborate on this, please? I didn't understand.

> The values you posted above are from Leila's notebook, right? I guess I'm missing the context here.

Yes, the values are from Leila's notebook.

ocefpaf · 2024-08-14T18:14:23Z

@Sakshamgupta90 in order for us to review this we need more context. Your message above has diffs and flags but we don't know the inputs. If you can, try to create a notebook with the smallest dataset possible to reproduce that. You may start with Leila's notebook, and add it here. It is confusing to chase those code snippets in other PRs and comments to check what you did.

Sakshamgupta90 · 2024-08-15T07:52:11Z

> @Sakshamgupta90 in order for us to review this we need more context. Your message above has diffs and flags but we don't know the inputs. If you can, try to create a notebook with the smallest dataset possible to reproduce that. You may start with Leila's notebook, and add it here. It is confusing to chase those code snippets in other PRs and comments to check what you did.

Hi @ocefpaf, I have uploaded a notebook on gist for your review.
https://gist.github.com/Sakshamgupta90/a867ec5727be04598f0fd8516d0b52ec

iwensu0313 · 2024-09-03T19:28:09Z

ioos_qc/qartod.py

+
+        # Check if either inp or diff is masked
+        elif inp.mask[i] or diff.mask[i]:
+            flag_arr[i] = QartodFlags.UNKNOWN


Hi there, I was looped into the discussion in issue 106, and @ocefpaf recommended I take a look at this PR if I had any thoughts.

This looks good!

Even though the resulting flag_arr is the same, I would think we'd want the conditionals to look like this, no?

        # Check if inp is masked (original data missing)
        if inp.mask[i]:
            flag_arr[i] = QartodFlags.MISSING
        # Check if diff is masked but not in inp (this indicates that original data is not missing,
        # but the data point got masked in diff lines 575-580 due to trying to calculate a value
        # using a valid value and a missing value; and because of that, we are not able to apply QARTOD
        # thus the UNKNOWN flag)
        elif (diff.mask[i] and not inp.mask[i]):
            flag_arr[i] = QartodFlags.UNKNOWN

@Sakshamgupta90

Given the corrected behavior to what is expected, we will also want to change the expected list in test_qartod.QartodSpikeTest.test_spike_masked() (line ~1021) to this:

expected = [2, 4, 4, 4, 1, 3, 1, 2, 9, 2, 4, 4, 2, 9]

Previously, it was applying the MISSING FLAG 9 to the values surrounding actual missing values. Your suggested change fixes this, and we should see UNKNOWN FLAG 2 surrounding missing values instead.

(Similar updates need to be made to the expected outputs in test_spike_methods() and test_spike_test_inputs(), after which I believe all the tests should pass. I was able to pass them locally with the minor updates.)

Sakshamgupta90 · 2024-09-03

Hi @iwensu0313,
I completely agree with you. The conditionals should indeed be structured as you've suggested to accurately reflect the intended behavior.
Additionally, updating the expected results list is necessary to ensure the test remains valid.

Thank you for pointing out the required changes.

Fix flagging logic for missing and unknown data in QARTOD spike test

5ffa1c6

iwensu0313 mentioned this pull request Aug 13, 2024

inconsistent in data rendering and in QARTOD flags creation #106

Open


Sakshamgupta90 Sep 3, 2024
