Add specific thresholds for detecting quality errors in vitamins and other components [quality data] #11083

Open
jusdekiwi opened this issue Dec 2, 2024 · 4 comments
Labels
🧽 Data quality https://wiki.openfoodfacts.org/Quality Food supplements

Comments

@jusdekiwi

Problem

The current threshold for triggering a quality error for a vitamin's amount is 105g per 100g. I think it is not precise enough and we can do much better!
Image
In the screenshot below, only 5 values raise an error, although all of them should since the unit is wrong. Here the problem comes from the units, but it could just as well come from a typo, as often happens with nutrition facts errors.
Image

Proposed solution

I suggest we establish a specific threshold for each vitamin so we can detect new quality errors. I've extracted the max values for each vitamin so we can set a threshold value above the max known value.

Image
Retrievable data I've extracted from ciqual and usda (the proposed threshold here is 4 times greater than the maximum known value): nutrient max values.ods

Expected outcome

Many new errors would be raised but we would be able to fix the products' data more easily and then improve the overall quality of the database :)

Note: I've arbitrarily chosen a factor of 4 for the example but we should discuss together which value would be the best.
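
A minimal sketch of what such a per-nutrient check could look like, assuming values are stored in g per 100 g. The nutrient identifiers, the maxima in the table, and the function names are hypothetical placeholders, not the figures from the attached spreadsheet and not existing server code.

```python
# Hypothetical maxima in g per 100 g. Placeholder numbers for illustration,
# NOT the values extracted from ciqual and usda.
MAX_KNOWN_G_PER_100G = {
    "vitamin-c": 2.0,
    "vitamin-b12": 0.0001,
}

# The arbitrary factor from the proposal; the right value is still to be discussed.
SAFETY_FACTOR = 4


def threshold_for(nutrient_id, factor=SAFETY_FACTOR):
    """Return the error threshold for a nutrient, or None if no maximum is known."""
    max_known = MAX_KNOWN_G_PER_100G.get(nutrient_id)
    return None if max_known is None else max_known * factor


def exceeds_threshold(nutrient_id, value_g_per_100g, factor=SAFETY_FACTOR):
    """True if the value is above the nutrient-specific threshold."""
    threshold = threshold_for(nutrient_id, factor)
    return threshold is not None and value_g_per_100g > threshold
```

With these placeholder numbers, a vitamin C amount of 80 mg mistakenly entered as 80 g would be flagged (80 > 8), whereas the current 105 g rule would let it through.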

@aleene
Contributor

aleene commented Dec 2, 2024

We should exclude the supplements category for this.

I would rather start with a low value and increase it if needed. I assume this will be added to the vitamins/minerals taxonomy.

@CharlesNepote CharlesNepote added the 🧽 Data quality https://wiki.openfoodfacts.org/Quality label Dec 4, 2024
@CharlesNepote CharlesNepote changed the title Add specific threshols for detecting quality errors in vitamins and other components [quality data] Add specific thresholds for detecting quality errors in vitamins and other components [quality data] Dec 4, 2024
@CharlesNepote
Member

This is very interesting!

I have investigated this a little bit and here are my first comments.

1. Raw products vs the rest

The thresholds you have mentioned work for raw products.

But the majority of food products are processed, and many products have added vitamins or micronutrients at higher levels than your thresholds. I don't know whether we can identify all the raw products at once: is there a specific category for them?

2. The quest to clear outliers

If the product is not itself a vitamin or micronutrient product, it should never contain more than XX g of a vitamin or a micronutrient, even if vitamins or nutrients have been added.

I would start with a very high value, such as 20 g / 100 g. As of 2024-12-13, a query on Mirabelle counts 542 such products that do not already have a data quality error.
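
A rough sketch of the flat cut-off described above, skipping supplements as suggested earlier in the thread. The product layout (`categories_tags`, `nutriments`, keys ending in `_100g`), the category tag, and the nutrient list are assumptions made for illustration; this is not the actual server check.

```python
FLAT_THRESHOLD_G = 20.0  # starting value suggested above, in g per 100 g

# Assumed identifiers, for illustration only.
EXCLUDED_CATEGORIES = {"en:dietary-supplements"}
MICRONUTRIENTS = ("vitamin-a", "vitamin-c", "vitamin-d", "calcium", "iron")


def micronutrient_outliers(product, threshold=FLAT_THRESHOLD_G):
    """Return the micronutrients whose amount per 100 g exceeds the threshold."""
    categories = set(product.get("categories_tags", []))
    if categories & EXCLUDED_CATEGORIES:
        return []  # supplements can legitimately contain large amounts

    nutriments = product.get("nutriments", {})
    outliers = []
    for nutrient in MICRONUTRIENTS:
        value = nutriments.get(f"{nutrient}_100g")
        # Ignore missing or non-numeric values instead of raising.
        if isinstance(value, (int, float)) and value > threshold:
            outliers.append(nutrient)
    return outliers
```

A product outside the excluded categories with `"vitamin-c_100g": 25` would be reported, while the same value on a supplement would be ignored.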

3. Level of "errorness"

Some data quality errors have a bigger impact than others. A wrong value for fats can lead to a wrong Nutri-Score or another incorrect nutritional evaluation.

Issues with vitamins or micronutrients seem less important IMHO, but there might be cases where they matter: any opinions?

Should we create a new level of "minor data quality errors"?

@aleene
Contributor

aleene commented Dec 13, 2024

Do you have raw food examples?

I agree it is a minor error. Not sure about an error classification. It is more an issue for the daily mail: what to include and what not.

@aleene
Contributor

aleene commented Dec 14, 2024

I suggest starting with a threshold above which we are certain the values are errors. Then we can lower the threshold to the point where we start to get false positives. Some (how many?) false positives are acceptable (what to do with those? ignore the generated error in some way?) as long as we can detect errors.
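
One way to explore this would be to count how many values each candidate threshold flags, from the strictest certainty downwards, and review the newly flagged products by hand at each step. The sketch below only does the counting; the function name and the sample numbers are made up for illustration.

```python
def flagged_counts(values_g_per_100g, thresholds):
    """For each candidate threshold (highest first), count the values above it."""
    return {
        threshold: sum(value > threshold for value in values_g_per_100g)
        for threshold in sorted(thresholds, reverse=True)
    }


# Made-up sample values in g per 100 g, not real product data.
sample = [0.01, 0.5, 30, 80, 120]
print(flagged_counts(sample, [100, 20, 1]))
```

When lowering the threshold no longer adds many flagged products, or when the newly flagged ones turn out to be legitimate, that is a hint the useful range has been reached.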

Projects
Status: To discuss and validate
Status: To do
Development

No branches or pull requests

4 participants