This issue was moved to a discussion.

Create a Collaborative Regex Engine #226

Closed
4 tasks
Tracked by #127
teolemon opened this issue Jul 26, 2020 · 0 comments

teolemon commented Jul 26, 2020

Why

  • At our scale (languages, variations), we can no longer handle the variability of formulations without our current detectors growing out of control.
    • 16,000 labels (of which 2,000 are explicitly known)
    • 8,800 origins of ingredients (150 countries and their cities / regions)
    • 33,000 categories (of which 4,000 are explicitly known)
    • 3,000 types, recyclability and packaging materials
  • We need to distribute rule-making to more people

What

Solution: Create an extraction rules engine

  • Store all the OCR results in an Elasticsearch index
  • Create a system to store “IF this THEN that” rules in a database (with CRUD operations)
    • Triggers
      • OCR keyword
      • A combination of several triggers (OCR keyword 1 + OCR keyword 2, OCR keyword + existing value…)
    • Output: a key/value pair, e.g. “Origins”: “France” (if there are several key/values, either create one rule per pair or allow several key/values for a single trigger)
    • Status: Enabled/Disabled
    • User: Creator of the rule
  • Create a system to apply the rules to incoming photos going forward
  • Create a system to apply a rule to all or part of the existing OCR results / images (list with checkboxes)
  • Create an authentication system with differentiated privileges
    • regular users: propose rules (not activated)
    • moderators: create or activate proposed rules
    • superadmin
  • Create a table with the history of the modifications applied via the rules (optional, we can use the existing insight system for this)
  • Create a system to suggest search values from an already available n-grams file
  • Correlate recurring textual patterns with non-validated predictions and/or values already in the Open Food Facts database
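
To make the rule model above more concrete, here is a minimal in-memory sketch in Python. Everything in it is an assumption for illustration: the names `Rule`, `Role`, `activate` and `apply_rules` are hypothetical, a trigger is reduced to case-insensitive substring matching on the OCR text plus optional exact matches on existing product values, and superadmins are assumed to have at least moderator rights. A real implementation would store rules in a database and query the OCR index instead.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    """Hypothetical privilege levels, mirroring the list above."""
    USER = "user"              # may only propose rules (not activated)
    MODERATOR = "moderator"    # may create or activate proposed rules
    SUPERADMIN = "superadmin"  # assumed to have at least moderator rights


@dataclass
class Rule:
    """Hypothetical "IF this THEN that" extraction rule."""
    keywords: list[str]      # triggers: every keyword must appear in the OCR text
    outputs: dict[str, str]  # key/values to emit, e.g. {"origins": "France"}
    creator: str             # user who created the rule
    enabled: bool = False    # status: proposed rules start disabled
    # Optional extra trigger: existing product values that must also match.
    required_values: dict[str, str] = field(default_factory=dict)

    def matches(self, ocr_text: str, product: dict[str, str]) -> bool:
        text = ocr_text.lower()
        keywords_ok = all(k.lower() in text for k in self.keywords)
        values_ok = all(product.get(k) == v for k, v in self.required_values.items())
        return keywords_ok and values_ok


def activate(rule: Rule, role: Role) -> None:
    """Enable a proposed rule; regular users are not allowed to do this."""
    if role not in (Role.MODERATOR, Role.SUPERADMIN):
        raise PermissionError("only moderators or superadmins can activate rules")
    rule.enabled = True


def apply_rules(rules: list[Rule], ocr_text: str, product: dict[str, str]) -> dict[str, str]:
    """Return the key/values predicted by all enabled rules that match."""
    predictions: dict[str, str] = {}
    for rule in rules:
        if rule.enabled and rule.matches(ocr_text, product):
            predictions.update(rule.outputs)
    return predictions


# Usage: a proposed rule has no effect until a moderator activates it.
rule = Rule(keywords=["agriculture france"], outputs={"origins": "France"}, creator="alice")
before = apply_rules([rule], "AGRICULTURE FRANCE - BIO", {})
activate(rule, Role.MODERATOR)
after = apply_rules([rule], "AGRICULTURE FRANCE - BIO", {})
```

In this sketch a single rule can carry several key/values, which corresponds to the second option mentioned for the output. Splitting them into one rule per key/value would only require shorter `outputs` dictionaries.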

Docs

Examples

Part of

@teolemon changed the title from “Extract Origins in the EU Organic logo” to “Location brainstorm” on Jul 26, 2020
@teolemon changed the title from “Location brainstorm” to “[WIP] Location brainstorm” on Jul 26, 2020
@teolemon changed the title from “[WIP] Location brainstorm” to “Create a Collaborative Regex Engine” on Aug 25, 2021
@teolemon moved this to “To discuss and validate” in 🤖 Artificial Intelligence @ Open Food Facts on Sep 20, 2022
@openfoodfacts locked and limited conversation to collaborators on Aug 29, 2023
@raphael0202 converted this issue into discussion #1220 on Aug 29, 2023
@github-project-automation bot moved this from “To discuss and validate” to “Done” in 🤖 Artificial Intelligence @ Open Food Facts on Aug 29, 2023
