Implement outlier detection in RCF #348

Open
baslia opened this issue Sep 1, 2024 · 3 comments

Comments

@baslia
Collaborator

baslia commented Sep 1, 2024

Problem

Currently, RCF (Random Cut Forest) is not used for outlier detection. This significantly limits its ability to identify and analyze data points that deviate from the overall pattern, which can lead to inaccurate conclusions and missed insights.

Proposed solution

Introduce outlier detection capabilities to RCF, enabling users to identify and analyze data points that fall outside the expected range for their respective categories or Nutri-Score labels. This could be achieved by implementing the following features:

  • Ability to specify categorical and numerical features for outlier detection. This allows users to identify outliers within specific categories, such as finding outlier nutrient values for different food types or nutri-score levels.
  • Option to apply different outlier detection methods for each category. This provides flexibility to tailor the analysis to the specific characteristics of each group, ensuring accurate outlier identification.
  • Visualization options to represent outliers within each category. This could include scatter plots, boxplots, or other appropriate visualizations that clearly show the distribution of data and highlight outliers in each category.
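The first two points (a categorical plus a numerical feature, and a per-category choice of method) could be sketched roughly as follows. This is a minimal, hypothetical illustration using only the Python standard library, not part of RCF: the record layout, the field names (`category`, `sugars_100g`), the function names, and the choice of Tukey's IQR fences and a z-score rule as the two available methods are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev, quantiles


def iqr_outliers(values, k=1.5):
    """Indices of values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    if len(values) < 4:
        return []  # too few points for meaningful quartiles
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


def zscore_outliers(values, threshold=3.0):
    """Indices of values whose |z-score| (population std) exceeds threshold."""
    if len(values) < 2:
        return []
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical registry of available detection methods, selected by name.
METHODS = {"iqr": iqr_outliers, "zscore": zscore_outliers}


def detect_outliers_by_category(records, category_key, value_key,
                                methods=None, default="iqr"):
    """Group records by a categorical feature, then flag outliers of a
    numerical feature within each group, using the method configured for
    that category in `methods` (or `default` if none is configured)."""
    methods = methods or {}
    groups = defaultdict(list)
    for rec in records:
        if rec.get(value_key) is not None:  # skip records missing the value
            groups[rec.get(category_key)].append(rec)

    flagged = []
    for category, group in groups.items():
        detector = METHODS[methods.get(category, default)]
        values = [rec[value_key] for rec in group]
        flagged.extend(group[i] for i in detector(values))
    return flagged
```

For example, with six snack products whose `sugars_100g` values are 30, 31, 32, 33, 35 and 95, the IQR rule flags only the 95 entry. A fixed |z| > 3 rule, on the other hand, can never fire on a group that small: with the population standard deviation, |z| is bounded by sqrt(n - 1), about 2.24 for n = 6. That is one argument for letting the method vary by category, as the second point suggests.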

Additional context

Outlier detection plays a crucial role in data analysis, enabling researchers to identify data points that might be erroneous, fraudulent, or indicative of unique patterns. By incorporating outlier detection capabilities, RCF would become a more comprehensive and versatile tool for analyzing nutritional data, providing deeper insights into dietary patterns and potential health implications.

Mockups

  • A dropdown menu or checkbox option to select a categorical feature alongside the numerical feature for outlier detection.
  • A visualization panel displaying scatter plots or boxplots for each category, highlighting outlier data points.

Part of

Implement outlier detection in RCF

@raphael0202
Contributor

Implementing outlier detection to detect vandalism or incorrect data values would definitely be a great addition! Is it something you're interested in working on?

@teolemon
Member

@baslia

@baslia
Collaborator Author

baslia commented Sep 11, 2024

Hey @teolemon @raphael0202, I would be interested in working on this topic; I commented about it last week: #346 (comment)

Projects
Status: To do
Development

No branches or pull requests

3 participants