CMI | Actigraphy data EDA

This notebook explores the actigraphy time-series recordings to get a sense of the data at hand. Its goals are to:

Investigate data quality (e.g., time gaps, non-wear periods, battery issues)
Calculate main statistics for all participants
Derive meaningful insights for feature engineering (circadian rhythms, time-based activity trends, activity levels, etc.)
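The quality checks and per-participant statistics listed above can be sketched in pandas. This is a minimal illustration, not the notebook's code. It assumes a long-format DataFrame with a participant `id`, a datetime `timestamp`, an `enmo` activity column and a `non-wear_flag` column. These column names, the 5-second sampling step and the helper names `recording_quality` and `summarize_participants` are hypothetical and may not match the actual files.

```python
import numpy as np
import pandas as pd


def recording_quality(g: pd.DataFrame, expected_step: str = "5s") -> dict:
    """Quality and activity statistics for one participant's recording (hypothetical columns)."""
    g = g.sort_values("timestamp")
    deltas = g["timestamp"].diff().dropna()
    step = pd.Timedelta(expected_step)
    gaps = deltas[deltas > step]  # any jump longer than one sampling step counts as a gap
    # Mean activity per hour of day: a crude circadian profile
    hourly = g.groupby(g["timestamp"].dt.hour)["enmo"].mean()
    return {
        "n_samples": len(g),
        "n_gaps": len(gaps),
        "longest_gap": deltas.max(),
        "non_wear_fraction": g["non-wear_flag"].mean(),
        "mean_enmo": g["enmo"].mean(),
        "peak_hour": int(hourly.idxmax()),
    }


def summarize_participants(df: pd.DataFrame) -> pd.DataFrame:
    """One row of statistics per participant id."""
    rows = [dict(id=pid, **recording_quality(g)) for pid, g in df.groupby("id")]
    return pd.DataFrame(rows).set_index("id")


# Hypothetical one-day recording at 5 s resolution with a 100-sample dropout
ts = pd.date_range("2024-01-01", periods=17280, freq="5s")
hours = ts.hour.to_numpy()
demo = pd.DataFrame({
    "id": "p001",
    "timestamp": ts,
    "enmo": np.where((hours >= 8) & (hours < 20), 0.05, 0.0),  # active during the day
    "non-wear_flag": (hours >= 22).astype(int),                 # device off late evening
})
demo = demo.drop(index=range(1000, 1100))
print(summarize_participants(demo))
```

On real data, the gap threshold and the definition of non-wear would need to follow how the recordings were actually sampled and flagged.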

CMI | Features EDA

Understanding the task

The aim of this competition is to predict the Severity Impairment Index (sii), which measures the level of problematic internet use among children and adolescents, from physical activity data and other features. sii is derived from PCIAT-PCIAT_Total, the sum of the scores on the Parent-Child Internet Addiction Test (PCIAT: 20 questions, each scored 0-5, so the total ranges from 0 to 100).

The target variable (sii) is defined as:

0: None (PCIAT-PCIAT_Total from 0 to 30)
1: Mild (PCIAT-PCIAT_Total from 31 to 49)
2: Moderate (PCIAT-PCIAT_Total from 50 to 79)
3: Severe (PCIAT-PCIAT_Total 80 and above)

This makes sii an ordinal categorical variable with four levels, where the order of categories is meaningful.
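The mapping from PCIAT-PCIAT_Total to sii can be written as a binning step. A minimal sketch with NumPy, where `SII_CUTS` and `pciat_to_sii` are illustrative names rather than anything from the competition code: `np.digitize` counts how many cut points each score has reached, which gives the level directly. Note that fractional values (for example regression outputs between 30 and 31) fall into level 0 with these cut points.

```python
import numpy as np

# Lower bounds of sii levels 1, 2 and 3 on the PCIAT-PCIAT_Total scale
SII_CUTS = [31, 50, 80]


def pciat_to_sii(total):
    """Map PCIAT-PCIAT_Total scores (0-100) to sii levels 0-3."""
    return np.digitize(np.asarray(total), SII_CUTS)


print(pciat_to_sii([0, 30, 31, 49, 50, 79, 80, 100]))  # [0 0 1 1 2 2 3 3]
```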

Types of machine learning problems we can frame with sii as the target:

Ordinal classification (ordinal logistic regression, models with custom ordinal loss functions)
Multiclass classification (treat sii as a nominal categorical variable without considering the order)
Regression (ignore the discrete nature of the categories, treat sii as a continuous variable, then round the predictions)
Custom (e.g. loss functions that penalize errors based on the distance between categories)
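To make the idea of distance-based penalties concrete, here is a small sketch, assuming squared distance between levels as the cost. It shows two things: turning multiclass probabilities into a prediction that minimizes expected squared-distance cost (rather than taking the argmax), and scoring predictions with quadratic weighted kappa, which scikit-learn exposes as `cohen_kappa_score(..., weights="quadratic")`. The helper names are hypothetical, and the squared cost is one choice among several.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Squared-distance cost between sii levels: cost[i, j] = (i - j) ** 2
LEVELS = np.arange(4)
COST = (LEVELS[:, None] - LEVELS[None, :]) ** 2


def min_expected_cost(proba):
    """For each row of class probabilities, return the level with the lowest expected cost."""
    expected = np.asarray(proba) @ COST  # expected[n, j] = sum_i p[n, i] * cost[i, j]
    return expected.argmin(axis=1)


def qwk(y_true, y_pred):
    """Quadratic weighted kappa: disagreements are weighted by squared distance."""
    return cohen_kappa_score(y_true, y_pred, weights="quadratic")


# A 40/60 split between "None" and "Severe": argmax picks 3, the expected-cost rule picks 2
print(min_expected_cost([[0.4, 0.0, 0.0, 0.6]]))  # [2]

# Same accuracy (7/8), but an off-by-one error is penalized far less than an off-by-three error
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
near = [0, 0, 1, 1, 2, 2, 3, 2]
far = [0, 0, 1, 1, 2, 2, 3, 0]
print(round(qwk(y_true, near), 3), round(qwk(y_true, far), 3))
```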

We can also use PCIAT-PCIAT_Total as a continuous target variable: fit a regression model on PCIAT-PCIAT_Total and then map its predictions to sii categories.
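A minimal sketch of the regress-then-map approach on synthetic data. `Ridge` is only a placeholder for whatever regressor is actually used, and the features and totals are randomly generated, not competition data. Predictions are clipped to the valid 0-100 range before being binned with the sii cut points.

```python
import numpy as np
from sklearn.linear_model import Ridge

SII_CUTS = [31, 50, 80]  # lower bounds of sii levels 1, 2 and 3

# Synthetic features and PCIAT-PCIAT_Total values (hypothetical data)
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 6))
signal = X @ np.array([10.0, 6.0, 0.0, 0.0, 3.0, 0.0])
total = np.clip(30 + signal + rng.normal(scale=5, size=300), 0, 100)

# Fit on the first 240 rows, predict the remaining 60
model = Ridge(alpha=1.0).fit(X[:240], total[:240])
pred_total = np.clip(model.predict(X[240:]), 0, 100)  # keep predictions on the valid score range
pred_sii = np.digitize(pred_total, SII_CUTS)           # map the continuous total to sii levels
```

Fixed cut points are the simplest choice; they can also be tuned on validation predictions.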

Finally, another strategy is to predict the responses to each question of the Parent-Child Internet Addiction Test: predict the individual question scores as separate targets, sum the predicted scores to obtain PCIAT-PCIAT_Total, and map the result to the corresponding sii category.
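The aggregation step of this per-question strategy might look like the sketch below. It assumes the per-question predictions are already available as an array of shape (n_participants, 20), produced by any model; `questions_to_sii` is a hypothetical helper. Each predicted item is clipped to the 0-5 scale before summing, so the total stays within 0-100.

```python
import numpy as np

SII_CUTS = [31, 50, 80]  # lower bounds of sii levels 1, 2 and 3


def questions_to_sii(question_preds):
    """question_preds: array (n_participants, 20) of predicted per-question scores."""
    q = np.clip(np.asarray(question_preds, dtype=float), 0, 5)  # each item is scored 0-5
    total = q.sum(axis=1)                                        # predicted PCIAT-PCIAT_Total
    return total, np.digitize(total, SII_CUTS)


# Three hypothetical participants with uniform per-question predictions
preds = np.array([np.full(20, 1.0), np.full(20, 2.6), np.full(20, 4.2)])
total, sii = questions_to_sii(preds)
```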

CMI | EDA which makes sense

This notebook shows:

a first analysis of the data
how to cross-validate a model
that regression models outperform classification models in this competition, and
how to tune the thresholds for rounding the regression output.
The notebook uses polars DataFrames. If you are more fluent with pandas than with polars, this is an opportunity to get to know polars, which is often more efficient than pandas.
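The cross-validation and threshold-tuning steps listed above can be sketched as follows. This is an illustration under stated assumptions, not the notebook's implementation: it uses NumPy and scikit-learn rather than polars, `Ridge` stands in for the real model, folds are stratified on sii, quadratic weighted kappa is assumed as the score to maximize, and `scipy.optimize.minimize` with Nelder-Mead searches for three cut points on the out-of-fold predictions. All helper names and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.linear_model import Ridge
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import StratifiedKFold


def oof_predictions(X, y, n_splits=5, seed=0):
    """Out-of-fold regression predictions, with folds stratified on the sii level."""
    oof = np.zeros(len(y), dtype=float)
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, valid_idx in skf.split(X, y):
        model = Ridge(alpha=1.0)  # placeholder regressor
        model.fit(X[train_idx], y[train_idx])
        oof[valid_idx] = model.predict(X[valid_idx])
    return oof


def qwk(y_true, y_pred):
    return cohen_kappa_score(y_true, y_pred, weights="quadratic")


def apply_thresholds(pred, thresholds):
    """Map continuous predictions to levels 0-3 using three sorted cut points."""
    return np.digitize(pred, np.sort(thresholds))


def tune_thresholds(y_true, pred, init=(0.5, 1.5, 2.5)):
    """Search for cut points that maximize QWK on out-of-fold predictions."""
    def loss(t):
        return -qwk(y_true, apply_thresholds(pred, t))

    result = minimize(loss, x0=np.asarray(init, dtype=float), method="Nelder-Mead")
    return np.sort(result.x)


# Synthetic, imbalanced example (hypothetical data): 400 rows, sii in 0..3
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
latent = X @ np.array([0.8, 0.5, 0.0, 0.0, 0.3]) + rng.normal(scale=0.5, size=400)
y = np.digitize(latent, np.quantile(latent, [0.5, 0.75, 0.9]))

oof = oof_predictions(X, y)
default_score = qwk(y, apply_thresholds(oof, (0.5, 1.5, 2.5)))
thresholds = tune_thresholds(y, oof)
tuned_score = qwk(y, apply_thresholds(oof, thresholds))
```

Tuning on out-of-fold predictions rather than on training-set predictions keeps the cut points from simply fitting the model's in-sample errors.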

CMI | Best Single Model

CMI | 1st Place Solution

A voting ensemble (for improved robustness) consisting of:

LGBMRegressor
Two XGBoostRegressors
CatBoostRegressor
ExtraTreesRegressor

Honorable Mentions 🌟

🔸 CMI | LSTM+VAE

🔸 CMI | TabNet

🔸 CMI | Advanced Feature Engineering

TorrentBrave/Zero2Hero-Plan-Child-Mind-Institute-Problematic-Internet-Use

Folders and files

Latest commit

History

Repository files navigation

CMI | Actigraphy data EDA

CMI | Features EDA

Understanding the task

CMI | EDA which makes sense

CMI | Best Single Model

CMI | 1st Place Solution

Honorable Mentions 🌟

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Packages