Add simple DatetimeEncoder example with periodic encoding (#1629) #1834
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -352,3 +352,78 @@ | |
| # features from a datetime column. | ||
| # Also check out the |TableVectorizer|, which automatically recognizes | ||
| # and transforms datetime columns by default. | ||
|
|
||
| """ | ||
| Simple example: understanding what DatetimeEncoder extracts | ||
| ------------------------------------------------------------ | ||
|
|
||
| This example demonstrates what the DatetimeEncoder does on a simple, | ||
| non-forecasting regression task. | ||
|
|
||
| We compare a naive baseline using a raw numeric timestamp with a model | ||
| that uses DatetimeEncoder with periodic encoding to extract meaningful | ||
| time-based features for a linear regression model. | ||
| """ | ||
|
|
||
| import numpy as np | ||
| import pandas as pd | ||
|
|
||
| from skrub import DatetimeEncoder, ApplyToCols | ||
| from sklearn.linear_model import LinearRegression | ||
| from sklearn.pipeline import make_pipeline | ||
| from sklearn.metrics import r2_score | ||
|
|
||
| # --------------------------------------------------------------------- | ||
| # Create a simple synthetic dataset | ||
| # --------------------------------------------------------------------- | ||
| rng = pd.date_range("2023-01-01", periods=300, freq="h") | ||
|
|
||
| X = pd.DataFrame({"date": rng}) | ||
|
|
||
| # Target depends on hour of day and weekday (cyclic pattern) | ||
| y = ( | ||
| 10 * X["date"].dt.hour.isin([8, 9, 17, 18]).astype(int) | ||
| + 5 * (X["date"].dt.weekday < 5).astype(int) | ||
| + np.random.normal(0, 1, size=len(X)) | ||
| ) | ||
|
|
||
| # --------------------------------------------------------------------- | ||
| # Baseline model: naive numeric timestamp | ||
| # --------------------------------------------------------------------- | ||
| X_baseline = pd.DataFrame( | ||
| {"timestamp": X["date"].astype("int64") // 10**9} | ||
| ) | ||
|
|
||
| model_baseline = LinearRegression() | ||
| model_baseline.fit(X_baseline, y) | ||
|
|
||
| y_pred_baseline = model_baseline.predict(X_baseline) | ||
| print("R² with naive timestamp:", r2_score(y, y_pred_baseline)) | ||
|
|
||
| # --------------------------------------------------------------------- | ||
|
Member
There should also be a section with a DatetimeEncoder that does not include periodic features, for the sake of comparison.
Member
Also, please add a few lines explaining in words what is happening here, as part of the narrative of the example.
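A note on why the periodic variant matters for this comparison: the PR uses `periodic_encoding="spline"`, but the sketch below swaps in a plain sine/cosine (circular) encoding, written with the standard library only, to illustrate the idea. It does not reproduce skrub's implementation, and `circular_encode` is a hypothetical helper introduced for this illustration.

```python
import math
from typing import Tuple


def circular_encode(value: float, period: float) -> Tuple[float, float]:
    """Map a periodic value (e.g. hour of day) onto the unit circle.

    Hypothetical helper for illustration; not part of skrub.
    """
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


# Gap between 23:00 and 00:00 when the hour is used as a raw number.
raw_gap = abs(23 - 0)

# Gap between the same two hours after circular encoding.
midnight_gap = math.dist(circular_encode(23, 24), circular_encode(0, 24))

# Gap between two adjacent midday hours, for reference.
midday_gap = math.dist(circular_encode(11, 24), circular_encode(12, 24))

print(f"raw gap 23h -> 0h:      {raw_gap}")
print(f"encoded gap 23h -> 0h:  {midnight_gap:.4f}")
print(f"encoded gap 11h -> 12h: {midday_gap:.4f}")
```

On the raw scale, 23:00 and 00:00 are 23 units apart even though they are adjacent on the clock. After encoding, that gap equals the gap between any other pair of adjacent hours, which is the property a linear model cannot get from a raw timestamp or a plain hour column.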
||
| # Model using DatetimeEncoder with periodic features | ||
| # --------------------------------------------------------------------- | ||
| model_datetime = make_pipeline( | ||
| ApplyToCols( | ||
| DatetimeEncoder( | ||
| add_weekday=True, | ||
| periodic_encoding="spline", | ||
| ), | ||
| cols=["date"], | ||
| ), | ||
| LinearRegression(), | ||
| ) | ||
|
|
||
| model_datetime.fit(X, y) | ||
| y_pred_datetime = model_datetime.predict(X) | ||
|
|
||
| print("R² with DatetimeEncoder:", r2_score(y, y_pred_datetime)) | ||
|
|
||
| # --------------------------------------------------------------------- | ||
| # Inspect generated datetime features | ||
| # --------------------------------------------------------------------- | ||
| apply = model_datetime.named_steps["applytocols"] | ||
| encoder = next(iter(apply.transformers_.values())) | ||
|
|
||
| print("Generated features:") | ||
| print(encoder.get_feature_names_out()) | ||
|
Member
Please add a conclusion briefly summarizing what is being done here.
||
Please expand this section a bit: explain that, for the sake of the example, we are using a synthetic dataset, and add a plot of what the target looks like.
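A text-only sketch of what such a plot might show, assuming only the deterministic part of the target from the diff (the Gaussian noise term is omitted). `noiseless_target` and `PEAK_HOURS` are hypothetical names, and the standard library stands in here for pandas and a plotting library.

```python
from datetime import datetime, timedelta

# Hours that receive the +10 bump in the PR's synthetic target.
PEAK_HOURS = {8, 9, 17, 18}


def noiseless_target(ts: datetime) -> int:
    """Deterministic part of the synthetic target (noise omitted).

    Hypothetical helper mirroring the formula in the diff.
    """
    rush = 10 if ts.hour in PEAK_HOURS else 0
    workday = 5 if ts.weekday() < 5 else 0  # Monday=0 ... Friday=4
    return rush + workday


# Same span as the diff: 300 hourly timestamps starting 2023-01-01.
start = datetime(2023, 1, 1)
timestamps = [start + timedelta(hours=i) for i in range(300)]
signal = [noiseless_target(ts) for ts in timestamps]

# Crude text plot of the first two days (a Sunday, then a Monday).
for ts, value in zip(timestamps[:48], signal[:48]):
    print(f"{ts:%a %H:%M} {'#' * value}")
```

The output shows two daily peaks at the rush hours and a constant offset on working days, which is the repeating structure the periodic features are meant to capture.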