Support for grouped custom metrics in workflows #371

Open
simdadim opened this issue Feb 23, 2023 · 1 comment
Labels
feature (a feature request or enhancement)

Comments


simdadim commented Feb 23, 2023

I'm working on a tabular dataset where multiple rows are linked to form a "session". Within each session, one of the rows is our target action, and the goal of the model is to find it and bump this row as far up within the session as possible (that is, predict this row with a higher probability than the other rows in the same session).

I've modeled this using a binary classifier. To evaluate performance, I want to see how far the target row moves up or down (ideally up) when I sort the rows within the session by each row's predicted probability of being the correct one. In other words, I want the target row within each session to have the highest predicted probability in that session.

The formula is quite simple: group by session_id, identify the target row in each session, and evaluate the relative distance it has moved when the rows are sorted by the new predicted probability.
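A minimal sketch of the per-session calculation, written in plain Python rather than as a yardstick metric, to show why the metric cannot be computed without the session identifier attached to every prediction. The tuple layout `(session_id, is_target, prob)`, the use of the original row order as the "before" position, the assumption of exactly one target row per session, and normalizing by the largest possible move (session size minus one) are all assumptions made for illustration; `target_rank_shift` and `mean_shift` are hypothetical names, not part of any library.

```python
from collections import defaultdict
from statistics import mean


def target_rank_shift(rows):
    """Hypothetical grouped metric: relative movement of the target row per session.

    rows: iterable of (session_id, is_target, prob) tuples in the original row order.
    Assumes exactly one target row per session.
    Returns {session_id: shift}; +1.0 means the target moved from the bottom to the
    top of its session, -1.0 means it moved from the top to the bottom.
    """
    # Group rows by session, preserving their original order.
    sessions = defaultdict(list)
    for session_id, is_target, prob in rows:
        sessions[session_id].append((is_target, prob))

    shifts = {}
    for session_id, members in sessions.items():
        n = len(members)
        # Position of the target row before re-sorting (0 = top of the session).
        old_pos = next(i for i, (flag, _) in enumerate(members) if flag)
        # Row indices sorted by descending predicted probability (stable for ties).
        order = sorted(range(n), key=lambda i: -members[i][1])
        new_pos = order.index(old_pos)
        # Positive = moved up; divide by the largest possible move in this session.
        shifts[session_id] = (old_pos - new_pos) / (n - 1) if n > 1 else 0.0
    return shifts


def mean_shift(shifts):
    """Average the per-session shifts into a single score."""
    return mean(shifts.values())


rows = [
    ("a", False, 0.2), ("a", False, 0.1), ("a", True, 0.7),  # target moves from last to first
    ("b", True, 0.3), ("b", False, 0.6),                     # target moves from first to last
]
shifts = target_rank_shift(rows)
print(shifts)              # {'a': 1.0, 'b': -1.0}
print(mean_shift(shifts))  # 0.0
```

The grouping step at the top of `target_rank_shift` is exactly the information that is lost if the predictions come back without `session_id`.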

However, I'm struggling to create a custom yardstick metric to calculate this, since the grouping of the data is not passed on to my custom evaluation function. I've tried different approaches, but as far as I can tell, the problem is that the predict_model() function drops the grouping of the data frame.

Is it possible to keep the grouping of a data frame in predict_model() and include the grouping variable? As I understand it, this would make it possible to develop custom metrics that account for grouped data. I imagine this would also mean that the .predictions column in the resample_results data frame keeps the groups.

EmilHvitfeldt added the feature (a feature request or enhancement) label on Mar 30, 2023
@EmilHvitfeldt (Member)

Hello @simdadim 👋

This is a good idea, we are currently thinking about how to best handle these types of metrics. We want to make sure our approach is sound and general enough to do everything we want.

I'm gonna keep this issue up to remind us of your request, but it will take a little while before we get to working on this problem.
