I'm working on a tabular dataset where multiple rows are linked to form a "session". Within each session, one of the rows is our target action, and the goal for the model is to find this row and move it as far up within the session as possible (that is, predict it with a higher probability than the other rows in the same session).
I've modeled this using a binary classifier. To evaluate performance, I want to see how far the target row moves up or down when I sort the rows within each session by their predicted probability of being the correct one. Stated otherwise: I want the target row to have the highest predicted probability within its session.
The formula is quite easy: group by session_id, identify the target row in each session, and evaluate the relative distance it has moved when the rows are sorted by the new predicted probability score.
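To make the calculation concrete, here is a rough sketch of what I mean. It is plain Python rather than yardstick code, and the function name `mean_target_shift`, the `(session_id, is_target, prob)` tuple layout, and the normalization by session length minus one are all assumptions made for this illustration, not part of any existing API.

```python
from collections import defaultdict


def mean_target_shift(rows):
    """Average relative shift of the target row across sessions.

    rows: iterable of (session_id, is_target, prob) tuples, listed in
    their original within-session order. (Hypothetical layout.)
    A shift of +1 means the target moved from last to first place,
    -1 means it moved from first to last, 0 means it did not move.
    """
    # Group rows by session while preserving their original order.
    sessions = defaultdict(list)
    for session_id, is_target, prob in rows:
        sessions[session_id].append((is_target, prob))

    shifts = []
    for items in sessions.values():
        n = len(items)
        targets = [i for i, (is_target, _) in enumerate(items) if is_target]
        # Assumption: skip sessions where the shift is undefined
        # (a single row, or not exactly one target row).
        if n < 2 or len(targets) != 1:
            continue
        original_pos = targets[0]
        # Stable sort by descending probability; ties keep original order.
        order = sorted(range(n), key=lambda i: -items[i][1])
        new_pos = order.index(original_pos)
        # Positive when the target moved up, scaled to [-1, 1].
        shifts.append((original_pos - new_pos) / (n - 1))

    return sum(shifts) / len(shifts) if shifts else float("nan")


# Toy data: two sessions, one target row each.
rows = [
    ("a", 0, 0.2), ("a", 0, 0.1), ("a", 1, 0.9),  # target moves from last to first
    ("b", 0, 0.6), ("b", 1, 0.3),                 # target stays in last place
]
result = mean_target_shift(rows)
print(result)  # 0.5 for the toy data above
```

The important point is that the function needs the session identifier alongside the truth and the predicted probability, which is exactly the information that gets lost when the grouping is dropped.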
However, I'm struggling to create a custom yardstick metric to calculate this, because the grouping of the data is not passed on to my custom evaluation function. I've tried different approaches, but from what I can see, the problem is that the predict_model function drops the grouping of the data frame.
Is it possible to keep the grouping of a data frame in predict_model() and include the grouping variable? To my understanding, this would make it possible to develop custom metrics that account for grouped data. I imagine this means that the .predictions column in the resample_results data frame would also keep the groups.
This is a good idea. We are currently thinking about how best to handle these types of metrics, and we want to make sure our approach is sound and general enough to do everything we want.
I'm gonna keep this issue up to remind us of your request, but it will take a little while before we get to working on this problem.