Add sequential example #24
Conversation
Functions need a simple docstring.
examples/sequential/preprocessing.py
Outdated
from tf_tabular.utils import get_vocab


def divide_ratings_by_mean_user_rating(ratings: pd.DataFrame, user_id_column='user_id'):
Why divide? Please add a simple docstring: "Subtract mean rating for each user".
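One way the suggestion could look. Only the function signature appears in the diff above, so the body below is an illustrative assumption (it centers each user's ratings by subtracting that user's mean, which is what the suggested docstring describes); the column names `user_id` and `user_rating` are taken from the other snippets in this PR.

```python
import pandas as pd


def center_ratings_by_user_mean(ratings: pd.DataFrame, user_id_column: str = "user_id") -> pd.DataFrame:
    """Subtract mean rating for each user."""
    # transform("mean") keeps the original index, so the subtraction
    # aligns row by row with the input frame
    user_mean = ratings.groupby(user_id_column)["user_rating"].transform("mean")
    return ratings.assign(user_rating=ratings["user_rating"] - user_mean)
```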
examples/sequential/preprocessing.py
Outdated
ratings = ratings[["user_id", "movie_id", "user_rating"]].groupby(["user_id"], as_index=False).agg(list)


def cutoff(x):
    return min(int(len(x) * 0.2), max_y_cutoff)
parametrize 0.2
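A sketch of what the parametrized version might look like; the parameter name `y_ratio` is a hypothetical choice, and the default of 5 for `max_y_cutoff` is taken from the call site shown later in this thread.

```python
def cutoff(x, y_ratio: float = 0.2, max_y_cutoff: int = 5) -> int:
    """Return the number of trailing items to hold out for a user."""
    # hold out a fraction of the user's history, capped at max_y_cutoff items
    return min(int(len(x) * y_ratio), max_y_cutoff)
```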
examples/sequential/preprocessing.py
Outdated
def preprocess_dataset(ratings_df, movies_df):
    train_df, val_df = split_by_user(ratings_df, max_y_cutoff=5)
parametrize 5 in max_y_cutoff
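The suggestion amounts to threading `max_y_cutoff` through `preprocess_dataset` instead of hard-coding 5 at the call site. A self-contained sketch, with an illustrative stand-in for `split_by_user` (the real one lives in the example module and its body is not shown in this diff):

```python
import pandas as pd


def split_by_user(ratings_df: pd.DataFrame, max_y_cutoff: int = 5):
    """Illustrative stand-in: hold out each user's last ratings as targets."""
    train_rows, val_rows = [], []
    for _, group in ratings_df.groupby("user_id"):
        # same cutoff rule as in the diff above: 20% of history, capped
        k = min(int(len(group) * 0.2), max_y_cutoff)
        train_rows.append(group.iloc[: len(group) - k])
        val_rows.append(group.iloc[len(group) - k:])
    return pd.concat(train_rows), pd.concat(val_rows)


def preprocess_dataset(ratings_df, movies_df, max_y_cutoff: int = 5):
    """Split ratings into train/validation sets per user."""
    # forward the parameter instead of hard-coding 5 here
    return split_by_user(ratings_df, max_y_cutoff=max_y_cutoff)
```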
src/tf_tabular/utils.py
Outdated
@@ -142,7 +145,7 @@ def build_categorical_input(name, embedding_dim, vocab, is_multi_hot, embedding_
     return (x, inp)


-def get_vocab(series, max_size):
+def get_vocab(series, max_size: int = None):
Optional[int] or int | None
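The point being made: a default of `None` is not a valid value for a plain `int` annotation, so the hint should be `Optional[int]` (or `int | None` on Python 3.10+). A sketch of the corrected signature; the body here is an illustrative assumption, since the real implementation in `tf_tabular.utils` is not shown in this diff.

```python
from typing import Optional

import pandas as pd


def get_vocab(series: pd.Series, max_size: Optional[int] = None) -> list:
    """Return the unique values of a series, most frequent first."""
    # illustrative body: value_counts() sorts by descending frequency
    vocab = series.value_counts().index.tolist()
    return vocab if max_size is None else vocab[:max_size]
```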
examples/sequential/preprocessing.py
Outdated
val_users = unique_users[: int(num_users * 0.2)]
train_users = unique_users[int(num_users * 0.2) :]
replace 0.2 with val_split
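What the suggested change might look like, pulled into a small helper for clarity (the helper itself is a hypothetical refactoring, not something in the diff):

```python
def split_users(unique_users, val_split: float = 0.2):
    """Partition the user list into validation and train users."""
    # the same fraction drives both slices, so a single parameter suffices
    n_val = int(len(unique_users) * val_split)
    return unique_users[:n_val], unique_users[n_val:]
```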
Still seeing a mix of imperative and descriptive docstrings
Otherwise, LGTM!
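For context, PEP 257 recommends the imperative mood ("Subtract ...") over the descriptive third person ("Subtracts ..."). A minimal illustration of the two styles the reviewer is contrasting (function names are hypothetical):

```python
def normalize_ratings(ratings):
    """Subtract the per-user mean from each rating."""  # imperative (PEP 257)
    ...


def normalize_ratings_descriptive(ratings):
    """Subtracts the per-user mean from each rating."""  # descriptive
    ...
```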
examples/sequential/preprocessing.py
Outdated
train_set = ratings[ratings.user_id.isin(train_users)]
val_set = ratings[ratings.user_id.isin(val_users)]

print(f"Train set size: {train_set.shape}")
I don't much like having prints in the code. Can we change them to logging.info?
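A sketch of the swap: use lazy `%s` formatting so the message is only built when INFO is enabled. The helper name is hypothetical; only the `print` line appears in the diff.

```python
import logging

logger = logging.getLogger(__name__)


def report_split_sizes(train_set, val_set):
    """Log split sizes instead of printing them."""
    # %s-style arguments defer string formatting until the record is emitted
    logger.info("Train set size: %s", train_set.shape)
    logger.info("Validation set size: %s", val_set.shape)
```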
LGTM
Also added some fixes needed for this use case and removed the unused conftest file.