Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add sequential example #24

Merged
merged 7 commits into from
May 8, 2024
Merged

Add sequential example #24

merged 7 commits into from
May 8, 2024

Conversation

mats-claassen
Copy link
Member

Also added some fixes needed for this use case and removed unused conftest file

Copy link

@glema-xmartlabs glema-xmartlabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

functions need simple doctring

from tf_tabular.utils import get_vocab


def divide_ratings_by_mean_user_rating(ratings: pd.DataFrame, user_id_column='user_id'):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why divide?
please add a simple doc string "Substract mean rating for each user"

ratings = ratings[["user_id", "movie_id", "user_rating"]].groupby(["user_id"], as_index=False).agg(list)

def cutoff(x):
return min(int(len(x) * 0.2), max_y_cutoff)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parametrize 0.2



def preprocess_dataset(ratings_df, movies_df):
train_df, val_df = split_by_user(ratings_df, max_y_cutoff=5)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parametrize 5 in max_y_cutoff

@@ -142,7 +145,7 @@ def build_categorical_input(name, embedding_dim, vocab, is_multi_hot, embedding_
return (x, inp)


def get_vocab(series, max_size):
def get_vocab(series, max_size: int = None):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional[int] or int | None

Comment on lines 59 to 60
val_users = unique_users[: int(num_users * 0.2)]
train_users = unique_users[int(num_users * 0.2) :]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replace 0.2 with val_split

Copy link

@glema-xmartlabs glema-xmartlabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still seeing a mix of imperative and descriptive docstrings
Otherwise, LGTM!

train_set = ratings[ratings.user_id.isin(train_users)]
val_set = ratings[ratings.user_id.isin(val_users)]

print(f"Train set size: {train_set.shape}")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like much the idea of having prints in the code. Can we change them for logging.info?

Copy link

@glema-xmartlabs glema-xmartlabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mats-claassen mats-claassen merged commit 4ab8c7b into main May 8, 2024
@mats-claassen mats-claassen deleted the examples/sequential branch May 8, 2024 14:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants