Hello!
I'm a newbie to recommender systems and Cython, so I was trying to understand what is going on by reviewing the ALS code without Cython written here.
After understanding the mathematical derivation, I now want to compare the ALS results from the implicit library with those from the plain least squares written in here.
The reproducible code is given below. Please note that, to reproduce the results, I fixed user_factors and item_factors using np.random.seed(1).
import numpy as np
from numpy.testing import assert_almost_equal

import implicit
from implicit.datasets.movielens import get_movielens
from implicit.nearest_neighbours import bm25_weight
from implicit.utils import nonzeros


def least_squares(Cui, X, Y, regularization, num_threads=0):
    """For each user in Cui, calculate factors Xu for them
    using least squares on Y.

    Note: this is at least 10 times slower than the cython version included
    here.
    """
    users, n_factors = X.shape
    YtY = Y.T.dot(Y)

    for u in range(users):
        X[u] = user_factor(Y, YtY, Cui, u, regularization, n_factors)
    return X


def user_linear_equation(Y, YtY, Cui, u, regularization, n_factors):
    # Xu = (YtCuY + regularization * I)^-1 (YtCuPu)
    # YtCuY + regularization * I = YtY + regularization * I + Yt(Cu-I)Y

    # accumulate YtCuY + regularization*I in A
    A = YtY + regularization * np.eye(n_factors)

    # accumulate YtCuPu in b
    b = np.zeros(n_factors)

    for i, confidence in nonzeros(Cui, u):
        factor = Y[i]

        if confidence > 0:
            b += confidence * factor
        else:
            confidence *= -1

        A += (confidence - 1) * np.outer(factor, factor)
    return A, b


def user_factor(Y, YtY, Cui, u, regularization, n_factors):
    # Xu = (YtCuY + regularization * I)^-1 (YtCuPu)
    A, b = user_linear_equation(Y, YtY, Cui, u, regularization, n_factors)
    return np.linalg.solve(A, b)


def item_factor(X, XtX, Cui, u, regularization, n_factors):
    # Yu = (XtCuX + regularization * I)^-1 (XtCuPu)
    A, b = user_linear_equation(X, XtX, Cui, u, regularization, n_factors)
    return np.linalg.solve(A, b)


params = {
    'factors': 100,
    'iterations': 1,
    'regularization': 0.01,
    'random_state': 42
}

imp_als = implicit.als.AlternatingLeastSquares(**params)

titles, ratings = get_movielens("1m")
min_rating = 4

# remove things < min_rating, and convert to implicit dataset
# by considering ratings as a binary preference only
ratings.data[ratings.data < min_rating] = 0
ratings.eliminate_zeros()
ratings.data = np.ones(len(ratings.data))

ratings = (bm25_weight(ratings, B=0.9) * 5).tocsr()
user_ratings = ratings.T.tocsr()

M, N = user_ratings.shape

np.random.seed(1)
user_factors = np.random.rand(M, params["factors"]).astype(np.float32) * 0.01
item_factors = np.random.rand(N, params["factors"]).astype(np.float32) * 0.01

imp_als.user_factors = user_factors.copy()
imp_als.item_factors = item_factors.copy()
imp_als.fit(user_ratings)

for _ in range(params["iterations"]):
    user_factors = least_squares(user_ratings, user_factors, item_factors, 0.01, num_threads=0)
    item_factors = least_squares(user_ratings.T.tocsr(), item_factors, user_factors, 0.01, num_threads=0)

assert_almost_equal(imp_als.user_factors, user_factors)
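As a sanity check on the derivation used in user_linear_equation, the identity YtCuY = YtY + Yt(Cu-I)Y can be verified numerically on a small dense example. This is only a sketch: the matrix sizes and the confidence values are made up for illustration, and Cu is built as a dense diagonal matrix, which the real code avoids.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, n_factors = 6, 3
Y = rng.random((n_items, n_factors))

# Hypothetical confidence values for one user: 1 for unobserved items,
# larger values for observed ones.
confidence = np.array([1.0, 4.0, 1.0, 2.5, 1.0, 1.0])
Cu = np.diag(confidence)

# Direct (dense) computation of YtCuY.
direct = Y.T @ Cu @ Y

# Sparse-friendly form: start from YtY and add one rank-one update
# for each item whose confidence differs from 1.
accumulated = Y.T @ Y
for i in np.flatnonzero(confidence != 1.0):
    accumulated += (confidence[i] - 1.0) * np.outer(Y[i], Y[i])

identity_holds = np.allclose(direct, accumulated)
print(identity_holds)
```

The loop only touches the items with non-unit confidence, which is why the code above iterates over nonzeros(Cui, u) instead of all items.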
Because both implicit and the least_squares function start optimization from the same user_factors and item_factors, I expected the same parameters after one iteration. However, they are slightly different.
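To put a number on "slightly different", a small helper along the following lines could report the largest absolute and relative gaps between two factor matrices. The helper name summarize_difference and the two tiny arrays are hypothetical stand-ins, not actual results from the run above. Note that assert_almost_equal checks to 7 decimals by default, so it fails on any gap much larger than about 1e-7.

```python
import numpy as np
from numpy.testing import assert_allclose


def summarize_difference(a, b):
    """Return the largest absolute and relative element-wise differences."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    abs_diff = np.abs(a - b)
    # Guard against division by zero when an entry of b is exactly 0.
    rel_diff = abs_diff / np.maximum(np.abs(b), np.finfo(np.float64).tiny)
    return abs_diff.max(), rel_diff.max()


# Synthetic stand-ins for the two factor matrices (not real results).
reference = np.array([[0.50, 0.25], [0.10, 0.40]])
perturbed = reference + np.array([[0.01, -0.02], [0.00, 0.01]])

max_abs, max_rel = summarize_difference(perturbed, reference)
print(max_abs, max_rel)

# Passes with a loose absolute tolerance; the default 7-decimal check
# of assert_almost_equal would reject the same pair.
assert_allclose(perturbed, reference, rtol=0, atol=0.05)
```

In the script above, the same helper could be called as summarize_difference(imp_als.user_factors, user_factors).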
user_factors[:10,:10] from least_squares is given below
imp_als.user_factors[:10,:10] from implicit is given below
We can see that the numbers differ slightly, generally starting from the second decimal place.
Is there anything that I'm missing? Or are these differences tolerable? Any comments are much appreciated.
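One possible contributor, offered as a guess rather than a diagnosis: if the library's default solver is an iterative conjugate-gradient method limited to a few steps (an assumption worth checking against the installed version and its settings), it would only approximate the exact np.linalg.solve result. The sketch below uses plain NumPy and synthetic data, with no implicit calls, to show how a truncated conjugate-gradient solve can differ from the exact solution of the same regularized system. The function conjugate_gradient and all sizes are illustrative.

```python
import numpy as np


def conjugate_gradient(A, b, x0, steps):
    """Run at most `steps` conjugate-gradient iterations on A x = b."""
    x = x0.astype(np.float64).copy()
    r = b - A @ x          # residual
    p = r.copy()           # search direction
    rs_old = r @ r
    for _ in range(steps):
        if rs_old < 1e-20:  # already converged
            break
        Ap = A @ p
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x


rng = np.random.default_rng(1)
n_items, n_factors = 200, 20
Y = rng.random((n_items, n_factors))

# Symmetric positive-definite system, shaped like YtY + regularization * I.
A = Y.T @ Y + 0.01 * np.eye(n_factors)
b = Y.T @ rng.random(n_items)
x0 = rng.random(n_factors) * 0.01

exact = np.linalg.solve(A, b)
few_steps = conjugate_gradient(A, b, x0, steps=3)
many_steps = conjugate_gradient(A, b, x0, steps=50)

error_few = np.abs(few_steps - exact).max()
error_many = np.abs(many_steps - exact).max()
print(error_few, error_many)
```

If this is the cause, the gap should shrink as the number of conjugate-gradient steps grows, or when the library is configured to use an exact solver.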