
Commit d4181cf

Merge pull request #1687 from microsoft/staging
Staging to main: New Release 1.1.0
2 parents 6987858 + 48b70d5 commit d4181cf

File tree

11 files changed (+233 -51 lines)

11 files changed

+233
-51
lines changed

.readthedocs.yaml (+1)

@@ -6,6 +6,7 @@ build:
     - cmake
 
 # Explicitly set the version of Python and its requirements
+# The flat extra_requirements all is equivalent to: pip install .[all]
 python:
   version: "3.7"
   install:
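The new comment is worth unpacking: the `extra_requirements` entry in `.readthedocs.yaml` maps onto a setuptools extra, so listing `all` there installs the same optional dependencies as `pip install .[all]`. A minimal sketch of how an `all` extra can be declared; the extras names and package lists below are illustrative assumptions, not the actual contents of the repository's setup.py:

```python
# Sketch of the setuptools mechanism behind `pip install .[all]`.
# The "examples" and "spark" groups are hypothetical placeholders.
from setuptools import setup, find_packages

extras_require = {
    "examples": ["jupyter"],  # hypothetical extra
    "spark": ["pyspark"],     # hypothetical extra
}
# "all" is the union of the other extras, so `pip install .[all]`
# pulls in every optional dependency in one shot.
extras_require["all"] = sorted(
    {pkg for group in extras_require.values() for pkg in group}
)

setup(
    name="recommenders",
    packages=find_packages(),
    extras_require=extras_require,
)
```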

NEWS.md (+9)

@@ -1,5 +1,14 @@
 # What's New
 
+## Update April 1, 2022
+
+We have a new release [Recommenders 1.1.0](https://github.com/microsoft/recommenders/releases/tag/1.1.0)!
+We have introduced the SASRec and SSEPT algorithms, which are based on transformers.
+In addition, we have now enabled Python 3.8 and 3.9.
+We have also made improvements to the SARPlus algorithm, including support for Azure Synapse and Spark 3.2.
+There are also bug fixes and improvements to NCF, RBM, LightGBM, LightFM, Scikit-Surprise, the stratified splitter, and the Dockerfile,
+as well as an upgrade to Scikit-Learn 1.0.2.
+
 ## Update January 13, 2022
 
 We have a new release [Recommenders 1.0.0](https://github.com/microsoft/recommenders/releases/tag/1.0.0)! The codebase has now migrated to TensorFlow versions 2.6 / 2.7 and to Spark version 3. In addition, there are a few changes in the dependencies and extras installed by `pip` (see [this guide](recommenders/README.md#optional-dependencies)). We have also made improvements in the code and the CI / CD pipelines.

README.md (+8 -3)

@@ -2,9 +2,14 @@
 
 [![Documentation Status](https://readthedocs.org/projects/microsoft-recommenders/badge/?version=latest)](https://microsoft-recommenders.readthedocs.io/en/latest/?badge=latest)
 
-## What's New (January 13, 2022)
-
-We have a new release [Recommenders 1.0.0](https://github.com/microsoft/recommenders/releases/tag/1.0.0)! The codebase has now migrated to TensorFlow versions 2.6 / 2.7 and to Spark version 3. In addition, there are a few changes in the dependencies and extras installed by `pip` (see [this guide](recommenders/README.md#optional-dependencies)). We have also made improvements in the code and the CI / CD pipelines.
+## What's New (April 1, 2022)
+
+We have a new release [Recommenders 1.1.0](https://github.com/microsoft/recommenders/releases/tag/1.1.0)!
+We have introduced the SASRec and SSEPT algorithms, which are based on transformers.
+In addition, we have now enabled Python 3.8 and 3.9.
+We have also made improvements to the SARPlus algorithm, including support for Azure Synapse and Spark 3.2.
+There are also bug fixes and improvements to NCF, RBM, LightGBM, LightFM, Scikit-Surprise, the stratified splitter, and the Dockerfile,
+as well as an upgrade to Scikit-Learn 1.0.2.
 
 Starting with release 0.6.0, Recommenders has been available on PyPI and can be installed using pip!
 

docs/source/models.rst (+19)

@@ -213,6 +213,25 @@ SAR
 .. automodule:: recommenders.models.sar.sar_singlenode
     :members:
 
+SASRec
+******************************
+
+.. automodule:: recommenders.models.sasrec.model
+    :members:
+
+.. automodule:: recommenders.models.sasrec.sampler
+    :members:
+
+.. automodule:: recommenders.models.sasrec.util
+    :members:
+
+
+SSE-PT
+******************************
+
+.. automodule:: recommenders.models.sasrec.ssept
+    :members:
+
 
 Surprise
 ******************************
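Since SASRec and SSE-PT are the headline additions in this release, here is a conceptual sketch of the core SASRec idea: causal self-attention over a user's item-interaction sequence. This is plain TensorFlow/Keras for illustration only, not the `recommenders.models.sasrec` implementation; all shapes and hyperparameters are assumptions.

```python
# Conceptual SASRec-style block: embed an item sequence and apply causal
# self-attention so each position only sees earlier interactions.
# Illustrative only; not the recommenders.models.sasrec API.
import tensorflow as tf

seq_max_len, num_items, hidden_dim = 50, 10000, 64  # assumed sizes

items = tf.keras.Input(shape=(seq_max_len,), dtype=tf.int32)
x = tf.keras.layers.Embedding(num_items + 1, hidden_dim)(items)

# Lower-triangular (causal) mask: position t may attend only to positions <= t.
causal_mask = tf.cast(
    tf.linalg.band_part(tf.ones((1, seq_max_len, seq_max_len)), -1, 0), tf.bool
)
attn = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=hidden_dim)(
    x, x, attention_mask=causal_mask
)
x = tf.keras.layers.LayerNormalization()(x + attn)             # residual + norm
ffn = tf.keras.layers.Dense(hidden_dim, activation="relu")(x)  # position-wise FFN
x = tf.keras.layers.LayerNormalization()(x + ffn)

# Each output position is a representation used to score candidate next items.
model = tf.keras.Model(items, x)
```

SSE-PT extends this scheme with a user embedding alongside the item embeddings, regularized with stochastic shared embeddings, which is why its module lives next to SASRec in `recommenders.models.sasrec.ssept`.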

examples/00_quick_start/sequential_recsys_amazondataset.ipynb (+2 -2)

@@ -144,9 +144,9 @@
 "\n",
 "Only the SLi_Rec model is time-aware. For the other models, you can just pad some meaningless timestamp in the data files to fill up the format, the models will ignore these columns.\n",
 "\n",
-"We use Softmax to the loss function. In training and evalution stage, we group 1 positive instance with num_ngs negative instances. Pair-wise ranking can be regarded as a special case of Softmax ranking, where num_ngs is set to 1. \n",
+"We use Softmax in the loss function. In the training and evaluation stages, we group 1 positive instance with `num_ngs` negative instances. Pair-wise ranking can be regarded as a special case of softmax ranking, where `num_ngs` is set to 1. \n",
 "\n",
-"More specifically, for training and evalation, you need to organize the data file such that each one positive instance is followd by num_ngs negative instances. Our program will take 1+num_ngs lines as a unit for Softmax calculation. num_ngs is a parameter you need to pass to the `prepare_hparams`, `fit` and `run_eval` function. `train_num_ngs` in `prepare_hparams` denotes the number of negative instances for training, where a recommended number is 4. `valid_num_ngs` and `num_ngs` in `fit` and `run_eval` denote the number in evalution. In evaluation, the model calculates metrics among the 1+num_ngs instances. For the `predict` function, since we only need to calcuate a socre for each individual instance, there is no need for num_ngs setting. More details and examples will be provided in the following sections.\n",
+"More specifically, for training and evaluation, you need to organize the data file such that each positive instance is followed by `num_ngs` negative instances. Our program will take `1+num_ngs` lines as a unit for the Softmax calculation. `num_ngs` is a parameter you need to pass to the `prepare_hparams`, `fit` and `run_eval` functions. `train_num_ngs` in `prepare_hparams` denotes the number of negative instances for training, where a recommended number is 4. `valid_num_ngs` and `num_ngs` in `fit` and `run_eval` denote the number in evaluation. In evaluation, the model calculates metrics among the `1+num_ngs` instances. For the `predict` function, since we only need to calculate a score for each individual instance, there is no need for a `num_ngs` setting. More details and examples will be provided in the following sections.\n",
 "\n",
 "For training stage, if you don't want to prepare negative instances, you can just provide positive instances and set the parameter `need_sample=True, train_num_ngs=train_num_ngs` for function `prepare_hparams`, our model will dynamicly sample `train_num_ngs` instances as negative samples in each mini batch.\n",
 "\n",

recommenders/__init__.py (+1 -1)

@@ -2,7 +2,7 @@
 # Licensed under the MIT License.
 
 __title__ = "Microsoft Recommenders"
-__version__ = "1.0.0"
+__version__ = "1.1.0"
 __author__ = "RecoDev Team at Microsoft"
 __license__ = "MIT"
 __copyright__ = "Copyright 2018-present Microsoft Corporation"

recommenders/evaluation/spark_evaluation.py (+7 -5)

@@ -1,6 +1,7 @@
 # Copyright (c) Microsoft Corporation. All rights reserved.
 # Licensed under the MIT License.
 
+import numpy as np
 try:
     from pyspark.mllib.evaluation import RegressionMetrics, RankingMetrics
     from pyspark.sql import Window, DataFrame
@@ -99,13 +100,13 @@ def __init__(
             raise ValueError("Schema of rating_pred not valid. Missing Prediction Col")
 
         self.rating_true = self.rating_true.select(
-            col(self.col_user).cast("double"),
-            col(self.col_item).cast("double"),
+            col(self.col_user),
+            col(self.col_item),
             col(self.col_rating).cast("double").alias("label"),
         )
         self.rating_pred = self.rating_pred.select(
-            col(self.col_user).cast("double"),
-            col(self.col_item).cast("double"),
+            col(self.col_user),
+            col(self.col_item),
             col(self.col_prediction).cast("double").alias("prediction"),
         )
 
@@ -158,7 +159,8 @@ def exp_var(self):
             0
         ]
         var2 = self.y_pred_true.selectExpr("variance(label)").collect()[0][0]
-        return 1 - var1 / var2
+        # numpy divide is more tolerant to var2 being zero
+        return 1 - np.divide(var1, var2)
 
 
 class SparkRankingEvaluation:
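The `np.divide` change matters when the label column is constant: dividing plain Python floats raises `ZeroDivisionError`, while NumPy returns `inf`/`nan` with a `RuntimeWarning`, so `exp_var` degrades to a sentinel value instead of crashing. A standalone illustration:

```python
# Why np.divide is more forgiving than the `/` operator when the
# denominator (the label variance) is zero.
import numpy as np

var1, var2 = 0.5, 0.0  # e.g. varying residuals but constant labels

# 1 - var1 / var2      # plain Python floats: raises ZeroDivisionError
print(1 - np.divide(var1, var2))  # -inf (plus a RuntimeWarning), no crash
```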
