SSP-MMC

Stochastic-Shortest-Path-Minimize-Memorization-Cost (SSP-MMC) is a spaced repetition scheduling algorithm used to help learners remember more words in MaiMemo, a language learning application in China.

This repository contains a public release of the data and code used for several experiments in the following paper (which introduces SSP-MMC):

Junyao Ye, Jingyong Su, and Yilong Cao. 2022. A Stochastic Shortest Path Algorithm for Optimizing Spaced Repetition Scheduling. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. ACM, 4381–4390.

You can access this paper at: https://www.maimemo.com/paper/

When using this data set and/or software, please cite this publication. A BibTeX record is:

@inproceedings{10.1145/3534678.3539081,
author = {Ye, Junyao and Su, Jingyong and Cao, Yilong},
title = {A Stochastic Shortest Path Algorithm for Optimizing Spaced Repetition Scheduling},
year = {2022},
publisher = {ACM},
doi = {10.1145/3534678.3539081},
pages = {4381–4390},
numpages = {10}
}

Software

The file data_preprocessing.py is used to preprocess data for the DHP model.

The file cal_model_param.py contains the DHP model and HLR model.

The file model/utils.py saves the parameters of the DHP model for training and simulation.

The file algo/main.cpp contains a Cpp implementation of SSP-MMC, which aims at finding the optimal policy.

The file simulator.py provides an environment for comparing different scheduling algorithms.

Workflow

Run data_preprocessing.py -> halflife_for_fit.tsv
Run cal_model_param.py -> intercept_ and coef_ for the DHP model
Save the parameters to the function cal_recall_halflife and cal_forget_halflife in model/utils.py and the function cal_next_recall_halflife in algo/main.cpp
Run algo/main.cpp -> optimal policy in algo/result/
Run simulator.py to compare the SSP-MMC with several baselines.

Data Set and Format

The dataset is available on Dataverse (1.6 GB). This is a 7zipped TSV file containing our experiments' 220 million MaiMemo student memory behavior logs.

The columns are as follows:

u - student user ID who reviewed the word (anonymized)
w - spelling of the word
i - total times the user has reviewed the word
d - difficulty of the word
t_history - interval sequence of the historic reviews
r_history - recall sequence of the historic reviews
delta_t - time elapsed from the last review
r - result of the review
p_recall - probability of recall
total_cnt - number of users who did the same memory behavior

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
algo		algo
data		data
fit_result		fit_result
model		model
plot		plot
simulation_result		simulation_result
.gitignore		.gitignore
CITATION.cff		CITATION.cff
LICENSE		LICENSE
README.md		README.md
cal_model_param.py		cal_model_param.py
data_preprocessing.py		data_preprocessing.py
simulator.py		simulator.py
visualization.py		visualization.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SSP-MMC

Software

Workflow

Data Set and Format

About

Contributors 2

Languages

License

maimemo/SSP-MMC

Folders and files

Latest commit

History

Repository files navigation

SSP-MMC

Software

Workflow

Data Set and Format

About

Topics

Resources

License

Stars

Watchers

Forks

Contributors 2

Languages