This repository contains the implementation of the paper:

**Context-aware multi-head self-attentional neural network model for next location prediction**
Ye Hong, Yatao Zhang, Konrad Schindler, Martin Raubal
| MIE, ETH Zurich | FRS, Singapore-ETH Centre | PRS, ETH Zurich |
## Requirements and dependencies

This code has been tested on:
- Python 3.9.12, trackintel 1.2.4, gensim 4.1.2, PyTorch 1.12.1, transformers 4.16.2, cudatoolkit 11.3, GeForce RTX 3090
To create a virtual environment and install the required dependencies, run the following in your working folder:

```shell
git clone https://github.com/mie-lab/location-prediction.git
cd location-prediction
conda env create -f environment.yml
conda activate loc-pred
```
## Folder structure

The respective code files are stored in separate modules:

- `/preprocessing/*`: Functions used for preprocessing the dataset; they should be executed before training a model. `poi.py` includes POI preprocessing and embedding methods (LDA and TF-IDF).
- `/models/*`: Implementation of the Transformer learning model.
- `/baselines/*`: (Non-ML) baseline methods that we implemented to compare with the proposed model. The methods include persistent forecast, most frequent forecast, and Markov models.
- `/config/*`: Hyperparameter settings, saved in `.yml` files under the respective dataset folder under `config/`. For example, `/config/geolife/transformer.yml` contains the hyperparameter settings of the transformer model for the Geolife dataset.
- `/utils/*`: Helper functions used for model training.
- `/analysis/*`: Analysis functions for obtaining dataset properties and visualizing training results of the model. `entropy.py` includes functions to calculate the random, uncorrelated, and real entropy (a conceptual sketch follows this list). `stats.py` includes functions to calculate the mobility motifs.
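As a pointer to what `entropy.py` computes, here is a minimal, self-contained sketch of the random and uncorrelated entropy of a visit sequence. This is a conceptual illustration with assumed function names, not the repo's actual API; the real entropy, which additionally accounts for the order of visits (via a Lempel-Ziv-style estimator), is omitted for brevity.

```python
import numpy as np

def random_entropy(locations):
    """Random entropy: log2 of the number of distinct visited locations."""
    return np.log2(len(set(locations)))

def uncorrelated_entropy(locations):
    """Uncorrelated (Shannon) entropy from visit frequencies, ignoring visit order."""
    _, counts = np.unique(locations, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# Example: a user's sequence of visited location IDs
sequence = [1, 2, 1, 3, 1, 2]
print(random_entropy(sequence))        # log2(3) ~ 1.58
print(uncorrelated_entropy(sequence))  # ~ 1.46
```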
The main starting points for training a model are (see the example below):

- `main.py` for starting the deep learning model training.
- `main_individual.py` for starting the training of individual models.
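For example, `main.py` is invoked with a config file (see the Geolife steps below); assuming `main_individual.py` follows the same pattern with an `ind_`-prefixed config, the calls would look like:

```shell
# collective model
python main.py config/geolife/transformer.yml

# individual models (the ind_-prefixed config name is an assumption)
python main_individual.py config/geolife/ind_transformer.yml
```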
The repo contains different model variations, which can be controlled as follows:

- Individual vs. collective model: run `main.py` or `main_individual.py`. Config files for individual models contain `ind_` as the prefix.
- Including different contexts: whether to include a specific context can be controlled in the config files, with the `if_embed_user`, `if_embed_poi`, `if_embed_time`, and `if_embed_duration` parameters.
- Including different previous days: the length of the considered historical previous days can be controlled through the `previous_day` parameter in each config file.
- Including separate previous days: the selection of single historical previous days can be controlled through the `day_selection` parameter in each config file. `default` includes all days, and a specific day selection can be passed as a list, e.g., `[0, 1, 7]` to include only the current day, the previous day, and the day one week before (see the config sketch after this list).
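For illustration, these parameters might appear in a config file as follows. This is a minimal sketch: the keys are the parameters named above, but the values are hypothetical placeholders; check the actual files under `config/` for the settings used in the paper.

```yaml
# illustrative snippet in the style of config/geolife/transformer.yml
# (values are placeholders, not the published settings)
if_embed_user: True       # include user context
if_embed_poi: True        # include POI context
if_embed_time: True       # include time context
if_embed_duration: True   # include activity-duration context

previous_day: 7           # number of historical days considered
day_selection: default    # or a list, e.g., [0, 1, 7]
```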
## Reproducing results on the Geolife dataset

To run the whole pipeline on the Geolife dataset, follow the steps below:

1. Download the repo, and install the necessary Requirements and dependencies.
2. Download the Geolife GPS tracking dataset from here. Create a new folder in the repo root and name it `data`. Unzip and copy the Geolife `Data` folder into `data/`. The file structure should look like `data/Data/000/...`.
3. Create a file `paths.json` in the repo root, and define your working directories by writing:

    ```json
    {
        "raw_geolife": "./data/Data"
    }
    ```
4. Run `python preprocessing/geolife.py 20` to execute the preprocessing script for the Geolife dataset. The process takes 15-30 min. `dataSet_geolife.csv`, `sp_time_temp_geolife.csv`, and `valid_ids_geolife.pk` will be created under the `data/` folder; `geolife_slide_filtered.csv` will be created under the `data/quality` folder.
5. Run `python main.py config/geolife/transformer.yml` to start the training process. The dataloader will create intermediate data files and save them under the `data/temp/` folder. The configuration of the current run, the network parameters, and the performance indicators will be stored under the `outputs/` folder.
6. Run `python analysis/stats.py` to generate the mobility entropy plot, the basic statistics of the Geolife dataset, and the tracking quality plot.
## Reproducing results on the Gowalla and Foursquare datasets

To run the whole pipeline on the Gowalla or Foursquare New York City (NYC) datasets, follow the steps below:

1. Switch to the `lbsn` branch. Download the repo, and install the necessary Requirements and dependencies.
2. Download the Gowalla dataset from here or the Foursquare NYC dataset from here. Create a new folder in the repo root and name it `data`. For Gowalla, unzip and copy the `Gowalla_totalCheckins.txt` file into a new folder `data/gowalla`; the file structure should look like `data/gowalla/Gowalla_totalCheckins.txt`. For Foursquare, unzip and copy the `dataset_TSMC2014_NYC.txt` file into a new folder `data/tsmc2014`; the file structure should look like `data/tsmc2014/dataset_TSMC2014_NYC.txt`.
3. Create a file `paths.json` in the repo root, and define your working directories by writing:

    ```json
    {
        "raw_gowalla": "./data/gowalla"
    }
    ```

    or

    ```json
    {
        "raw_foursquare": "./data/tsmc2014"
    }
    ```
4. Run `python preprocessing/gowalla.py` or `python preprocessing/foursquare.py` to execute the preprocessing script for the respective dataset. `dataSet_*.csv`, `locations_*.csv`, `sp_time_temp_*.csv`, and `valid_ids_*.pk` will be created under the `data/` folder.
5. Run `python main.py config/gowalla/transformer.yml` or `python main.py config/foursquare/transformer.yml` to start the training process. The dataloader will create intermediate data files and save them under the `data/temp/` folder. The configuration of the current run, the network parameters, and the performance indicators will be stored under the `outputs/` folder.
## Citation

If you find this code useful for your work or use it in your project, please consider citing:

```bibtex
@article{hong_context_2023,
  title = {Context-aware multi-head self-attentional neural network model for next location prediction},
  journal = {Transportation Research Part C: Emerging Technologies},
  author = {Hong, Ye and Zhang, Yatao and Schindler, Konrad and Raubal, Martin},
  year = {2023},
  volume = {156},
  pages = {104315},
  doi = {10.1016/j.trc.2023.104315}
}
```
## Contact

If you have any questions, please open an issue or let me know:

- Ye Hong ([email protected])