A PyTorch implementation of the Pointer Network model in:
A reinforcement learning approach to the orienteering problem with time windows
Gama R,
Fernandes HL - Computers & Operations Research, 2021
(arXiv, github)
@article{GAMA2021105357,
title={A Reinforcement Learning Approach to the Orienteering Problem with Time Windows},
journal = {Computers & Operations Research},
volume = {133},
pages = {105357},
year = {2021},
issn = {0305-0548},
author={Ricardo Gama and Hugo L. Fernandes},
journal = {Computers & Operations Research},
url = {https://www.sciencedirect.com/science/article/pii/S0305054821001349},
}
This repo includes two already trained models:
- Cordeau's OPTW instance-region pr01, trained with uniform sampling:
$ python inference_optw_rl.py --instance pr01 --model_name article --sample_type uni_samp
route: [0, 9, 24, 47, 12, 38, 30, 2, 32, 37, 10, 45, 11, 28, 1, 16, 36, 31, 35, 34, 22, 7, 0]
total score: 308
inference time: 173 ms
using CPU:
$ python inference_optw_rl.py --instance pr01 --model_name article --sample_type uni_samp --device cpu
route: [0, 9, 24, 47, 12, 38, 30, 2, 32, 37, 10, 45, 11, 28, 1, 16, 36, 31, 35, 34, 22, 7, 0]
total score: 308
inference time: 1058 ms
- Gavalas' OPTW instance-region t101, trained with correlation sampling of scores:
$ python inference_optw_rl.py --instance t101 --model_name article --sample_type corr_samp
route: [0, 68, 29, 8, 76, 3, 62, 7, 61, 16, 69, 17, 15, 33, 39, 44, 97, 46, 92, 74, 78, 0]
total score: 399
inference time: 343 ms
-
Install Anaconda (Python 3 version).
-
Clone this repo:
$ git clone https://github.com/mustelideos/optw_rl
-
Install the environment:
$ cd optw_rl/ $ conda env create --file environment.yml
-
Activate the environment:
$ conda activate optw_env
-
Using Beam Search (the default option) for Inference ("model")
$ python inference_optw_rl.py --instance t101 --model_name article --sample_type corr_samp route: [0, 68, 29, 8, 76, 3, 62, 7, 61, 16, 69, 17, 15, 33, 39, 44, 97, 46, 92, 74, 78, 0] total score: 399
-
Using Active Search followed by Beam Search ("model+as")
$ python inference_optw_rl.py --instance t101 --infe_type as_bs --model_name article --sample_type corr_samp route: [0, 68, 29, 8, 76, 3, 62, 7, 61, 24, 46, 97, 47, 16, 69, 17, 15, 21, 95, 92, 74, 0] total score: 400
- Make sure to generate the validation set of tourist-region-instances first
$ python generate_instances.py --instance t101 --sample_type corr_samp
- Using Beam Search for Inference ("model")
$ python inference_optw_rl.py --instance t101 --model_name article --generated --sample_type corr_samp average total score: 339.16
- Using Active Search followed by Beam Search ("model+as")
$ python inference_optw_rl.py --instance t101 --infe_type as_bs --model_name article --generated --sample_type corr_samp average total score: 340.03
-
Make sure to generate the validation set of tourist-region-instances first. These are used to report performance during training.
$ python generate_instances.py --instance t101 --sample_type corr_samp
-
Choose a name, how many epochs and how often it is saved
$ python train_optw_rl.py --instance t101 --sample_type corr_samp --nepocs 1000 --nsave 1000 --model_name testing_1
-
Infer (using Beam Search, for instance) specifying the model name and the (saved) number of epochs
$ python inference_optw_rl.py --instance t101 --sample_type corr_samp --model_name testing_1 --saved_model_epoch 1000 total score: 376
For optional arguments and default values:
$ python train_optw_rl.py -h
(...)
optional arguments:
-h, --help show this help message and exit
--instance INSTANCE which instance to train on
--device DEVICE device to use (cpu/cuda)
--use_checkpoint use checkpoint (see
https://pytorch.org/docs/stable/checkpoint.html)
--sample_type {uni_samp,corr_samp}
how to sample the scores of each point of interest:
uniformly sampled (uni_samp), score proportional to
each point of interest's duration of visit (corr_samp)
--model_name MODEL_NAME
model name
--debug debug mode (verbose output and no saving)
--nsave NSAVE saves the model weights every <nsave> epochs
--nprint NPRINT to log and save the training history (total score in
the benchmark and generated instances of the
validation set) every <nprint> epochs
--nepocs NEPOCS number of training epochs
--batch_size BATCH_SIZE
training batch size
--max_grad_norm MAX_GRAD_NORM
maximum norm value for gradient value clipping
--lr LR initial learning rate
--seed SEED seed random # generators (for reproducibility)
--beta BETA entropy term coefficient
--rnn_hidden RNN_HIDDEN
hidden size of RNN
--n_layers N_LAYERS number of attention layers in the encoder
--n_heads N_HEADS number heads in attention layers
--ff_dim FF_DIM hidden dimension of the encoder's feedforward sublayer
--nfeatures NFEATURES
number of non-dynamic features
--ndfeatures NDFEATURES
number of dynamic features
.
├── data
│ ├── benchmark
│ └── generated
├── images
├── results
│ ├── pr01
│ │ └── model_w
│ │ └── model_article_uni_samp
│ └── t101
│ └── model_w
│ └── model_article_corr_samp
└── src