- 07-09-2023: The paper is available on arXiv now
- 28-08-2023: The pretrained tracker model is released
- 17-08-2023: The SMAT tracker training and inference code is released
- 14-08-2023: The paper is accepted at WACV2024
Install the dependency packages using the environment file `smat_pyenv.yml`.
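Assuming the environment file is a conda specification (this is an assumption; use whichever tool `smat_pyenv.yml` was written for), a minimal setup would be:

```
conda env create -f smat_pyenv.yml
conda activate smat   # "smat" is a placeholder; use the environment name defined inside smat_pyenv.yml
```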
Generate the relevant files:

```
python tracking/create_default_local_file.py --workspace_dir . --data_dir ./data --save_dir ./output
```

After running this command, modify the dataset paths by editing the following files (an illustrative example follows):

```
lib/train/admin/local.py      # paths for training
lib/test/evaluation/local.py  # paths for testing
```
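The edits amount to pointing a few path attributes at your local dataset copies. The snippet below is a hypothetical excerpt of `lib/train/admin/local.py`; the class and attribute names are assumptions, so check the file generated by `create_default_local_file.py` for the exact ones:

```python
# Hypothetical excerpt of lib/train/admin/local.py -- names are assumptions,
# verify against the file generated by create_default_local_file.py.
class EnvironmentSettings:
    def __init__(self):
        self.workspace_dir = './output'              # checkpoints and logs go here
        self.got10k_dir = './data/got10k/train'      # GOT-10k training split
        self.lasot_dir = './data/lasot'              # LaSOT sequences
        self.trackingnet_dir = './data/trackingnet'  # TrackingNet chunks
```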
- Set the path of the training datasets in `lib/train/admin/local.py`
- Place the pretrained backbone model under the `pretrained_models/` folder
- For data preparation, please refer to this
- Uncomment lines 63, 67, and 71 in the `base_backbone.py` file. Long story short: the code is optimized for high inference speed, hence some intermediate feature maps are pre-computed during testing. However, these pre-computations are not feasible during training (see the sketch after this list).
- Run:
  ```
  python tracking/train.py --script mobilevitv2_track --config mobilevitv2_256_128x1_ep300 --save_dir ./output --mode single
  ```
- The training logs will be saved under the `output/logs/` folder
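For context on the uncommenting step above: the text says some intermediate feature maps are pre-computed during testing, which suggests a cache-at-initialization, reuse-per-frame pattern that cannot apply during training, where inputs change every iteration. The sketch below is a generic illustration of that pattern only; the class and method names are invented for illustration and are not the repository's actual API:

```python
# Illustrative sketch of test-time feature caching -- not the actual SMAT code.
import torch

class CachingBackboneSketch:
    def __init__(self, backbone):
        self.backbone = backbone
        self.cached_template_feat = None  # filled once at test time

    def initialize(self, template_img):
        # Test time: pre-compute the template feature map a single time.
        with torch.no_grad():
            self.cached_template_feat = self.backbone(template_img)

    def forward_train(self, template_img, search_img):
        # Training: inputs differ in every batch, so both branches are recomputed.
        return self.backbone(template_img), self.backbone(search_img)

    def forward_test(self, search_img):
        # Inference: reuse the cached template feature, run only the search branch.
        return self.cached_template_feat, self.backbone(search_img)
```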
The pretrained tracker model can be found here
- Update the test dataset paths in `lib/test/evaluation/local.py`
- Place the pretrained tracker model under the `output/checkpoints/` folder
- Run:
  ```
  python tracking/test.py --tracker_name mobilevitv2_track --tracker_param mobilevitv2_256_128x1_ep300 --dataset got10k_test --inference_mode pytorch
  ```
  where `--dataset` is one of `got10k_test`, `trackingnet`, or `lasot`, and `--inference_mode` is one of `pytorch`, `onnx`, `openvino`, or `tensorrtfp32` (an example is shown after this list)
- Change the `DEVICE` variable between `cuda` and `cpu` in the `--tracker_param` file for GPU-based and CPU-based inference, respectively
- The raw results will be stored under the `output/test/` folder
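For instance, evaluating on LaSOT with the ONNX backend (both values taken from the option lists above):

```
python tracking/test.py --tracker_name mobilevitv2_track --tracker_param mobilevitv2_256_128x1_ep300 --dataset lasot --inference_mode onnx
```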
To evaluate the tracker on a sample video, run:

```
python tracking/video_demo.py --tracker_name mobilevitv2_track --tracker_param mobilevitv2_256_128x1_ep300 --videofile *path-to-video-file* --optional_box *bounding-box-annotation*
```
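A hypothetical example (the video file name and the bounding-box values are made-up placeholders, and the expected box format should be checked in `tracking/video_demo.py`):

```
# hypothetical values for the placeholders above
python tracking/video_demo.py --tracker_name mobilevitv2_track --tracker_param mobilevitv2_256_128x1_ep300 --videofile ./my_video.mp4 --optional_box 325 170 120 140
```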
- We use the Separable Self-Attention Transformer implementation and the pretrained MobileViTv2 backbone from ml-cvnets. Thank you!
- Our training code is built upon OSTrack and PyTracking
- To generate the evaluation metrics for the different datasets (except the server-based GOT-10k and TrackingNet), we use the pysot-toolkit
If our work is useful for your research, please consider citing:

```
@InProceedings{Gopal2024Sep,
  author    = "Goutam Yelluru Gopal and Maria Amer",
  title     = "Separable Self and Mixed Attention Transformers for Efficient Object Tracking",
  booktitle = "IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)",
  year      = "2024",
  pages     = "8",
  month     = "Jan. 4-8",
  address   = "Waikoloa, Hawaii",
}
```