Junyu Xie, Charig Yang, Weidi Xie, Andrew Zisserman
Visual Geometry Group, Department of Engineering Science, University of Oxford
pytorch=2.0.0, Pillow, opencv, einops, tensorboardX
Segment Anything can be installed following the official repository here, or by
pip install git+https://github.com/facebookresearch/segment-anything.git
- Synthetic training data from OCLR_paper can be downloaded from here.
- DAVIS2017 (and DAVIS2016) can be downloaded here.
- DAVIS2017-motion has the same sequences as DAVIS2017, but the annotations are curated to cater for jointly moving objects. It can be downloaded from here.
- DAVIS datasets can be obtained following the instructions above.
- YTVOS2018-motion is a subset selected from the training split of YTVOS2018. The selected sequences are used for evaluation and predominantly involve moving objects (i.e., objects that can be discovered based on their motion). The list of selected sequences can be found here.
- Other datasets such as SegTrackv2, FBMS-59 and MoCA_filter can be downloaded and preprocessed following the protocol described in motiongrouping.
In this work, optical flow is estimated by RAFT, with the code provided in the flow folder.
The data paths can be specified in data/dataset_config.py.
- The pretrained original SAM checkpoints can be downloaded here.
- The pretrained flowsam model checkpoints can be downloaded here.
- Our predicted masks on benchmark datasets can be found here.
To run FlowI-SAM,
python evaluation.py --model=flowisam --dataset=dvs16 --flow_gaps=1,-1,2,-2 \
--max_obj=5 --num_gridside=10 --ckpt_path={} --save_path={}
To run FlowP-SAM,
python evaluation.py --model=flowpsam --dataset=dvs16 --flow_gaps=1,-1,2,-2 \
--max_obj=10 --num_gridside=20 --ckpt_path={} --save_path={}
where --flow_gaps denotes the frame gaps of flow inputs
--max_obj indicates the maximum number of predicted object masks
--num_gridside indicates the number of uniform grid point inputs per side (e.g., "10" corresponds to 10 x 10 points)
--ckpt_path specifies the model checkpoint path
--save_path specifies the path to save predicted masks (if not specified, no masks will be saved)
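As a rough illustration of what the --flow_gaps and --num_gridside values stand for, the sketch below parses a comma-separated gap string and builds a uniform grid of point prompts. The helper names parse_flow_gaps and uniform_grid_points are hypothetical and are not taken from evaluation.py; placing the points at cell centres in normalised [0, 1] coordinates is likewise an assumption made for the example.

```python
from typing import List, Tuple


def parse_flow_gaps(arg: str) -> List[int]:
    """Turn a string such as "1,-1,2,-2" into [1, -1, 2, -2]."""
    return [int(g) for g in arg.split(",") if g.strip()]


def uniform_grid_points(num_gridside: int) -> List[Tuple[float, float]]:
    """Return num_gridside * num_gridside (x, y) points in normalised coordinates.

    Assumption: points sit at the centres of equally sized grid cells.
    """
    step = 1.0 / num_gridside
    centres = [(i + 0.5) * step for i in range(num_gridside)]
    return [(x, y) for y in centres for x in centres]


gaps = parse_flow_gaps("1,-1,2,-2")
points = uniform_grid_points(10)
print(gaps, len(points))  # [1, -1, 2, -2] 100
```

With --num_gridside=20, as in the FlowP-SAM command, the same helper would produce 400 points.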
To run the code on your own data (or on datasets without GT multi-object segmentation, e.g., SegTrackv2, FBMS-59, MoCA_filter, etc.):
- Set --dataset=example, and arrange your data as follows:
{data_name}/
├── JPEGImages/
│ └── {category_name}/
│ ├── 00000.jpg
│ └── ......
├── FlowImages_gap1/
│ └── {category_name}/
│ ├── 00000.png
│ └── ......
├── ...... (More flow images)
- Add your own dataset information in config_eval_dataloader() in data/dataset_config.py (under the "example" dataset).
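Before running evaluation on custom data, it can help to confirm that the folders match the layout above. The following is a minimal sketch, not part of this repository: check_layout is a hypothetical helper, and it assumes that each flow image shares the file stem of its source frame and that folders for other gaps follow the same FlowImages_gap{gap} naming pattern.

```python
from pathlib import Path
from typing import Iterable, List


def check_layout(root, flow_gaps: Iterable[int] = (1,)) -> List[str]:
    """Return a list of problems; an empty list means the layout looks consistent."""
    root = Path(root)
    image_dir = root / "JPEGImages"
    if not image_dir.is_dir():
        return [f"missing directory: {image_dir}"]

    problems = []
    categories = sorted(p.name for p in image_dir.iterdir() if p.is_dir())
    for gap in flow_gaps:
        # Assumption: every gap uses a folder named FlowImages_gap{gap}.
        flow_dir = root / f"FlowImages_gap{gap}"
        for category in categories:
            frames = {p.stem for p in (image_dir / category).glob("*.jpg")}
            flow_cat = flow_dir / category
            if not flow_cat.is_dir():
                problems.append(f"missing directory: {flow_cat}")
                continue
            flows = {p.stem for p in flow_cat.glob("*.png")}
            if not flows:
                problems.append(f"no flow images in: {flow_cat}")
            elif flows - frames:
                # Flow files whose stem has no matching RGB frame.
                problems.append(f"flow images without frames in: {flow_cat}")
    return problems
```

For example, check_layout("{data_name}", flow_gaps=(1,)) returns an empty list when every category under JPEGImages/ has a matching, non-empty folder under FlowImages_gap1/.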
To perform sequence-level mask association (in other words, matching the identities of masks throughout the sequence) for multi-object datasets,
python seq_level_postprocess.py --dataset=dvs17m --mask_dir={} --save_path={}
For single-object cases, the first mask of each frame usually suffices.
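To make the idea of matching mask identities across frames concrete, here is a small sketch that links masks in consecutive frames by greedy IoU matching. This is a generic illustration and is not the logic implemented in seq_level_postprocess.py; the function names are hypothetical, and masks are represented as sets of (row, col) pixel coordinates only to keep the example dependency-free.

```python
from typing import List, Set, Tuple

Mask = Set[Tuple[int, int]]  # set of (row, col) foreground pixels


def iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def associate(prev_masks: List[Mask], curr_masks: List[Mask]) -> List[int]:
    """For each current mask, return the index of its matched previous mask.

    Pairs are taken greedily in order of decreasing IoU, and each previous
    mask is used at most once. Unmatched current masks get -1.
    """
    pairs = sorted(
        ((iou(p, c), i, j)
         for i, p in enumerate(prev_masks)
         for j, c in enumerate(curr_masks)),
        reverse=True,
    )
    assignment = [-1] * len(curr_masks)
    used_prev = set()
    for score, i, j in pairs:
        if score <= 0.0:
            break  # remaining pairs do not overlap at all
        if i in used_prev or assignment[j] != -1:
            continue
        assignment[j] = i
        used_prev.add(i)
    return assignment
```

Applying associate to each pair of consecutive frames and carrying the matched indices forward gives every mask a consistent identity over the sequence. An optimal assignment (e.g., Hungarian matching) could replace the greedy step.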
- For DAVIS2016, use the DAVIS2016 official evaluator.
- For DAVIS2017, use the DAVIS2017 official evaluator.
- For DAVIS2017-motion, follow the evaluation protocol introduced in OCLR_paper.
- For MoCA_filter, use the evaluator provided in motiongrouping.
python train.py --model={} --dataset=dvs16 --model_save_path={}
where --model specifies the model to be trained (flowisam or flowpsam)
--model_save_path indicates the path to save logs and model ckpts
If you find this repository helpful, please consider citing our work:
@article{xie2024flowsam,
title={Moving Object Segmentation: All You Need Is SAM (and Flow)},
author={Junyu Xie and Charig Yang and Weidi Xie and Andrew Zisserman},
journal={arXiv preprint arXiv:2404.12389},
year={2024}
}
Segment Anything: https://github.com/facebookresearch/segment-anything