EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations (NeurIPS 2022 - Datasets and Benchmarks Track)
Ahmad Darkhalil*, Dandan Shan*, Bin Zhu*, Jian Ma*, Amlan Kar, Richard Higgins, Sanja Fidler, David Fouhey, Dima Damen
This repo contains code for the Hand-Object-Segmentation benchmarks and evaluations in EPIC-KITCHENS VISOR.
If you use this repo, any of our models, or the dataset, please cite the VISOR paper:
@inproceedings{VISOR2022,
title = {EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations},
author = {Darkhalil, Ahmad and Shan, Dandan and Zhu, Bin and Ma, Jian and Kar, Amlan and Higgins, Richard and Fidler, Sanja and Fouhey, David and Damen, Dima},
booktitle = {Proceedings of the Neural Information Processing Systems (NeurIPS) Track on Datasets and Benchmarks},
year = {2022}
}
Conda environment recommended:
- cv2
- pytorch
- detectron2
conda create --name hos
conda activate hos
pip install opencv-python
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.9/index.html
Download the VISOR data from EPIC-KITCHENS VISOR, unzip it, and rename the folder to epick_visor.
Generate a COCO format annotation of VISOR data for training:
- --epick_visor_store: the path to the annotation folder.
- --mode: which task to generate COCO-format data for; choose from hos or active.
- --split: which split(s) to generate; choose from train and val.
- --unzip_img: unzips the originally downloaded compressed images for each video; this argument is only needed once. Note that the unzip command can sometimes run into issues, which affects data loading later.
- --copy_img: copies images to get the same folder structure as in COCO.
python gen_coco_format.py \
--epick_visor_store=/path/to/epick_visor/GroundTruth-SparseAnnotations \
--mode=hos \
--split train val \
--unzip_img \
--copy_img
The resulting data structure looks like this:
datasets
├── epick_visor_coco_active
│   ├── annotations
│   │   ├── train.json
│   │   └── val.json
│   ├── train
│   │   └── *.jpg
│   └── val
│       └── *.jpg
└── epick_visor_coco_hos
    ├── annotations
    │   ├── train.json
    │   └── val.json
    ├── train
    │   └── *.jpg
    └── val
        └── *.jpg
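To confirm that generation produced the layout above, a small check along these lines may help. It is only a sketch: the function name missing_coco_paths is hypothetical and not part of this repo, and it assumes the datasets folder sits in the current working directory.

```python
from pathlib import Path


def missing_coco_paths(root, mode):
    """Return the expected paths that are absent under root for one mode ("hos" or "active")."""
    base = Path(root) / f"epick_visor_coco_{mode}"
    expected = [
        base / "annotations" / "train.json",
        base / "annotations" / "val.json",
        base / "train",
        base / "val",
    ]
    return [p for p in expected if not p.exists()]


if __name__ == "__main__":
    # Assumption: the generated data lives in ./datasets, as in the tree above.
    for mode in ("hos", "active"):
        for path in missing_coco_paths("datasets", mode):
            print(f"missing: {path}")
```

An empty result for a mode means every expected file and folder for that mode is present.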
Error correction. Before generating the COCO version, the script first corrects some annotation errors and saves all corrected JSON files in the annotations_corrected folder.
- In the dataset, the "on_which_hand" and "in_contact_object" labels are missing for two images with gloves, P06_13/P06_13_frame_0000000128.jpg and P06_13/P06_13_frame_0000000181.jpg. We add the keys and values for them to make sure all images with gloves have these two keys.
- There is a typo in 13 images in the train set, where 'on_which_hand' is ['left hand', 'rigth hand']; we meant ['left hand', 'right hand'].
- Hand-object relation errors in 11 images.
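The typo correction described above could be sketched roughly as follows. This is an illustration, not the repo's actual correction code: it assumes each annotation is a dict whose 'on_which_hand' value is a list of strings, and both the function name fix_hand_typo and the "name" key in the example are hypothetical.

```python
def fix_hand_typo(annotation):
    """Return a copy of one annotation dict with 'rigth hand' corrected in 'on_which_hand'."""
    fixed = dict(annotation)
    hands = fixed.get("on_which_hand")
    if isinstance(hands, list):
        # Replace only the misspelled label; leave every other value untouched.
        fixed["on_which_hand"] = [
            "right hand" if hand == "rigth hand" else hand for hand in hands
        ]
    return fixed


# Hypothetical annotation entry, used only to illustrate the fix.
glove = {"name": "glove", "on_which_hand": ["left hand", "rigth hand"]}
print(fix_hand_typo(glove)["on_which_hand"])  # ['left hand', 'right hand']
```

Returning a copy keeps the original annotation intact, which makes it easy to compare before and after.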
Visualize the COCO-format annotations from the training set:
python -m hos.data.datasets.epick ./datasets/epick_visor_coco_hos/annotations/train.json ./datasets/epick_visor_coco_hos/train epick_visor_2022_train
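Alongside the visual check, a quick textual summary of the generated file can catch obvious problems. The sketch below assumes the file follows the standard COCO layout with top-level "images", "annotations", and "categories" lists; summarize_coco is a hypothetical helper, not part of this repo.

```python
import json
import os
from collections import Counter


def summarize_coco(json_path):
    """Count images, annotations per category, and annotations pointing at unknown images."""
    with open(json_path) as f:
        coco = json.load(f)
    names = {c["id"]: c["name"] for c in coco["categories"]}
    image_ids = {img["id"] for img in coco["images"]}
    per_category = Counter(
        names.get(a["category_id"], str(a["category_id"])) for a in coco["annotations"]
    )
    orphans = sum(1 for a in coco["annotations"] if a["image_id"] not in image_ids)
    return {"images": len(image_ids), "per_category": dict(per_category), "orphans": orphans}


if __name__ == "__main__":
    path = "./datasets/epick_visor_coco_hos/annotations/train.json"
    if os.path.exists(path):
        print(summarize_coco(path))
```

A non-zero "orphans" count would indicate annotations whose image_id does not match any listed image.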
Download our pre-trained weights into the checkpoints folder to run the evaluation or demo code:
mkdir checkpoints && cd checkpoints
wget -O model_final_hos.pth "https://www.dropbox.com/s/bfu94fpft2wi5sn/model_final_hos.pth?dl=0"
wget -O model_final_active.pth "https://www.dropbox.com/s/j2tsvjgneyaggy4/model_final_active.pth?dl=0"
cd ..
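Downloads from file-sharing links occasionally save a web page instead of the file itself. As a hedged sanity check, the sketch below inspects the first bytes of each downloaded checkpoint and flags anything that looks like HTML. The helper name looks_like_html is hypothetical, and the check is a heuristic rather than a full validation of the weights.

```python
import os


def looks_like_html(path, probe_bytes=512):
    """Heuristically report whether a file starts like an HTML page rather than binary data."""
    with open(path, "rb") as f:
        head = f.read(probe_bytes).lstrip().lower()
    return head.startswith((b"<!doctype", b"<html"))


if __name__ == "__main__":
    for name in ("model_final_hos.pth", "model_final_active.pth"):
        path = os.path.join("checkpoints", name)
        if not os.path.exists(path):
            print(f"{path}: not found")
        elif looks_like_html(path):
            print(f"{path}: looks like a web page, try downloading again")
        else:
            print(f"{path}: {os.path.getsize(path)} bytes")
```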
Train the Hand and Contacted Object Segmentation (HOS) model:
python train_net_hos.py \
--config-file ./configs/hos/hos_pointrend_rcnn_R_50_FPN_1x.yaml \
--num-gpus 2 \
--dataset epick_hos \
OUTPUT_DIR ./checkpoints/hos_train
Train the Hand and Active Object Segmentation (Active) model:
python train_net_active.py \
--config-file ./configs/active/active_pointrend_rcnn_R_50_FPN_1x.yaml \
--num-gpus 2 \
--dataset epick_active \
OUTPUT_DIR ./checkpoints/active_train
Evaluate the Hand and Contacted Object Segmentation (HOS) model:
python eval.py \
--config-file ./configs/hos/hos_pointrend_rcnn_R_50_FPN_1x.yaml \
--num-gpus 2 \
--eval-only \
OUTPUT_DIR ./checkpoints/hos \
MODEL.WEIGHTS ./checkpoints/model_final_hos.pth
Evaluate the Hand and Active Object Segmentation (Active) model:
python eval.py \
--config-file ./configs/active/active_pointrend_rcnn_R_50_FPN_1x.yaml \
--num-gpus 2 \
--eval-only \
OUTPUT_DIR ./checkpoints/active \
MODEL.WEIGHTS ./checkpoints/model_final_active.pth
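The trailing arguments in the commands above (OUTPUT_DIR ..., MODEL.WEIGHTS ...) are written as alternating key and value tokens. Assuming the scripts follow the common detectron2 convention of treating these as config overrides, the pairing can be illustrated in plain Python. This is not detectron2's own code, and parse_overrides is a hypothetical name used only for this sketch.

```python
def parse_overrides(opts):
    """Pair up alternating KEY VALUE tokens into a dict of config overrides."""
    if len(opts) % 2 != 0:
        raise ValueError("overrides must come in KEY VALUE pairs")
    # Even positions hold keys, odd positions hold the matching values.
    return dict(zip(opts[0::2], opts[1::2]))


overrides = parse_overrides(
    ["OUTPUT_DIR", "./checkpoints/hos", "MODEL.WEIGHTS", "./checkpoints/model_final_hos.pth"]
)
print(overrides["OUTPUT_DIR"])     # ./checkpoints/hos
print(overrides["MODEL.WEIGHTS"])  # ./checkpoints/model_final_hos.pth
```

Under that assumption, a missing value (an odd number of tokens) is the most likely way to get such a command line wrong, which is why the sketch rejects it explicitly.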
Create the inputs and outputs folders, then put the images you want to test into inputs:
mkdir inputs && mkdir outputs
Then run the demo:
python demo.py --input_dir=inputs --output_dir=outputs