1. Intellindust AI Lab
2. University of Macau
* Equal Contribution † Corresponding Author
FSOD-VFM is a framework for few-shot object detection leveraging powerful vision foundation models (VFMs).
It integrates three key components:
🔹 Universal Proposal Network (UPN) for category-agnostic bounding box generation
🔹 SAM2 for accurate mask extraction
🔹 DINOv2 features for efficient adaptation to novel object categories
To address over-fragmentation in proposals, FSOD-VFM introduces a novel graph-based confidence reweighting strategy for refining detections.
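The exact reweighting procedure is defined in the paper and the released code; as a rough illustration of the idea, the sketch below diffuses UPN confidence scores over an IoU graph so that overlapping fragments of the same object reinforce each other. The graph construction, the update rule, and the names `alpha`/`decay` are assumptions for illustration only, loosely mirroring the `--alp` and `--lamb` options mentioned in the run tips further down.

```python
# Illustrative sketch of graph-based confidence reweighting
# (NOT the exact FSOD-VFM implementation).
import numpy as np

def box_iou(boxes):
    """Pairwise IoU for boxes given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = boxes.T
    area = (x2 - x1) * (y2 - y1)
    ix1 = np.maximum(x1[:, None], x1[None, :])
    iy1 = np.maximum(y1[:, None], y1[None, :])
    ix2 = np.minimum(x2[:, None], x2[None, :])
    iy2 = np.minimum(y2[:, None], y2[None, :])
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    return inter / (area[:, None] + area[None, :] - inter + 1e-8)

def reweight_scores(boxes, scores, alpha=0.5, decay=0.9, steps=3):
    """Diffuse proposal scores over an IoU graph; returns reweighted scores."""
    adj = box_iou(boxes)
    np.fill_diagonal(adj, 0.0)
    # Row-normalize so each diffusion step averages over neighbouring proposals.
    adj = adj / (adj.sum(axis=1, keepdims=True) + 1e-8)
    s = scores.copy()
    for step in range(steps):
        s = (1 - alpha) * scores + alpha * (decay ** step) * (adj @ s)
    return s

boxes = np.array([[0, 0, 100, 100], [5, 5, 95, 95], [200, 200, 300, 300]], float)
scores = np.array([0.30, 0.40, 0.90])
print(reweight_scores(boxes, scores))
```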
If you find our work useful, please give us a ⭐!
- [2026.2.3] Initial release of FSOD-VFM.
Put all datasets under `FSOD-VFM/dataset/`:

```bash
git clone https://github.com/Intellindust-AI-Lab/FSOD-VFM
cd FSOD-VFM
mkdir dataset
```

Download Pascal VOC from http://host.robots.ox.ac.uk/pascal/VOC, then put it under `dataset/` following this structure:
```
dataset/PascalVOC/
├── VOC2007/
├── VOC2007Test/
│   ├── VOC2007/
│   │   ├── JPEGImages/
│   │   └── ...
│   └── ...
└── VOC2012/
```

Download COCO from https://cocodataset.org and organize it as:
```
dataset/coco/
├── annotations/
├── train2017/
├── val2017/
└── test2017/
```

Download CD-FSOD from https://yuqianfu.com/CDFSOD-benchmark/ and organize it as:
```
dataset/CDFSOD/
├── ArTaxOr/...
├── clipart1k/...
├── DIOR/...
├── FISH/...
├── NEU-DET/...
└── UODD/...
```
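Before moving on, it can help to sanity-check that the folders match the layout above. The snippet below is a minimal, hypothetical check run from the `FSOD-VFM` root; the expected paths come straight from this README and it is not part of the FSOD-VFM codebase.

```python
# Minimal sanity check of the dataset layout described above.
# Adjust the list if you only downloaded a subset of the benchmarks.
from pathlib import Path

EXPECTED = [
    "dataset/PascalVOC/VOC2007",
    "dataset/PascalVOC/VOC2007Test",
    "dataset/PascalVOC/VOC2012",
    "dataset/coco/annotations",
    "dataset/coco/train2017",
    "dataset/coco/val2017",
    "dataset/CDFSOD/ArTaxOr",
    "dataset/CDFSOD/clipart1k",
    "dataset/CDFSOD/DIOR",
    "dataset/CDFSOD/FISH",
    "dataset/CDFSOD/NEU-DET",
    "dataset/CDFSOD/UODD",
]

for rel in EXPECTED:
    status = "ok" if Path(rel).is_dir() else "MISSING"
    print(f"{status:8s} {rel}")
```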
Create and activate the conda environment:

```bash
conda env create -f fsod.yml
conda activate FSODVFM
```

Clone DINOv2 and build the UPN custom ops:

```bash
# Ensure the operation is performed inside the /FSOD-VFM directory
git clone https://github.com/facebookresearch/dinov2.git
conda install -c conda-forge gcc=9.5.0 gxx=9.5.0 ninja -y
cd chatrex/upn/ops
pip install -v -e .
```

Install SAM2:

```bash
# Ensure the operation is performed outside the /FSOD-VFM directory
cd ../../../../
git clone https://github.com/facebookresearch/sam2.git && cd sam2
pip install -e .
```
Download the pretrained checkpoints:

```bash
# Make sure the checkpoints folder is inside the project root (FSOD-VFM/checkpoints).
cd FSOD-VFM && mkdir checkpoints && cd checkpoints
wget https://dl.fbaipublicfiles.com/segment_anything_2/092824/sam2.1_hiera_large.pt
wget https://dl.fbaipublicfiles.com/dinov2/dinov2_vitl14/dinov2_vitl14_pretrain.pth
wget https://github.com/IDEA-Research/ChatRex/releases/download/upn-large/upn_large.pth
```
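A quick way to confirm the downloads are intact is to deserialize each file with `torch.load`; this sketch only inspects the checkpoint files listed above and does not build any model.

```python
# Quick integrity check: each downloaded checkpoint should deserialize without errors.
from pathlib import Path
import torch

CKPT_DIR = Path("checkpoints")
for name in ["sam2.1_hiera_large.pt",
             "dinov2_vitl14_pretrain.pth",
             "upn_large.pth"]:
    state = torch.load(CKPT_DIR / name, map_location="cpu")
    # Checkpoints are typically either a raw state_dict or a dict wrapping one.
    n_entries = len(state.get("model", state)) if isinstance(state, dict) else "?"
    print(f"{name}: loaded, {n_entries} top-level entries")
```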
Run the Pascal VOC experiments:

```bash
sh run_scripts/run_pascal.sh
```

Tips:

- Modify `--json_path` for different splits (`split1`, `split2`, `split3`) and shot settings (`1shot`, `5shot`, etc.).
- Modify `--target categories` for different splits.
- Adjust hyperparameters:
  - `--min_threshold`: UPN confidence threshold (default: `0.01`)
  - `--alp`: alpha for graph diffusion
  - `--lamb`: decay parameter for graph diffusion
- To fix shell script line-ending issues:
  ```bash
  sed -i 's/\r$//' run_scripts/run_pascal.sh
  ```
Run the COCO experiments:

```bash
sh run_scripts/run_coco.sh
```

Tips:

- Modify `--json_path` for `10shot` or `30shot`.
- Target categories are fixed to the standard COCO 20 classes.
Run the CD-FSOD experiments:

```bash
sh run_scripts/run_cdfsod.sh
```

Tips:

- Modify `--json_path`, `--test_json`, and `--test_img_dir` for different subsets (e.g., `ArTaxOr`, `DIOR`).
- For `DIOR`, use: `--test_img_dir ./dataset/CDFSOD/DIOR/test/new_test/`
If you use FSOD-VFM in your research, please cite:
```bibtex
@inproceedings{feng2025fsodvfm,
  title={Few-Shot Object Detection with Vision Foundation Models and Graph Diffusion},
  author={Feng, Chen-Bin and Sha, Youyang and Liu, Longfei and Yu, Yongjun and Vong, Chi Man and Yu, Xuanlong and Shen, Xi},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026}
}
```

Our work builds upon excellent open-source projects including No-Time-To-Train, SAM2, ChatRex, and DINOv2. We sincerely thank their authors for their contributions to the community.



