This is the official code repository. For a paper summary, check out our project page!
Note: More functionalities, scripts, and pretrained PINs will be released in the coming weeks!
To create a conda environment for running PIN on OpenFlamingo, run
conda env create -f environment.yml
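Afterwards, activate the environment. Assuming it is named pin (this name is an assumption; check the name field in environment.yml), activation looks like
# The environment name "pin" is a placeholder; use the name defined in environment.yml.
conda activate pin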
We are currently working on incorporating BLIP-2 into a single environment file; please stay tuned.
The trained PIN on OpenFlamingo can be evaluated on COCO, PVOC, and LVIS. For that, please set up the corresponding datasets according to their repo or website. Alternatively, for PVOC you can set the download flag to true to download it via our code. The metrics and visualizations are logged using wandb. The test script can be started using
sh scripts/test_OF_PIN.sh
after adding the disk path for each dataset.
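The exact arguments are defined in scripts/test_OF_PIN.sh; as a hypothetical sketch (variable names and paths below are placeholders, not the actual script interface), the path section of the script might look like
# Hypothetical placeholders -- point these to where the datasets were saved on disk;
# the actual variable or flag names are defined in scripts/test_OF_PIN.sh.
COCO_ROOT=/data/coco
PVOC_ROOT=/data/VOCdevkit
LVIS_ROOT=/data/lvis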
First, we need to set up the training data. For background images we use the BG20k dataset; please download it from their repo and save it on disk. Please copy the LVIS category list from here into the utils folder. Synthetic images are generated following XPaste. We create a synthetic dataset with 100 samples; after cleaning, around 60k objects remained. We will release our generated synthetic images soon and share a link here.
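To illustrate, the data preparation could look like the following sketch (the folder names and the category-list filename are assumptions, not fixed by the code):
# Hypothetical layout -- adjust the paths to your own disk setup.
mkdir -p data/BG20k data/synthetic
# Copy the downloaded LVIS category list into the utils folder;
# the filename lvis_categories.json is a placeholder.
cp /path/to/lvis_categories.json utils/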
After setting up the datasets, you can start a training run for PIN using
sh scripts/run_OF_PIN.sh
Training metrics are logged using wandb.
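If wandb has not been set up on the machine yet, a one-time login (standard wandb CLI usage, not specific to this repo) is needed before the metrics appear in your workspace:
# Authenticate with wandb once; training and test metrics are then logged to your account.
wandb login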
If you have questions or find a bug, feel free to open a GitHub issue or send an email to m.l.dorkenwald at uva.nl.
@InProceedings{Dorkenwald_PIN_CVPR_2024,
author = {Dorkenwald, Michael and Barazani, Nimrod and Snoek, Cees G. M. and Asano, Yuki M.},
title = {PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2024},
pages = {13548-13558}
}