This is the project repository for DevBench (preprint), a multimodal benchmark intended to assess Vision–Language Models in terms of their similarities with human responses across development.
Evaluation of an implemented model against DevBench can be conducted as follows:
python eval.py model_name
where model_name is one of clip_base, clip_large, blip, flava, bridgetower, vilt, or cvcl.
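To evaluate every listed model in one go, the command above can be wrapped in a small driver. The following is a minimal sketch, not part of the repository: the helper names `build_commands` and `run_all` are hypothetical, and it assumes `eval.py` is in the current working directory and takes the model name as its only argument, as shown above.

```python
import subprocess
import sys

# Model names as listed in this README.
MODEL_NAMES = ["clip_base", "clip_large", "blip", "flava",
               "bridgetower", "vilt", "cvcl"]


def build_commands(model_names=MODEL_NAMES, python=sys.executable):
    """Return one `python eval.py <model_name>` command per model."""
    return [[python, "eval.py", name] for name in model_names]


def run_all(model_names=MODEL_NAMES):
    """Run each evaluation in sequence; stop at the first failure."""
    for cmd in build_commands(model_names):
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)

# Usage (from the repository root): run_all()
```

Running the models sequentially keeps memory use predictable; `check=True` makes a failed evaluation raise immediately rather than continuing silently.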
You can add additional models by constructing a subclass with methods for obtaining image features, text features, and image–text similarity scores.
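The shape of such a subclass might look roughly like the sketch below. All names here (`EvalModel`, `get_image_features`, `get_text_features`, `get_similarity_scores`, `ToyModel`) are hypothetical and chosen for illustration; check the repository's own base class for the actual method names and signatures. The toy embedding is a placeholder so the example runs without any model weights.

```python
import math
from abc import ABC, abstractmethod


def _normalize(vec):
    """Scale a vector to unit length (leave zero vectors unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return vec if norm == 0 else [x / norm for x in vec]


class EvalModel(ABC):
    """Hypothetical base class: one method per required capability."""

    @abstractmethod
    def get_image_features(self, images):
        """Return one feature vector per image."""

    @abstractmethod
    def get_text_features(self, texts):
        """Return one feature vector per text."""

    def get_similarity_scores(self, images, texts):
        """Cosine similarity matrix with shape (n_images, n_texts)."""
        img = [_normalize(v) for v in self.get_image_features(images)]
        txt = [_normalize(v) for v in self.get_text_features(texts)]
        return [[sum(a * b for a, b in zip(i, t)) for t in txt] for i in img]


class ToyModel(EvalModel):
    """Placeholder model: 'embeds' a string by simple character counts."""

    def _embed(self, s):
        vowels = sum(c in "aeiou" for c in s.lower())
        return [float(len(s)), float(vowels), 1.0]

    def get_image_features(self, images):
        # A real model would load and encode the image files here.
        return [self._embed(path) for path in images]

    def get_text_features(self, texts):
        return [self._embed(t) for t in texts]


scores = ToyModel().get_similarity_scores(["dog.png", "cat.png"],
                                          ["a dog", "a cat", "a ball"])
print(len(scores), len(scores[0]))
```

A real subclass would replace `_embed` with calls to the model's image and text encoders, and could override `get_similarity_scores` for models that score image–text pairs jointly rather than through separate embeddings.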
For attribution and licensing reasons, not all assets and data are hosted within this repo. Assets and data can be obtained via the following means:
(Lexical) Looking While Listening: Assets are available in this repo. Assets from Adams et al. (2018) and Frank et al. (2016) were directly obtained from the original papers, while assets from Donnelly & Kidd (2021) were reconstructed to ensure licensing at least as permissive as CC-BY-NC-SA. Data are available in this repo; they were aggregated from data in the original papers.
(Lexical) Visual Vocabulary: Assets are available from OSF (these are the same images as in the THINGS similarity task). Data are available in this repo.
(Grammatical) Test of Reception of Grammar: Assets can be downloaded from the LEVANTE repo by running sh assets/gram-trog/trog_dl.sh. Data are available in this repo.
(Grammatical) Winoground: Assets can be downloaded from Hugging Face; download and unzip images.zip into assets/gram-winoground/images. Data can be downloaded from Hugging Face; this should go into evals/gram-winoground.
(Semantic) Free Word Association Task: Assets and data from children are available in this repo. These were transcribed from Entwisle (1966), but thresholded to remove idiosyncratic responses. Assets and data from adults can be downloaded from the Florida Free Association Norms by running sh assets/sem-wat/wat_adult_dl.sh.
(Semantic) Visual Object Categorisation: Assets are available in this repo. These were obtained either from Kiani et al. (2007) via Spriet et al. (2021), or reconstructed to ensure licensing at least as permissive as CC-BY-NC-SA. Data are also available in this repo, converted from SPSS files from the original paper.
(Semantic) THINGS Similarity: Assets are available from OSF. Data can be downloaded from OSF; this should go into evals/sem-things.
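Because several assets and datasets must be fetched manually, a quick pre-flight check can catch a missing download before an evaluation run. The sketch below is an illustration rather than part of the repository: the function name `find_missing` is hypothetical, it covers only the three target directories named in the list above, and it treats a directory as "missing" if it is absent or empty. A non-empty directory does not prove the download is complete, since these directories may already contain other files.

```python
from pathlib import Path

# Target directories named in this README for manually downloaded files.
EXPECTED_PATHS = {
    "Winoground images": "assets/gram-winoground/images",
    "Winoground data": "evals/gram-winoground",
    "THINGS similarity data": "evals/sem-things",
}


def find_missing(repo_root=".", expected=EXPECTED_PATHS):
    """Return (label, relative_path) pairs that are absent or empty."""
    root = Path(repo_root)
    missing = []
    for label, rel in expected.items():
        path = root / rel
        if not path.is_dir() or not any(path.iterdir()):
            missing.append((label, rel))
    return missing


for label, rel in find_missing("."):
    print(f"Missing or empty: {label} ({rel})")
```

Run it from the repository root; it prints nothing when all three directories exist and contain at least one file.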