Skip to content

alvinwmtan/dev-bench

Repository files navigation

DevBench: A multimodal developmental benchmark for language learning

This is the project repository for DevBench (preprint), a multimodal benchmark intended to assess Vision–Language Models in terms of their similarities with human responses across development.

Evaluation of an implemented model against DevBench can be conducted as follows:

python eval.py model_name

where model_name is one of clip_base, clip_large, blip, flava, bridgetower, vilt, or cvcl.

You can add additional models by constructing a subclass with methods for obtaining image features, text features, and image--text similarity scores.

Obtaining assets and data

For attribution and licensing reasons, not all assets and data are hosted within this repo. Assets and data can be obtained via the following means:

(Lexical) Looking While Listening: Assets are available in this repo. Assets from Adams et al. (2018) and Frank et al. (2016) were directly obtained from the original papers, while assets from Donnelly & Kidd (2021) were reconstructed to ensure licensing at least as permissible as CC-BY-NC-SA. Data are available in this repo; they were aggregated from data in the original papers.

(Lexical) Visual Vocabulary: Assets are available from OSF (these are the same images as in the THINGS similarity task). Data are available in this repo.

(Grammatical) Test of Reception of Grammar: Assets can be downloaded from the LEVANTE repo by running sh assets/gram-trog/trog_dl.sh. Data are available in this repo.

(Grammatical) Winoground: Assets can be downloaded from Hugging Face; download and unzip images.zip into assets/gram-winoground/images. Data can be downloaded from Hugging Face; this should go into evals/gram-winoground.

(Semantic) Free Word Association Task: Assets and data from children are available in this repo. These were transcribed from Entwisle (1966), but thresholded to remove idiosyncratic responses. Assets and data from adults can be downloaded from the Florida Free Association Norms by running sh assets/sem-wat/wat_adult_dl.sh.

(Semantic) Visual Object Categorisation: Assets are available in this repo. These were obtained either from Kiani et al. (2007) via Spriet et al. (2021), or reconstructed to ensure licensing at least as permissible as CC-BY-NC-SA. Data are also available in this repo, converted from SPSS files from the original paper.

(Semantic) THINGS Similarity: Assets are available from OSF. Data can be downloaded from OSF; this should go into evals/sem-things.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published