Supervised Learning Experiments

Getting started

Data preparation

Make sure ImageNet is downloaded to the $DATASET/imagenet directory. You can then create the training dataset variants of ImageNet-Captions, LAIONet, YFCC-15M, and CC-12M by following the instructions in the data_preparation folder. TSV files containing image paths will be stored under $DATASET/imagenet-captions, and the corresponding class frequencies will be stored under the freqs folder.

Evaluation is done on the ImageNet validation set, which is expected to be stored under $DATASET/imagenet/val. Optionally, we also support evaluating on ImageNetV2 and ImageNet-100. Example commands to download these datasets are provided below.

# Use an absolute path so that $DATASET stays valid after the cd commands below
export DATASET=$(realpath ../datasets)

# Download ImageNetV2
mkdir $DATASET/imagenetv2 && cd $DATASET/imagenetv2
# Use the "resolve" URL; the "blob" URL returns the HTML page instead of the archive
wget https://huggingface.co/datasets/vaishaal/ImageNetV2/resolve/main/imagenetv2-matched-frequency.tar.gz
tar -xvf imagenetv2-matched-frequency.tar.gz
rm imagenetv2-matched-frequency.tar.gz

# Download ImageNet-100
mkdir $DATASET/imagenet100 && cd $DATASET/imagenet100
git clone https://github.com/danielchyeh/ImageNet-100-Pytorch.git && cd ImageNet-100-Pytorch
python generate_IN100.py --source_folder $DATASET/imagenet --target_folder $DATASET/imagenet100
# Leave the cloned repository before deleting it
cd .. && rm -rf ImageNet-100-Pytorch
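
Before training, it may help to confirm that the directories described above exist. The following is a minimal sketch, not part of this repository: the function name `check_layout` and the `REQUIRED`/`OPTIONAL` lists are hypothetical, and it only checks for the directory names mentioned in this README, not their contents.

```python
import os
from pathlib import Path

# Directory names taken from this README, relative to $DATASET.
# The split into required/optional is an assumption based on the text above.
REQUIRED = ["imagenet", "imagenet/val"]
OPTIONAL = ["imagenetv2", "imagenet100"]


def check_layout(dataset_root):
    """Return (missing_required, missing_optional) as lists of relative paths."""
    root = Path(dataset_root)
    missing_required = [d for d in REQUIRED if not (root / d).is_dir()]
    missing_optional = [d for d in OPTIONAL if not (root / d).is_dir()]
    return missing_required, missing_optional


if __name__ == "__main__":
    # Fall back to the example location used in the commands above.
    root = os.environ.get("DATASET", "../datasets")
    required, optional = check_layout(root)
    for d in required:
        print(f"missing (required): {Path(root) / d}")
    for d in optional:
        print(f"missing (optional): {Path(root) / d}")
    if not required:
        print("Required ImageNet directories found.")
```

Run it with `DATASET` exported as in the commands above; missing optional directories only matter if you plan to evaluate on ImageNetV2 or ImageNet-100.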

Environment setup

No special setup is required. Make sure PyTorch (>=2.0, with CUDA) is installed and that at least 4 GPUs are available on your machine.
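
The requirements above can be checked with a short script. This is a sketch under stated assumptions rather than a tool shipped with the repository: `check_environment` is a hypothetical helper, and it only inspects the major PyTorch version, CUDA availability, and the visible GPU count.

```python
def check_environment(torch_version, cuda_available, gpu_count, min_gpus=4):
    """Return a list of problems; an empty list means the setup looks fine."""
    problems = []
    # torch.__version__ looks like "2.1.0+cu118"; only the major version is checked.
    major = int(str(torch_version).split(".")[0])
    if major < 2:
        problems.append(f"PyTorch >= 2.0 required, found {torch_version}")
    if not cuda_available:
        problems.append("CUDA is not available to PyTorch")
    if gpu_count < min_gpus:
        problems.append(f"at least {min_gpus} GPUs expected, found {gpu_count}")
    return problems


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        issues = check_environment(
            torch.__version__,
            torch.cuda.is_available(),
            torch.cuda.device_count(),
        )
        for issue in issues:
            print(f"problem: {issue}")
        if not issues:
            print("Environment looks fine.")
```

Keeping the check as a pure function of three values makes it easy to adapt, for example by passing a different `min_gpus` if you modify the example scripts for a smaller machine.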

Pre-trained heads

We provide pre-extracted class embeddings for the 1K ImageNet classes, computed with different prompts and text encoders; see the heads folder for details. You may also extract your own class embeddings using dump_clip_txt_features.py. Depending on the text encoder you use, you may need to install the corresponding libraries, e.g., clip, open_clip, and transformers.

Running

Training

We provide example scripts to replicate our experiments in the scripts folder. These cover the investigations on vocabulary size (Sec. 3.3), data distribution (Sec. 3.4 & 3.5), and open-world concepts (Sec. 3.6), as well as the explorations on few-shot and open-world recognition (Sec. 4.1). You may run them directly or modify them to suit your needs. Checkpoints and intermediate evaluation results are saved to the output folder by default.

Evaluation

Metrics are computed and saved during training. To re-evaluate a trained model, run the following command:

bash scripts/eval.sh $PATH_TO_CHECKPOINT

The results will be saved to the same directory as the checkpoint file.