Pixel Linguist

Official repository for the paper "Pixel Sentence Representation Learning".

Model Checkpoint: available on HuggingFace
GitHub Repo: https://github.com/gowitheflow-1998/Pixel-Linguist
Paper: https://arxiv.org/pdf/2402.08183.pdf

Overview

Installation

conda create -n pixel python=3.9 -y && conda activate pixel
git clone https://github.com/gowitheflow-1998/Pixel-Linguist.git

Package install

conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
conda install -c conda-forge pycairo pygobject manimpango scikit-learn
cd Pixel-Linguist
pip install -r requirements.txt
pip install -e .

Fallback fonts downloading

(Not needed if you cloned our repo directly.)

python scripts/data/download_fallback_fonts.py 'data/fallback_fonts'

Inference and Evaluation

STS benchmark:

python tools/evaluation_sts.py

Adjust the language you want to evaluate in the script.
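For readers unfamiliar with how STS results are usually scored, here is a minimal sketch of the standard metric: the Spearman rank correlation between the cosine similarities of sentence-pair embeddings and the gold similarity scores. It assumes embeddings are already computed and passed in as plain lists of floats. The function names (`cosine`, `average_ranks`, `spearman`, `sts_score`) are illustrative and are not taken from tools/evaluation_sts.py, which may compute the score differently.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """Rank values from 1 to n; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def sts_score(emb_a, emb_b, gold):
    """Score sentence pairs: correlate cosine similarities with gold labels."""
    sims = [cosine(a, b) for a, b in zip(emb_a, emb_b)]
    return spearman(sims, gold)
```

Because the metric is rank-based, it only rewards getting the ordering of pairs right, not matching the gold scores' scale.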

BEIR:

python tools/evaluation_retrieval.py

Other datasets available on BEIR, beyond the Natural Questions dataset we evaluated in the paper, can be evaluated the same way: simply change the dataset name in the script.

Reproduce Pixel Linguist Training

Step 0: Visual alignment step:

bash run_bash/0-run_unsup.sh

Run this step separately on each of our unsupervised datasets to create 4 checkpoints, then ensemble them using tools/ensemble.py.
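One common way to ensemble several checkpoints of the same architecture is to average their parameters elementwise. The sketch below illustrates that idea on toy data. It is an assumption that this resembles what tools/ensemble.py does, so check the script itself for the actual method. The function name `average_checkpoints` is hypothetical, and parameters are represented as flat lists of floats rather than real tensors to keep the example dependency-free.

```python
def average_checkpoints(state_dicts):
    """Average parameters elementwise across checkpoints.

    Each checkpoint is a dict mapping a parameter name to a flat list of
    floats. All checkpoints are assumed to share the same names and shapes.
    """
    if not state_dicts:
        raise ValueError("need at least one checkpoint")
    keys = state_dicts[0].keys()
    for sd in state_dicts[1:]:
        if sd.keys() != keys:
            raise ValueError("checkpoints have different parameter names")

    n = len(state_dicts)
    averaged = {}
    for name in keys:
        # Group the i-th value of this parameter from every checkpoint.
        columns = zip(*(sd[name] for sd in state_dicts))
        averaged[name] = [sum(col) / n for col in columns]
    return averaged
```

With four checkpoints, each averaged weight is simply the mean of the four corresponding weights.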

Step 1: Topical alignment step:

bash run_bash/1-run_wikispan.sh

Step 2: Reasoning alignment step:

bash run_bash/2-run_allnli_finetune.sh

Step 3: Multilingual transfer step:

bash run_bash/3-run_allnli-pm.sh

Alternate between Step 2 and Step 3 two to three times for maximum performance (see the paper for the exact iterative training procedure, where a "leapfrogging" pattern is found). End the training with English AllNLI rather than parallel data.
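The alternation between Step 2 and Step 3 can be sketched as a simple schedule builder. This is only an illustration of the ordering described here, under the assumption that one "round" means a Step 2 run followed by a Step 3 run, with a final Step 2 run so that training ends on English AllNLI. The function `build_schedule` is hypothetical, and the sketch does not handle anything the scripts may need between runs (such as pointing each run at the previous checkpoint). The paper remains the reference for the exact procedure.

```python
STEP2 = "run_bash/2-run_allnli_finetune.sh"  # reasoning alignment (English AllNLI)
STEP3 = "run_bash/3-run_allnli-pm.sh"        # multilingual transfer


def build_schedule(rounds):
    """Return the script order: `rounds` passes of Step 2 then Step 3,
    followed by one last Step 2 so training ends on English AllNLI."""
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    schedule = []
    for _ in range(rounds):
        schedule += [STEP2, STEP3]
    schedule.append(STEP2)
    return schedule


if __name__ == "__main__":
    # Print the commands in order instead of executing them.
    for script in build_schedule(rounds=2):
        print("bash", script)
```

Printing the commands rather than running them keeps the sketch safe to try and makes the intended order easy to review before launching long training jobs.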

Note

We find that training with an extra MLP (using the PIXELForSequenceClassification class) but running inference without it (using PIXELForRepresentation, which drops the MLP) slightly boosts semantic performance. This setup produces the results reported in the latest version of the paper.
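Conceptually, dropping the MLP at inference time amounts to keeping only the encoder's parameters from the trained checkpoint. The sketch below shows that idea on a plain dictionary. The helper `drop_head` and the `"classifier."` prefix are hypothetical and chosen purely for illustration. The actual parameter names and loading logic in PIXELForSequenceClassification and PIXELForRepresentation may differ.

```python
def drop_head(state_dict, head_prefix="classifier."):
    """Return a copy of the checkpoint without the head's parameters.

    `head_prefix` is an assumed naming convention for the extra MLP's
    parameters; adjust it to match the real checkpoint.
    """
    return {
        name: value
        for name, value in state_dict.items()
        if not name.startswith(head_prefix)
    }


# Toy checkpoint: two encoder parameters plus an MLP head (names are made up).
trained = {
    "encoder.layer0.weight": [0.1, 0.2],
    "encoder.layer0.bias": [0.0],
    "classifier.weight": [0.5, 0.6],
}
encoder_only = drop_head(trained)
```

The original dictionary is left untouched, so the full checkpoint remains available if training needs to resume with the MLP attached.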
