Supplementary Code for Unsupervised Neural Quantization for Compressed-Domain Similarity Search
This code trains a neural network that maps database vectors into 8- or 16-byte codes optimized for nearest neighbor search.
- A machine with some CPU (preferably 8+ cores) and a GPU
  - Running with no GPU or fewer than 4 CPU cores may cause premature senility;
- Some popular Linux x64 distribution
  - Tested on Ubuntu 16.04; should work fine on any popular linux64 and even MacOS;
  - Windows and x32 systems may require heavy wizardry to run;
  - When in doubt, use Docker, preferably GPU-enabled (i.e. nvidia-docker)
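If you go the Docker route, a minimal GPU-enabled invocation might look like the sketch below (the CUDA image tag is only an assumption for illustration; older Docker setups use the `nvidia-docker` wrapper instead of the `--gpus` flag):

```shell
# Sketch: start a CUDA container with this repo mounted as the working directory.
# The image tag is illustrative -- pick one matching your driver/CUDA version.
docker run --gpus all -it --rm \
    -v "$(pwd)":/workspace -w /workspace \
    nvidia/cuda:10.0-cudnn7-devel-ubuntu16.04 bash
```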
- Clone or download this repo, then `cd` to its root directory.
- Grab or build a working python environment. Anaconda works fine.
- Install standard compilers (e.g. `gcc` and `g++` for most linux) and `swig3.0`
  - On ubuntu, just `sudo apt-get -y install swig3.0 gcc-4.9 g++-4.9 libstdc++6 wget unzip`
  - and maybe `sudo ln -s /usr/bin/swig3.0 /usr/bin/swig` for good measure
- Install packages from `requirements.txt`, with a little twist:
  - FAISS is hard to install via pip; we recommend using their anaconda installation
- You will also need jupyter or some other way to work with .ipynb files
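The package installation above can be sketched as two commands (the conda channel and package name follow FAISS's own anaconda instructions; adjust the build to your machine):

```shell
# Everything pip can handle
pip install -r requirements.txt
# FAISS via anaconda; use faiss-cpu instead on a machine without CUDA
conda install -c pytorch faiss-gpu
```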
- Run jupyter notebook and open a notebook in `./notebooks/`
  - Before you run the first cell, change `%env CUDA_VISIBLE_DEVICES=#` to the devices you plan to use.
  - First, the notebook downloads data from dropbox. You will need up to 1.5Gb of free space.
  - Second, it defines an experiment setup. The setups are:
  - `bigann1m_unq_8b.ipynb` - BIGANN1M dataset, 8 bytes per vector
  - `deep1m_unq_8b.ipynb` - DEEP1M dataset, 8 bytes per vector
  - `bigann1m_unq_16b.ipynb` - BIGANN1M dataset, 16 bytes per vector
  - `deep1m_unq_16b.ipynb` - DEEP1M dataset, 16 bytes per vector
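All four setups target the same goal: candidates retrieved from compressed codes should contain the true nearest neighbors. As a minimal numpy-only illustration of the recall metric involved (function names below are illustrative, not part of this repo):

```python
import numpy as np

def exact_knn(base, queries, k):
    """Brute-force k nearest neighbors under L2 distance."""
    # squared distances via ||q||^2 - 2 q.b + ||b||^2 (broadcasted)
    d2 = (queries ** 2).sum(1, keepdims=True) - 2 * queries @ base.T + (base ** 2).sum(1)
    return np.argsort(d2, axis=1)[:, :k]

def recall_at_k(approx_ids, true_ids, k):
    """Fraction of queries whose true nearest neighbor is in the top-k candidates."""
    return np.mean([true_ids[i, 0] in approx_ids[i, :k] for i in range(len(true_ids))])

rng = np.random.RandomState(0)
base = rng.randn(1000, 16).astype(np.float32)
queries = rng.randn(100, 16).astype(np.float32)

true_ids = exact_knn(base, queries, k=1)
# exact search trivially contains the true neighbor, so recall is 1.0
assert recall_at_k(exact_knn(base, queries, k=10), true_ids, k=10) == 1.0
```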