DQF

DQF (Dual-Index Query Framework) is a novel high-dimensional approximate nearest neighbor search framework that leverages a dual-layer index structure and dynamic search strategy based on a decision tree. It optimizes search efficiency by prioritizing high-frequency queries through a "hot index" and managing dynamic query distributions with an incremental update mechanism. This code forked off from code for NSSG algorithm.

Benchmark datasets

Data set	Download	dimension	nb base vectors	nb query vectors	original website
SIFT1M	original website	128	1,000,000	10,000	original website
GIST1M	original website	128	1,000,000	1,000	original website
Crawl	crawl.tar.gz (1.7GB)	300	1,989,995	10,000	original website
GloVe-100	glove-100.tar.gz (424MB)	100	1,183,514	10,000	original website

How to use

0. Download the dataset

Here, we use SIFT1M as an example.

mkdir -p ./build/data && cd ./build/data
wget ftp://ftp.irisa.fr/local/texmex/corpus/sift.tar.gz
tar -xf sift.tar.gz
cd ..

1. Compile

Prerequisite : openmp, opencv, cmake, boost

cmake ..
make -j

2. Build the full index

For example:

cd /path/to/project/build/tests/
./test_build_full_index data_path knn_graph_path ssg_path K L_KNNG iter S R_KNNG L_SSG R_SSG Angle

data_path: Path to the original data.
knn_graph_path: Path to the pre-built kNN graph.
ssg_path: Path where the resulting SSG is stored.
K, L_KNNG, iter, S, R_KNNG: Parameters of EFANNA.
L_SSG, R_SSG, Angle: Parameters of NSSG.

NOTE: Data alignment is essential for the correctness of our procedure, because we use SIMD instructions for acceleration of numerical computing such as AVX and SSE2. You should use it to ensure your data elements (feature) is aligned with 8 or 16 int or float. For example, if your features are of dimension 70, then it should be extend to dimension 72. And the last 2 dimension should be filled with 0 to ensure the correctness of the distance computing. And this is what data_align() does.

To test all datasets, the Bash commands are as follows:

./test_build_full_index ../data/sift/sift_base.fvecs ./sift.knng ./sift.ssg 200 200 12 10 100 100 50 60
./test_build_full_index ../data/gist/gist_base.fvecs ./gist.knng ./gist.ssg 400 400 12 15 100 500 70 60
./test_build_full_index ../data/crawl/crawl_base.fvecs ./crawl.knng ./crawl.ssg 400 420 12 15 100 500 40 60
./test_build_full_index ../data/glove-100/glove-100_base.fvecs ./glove.knng ./glove.ssg 400 420 12 20 200 500 50 60

3. Search

For example:

cd /path/to/project/build/tests/
./test_search data_path ssg_path hot_knn_graph_path hot_ssg_path K L_KNNG iter S R_KNNG L_SSG R_SSG Angle Beta K Alpha

data_path: Path to the original data.
ssg_path: Path to the pre-built full SSG graph.
hot_knn_graph_path, hot_ssg_path: Path to the hot KNNG graph and SSG graph.
K, L_KNNG, iter, S, R_KNNG: Parameters of EFANNA, built for the hot index (same as the full index).
L_SSG, R_SSG, Angle: Parameters of NSSG, built for the hot index (same as the full index).
Beta: Used to calculate the hot index S_L = K + (L - K) / beta.
K: Number of final nearest neighbors.
Alpha: Parameter of the Zipf distribution (default: 1.2).

To test all datasets, the Bash commands are as follows:

./test_search ../data/sift/sift_base.fvecs sift.ssg sift_s.knng sift_s.ssg 200 200 12 10 100 100 50 60 50 10 1.2
./test_search ../data/gist/gist_base.fvecs gist.ssg gist_s.knng gist_s.ssg 400 400 12 15 100 500 70 60 10 10 1.2
./test_search ../data/crawl/crawl_base.fvecs crawl.ssg crawl_s.knng crawl_s.ssg 400 420 12 15 100 500 40 60 2 10 1.2
./test_search ../data/glove-100/glove-100_base.fvecs glove.ssg glove_s.knng glove_s.ssg 400 420 12 20 200 500 50 60 2 10 1.2

4. Test query shift

For example:

cd /path/to/project/build/tests/
./test_query_shift data_path ssg_path hot_knn_graph_path hot_ssg_path K L_KNNG iter S R_KNNG L_SSG R_SSG Angle Beta K L Alpha

data_path: Path to the original data.
ssg_path: Path to the pre-built full SSG graph.
hot_knn_graph_path, hot_ssg_path: Path to the hot KNNG graph and SSG graph.
K, L_KNNG, iter, S, R_KNNG: Parameters of EFANNA, built for the hot index (same as the full index).
L_SSG, R_SSG, Angle: Parameters of NSSG, built for the hot index (same as the full index).
Beta: Used to calculate the hot index S_L = K + (L - K) / beta.
K: Number of final nearest neighbors.
L: Limit on the number of candidate nodes during search.
Alpha: Parameter of the Zipf distribution (default: 1.2).

To test all datasets, the Bash commands are as follows:

./test_query_shift ../data/sift/sift_base.fvecs sift.ssg sift_s.knng sift_s.ssg 200 200 12 10 100 100 50 60 50 10 20 1.2
./test_query_shift ../data/gist/gist_base.fvecs gist.ssg gist_s.knng gist_s.ssg 400 400 12 15 100 500 70 60 10 10 50 1.2
./test_query_shift ../data/crawl/crawl_base.fvecs crawl.ssg crawl_s.knng crawl_s.ssg 400 420 12 15 100 500 40 60 2 10 40 1.2
./test_query_shift ../data/glove-100/glove-100_base.fvecs glove.ssg glove_s.knng glove_s.ssg 400 420 12 20 200 500 50 60 2 10 110 1.2

Name		Name	Last commit message	Last commit date
Latest commit History 31 Commits
include		include
src		src
tests		tests
CMakeLists.txt		CMakeLists.txt
README.md		README.md
test_build_full_index.sh		test_build_full_index.sh
test_query_shift.sh		test_query_shift.sh
test_search.sh		test_search.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

DQF

Benchmark datasets

How to use

0. Download the dataset

1. Compile

2. Build the full index

3. Search

4. Test query shift

About

Uh oh!

Releases

Packages

Languages

ZJU-DAILY/DQF

Folders and files

Latest commit

History

Repository files navigation

DQF

Benchmark datasets

How to use

0. Download the dataset

1. Compile

2. Build the full index

3. Search

4. Test query shift

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages