README.md
+26 -22
Original file line number
Diff line number
Diff line change
@@ -41,7 +41,7 @@ For other languages, `vectorlite.[so|dll|dylib]` can be extracted from the wheel
41
41
42
42
Vectorlite is currently in beta. There could be breaking changes.
43
43
## Highlights
44
-
1. Fast ANN(approximate nearest neighbors) search backed by hnswlib. Vector query is significantly faster than similar projects like [sqlite-vec](https://github.com/asg017/sqlite-vec) and [sqlite-vss](https://github.com/asg017/sqlite-vss). Please see benchmark [below](https://github.com/1yefuwang1/vectorlite?tab=readme-ov-file#benchmark).
44
+
1. Fast ANN(approximate nearest neighbors) search backed by [hnswlib](https://github.com/nmslib/hnswlib). Vector query is significantly faster than similar projects like [sqlite-vec](https://github.com/asg017/sqlite-vec) and [sqlite-vss](https://github.com/asg017/sqlite-vss). Please see benchmark [below](https://github.com/1yefuwang1/vectorlite?tab=readme-ov-file#benchmark).
45
45
2. Works on Windows, Linux and MacOS(x64 and ARM).
46
46
3. A fast and portable SIMD accelerated vector distance implementation using Google's [highway](https://github.com/google/highway) library. On my PC(i5-12600KF with AVX2 support), vectorlite's implementation is 1.5x-3x faster than hnswlib's when dealing with vectors of dimension >= 256.
47
47
4. Supports all vector distance types provided by hnswlib: l2(squared l2), cosine, ip(inner product; I do not recommend using it, though). For more info please check [hnswlib's doc](https://github.com/nmslib/hnswlib/tree/v0.8.0?tab=readme-ov-file#supported-distances).
@@ -103,35 +103,32 @@ select rowid, distance from my_vectorlite_table where knn_search(vector_name, kn
103
103
104
104
## Benchmark
105
105
How the benchmark is done:
106
-
1. Insert 10000 randomly-generated vectors into a vectorlite table with HNSW parameters ef_construction=100, M=30.
107
-
2. Randomly generate 100 vectors and then query the table with them for 10 nearest neighbors with ef=10,50,100.
106
+
1. Insert 3000 or 20000 randomly generated vectors of dimension 128, 512, 1536 and 3000 into a vectorlite table with HNSW parameters ef_construction=100, M=30.
107
+
2. Randomly generate 100 vectors and then query the table with them for 10 nearest neighbors with ef=10,50,100 to see how ef impacts recall rate.
108
108
3. Calculate recall rate by comparing the result with the neighbors calculated using brute force.
109
109
4. vectorlite_scalar_brute_force(which just inserts vectors into a normal sqlite table and runs `select rowid from my_table order by vector_distance(query_vector, embedding, 'l2') limit 10`) is benchmarked as the baseline to see how much hnsw speeds up vector query.
110
-
5. hnswlib is also benchmarked to see how much cost SQLite adds to vectorlite.
110
+
5. [hnswlib](https://github.com/nmslib/hnswlib) is also benchmarked to see how much cost SQLite adds to vectorlite.
111
111
The benchmark is run in WSL on my PC with an Intel i5-12600KF CPU and 16 GB of RAM.
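
The recall calculation in step 3 can be illustrated with a minimal sketch. This is not the code from the benchmark folder; the function names `brute_force_knn` and `recall_rate` are hypothetical, and plain Python lists stand in for the real vectors. It assumes recall rate means the fraction of the exact (brute force) neighbors that the approximate search also returned.

```python
def brute_force_knn(vectors, query, k):
    """Exact k nearest neighbors by squared L2 distance; returns row indices."""
    def squared_l2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Sort every row index by its distance to the query and keep the first k.
    order = sorted(range(len(vectors)), key=lambda i: squared_l2(vectors[i], query))
    return order[:k]


def recall_rate(approx_ids, exact_ids):
    """Fraction of the exact neighbors that the approximate search returned."""
    exact = set(exact_ids)
    if not exact:
        return 0.0
    return len(exact & set(approx_ids)) / len(exact)
```

For example, if the exact neighbors of a query are rows `[0, 1]` and an ANN query returns `[0, 2]`, `recall_rate([0, 2], [0, 1])` gives `0.5`. Averaging this value over all 100 query vectors would give a recall figure per `ef` setting.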
112
112
113
113
114
-
TL;DR
114
+
TL;DR:
115
+
1. Vectorlite's vector query is 3x-100x faster than [sqlite-vec](https://github.com/asg017/sqlite-vec) at the cost of lower recall rate. The difference gets larger when the dataset size grows, which is expected because sqlite-vec only supports brute force.
116
+
2. Surprisingly, vectorlite_scalar_brute_force's vector query is about 1.5x faster than sqlite-vec for vectors with dimension >= 512 but slower than sqlite-vec for 128d vectors. vectorlite_scalar_brute_force's vector insertion is 3x-8x faster than sqlite-vec.
117
+
3. Compared with [hnswlib](https://github.com/nmslib/hnswlib), vectorlite provides almost identical recall rate. Vector query speed with L2 distance is on par with 128d vectors and is 1.5x faster when dealing with 3000d vectors, mainly because vectorlite's vector distance implementation is faster. But vectorlite's vector insertion is about 4x-5x slower.
118
+
4. Compared with brute force baseline(vectorlite_scalar_brute_force), vectorlite's knn query is 8x-80x faster.
119
+
120
+
The benchmark code can be found in [benchmark folder](https://github.com/1yefuwang1/vectorlite/tree/main/benchmark), which can be used as an example of how to improve recall rate for your scenario by tuning HNSW parameters.
121
+
### 3000 vectors
115
122
When dealing with 3000 vectors(which is a fairly small dataset):
116
123
1. Compared with [sqlite-vec](https://github.com/asg017/sqlite-vec), vectorlite's vector query can be 3x-15x faster with 128-d vectors, 6x-26x faster with 512-d vectors, 7x-30x faster with 1536-d vectors and 6x-24x faster with 3000-d vectors. But vectorlite's vector insertion is 6x-16x slower, which is expected because sqlite-vec uses brute force only and doesn't do much indexing.
117
124
2. Compared with vectorlite_scalar_brute_force, hnsw provides about 10x-40x speed up.
118
-
3. Compared with hnswlib, vectorlite's vector query speed and accuracy is on par and is 1.5x faster when dealing with 3000d vectors. The credit goes to vectorlite's vector distance implementation. But vector insertion is about 4x-5x slower.
119
-
4. vectorlite_scalar_brute_force's vector insertion 4x-7x is faster than sqlite-vec, and vector query is about 1.7x faster.
120
-
121
-
122
-
2. When dealing with 20000 vectors,
123
-
1. Compared with [sqlite-vec](https://github.com/asg017/sqlite-vec), vectorlite's vector query can be 8x-33x faster with 128-d vectors, 20x-100x faster with 3000d vectors with speed-accuracy trade-off.
124
-
2. Compared with vectorlite_scalar_brute_force, hnsw provides about 8x-80x speed up with reduced recall rate at 13.8%-85%.
125
-
3. Compared with hnswlib,
125
+
3. Compared with hnswlib, vectorlite provides almost identical recall rate. Vector query speed is on par with 128d vectors and is 1.5x faster when dealing with 3000d vectors, mainly because vectorlite's vector distance implementation is faster. But vector insertion is about 4x-5x slower.
126
+
4. vectorlite_scalar_brute_force's vector insertion is 4x-7x faster than sqlite-vec, and vector query is about 1.7x faster when dealing with vectors of dimension >= 512.
126
127
127
128
128
129
129
-
The benchmark code can be found in [benchmark folder](https://github.com/1yefuwang1/vectorlite/tree/main/benchmark), which can be used as an example of how to improve recall rate for your scenario by tuning HNSW parameters.
130
-
131
-
Picking good HNSW parameters is crucial for achieving high performance. Please benchmark and find the best HNSW parameters for your scenario.
@@ -234,12 +231,19 @@ Bencharmk sqlite_vec as comparison.
234
231
</details>
235
232
236
233
### 20000 vectors
234
+
When dealing with 20000 vectors:
235
+
1. Compared with [sqlite-vec](https://github.com/asg017/sqlite-vec), vectorlite's vector query can be 8x-100x faster depending on vector dimension.
236
+
2. Compared with vectorlite_scalar_brute_force, hnsw provides about 8x-80x speed up with reduced recall rate at 13.8%-85% depending on vector dimension.
237
+
3. Compared with hnswlib, vectorlite provides almost identical recall rate. Vector query is on par with 128d vectors and can be 1.5x faster with 3000d vectors. But vector insertion is 3x-9x slower.
238
+
4. vectorlite_scalar_brute_force's vector insertion is 4x-8x faster than sqlite-vec. sqlite-vec's vector query is 1.5x faster with 128d vectors and 1.8x slower when vector dimension>=512.
239
+
240
+
237
241
Please note:
238
242
1. sqlite-vss is not benchmarked with 20000 vectors because its index creation takes so long that it doesn't finish in hours.
239
-
2. sqlite-vec's vector query is benchmarked but not plotted in the figures because it's search time is disproportionally long.
243
+
2. sqlite-vec's vector query is benchmarked and included in the raw data, but not plotted in the figure because its search time is disproportionately long.