Add benchmark
1yefuwang1 committed Jul 7, 2024
1 parent f6175d0 commit 3298442
Showing 3 changed files with 62 additions and 2 deletions.
6 changes: 6 additions & 0 deletions .github/workflows/ci.yml
@@ -77,6 +77,12 @@ jobs:
            done
          done
      - name: Run benchmark
        shell: bash
        run: |
          python -m pip install -r benchmark/requirements.txt
          python benchmark/benchmark.py
  upload_wheels:
    name: Upload wheels
    if: ${{ github.event.inputs.upload_wheel != 'no' && github.event_name != 'pull_request' }}
57 changes: 55 additions & 2 deletions README.md
@@ -6,15 +6,67 @@ For other languages, vectorlite.[so|dll|dylib] can be extracted from the wheel f

Vectorlite is currently in beta. There could be breaking changes.
## Highlights
1. Fast ANN-search backed by hnswlib.
1. Fast ANN-search backed by hnswlib. Please see benchmark below.
2. Works on Windows, Linux and MacOS.
3. SIMD accelerated vector distance calculation for x86 platform, using `vector_distance()`
4. Supports all vector distance types provided by hnswlib: l2(squared l2), cosine, ip(inner product). For more info please check [hnswlib's doc](https://github.com/nmslib/hnswlib/tree/v0.8.0?tab=readme-ov-file#supported-distances).
4. Supports all vector distance types provided by hnswlib: l2 (squared l2), cosine, and ip (inner product, though I do not recommend using it). For more info, please check [hnswlib's doc](https://github.com/nmslib/hnswlib/tree/v0.8.0?tab=readme-ov-file#supported-distances).
3. Full control over HNSW parameters for performance tuning.
4. Metadata filter support (requires sqlite version >= 3.38).
5. Index serde support. A vectorlite table can be saved to a file, and be reloaded from it. Index files created by hnswlib can also be loaded by vectorlite.
6. Vector json serde support using `vector_from_json()` and `vector_to_json()`.
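
As a rough illustration of the three distance types listed above, the formulas documented by hnswlib can be written in plain Python. This is only a sketch: the function names are made up for this example and are not part of vectorlite's or hnswlib's API.

```python
import math


def l2_squared(a, b):
    """Squared L2 distance: sum of squared component differences (no square root)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ip_distance(a, b):
    """Inner product distance: 1 minus the dot product."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """Cosine distance: 1 minus the cosine similarity of the two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


print(l2_squared([0.0, 0.0], [3.0, 4.0]))       # 25.0
print(ip_distance([1.0, 0.0], [1.0, 0.0]))      # 0.0
print(ip_distance([1.0, 0.0], [3.0, 0.0]))      # -2.0
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0
```

One property worth knowing about ip: it is not a true metric. In the example, the longer vector `[3.0, 0.0]` is "closer" to `[1.0, 0.0]` than `[1.0, 0.0]` is to itself.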

## Benchmark
Vectorlite is fast. Compared with [sqlite-vss](https://github.com/asg017/sqlite-vss), vectorlite is 10x faster at insertion and 2x-10x faster at searching, with a much better recall rate.
The benchmark works as follows:
1. Insert 10000 randomly-generated vectors into a vectorlite table
2. Randomly generate 100 vectors and then query the table with them.

The benchmark code can be found in the `benchmark` folder.
It can also serve as an example of how to improve recall_rate for your scenario by tuning HNSW parameters.
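
For readers unfamiliar with recall_rate, the sketch below shows one common way to compute it: compare the ids returned by an approximate search against the exact nearest neighbours found by brute force. It is not the code in the `benchmark` folder. The helper names, the small data sizes, the choice of `k = 10`, and the stand-in "approximate" results are all assumptions made for illustration.

```python
import heapq
import random


def brute_force_top_k(data, query, k):
    """Exact k nearest neighbours by squared L2 distance; returns row indices."""
    def dist(i):
        return sum((x - y) ** 2 for x, y in zip(data[i], query))
    return heapq.nsmallest(k, range(len(data)), key=dist)


def recall_rate(approx_ids, exact_ids):
    """Fraction of exact neighbours that also appear in the approximate results."""
    hits = sum(len(set(a) & set(e)) for a, e in zip(approx_ids, exact_ids))
    total = sum(len(e) for e in exact_ids)
    return hits / total


rng = random.Random(0)
dim, num_vectors, num_queries, k = 8, 200, 5, 10  # tiny sizes, for illustration only
data = [[rng.random() for _ in range(dim)] for _ in range(num_vectors)]
queries = [[rng.random() for _ in range(dim)] for _ in range(num_queries)]

exact = [brute_force_top_k(data, q, k) for q in queries]

# Stand-in for an ANN index's output: each result list misses one true neighbour.
# In a real benchmark these ids would come from the vector search being measured.
approx = [ids[:-1] + [-1] for ids in exact]

print(f"recall_rate = {recall_rate(approx, exact):.2%}")  # 9 of every 10 neighbours found
```

Raising ef_search typically makes the approximate results overlap more with the exact ones, at the cost of search time, which is the trade-off the table below shows.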

```
┏━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ distance_type ┃ vector dimension ┃ ef_construction ┃ M ┃ ef_search ┃ insert_time(per vector) ┃ search_time(per query) ┃ recall_rate ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ l2 │ 256 │ 16 │ 200 │ 10 │ 291.30 us │ 68.78 us │ 52.20% │
│ l2 │ 256 │ 16 │ 200 │ 50 │ 291.30 us │ 195.49 us │ 88.70% │
│ l2 │ 256 │ 16 │ 200 │ 100 │ 291.30 us │ 347.22 us │ 97.60% │
│ l2 │ 256 │ 32 │ 200 │ 10 │ 290.10 us │ 73.92 us │ 52.20% │
│ l2 │ 256 │ 32 │ 200 │ 50 │ 290.10 us │ 195.06 us │ 88.70% │
│ l2 │ 256 │ 32 │ 200 │ 100 │ 290.10 us │ 286.31 us │ 97.60% │
│ l2 │ 1024 │ 16 │ 200 │ 10 │ 1368.79 us │ 410.93 us │ 39.60% │
│ l2 │ 1024 │ 16 │ 200 │ 50 │ 1368.79 us │ 1323.04 us │ 81.60% │
│ l2 │ 1024 │ 16 │ 200 │ 100 │ 1368.79 us │ 1854.42 us │ 94.40% │
│ l2 │ 1024 │ 32 │ 200 │ 10 │ 1356.85 us │ 456.60 us │ 39.60% │
│ l2 │ 1024 │ 32 │ 200 │ 50 │ 1356.85 us │ 1298.97 us │ 81.60% │
│ l2 │ 1024 │ 32 │ 200 │ 100 │ 1356.85 us │ 2082.90 us │ 94.40% │
│ cosine │ 256 │ 16 │ 200 │ 10 │ 294.52 us │ 70.68 us │ 57.90% │
│ cosine │ 256 │ 16 │ 200 │ 50 │ 294.52 us │ 167.39 us │ 92.10% │
│ cosine │ 256 │ 16 │ 200 │ 100 │ 294.52 us │ 242.41 us │ 97.90% │
│ cosine │ 256 │ 32 │ 200 │ 10 │ 259.53 us │ 63.86 us │ 57.90% │
│ cosine │ 256 │ 32 │ 200 │ 50 │ 259.53 us │ 198.87 us │ 92.10% │
│ cosine │ 256 │ 32 │ 200 │ 100 │ 259.53 us │ 311.33 us │ 97.90% │
│ cosine │ 1024 │ 16 │ 200 │ 10 │ 1164.01 us │ 383.68 us │ 45.20% │
│ cosine │ 1024 │ 16 │ 200 │ 50 │ 1164.01 us │ 1090.47 us │ 83.90% │
│ cosine │ 1024 │ 16 │ 200 │ 100 │ 1164.01 us │ 1628.61 us │ 95.50% │
│ cosine │ 1024 │ 32 │ 200 │ 10 │ 1185.78 us │ 371.90 us │ 45.20% │
│ cosine │ 1024 │ 32 │ 200 │ 50 │ 1185.78 us │ 1362.33 us │ 83.90% │
│ cosine │ 1024 │ 32 │ 200 │ 100 │ 1185.78 us │ 1697.19 us │ 95.50% │
└───────────────┴──────────────────┴─────────────────┴─────┴───────────┴─────────────────────────┴────────────────────────┴─────────────┘
```
The result of the same benchmark for [sqlite-vss](https://github.com/asg017/sqlite-vss) is below:
```
┏━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ vector dimension ┃ insert_time(per vector) ┃ search_time(per query) ┃ recall_rate ┃
┡━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ 256 │ 3694.37 us │ 755.17 us │ 55.40% │
│ 1024 │ 18598.29 us │ 3848.64 us │ 48.60% │
└──────────────────┴─────────────────────────┴────────────────────────┴─────────────┘
```
I believe the performance difference is mainly caused by the underlying vector search library.
Sqlite-vss uses [faiss](https://github.com/facebookresearch/faiss), which is optimized for batched scenarios.
Vectorlite uses [hnswlib](https://github.com/nmslib/hnswlib), which is optimized for online vector searching.

# Quick Start
The quickest way to get started is to install vectorlite using python.
```shell
@@ -121,6 +173,7 @@ vectorlite_py wheel can be found in `dist` folder
- [ ] Support Multi-vector document search and epsilon search
- [ ] Support multi-threaded search
- [ ] Release vectorlite to more package managers.
- [ ] Support more vector types, e.g. float16, int8.

# Known limitations
1. On a single query, a knn_search vector constraint can only be paired with at most one rowid constraint and vice versa.
1 change: 1 addition & 0 deletions examples/requirements.txt
@@ -1,2 +1,3 @@
vectorlite_py
numpy>=1.22
apsw>=3.45
