forked from 1yefuwang1/vectorlite
* Add doc * Add api reference * Add news * Add getting-started.md * Clean up * Only enable doc build on pushing to main
1 parent
57cc4ac
commit 4d981db
Showing 10 changed files with 605 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
name: documentation | ||
|
||
on: | ||
  push: | ||
    branches: | ||
      - main | ||
  workflow_dispatch: | ||
|
||
permissions: | ||
  contents: write | ||
|
||
jobs: | ||
  docs: | ||
    runs-on: ubuntu-latest | ||
    steps: | ||
      - uses: actions/checkout@v4 | ||
      - uses: actions/setup-python@v5 | ||
      - name: Install dependencies | ||
        run: | | ||
          cp -r media doc/markdown | ||
          pip install sphinx sphinx_rtd_theme myst_parser | ||
      - name: Sphinx build | ||
        run: | | ||
          sphinx-build doc _build | ||
      - name: Deploy to GitHub Pages | ||
        uses: peaceiris/actions-gh-pages@v3 | ||
        if: ${{ (github.event_name == 'push' || github.event_name == 'workflow_dispatch') && github.ref == 'refs/heads/main' }} | ||
        with: | ||
          publish_branch: gh-pages | ||
          github_token: ${{ secrets.GITHUB_TOKEN }} | ||
          publish_dir: _build/ | ||
          force_orphan: true |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
# Minimal makefile for Sphinx documentation | ||
# | ||
|
||
# You can set these variables from the command line, and also | ||
# from the environment for the first two. | ||
SPHINXOPTS ?= | ||
SPHINXBUILD ?= sphinx-build | ||
SOURCEDIR = . | ||
BUILDDIR = _build | ||
|
||
# Put it first so that "make" without argument is like "make help". | ||
help: | ||
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) | ||
|
||
.PHONY: help Makefile | ||
|
||
# Catch-all target: route all unknown targets to Sphinx using the new | ||
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). | ||
%: Makefile | ||
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
# Configuration file for the Sphinx documentation builder. | ||
# | ||
# For the full list of built-in configuration values, see the documentation: | ||
# https://www.sphinx-doc.org/en/master/usage/configuration.html | ||
|
||
# -- Project information ----------------------------------------------------- | ||
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information | ||
|
||
project = 'vectorlite' | ||
copyright = 'vectorlite contributors' | ||
author = '[email protected]' | ||
release = '0.2.0' | ||
|
||
|
||
# -- General configuration --------------------------------------------------- | ||
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration | ||
|
||
extensions = ['myst_parser'] | ||
|
||
templates_path = ['_templates'] | ||
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store'] | ||
|
||
|
||
# -- Options for HTML output ------------------------------------------------- | ||
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output | ||
|
||
html_theme = 'sphinx_rtd_theme' | ||
html_static_path = ['_static'] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
.. Example documentation master file, created by | ||
   sphinx-quickstart on Sat Sep 23 20:35:12 2023. | ||
   You can adapt this file completely to your liking, but it should at least | ||
   contain the root `toctree` directive. | ||
|
||
Welcome to vectorlite's documentation! | ||
====================================== | ||
|
||
.. toctree:: | ||
   :maxdepth: 3 | ||
   :caption: Contents: | ||
|
||
   markdown/overview.md | ||
   markdown/api.md | ||
   markdown/getting-started.md | ||
   markdown/news.md |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
@ECHO OFF | ||
|
||
pushd %~dp0 | ||
|
||
REM Command file for Sphinx documentation | ||
|
||
if "%SPHINXBUILD%" == "" ( | ||
set SPHINXBUILD=sphinx-build | ||
) | ||
set SOURCEDIR=. | ||
set BUILDDIR=_build | ||
|
||
%SPHINXBUILD% >NUL 2>NUL | ||
if errorlevel 9009 ( | ||
echo. | ||
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx | ||
echo.installed, then set the SPHINXBUILD environment variable to point | ||
echo.to the full path of the 'sphinx-build' executable. Alternatively you | ||
echo.may add the Sphinx directory to PATH. | ||
echo. | ||
echo.If you don't have Sphinx installed, grab it from | ||
echo.https://www.sphinx-doc.org/ | ||
exit /b 1 | ||
) | ||
|
||
if "%1" == "" goto help | ||
|
||
%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O% | ||
goto end | ||
|
||
:help | ||
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O% | ||
|
||
:end | ||
popd |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,62 @@ | ||
# API reference | ||
Vectorlite provides the following APIs. | ||
Please note vectorlite is currently in beta. There could be breaking changes. | ||
## Free-standing Application Defined SQL functions | ||
The following functions can be used in any context. | ||
``` sql | ||
vectorlite_info() -- prints version info and some compile-time info, e.g. whether SSE or AVX is enabled. | ||
vector_from_json(json_string) -- converts a JSON array of type TEXT into a BLOB (a C-style float32 array) | ||
vector_to_json(vector_blob) -- converts a vector of type BLOB (a C-style float32 array) into a JSON array of type TEXT | ||
vector_distance(vector_blob1, vector_blob2, distance_type_str) -- calculates the distance between two vectors; distance_type_str can be 'l2', 'cosine' or 'ip' | ||
``` | ||
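To make the BLOB layout concrete, here is a minimal pure-Python sketch of the JSON-to-float32 round trip. It only illustrates the "C-style float32 array" format described above; the helper names `json_to_float32_blob` and `float32_blob_to_json` are hypothetical and are not part of vectorlite, and native byte order is assumed.

```python
import json
import struct


def json_to_float32_blob(json_text: str) -> bytes:
    """Pack a JSON array of numbers into consecutive 4-byte floats (hypothetical helper)."""
    values = json.loads(json_text)
    return struct.pack(f"{len(values)}f", *values)


def float32_blob_to_json(blob: bytes) -> str:
    """Unpack consecutive 4-byte floats back into a JSON array (hypothetical helper)."""
    count = len(blob) // 4  # each float32 occupies 4 bytes
    return json.dumps(list(struct.unpack(f"{count}f", blob)))


blob = json_to_float32_blob("[1, 2, 3]")
print(len(blob))                   # 3 floats * 4 bytes = 12
print(float32_blob_to_json(blob))  # [1.0, 2.0, 3.0]
```

A blob built this way has the same shape as what `data[i].tobytes()` yields for a float32 numpy vector, which is why numpy vectors can be inserted directly as bytes.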
|
||
In fact, one can easily implement brute force searching using `vector_distance`, which returns 100% accurate search results: | ||
```sql | ||
-- use a normal sqlite table | ||
create table my_table(rowid integer primary key, embedding blob); | ||
|
||
-- insert | ||
insert into my_table(rowid, embedding) values (0, {your_embedding}); | ||
-- search for 10 nearest neighbors using l2 squared distance | ||
select rowid from my_table order by vector_distance({query_vector}, embedding, 'l2') asc limit 10; | ||
|
||
``` | ||
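The brute-force pattern above can be tried without vectorlite installed. The sketch below uses Python's standard `sqlite3` module and registers a Python function as a stand-in for `vector_distance(..., 'l2')`. The function name `l2_squared_sketch` and the sample vectors are assumptions made for illustration; this is not vectorlite's implementation.

```python
import sqlite3
import struct


def to_blob(values):
    """Pack a list of numbers as a float32 array in native byte order."""
    return struct.pack(f"{len(values)}f", *values)


def l2_squared(blob_a: bytes, blob_b: bytes) -> float:
    """Squared L2 distance between two float32 blobs of equal length."""
    count = len(blob_a) // 4
    a = struct.unpack(f"{count}f", blob_a)
    b = struct.unpack(f"{count}f", blob_b)
    return sum((x - y) ** 2 for x, y in zip(a, b))


conn = sqlite3.connect(":memory:")
# Stand-in for vector_distance(..., 'l2'); the name is hypothetical.
conn.create_function("l2_squared_sketch", 2, l2_squared)
conn.execute("create table my_table(rowid integer primary key, embedding blob)")

vectors = {0: [0.0, 0.0], 1: [1.0, 1.0], 2: [5.0, 5.0], 3: [0.5, 0.0]}
conn.executemany(
    "insert into my_table(rowid, embedding) values (?, ?)",
    [(rowid, to_blob(vector)) for rowid, vector in vectors.items()],
)

# Every row is scanned and scored, so the result is exact but costs O(n) per query.
query = to_blob([0.0, 0.0])
nearest = conn.execute(
    "select rowid from my_table order by l2_squared_sketch(?, embedding) asc limit 2",
    (query,),
).fetchall()
print(nearest)
conn.close()
```

Because every stored vector is compared with the query, the cost grows linearly with the table size, which is the trade-off the virtual table described next is meant to avoid.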
## Virtual Table | ||
The core of vectorlite is the [virtual table](https://www.sqlite.org/vtab.html) module, which holds the vector index and is much faster than the brute-force approach, at the cost of not being 100% accurate. | ||
A vectorlite table can be created using: | ||
|
||
```sql | ||
-- Required fields: table_name, vector_name, dimension, max_elements | ||
-- Optional fields: | ||
-- 1. distance_type: defaults to l2 | ||
-- 2. ef_construction: defaults to 200 | ||
-- 3. M: defaults to 16 | ||
-- 4. random_seed: defaults to 100 | ||
-- 5. allow_replace_deleted: defaults to true | ||
-- 6. index_file_path: no default value. If not provided, the table will be memory-only. If provided, vectorlite will try to load index from the file and save to it when db connection is closed. | ||
create virtual table {table_name} using vectorlite({vector_name} float32[{dimension}] {distance_type}, hnsw(max_elements={max_elements}, {ef_construction=200}, {M=16}, {random_seed=100}, {allow_replace_deleted=true}), {index_file_path}); | ||
``` | ||
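Since the statement has several optional parts, it can help to assemble it programmatically. The following is a small sketch of a hypothetical helper, `build_create_statement`, that only builds the SQL text following the template above. It deliberately leaves out `index_file_path`, and it does not validate names or parameter values.

```python
def _format_option(value):
    """Render Python booleans as the lowercase true/false used in the template."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def build_create_statement(table_name, vector_name, dimension, max_elements,
                           distance_type=None, **hnsw_options):
    """Build a `create virtual table ... using vectorlite(...)` statement (hypothetical helper)."""
    column = f"{vector_name} float32[{dimension}]"
    if distance_type is not None:
        column += f" {distance_type}"  # omitted means the default, l2
    hnsw_args = [f"max_elements={max_elements}"]
    hnsw_args += [f"{name}={_format_option(value)}" for name, value in hnsw_options.items()]
    return (f"create virtual table {table_name} using vectorlite("
            f"{column}, hnsw({', '.join(hnsw_args)}))")


print(build_create_statement("my_table", "my_embedding", 32, 10000))
print(build_create_statement("my_table", "my_embedding", 32, 10000,
                             distance_type="cosine", ef_construction=100, M=32))
```

The first call produces the same statement shape as the one used in the getting-started example; the second shows how optional HNSW parameters are appended after `max_elements`.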
You can insert into, update and delete from a vectorlite table as if it were a normal sqlite table. | ||
```sql | ||
-- rowid is required during insertion, because rowid is used to connect the vector to its metadata stored elsewhere. Auto-generating rowid doesn't make sense. | ||
insert into my_vectorlite_table(rowid, vector_name) values ({your_rowid}, {vector_blob}); | ||
-- Note: update and delete statements that use a rowid filter require sqlite3_version >= 3.38 to run. | ||
update my_vectorlite_table set vector_name = {new_vector_blob} where rowid = {your_rowid}; | ||
delete from my_vectorlite_table where rowid = {your_rowid}; | ||
``` | ||
The following functions should only be used when querying a vectorlite table: | ||
```sql | ||
-- returns knn_parameter that will be passed to knn_search(). | ||
-- vector_blob: vector to search | ||
-- k: how many nearest neighbors to search for | ||
-- ef: optional. An HNSW parameter that controls the speed-accuracy trade-off. Defaults to 10. Once set to another value x, it stays x for later queries on the same db connection until it is specified again. | ||
knn_param(vector_blob, k, ef) | ||
-- Should only be used in the `where clause` in a `select` statement to tell vectorlite to speed up the query using HNSW index | ||
-- vector_name should match the vectorlite table's definition | ||
-- knn_parameter is usually constructed using knn_param() | ||
knn_search(vector_name, knn_parameter) | ||
-- An example of vector search query. `distance` is an implicit column of a vectorlite table. | ||
select rowid, distance from my_vectorlite_table where knn_search(vector_name, knn_param({vector_blob}, {k})) | ||
-- An example of vector search query with pushed-down metadata(rowid) filter, requires sqlite_version >= 3.38 to run. | ||
select rowid, distance from my_vectorlite_table where knn_search(vector_name, knn_param({vector_blob}, {k})) and rowid in (1,2,3,4,5) | ||
``` |
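When the rowid filter comes from application code, the query text has to be assembled at runtime. Below is a minimal sketch of a hypothetical helper, `build_filtered_knn_query`, that produces a statement of the same shape as the last example. The query vector stays a bound `?` parameter, while `k` and the rowids are coerced to integers before being written into the SQL text.

```python
def build_filtered_knn_query(table_name, vector_name, k, rowids):
    """Build a knn_search query with a rowid filter (hypothetical helper).

    int() rejects anything that is not an integer, so arbitrary text cannot
    end up in the rowid list or in the k argument.
    """
    rowid_list = ",".join(str(int(rowid)) for rowid in rowids)
    return (f"select rowid, distance from {table_name} "
            f"where knn_search({vector_name}, knn_param(?, {int(k)})) "
            f"and rowid in ({rowid_list})")


sql = build_filtered_knn_query("my_vectorlite_table", "vector_name", 10, range(1, 6))
print(sql)
```

The resulting string would then be executed with the query vector's bytes as the single bound parameter, on a connection whose SQLite version is at least 3.38.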
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,69 @@ | ||
# Getting started | ||
The quickest way to get started is to install vectorlite using python. | ||
```shell | ||
# Note: vectorlite-py not vectorlite. vectorlite is another project. | ||
pip install vectorlite-py apsw numpy | ||
``` | ||
Vectorlite's metadata filter feature requires sqlite>=3.38. Python's built-in `sqlite3` module is usually built against an older sqlite version, so `apsw` is used here as the sqlite driver because it provides bindings to the latest sqlite. Vectorlite still works with older sqlite versions if metadata filter support is not required. | ||
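To see which SQLite version a given Python installation is linked against, the standard `sqlite3` module exposes it directly. The sketch below compares it with the 3.38 requirement stated above; the helper name `supports_rowid_filter` is hypothetical.

```python
import sqlite3

# Minimum SQLite version stated above for metadata (rowid) filter support.
MIN_VERSION_FOR_ROWID_FILTER = (3, 38, 0)


def supports_rowid_filter(version_info=sqlite3.sqlite_version_info):
    """Return True if the given SQLite version tuple is at least 3.38.0 (hypothetical helper)."""
    return tuple(version_info) >= MIN_VERSION_FOR_ROWID_FILTER


print(f"sqlite3 module is linked against SQLite {sqlite3.sqlite_version}")
print(f"rowid filter supported: {supports_rowid_filter()}")
```

If this reports an older version, using `apsw` as described here is one way to get a newer SQLite without rebuilding Python.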
Below is a minimal example of using vectorlite. It can also be found in the [examples folder](https://github.com/1yefuwang1/vectorlite/tree/v0.2.0/examples). | ||
|
||
```python | ||
import vectorlite_py | ||
import apsw | ||
import numpy as np | ||
""" | ||
Quick start of using vectorlite extension. | ||
""" | ||
|
||
conn = apsw.Connection(':memory:') | ||
conn.enable_load_extension(True) # enable extension loading | ||
conn.load_extension(vectorlite_py.vectorlite_path()) # load vectorlite | ||
|
||
cursor = conn.cursor() | ||
# check if vectorlite is loaded | ||
print(cursor.execute('select vectorlite_info()').fetchall()) | ||
|
||
# Vector distance calculation | ||
for distance_type in ['l2', 'cosine', 'ip']: | ||
    v1 = "[1, 2, 3]" | ||
    v2 = "[4, 5, 6]" | ||
    # Note vector_from_json can be used to convert a JSON string to a vector | ||
    distance = cursor.execute(f"select vector_distance(vector_from_json(?), vector_from_json(?), '{distance_type}')", (v1, v2)).fetchone() | ||
    print(f'{distance_type} distance between {v1} and {v2} is {distance[0]}') | ||
|
||
# generate some test data | ||
DIM = 32 # dimension of the vectors | ||
NUM_ELEMENTS = 10000 # number of vectors | ||
data = np.float32(np.random.random((NUM_ELEMENTS, DIM))) # Only float32 vectors are supported by vectorlite for now | ||
|
||
# Create a virtual table using vectorlite using l2 distance (default distance type) and default HNSW parameters | ||
cursor.execute(f'create virtual table my_table using vectorlite(my_embedding float32[{DIM}], hnsw(max_elements={NUM_ELEMENTS}))') | ||
# Vector distance type can be explicitly set to cosine using: | ||
# cursor.execute(f'create virtual table my_table using vectorlite(my_embedding float32[{DIM}] cosine, hnsw(max_elements={NUM_ELEMENTS}))') | ||
|
||
# Insert the test data into the virtual table. Note that the rowid MUST be explicitly set when inserting vectors and cannot be auto-generated. | ||
# The rowid is used to uniquely identify a vector and serve as a "foreign key" to relate to the vector's metadata. | ||
# Vectorlite takes vectors in raw bytes, so a numpy vector needs to be converted to bytes before inserting into the table. | ||
cursor.executemany('insert into my_table(rowid, my_embedding) values (?, ?)', [(i, data[i].tobytes()) for i in range(NUM_ELEMENTS)]) | ||
|
||
# Query the virtual table to get the vector at rowid 1234. Note the vector needs to be converted back to json using vector_to_json() to be human-readable. | ||
result = cursor.execute('select vector_to_json(my_embedding) from my_table where rowid = 1234').fetchone() | ||
print(f'vector at rowid 1234: {result[0]}') | ||
|
||
# Find 10 approximate nearest neighbors of data[0] and their distances from data[0]. | ||
# knn_search() is used to tell vectorlite to do a vector search. | ||
# knn_param(V, K, ef) is used to pass the query vector V, the number of nearest neighbors K to find and an optional ef parameter to tune the performance of the search. | ||
# If ef is not specified, ef defaults to 10. For more info on ef, please check https://github.com/nmslib/hnswlib/blob/v0.8.0/ALGO_PARAMS.md | ||
result = cursor.execute('select rowid, distance from my_table where knn_search(my_embedding, knn_param(?, 10))', [data[0].tobytes()]).fetchall() | ||
print(f'10 nearest neighbors of row 0 are {result}') | ||
|
||
# Find 10 approximate nearest neighbors of the first embedding in vectors with rowid within [1000, 2000) using metadata(rowid) filtering. | ||
rowids = ','.join([str(rowid) for rowid in range(1000, 2000)]) | ||
result = cursor.execute(f'select rowid, distance from my_table where knn_search(my_embedding, knn_param(?, 10)) and rowid in ({rowids})', [data[0].tobytes()]).fetchall() | ||
print(f'10 nearest neighbors of row 0 in vectors with rowid within [1000, 2000) are {result}') | ||
|
||
conn.close() | ||
|
||
``` | ||
|
||
More examples can be found in the [examples](https://github.com/1yefuwang1/vectorlite/tree/v0.2.0/examples) folder. |