intro to nsw and hnsw
Signed-off-by: Alex Chi <[email protected]>
skyzh committed Jan 15, 2024
1 parent fd02564 commit 48bbf94
Showing 4 changed files with 17 additions and 5 deletions.
4 changes: 2 additions & 2 deletions tutorial/src/10-epilogue.md
@@ -2,9 +2,9 @@

At this point, you should have learned how to add vector capabilities to an existing relational database system and should have a solid understanding of how vector indexes work. However, there are still some tasks that need to be completed before your vector extension is ready for production. These include persisting the vector indexes to disk, supporting deletion and updates, preventing out-of-memory issues, ensuring efficient index lookups with the disk format, and making the index compatible with the underlying database system, including transactions and multi-version indexes.

Adding vector capabilities to a database system is relatively easy. In Professor Andy Pavlo's [Databases in 2023: A Year in Review](https://ottertune.com/blog/2023-databases-retrospective) on Ottertune, he mentioned that the engineering effort required to introduce a new access method and index data structure for vector search is *not that much*. This tutorial should have demonstrated that extending a database system with vector capabilities can be accomplished without fundamentally altering the system, and by adding additional components, it can easily support vector similarity searches.
Adding vector capabilities to a database system is relatively easy. In Professor Andy Pavlo's [Databases in 2023: A Year in Review](https://ottertune.com/blog/2023-databases-retrospective) on the Ottertune blog, he mentioned that the engineering effort required to introduce a new access method and index data structure for vector search is *not that much*. This tutorial should have demonstrated that a database system can be extended with vector capabilities without fundamentally altering it: with a few additional components, it can support vector similarity searches.

> There are two likely explanations for the quick proliferation of vector search indexes. The first is that similarity search via embeddings is such a compelling use case that every DBMS vendor rushed out their version and announced it immediately. The second is that the engineering effort to introduce what amounts to just a new access method and index data structure is small enough that it did not take that much work for the DBMS vendors to add vector search. Most vendors did not write their vector index from scratch and instead just integrated one of the several high-quality open-source libraries available (e.g., Microsoft DiskANN, Meta Faiss). \
> *--Ottertune's Databases in 2023: A Year in Review*
Personally speaking, vector databases are still a new concept to me. I still remember my first involvement with vector databases while interning at Neon in the summer of 2023. One day, Nikita added me to a Slack channel called `#vector`, where I discovered that Konstantin was working on a new vector extension for PostgreSQL called [pgembedding](https://github.com/neondatabase/pg_embedding) incorporating HNSW support. Eventually, this extension was discontinued after pgvector added HNSW support later in 2023. Nevertheless, it was my initial exposure to vector searches and vector databases, and the challenges that developers face when dealing with vector indexes in relational databases are exciting areas for long-term exploration.
2 changes: 1 addition & 1 deletion tutorial/src/cpp-05-ivfflat.md
@@ -1,4 +1,4 @@
# IVFFlat Index
# IVFFlat (Inverted File Flat) Index

The IVFFlat (InVerted File Flat) index is a simple vector index that splits data into buckets (a quantization-based index) to accelerate vector similarity search.
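
The bucketing idea can be sketched roughly as follows. This is a minimal illustration, not the tutorial's starter code: `ToyIvfFlat`, `SquaredL2`, and the `probes` parameter are hypothetical names, and the centroids are assumed to be given up front (a real index would typically train them, for example with k-means).

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

using Vector = std::vector<double>;

// Squared Euclidean distance; assumes both vectors have the same dimension.
inline double SquaredL2(const Vector &a, const Vector &b) {
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    const double d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Hypothetical toy index for illustration only.
struct ToyIvfFlat {
  std::vector<Vector> centroids;             // one centroid per bucket (assumed pre-trained)
  std::vector<std::vector<Vector>> buckets;  // vectors assigned to each centroid

  explicit ToyIvfFlat(std::vector<Vector> c)
      : centroids(std::move(c)), buckets(centroids.size()) {}

  // Index of the centroid closest to `v`.
  std::size_t NearestCentroid(const Vector &v) const {
    std::size_t best = 0;
    double best_dist = std::numeric_limits<double>::max();
    for (std::size_t i = 0; i < centroids.size(); ++i) {
      const double d = SquaredL2(v, centroids[i]);
      if (d < best_dist) {
        best_dist = d;
        best = i;
      }
    }
    return best;
  }

  // Each vector is stored in the bucket of its nearest centroid.
  void Insert(const Vector &v) { buckets[NearestCentroid(v)].push_back(v); }

  // Scan only the `probes` buckets whose centroids are closest to the query,
  // then return up to `k` nearest vectors among the scanned candidates.
  std::vector<Vector> Search(const Vector &query, std::size_t k, std::size_t probes) const {
    std::vector<std::pair<double, std::size_t>> order;
    for (std::size_t i = 0; i < centroids.size(); ++i) {
      order.emplace_back(SquaredL2(query, centroids[i]), i);
    }
    std::sort(order.begin(), order.end());

    std::vector<std::pair<double, Vector>> candidates;
    for (std::size_t p = 0; p < std::min(probes, order.size()); ++p) {
      for (const Vector &v : buckets[order[p].second]) {
        candidates.emplace_back(SquaredL2(query, v), v);
      }
    }
    std::sort(candidates.begin(), candidates.end());

    std::vector<Vector> result;
    for (std::size_t i = 0; i < std::min(k, candidates.size()); ++i) {
      result.push_back(candidates[i].second);
    }
    return result;
  }
};
```

Scanning fewer buckets makes a lookup cheaper but can miss neighbors that fall just across a bucket boundary, which is the usual speed/recall trade-off of this kind of index.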

5 changes: 4 additions & 1 deletion tutorial/src/cpp-06-01-nsw.md
@@ -1,4 +1,7 @@
# NSW Index
# NSW (Navigable Small Worlds) Index


Before building the so-called HNSW (Hierarchical Navigable Small Worlds) index, we will start with the basic component of the HNSW index -- NSW (Navigable Small Worlds). In this chapter, we will build a graph-based index structure for vectors.
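
As a rough illustration of what a graph-based index looks like, the sketch below stores vectors as vertices with neighbor lists and answers a query with a greedy walk: start at an entry vertex and keep moving to whichever neighbor is closer to the query until no neighbor improves. `ToyGraph`, `GreedySearch`, and `SquaredL2` are hypothetical names chosen for this example, and the walk is a simplification; it is not the starter code's interface.

```cpp
#include <cstddef>
#include <vector>

using Vector = std::vector<double>;

// Squared Euclidean distance; assumes both vectors have the same dimension.
inline double SquaredL2(const Vector &a, const Vector &b) {
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    const double d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Hypothetical toy graph: edges[i] lists the neighbors of vertices[i].
struct ToyGraph {
  std::vector<Vector> vertices;
  std::vector<std::vector<std::size_t>> edges;
};

// Greedy walk: repeatedly hop to the neighbor closest to the query.
// Terminates because the distance strictly decreases on every hop.
std::size_t GreedySearch(const ToyGraph &g, const Vector &query, std::size_t entry) {
  std::size_t current = entry;
  double current_dist = SquaredL2(query, g.vertices[current]);
  while (true) {
    std::size_t best = current;
    double best_dist = current_dist;
    for (std::size_t n : g.edges[current]) {
      const double d = SquaredL2(query, g.vertices[n]);
      if (d < best_dist) {
        best_dist = d;
        best = n;
      }
    }
    if (best == current) {
      return current;  // local minimum: no neighbor is closer to the query
    }
    current = best;
    current_dist = best_dist;
  }
}
```

A greedy walk like this can stop at a local minimum, so the result is approximate; how well it works depends on how the edges were chosen when the graph was built.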

The starter code for this part is ready, but the write-up is still a work in progress.

11 changes: 10 additions & 1 deletion tutorial/src/cpp-06-02-hnsw.md
@@ -1 +1,10 @@
# HNSW Index
# HNSW (Hierarchical Navigable Small Worlds) Index

Now that we have built NSW indexes in the previous chapter, we can stack multiple layers of NSW indexes, adding hierarchy to the index structure to make it more efficient.
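
One way to picture the hierarchy is as a stack of graphs over the same vectors, where upper layers hold fewer vertices and a search descends layer by layer, reusing the best vertex found so far as the entry point for the next layer down. The sketch below is a simplified illustration under that assumption: `ToyHnsw`, `Layer`, `GreedyInLayer`, and `Search` are hypothetical names, and it follows a single best candidate per layer rather than keeping a candidate list as full implementations usually do.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

using Vector = std::vector<double>;

// Squared Euclidean distance; assumes both vectors have the same dimension.
inline double SquaredL2(const Vector &a, const Vector &b) {
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    const double d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// One layer: neighbor lists keyed by global vertex id.
using Layer = std::unordered_map<std::size_t, std::vector<std::size_t>>;

// Hypothetical toy index: layers[0] is the bottom layer containing every
// vertex; higher layers contain progressively fewer vertices.
struct ToyHnsw {
  std::vector<Vector> vertices;
  std::vector<Layer> layers;
};

// Greedy walk restricted to a single layer.
std::size_t GreedyInLayer(const ToyHnsw &index, const Layer &layer,
                          const Vector &query, std::size_t entry) {
  std::size_t current = entry;
  double current_dist = SquaredL2(query, index.vertices[current]);
  while (true) {
    const auto it = layer.find(current);
    if (it == layer.end()) {
      return current;  // vertex has no edges in this layer
    }
    std::size_t best = current;
    double best_dist = current_dist;
    for (std::size_t n : it->second) {
      const double d = SquaredL2(query, index.vertices[n]);
      if (d < best_dist) {
        best_dist = d;
        best = n;
      }
    }
    if (best == current) {
      return current;
    }
    current = best;
    current_dist = best_dist;
  }
}

// Descend from the top layer to the bottom layer. `entry` is assumed to be
// a vertex that exists in the top layer.
std::size_t Search(const ToyHnsw &index, const Vector &query, std::size_t entry) {
  std::size_t current = entry;
  for (std::size_t l = index.layers.size(); l-- > 0;) {
    current = GreedyInLayer(index, index.layers[l], query, current);
  }
  return current;
}
```

The sparse upper layers let the search cover long distances in a few hops, while the dense bottom layer refines the result locally.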

The list of files that you will likely need to modify:

```
src/include/storage/index/hnsw_index.h
src/storage/index/hnsw_index.cpp
```
