This repository has been archived by the owner on Jun 24, 2024. It is now read-only.

feat: lightweight, pure rust k-ANN vector database for long-term memory/knowledge-base #2

Open
jon-chuang opened this issue Apr 12, 2023 · 18 comments

Comments

@jon-chuang

jon-chuang commented Apr 12, 2023

I think the next step in the project is a lightweight ANN (approximate k-nearest-neighbour search) vector database. Applications:

  1. Document store over local documents: code bases, journals, articles. Input: directory with text files.
  2. Chrome-assistant: A memory over your currently and recently opened tabs with llama-wasm
  3. Mobile: similar to the local-document case

Details:

  1. The k-ANN database should always be an optional dep compiled under a feature flag.
  2. We will reuse the loaded model for encoding. See: here, section 4.3. It suggests using the analog to the [CLS] token. See e.g. LlamaIndex. I'm not too sure about decoding; it can just return the text string from the metadata. Alternatively, one can load an embedding/decoding model that is serialized to ggml.
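To make the encoding side concrete, here is a minimal sketch of collapsing per-token hidden states into a single fixed-size embedding via mean pooling (one common alternative to taking the [CLS]-analog token's state directly). This assumes we can read the final-layer hidden states out of the loaded model as plain `Vec<f32>` rows; the function name is hypothetical, not an existing llama-rs API.

```rust
/// Hypothetical sketch: collapse per-token hidden states into one
/// fixed-size embedding by averaging each dimension across tokens.
fn mean_pool(hidden_states: &[Vec<f32>]) -> Vec<f32> {
    let dim = hidden_states[0].len();
    let mut pooled = vec![0.0f32; dim];
    for state in hidden_states {
        for (p, s) in pooled.iter_mut().zip(state) {
            *p += s;
        }
    }
    let n = hidden_states.len() as f32;
    pooled.iter_mut().for_each(|p| *p /= n);
    pooled
}
```

Whether mean pooling or the [CLS]-analog state gives better retrieval quality for a decoder-only model is exactly the open question above.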

Problem definition:

  1. We are OK trading off a bit of performance for something with minimal surface area.
  2. We want the database to be persistent: it should persist the index after generation. The model is very similar to ggml's: we generate the artifact once, or periodically, and then load the artifact (the index) into memory.
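As a sketch of the persistence model (generate the artifact once, load it later), here is a hypothetical, stdlib-only on-disk layout: a vector count, a dimension, then raw little-endian f32s. A real index would also persist its graph structure (e.g. HNSW edges), and the load path is where mmap would slot in as the optimization mentioned below.

```rust
use std::convert::TryInto;
use std::fs::File;
use std::io::{Read, Write};

/// Hypothetical artifact layout: u32 count, u32 dim, then raw f32s.
fn save_vectors(path: &std::path::Path, vectors: &[Vec<f32>]) -> std::io::Result<()> {
    let mut f = File::create(path)?;
    let dim = vectors.first().map_or(0, |v| v.len()) as u32;
    f.write_all(&(vectors.len() as u32).to_le_bytes())?;
    f.write_all(&dim.to_le_bytes())?;
    for v in vectors {
        for x in v {
            f.write_all(&x.to_le_bytes())?;
        }
    }
    Ok(())
}

/// Load the artifact back into memory (an mmap would avoid this copy).
fn load_vectors(path: &std::path::Path) -> std::io::Result<Vec<Vec<f32>>> {
    let mut bytes = Vec::new();
    File::open(path)?.read_to_end(&mut bytes)?;
    let count = u32::from_le_bytes(bytes[0..4].try_into().unwrap()) as usize;
    let dim = u32::from_le_bytes(bytes[4..8].try_into().unwrap()) as usize;
    let mut out = Vec::with_capacity(count);
    let mut off = 8;
    for _ in 0..count {
        let mut v = Vec::with_capacity(dim);
        for _ in 0..dim {
            v.push(f32::from_le_bytes(bytes[off..off + 4].try_into().unwrap()));
            off += 4;
        }
        out.push(v);
    }
    Ok(out)
}
```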

Options:

  1. ❌ Connect to an existing vector database (qdrant, milvus, pinecone). But these are heavy dependencies, and many of their features are designed for scaling out (cloud native). We like transparency and owning the artifacts involved, and are willing to trade off a bit of perf and/or implementation complexity for that aim.
  2. ❌ Compile faiss as an optional dependency. Still a pretty huge dependency.
  3. 🦀 Something rust-native, e.g. hora. It's not actively maintained, but it still works and I've run it locally. We can slice out the core functionality (e.g. just HNSW). It already has a persisted format for the index. We can add mmap support as an optimization. Hopefully we can slice it down to about 2K LoC.
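Whichever option wins, the query path has the same shape. Here is an exact (non-approximate) top-k baseline sketch over cosine similarity; it is deliberately naive and useful mainly as a correctness oracle for whatever HNSW slice we adopt.

```rust
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

/// Exact k-NN by scanning every stored vector: O(n * dim) per query.
/// An HNSW index answers the same query in roughly O(log n) hops,
/// which is the whole point of pulling one in.
fn knn(query: &[f32], store: &[Vec<f32>], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = store
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine(query, v)))
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.into_iter().take(k).map(|(i, _)| i).collect()
}
```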

Plan:

  1. Use prompt engineering to allow the model to signal a lookup via a special unicode sequence. llama-rs will detect the sequence and trigger a database lookup.
    • It's not clear how well this would work. Ideally we would prompt-tune this; LoRA-based fine-tuning might also work.
  2. Implement either partial encoding with the existing LLM, or allow loading a separate embedding model.
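The runtime side of step 1 can be sketched cheaply. Assuming the model is prompted to emit a sentinel like `⟦search: …⟧` (the delimiters here are illustrative, not decided), the generation loop just scans the streamed output for a completed sentinel and extracts the query to forward to the vector database:

```rust
/// Hypothetical sentinel the model is prompted to emit when it
/// wants a knowledge-base lookup, e.g. "⟦search: rust hnsw⟧".
const OPEN: &str = "⟦search:";
const CLOSE: &str = "⟧";

/// Scan generated text for a completed sentinel and return the
/// query string the runtime should send to the vector database.
fn extract_search_query(output: &str) -> Option<String> {
    let start = output.find(OPEN)? + OPEN.len();
    let end = output[start..].find(CLOSE)? + start;
    Some(output[start..end].trim().to_string())
}
```

The hard part is not this scan but getting the model to emit the sentinel reliably, which is what the prompt-tuning/LoRA caveat above is about.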
@jon-chuang jon-chuang changed the title feat: lightweight, native ann-vector database for long-term memory feat: lightweight, native ann-vector database for long-term memory/knowledge-base Apr 12, 2023
@jon-chuang jon-chuang changed the title feat: lightweight, native ann-vector database for long-term memory/knowledge-base feat: lightweight, native k-ANN vector database for long-term memory/knowledge-base Apr 12, 2023
@jon-chuang jon-chuang changed the title feat: lightweight, native k-ANN vector database for long-term memory/knowledge-base feat: lightweight, pure rust k-ANN vector database for long-term memory/knowledge-base Apr 12, 2023
@hhamud

hhamud commented Apr 12, 2023

interesting, like a rust specific version of this?

It would also be interesting if we could use this to store prompts and their outputs in the database, and press the up key to re-use previous prompts or outputs. But we wouldn't even need a vector database for that specifically; we could do it with a typical SQL database.

We would also need to re-visit this rustformers/llm#56

@jon-chuang
Author

jon-chuang commented Apr 12, 2023

interesting, like a rust specific version of this?

Yes, there are many options available but they mainly offer the same type of indexes.

re-use previous prompts or their outputs

The problem with a hash table or KV store is that natural-language queries are rarely exactly the same, especially if you are not averaging over the human population but just running locally.
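The distinction can be sketched directly: an exact KV cache misses on any paraphrase, while an embedding-based cache returns a hit whenever a stored query is close enough in vector space. The struct, cosine helper, and threshold below are all illustrative.

```rust
struct SimilarityCache {
    /// (query embedding, cached answer) pairs.
    entries: Vec<(Vec<f32>, String)>,
    /// Minimum cosine similarity to count as a hit (illustrative value).
    threshold: f32,
}

impl SimilarityCache {
    /// Return a cached answer whose query embedding is close enough,
    /// instead of requiring a byte-identical query string.
    fn lookup(&self, query_embedding: &[f32]) -> Option<&str> {
        self.entries
            .iter()
            .map(|(e, a)| (cosine(e, query_embedding), a))
            .filter(|(sim, _)| *sim >= self.threshold)
            .max_by(|a, b| a.0.partial_cmp(&b.0).unwrap())
            .map(|(_, a)| a.as_str())
    }
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}
```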

Milvus has already promoted similarity-search-based "caching" as one of its applications (repo)

@philpax
Collaborator

philpax commented Apr 13, 2023

I think this is out of scope for this repository specifically. I could see a batteries-included implementation being built atop llama-rs, but it's unlikely to feature an implementation of a vector database itself because our focus is specifically on robust, fast inference of LLMs.

@jon-chuang
Author

jon-chuang commented Apr 13, 2023

our focus is specifically on robust, fast inference of LLMs.

Yes, but I think the broader focus is "low-resource, low-dependency embedded LLM toolchain".

I can definitely see the sliced out k-ANN code existing in a separate repo (perhaps under this org) and compiled in as an optional dependency to llama-rs and available in the cli (on crates.io it would be cargo install llama-rs --features "knowledge-base")
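The wiring for that optional dependency might look like the following Cargo fragment; the crate name `llm-knn` and the feature name are hypothetical, just illustrating the feature-flag shape described above.

```toml
# llama-rs's Cargo.toml (crate and feature names hypothetical)
[features]
default = []
knowledge-base = ["dep:llm-knn"]

[dependencies]
# the sliced-out k-ANN crate, only compiled when the feature is on
llm-knn = { version = "0.1", optional = true }
```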

@hhamud

hhamud commented Apr 13, 2023

"low-resource, low-dependency embedded LLM toolchain".

Literally what I was thinking of yesterday

@jon-chuang
Author

I've made an issue here to sound out the idea: ggerganov/llama.cpp#930

@philpax
Collaborator

philpax commented Apr 13, 2023

I can definitely see the sliced out k-ANN code existing in a separate repo (perhaps under this org) and compiled in as an optional dependency to llama-rs and available in the cli (on crates.io it would be cargo install llama-rs --features "knowledge-base")

Sure, but I don't see why it would have to be part of llama-rs specifically. The CLI is really just a demo application for the library; it doesn't aspire to higher functionality than that.

I'm not opposed to having this kind of functionality - having a full-stack solution for using an LLM to do knowledge-base inference would be great - but I think it's a hard sell to make it part of this crate specifically. By analogy, we're like hyper, not reqwest - we're not trying to solve all the problems, just the core problem that enables other people to solve their problems.

@jon-chuang
Author

jon-chuang commented Apr 13, 2023

but I think it's a hard sell to make it part of this crate specifically.

I'm in agreement here. But do you think that the rustformers org more generally could be expanded to this broader scope of a low-resource LLM toolchain, and host the broader-scoped llama-rs-toolchain?

@philpax philpax transferred this issue from rustformers/llm Apr 16, 2023
@philpax
Collaborator

philpax commented Apr 16, 2023

Sorry - meant to get back to you earlier. Yeah, I think having this as part of a larger solution would be great. I've created this repository to track issues that aren't directly related to llama-rs, but are for the ecosystem around it.

Has anyone experimented with this? Are there any estimates on how much work it would be?

@jon-chuang
Author

jon-chuang commented Apr 25, 2023

I’ve not experimented, but it’s on my (currently very long) todo list. I estimate it could be a week of work to get the code in place, but it may take some additional experimentation with prompting (e.g. to emit sequence of tokens indicating search action) to get the models to work well with the knowledge base.

I’ll hopefully get to it once I’m back from holiday.

@hhamud

hhamud commented May 8, 2023

Any updates on this? @jon-chuang

@itsbalamurali

itsbalamurali commented Jun 13, 2023

@jon-chuang @hhamud & @philpax I've taken a dig at porting chroma to rust: https://gist.github.com/itsbalamurali/118e7ce18f1519f26780b9845dee4e87 has the basic structure to it.

It still needs: https://github.com/chroma-core/chroma/blob/d98be4d0bfb760155d9f85c9012952ef459c10a6/chromadb/db/clickhouse.py#L583

@hhamud

hhamud commented Jun 18, 2023

Nice, do you have an actual full repo to share rather than just a gist?

@shkr

shkr commented Jul 10, 2023

I am interested in implementing a rust knowledge base for llms

@zicklag

zicklag commented Jul 10, 2023

Cozo might be useful. I'm totally out-of-the-loop, so it might not work for what you're looking for. I figured I'd share just in case.

@ealmloff

ealmloff commented Jul 10, 2023

I implemented an in-memory version of this as part of Floneum. Here is the relevant code: https://github.com/floneum/floneum/blob/master/plugin/src/vector_db.rs

Instant distance is fairly easy to work with and is actively maintained.

@shkr

shkr commented Jul 14, 2023

Cozo might be useful. I'm totally out-of-the-loop, so it might not work for what you're looking for. I figured I'd share just in case.

Thanks, cozo is very interesting and might solve the use case I was thinking of.

@ayourtch

I saw https://github.com/tensorchord/pgvecto.rs today - it fits the bill of "rust only". (Admittedly I am too new to this field to fully understand whether this is relevant, but someone might find it useful.)
