Combining elastiknn with standard ES queries. #366
Replies: 6 comments
-
Hi, this shows how I've gotten this type of combination query working in the past: http://elastiknn.klibisz.com/api/#running-nearest-neighbors-query-on-a-filtered-subset-of-documents. If that doesn't fit your use case, can you post some example docs so I can try your query?
-
@joseph-macraty I happened to run into an issue with the combined query while working on something else. My original example in the docs technically works, but it actually evaluates the knn query over all of the docs instead of just the ones matching the filter. I've updated the example linked above so that it only runs knn on the docs matching the filter.
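For readers following along, here is a minimal sketch of what such a filtered request body could look like, written as a Python dict. It is an illustration and not necessarily the exact example from the linked docs. It assumes the standard Elasticsearch query rescorer is used to apply an elastiknn_nearest_neighbors query to the hits of a multi_match filter. The field names (title, body, my_vec), the query text, the vector values, and the "exact"/"angular" settings are all hypothetical placeholders, and the exact shape of the elastiknn_nearest_neighbors query should be checked against the docs for the plugin version in use.

```python
import json


def build_filtered_knn_body(text, text_fields, vec_field, query_vec,
                            window_size=10, similarity="angular"):
    """Build a search body that filters with multi_match, then rescores with knn.

    Assumption: the knn query is applied through the standard Elasticsearch
    query rescorer, so it only scores documents that passed the filter.
    """
    return {
        "size": window_size,
        # Filter context: selects documents without contributing to the score.
        "query": {
            "bool": {
                "filter": [
                    {"multi_match": {"query": text, "fields": text_fields}}
                ]
            }
        },
        "rescore": {
            # Only the top window_size hits per shard are rescored.
            "window_size": window_size,
            "query": {
                "rescore_query": {
                    # Hypothetical parameter values; verify against the docs.
                    "elastiknn_nearest_neighbors": {
                        "field": vec_field,
                        "vec": {"values": query_vec},
                        "model": "exact",
                        "similarity": similarity,
                    }
                },
                # Ignore the filter score and rank by the knn score alone.
                "query_weight": 0,
                "rescore_query_weight": 1,
            },
        },
    }


body = build_filtered_knn_body(
    text="wireless headphones",      # hypothetical query text
    text_fields=["title", "body"],   # hypothetical text fields
    vec_field="my_vec",              # hypothetical vector field
    query_vec=[0.1, 0.2, 0.3],       # hypothetical query vector
)
print(json.dumps(body, indent=2))
```

Because the rescorer only touches the top window_size hits per shard, window_size would need to be large enough to cover the filtered subset for the knn ranking to be meaningful.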
-
Hi @alexklibisz,
-
@joseph-macraty If I may ask, what's your use case for Elastiknn? (I'm just trying to get a sense of how people are using it in practice, since I don't use Elasticsearch at work myself.)
-
We are working on text-based search engines for different use cases. Here's how we are currently using Elastiknn: We had developed a couple of BERT-based models for search and were testing them out with Elastic Cloud. We were satisfied with them and wanted to use them in production. We initially thought we couldn't use ES because it did not support approximate vector search. We explored other options like Faiss/Annoy, but they would have required modifying a non-trivial amount of our existing pipeline/codebase (we were using ES earlier). They also added a significant computational cost. That is when I came across one of your comments on the ES repo. It was really smooth to set up and we got up and running in under an hour. Our current index has about 5.8 million documents (768-dimensional vectors) and on a
All in all, I think that unless you have hundreds of millions of documents or require really fast search (and can afford the added effort and computational resources), Elastiknn is the best option. For a large number of use cases, Elastiknn could replace FAISS/Annoy.
In our initial searches, though, Elastiknn never came up and we could have easily missed it. Is there any way to increase the visibility of this excellent repo? I think semantic search is an upcoming field, so there are only a few blogs on it, and all of them use the other, more popular ANN implementations. If there are any ways we can contribute (writing blogs?), we would love to!
-
That's all great to hear. Great motivation for me to keep chipping away at this. Mind if I ask what company you're at? The original source of this idea was a very similar problem to the one you described. That was at an old job, and I no longer have the problem day-to-day, but I got a lot better with Java/Scala/Gradle/etc. in my most recent job, so I've given this another pass.
In terms of visibility, I'm planning to do an "Introducing Elastiknn"-style blog post. The plan has been to do that after I get it integrated with the ann-benchmarks project. That seems to be table stakes for any ANN solution nowadays. It's been tough because the JVM is painfully slow compared to all of the C/C++/in-memory implementations used in that project. I'm pretty confident I can speed up one remaining bottleneck and that will make a big difference. Then once it's merged into ann-benchmarks I'll do a more celebratory writeup on Medium or something.
-
Hi,
First of all, thank you for this plugin. It relieved us of the pain of setting up and hosting a separate FAISS/Annoy index.
I just wanted to filter the results using a multi_match query. Here is what I am trying:
Basically, I am trying to filter out the documents with the query in multi_match. Is this possible?
This is the error I am getting: