---
theme: gaia
_class: lead
style: |
  :root {
    --color-background: #fff !important;
    --color-foreground: #182b3a !important;
    --color-highlight: #bc1439 !important;
    --color-dimmed: #888 !important;
    border-top: 4px solid var(--color-highlight);
  }
marp: true
inlineSVG: true
---



$ whoami


  • Kumar Shivendu

  • Software Engineer @ Qdrant

  • I ❤️ information retrieval, performance, and data mining.

  • First talk!

  • Qdrant: Future of search and beyond


Traditional ways to build search

  • Approaches:

    • Keyword match, Regex, Boolean operators
    • Extracting metadata using NLP and CV
    • Knowledge graphs, controlled vocabularies
  • Challenges:

    • Effort to maintain the knowledge
    • Growth of unstructured data
    • Multimodal search remains hard: Text, Image, Audio, Video

Vectors

  • Points in an N-dimensional space
  • Anything -> Vector
  • Generated from:
    • ML models
    • Metric learning
      • CLIP



Vector search


  • Find the points nearest to a query vector
  • Example: Google Lens
  • Exact search compares the query against every point, so it is expensive and hard to scale
  • Solution: Indexing and approximation
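
To make the cost concrete, here is a minimal sketch of exact (brute-force) vector search in plain Python. The toy collection, the function names, and the choice of cosine similarity are assumptions made for illustration; nothing here is Qdrant's implementation.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def brute_force_search(query, points, limit=3):
    """Score every stored vector against the query: O(N * d) work per search."""
    scored = [(point_id, cosine_similarity(query, vec)) for point_id, vec in points.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]

# Hypothetical toy collection: point ID -> 4-dimensional vector.
points = {
    1: [0.9, 0.1, 0.0, 0.0],
    2: [0.2, 0.3, 0.4, 0.5],
    3: [0.0, 0.0, 1.0, 0.0],
    4: [0.1, 0.3, 0.5, 0.4],
}

results = brute_force_search([0.2, 0.3, 0.4, 0.5], points, limit=2)
print(results)
```

Every query touches every point, which is fine for four vectors and painful for a billion. That is the gap an index is meant to close.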

What is Qdrant


  • Vector Search Engine (aka Vector DB)

  • 15k+ stars on GitHub

  • Written in Rust 🦀

  • SDKs for Python, JS, Go, Java, etc.

  • Used by: Twitter, Canva, Meesho, Flipkart


The HNSW Index


  • Skip lists + graphs: a layered graph searched top-down
  • Approximate, with tunable speed vs. accuracy
  • Filter during search
  • Quantization to cut memory use
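
The core move inside a graph index can be sketched as a greedy walk: start at an entry point and keep hopping to whichever neighbour is closest to the query. The sketch below covers a single layer only, with a hypothetical hand-built graph and made-up 2-D vectors. Real HNSW stacks several layers and tracks a list of candidates rather than one node, so treat this as an illustration of the idea, not the algorithm in full.

```python
import math

def greedy_graph_search(graph, vectors, query, entry_point):
    """Walk the graph, always moving to the neighbour closest to the query.

    Stops when no neighbour improves on the current node. That node is a
    local minimum, which is why the answer is approximate.
    """
    current = entry_point
    current_dist = math.dist(vectors[current], query)
    path = [current]
    while True:
        best, best_dist = current, current_dist
        for neighbour in graph[current]:
            d = math.dist(vectors[neighbour], query)
            if d < best_dist:
                best, best_dist = neighbour, d
        if best == current:
            return current, current_dist, path
        current, current_dist = best, best_dist
        path.append(current)

# Hypothetical data: five points on a line, chained together, plus one
# long-range link (0 <-> 4) that plays the role of a skip-list shortcut.
vectors = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0), 3: (3.0, 0.0), 4: (4.0, 0.0)}
graph = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}

nearest, dist, path = greedy_graph_search(graph, vectors, query=(3.2, 0.0), entry_point=0)
print(nearest, path)
```

The walk reaches point 3 in two hops via the shortcut instead of three hops along the chain, without ever computing a distance to point 2.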

Running search:

POST /collections/rentals/points/search
{
  "vector": [0.2, 0.3, 0.4, 0.5], // vector generated from image/text/video
  "filter": { "must": [{"key": "locality", "match": {"value": "Indiranagar"}}] },
  "limit": 10
}

Response (top hits, highest score first):

[
  {"id": 4, "score": 0.56, "payload": {...}},
  {"id": 2, "score": 0.40, "payload": {...}},
  {"id": 5, "score": 0.23, "payload": {...}}
]
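
The same request can be assembled and sent from Python using only the standard library. This is a minimal sketch under a few assumptions: a Qdrant instance listening at http://localhost:6333, the classic search endpoint that takes the query vector under the `vector` key, and two hypothetical helper names (`build_search_body`, `post_search`). The official Python SDK would normally be the more convenient route.

```python
import json
import urllib.request

def build_search_body(vector, locality, limit=10):
    """Build the JSON body: query vector, payload filter, and result limit."""
    return {
        "vector": vector,
        "filter": {"must": [{"key": "locality", "match": {"value": locality}}]},
        "limit": limit,
    }

def post_search(base_url, collection, body):
    """Send the search request. Needs a running server, so it is not called here."""
    request = urllib.request.Request(
        f"{base_url}/collections/{collection}/points/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)

body = build_search_body([0.2, 0.3, 0.4, 0.5], "Indiranagar")
print(json.dumps(body, indent=2))

# With a server running (address is an assumption):
# hits = post_search("http://localhost:6333", "rentals", body)["result"]
```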

Beyond search: Recommendations

  • Real-time addition of points is possible.
  • Strategies: average_vector and best_score
  • POST /collections/rentals/points/recommend
    {
      "positive": [100, 231], // point IDs
      "negative": [718, [0.2, 0.3, 0.4, 0.5]], // a point ID and a raw vector
      "filter": { "must": [{"key": "locality", "match": {"value": "Indiranagar"}}] },
      "strategy": "best_score",
      "limit": 10
    }
  • In production: DailyMotion (Qdrant), Spotify (Annoy)
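
The two strategies differ in where the examples get combined. Below is a simplified sketch of both, following how Qdrant's documentation describes them. Dot product as the similarity, the function names, and the 2-D toy vectors are all assumptions made for illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean(vectors):
    """Component-wise average of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def average_vector_query(positives, negatives):
    """average_vector: collapse all examples into ONE query vector up front.

    query = avg_pos + (avg_pos - avg_neg), then run an ordinary search with it.
    """
    avg_pos = mean(positives)
    if not negatives:
        return avg_pos
    avg_neg = mean(negatives)
    return [p + (p - n) for p, n in zip(avg_pos, avg_neg)]

def best_score(candidate, positives, negatives):
    """best_score: judge each candidate against every example separately.

    A candidate closer to some negative than to any positive is penalised.
    """
    best_pos = max(dot(candidate, p) for p in positives)
    best_neg = max((dot(candidate, n) for n in negatives), default=float("-inf"))
    if best_pos > best_neg:
        return best_pos
    return -(best_neg ** 2)

positives = [[1.0, 0.0], [0.0, 1.0]]
negatives = [[-1.0, 0.0]]

print(average_vector_query(positives, negatives))   # one vector to search with
print(best_score([1.0, 0.0], positives, negatives))  # matches a positive example
print(best_score([-1.0, 0.0], positives, negatives)) # matches the negative example
```

Averaging is cheap because it reduces to one search. Scoring per example costs more but keeps distinct tastes from being blurred into a single midpoint.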

How to find this thing on the internet?


  • No reverse image search
  • No known name

Strategy One

  • Describe the thing
    • "Combination of human, dragon and chicken"
    • "Mythology creature of human and dragon"



Strategy Two

  • Search for similar images
    • Similarity bubble



Beyond search: Discovery API

  • Unique iterative search by Qdrant

  • Combine multi-modal vectors in a single query

  • POST /collections/my-collection/points/discover
    {
      "target": [0.63, 0.10, 0.91, 0.55],
      "context": [
        {
          "positive": 7125, // <-- ID of the example
          "negative": 122   // <-- This can also be a vector
        }
      ],
      "limit": 10
    }
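
One way to read the request: the context pairs decide which region of the space a candidate must fall in, and the target only orders candidates within that region. The sketch below follows the scoring described in Qdrant's documentation in simplified form. Dot product as the similarity, the function names, and the 2-D vectors are assumptions for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def discovery_score(candidate, target, context):
    """Integer rank from the context pairs plus a (0, 1) tie-breaker from the target.

    Each pair adds +1 if the candidate is at least as similar to the positive
    as to the negative, else -1. The sigmoid keeps target similarity below 1,
    so it can never outweigh a single context pair.
    """
    rank = 0
    for positive, negative in context:
        rank += 1 if dot(candidate, positive) >= dot(candidate, negative) else -1
    return rank + sigmoid(dot(candidate, target))

target = [1.0, 0.0]
context = [([0.0, 1.0], [0.0, -1.0])]  # one (positive, negative) pair

on_the_positive_side = [0.9, 0.1]
closer_to_target = [1.0, -0.2]

print(discovery_score(on_the_positive_side, target, context))
print(discovery_score(closer_to_target, target, context))
```

The second candidate is more similar to the target, yet it scores lower because it sits on the negative side of the context pair.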

Summary

  • Anything => vector

  • Vectors are good for much more than similarity search

  • Thousands of use-cases with Qdrant

  • Find me at
    • kshivendu.dev/bio
    • kshivendu.dev/twitter
