- Learn C++ (this was my first C++ project)
- Learn how search engines work
- CMake
- A C++ compiler (MSVC, GCC, MinGW, etc.)
git clone https://github.com/bensengupta/search
cd search
# Generate the build files, then compile
cmake .
cmake --build .
# Run the executable: ./search <title file> <query>
./search titles_100k.txt France
`titles_100k.txt` contains the first 100K Wikipedia page titles, extracted from `enwiki-latest-all-titles-in-ns0.gz`.
- Indexing a document should also remove any previously indexed document with the same ID from the index. Two possible approaches:
  - Remove the old document by scanning the entire index for its ID
  - Or pop the old document with the same ID from storage, find which words it contains, and search and remove only in those words' index entries
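
The second removal approach could look roughly like the sketch below. It is only an illustration, not the project's actual code: the class `InvertedIndex`, its methods `add`, `remove`, and `search`, and the whitespace-based lowercase tokenizer are all hypothetical names and assumptions. The idea is to keep each document's word set in storage so that removal only touches the posting lists of words the document actually contains.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical sketch; names and tokenization are assumptions, not the project's API.
class InvertedIndex {
public:
    // Index `text` under `id`, first removing any document already stored with that ID.
    void add(int id, const std::string& text) {
        remove(id);
        std::unordered_set<std::string> words = tokenize(text);
        for (const std::string& w : words) postings_[w].insert(id);
        docs_[id] = std::move(words);
    }

    // Remove a document by visiting only the posting lists of the words it contains.
    // Returns false if no document with this ID was stored.
    bool remove(int id) {
        auto doc = docs_.find(id);
        if (doc == docs_.end()) return false;
        for (const std::string& w : doc->second) {
            auto p = postings_.find(w);
            if (p == postings_.end()) continue;
            p->second.erase(id);
            if (p->second.empty()) postings_.erase(p);  // drop empty posting lists
        }
        docs_.erase(doc);
        return true;
    }

    // Return the sorted IDs of documents containing `word` (case-insensitive).
    std::vector<int> search(const std::string& word) const {
        auto p = postings_.find(lower(word));
        if (p == postings_.end()) return {};
        std::vector<int> ids(p->second.begin(), p->second.end());
        std::sort(ids.begin(), ids.end());
        return ids;
    }

private:
    static std::string lower(std::string s) {
        for (char& c : s) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        return s;
    }

    // Assumed tokenizer: split on whitespace and lowercase each word.
    static std::unordered_set<std::string> tokenize(const std::string& text) {
        std::unordered_set<std::string> words;
        std::istringstream in(text);
        std::string w;
        while (in >> w) words.insert(lower(w));
        return words;
    }

    std::unordered_map<std::string, std::unordered_set<int>> postings_;  // word -> doc IDs
    std::unordered_map<int, std::unordered_set<std::string>> docs_;      // doc ID -> words
};
```

With this layout, re-adding a document under an existing ID costs time proportional to the number of distinct words in the old document rather than the size of the whole index, at the price of storing each document's word set a second time.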
- Heavily inspired by Victor Lavrenko's Inverted Indexing lecture series
- Search Engine Indexing, Wikipedia