A full-text search engine implemented as part of HUJI's Web Information Retrieval course.
The engine currently supports one specific dataset: Amazon product review data taken from here, in a line-oriented data format (see the .txt files under datasets for an example).
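
As a rough illustration of what reading a line-oriented review file can look like, here is a minimal sketch. It assumes each review is a block of `key: value` lines and that reviews are separated by blank lines. The class name `ReviewRecordParser`, the field names (`product/productId`, `review/score`, `review/text`) and the sample values are hypothetical placeholders, not taken from this repository; check the .txt files under datasets for the actual layout.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the webdata package.
public class ReviewRecordParser {

    /** Parses blank-line-separated records made of "key: value" lines. */
    public static List<Map<String, String>> parse(BufferedReader reader) throws IOException {
        List<Map<String, String>> records = new ArrayList<>();
        Map<String, String> current = new LinkedHashMap<>();
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.isBlank()) {
                // A blank line closes the current record, if it has any fields.
                if (!current.isEmpty()) {
                    records.add(current);
                    current = new LinkedHashMap<>();
                }
                continue;
            }
            int sep = line.indexOf(": ");
            if (sep > 0) {
                current.put(line.substring(0, sep), line.substring(sep + 2).trim());
            }
            // Lines without a "key: value" shape are skipped in this sketch.
        }
        if (!current.isEmpty()) {
            records.add(current); // the last record may lack a trailing blank line
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample input; the field names are assumptions.
        String sample = "product/productId: B000000001\n"
                + "review/score: 5.0\n"
                + "review/text: Works as described.\n"
                + "\n"
                + "product/productId: B000000002\n"
                + "review/score: 2.0\n"
                + "review/text: Stopped working after a week.\n";
        List<Map<String, String>> records = parse(new BufferedReader(new StringReader(sample)));
        for (Map<String, String> r : records) {
            System.out.println(r.get("product/productId") + " -> " + r.get("review/score"));
        }
    }
}
```

Reading one line at a time keeps memory use bounded by a single record, which is the main appeal of a line-oriented format for large datasets.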
The main classes of this library are:
- `webdata.IndexWriter`, for constructing the index given a dataset file
- `webdata.IndexReader`, for querying the index
- `webdata.ReviewSearch`, for performing various text search operations
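
To give a feel for the write-then-query split between these classes, here is a toy in-memory inverted index. It is only a conceptual sketch: `ToyInvertedIndex` and its methods are hypothetical names, and it is not the API or the index layout used by `webdata.IndexWriter` or `webdata.IndexReader`; see the Javadocs for the real signatures.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration; not part of the webdata package.
public class ToyInvertedIndex {
    // token -> sorted set of ids of the reviews that contain the token
    private final Map<String, TreeSet<Integer>> postings = new TreeMap<>();

    /** "Write" side: tokenize a review and record which tokens it contains. */
    public void add(int reviewId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, t -> new TreeSet<>()).add(reviewId);
            }
        }
    }

    /** "Read" side: return the ids of reviews containing the token, in ascending order. */
    public List<Integer> lookup(String token) {
        TreeSet<Integer> ids = postings.get(token.toLowerCase(Locale.ROOT));
        return ids == null ? Collections.emptyList() : new ArrayList<>(ids);
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add(1, "Great coffee, great price");
        index.add(2, "The coffee arrived stale");
        System.out.println(index.lookup("coffee")); // [1, 2]
        System.out.println(index.lookup("great"));  // [1]
        System.out.println(index.lookup("tea"));    // []
    }
}
```
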
- Click here for an explanation and visualization of the index structure, as well as a theoretical runtime analysis of index operations.
- Click here for various benchmarks of index construction and querying.
- Click here for an explanation of a custom product ranking function I've implemented for product search.
Most of the classes and methods are also documented; see below for how to generate the Javadocs.
Requires Java 11+ and Maven.
- Type `mvn package` to compile, test and package this library, and generate docs. The resulting jars will be located at `target`. Documentation can be found at `target/apidocs/index.html`. (Skip testing by adding `-Dmaven.test.skip=true`.)