About

This repository is a sandbox for users to experiment with the Hyperspace search engine. It includes multiple datasets and corresponding notebooks designed for classic, vector and hybrid search.

Introduction

Hyperspace is a cloud-based hybrid search engine powered by cloud FPGA hardware. It delivers high-throughput search at extremely low latency, typically 10x-100x faster than industry benchmarks, at reduced cost. Hyperspace supports vector search, similarity search, or a combination of the two. Queries are written in native Python, with built-in support for candidate generation and scoring in both similarity and vector searches.
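To illustrate what a combined query does, the sketch below applies a classic metadata filter to generate candidates and then ranks the survivors by vector (cosine) similarity, all in plain Python. It runs against a toy in-memory list and does not use the Hyperspace client; the field names (`category`, `year`, `vec`) and the functions `candidates` and `hybrid_search` are hypothetical.

```python
import math

# Toy documents: metadata for classic filtering plus an embedding for vector
# scoring. Field names are illustrative only, not a Hyperspace schema.
DOCS = [
    {"id": 1, "category": "physics", "year": 2021, "vec": [0.9, 0.1, 0.0]},
    {"id": 2, "category": "physics", "year": 2015, "vec": [0.8, 0.2, 0.1]},
    {"id": 3, "category": "biology", "year": 2022, "vec": [0.9, 0.0, 0.1]},
    {"id": 4, "category": "physics", "year": 2023, "vec": [0.1, 0.9, 0.2]},
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def candidates(docs, category, min_year):
    """Classic stage: keep only documents whose metadata matches the filter."""
    return [d for d in docs if d["category"] == category and d["year"] >= min_year]


def hybrid_search(docs, query_vec, category, min_year, k=2):
    """Hybrid query: filter on metadata, then rank survivors by vector similarity."""
    scored = [(cosine(query_vec, d["vec"]), d["id"])
              for d in candidates(docs, category, min_year)]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


print(hybrid_search(DOCS, [1.0, 0.0, 0.0], category="physics", min_year=2020))  # [1, 4]
```

Document 2 is excluded by the year filter and document 3 by the category filter before any vector math runs, which is the practical appeal of combining the two kinds of search.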

Hyperspace Advantages

Hybrid Search: Hyperspace combines vector and similarity search within a single framework, providing the best of both worlds.
Simplicity and Ease of Use: Hyperspace's native Python syntax allows a seamless and natural migration of existing codebases.
Unparalleled Latency: Hyperspace offers 10x-100x lower latency than industry benchmarks, leaving room for more complex query logic within the same latency budget.
Cost Efficiency: By leveraging Hyperspace, users can significantly reduce machine time requirements and associated costs.
Advanced AI Possibilities: Hyperspace separates candidate generation from scoring. Combined with its extremely low latency, this makes it possible to use complex AI techniques that are commonly impractical at query time.
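Why separating candidate generation from scoring matters can be sketched in plain Python: a cheap signal (here a hypothetical `popularity` field) narrows the corpus to a shortlist, and a stand-in "expensive" scorer runs only on that shortlist. This is not Hyperspace code; the corpus, the weights and the `CountingScorer` class are made up for illustration, and the scorer counts its own calls to make the saving visible.

```python
import heapq
import random


class CountingScorer:
    """Stand-in for a costly model (e.g. a neural re-ranker) that counts its calls."""

    def __init__(self, weights):
        self.weights = weights
        self.calls = 0

    def __call__(self, item):
        self.calls += 1
        return sum(f * w for f, w in zip(item["features"], self.weights))


def two_stage(items, scorer, n_candidates=100, k=10):
    # Stage 1, candidate generation: a cheap signal evaluated on every item.
    shortlist = heapq.nlargest(n_candidates, items, key=lambda it: it["popularity"])
    # Stage 2, scoring: the expensive scorer runs on the shortlist only.
    return heapq.nlargest(k, shortlist, key=scorer)


random.seed(0)
items = [
    {"id": i, "popularity": random.random(), "features": [random.random() for _ in range(8)]}
    for i in range(10_000)
]
scorer = CountingScorer([random.random() for _ in range(8)])
top = two_stage(items, scorer)
print(len(top), scorer.calls)  # 10 100
```

The expensive scorer is invoked 100 times instead of 10,000; the lower the latency of stage 1, the larger a shortlist (and the heavier a stage-2 model) one can afford.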

Workflow

Download and install the client API
Create a data config file
Connect to a server
Create a collection
Ingest data
Run queries
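The config and ingest steps can be sketched locally as follows. The schema format below (a field-to-type mapping with `keyword`, `integer` and `dense_vector` types) is a hypothetical stand-in, not the documented Hyperspace data config format, and `validate` is an illustrative helper; the sketch only shows writing a config to a JSON file and checking documents against it before ingestion.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical data config: field name -> type descriptor. The real Hyperspace
# config schema may differ; see the notebooks in this repository.
CONFIG = {
    "title": {"type": "keyword"},
    "year": {"type": "integer"},
    "embedding": {"type": "dense_vector", "dim": 4},
}


def validate(doc, config):
    """Return a list of problems in doc according to config (empty if valid)."""
    problems = []
    for field, spec in config.items():
        if field not in doc:
            problems.append(f"missing field: {field}")
            continue
        value = doc[field]
        if spec["type"] == "keyword" and not isinstance(value, str):
            problems.append(f"{field}: expected str")
        elif spec["type"] == "integer" and not isinstance(value, int):
            problems.append(f"{field}: expected int")
        elif spec["type"] == "dense_vector" and (
            not isinstance(value, list) or len(value) != spec["dim"]
        ):
            got = len(value) if isinstance(value, list) else type(value).__name__
            problems.append(f"{field}: expected {spec['dim']} dims, got {got}")
    return problems


# "Create a data config file": write the config to JSON and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data_config.json"
    path.write_text(json.dumps(CONFIG, indent=2))
    loaded = json.loads(path.read_text())

# "Ingest data": only documents with no reported problems would be sent.
docs = [
    {"title": "Toy Story", "year": 1995, "embedding": [0.1, 0.2, 0.3, 0.4]},
    {"title": "Heat", "embedding": [0.1, 0.2]},
]
for doc in docs:
    print(doc["title"], validate(doc, loaded))
```

Here the second document is flagged for a missing `year` and a 2-dimensional embedding where the config declares 4 dimensions.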

Example Datasets

This repository includes various datasets and notebooks intended to demonstrate the use of the Hyperspace engine. The following datasets are currently included:

arXiv Papers Dataset - Taken from Kaggle, this dataset contains academic papers from arXiv along with their metadata. It can be used for vector, classic or hybrid search.
Crimes In Chicago Dataset - Taken from Kaggle, this dataset includes metadata and can be used to demonstrate classic search.
Stores Dataset - Randomly generated vectors of dimension 800, with corresponding metadata describing stores. The data can be used for vector, classic or hybrid search.
Movies Dataset - Taken from the MovieLens Latest Datasets, this data includes 40,954 valid movies. It is stored in SQL (table) format and is converted to NoSQL (document) format; the preprocessing is given in the notebook titled "MovieRecommendationDataPrep", available in this repository. The data can be used for vector, classic or hybrid search.
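The table-to-document conversion for the Movies dataset can be sketched as below, assuming the MovieLens CSV layout (`movies.csv` with `movieId,title,genres`, `ratings.csv` with `userId,movieId,rating,timestamp`). The rows are a tiny illustrative sample, and the output fields `num_ratings` and `avg_rating` as well as the helper `to_documents` are assumptions; the actual preprocessing is in the "MovieRecommendationDataPrep" notebook.

```python
import csv
import io
from collections import defaultdict

# Tiny illustrative sample in the MovieLens CSV layout (not the real files).
MOVIES_CSV = """movieId,title,genres
1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
2,Jumanji (1995),Adventure|Children|Fantasy
"""
RATINGS_CSV = """userId,movieId,rating,timestamp
1,1,4.0,964982703
2,1,5.0,964982224
1,2,3.0,964981247
"""


def to_documents(movies_csv, ratings_csv):
    """Turn the flat movie and rating tables into one nested document per movie."""
    ratings = defaultdict(list)
    for row in csv.DictReader(io.StringIO(ratings_csv)):
        ratings[row["movieId"]].append(float(row["rating"]))

    docs = []
    for row in csv.DictReader(io.StringIO(movies_csv)):
        r = ratings.get(row["movieId"], [])
        docs.append({
            "movieId": int(row["movieId"]),
            "title": row["title"],
            "genres": row["genres"].split("|"),  # multi-valued field as a list
            "num_ratings": len(r),
            "avg_rating": sum(r) / len(r) if r else None,
        })
    return docs


for doc in to_documents(MOVIES_CSV, RATINGS_CSV):
    print(doc)
```

Denormalizing the ratings table into each movie document means a query can filter or score on aggregates such as `avg_rating` without a join at query time.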

Argmax Datasets

We have added two example datasets. The data for these use cases can be found here. To run the code, add a data folder containing the data from the link above to each code example.

Advec - Dataset of applications.
Image-search - Dataset of Amazon items.
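Before running the Argmax examples, a quick check that each example folder has its data folder in place might look like this. The paths `Argmax-demos/Advec` and `Argmax-demos/Image-search`, the `data` subfolder name and the helper `missing_data_folders` are assumptions about the layout; adjust them to match the repository.

```python
from pathlib import Path


def missing_data_folders(root, examples, data_dir="data"):
    """Return the example folders under root that do not yet contain data_dir."""
    root = Path(root)
    return [name for name in examples if not (root / name / data_dir).is_dir()]


if __name__ == "__main__":
    # Hypothetical layout; adjust the names to match the Argmax-demos folder.
    missing = missing_data_folders("Argmax-demos", ["Advec", "Image-search"])
    if missing:
        print("Add a data folder to:", ", ".join(missing))
    else:
        print("All example data folders are in place.")
```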
