This branch of the library implements the Bag of Words technique as described in section 7.2 of Programming Computer Vision with Python.
Spatial information is currently represented by splitting the image into 16 equal cells (a 4x4 grid, i.e. a level 2 decomposition in terms of the Spatial Pyramid Matching method) and projecting the SIFT descriptors found in each cell onto the vocabulary.
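
As a rough illustration of the 4x4 decomposition, here is a minimal sketch that builds one visual-word histogram per cell and concatenates them. It is not the library's actual code: the function name `spatial_bow_histogram`, its parameters, and the nearest-word assignment by squared Euclidean distance are all assumptions made for the example.

```python
import numpy as np

def spatial_bow_histogram(points, descriptors, vocabulary, width, height, grid=4):
    """Concatenate one visual-word histogram per grid cell (hypothetical helper).

    points:      (n, 2) array of keypoint (x, y) positions in pixels
    descriptors: (n, d) array of descriptors, e.g. SIFT
    vocabulary:  (k, d) array of visual words
    """
    n_words = vocabulary.shape[0]
    hist = np.zeros((grid * grid, n_words), dtype=np.float64)
    if len(points) == 0:
        return hist.ravel()

    # Nearest visual word for every descriptor (squared Euclidean distance).
    diffs = descriptors[:, None, :] - vocabulary[None, :, :]
    words = np.argmin((diffs ** 2).sum(axis=2), axis=1)

    # Grid cell of every keypoint; clip so points on the right or bottom
    # edge still fall inside the last cell.
    cols = np.clip((points[:, 0] * grid / float(width)).astype(int), 0, grid - 1)
    rows = np.clip((points[:, 1] * grid / float(height)).astype(int), 0, grid - 1)
    cells = rows * grid + cols

    # Count every (cell, word) pair, including repeated pairs.
    np.add.at(hist, (cells, words), 1)
    return hist.ravel()
```

The broadcasted distance computation allocates an `n x k x d` array, so for large vocabularies a chunked or tree-based lookup would likely be preferable.
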
The color histogram implementation is taken almost verbatim from The complete guide to building an image search engine with Python and OpenCV, with the chi-squared distance rewritten using scipy so that linear search becomes feasible.
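
To show why a vectorized distance makes linear search practical, here is a minimal sketch that compares one query histogram against every database histogram in a single call. It uses NumPy broadcasting rather than whichever scipy routine the library relies on, and the name `chi2_distances`, the `0.5` factor, and the `eps` guard are assumptions based on the common form of the chi-squared histogram distance.

```python
import numpy as np

def chi2_distances(query_hist, db_hists, eps=1e-10):
    """Chi-squared distance from one histogram to each row of db_hists.

    query_hist: (b,) array, db_hists: (n, b) array. Returns an (n,) array.
    eps avoids division by zero for bins that are empty in both histograms.
    """
    query_hist = np.asarray(query_hist, dtype=np.float64)
    db_hists = np.asarray(db_hists, dtype=np.float64)
    num = (db_hists - query_hist) ** 2
    den = db_hists + query_hist + eps
    return 0.5 * np.sum(num / den, axis=1)
```

Stacking the database histograms into one array up front replaces a Python-level loop over images with a single array operation.
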
Querying is done by computing the color histogram of the input image and projecting its SIFT descriptors onto the BoW vocabulary.
Ranking is done by sorting an array of tuples, one per image in the database: the first element is the Jaccard distance between the BoW histogram of the input image and that of the database image, and the second is the chi-squared distance between their color histograms.
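
The ranking step can be sketched as a lexicographic sort on the two distances. This is an illustration only: the weighted (min/max) form of the Jaccard distance, the ascending sort order, and the names `jaccard_distance` and `rank` are assumptions, and the library may define or order these differently.

```python
def jaccard_distance(a, b):
    """Weighted Jaccard distance between two histograms:
    1 - sum(min(a_i, b_i)) / sum(max(a_i, b_i))."""
    num = sum(min(x, y) for x, y in zip(a, b))
    den = sum(max(x, y) for x, y in zip(a, b))
    return 0.0 if den == 0 else 1.0 - num / float(den)

def rank(candidates):
    """Sort (path, bow_distance, color_distance) tuples by BoW distance,
    breaking ties with the color distance."""
    return sorted(candidates, key=lambda c: (c[1], c[2]))
```

With this ordering the color histogram only matters when two images have identical BoW distances; a weighted sum of the two scores would be one alternative if both should always contribute.
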
####Sample runs:
Here are some sample runs on a dataset of 610 images, a significant portion of which are natural outdoor scenes.
https://gist.github.com/Transfusion/5ae12b05ad9b5f797507
https://gist.github.com/Transfusion/494d5d38b4f68cd9f5a0
https://gist.github.com/Transfusion/e2ab5d699dda20b87d58
https://gist.github.com/Transfusion/32598be2c005dc3764f8 (Note how unsuitable high-dimensional features are for sketched images; chances are the user will remember the color more accurately than the structure of the scene they have in mind!)
Script used to generate the above demos: https://gist.github.com/Transfusion/52129142a9e8e3e3963f
SIFT and SURF were removed from the default install of OpenCV 3.0, so this library needs OpenCV 3 built with the opencv_contrib package.
##Usage:##
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
>>> import algolib
>>> algolib.db_manager.db_manager_flat_file.init_db('db.pkl')
>>> pop = algolib.db_populator('db.pkl')
>>> pop.add_dir('/home/transfusion/InstaSketch_Algo/image_repo', recursive=True, overwrite=True)
>>> query = algolib.query_db('db.pkl')
>>> results = query.query_image('/home/transfusion/InstaSketch_Algo/image_repo/cdn/00078.JPEG', 16)
>>> query.query_image('/home/transfusion/InstaSketch_Algo/image_repo/arborgreens/Image32.jpg', 10)
[('/home/transfusion/InstaSketch_Algo/image_repo/cdn/00388.JPEG', 0.99628942486085348, 12.56696605682373), ('/home/transfusion/InstaSketch_Algo/image_repo/cdn/00390.JPEG', 0.95713107996702396, 14.383894920349121), ('/home/transfusion/InstaSketch_Algo/image_repo/cdn/00180.JPEG', 0.91382904794996522, 15.544892311096191), ...]