Releases: marqo-ai/marqo

Release 0.0.19

12 May 00:16
15dbcfe

New features

  • Model authorisation (#460). Non-public OpenCLIP and CLIP models can now be loaded from Hugging Face and AWS S3 via the model_location settings object and model_auth. See here (model auth during search) and here (model auth during add_documents) for usage; a rough sketch follows this list.
  • Max replicas configuration (#465). Marqo admins now have more control over the max number of replicas that can be set for indexes on the Marqo instance. See here for how to configure this.
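
Below is a minimal sketch of model auth during search, assuming the Python client of this release exposes settings_dict and model_auth keyword arguments. The bucket, key, credentials and model properties are placeholders, so check the linked model auth docs for the exact schema.

```python
import marqo

mq = marqo.Client(url="http://localhost:8882")

# Index settings pointing at a non-public OpenCLIP checkpoint in S3.
# Bucket, Key and model details below are placeholders.
index_settings = {
    "index_defaults": {
        "model": "my-private-openclip",
        "model_properties": {
            "name": "ViT-B-32",
            "dimensions": 512,
            "type": "open_clip",
            "model_location": {
                "s3": {"Bucket": "my-model-bucket", "Key": "checkpoints/epoch_10.pt"},
                "auth_required": True,
            },
        },
    }
}
mq.create_index("my-private-index", settings_dict=index_settings)

# Supply credentials at search time so Marqo can fetch the model if it isn't loaded yet.
results = mq.index("my-private-index").search(
    "a photo of a dog",
    model_auth={
        "s3": {
            "aws_access_key_id": "<ACCESS_KEY>",
            "aws_secret_access_key": "<SECRET_KEY>",
        }
    },
)
print(results["hits"])
```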

Breaking changes

  • Marqo now allows for a maximum of 1 replica per index by default (#465).

Bug fixes and minor changes

  • README improvements (#468)
  • OpenCLIP version bumped (#461)
  • Added extra tests (#464)
  • Unneeded files are now excluded in Docker builds (#448, #426)

Contributor shout-outs

  • Thank you to our 2.9k stargazers!
  • Thank you to community members for the increasingly exciting discussions on our Slack channel. Feedback, questions, and hearing about use cases help us build a great open source product.
  • Thank you to @jalajk24 for the PR to exclude unneeded files from Docker builds!

Release images can be found on Docker Hub

Release 0.0.18

24 Apr 09:15
b05798f

New features

  • New E5 model type is available (#419). E5 models are state-of-the-art general-purpose text embedding models that obtained the best results on the MTEB benchmark when released in Dec 2022. Read more about these models here; a rough sketch of creating an E5 index follows this list.
  • Automatic model ejection (#372). Automatic model ejection helps prevent out-of-memory (OOM) errors on machines with a larger amount of CPU memory (16GB+) by ejecting the least recently used model.
  • Speech processing article and example (#431). @OwenPendrighElliott demonstrates how you can build and query a Marqo index from audio clips.
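
As a minimal sketch of creating an E5-backed index with the Python client (the hf/e5-base name is taken from Marqo's model registry; swap in another E5 variant as needed):

```python
import marqo

mq = marqo.Client(url="http://localhost:8882")

# Create a text index backed by an E5 embedding model.
mq.create_index("my-e5-index", model="hf/e5-base")

mq.index("my-e5-index").add_documents([
    {"_id": "doc1", "Title": "Moon phases",
     "Text": "A gibbous moon is more than half illuminated."},
])

results = mq.index("my-e5-index").search("phases of the moon")
print(results["hits"])
```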

Optimisations

  • Delete optimisation (#436). The /delete endpoint can now handle a higher volume of requests.
  • Inference calls can now execute in batches, with batch size configurable by an environment variable (#376).

Bug fixes and minor changes

  • Configurable max value validation for HNSW graph parameters (#424). See here for how to configure.
  • Configurable maximum number of tensor search attributes (#430). See here for how to configure.
  • Unification of vectorise output type (#432)
  • Improved test pipeline reliability (#438, #439)
  • Additional image download tests (#402, #442)
  • Minor fix in the Iron Manual example (#440)
  • Refactored HTTP requests wrapper (#367)

Contributor shout-outs

  • Thank you to our 2.8k stargazers!
  • Thank you to community members for raising issues and starting discussions in our Slack channel.
  • Thank you to @jess-lord and others for raising issues.

Release images can be found on Docker Hub

Release 0.0.17

05 Apr 05:58
971ba2c

New features

  • New parameters that allow tweaking of Marqo indexes' underlying HNSW graphs. ef_construction and m can be defined at index time (#386, #420, #421), giving you more control over the relevancy/speed tradeoff. See usage and more details here.
  • Score modification fields (#414). Rank documents using k-NN similarity in addition to document metadata. This allows integer or float fields from a document to bias the document's score during the k-NN search, so additional ranking signals can be used. Use cases include giving more reputable documents higher weighting and de-duplicating search results. See usage here and the sketch after this list.
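
Below is a rough sketch of both features, assuming the Python client accepts a settings_dict on create_index and a score_modifiers argument on search that mirrors the API field; the field names, weights and values are illustrative:

```python
import marqo

mq = marqo.Client(url="http://localhost:8882")

# Tune the underlying HNSW graph at index-creation time.
mq.create_index("my-tuned-index", settings_dict={
    "index_defaults": {
        "ann_parameters": {
            # Higher values generally improve recall at the cost of build time and memory.
            "parameters": {"ef_construction": 256, "m": 24}
        }
    }
})

mq.index("my-tuned-index").add_documents([
    {"_id": "a1", "Title": "Review of product X", "reputation": 4.5},
    {"_id": "a2", "Title": "Another review of product X", "reputation": 1.2},
])

# Bias the k-NN score with a numeric metadata field (higher reputation ranks higher).
results = mq.index("my-tuned-index").search(
    "product X reviews",
    score_modifiers={"multiply_score_by": [{"field_name": "reputation", "weight": 1.0}]},
)
print(results["hits"])
```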

Bug fixes and minor changes

  • Added validation for unknown parameters during bulk search (#413).
  • Improved concurrency handling when adding documents to an index as it's being deleted (#407).
  • Better error messages for multimodal combination fields (#395).
  • Examples of recently added features were added to the README (#403).

Contributor shout-outs

Release images can be found on Docker Hub

Release 0.0.16

17 Mar 07:59
f12c948

New features

  • Bulk search (#363, #373).
    Conduct multiple searches with just one request. This improves search throughput in Marqo by parallelising multiple search queries in a single API call.
    The average search time can be decreased by up to 30%, depending on your devices and models.
    Check out the usage guide here; a rough sketch follows this list.
  • Configurable number of index replicas (#391).
    You can now configure how many replicas to make for an index in Marqo using the number_of_replicas parameter. Marqo makes 1 replica by default.
    We recommend having at least one replica to prevent data loss.
    See the usage guide here
  • Use your own vectors during searches (#381). Use your own vectors as context for your queries.
    Your vectors will be incorporated into the query using a weighted sum approach,
    allowing you to reduce the number of inference requests for duplicated content.
    Check out the usage guide here
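
As a rough sketch of bulk search against the HTTP API (assuming a local instance on port 8882 and an existing index named my-index; check the linked usage guide for the exact request schema):

```python
import requests

MARQO_URL = "http://localhost:8882"

# Several queries are sent in one request and executed in parallel by Marqo.
payload = {
    "queries": [
        {"index": "my-index", "q": "winter jackets", "limit": 5},
        {"index": "my-index", "q": "summer dresses", "limit": 5},
    ]
}

response = requests.post(f"{MARQO_URL}/indexes/bulk/search", json=payload)
response.raise_for_status()
print(response.json())
```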

Bug fixes and minor changes

  • Fixed a bug where some OpenCLIP models were unable to load checkpoints from the cache (#387).
  • Fixed a bug where multimodal search vectors were not combined according to the expected weights (#383).
  • Fixed a bug where multimodal document vectors were not combined as expected: numpy.sum was used rather than numpy.mean (#384).
  • Fixed a bug where an unexpected error was thrown when use_existing_tensors=True and documents were added with duplicate IDs (#390).
  • Fixed a bug where the index settings validation did not catch the model field if it was in the wrong part of the settings JSON (#365).
  • Added missing descriptions and requirements files for our GPT-examples (#349).
  • Updated the instructions to start Marqo-os (#371).
  • Improved Marqo start-up time by incorporating the download of the punkt tokenizer into the Dockerfile (#346).

Contributor shout-outs

  • Thank you to our 2.5k stargazers.
  • Thank you to @ed-muthiah for submitting a PR (#349) that added missing descriptions and requirements files for our GPT-examples.

Release images can be found on Docker Hub

Release 0.0.15

28 Feb 10:44
1b2d9fe

New features

  • Multimodal tensor combination (#332, #355). Combine image and text data into a single vector! Multimodal combination objects can be added as Marqo document fields. This can be used to encode text metadata into image vectors. See usage here.
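
Below is a minimal sketch of the multimodal combination feature, assuming a CLIP index that treats URLs as images and a mappings-style argument on add_documents as described in the usage docs; the field names, weights and image URL are illustrative.

```python
import marqo

mq = marqo.Client(url="http://localhost:8882")

# An index that treats URLs as images, so image and text can share one vector space.
mq.create_index("my-multimodal-index", model="ViT-B/32",
                treat_urls_and_pointers_as_images=True)

mq.index("my-multimodal-index").add_documents(
    [{
        "_id": "d1",
        # A single field that combines text metadata and an image into one vector.
        "captioned_image": {
            "caption": "A red vintage bicycle leaning on a wall",
            "image": "https://example.com/bicycle.jpg",
        },
    }],
    mappings={
        "captioned_image": {
            "type": "multimodal_combination",
            "weights": {"caption": 0.3, "image": 0.7},
        }
    },
)

results = mq.index("my-multimodal-index").search("vintage bicycle")
print(results["hits"])
```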

Bug fixes

  • Fixed a bug that prevented CLIP's device check from behaving as expected (#337)
  • CLIP utils is set to use the OpenCLIP default tokenizer so that long text inputs are truncated correctly (#351).

Contributor shout-outs

Release images can be found on Docker Hub

Release 0.0.14

28 Feb 10:43
9711dbe

New features

  • use_existing_tensors flag for add_documents (#335). Use existing Marqo tensors to autofill unchanged tensor fields for existing documents. This lets you quickly add new metadata while minimising inference operations. See usage here.
  • image_download_headers parameter for search and add_documents (#336). Index and search images that are not publicly available by adding image download auth information to add_documents and search requests. See usage here.
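
As a rough sketch of both parameters, assuming the Python client exposes them as keyword arguments on add_documents and search; the index names, document fields and header value are placeholders.

```python
import marqo

mq = marqo.Client(url="http://localhost:8882")

# Re-add a document with new metadata; unchanged tensor fields reuse existing vectors,
# so inference is not re-run for them.
mq.index("my-index").add_documents(
    [{"_id": "doc1", "Title": "Same title as before", "Views": 1001}],
    use_existing_tensors=True,
)

# Index and search images that sit behind auth by forwarding download headers.
mq.index("my-image-index").add_documents(
    [{"_id": "img1", "MyImage": "https://example.com/private/cat.jpg"}],
    image_download_headers={"Authorization": "Bearer <TOKEN>"},
)
results = mq.index("my-image-index").search(
    "a photo of a cat",
    image_download_headers={"Authorization": "Bearer <TOKEN>"},
)
print(results["hits"])
```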

Optimisations

  • The index cache is now updated on intervals of 2 seconds (#333), rather than on every search. This reduces the pressure on Marqo-OS, allowing for greater search and indexing throughput.

Bug fixes

  • Helpful validation errors for invalid index settings (#330). Helpful error messages allow for a smoother getting-started experience.
  • Automatic conversion to fp32 when using fp16 models on CPU (#331). This allows Marqo running on a CPU-only machine to index and search a fp16 CLIP index (although we still recommend interacting with a fp16 index with device=CUDA for the best results).
  • Broadened the types of image download errors that are handled gracefully (#321).

Release images can be found on Docker Hub

Release 0.0.13

14 Feb 05:56
a5caac2

New features

  • Support for custom CLIP models using the OpenAI and OpenCLIP architectures (#286). Read about usage here.
  • Concurrency throttling (#304). Configure the number of allowed concurrent indexing and search threads. Read about usage here.
  • Configurable logging levels (#314). Adjust log output for your debugging/log storage needs. See how to configure log level here.
  • New array datatype (#312). You can use these arrays as a collection of tags to filter on! See usage here.
  • Boost tensor fields during search (#300). Weight fields higher or lower relative to each other during search. Use this to get a mix of results that suits your use case. See usage here.
  • Weighted multimodal queries (#307). You can now search with a dictionary of weighted queries. If searching an image index, these queries can be a weighted mix of image URLs and text. See usage here and the sketch after this list.
  • New GPT-Marqo integration example and article. Turn your boring user manual into a question-answering bot, with an optional persona, with GPT + Marqo!
  • Added new OpenCLIP models to Marqo (#299)
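
Below is a minimal sketch of a weighted multimodal query against an image index; the terms, URL and weights are illustrative.

```python
import marqo

mq = marqo.Client(url="http://localhost:8882")

# Positive weights pull results toward a concept, negative weights push them away.
results = mq.index("my-image-index").search({
    "a photo of a red dress": 1.0,
    "https://example.com/reference-dress.jpg": 0.6,
    "blurry, low quality": -0.4,
})

for hit in results["hits"]:
    print(hit["_id"], hit["_score"])
```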

Optimisations

  • Concurrent image downloads (#281, #311)
  • Blazingly fast fp16 ViT CLIP models (#286). See usage here
  • Reduction of data transfer between Marqo and Marqo-os (#300)
  • We see a 3.0x indexing speedup and a 1.7x search speedup using the new fp16/ViT-L/14 CLIP model, compared to the previous release using ViT-L/14.

Bug fixes

  • Fixed a 500 error when creating an index while only specifying number_of_shards (#293)
  • Fixed model cache management not parsing reranker model properties properly (#308)

Contributor shout-outs

  • Thank you to our 2.3k stargazers
  • Thank you to @codebrain and others for raising issues.

Release images can be found on Docker Hub

Release 0.0.12

27 Jan 09:06
4c4b3ca

New features

  • Multilingual CLIP (#267). Search images in the language you want! Marqo now incorporates open source multilingual CLIP models. A list of available multilingual CLIP models is available here.
  • Exact text matching (#243, #288). Search for specific words and phrases using double quotes (" ") in lexical search. See usage here.
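
A minimal sketch of exact matching in lexical search (the index name and phrase are illustrative):

```python
import marqo

mq = marqo.Client(url="http://localhost:8882")

# Terms inside double quotes must appear exactly in a matching document.
results = mq.index("my-index").search(
    'official "business plan template"',
    search_method="LEXICAL",
)
print(results["hits"])
```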

Optimisations

  • Search speed-up (#278). Latency reduction from Marqo-os index reconfigurations.

Contributor shout-outs

  • Thank you to our 2.2k stargazers and 80+ forkers!

Release images can be found on Docker Hub

Release 0.0.11

10 Jan 07:54
d5a2570

New features

  • Pagination (#251). Navigate through pages of results. Provide an extensive end-user search experience without having to keep results in memory! See usage here, and the sketch after this list.
  • The /models endpoint (#239). View which models are loaded, and on what device. This lets Marqo admins examine loaded models and prune unneeded ones. See usage here.
  • The /device endpoint (#239). See resource usage for the machine Marqo is running on. This helps Marqo admins manage resources on remote Marqo instances. See usage here.
  • The index settings endpoint (/indexes/{index_name}/settings) (#248). See the model and parameters used by each index. See usage here.
  • Latency log outputs (#242). Marqo admins now have better transparency about the latencies for each step of the Marqo indexing and search request pipeline.
  • ONNX CLIP models are now available (#245). Index and search images in Marqo with CLIP models in the faster, open ONNX format, created and open-sourced by Marqo's machine learning team. These ONNX CLIP models give Marqo up to a 35% speedup over standard CLIP models. Read about usage here.
  • New simple image search guide (#253, #263).
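
Below is a rough sketch of pagination plus the /models endpoint, assuming a local instance; the limit/offset argument names should be confirmed against the pagination usage guide.

```python
import marqo
import requests

MARQO_URL = "http://localhost:8882"
mq = marqo.Client(url=MARQO_URL)

# Fetch the second page of 10 results: offset = page_number * limit.
page_2 = mq.index("my-index").search("running shoes", limit=10, offset=10)
print([hit["_id"] for hit in page_2["hits"]])

# Inspect which models are loaded, and on which devices, to decide what to prune.
print(requests.get(f"{MARQO_URL}/models").json())
```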

Contributor shout-outs

  • ⭐️ We've just hit over 2.1k GitHub stars! ⭐️ So an extra special thanks to our stargazers and contributors who make Marqo possible.

Release images can be found on Docker Hub

Release 0.0.10

15 Dec 11:36
5cc2586

New features

  • Generic model support (#179). Create an index with your favourite SBERT-type models from HuggingFace! Read about usage here; a rough sketch follows this list.
  • Visual search update 2 (#214). Search-time image reranking and open-vocabulary localization, based on users' queries, are now available with the Owl-ViT model. Locate the part of the image corresponding to your query! Read about usage here.
  • Visual search update 1 (#214). Better image patching. In addition to faster-rcnn, you can now use yolox or attention-based (DINO) region proposals as a patching method at indexing time. This allows localization, as the sub-patches of the image can be searched. Read about usage here.
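
As a rough sketch of generic model support with the Python client (the model name is illustrative; any compatible SBERT-type Hugging Face model should work similarly):

```python
import marqo

mq = marqo.Client(url="http://localhost:8882")

# Back the index with an SBERT-style text embedding model from Hugging Face.
mq.create_index("my-generic-index",
                model="sentence-transformers/all-MiniLM-L6-v2")

mq.index("my-generic-index").add_documents([
    {"_id": "d1", "Description": "The Eiffel Tower is in Paris."},
])

results = mq.index("my-generic-index").search("famous landmarks in France")
print(results["hits"])
```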

Check out this article about how this update makes image search awesome.

Bug fixes

  • Fixed imports and outdated Python client usage in Wikipedia demo (#216)

Contributor shout-outs

  • Thank you to @georgewritescode for debugging and updating the Wikipedia demo
  • Thank you to our 1.8k stargazers and 60+ forkers!

Release images can be found on Docker Hub