As of 2023, we use multiple different search solutions.
The WordPress-based sites (blogs like https://blog.jquery.com, documentation sites like https://api.jquery.com, and misc sites like https://brand.jquery.org or https://learn.jquery.com) use either the default MySQL-based search backend that comes with WordPress, or the Relevanssi plugin for WordPress that improves its result quality and performance in a way that is transparent to the WordPress theme and frontend.
The plugin installation and configuration resides in the https://github.com/jquery/jquery-wp-content repository.
Most of the documentation sites additionally use Algolia's DocSearch for autocompletion. It was set up in 2013 (thread 1, thread 2) with the help of Sylvain Pace, who worked on Algolia DocSearch.
The sites are crawled passively by Algolia (interval unknown), via their algolia/docsearch-scraper service. These configuration files were created for us and control the crawler:
The open source algolia/docsearch-scraper service was deprecated in 2021 in favour of the proprietary Algolia Crawler. The above configuration files link to a now archived, read-only repository. It is our understanding that migration to the new Crawler is opt-in and requires client and configuration changes. It appears the legacy crawler still runs, although at an unknown frequency (seemingly less than once a month, if at all).
The algolia-docsearch.js client integration resides in the https://github.com/jquery/jquery-wp-content repository.
A number of sites use active rather than passive crawling, pushing content directly to the Algolia API during website deployments.
These use jekyll-algolia during the CI job (GitHub Actions) that builds and deploys the static site. The frontend CSS and JS for this are part of the Amethyst theme for Jekyll: https://github.com/qunitjs/jekyll-theme-amethyst/
See also its Getting started documentation for how it works in more detail.
As of 2021, we're exploring an open-source solution that we can support within the free software ecosystem. In doing so we will increase security and availability (by reducing client-side dependence on third-party domains), and lower our privacy budget.
We first evaluated Meilisearch (private thread) and experienced some suboptimal aspects. These included: difficult upgrades (not yet committing to forward compatibility or automatic in-place upgrades), opt-out telemetry instead of opt-in, no official Debian packages, non-trivial interactive setup, missing support for querying multiple indexes (e.g. qunitjs.com and api.qunitjs.com), and a not yet clear future in terms of business model (Meilisearch Cloud was not yet in the picture, and the backend is not GPL licensed).
In mid-2022, the experiment transitioned to focus on Typesense instead.
- Canonical domain: https://typesense.jquery.com
- Bootstrap key: (profile::typesense::api_key in Private Hiera data)
For security reasons, we don't use the "bootstrap" admin API key beyond internal provisioning and minting other API keys. If you need an admin key for anything outside Puppet, such as for a CI job that crawls a site and uploads content to Typesense, then generate a key for that one website (or for a group of related sites under the same project/owner).
Remember to set a collection prefix, and put the project name in the description.
You can either let a random key be generated, or ensure the existence of a given API key by setting the "value" key in the posted JSON message.
https://typesense.org/docs/0.24.0/api/api-keys.html
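The second option, pinning a known key through the "value" field, can be sketched as below. This is a minimal sketch, not our provisioning code: the helper names key_payload and create_admin_key are hypothetical, the payload builder does no JSON escaping (so it assumes plain descriptions and values without quotes or backslashes), and it grants all actions like the admin-key example in this section.

```shell
# key_payload DESCRIPTION COLLECTIONS_REGEX [VALUE]
# Prints the JSON body for POST /keys. When VALUE is given, Typesense is
# asked to create the key with exactly that value instead of a random one.
# Assumption: arguments contain no double quotes or backslashes.
key_payload() {
  if [ -n "${3:-}" ]; then
    printf '{"description":"%s","actions":["*"],"collections":["%s"],"value":"%s"}\n' "$1" "$2" "$3"
  else
    printf '{"description":"%s","actions":["*"],"collections":["%s"]}\n' "$1" "$2"
  fi
}

# create_admin_key DESCRIPTION COLLECTIONS_REGEX [VALUE]
# Sends the payload to the local Typesense server using the bootstrap key.
# Run on the search backend server with TYPESENSE_BOOTSTRAP_KEY exported.
create_admin_key() {
  curl -sS http://localhost:8108/keys \
    -X POST \
    -H "X-TYPESENSE-API-KEY: $TYPESENSE_BOOTSTRAP_KEY" \
    -H 'Content-Type: application/json' \
    -d "$(key_payload "$@")"
}

# Example call (QUNIT_ADMIN_KEY is a hypothetical variable holding the
# pre-chosen key value):
#   create_admin_key 'QUnit admin key.' 'qunit.*' "$QUNIT_ADMIN_KEY"
```

Pinning the value is mainly useful when the same key is already stored elsewhere, for example as a CI secret, and the server has to be re-provisioned without rotating it.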
export TYPESENSE_BOOTSTRAP_KEY=...
# Create admin key for qunitjs_com and other qunit* collections.
curl http://localhost:8108/keys \
  -X POST \
  -H "X-TYPESENSE-API-KEY: $TYPESENSE_BOOTSTRAP_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"description":"QUnit admin key.","actions": ["*"], "collections": ["qunit.*"]}'
"Search-only" keys are for public use in browsers and other clients, and may be committed to public Git repositories. Use the command below from the search backend server to generate such keys.
export TYPESENSE_BOOTSTRAP_KEY=...
curl http://localhost:8108/keys \
  -X POST \
  -H "X-TYPESENSE-API-KEY: $TYPESENSE_BOOTSTRAP_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"description":"Search-only key.","actions": ["documents:search"], "collections": ["*"]}'
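A quick way to confirm that a freshly minted search-only key works is to run one search with it. The sketch below assumes the public host typesense.jquery.com is reachable over HTTPS, that a collection named qunitjs_com exists, and that its documents have a "content" field (as collections built by the DocSearch scraper usually do). The helper names search_url and try_search, and the TYPESENSE_SEARCH_KEY variable, are hypothetical.

```shell
# search_url HOST COLLECTION QUERY QUERY_BY
# Builds the URL for Typesense's documents/search endpoint.
# Assumption: QUERY is a plain word; no URL-encoding is performed.
search_url() {
  printf 'https://%s/collections/%s/documents/search?q=%s&query_by=%s\n' "$1" "$2" "$3" "$4"
}

# try_search COLLECTION QUERY
# Runs one search using the search-only key in TYPESENSE_SEARCH_KEY.
# A JSON response with a "hits" array indicates the key is accepted.
try_search() {
  curl -sS "$(search_url typesense.jquery.com "$1" "$2" content)" \
    -H "X-TYPESENSE-API-KEY: $TYPESENSE_SEARCH_KEY"
}

# Example call:
#   export TYPESENSE_SEARCH_KEY=...
#   try_search qunitjs_com assert
```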
Add these two secrets to the GitHub repo's settings:
- TYPESENSE_HOST: typesense.jquery.com (host-only, no port or protocol)
- TYPESENSE_ADMIN_KEY: (an admin key with rights to relevant collections)
Then add /docsearch.config.json and /.github/workflows/typesense.yaml files to the repository, similar to those in https://github.com/qunitjs/qunitjs.com/ or https://github.com/jquery/api.jquery.com/.
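For orientation, the crawl step that such a workflow performs can be approximated locally as follows. This is a hedged sketch rather than a copy of the workflows in the repositories above: it assumes the typesense/docsearch-scraper Docker image, HTTPS on port 443, and that docker and jq are installed. The helper names is_bare_host and run_scraper are hypothetical.

```shell
# is_bare_host VALUE
# Succeeds only if VALUE looks like a bare hostname, i.e. it is non-empty
# and has no protocol, port, or path, as required for TYPESENSE_HOST.
is_bare_host() {
  case "$1" in
    ''|*://*|*:*|*/*) return 1 ;;
    *) return 0 ;;
  esac
}

# run_scraper
# Crawls the site described in ./docsearch.config.json and uploads the
# result to Typesense. Expects TYPESENSE_HOST and TYPESENSE_ADMIN_KEY in
# the environment (the same two values stored as GitHub secrets).
run_scraper() {
  is_bare_host "$TYPESENSE_HOST" || {
    echo "TYPESENSE_HOST must be a bare hostname" >&2
    return 1
  }
  # Assumption: the server is served over HTTPS on the default port 443.
  docker run --rm \
    -e "TYPESENSE_API_KEY=$TYPESENSE_ADMIN_KEY" \
    -e "TYPESENSE_HOST=$TYPESENSE_HOST" \
    -e "TYPESENSE_PORT=443" \
    -e "TYPESENSE_PROTOCOL=https" \
    -e "CONFIG=$(jq -r tostring docsearch.config.json)" \
    typesense/docsearch-scraper
}
```

Checking the host value first catches the most common misconfiguration, pasting the full https:// URL into the secret, before the scraper starts.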