surfmuggle edited this page Apr 15, 2018 · 2 revisions

Search

  1. Apache Tika detects and extracts metadata and text from over a thousand different file types. Besides a Java library, it is available as a server and as a command-line tool.
    • Users: Alfresco CMS and possibly Hippo CMS (see the plans at https://www.bloomreach.com/en/blog/2010/04/Metadata+extraction+with+Apache+Tika.html)
  2. Elasticsearch provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. It is designed for use as part of an integrated solution, the “Elastic Stack” (formerly the “ELK stack”), which also contains:
    • Logstash, a data-collection and log-parsing engine, and
    • Kibana, an analytics and visualisation platform.
  3. Apache Solr
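
To make the "HTTP web interface and schema-free JSON documents" of Elasticsearch (item 2) more concrete, here is a minimal sketch that only builds the HTTP requests for indexing a document and running a full-text `match` query. The base URL `http://localhost:9200`, the index name `wiki`, and the helper names `index_request`, `search_request` and `send` are assumptions made for illustration; nothing on this page prescribes them.

```python
import json
import urllib.request

# Assumption: a local, unsecured Elasticsearch node on its default port.
ES_URL = "http://localhost:9200"


def index_request(index, doc_id, document, base_url=ES_URL):
    """Build a PUT request that stores `document` (any JSON-serialisable dict)
    under the given id. No schema has to be declared beforehand."""
    return urllib.request.Request(
        url=f"{base_url}/{index}/_doc/{doc_id}",
        data=json.dumps(document).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def search_request(index, field, text, base_url=ES_URL):
    """Build a POST request that runs a full-text `match` query on one field."""
    query = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(query).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def send(request):
    """Send a prepared request and decode the JSON response.
    Only call this when a node is actually reachable at the request's URL."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a node running, `send(index_request("wiki", "1", {"title": "Search"}))` followed by `send(search_request("wiki", "title", "search"))` would be the intended usage. Keeping request construction separate from sending makes the sketch easy to inspect without a server.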

Other libraries worth looking into

A content repository is a hierarchical content store with support for structured and unstructured content, full text search, versioning, transactions, observation, and more.

  1. Apache Jackrabbit™ is a fully conforming implementation of the Content Repository for Java Technology API (JCR, specified in JSR 170 and JSR 283).
  2. Jackrabbit Oak is a complementary implementation of the JCR specification. It is an effort to implement a scalable and performant hierarchical content repository for use as the foundation of modern world-class web sites and other demanding content applications. See the Jackrabbit Oak website for more information.