esURP: use Solr's DataImportHandler and UpdateRequestProcessor in ElasticSearch

Description

This is implemented as a Solr UpdateRequestProcessor (URP) that redirects docs to ES. It is thoroughly explained in this blog post.

Using this, you can configure your current Solr instance so it points to an ES instance and:

index to ES via DIH
index to ES after the docs have been processed by any URP you want
do both of the above
optionally index the docs in Solr too (at the same time as in ES)

It has been tested with Solr 5.2.1 and ES 2.0, but should work fine with newer versions too. Important: ES must be using the same Lucene version Solr is using.

Usage

On ElasticSearch side: just start ES normally.

On Solr side:

add the following jars from ES (or the corresponding ones, if you are using a version other than ES 2.0) to solr\server\solr-webapp\webapp\WEB-INF\lib:
- elasticsearch-2.0.0.jar
- jackson-core-2.5.3.jar
- jackson-dataformat-yaml-2.5.3.jar
- jsr166e-1.1.0.jar
- guava-18.0.jar
- hppc-0.7.1.jar
- netty-3.10.3.Final.jar
- jna-4.1.0.jar
- compress-lzf-1.0.2.jar
remove the original Solr jars that are superseded by those just copied, in my case:
- guava-14.0.1.jar
- hppc-0.5.2.jar
also add the EsUpdateRequestProcessorFactory classes to Solr. I run them from my IDE, but you can also build a jar and put it with the ones above
configure solrconfig.xml so that the chain handling the docs you want to index in ES includes EsUpdateRequestProcessorFactory. For example, with this configuration we can index into ES using DIH:


    <updateRequestProcessorChain name="mychain">
        <processor class="com.jmlucjav.esURP.EsUpdateRequestProcessorFactory">
            <str name="esCluster">elasticsearch</str>
            <str name="esIndex">employees</str>
            <str name="esType">employee</str>
            <str name="ignoreFields">parent</str>
            <bool name="useTransportClient">false</bool>
        </processor>
        <processor class="solr.RunUpdateProcessorFactory"/>
    </updateRequestProcessorChain>

    <!-- DIH -->
    <requestHandler name="/dataimport" class="solr.DataImportHandler">
        <lst name="defaults">
            <str name="config">db-data-config.xml</str>
            <str name="update.chain">mychain</str>
        </lst>
    </requestHandler>

The parameters above are quite straightforward: they indicate the ES cluster, index and type to use, and let you ignore certain document fields so they are not sent to ES.
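
To send docs to ES only after other URPs have processed them, the same processor can be placed later in a chain. The following is a sketch, not a tested configuration: the chain name `trim-then-es` is made up for illustration, `solr.TrimFieldUpdateProcessorFactory` simply stands in for whatever URP you want to run first, and the esURP parameters are copied from the DIH example. The `initParams` element makes the chain the default for requests to `/update`.

```xml
<!-- Hypothetical chain: run a standard URP first, then hand the doc to esURP -->
<updateRequestProcessorChain name="trim-then-es">
    <!-- Stand-in for any URP you want applied before the doc reaches ES -->
    <processor class="solr.TrimFieldUpdateProcessorFactory"/>
    <processor class="com.jmlucjav.esURP.EsUpdateRequestProcessorFactory">
        <str name="esCluster">elasticsearch</str>
        <str name="esIndex">employees</str>
        <str name="esType">employee</str>
        <str name="ignoreFields">parent</str>
        <bool name="useTransportClient">false</bool>
    </processor>
    <!-- Keeps indexing the doc in Solr as well -->
    <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<!-- Use the chain by default for /update requests -->
<initParams path="/update/**">
    <lst name="defaults">
        <str name="update.chain">trim-then-es</str>
    </lst>
</initParams>
```

Alternatively, a chain can be selected per request by passing the `update.chain` request parameter instead of setting a default.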

start Solr this way:

solr/bin/solr start -a "-Des.path.home=path-to-es\elasticsearch-1.7.1 -Des.security.manager.enabled=false"

Now just index docs in Solr, and they will show up in ES.
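
Whether docs are also indexed in Solr depends on `solr.RunUpdateProcessorFactory`, which is the processor that writes to the local Solr index. Leaving it out of the chain should, in principle, send docs to ES only. This is an untested assumption about how esURP behaves without it, and the chain name `esonly` is hypothetical:

```xml
<!-- Hypothetical ES-only chain: no RunUpdateProcessorFactory, so nothing is written to the Solr index -->
<!-- Assumption: esURP forwards docs to ES regardless of what follows it in the chain -->
<updateRequestProcessorChain name="esonly">
    <processor class="com.jmlucjav.esURP.EsUpdateRequestProcessorFactory">
        <str name="esCluster">elasticsearch</str>
        <str name="esIndex">employees</str>
        <str name="esType">employee</str>
        <str name="ignoreFields">parent</str>
        <bool name="useTransportClient">false</bool>
    </processor>
</updateRequestProcessorChain>
```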

Limitations

the ES mappings needed (for nested types etc.) are configured when a full delete is done from Solr. This is handy because DIH sends a full delete when reindexing. If you are not using DIH, you can still send a full delete just so the mappings are set, or configure the ES index beforehand the same way esURP does.
for delete operations, only delete by id and delete all (*:*) are supported.
after you do the indexing to ES, if you still want to query Solr, it might be better to put the original jars back in place, otherwise some components might fail; for instance, the ExpandComponent fails in my setup (due to the newer hppc jar from ES).
Important: ES must be using the same Lucene version Solr is using.
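
For reference, the two supported delete forms correspond to the following standard Solr XML update messages. The id value `1234` is a placeholder. Each message would be posted separately to an update handler whose chain includes EsUpdateRequestProcessorFactory.

```xml
<!-- Message 1: delete a single doc by id (1234 is a placeholder id) -->
<delete>
    <id>1234</id>
</delete>

<!-- Message 2: full delete; per the limitations above, this is also what sets up the ES mappings -->
<delete>
    <query>*:*</query>
</delete>
```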

Contributing

Feel free. Pull requests, issues etc are welcome.

Contact: jmlucjav AT Google's mail

License

This is released under the Apache 2.0 License.
