

Overview

A simple-to-use benchmark setup that takes care of setting up a Solr cluster on AWS and benchmarking it against different query types

Benchmark setup

Setup includes:

  • 4 Solr nodes
    • solr-node-1
    • solr-node-2
    • solr-node-3
    • solr-node-4
  • 3 Zookeeper nodes
    • zoo-node-1
    • zoo-node-2
    • zoo-node-3
  • 1 Client node [Load Generator]
    • solrj-client-1

All of the above components (including the client [Load Generator]) run on individual AWS nodes
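Conceptually, each of the node names above maps to its own AWS instance; an /etc/hosts-style view of the topology might look like the following sketch (the addresses are placeholders, the real ones come from the provisioning step):

# Placeholder addresses -- actual IPs are assigned when the AWS nodes are provisioned
10.0.0.11  solr-node-1
10.0.0.12  solr-node-2
10.0.0.13  solr-node-3
10.0.0.14  solr-node-4
10.0.0.21  zoo-node-1
10.0.0.22  zoo-node-2
10.0.0.23  zoo-node-3
10.0.0.31  solrj-client-1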

The current benchmark setup uses:

  • Solr version: solr-7.7.3
  • Zookeeper version: zookeeper-3.4.13

In its current state, this benchmark setup can be used to benchmark Solr on the following JDKs:

  • Zing: zing21.07.0.0-3-ca-jdk11.0.12
  • Zulu: zulu11.50.19-ca-jdk11.0.12

The Solr version, Zookeeper version, and JDK of choice can be easily changed by modifying the init.sh file
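For illustration, the relevant settings in init.sh might look something like the sketch below; the variable names here are assumptions, not the actual contents of the script:

# Hypothetical excerpt -- the actual variable names in init.sh may differ
SOLR_VERSION=solr-7.7.3
ZOOKEEPER_VERSION=zookeeper-3.4.13
JDK_BUNDLE=zing21.07.0.0-3-ca-jdk11.0.12    # or zulu11.50.19-ca-jdk11.0.12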

What does the benchmark do?

The benchmark setup includes a client/load generator built using the SolrJ library.
It runs on a dedicated AWS node to benchmark the Solr cluster.

It currently supports benchmarking the Solr cluster with 5 types of search/select queries: term/field, phrase, proximity, range, and fuzzy queries.

In addition to search requests, a typical Solr application also needs to deal with update requests, and the performance of the search requests is affected by how the update requests are handled.

To study this effect, the current benchmark setup can also run update operations/queries (atomic updates) alongside the search/select queries mentioned above.
These update operations are run as a background task, and only the performance of the search/select operations is measured.

The search/select queries that are used are stored in text files.
Depending on the type of query chosen for benchmarking, the relevant query files are read by the client and requests are continuously submitted to the Solr cluster.
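For illustration only, hand-issued versions of the five query types might look like the commands below. The collection name "wikipedia" and the field names are assumptions; the real query strings live in the query files mentioned above.

# Illustration only: the real queries come from the text files read by the client.
# Collection name "wikipedia" and the field names below are assumptions.
SOLR_URL="http://solr-node-1:8983/solr/wikipedia/select"

curl -s "$SOLR_URL" --data-urlencode 'q=title:London'                                               # term/field query
curl -s "$SOLR_URL" --data-urlencode 'q=text:"new york city"'                                       # phrase query
curl -s "$SOLR_URL" --data-urlencode 'q=text:"river bank"~4'                                        # proximity query
curl -s "$SOLR_URL" --data-urlencode 'q=timestamp:[2010-01-01T00:00:00Z TO 2015-12-31T23:59:59Z]'   # range query
curl -s "$SOLR_URL" --data-urlencode 'q=text:benchmark~2'                                           # fuzzy query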

The benchmark allows sending the requests at a fixed target rate.
However, in order to measure the peak throughput that can be achieved, the target rate (targetRateForSelectOpAtWarmup, targetRateForSelectOp) is deliberately set to a very high value (see the config file) and the actual rate achieved is recorded.

The background update operations are run at a fixed rate of 1000 requests/sec.
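In practice, "deliberately high" amounts to something like the sketch below. The key=value layout is only an illustration of the two parameters named above (consult the actual config file for its real format); the 100000 value matches the sample results shown later.

# Illustrative values only -- see the actual config file in the repo for its real format
targetRateForSelectOpAtWarmup=100000
targetRateForSelectOp=100000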

Details of dataset used in benchmarking

A ~50GB wikimedia dump (link) is indexed into the Solr cluster, against which the benchmark is run.
It is a derivative of pages-articles-multistream.xml.bz2 by the Wikimedia Foundation, used under CC BY-SA 3.0.

This data dump is licensed under CC BY-SA 3.0 by Azul Systems, Inc.

How to run the benchmark?

Prepare the benchmarking setup

Only 3 steps are necessary to prepare the setup for benchmarking:

Since the entire cluster (3-node Zookeeper ensemble + 4 Solr nodes) and the client/load generator run on AWS instances, a lightweight instance is sufficient to act as a central coordinator or leader that takes care of running the above 3 steps, starting the benchmark runs on the cluster, collecting the results of the benchmark runs, etc.
This lightweight instance can either be the user's laptop (running Linux/macOS) or a separate small AWS instance.

Provision the necessary AWS nodes

To provision the necessary AWS instances, follow the instructions here

Configure provisioned nodes

Run the command below to configure the nodes, install the necessary tools, download the necessary artifacts, etc.:

bash scripts/setup.sh all

NOTE: Make sure the JAVA_HOME environment variable (pointing to a JDK 11 installation) is set on the host that runs this script
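For example, on a coordinator host with a JDK 11 installation under /usr/lib/jvm (the path below is only a placeholder), the setup could be run as:

export JAVA_HOME=/usr/lib/jvm/java-11-openjdk   # placeholder path; point this at your local JDK 11
bash scripts/setup.sh all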

The above command takes care of the following:

  • prepares the 3-node Zookeeper ensemble (zoo-node-1, zoo-node-2, zoo-node-3)
  • prepares the 4-node Solr cluster (solr-node-1, solr-node-2, solr-node-3, solr-node-4)
  • prepares the client node (solrj-client-1)
  • indexes the wikimedia dump into the Solr cluster

Starting the benchmark

General command to run the benchmark against a given query type:
QUERY_TYPE=<QUERY TYPE> JAVA_HOME=<ABS_PATH_TO_JAVA_HOME_ON_AWS> bash scripts/main.sh startBenchmark

NOTE: To pass additional JVM args to the Solr cluster, SOLR_JAVA_MEM and GC_TUNE env variables can be used:

GC_TUNE='-XX:-UseZST -XX:+PrintGCDetails' SOLR_JAVA_MEM='-Xms55g -Xmx70g' QUERY_TYPE=<QUERY TYPE> JAVA_HOME=<ABS_PATH_TO_JAVA_HOME_ON_AWS> bash scripts/main.sh startBenchmark
Sample commands to launch/start the benchmark
  • To benchmark Solr with phrase queries, on Zing:

    COMMON_LOG_DIR=phrase-queries-on-zing QUERY_TYPE=phrase JAVA_HOME=/home/centos/zing21.07.0.0-3-ca-jdk11.0.12-linux_x64/ bash scripts/main.sh startBenchmark
    
  • To benchmark Solr with term/field queries, on Zulu:

    COMMON_LOG_DIR=field-queries-on-zulu QUERY_TYPE=field JAVA_HOME=/home/centos/zulu11.50.19-ca-jdk11.0.12-linux_x64/ bash scripts/main.sh startBenchmark
    

NOTE:
If QUERY_TYPE=<QUERY_TYPE> is omitted, the benchmark will run against a mix of all the query types listed above
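For example, a mixed-query run on Zulu could be launched as follows (the COMMON_LOG_DIR value is just an illustrative name):

COMMON_LOG_DIR=mixed-queries-on-zulu JAVA_HOME=/home/centos/zulu11.50.19-ca-jdk11.0.12-linux_x64/ bash scripts/main.sh startBenchmark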

Where to find the results of the benchmark run?

The results of the benchmark runs are captured in benchmark.log under COMMON_LOG_DIR.
For the 2 sample launches shown above, the final results can be found under:

  • ${WORKING_DIR}/phrase-queries-on-zing/benchmark.log
  • ${WORKING_DIR}/field-queries-on-zulu/benchmark.log

The result is simply reported in a single line in the following format:

Requested rate = <requested_rate> req/sec | Actual rate = <actual_rate_achieved> req/sec (<number of requests submitted by the client to the Solr cluster> queries in <duration of benchmark run> sec)

Sample results:

Requested rate = 100000 req/sec | Actual rate = 47821 req/sec (43039651 queries in 900 sec)
Requested rate = 100000 req/sec | Actual rate = 31619 req/sec (28457667 queries in 900 sec)

In addition to benchmark.log, the Solr logs, GC logs, etc. are also collected and stored under COMMON_LOG_DIR after the benchmark run.

A simple script to run all the queries on Zing and Zulu multiple times:
# Run each of the 5 query types 3 times, on Zing and then on Zulu
for queryType in "field" "phrase" "proximity" "range" "fuzzy"
do
    for i in 1 2 3
    do
        # Run this query type on Zing
        HEADER=zing-${queryType}-run${i}
        COMMON_LOG_DIR=${HEADER} QUERY_TYPE=${queryType} JAVA_HOME=/home/centos/zing21.07.0.0-3-ca-jdk11.0.12-linux_x64/ bash scripts/main.sh startBenchmark

        # Run the same query type on Zulu
        HEADER=zulu-${queryType}-run${i}
        COMMON_LOG_DIR=${HEADER} QUERY_TYPE=${queryType} JAVA_HOME=/home/centos/zulu11.50.19-ca-jdk11.0.12-linux_x64/ bash scripts/main.sh startBenchmark
    done
done
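After a batch of runs like the one above, the individual results can be pulled together from the per-run benchmark.log files, for example (assuming the log directories were created under ${WORKING_DIR}):

# Print the one-line result of every run, prefixed with the path of its log file
grep -H "Actual rate" ${WORKING_DIR}/*/benchmark.log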