- Overview
- Benchmark setup
- What does the benchmark do?
- Dataset used in the benchmark
- How to run the benchmark
A simple-to-use benchmark setup that takes care of standing up a Solr cluster on AWS and benchmarking it against different query types.
The setup includes:
- 4 Solr nodes
  - solr-node-1
  - solr-node-2
  - solr-node-3
  - solr-node-4
- 3 Zookeeper nodes
  - zoo-node-1
  - zoo-node-2
  - zoo-node-3
- 1 Client node [Load Generator]
  - solrj-client-1
All of the above components (including the client [Load Generator]) run on individual AWS nodes.
The current benchmark setup uses:
- Solr version: solr-7.7.3
- Zookeeper version: zookeeper-3.4.13
In the current state, this benchmark setup can be used to benchmark Solr on the following JDKs:
- Zing: zing21.07.0.0-3-ca-jdk11.0.12
- Zulu: zulu11.50.19-ca-jdk11.0.12
The Solr version, Zookeeper version, and JDK of choice can be easily changed by modifying the init.sh file.
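For example, the relevant entries in init.sh might look like the following (the variable names are illustrative assumptions, not the actual contents of init.sh; the values are the defaults listed above):

# Hypothetical init.sh excerpt -- variable names are illustrative
SOLR_VERSION=solr-7.7.3
ZOOKEEPER_VERSION=zookeeper-3.4.13
JDK_BUNDLE=zulu11.50.19-ca-jdk11.0.12-linux_x64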
The benchmark setup includes a client/load generator built using the SolrJ library. It runs on a dedicated AWS node and is used to benchmark the Solr cluster. It currently supports benchmarking the Solr cluster with 5 types of search/select queries:
- field (term) queries
- phrase queries
- proximity queries
- range queries
- fuzzy queries
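For illustration, queries of these types look roughly as follows in Lucene/Solr query syntax (the field names, collection name, and host below are placeholders, not values taken from the benchmark's query files):

# field/term:  title:linux
# phrase:      text:"operating system"
# proximity:   text:"operating system"~10
# range:       id:[1000 TO 2000]
# fuzzy:       text:lniux~2   (a misspelling that fuzzy-matches "linux")
# e.g. submitting a phrase query to the cluster with curl:
curl "http://solr-node-1:8983/solr/<collection>/select?q=text:%22operating%20system%22"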
In addition to search requests, a typical Solr application also needs to deal with update requests, and the performance of the search requests is affected by how the update requests are handled. In order to study this effect, the current benchmark setup also has the ability to run update operations/queries (atomic updates) alongside the search/select queries mentioned above. These update operations are run as a background task, and only the performance of the search/select operations is measured.
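For reference, a Solr atomic update looks roughly like the following (the collection name, document id, and field name are placeholders; the actual update payloads used by the client may differ):

# Atomic update: set a new value for a single field of an existing document
curl -X POST -H 'Content-Type: application/json' \
  "http://solr-node-1:8983/solr/<collection>/update" \
  -d '[{"id":"12345", "title":{"set":"updated title"}}]'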
The search/select queries that are used are stored in text files. Depending on the type of query chosen for benchmarking, the relevant query files are read by the client and requests are continuously submitted to the Solr cluster. The benchmark allows sending the requests at a fixed target rate, but in order to measure the peak throughput that can be achieved, the target rate (targetRateForSelectOpAtWarmup, targetRateForSelectOp) is deliberately set to a very high value (see the config file) and the actual rate achieved is recorded. The background update operations are run at a fixed rate of 1000 requests/sec.
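For illustration, the relevant client config entries might look like the following (only the property names come from this README; the values and file layout are assumptions):

# Target rates for the select operations; set very high to measure peak throughput
targetRateForSelectOpAtWarmup=100000
targetRateForSelectOp=100000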
A ~50GB wikimedia dump (link) is indexed into the Solr cluster, against which the benchmark is run. It is a derivative of pages-articles-multistream.xml.bz2 by the Wikimedia Foundation, used under CC BY-SA 3.0. This data dump is licensed under CC BY-SA 3.0 by Azul Systems, Inc.
Only 3 steps are necessary to prepare the setup for benchmarking:
- Clone this repo
- Provision the necessary AWS nodes
- Configure the provisioned nodes
Since the entire cluster (3-node Zookeeper ensemble + 4 Solr nodes) and the client/load generator run on AWS instances, a lightweight instance is sufficient to act as a central coordinator or leader that takes care of running the above 3 steps, starting the benchmark runs on the cluster, collecting the results of the benchmark runs, etc. This lightweight instance can either be the user's laptop (Linux/macOS) or a separate small AWS instance.
To provision the necessary AWS instances, follow the instructions here
Run the command below to configure the nodes, install the necessary tools, download the necessary artifacts, etc.:
bash scripts/setup.sh all
NOTE: Make sure the JAVA_HOME env variable (pointing to JDK11) is set on the host that runs this script
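For example (the JDK path below is only a placeholder for a local JDK11 installation):

JAVA_HOME=/path/to/jdk11 bash scripts/setup.sh all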
The above command takes care of the following:
- prepares the 3-node Zookeeper ensemble (zoo-node-1, zoo-node-2, zoo-node-3)
- prepares the 4-node Solr cluster (solr-node-1, solr-node-2, solr-node-3, solr-node-4)
- prepares the client node (solrj-client-1)
- indexes the wikimedia dump into the Solr cluster
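Once the setup is complete, a benchmark run is started with: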
QUERY_TYPE=<QUERY TYPE> JAVA_HOME=<ABS_PATH_TO_JAVA_HOME_ON_AWS> bash scripts/main.sh startBenchmark
NOTE: To pass additional JVM args to the Solr cluster, the SOLR_JAVA_MEM and GC_TUNE env variables can be used:
GC_TUNE='-XX:-UseZST -XX:+PrintGCDetails' SOLR_JAVA_MEM='-Xms55g -Xmx70g' QUERY_TYPE=<QUERY TYPE> JAVA_HOME=<ABS_PATH_TO_JAVA_HOME_ON_AWS> bash scripts/main.sh startBenchmark
- To benchmark Solr with phrase queries, on Zing:
COMMON_LOG_DIR=phrase-queries-on-zing QUERY_TYPE=phrase JAVA_HOME=/home/centos/zing21.07.0.0-3-ca-jdk11.0.12-linux_x64/ bash scripts/main.sh startBenchmark
- To benchmark Solr with term/field queries, on Zulu:
COMMON_LOG_DIR=field-queries-on-zulu QUERY_TYPE=field JAVA_HOME=/home/centos/zulu11.50.19-ca-jdk11.0.12-linux_x64/ bash scripts/main.sh startBenchmark
NOTE: If QUERY_TYPE=<QUERY_TYPE> is omitted, the benchmark will run against a mix of all the query types listed above
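For example, a mixed-query run on Zing could be launched like this (the COMMON_LOG_DIR value is illustrative):

COMMON_LOG_DIR=mixed-queries-on-zing JAVA_HOME=/home/centos/zing21.07.0.0-3-ca-jdk11.0.12-linux_x64/ bash scripts/main.sh startBenchmark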
The results of the benchmark runs are captured in benchmark.log under COMMON_LOG_DIR. For the 2 sample launches shown above, the final results can be found under:
${WORKING_DIR}/phrase-queries-on-zing/benchmark.log
${WORKING_DIR}/field-queries-on-zulu/benchmark.log
The result is simply reported in a single line in the following format:
Requested rate = <requested_rate> req/sec | Actual rate = <actual_rate_achieved> req/sec (<number of requests submitted by the client to the Solr cluster> queries in <duration of benchmark run> sec)
Sample results:
Requested rate = 100000 req/sec | Actual rate = 47821 req/sec (43039651 queries in 900 sec)
Requested rate = 100000 req/sec | Actual rate = 31619 req/sec (28457667 queries in 900 sec)
In addition to benchmark.log, the Solr logs, GC logs, etc. are also collected and stored under COMMON_LOG_DIR after the benchmark run.
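For example, the following script runs the benchmark for each query type 3 times, on both Zing and Zulu: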
for queryType in "field" "phrase" "proximity" "range" "fuzzy"
do
    for i in 1 2 3
    do
        HEADER=zing-${queryType}-run${i}
        COMMON_LOG_DIR=${HEADER} QUERY_TYPE=${queryType} JAVA_HOME=/home/centos/zing21.07.0.0-3-ca-jdk11.0.12-linux_x64/ bash scripts/main.sh startBenchmark

        HEADER=zulu-${queryType}-run${i}
        COMMON_LOG_DIR=${HEADER} QUERY_TYPE=${queryType} JAVA_HOME=/home/centos/zulu11.50.19-ca-jdk11.0.12-linux_x64/ bash scripts/main.sh startBenchmark
    done
done