Requirements
This creates an EMQX cluster with configurable number of core and replicant nodes, emqttb and/or emqtt-bench load generator nodes, and prometheus+grafana instance in a private VPC.
- By default, all instances are launched in a private VPC in the first availability zone, with public IPv4 addresses.
- Security group allows all inbound and outbound traffic to prevent Security group connection tracking from affecting the test results. You can change this in
modules/security_group_rules/main.tf
.
terraform init
terraform apply
# this will take a while, wait until terraform finishes
# if something fails during ansible provisioning, try to re-run corresponding playbook
env no_proxy='*' ansible-playbook ansible/emqx.yml
env no_proxy='*' ansible-playbook ansible/emqttb.yml
# start emqttb benchmark
ansible emqttb -m command -a 'systemctl start emqttb' --become
# (optionally) open emqx dashboard and/or grafana to watch metrics
# or use this helper script to produce a markdown with selected metrics
./scripts/summary.sh
# cleanup when done
terraform destroy
Default emqx dashboard credentials are admin:public
, grafana credentials are admin:grafana
.
Terraform creates infrastructure and provisions ansible inventory and variables, and as the final step invokes ansible playbooks to install emqx, emqttb, emqtt-bench, prometheus and grafana. To start the test, you need to start loadgen manually (as indicated above, and in the examples below). After terraform run you can use ansible separately to start/stop loadgens, manage emqx nodes, etc.
Some examples:
# start/stop benchmark with emqttb
ansible emqttb -m command -a 'systemctl start emqttb' --become
ansible emqttb -m command -a 'systemctl stop emqttb' --become
# start/stop benchmark with emqtt-bench
ansible emqtt_bench -m command -a 'systemctl start emqtt-bench' --become
ansible emqtt_bench -m command -a 'systemctl stop emqtt-bench' --become
# start/stop/restart emqx on all nodes
ansible emqx -m command -a 'systemctl start emqx' --become
ansible emqx -m command -a 'systemctl stop emqx' --become
ansible emqx -m command -a 'systemctl restart emqx' --become
# get emqx cluster status on all nodes
ansible emqx -m command -a 'emqx ctl cluster status' --become
# reinstall emqx
ansible emqx -m shell -a 'apt-get purge emqx -y' --become
ansible-playbook ansible/emqx.yml
# reinstall emqttb
ansible-playbook ansible/emqttb.yml
# reinstall emqtt-bench
ansible-playbook ansible/emqtt-bench.yml
# reinstall emqx with some custom variables
# note: we pass them as a JSON object; otherwise,
# everything is interpreted as a string
ansible-playbook ansible/emqx.yml -e '{"cache_enabled": true, "topics": ["t/a"]}'
To ssh directly into instances, use terraform output to get IP addresses, and generated ssh private key, for example:
ssh -l ubuntu -i ./foobar.pem 52.53.191.91
# generic ssh one-liner to connect to the first emqx node
ssh -l ubuntu -i $(terraform output -raw ssh_key_path) $(terraform output -json emqx_nodes | jq -r '.[0].ip')
# open dashboard in the browser (macos only)
open http://$(terraform output -raw emqx_dashboard_url)
Key name is generated as <id>.pem
(id
is from test spec file).
You can also modify ansible variables directly under ansible/group_vars
and ansible/host_vars
directories, and re-run ansible playbooks without recreating terraform infrastructure.
NOTE: if you see ERROR! A worker was found in a dead state
while running ansible, try to add env no_proxy='*'
before ansible-playbook
commands.
Test specs are yaml files under tests
directory. By default tests/default.yml
is used. You can specify a different test spec file by adding -var spec_file=tests/myspec.yaml
parameter when running terraform.
IMPORTANT: this variable must also be used when you run terraform destroy
to cleanup the infrastructure.
Processing of test spec is implemented in locals.tf
.
- most specific settings override less specific ones
- one can use any combination of regions for the infrastructure as long as there are total of 3 regions or less
- emqx, emqttb and emqtt-bench nodes can be deployed to any region
- load balancer serving emqx dashboard, grafana UI and prometheus UI is deployed to the default region, which means
- at least one emqx node should be deployed to the default region
- monitoring node with grafana and prometheus is deployed to the default region
- scenario for emqttb and emqtt-bench goes directly into rendered systemd unit file, hence, for example, percent symbol should be escaped as
%%
Example test spec:
id: foobar # mandatory field, used as a prefix for all resources, must be unique, but not very long
region: eu-west-1 # default region
instance_type: m6g.large # default instance type
ami_filter: "ubuntu-22.04-arm64-*" # default ami filter
use_spot_instances: true # whether to use spot instances
emqx: # emqx related settings
instance_type: m6g.large # one can override default parameters here for all emqx nodes
data_dir: /data/emqx # data directory for emqx config and files, optional, default is /var/lib/emqx
extra_volumes: # attach extra ebs volumes to emqx nodes (optional)
- mount_point: /data # mount point
volume_size: 30 # volume size in GB
volume_type: gp3 # volume type (gp3, gp2, io1, etc.) - default is gp3
mount_options: defaults,noatime,discard # mount options, default is 'defaults'
nodes: # list of emqx nodes
- role: core # role of the node, can be core or replicant (default is core)
region: eu-west-1
instance_count: 1 # number of instances to launch (default is 1)
- role: replicant
region: eu-west-1
- role: core
region: us-west-1 # override region for this node
instance_type: m7g.large # override instance type for this node
- role: replicant
region: us-west-1
- role: core
region: us-east-1
- role: replicant
region: us-east-1
loadgens: # load generators section
instance_type: m5.large # instance type for all load generators
ami_filter: "ubuntu-22.04-amd64-*" # ami filter for all load generators
nodes:
# mind the '%%' in the scenario - it's quoting for systemd unit file
- type: emqttb
scenario: "@pub --topic t/%%n --conninterval 10ms --pubinterval 10ms --num-clients 100 --size 1kb"
instance_count: 3
- type: emqttb
scenario: "@sub --topic t/%%n --conninterval 10ms --num-clients 10"
instance_count: 3
For example, here is how to test rolling upgrade of emqx-enterprise from 5.3.2 to 5.4.1.
- Create a new test spec file
tests/emqx-enterprise-5.3.2.yml
:
id: emqx-enterprise
region: eu-west-1
use_spot_instances: true
monitoring_enabled: false
ami_filter: "debian-10-amd64-*"
remote_user: admin
emqx:
instance_type: m6a.large
edition: emqx-enterprise
version: 5.3.2
license_file: emqx5.lic
nodes:
- role: core
instance_count: 3
- Run terraform with this spec:
terraform init
terraform apply -var spec_file=tests/emqx-enterprise-5.3.2.yml
- In
ansible/group_vars/emqx5.yml
changeemqx_version
to5.4.1
- Run
emqx_rolling_upgrade.yml
playbook:
ansible-playbook ansible/emqx_rolling_upgrade.yml
Helpful function that you could add to your .bashrc
or .zshrc
.
Alternatively, source scripts/dev-helpers.sh
:
. scripts/dev-helpers.sh
"bm" in the function names stands for "BenchMark".
function bm-start() {
n=${1:-1}
ansible emqtt_bench -m command -a 'systemctl start emqtt_bench' --become -l $(terraform output -json emqtt_bench_nodes | jq -r ".[$((n-1))].fqdn")
}
function bm-stop () {
n=${1:-1}
ansible emqtt_bench -m command -a 'systemctl stop emqtt_bench' --become -l $(terraform output -json emqtt_bench_nodes | jq -r ".[$((n-1))].fqdn")
}
function bmb-start() {
n=${1:-1}
ansible emqttb -m command -a 'systemctl start emqttb' --become -l $(terraform output -json emqttb_nodes | jq -r ".[$((n-1))].fqdn")
}
function bmb-stop() {
n=${1:-1}
ansible emqttb -m command -a 'systemctl stop emqttb' --become -l $(terraform output -json emqttb_nodes | jq -r ".[$((n-1))].fqdn")
}
function bm-ssh() {
node=$(terraform output -json | jq -r 'to_entries[] | select(.key | endswith("_nodes")) | .value.value[] | "\(.ip)\t\(.fqdn)"' | sort -k2 | uniq | fzf | cut -d $'\t' -f 1)
if [[ -n $node ]]; then
ssh -l ubuntu -i $(terraform output -raw ssh_key_path) "$node"
fi
}
function bm-urls() {
echo "dashboard: $(terraform output -raw emqx_dashboard_url)"
echo "grafana: $(terraform output -raw grafana_url)"
}
ansible-playbook ansible/prometheus-snapshot.yml
This will create a directory prometheus_snapshots
in the root of this project and
download a tarball containing the snapshot there. The tarball then must be extracted
preserving permissions:
cd prometheus_snapshots
tar -xvpf 20241007T164452Z-11cc51d6d14856dd.tar.gz
It contains a docker compose manifest that spins up grafana and prometheus to serve snapshot data.
Tip
Currenlty, annotations are not captured in this tarball. They live in Grafana's DB.
cd 20241007T164452Z-11cc51d6d14856dd
docker compose up ; docker compose down
Then access http://localhost:3000
(username/password is admin
/grafana
).
If you want to use perf_events
tests and analysis, add emqx.enable_perf = true
to your test spec. This will install a few tools on the EMQX machines, such as perf-archive
, which current doesn't work by default with pre-installed perf
, hotspot
and flamegraph.pl.
emqx:
enable_perf: true
# ...
Then, when you want to record events, SSH into an EMQX box and run:
## this will record data for 30 seconds
perf record --call-graph=fp --pid $(pgrep beam.smp) -- sleep 30
To analyze recorded data, one option is hotspot. You can run it directly on the EMQX machine so that it may find as many symbols and libraries used as possible.
## assuming we ran the test on the first emqx box
function first_core() {
terraform output -json | jq -r 'to_entries[] | select(.key | endswith("_nodes")) | .value.value[] | "\(.ip)\t\(.fqdn)"' | sort -k2 | uniq | rg core | head -n 1 | cut -d $'\t' -f 1
}
## SSH with X11 forwarding
ssh -X -l ubuntu -i $(terraform output -raw ssh_key_path) $(first_core) /tmp/hotspot.AppImage
Hotspot supports exporting perf_events
data in a portable way (though only it can open the archive), which doesn't require a GUI:
ssh -X -l ubuntu -i $(terraform output -raw ssh_key_path) $(first_core) /tmp/hotspot.AppImage /home/ubuntu/perf.data --exportTo /tmp/perf.data.perfparser
# or
ansible 'emqx[0]' -i ansible/inventory.yml -m command -a "/tmp/hotspot.AppImage /home/ubuntu/perf.data --exportTo /tmp/perf.data.perfparser"
To use flamegraph.pl, we just follow the instructions on its repo:
## ssh into the emqx box, and...
perf script > out.perf
/opt/flamegraph/stackcollapse-perf.pl out.perf > out.folded
/opt/flamegraph/flamegraph.pl out.folded > out.svg