Moderately opinionated Terraform and Ansible configuration for running XRPL validator clusters on AWS.
This setup runs two types of rippled servers:
- Nodes sync with the network, keep a copy of the ledger, and relay transactions. They work in a cluster alongside a validator, acting as a sort of proxy between the network and the validator, but don't participate in consensus themselves.
- The validator runs in a cluster with nodes and participates in consensus.
The validator hides behind a cluster of proxy nodes: it sits in an isolated subnet with no direct internet access, while the proxy nodes handle all public-facing traffic and relay messages to it.
```mermaid
flowchart TD
    Internet((Internet))

    subgraph public["Public Subnet"]
        PublicNode[Public Node<br/>proxy]
    end

    subgraph private["Private Subnet"]
        PrivateNode[Private Node<br/>proxy]
    end

    subgraph isolated["Isolated Private Subnet"]
        Validator[Validator<br/>proposing]
    end

    Internet <--> PublicNode
    PrivateNode --> Internet
    PublicNode --> Validator
    PrivateNode --> Validator
```
- Validator: Hidden in an isolated subnet with its own NAT gateway (prevents IP leakage of any sort). Only accepts connections from trusted proxy nodes. Runs in "proposing" state.
- Public nodes: Have public IPs, accept inbound peer connections from the internet. Suitable for proxies.
- Private nodes: No public IP, outbound-only connections via NAT. Still proxy for the validator but don't accept random inbound peers. Suitable if you need a node for, e.g., API access.
All nodes in the cluster share public keys and communicate as trusted peers.
For more background, see the Rabbitkick XRPL Validator Guide.
This repo contains reusable infrastructure code:
- `terraform/modules/validator-cluster/` - Terraform module for the full cluster
- `ansible/roles/` - Ansible roles for configuring instances
- `ansible/playbooks/` - Ansible playbooks
Your actual deployment config (with your AWS account IDs, regions, etc.) lives in a separate private repo that references this one as a git submodule. See terraform/example/ and ansible/example/ for reference configurations.
Each environment (e.g., testnet, mainnet) is a completely isolated cluster with its own VPC, EC2 instances, secrets, and monitoring.
The Terraform module creates a VPC with three tiers of subnets:
- Public subnets for nodes that need to accept inbound connections from the internet. These get public IPs and use an internet gateway.
- Private subnets for nodes that only make outbound connections. Traffic goes through a shared NAT gateway.
- Validator subnet: a completely isolated private subnet with a dedicated NAT gateway.
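Once the VPC is up, you can sanity-check the three tiers from the CLI. A minimal sketch; the `Name` tag used in the query is an assumption, so adjust the filter to whatever tags your deployment applies:

```bash
# List the cluster's subnets with their name tag, CIDR, AZ and public-IP setting.
aws ec2 describe-subnets \
  --region <region> \
  --filters "Name=vpc-id,Values=<vpc-id>" \
  --query 'Subnets[].{Name:Tags[?Key==`Name`]|[0].Value,CIDR:CidrBlock,AZ:AvailabilityZone,PublicIP:MapPublicIpOnLaunch}' \
  --output table
```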
To keep things simple, we deploy rippled to a set of provisioned EC2 instances. The reasons we don't containerize rippled and run it in something like ECS or Kubernetes:
- The number of validators will not scale and we will never want to run two validators with the same configuration. In fact, this setup assumes only one validator will ever run in a cluster.
- While it could make sense for nodes to scale, we do not anticipate a demand for scaling based on objective metrics.
- Each node is likely to be very unique in its setup. For example, a node that is also used by an application to fetch historical data will look quite different from a node used for redundancy in a cluster.
- All nodes and validators require SSD storage. We can't trivially location-balance them without making sure we stop them, copy all history, and then start on a new machine. While this can of course be achieved with a container orchestration tool, we don't anticipate the need right now.
It is quite likely that we will eventually want two long-history nodes used for redundancy by an external application (so one can be updated while the other keeps serving), in which case, for now, we will keep them behind a load balancer.
Each instance gets an IAM role scoped to just its own secrets and S3 paths. The module uses instance types with NVMe storage (like z1d) - the local NVMe drive is mounted at /var/lib/rippled for ledger data. This storage is ephemeral (wiped on stop, preserved on reboot), which is fine since rippled can resync from the network. If we ever want to keep longer history without losing it, we will need to run occasional backups (which might involve stopping rippled, taking the snapshot, and then continuing).
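Such a backup could be as simple as stopping rippled and syncing the NVMe data to S3. A hedged sketch: the bucket name is a placeholder not created by the module, and the paths assume the stock rippled package layout:

```bash
# Stop rippled so the on-disk databases are consistent, copy them off, restart.
# <backup-bucket> is a placeholder; do this on a redundant node to avoid downtime.
sudo systemctl stop rippled
aws s3 sync /var/lib/rippled/db "s3://<backup-bucket>/$(date +%F)/db"
sudo systemctl start rippled
```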
All instances have termination and stop protection enabled to prevent accidents. They're also set up for Systems Manager access instead of SSH.
EC2 instances are provisioned by Terraform and managed by Ansible.
Each node has two secrets in AWS Secrets Manager. The "secret" one holds sensitive data like the validation seed and SSL keys - only that specific instance can read it. The "var" one holds public data like the validation public key, which all nodes can read so they can build their cluster configuration.
IAM policies ensure nodes can only access their own secrets and other nodes' public data.
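For example, any node (or an operator using the Ansible role) can read another node's public data, while the sensitive secret is readable only from that node's own instance. A hedged illustration; the secret names follow the `rippled/<env>/...` pattern used elsewhere in this README and may differ in your deployment:

```bash
# Readable cluster-wide: another node's validation public key
aws secretsmanager get-secret-value \
  --region <region> \
  --secret-id "rippled/myenv/var/node_1" \
  --query SecretString --output text | jq -r .validation_public_key

# Denied unless run from node_1 itself (or a principal with explicit access)
aws secretsmanager get-secret-value \
  --region <region> \
  --secret-id "rippled/myenv/secret/node_1" \
  --query SecretString --output text
```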
The module sets up alarms for both rippled health and system health.
There's also an instance status check alarm that automatically reboots the instance if the OS becomes unresponsive. All alarms notify an SNS topic you can subscribe to.
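To see what was created for a given environment, something like the following should work; the alarm name prefix is an assumption, so check the actual names in the console or the Terraform outputs:

```bash
# List the cluster's CloudWatch alarms and their current state.
aws cloudwatch describe-alarms \
  --region <region> \
  --alarm-name-prefix "rippled-testnet" \
  --query 'MetricAlarms[].{Name:AlarmName,State:StateValue}' \
  --output table
```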
Each cluster also gets its own dashboard:
Ansible configures the instances after Terraform provisions them. It uses a dynamic inventory that discovers instances by EC2 tags. Instances are grouped by:
- `env_<environment>` - all instances in an environment (e.g., `env_testnet`)
- `name_<name>` - individual instances (e.g., `name_testnet_validator`)
- `role_validator` / `role_node` - by role
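To confirm what the dynamic inventory resolves to before running a playbook, you can list hosts per group (run from your deployment repo's ansible/ directory, where ansible.cfg points at the inventory):

```bash
# Show the full group tree discovered from EC2 tags
ansible-inventory --graph

# List only the hosts a pattern would target (dry check before a playbook run)
ansible env_testnet --list-hosts
ansible name_testnet_validator --list-hosts
```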
- AWS CLI configured with credentials
- Terraform >= 1.0
- Ansible >= 2.10 (install from your package manager; the `community.aws` and `amazon.aws` collections are typically included)
- Session Manager plugin for AWS CLI
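A quick check of the tooling might look like the sketch below; if your Ansible package didn't bundle the AWS collections, ansible-galaxy can add them:

```bash
terraform version
ansible --version
session-manager-plugin          # prints a success message if the plugin is installed

# Install the AWS collections if they weren't bundled
ansible-galaxy collection install amazon.aws community.aws
```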
The user or process running Terraform needs an IAM role with permissions for EC2, VPC, IAM, CloudWatch, SNS, SSM, and S3. The role should cover:
- EC2: Full VPC management (subnets, NAT gateways, security groups, instances)
- IAM: Create/manage roles and instance profiles (scoped to `*-validator`, `*-node-*`, `*-ansible` patterns)
- CloudWatch: Alarms, dashboards, log groups
- SNS: Alert topics
- SSM: Patch baselines, maintenance windows
- S3: State bucket access, Ansible SSM bucket, wallet.db backup bucket
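Before the first apply, it can help to confirm which principal Terraform will operate as. A minimal check; the role ARN is a placeholder for whatever role your provider block assumes:

```bash
# Confirm the credentials Terraform will start from
aws sts get-caller-identity

# If the provider assumes a dedicated role, make sure it resolves too
aws sts assume-role \
  --role-arn arn:aws:iam::<account-id>:role/<terraform-role> \
  --role-session-name terraform-check \
  --query 'AssumedRoleUser.Arn'
```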
```bash
mkdir my-xrpl-deployment && cd my-xrpl-deployment
git init
git submodule add https://github.com/commonprefix/xrpl-validator.git xrpl-validator
mkdir -p terraform/testnet ansible/inventory
```

Copy from the example and customize:

```bash
cp xrpl-validator/terraform/example/* terraform/testnet/
```

Edit terraform/testnet/main.tf:
- Update the S3 backend with your state bucket
- Update the provider with your IAM role ARN
- Update the module source to point to the submodule:
module "cluster" { source = "../../xrpl-validator/terraform/modules/validator-cluster" # ... }
- Configure your nodes, region, etc.
Copy from the example and customize:
```bash
cp xrpl-validator/ansible/example/* ansible/
mv ansible/aws_ec2.yml ansible/inventory/
```

Edit ansible/inventory/aws_ec2.yml with your IAM role ARN and SSM bucket.
Edit ansible/ansible.cfg to point to the submodule:
```ini
[defaults]
inventory = inventory/aws_ec2.yml
roles_path = ../xrpl-validator/ansible/roles
```

Important: For new environments, add `enable_alarm_actions = false` to your Terraform config. This prevents the instance-status-check alarm from auto-rebooting instances before Ansible has a chance to configure them.
```hcl
# In terraform/testnet/main.tf
module "cluster" {
  source = "../../xrpl-validator/terraform/modules/validator-cluster"
  # ... other config ...
  enable_alarm_actions = false
}
```

Then apply:

```bash
cd terraform/testnet
terraform init
terraform apply
```

Then configure the instances with Ansible:

```bash
cd ansible
ansible-playbook ../xrpl-validator/ansible/playbooks/site.yml -l env_testnet
```

After Ansible completes successfully, remove `enable_alarm_actions = false` (or set it to true) and re-apply Terraform to enable monitoring:

```bash
terraform apply
```

When Ansible first runs on a new validator, it doesn't generate a validator token, which is necessary to participate in consensus.
You have to do this manually, since you may want to end up storing the key in a secure location which is never online.
The key generated in this step is stored in ~/.ripple/validator-keys.json. We suggest that, once you are done with the process, you move it to a secure location and delete it from the machine.
Generate a validator token on a secure machine:
```bash
# Generate the master key (store this securely offline!)
/opt/ripple/bin/validator-keys create_keys
# Output: validator-keys.json in ~/.ripple/

# Optionally set your domain for identification
/opt/ripple/bin/validator-keys set_domain yourdomain.com

# Generate a token from the master key
/opt/ripple/bin/validator-keys create_token
```

Then add the token to your validator's secret (it might be easiest to use the AWS Console for this, but the API call would look like the following). You can reformat the token onto one line so it fits the JSON nicely.
```bash
aws secretsmanager update-secret --region <region> \
  --secret-id "rippled/myenv/secret/validator" \
  --secret-string '{
    "validation_seed": "ssExistingSeed...",
    "validator_token": "validation_secret_key..."
  }'
```

This is also a good time to create your domain verification file.
Re-run Ansible on the validator to apply:
```bash
ansible-playbook playbooks/site.yml -l name_myenv_validator
```

The validator will now participate in consensus. You can verify with:

```bash
rippled server_info | grep server_state
```

Each node in the nodes list accepts:
| Field | Required | Description |
|---|---|---|
| `name` | Yes | Unique name for the node (used in AWS tags, alarms, etc.) |
| `instance_type` | Yes | EC2 instance type. It's best to use an instance family with NVMe instance storage. |
| `root_volume_size` | Yes | Root EBS volume size in GB. This needs to be sufficient for logs, configuration, binaries, etc. |
| `availability_zone` | Yes | Index into the `availability_zones` list (0, 1, etc.) |
| `validator` | No | Set to `true` for the validator. Exactly one node must have this. Default: `false` |
| `public` | No | Set to `true` for public subnet with public IP. Validators cannot be public. Default: `false` |
| `secret_name` | Yes | AWS Secrets Manager path for sensitive data (validation_seed, validator_token). |
| `var_secret_name` | Yes | AWS Secrets Manager path for public data (validation_public_key) |
| `ssl_subject` | No | SSL certificate details for peer connections. Required for non-validator nodes |
| `ledger_history` | No | Number of ledgers to retain. Default: 6000 |
| `node_size` | No | rippled node size (tiny, small, medium, large, huge). Default: medium |
| `domain` | No | Domain for validator verification. Only valid on the validator node. |
| `hosted_zone_id` | No | Route53 hosted zone ID. Requires `domain` to be set. |
| Variable | Description | Default |
|---|---|---|
| `environment` | Environment name (used in resource names, tags) | Required |
| `region` | AWS region | Required |
| `availability_zones` | List of AZs to use | Required |
| `vpc_cidr` | VPC CIDR block | `10.0.0.0/16` |
| `patch_schedule` | Cron for OS patching (UTC) | `cron(0 11 ? * MON *)` (Mondays 11:00 UTC) |
| `log_retention_days` | CloudWatch log retention | 30 |
| `rippled_log_max_size_mb` | Max rippled log size before rotation | 1024 |
| `rippled_log_max_files` | Rotated log files to keep | 10 |
| `enable_alarm_actions` | Enable alarm actions. Set `false` for initial deployment. | `true` |
| `ansible_role_principals` | IAM ARNs that can assume the Ansible role | `[]` |
| `alarm_thresholds` | Alarm threshold configuration (see below) | See defaults |
Alarm thresholds:
```hcl
alarm_thresholds = {
  ledger_age_seconds  = 20
  node_min_peer_count = 5
  disk_used_percent   = 75
  memory_used_percent = 75
  cpu_used_percent    = 75
}
```

Each node requires two secrets in AWS Secrets Manager. You can either:
- Pre-create them before running Ansible (e.g., when migrating an existing validator)
- Let Ansible create them automatically on first run
If the secrets don't exist or are empty, Ansible runs rippled validation_create to generate new keys and populates both secrets.
`secret_name` - Sensitive data (only accessible by the node's EC2 instance):
For nodes (SSL cert/key are auto-generated and added on first run):
```json
{
  "validation_seed": "ssXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "ssl_key": "-----BEGIN PRIVATE KEY-----\n...",
  "ssl_cert": "-----BEGIN CERTIFICATE-----\n..."
}
```

For validators (no SSL - the validator doesn't expose a peer port):

```json
{
  "validation_seed": "ssXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "validator_token": "XXXXXXXXXXXXXXXXXXXXXXXXXXX..."
}
```

The SSL certificate is self-signed with a 10-year validity, using the ssl_subject configuration from Terraform (CN, O, C fields). Once generated, it's stored in the secret and restored on subsequent runs.
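For reference, the equivalent can be reproduced with openssl. A hedged sketch, assuming an ssl_subject with CN/O/C fields; the key size and file names are illustrative and not necessarily what the Ansible role uses:

```bash
# Self-signed certificate valid for ~10 years, subject built from ssl_subject
openssl req -x509 -newkey rsa:2048 -sha256 -nodes -days 3650 \
  -subj "/CN=node1.example.com/O=MyOrg/C=US" \
  -keyout ssl.key -out ssl.crt
```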
`var_secret_name` - Public data (readable by all nodes for cluster config):

```json
{
  "validation_public_key": "n9XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
}
```

If you're migrating an existing validator or want to use specific keys:
```bash
# Create the secret with your existing keys
aws secretsmanager create-secret --region ap-south-1 \
  --name "rippled/myenv/secret/validator" \
  --secret-string '{
    "validation_seed": "ssYourExistingSeed...",
    "validator_token": "eyYourExistingToken..."
  }'

aws secretsmanager create-secret --region ap-south-1 \
  --name "rippled/myenv/var/validator" \
  --secret-string '{
    "validation_public_key": "n9YourExistingPublicKey..."
  }'
```

When Ansible runs, it will detect these existing secrets and use them instead of generating new ones.
rippled stores node identity in wallet.db. Since NVMe storage is ephemeral, systemd services handle backup/restore:
- On boot: Restores wallet.db from S3 before `rippled` starts
- Hourly: Backs up wallet.db to S3
- On stop: Backs up wallet.db before shutdown
Each node can only access its own S3 path (IAM-enforced).
If you are migrating, you can put wallet.db in the predefined S3 location, alongside using pre-created secrets.
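A hedged example of pre-seeding wallet.db; the bucket name and key layout here are assumptions, so check the Terraform outputs and the backup/restore systemd units for the exact path your deployment uses:

```bash
# Upload an existing wallet.db so the restore-on-boot service picks it up.
# Bucket and key are placeholders - verify against your deployment.
aws s3 cp wallet.db "s3://<env>-rippled-wallet-backup/<node-name>/wallet.db"
```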
Ansible fetches public keys from all other nodes' var_secret_name and builds the cluster configuration. All nodes trust each other as peers.
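On a configured node you can inspect the result and confirm cluster peers are recognized. The config path assumes the stock rippled package layout, and the `cluster` flag in the peers output is how rippled marks clustered peers:

```bash
# Inspect the generated cluster stanza (path assumes the stock rippled package)
grep -A 10 '\[cluster_nodes\]' /etc/opt/ripple/rippled.cfg

# Cluster peers should show "cluster": true in the peers output
rippled peers | jq '.result.peers[] | {address, cluster}'
```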
No SSH. Use AWS Systems Manager Session Manager:
```bash
aws ssm start-session --region <region> --target <instance-id>
```

- Add the node to the `nodes` list in Terraform and run `terraform apply`
- Configure the new node: `ansible-playbook playbooks/site.yml -l name_myenv_node_X`
- Update cluster config on all nodes: `ansible-playbook playbooks/site.yml -l env_myenv`
```bash
# Connect to instance
aws ssm start-session --region <region> --target <instance-id>

# Run upgrade
sudo /usr/local/bin/update-rippled-aws

# Verify
rippled server_info | grep build_version
```

For rolling upgrades: upgrade nodes first (wait for full state), then the validator last.
From your private deployment repo's ansible/ directory:
```bash
# Run on all instances in environment
ansible-playbook ../xrpl-validator/ansible/playbooks/site.yml -l env_myenv

# Run on specific instance
ansible-playbook ../xrpl-validator/ansible/playbooks/site.yml -l name_myenv_node_1

# Restart rippled everywhere
ansible env_myenv -m systemd -a "name=rippled state=restarted" --become

# Check server state
ansible env_myenv -m shell -a "rippled server_info | jq .result.info.server_state" --become

# List available hosts
ansible-inventory --graph
```

The module creates:
- CloudWatch Dashboard: `rippled-<env>` with server state, peers, ledger metrics, system metrics
- CloudWatch Alarms: Server state, ledger age, peer count, cluster connectivity, disk/memory/CPU, reboot required
- SNS Topics: `<env>-rippled-alerts` for notifications
Subscribe to the SNS topic for alerts (email, PagerDuty, Discord, etc.).
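Email is the simplest starting point; for PagerDuty or Discord you would typically point an HTTPS endpoint or a small forwarding function at the same topic. The topic ARN below is assembled from the naming pattern above, so confirm it against your Terraform outputs:

```bash
# Subscribe an email address to the cluster's alert topic
# (AWS sends a confirmation email that must be accepted).
aws sns subscribe \
  --region <region> \
  --topic-arn arn:aws:sns:<region>:<account-id>:<env>-rippled-alerts \
  --protocol email \
  --notification-endpoint ops@example.com
```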
The dashboard is created for every environment. Since it is managed by Terraform, please resist the urge to change it directly in the AWS console.
XRPL validators can associate themselves with a domain by serving an xrp-ledger.toml file. This lets network participants verify that a validator is operated by whom it claims to be. See the xrp-ledger.toml specification for full details.
When you set domain on the validator node, Terraform creates:
- S3 bucket for hosting the `xrp-ledger.toml` file
- CloudFront distribution serving the bucket over HTTPS
- ACM certificate for the domain
If you also set hosted_zone_id, Terraform additionally:
- Creates DNS validation records for the ACM certificate
- Creates a Route53 A record pointing the domain to CloudFront
- Add `domain` and optionally `hosted_zone_id` to your validator node:

```hcl
{
  name           = "myenv-validator"
  validator      = true
  domain         = "validator.example.com"
  hosted_zone_id = "Z0123456789ABCDEFGHIJ" # optional
  # ... other fields
}
```

- Run `terraform apply`. If you provided `hosted_zone_id`, wait for the certificate to validate (this happens automatically via DNS).
- Generate the attestation. Copy your validator-keys.json to a secure location and run:

```bash
/opt/ripple/bin/validator-keys set_domain your-domain.com
```

- Create your `xrp-ledger.toml` file. Instructions are at xrpl.org.
- Upload the file to S3:

```bash
aws s3 cp xrp-ledger.toml s3://<env>-xrpl-validator-domain-verification/.well-known/xrp-ledger.toml
```

- Verify it's working:

```bash
curl https://validator.example.com/.well-known/xrp-ledger.toml
```

Then use https://xrpl.org/resources/dev-tools/xrp-ledger-toml-checker and https://xrpl.org/resources/dev-tools/domain-verifier.
The S3 bucket is locked down - only CloudFront can read from it, and uploads require IAM credentials with write permissions.
**Warning (CORS policy):** The S3 bucket where the TOML file is hosted has an `Access-Control-Allow-Origin: *` CORS policy by default. Please do not use this bucket to serve files where this is inappropriate.
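You can confirm the policy from outside by sending a request with an Origin header and checking the response; the origin and domain below are just the examples from above:

```bash
# CloudFront/S3 only return CORS headers when the request carries an Origin
curl -sI -H "Origin: https://example.org" \
  https://validator.example.com/.well-known/xrp-ledger.toml | grep -i access-control
```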
