A fault-tolerant way to replicate an entire CouchDB cluster
$ npm install -g replicate-couchdb-cluster
Usage: replicate-couchdb-cluster -s source -t target [options]
-s source The URL for the CouchDB cluster from which we will be
replicating
-t target The URL for the CouchDB cluster to which we will be
replicating
Options:
-c max-concurrency The maximum number of concurrent replications. Defaults
                   to 20 if this value is omitted.
-i dbs-to-skip A comma-separated list of DBs to skip
-a Use the target's _replicate API when replicating. This is
particularly useful when you are trying to replicate from
a remote source to localhost. By default, the source's
_replicate API is used.
-v Verbose
-d Debug info such as details of the requests and responses.
Useful for determining why long replications are failing.
Examples:
Replicate all DBs on example1.com to example2.com:
$ replicate-couchdb-cluster -s http://example1.com:5984 -t http://example2.com:5984
Replicate all DBs, except the _users and _replicator DBs:
$ replicate-couchdb-cluster -s http://example1.com:5984 -t http://example2.com:5984 \
  -i _users,_replicator
Replicate all DBs using SSL and authentication:
$ replicate-couchdb-cluster -s https://admin1:[email protected]:6984 \
-t https://admin2:[email protected]:6984
Replicate all DBs from a remote source to a local target:
$ replicate-couchdb-cluster -s https://admin1:[email protected] \
  -t http://localhost:5984 \
  -a
This can be useful for scheduling a recurring backup.
Run the replication in the foreground:
$ docker run -it \
-e SOURCE="https://admin1:[email protected]:6984" \
-e TARGET="https://admin2:[email protected]:6984" \
redgeoff/replicate-couchdb-cluster
Replicate every hour in the background. This will persist through server reboots:
$ docker run -d --name replicate-couchdb-cluster \
--restart always \
-e SOURCE="https://admin1:[email protected]:6984" \
-e TARGET="https://admin2:[email protected]:6984" \
-e RUN_EVERY_SECS=3600 \
-e VERBOSE=true \
redgeoff/replicate-couchdb-cluster
Notes:
- If the replication takes longer than RUN_EVERY_SECS, the replications will run
  back to back. Use RUN_EVERY_SECS=0 if you want the replication to run
  continuously.
- You can view the output at
  /var/lib/docker/containers/<container id>/<container id>-json.log
Replicate every day at 23:00 UTC (11 PM). This will persist through server reboots:
$ docker run -d --name replicate-couchdb-cluster \
--restart always \
-e SOURCE="https://admin1:[email protected]:6984" \
-e TARGET="https://admin2:[email protected]:6984" \
-e RUN_AT="23:00" \
-e VERBOSE=true \
redgeoff/replicate-couchdb-cluster
All options:
$ docker run -d --name replicate-couchdb-cluster \
  --restart always \
-e SOURCE="https://admin1:[email protected]:6984" \
-e TARGET="https://admin2:[email protected]:6984" \
-e RUN_AT="HH:MM" \
-e RUN_EVERY_SECS=3600 \
-e CONCURRENCY=10 \
-e SKIP="_users,_replicator" \
-e USE_TARGET_API=1 \
-e VERBOSE=true \
-e DEBUG=true \
redgeoff/replicate-couchdb-cluster
Note: The RUN_AT and RUN_EVERY_SECS options should not be used together; if both are set, RUN_AT takes precedence over RUN_EVERY_SECS.
You can also use the API.
Example:
var replicate = require('replicate-couchdb-cluster');
replicate({
  source: 'https://admin1:[email protected]:6984',
  target: 'https://admin2:[email protected]:6984',
concurrency: 10,
skip: ['_users', '_replicator'],
verbose: true,
useTargetAPI: true,
debug: true
}).then(function () {
// Replication done
});
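Because replicate() returns a promise, failures can be handled with a standard catch handler, and the call can be wrapped in your own scheduler if you prefer not to use the Docker image. The following is only a rough sketch: the one-hour interval, the runBackup wrapper and the logging are illustrative assumptions, not part of the package.

var replicate = require('replicate-couchdb-cluster');

// Assumed schedule for illustration; mirrors RUN_EVERY_SECS=3600
var ONE_HOUR_MS = 60 * 60 * 1000;

function runBackup() {
  return replicate({
    source: 'https://admin1:[email protected]:6984',
    target: 'https://admin2:[email protected]:6984',
    skip: ['_users', '_replicator'],
    verbose: true
  }).then(function () {
    console.log('Replication done');
  }).catch(function (err) {
    // Log and carry on; the next scheduled run will try again
    console.error('Replication failed:', err);
  });
}

runBackup();
setInterval(runBackup, ONE_HOUR_MS);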
You may be wondering why you would need such a tool when CouchDB 2 automatically replicates data between nodes:
- We feel it is much safer to keep a separate backup in case something happens to our data, e.g. accidental deletion, a hacking attempt, etc.
- Sometimes you want to replicate a cluster to a different region of the world. (The built-in clustering in CouchDB 2 isn't designed to span regions.)
- In rare cases, we have found that CouchDB corrupts database files, and we've had to restore data from our backups.