Bash scripts and Dockerfiles for building and deploying Chipster's REST API backend on OpenShift Origin.
Let's assume we are restoring the session-db backup.
- Scale down the session-db and backup deployments to 0, for example as sketched below.
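A minimal sketch with `oc`, assuming the DeploymentConfigs are named session-db and backup (the `oc rsh dc/backup` command later in this section suggests the latter):

```
oc scale dc/session-db --replicas=0
oc scale dc/backup --replicas=0
```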
- Remove the old database. In the terminal of the session-db-postgres pod, run

```
dropdb session_db_db
```

(If postgres, and thus the pod, refuses to start, use the debug terminal and remove the postgres data folder instead.)
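A rough command-line equivalent of that fallback, assuming `oc debug` attaches the same volumes as the original pod and that the data directory is the one referenced by `pg_ctl` below:

```
oc debug dc/session-db-postgres
# inside the debug pod: remove the old data folder (this destroys the old database)
rm -rf /var/lib/pgsql/data/userdata
exit
```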
- Create a new database in the terminal of the session-db-postgres pod

```
createdb session_db_db
psql session_db_db
alter system set synchronous_commit to off;
# press Ctrl+D to quit psql
pg_ctl reload -D /var/lib/pgsql/data/userdata
```
- Scale the backup deployment to 1, then run the following on your laptop.
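For the scale-up itself, the same `oc scale` pattern as above should work (again assuming the DeploymentConfig is named backup):

```
oc scale dc/backup --replicas=1
```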
```
oc rsh dc/backup
export PGPASSWORD="$(cat /opt/chipster-web-server/conf/chipster.yaml | grep db-pass-session-db | cut -d " " -f 2)"
PGURL="$(cat /opt/chipster-web-server/conf/chipster.yaml | grep db-url-session-db | cut -d " " -f 2 | sed s/jdbc://)"
pushd db-backups
# set to the name of the dump file to restore
BACKUP_FILE=""
# check the uncompressed file size
cat $BACKUP_FILE | lz4 -d | pv > /dev/null
cat $BACKUP_FILE | lz4 -d | pv | psql --dbname $PGURL --username user > ../logs/session-db-restore.log
```
Running `psql` to restore an SQL dump processes the dump file at about 70 kB/s when it's creating large objects and about 800 kB/s when it's creating blobs. At that speed it takes several hours to restore a database of a few gigabytes. `pg_restore` would support parallel restore, but it works only for the custom and directory formats, whereas we have an SQL dump.

It seems to be possible to speed this up by splitting the dump and creating the large objects and blobs in parallel. At the moment this produced log warnings about files extending beyond their EOF mark, so let's try again later when the postgres data is no longer stored on GlusterFS.
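For reference, a parallel restore with `pg_restore` would look roughly like the sketch below, but it requires a dump taken in custom or directory format in the first place (the file name here is hypothetical):

```
# only works for -Fc or -Fd dumps, not for a plain SQL dump
pg_restore --jobs=10 --dbname=$PGURL --username user session-db.dump
```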
```
# here all.sql is the decompressed dump, e.g. cat $BACKUP_FILE | lz4 -d > all.sql

# calculate row numbers of each section
lo_start="$(cat all.sql | grep "SELECT pg_catalog.lo_create" -n | head -n 1 | cut -d ":" -f 1)"
table_start="$(cat all.sql | grep "Data for Name: " -n | head -n 1 | cut -d ":" -f 1)"
blobs_start="$(cat all.sql | grep "Data for Name: BLOBS" -n | cut -d ":" -f 1)"
commit_end="$(cat all.sql | grep "COMMIT;" -n | cut -d ":" -f 1)"
eof="$(cat all.sql | wc -l)"

# split the file sections to separate files
# these numbers happened to work for one file, calculate more specific row numbers above if necessary
sed -n "1,$(($lo_start - 5))p" all.sql > start.sql
sed -n "$(($lo_start - 4)),$(($table_start - 2))p" all.sql > lo.sql
sed -n "$(($table_start - 1)),$(($blobs_start - 2))p" all.sql > tables.sql
sed -n "$(($blobs_start - 1)),$(($blobs_start + 4))p" all.sql > blobs_begin.sql
sed -n "$(($blobs_start + 5)),$(($commit_end - 1))p" all.sql > blobs.sql
sed -n "$(($commit_end - 0)),$(($commit_end + 1))p" all.sql > blobs_commit.sql
sed -n "$(($commit_end + 2)),$(($eof + 1))p" all.sql > end.sql
```
```
# split the large object and blob files to smaller pieces
# each record is 9 rows
mkdir -p lo
pushd lo
split -l $((9 * 10000)) ../lo.sql lo_
popd

# there is a varying number of rows in each record, split from the empty lines
# collect line numbers of empty lines first
mkdir -p blobs
pushd blobs
cat ../blobs.sql | grep -n "^$" | cut -d ":" -f 1 > empty-lines
# then split the line numbers to suitably sized pieces
split -l $((4 * 10000)) empty-lines blobs_
# finally, split the sql based on the last line number in each file
start=1
for f in $(ls blobs_*); do
  end="$(tail -n 1 $f)"
  echo $f $start $end
  sed -n "$(($start)),$(($end))p" ../blobs.sql > sql_${f}
  cat ../blobs_begin.sql sql_${f} ../blobs_commit.sql > trans_${f}
  rm sql_${f} ${f}
  start=$(($end + 1))
done
popd
```
```
# restore
time cat start.sql | pv | psql --dbname $PGURL --username user
time ls lo/* | parallel -j 10 "cat {} | psql --dbname $PGURL --username user"
time cat tables.sql | pv | psql --dbname $PGURL --username user
time ls blobs/trans_blobs_* | parallel -j 10 "cat {} | psql --dbname $PGURL --username user"
time cat end.sql | pv | psql --dbname $PGURL --username user
```
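As a quick sanity check after the restore, the number of restored large objects can be compared with the number of lo_create calls in the dump. A sketch, assuming the standard `pg_largeobject_metadata` catalog:

```
# count large objects in the restored database
psql --dbname $PGURL --username user -c "SELECT count(*) FROM pg_largeobject_metadata;"
# count lo_create calls in the dump for comparison
grep -c "SELECT pg_catalog.lo_create" all.sql
```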