Add average transfer rate to final progress report and tweak bars (#7)
A bug report from QA indicated that the speed with which the `extract`
operation was writing to S3 was poor: around 35 MiB/s.  That doesn't make
sense, as the same code is used for writing archives in `create` and for
writing extracted objects in `extract`.

I suspect the actual problem is that the bytes-per-second figures reported
by `indicatif` are very primitive: they are instantaneous transfer rates,
computed as the number of bytes added since the last update divided by the
time elapsed since that update.  That's not a meaningful number; what we
really care about is the average transfer rate over time.
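The distinction can be sketched in a few lines of Rust (a minimal illustration of the two calculations, not ssstar's or `indicatif`'s actual code):

```rust
use std::time::Duration;

/// Instantaneous rate: bytes since the previous update divided by the time
/// since that update. Noisy, and misleading when updates arrive unevenly.
fn instantaneous_rate(delta_bytes: u64, delta_time: Duration) -> f64 {
    delta_bytes as f64 / delta_time.as_secs_f64()
}

/// Average rate: total bytes divided by total elapsed time. This is the
/// figure that is meaningful to report when an operation finishes.
fn average_rate(total_bytes: u64, elapsed: Duration) -> f64 {
    total_bytes as f64 / elapsed.as_secs_f64()
}

fn main() {
    // A 1 GiB transfer that took 5 seconds averages 204.8 MiB/s,
    // regardless of how bursty the individual progress updates were.
    let rate = average_rate(1024 * 1024 * 1024, Duration::from_secs(5));
    println!("{:.1} MiB/s", rate / (1024.0 * 1024.0));
}
```

A bursty transfer can show a wildly high or low instantaneous rate at any given moment while the average stays steady, which is why only the average is worth printing in a final report.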

So I've modified the progress callback trait (warning: breaking change),
and the CLI, to report bytes per second at the end of completed
operations.

In my own testing on an `m6a.2xlarge` instance with a 100GB tar archive
containing 50 files, I got the following results:

```text
$ ssstar extract --s3 s3://tar-file98gb/file s3://anelson-ssstar-test/foo/
Extraction complete!  Read 50 objects (97.66 GiB) from archive in 8 minutes (211.27 MiB/s)
Extracted 50 objects (97.66 GiB)
Skipped 0 objects (0B)
Upload complete!  Uploaded 50 objects (97.66 GiB) in 8 minutes (211.22 MiB/s)
```

This is close to the maximum network transfer speed for the
`m6a.2xlarge` instance type.

I've also removed the useless instantaneous bytes-per-second output from
the progress bars, and changed the sizing of the progress lines to make
more room for message text.

Target of Opportunity Changes
===

These aren't directly related to the progress issue, but I made them as
targets of opportunity:

* Correct trivial typo in README
* Improve the doc comments for the Rust crates
* Add some helper scripts to make it easier to test ssstar in AWS
anelson authored Sep 2, 2022
1 parent 1e3619d commit 1506b70
Showing 15 changed files with 434 additions and 44 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/publish-release.yml
Original file line number Diff line number Diff line change
@@ -116,7 +116,7 @@ jobs:
steps:
- name: Checkout repository
uses: actions/checkout@v3
- name: Get ssstar version
- name: Build docker image
shell: bash
run: |
# Tag the docker image with the full semver, major.minor, and major
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -6,6 +6,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Breaking Changes

* Report average bytes per second rates at the end of each stage of operation. This modifies the signature of the
progress callback traits.

## 0.1.3 - 31-Aug-2022

First release published via GitHub Actions. Functionally identical to 0.1.0.
2 changes: 1 addition & 1 deletion README.md
@@ -143,7 +143,7 @@ and extraction, respectively. There are a few command line options that are par

`ssstar` is developed and tested against AWS S3, however it should work with any object storage system that provides an
S3-compatible API. In particular, most of the automated tests our CI system runs actually use [Minio](https://min.io)
and not the real S3 API. To use `ssstar` with an S3-compatible API, use the `--s3_endpoint` option. For example, if
and not the real S3 API. To use `ssstar` with an S3-compatible API, use the `--s3-endpoint` option. For example, if
you have a Minio server running at `127.0.7.1:30000`, using default `minioadmin` credentials, you can use it with
`ssstar` like this:

110 changes: 110 additions & 0 deletions scripts/launch-instance.sh
@@ -0,0 +1,110 @@
#!/bin/bash
#
# Launches an EC2 instance for testing `ssstar`
set -euo pipefail

scripts_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
default_instance_type="m5.2xlarge"
instance_type="$default_instance_type"
key_name=""
instance_name="test instance for $(whoami)"
instance_market_options="--instance-market-options MarketType=spot"
security_group="launch-instance.sh-sg"

while getopts ":hk:i:n:g:o" opt; do
case ${opt} in
h )
echo "Usage: $0 [options] [-k <key_name>] [-i <instance_type>] [-n <instance name>] [ -o ]"
echo ""
echo "Options:"
echo " -k <key_name> - Use a specified key name (Required)"
echo " -i <instance_type> - Use a specified instance type instead of the default $instance_type"
echo " -n <instance_name> - Give this instance a name to help you identify it in instance lists (prefix: $instance_name)"
echo " -g <security group> - Use this security group (default: $security_group)"
echo " -o - Create an on-demand instance instead of the default spot instance"
exit 0
;;
k )
key_name=$OPTARG
;;

i )
instance_type=$OPTARG
;;

n )
instance_name="$instance_name:$OPTARG"
;;

o)
instance_market_options=""
;;

\? )
echo "Invalid option: $OPTARG" 1>&2
exit 1
;;

: )
echo "Invalid option: $OPTARG requires an argument" 1>&2
exit 1
;;
esac
done

if [[ -z "$key_name" ]]; then
echo "Error: -k <key_name> is required"
exit 1
fi

ami=$(aws ssm get-parameters --names /aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-gp2 --output json | jq ".Parameters[].Value" -r)
. $(dirname "${0}")/utils.sh

# Create the security group if it doesn't exist
if aws ec2 describe-security-groups --group-names "$security_group" > /dev/null 2>&1; then
echo "Security group $security_group already exists; no need to create"
else
echo "Creating security group $security_group"

security_group_id=$(aws ec2 create-security-group \
--description "Auto-generated security group produced by ${0}" \
--group-name "$security_group" \
--vpc-id $(aws ec2 describe-vpcs --filters "Name=isDefault,Values=true" --output json | jq -r ".Vpcs[].VpcId") \
| jq -r ".GroupId")

aws ec2 authorize-security-group-ingress --group-id "$security_group_id" --protocol tcp --port 22 --cidr 0.0.0.0/0
fi

# `envsubst` will expand placeholders in the YAML file with the values of the below env vars

# Build the cloud-init script including some env vars
user_data=$(envsubst < $scripts_dir/s0-bootstrap.yml)

echo "Launching \"$instance_name\" (instance type $instance_type) with AMI $ami"

instance_json=$(aws ec2 run-instances \
--image-id "$ami" \
--instance-type "$instance_type" \
$instance_market_options \
--key-name "$key_name" \
--ebs-optimized \
--block-device-mappings "DeviceName=/dev/xvda,Ebs={VolumeSize=40,VolumeType=gp3}" \
--user-data "$user_data" \
--security-groups "$security_group" \
--tag-specifications \
"ResourceType=instance,Tags=[{Key=Name,Value=$instance_name}]" \
--output json)

instance_id=$(echo $instance_json | jq ".Instances[].InstanceId" -r)
echo "Launched EC2 instance id $instance_id"

echo "Querying instance info for public DNS..."
instance_info=$(aws ec2 describe-instances --instance-ids $instance_id --output json)
#echo $instance_info | jq "."
instance_dns=$(echo $instance_info | jq ".Reservations[].Instances[].PublicDnsName" -r)
echo "SSH into the instance with ssh ec2-user@$instance_dns using key $key_name"



# NEXT STEP
# Another script that copies the `elastio` source tree up to the spawned EC2 instance for convenient building
69 changes: 69 additions & 0 deletions scripts/rsync-dir.sh
@@ -0,0 +1,69 @@
#!/bin/bash
#
# Rsync a directory (by default the `ssstar` project in its entirety) to a remote host.
#
# It's assumed the remote host has SSH and rsync installed, and is an Amazon Linux 2 EC2 instance.
set -euo pipefail

scripts_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
src_dir="$(realpath $scripts_dir/../)"
dest_username="ec2-user"
dest_host=""

usage() {
echo "Usage: $0 [-s <source_dir>] [-u <destination_username>] -d <destination_hostname>"
echo "Default source dir: $src_dir"
echo "Default destination username: $dest_username"
}

while getopts ":hs:u:d:" opt; do
case ${opt} in
h )
usage
exit 0
;;
s )
src_dir=$OPTARG
;;

u )
dest_username=$OPTARG
;;

d )
dest_host=$OPTARG
;;

\? )
echo "Invalid option: $OPTARG" 1>&2
usage
exit 1
;;

: )
echo "Invalid option: $OPTARG requires an argument" 1>&2
usage
exit 1
;;
esac
done

if [[ -z $dest_host ]]; then
echo "A destination hostname is required" 1>&2
usage
exit 1
fi

# By default, use the name of the source dir (without its entire path), and copy it into the
# home directory on the destination host
src_dir_path=$(realpath $src_dir)
src_dir_name=$(basename $src_dir_path)
dest_path="/home/$dest_username"

echo "Copying $src_dir_path to $dest_path on $dest_host"

# Use rsync to copy changed files, excluding anything that's ignored by `.gitignore`
rsync --info=progress2 -azzhe ssh \
--filter=":- .gitignore" \
$src_dir_path \
$dest_username@$dest_host:$dest_path
79 changes: 79 additions & 0 deletions scripts/s0-bootstrap.yml
@@ -0,0 +1,79 @@
#cloud-config
#
# Bootstrap an `s0` server on Amazon Linux 2.
#
# This script is intended to run as part of cloud-init.

# Very early in the boot process, enable the EPEL repo
bootcmd:
- [ "cloud-init-per", "once", "amazon-linux-extras-epel", "amazon-linux-extras", "install", "epel" ]

# Always pull in the latest updates
repo_update: true
repo_upgrade: all

packages:
- python3-pip
- clang
- llvm-devel
- libudev-devel
- openssl-devel
- jq
- daemonize
- libblkid-devel
- parted-devel

write_files:
# Add environment vars to enable metrics push by default
# Configure the AWS region appropriately
- path: /etc/profile
append: true
content: |
# These entries appended by the cloudinit script in s0-bootstrap.yml
if command -v jq > /dev/null; then
# The default region should be the region this instance runs in (duh!)
export AWS_DEFAULT_REGION=$(curl --silent http://169.254.169.254/latest/dynamic/instance-identity/document | jq ".region" -r)
else
echo "WARNING: jq isn't installed yet. You probably shelled in too early. Give the instance a few more seconds and log in again"
fi
# Can't always rely on the .cargo/config populated below to force native builds.
export RUSTFLAGS=-Ctarget-cpu=native
source ~/.cargo/env
# Tell cargo to always use all of the native CPU features
# Since we don't use these instances to build dist binaries, we don't care about making binaries that are compatible with all CPUs
- path: /.cargo/config
content: |
[target.'cfg(any(windows, unix))']
rustflags = ["-Ctarget-cpu=native"]
- path: /etc/security/limits.d/99-elastio.conf
content: |
ec2-user soft nofile 20000
ec2-user hard nofile 100000
root soft nofile 20000
root hard nofile 100000
runcmd:
# Need development tools to support Rust
- yum groupinstall -y "Development Tools"

# The EPEL repo has some additional packages we need
- yum install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

# viy depends on some libraries for file systems
- yum install -y e2fsprogs-devel xfsprogs-devel

# until https://github.com/elastio/elastio-snap/issues/55 is fixed, need the kernel headers
- yum install -y kernel-devel

# Need the AWS CLI and ansible
- pip3 install awscli ansible boto3

# Install rust for the ec2-user
- su ec2-user --session-command "curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y"


5 changes: 5 additions & 0 deletions scripts/utils.sh
@@ -0,0 +1,5 @@
function get_ssm_secret {
local name="$1"

aws ssm get-parameter --name "$name" --with-decryption --output json | jq ".Parameter.Value" -r
}
2 changes: 2 additions & 0 deletions ssstar-cli/src/main.rs
@@ -1,3 +1,5 @@
#![doc = include_str!("../../README.md")]

use clap::{ArgGroup, Parser, Subcommand};
use ssstar::{CreateArchiveJobBuilder, ExtractArchiveJobBuilder, SourceArchive, TargetArchive};
use std::path::PathBuf;