NII-cloud-operation/CoursewareHub-LC_platform

Clone repository:

$ git clone git@github.com:axsh/jupyter-platform-dev.git
$ cd jupyter-platform-dev

Generic Build

For a generic build, the build directory can be created either before or after the instance servers. For the explanation here, let's start with the build directory.

$ nodecount=3 ./ind-steps/build-jh-environment/toplevel-generic-build.sh-new /some/directory/path/buildname
## (The value of the nodecount environment variable can be 1, 2, or any other reasonable integer.)

The toplevel-generic-build.sh-new script creates a folder structure with the following files and contents:

$ head $(find /some/directory/path/buildname -name datadir.conf)
==> /some/directory/path/buildname/jhvmdir-node3/datadir.conf <==
VMIP=192.168.999.999  # replace with the private IP used between instances
publicip=180.123.999.999 # replace with IP used by this script
publicport=22   # if needed, replace with the port used by this script

==> /some/directory/path/buildname/jhvmdir-node2/datadir.conf <==
VMIP=192.168.999.999  # replace with the private IP used between instances
publicip=180.123.999.999 # replace with IP used by this script
publicport=22   # if needed, replace with the port used by this script

==> /some/directory/path/buildname/jhvmdir-node1/datadir.conf <==
VMIP=192.168.999.999  # replace with the private IP used between instances
publicip=180.123.999.999 # replace with IP used by this script
publicport=22   # if needed, replace with the port used by this script

==> /some/directory/path/buildname/jhvmdir/datadir.conf <==
VMIP=192.168.999.999  # replace with the private IP used between instances
publicip=180.123.999.999 # replace with IP used by this script
publicport=22   # if needed, replace with the port used by this script

==> /some/directory/path/buildname/jhvmdir-hub/datadir.conf <==
VMIP=192.168.999.999  # replace with the private IP used between instances
publicip=180.123.999.999 # replace with IP used by this script
publicport=22   # if needed, replace with the port used by this script

==> /some/directory/path/buildname/datadir.conf <==
node_list="node1 node2 node3"

Each jhvmdir*/datadir.conf file should be edited to contain the information for one instance. (In this example there are five: three docker swarm instances, a hub instance, and an instance for ansible.)

The publicip variable should be set to an IP address that the machine hosting the build directory can use to ssh to the corresponding instance. If port forwarding is used to reach the instance, the publicport variable should be set to the forwarded ssh port.

VMIP should be a private IP address visible to all the other instances. SSH to port 22 must be possible on this address.
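As an illustration, an edited file for one instance might end up looking like this (all addresses and the port here are hypothetical):

$ cat /some/directory/path/buildname/jhvmdir-hub/datadir.conf
VMIP=192.168.11.102       # private IP shared among the instances
publicip=203.0.113.45     # address used to ssh from the build-hosting machine
publicport=2222           # forwarded ssh port, if port forwarding is in use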

If necessary, the ssh wrapper scripts for each instance can also be modified directly, though normally this is only needed when special ssh parameters or workarounds are required. (A hypothetical sketch of such a wrapper follows the listing below.)

$ cd /some/directory/path/buildname
$ find -name ssh-shortcut.sh
jhvmdir/ssh-shortcut.sh
jhvmdir-hub/ssh-shortcut.sh
jhvmdir-node3/ssh-shortcut.sh
jhvmdir-node2/ssh-shortcut.sh
jhvmdir-node1/ssh-shortcut.sh
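Conceptually, each wrapper just invokes ssh with the instance's key, address, and port. A minimal sketch, assuming the values come from the neighboring datadir.conf (the generated scripts may differ in detail):

#!/bin/bash
# Hypothetical sketch only -- not the exact generated contents.
thisdir="$(cd "$(dirname "$0")" && pwd)"
source "$thisdir/datadir.conf"   # provides publicip and publicport
exec ssh -i "$thisdir/../sshkey" -p "$publicport" \
    -o StrictHostKeyChecking=no \
    "ubuntu@$publicip" "$@"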

Each instance should be a fresh install of Ubuntu 14.04 with an account named "ubuntu". Each instance should also have the same public ssh key saved in /home/ubuntu/.ssh/authorized_keys and /root/.ssh/authorized_keys. The corresponding private key should be saved in the build directory in a file named sshkey. Finally, apt-get update and apt-get upgrade should be run on each instance.
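For example, the preparation of each instance could look like this (mykey/mykey.pub are hypothetical names for whatever key pair is actually used):

# On each fresh Ubuntu 14.04 instance, as root:
$ mkdir -p /home/ubuntu/.ssh /root/.ssh
$ cat mykey.pub >> /home/ubuntu/.ssh/authorized_keys
$ cat mykey.pub >> /root/.ssh/authorized_keys
$ apt-get update && apt-get upgrade -y

# On the machine hosting the build directory:
$ cp mykey /some/directory/path/buildname/sshkey
$ chmod 600 /some/directory/path/buildname/sshkey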

Once all the instances exist and all the information has been edited into the datadir.conf files, the following command installs JupyterHub. It takes somewhat more than 60 minutes:

$ /some/directory/path/buildname/toplevel-generic-build.sh do

The build can be checked by running:

$ /some/directory/path/buildname/toplevel-generic-build.sh check

Build on KVM

The directory ~/ubuntu-image-resources must exist and contain the following files:

$ cd ~/
$ cd ubuntu-image-resources/
$ ls -l
total 580760
-rw-r--r-- 8 k-oyakata k-oyakata 594675764 Dec  6 23:16 ubuntu-14-instance-build.img-sshkeys-update-upgrade.tar.gz
-rw-r--r-- 4 k-oyakata k-oyakata      1675 Jul 15  2016 ubuntu-14-instance-build.img-sshkeys-update-upgrade.sshkey
-rw-r--r-- 4 k-oyakata k-oyakata         7 Jul 15  2016 ubuntu-14-instance-build.img-sshkeys-update-upgrade.sshuser

The *.tar.gz file contains Ubuntu 14.04.1 LTS with a 242GB root file system. It was made by doing a fresh install from an ISO, then apt-get update, then apt-get upgrade. Finally, a public key was placed in both /home/ubuntu/.ssh/authorized_keys and /root/.ssh/authorized_keys. The private part of the key pair is in the *.sshkey file. The *.sshuser file just contains the string "ubuntu", because that is the user name to use when sshing to a VM booted from the image.
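If an equivalent image is ever prepared by hand, the two companion files can be created along these lines (the private key path is hypothetical):

$ cd ~/ubuntu-image-resources
$ cp /path/to/private/key ubuntu-14-instance-build.img-sshkeys-update-upgrade.sshkey
$ echo ubuntu > ubuntu-14-instance-build.img-sshkeys-update-upgrade.sshuser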

The next step is to make a build directory by using the toplevel-kvm-build.sh-new script in the repository, like this:

$ ./ind-steps/build-jh-environment/toplevel-kvm-build.sh-new /some/directory/path/buildname

Be sure to substitute /some/directory/path with a path on a disk that has 60GB or so of free space.

The above step quickly creates a new directory tree that includes this structure:

$ cd /some/directory/path/buildname
$ find -name datadir.conf
./datadir.conf
./jhvmdir/datadir.conf
./jhvmdir-hub/datadir.conf
./jhvmdir-node1/datadir.conf
./jhvmdir-node2/datadir.conf

Each jhvmdir* directory represents one of the four VMs for the build, and its datadir.conf provides configuration information used during building. Additional information generated by the build is appended to the appropriate datadir.conf.

The actual build is done by running a script that is now inside the build directory:

$ /some/directory/path/buildname/toplevel-kvm-build.sh do

The whole build takes about 60 to 90 minutes.

The following command can be used to verify which steps of the build have completed. (It is the same command as above, with do changed to check.)

$ /some/directory/path/buildname/toplevel-kvm-build.sh check

The above command will output a list of steps similar to this: https://github.com/axsh/jupyter-platform-dev/blob/master/ind-steps/build-jh-environment/toplevel-kvm-build-map.md

The build defaults to 2 docker swarm nodes. This can be changed with the nodecount environment variable.

$ nodecount=3 ./ind-steps/build-jh-environment/toplevel-kvm-build.sh-new /some/directory/path/buildname

Build on AWS

Install awscli (http://docs.aws.amazon.com/cli/latest/userguide/installing.html) and make sure ~/.aws/config and ~/.aws/credentials are set up correctly, for example:
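A minimal sketch, with placeholder region and keys:

$ cat ~/.aws/config
[default]
region = ap-northeast-1
output = json

$ cat ~/.aws/credentials
[default]
aws_access_key_id = AKIA...
aws_secret_access_key = ...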

Then:

$ ./ind-steps/build-jh-environment/toplevel-aws-build.sh-new /path/to/just/a/little/disk/buildname

$ /path/to/just/a/little/disk/buildname/toplevel-aws-build.sh check
$ /path/to/just/a/little/disk/buildname/toplevel-aws-build.sh do

(Some waits still need to be implemented, so it may be necessary to run "toplevel-aws-build.sh do" several times.)
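Until those waits are implemented, one workaround (a sketch, not part of the scripts) is simply to rerun "do" a few times and then confirm with "check":

$ for i in 1 2 3; do
      /path/to/just/a/little/disk/buildname/toplevel-aws-build.sh do
  done
$ /path/to/just/a/little/disk/buildname/toplevel-aws-build.sh check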

Final Setup

Building takes a long time, so two mechanisms were created to speed up development: snapshots and jhvmdir reuse.

Snapshots:

(Snapshots only work for KVM builds, and have not been implemented for AWS builds yet.)

The snapshot-whole-environment.sh script shuts down all VMs and then makes a tar file of each VM directory. For example:

$ ./ind-steps/build-jh-environment/snapshot-whole-environment.sh build-feb15/ guest do

The "guest do" part is fixed and required, for reasons that will be explained elsewhere.

Now build-feb15/ contains a collection of tar files that can be used to create new JupyterHub environments quickly. This is done in two steps:

$ ./ind-steps/build-jh-environment/restore-environment-from-snapshot.sh-new build-feb15 build-feb15-copy1
$ ./ind-steps/build-jh-environment/restore-environment-from-snapshot.sh build-feb15-copy1 do

Now all VMs and containers are running, but more setup is probably necessary before the JupyterHub environment can be used.

JHVMDIR Reuse

A build environment is made up of one "hub" VM, one or more "node" VMs, and one extra "jhvmdir" VM. (The latter is sometimes called the "main" VM or the "ansible" VM in code comments and documentation.) Its purpose is to give ansible a stable place to run, and also to cache docker images after they have been built.

A new feature allows build environments to share a "jhvmdir" VM. This feature works with both KVM builds and AWS builds.

For discussion, assume an AWS environment has been built following the instructions above, resulting in a build directory at "/path/to/just/a/little/disk/buildname" that points to "jhvmdir", "hub", and "node" VMs on AWS. To reuse the "jhvmdir" for another build, set the environment variable $mainbuilddir to the existing build directory. For example:

$ mainbuilddir="/path/to/just/a/little/disk/buildname" \
    ./ind-steps/build-jh-environment/toplevel-aws-build.sh-new /path/to/just/a/little/disk/buildname2

$ /path/to/just/a/little/disk/buildname2/toplevel-aws-build.sh check
$ /path/to/just/a/little/disk/buildname2/toplevel-aws-build.sh do

After running the "check" command, you should see that many build steps have already been done, which makes building much faster. For example, a recent build from scratch on AWS took about 50 minutes, but building a second JupyterHub environment by reusing the "jhvmdir" VM reduced the build time to less than 25 minutes.

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
