This is a JupyterHub configuration and an extension of the original spec with a few extras:
- Users can choose how many resources they want
- Metrics and suggested tool requirements are displayed to users on the spawn page
Before we expose a powerful machine to the internet and unleash it on a few dozen users, we need to set some boundaries.
First we need to enable Docker and make a note of the network it creates, since this information is needed in the Jupyter configuration file.
Check the current addresses:
> ip a
1: lo:....
.....
inet 127.0.0.1/8
...
2: eth0: ....
....
inet blahblahblah/24 ....
....
3: eth1: ...
....
inet blahblahblah/24 ....
...
We don’t see any network device starting with “docker”, so we need to start Docker:
> sudo systemctl start docker
> ip a
1: lo ...
2: eth0 ...
3: eth1 ...
4: docker0 ...
...
inet 172.17.0.1/16
This is what we want to see: the inet address of the docker0 device, 172.17.0.1. Remember this address, we will use it later.
UFW (Uncomplicated Firewall) is a great firewall for blocking unwanted connections. The Pharma2-53 machine is already behind a firewall and does not accept connections from outside the facility.
You can verify this by looking at the output of
> sudo ufw status
Status: active
To Action From
-- ------ ----
137/udp ALLOW blahblahblah/24
138/udp ALLOW blahblahblah/24
139/tcp ALLOW blahblahblah/24
445/tcp ALLOW blahblahblah/24
137/udp ALLOW blahblahblah/24
138/udp ALLOW blahblahblah/24
138/tcp ALLOW blahblahblah/24
445/tcp ALLOW blahblahblah/24
22/tcp ALLOW blahblahblah/24
22/tcp ALLOW blahblahblah/24
22/tcp ALLOW blahblahblah/16
At this point, any Docker containers that we create will be blocked by the system, so we need to add a new allow rule.
(Note: the machine you are using to SSH into the Pharma2-53 device should share the prefix of one of the From addresses in the list above, otherwise you will lose SSH access in the next step.)
> sudo ufw allow from 172.17.0.0/16
> sudo ufw status
...
...
Anywhere ALLOW 172.17.0.0/16
Now we see that we have added an allow exception rule for the docker0 device.
Jupyter allows authenticated users to execute commands by offering them a terminal they can use. For this reason, it does not make sense to offer all users the ability to SSH into the machine, since they could wreak havoc on the services there.
We adjust the allowed SSH users by modifying /etc/ssh/sshd_config and changing the AllowUsers line to:
AllowUsers user1 user2 user3
where these correspond to trusted admin usernames on the system.
We then restart SSH to apply these changes:
sudo systemctl restart ssh
With the system configured for Docker and security, we can proceed with the Jupyter installation.
The installation comes in two parts:
- Installing the modified Jupyter base installation
The original jupyterhub does not freely offer metrics on a per-user basis, so I forked their repository and implemented it myself.
That is, we are not using vanilla JupyterHub, but JupyterHub+Metrics.
- Installing a custom Docker Spawner
A spawner is what Jupyter uses to create kernels (essentially notebooks) for each user. There are many different types, but the one we are interested in is the SystemUserSpawner, a type of DockerSpawner (a spawner that creates Docker containers instead of running everything as a single process on the machine).
Unfortunately, the SystemUserSpawner restricts all kernels equally, meaning that all users get the same resource limits. This is fine if users all have the same demands, but typically they don’t.
Fortunately, one can extend SystemUserSpawner into a custom class I wrote called
DockerSystemProfileSpawner
which allows per-user customization, and we will go into detail about how to configure it later.
First things first, we back up any existing Jupyter installation. On the Pharma2-53 machine, this involves stopping the existing JupyterHub service and moving any config files to a backup location:
sudo systemctl stop jupyterhub
sudo mkdir /opt/__<date>_jupyter_backup
sudo mv /etc/systemd/system/jupyterhub.service /opt/__<date>_jupyter_backup/
sudo mv /opt/jupyterhub/* /opt/__<date>_jupyter_backup/
The JupyterHub that we will be installing is based on version 5.0.0.dev, which is pretty new as of 2024-03-13.
It needs up-to-date Node and Python libraries, which are no problem for bleeding-edge operating systems like Arch Linux, but are a problem for more stable distributions like Ubuntu.
We upgrade Node on Ubuntu via:
sudo apt-get update && sudo apt-get install -y ca-certificates curl gnupg
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://deb.nodesource.com/gpgkey/nodesource-repo.gpg.key | sudo gpg --dearmor -o /etc/apt/keyrings/nodesource.gpg
export NODE_MAJOR=21
echo "deb [signed-by=/etc/apt/keyrings/nodesource.gpg] https://deb.nodesource.com/node_$NODE_MAJOR.x nodistro main" | sudo tee /etc/apt/sources.list.d/nodesource.list
sudo apt-get update && sudo apt-get install nodejs -y
Verify that we are on version 21 via
node --version
The version of JupyterHub we’re using relies on a fairly modern Python. To avoid any discrepancies between the system Python and the Jupyter Python, we will build our own Python, separate from the system.
export MYPYVER=3.11.8
export INSTALLHERE=/opt/jupyterhub/python-${MYPYVER} ## must be an absolute path
## Get and unpack python sources
cd /opt/jupyterhub
wget https://www.python.org/ftp/python/${MYPYVER}/Python-${MYPYVER}.tgz
tar -zxvf Python-${MYPYVER}.tgz
## specify installation directory
mkdir ${INSTALLHERE}
cd Python-${MYPYVER}
CXX=$(command -v g++) ./configure --prefix=${INSTALLHERE} --enable-optimizations --enable-loadable-sqlite-extensions
make
make install
## Remove unneeded source files
rm -rf /opt/jupyterhub/Python-${MYPYVER}.tgz /opt/jupyterhub/Python-${MYPYVER}
At this point we have one directory:
> tree /opt/jupyterhub
/opt/jupyterhub/
└─ python-3.11.8
We need to prepare the other directories now, the custom Jupyter install, and the custom DockerSpawner.
Let’s clone the needed repos.
- Clone this repo (the custom spawner):
cd /opt/jupyterhub
git clone https://gitlab.com/mtekman/jupyterhub-pharma253
- Clone the metrics-enabled JupyterHub. We do a shallow clone and use the “sysmon” branch:
cd /opt/jupyterhub
git clone --depth 1 https://github.com/mtekman/jupyterhub/ -b sysmon jupyterhub-metrics
At this point we now have 3 directories
> tree /opt/jupyterhub
/opt/jupyterhub/
├─ jupyterhub-metrics (our custom jupyterhub)
├─ jupyterhub-pharma253 (the custom docker spawner)
└─ python-3.11.8 (our custom python)
We built our own Python previously in the /opt/jupyterhub/python-3.11.8 directory, but we haven’t actually used it yet or installed any necessary packages into it.
To do so, we create a virtual environment from it, and we keep it inside the pharma directory.
cd /opt/jupyterhub/jupyterhub-pharma253
/opt/jupyterhub/python-3.11.8/bin/python3 -m venv venv_jupyter_metrics
Now we source this environment. We install packages inside of it and use it for launching Jupyter.
source venv_jupyter_metrics/bin/activate ## we've sourced it
pip install ../jupyterhub-metrics/ ## install the dependencies of jupyter
pip install dockerspawner psutil configurable-http-proxy ## install other dependencies
At this point Jupyter with metrics is installed. We just need to configure it.
The config file is actually a Python script, so we use it to import our custom spawner and to configure the different components of the Hub.
Ignore the first few lines; these just tell Python to consider the current directory when looking for modules.
You should set the jupyter_venv variable to the absolute path of the venv_jupyter_metrics virtual environment we made earlier:
jupyter_venv = "/opt/jupyterhub/jupyterhub-pharma253/venv_jupyter_metrics/"
We need to define our admin users who will have permissions to oversee the server and access the servers of other users.
c.Authenticator.admin_users = ['memo', 'admin']
Here we define two users: “memo” and “admin” which are valid system user accounts.
Users can read/write to their home directories, but they might need other directories accessible too. Here we specify some host paths to be mounted read-only into the container.
Each path in the list is mounted into the image at the exact same path.
c.JupyterHub.spawner_class.volumes_ro = [
"/opt/bioinformatic_software/",
"/media/daten/software/"
]
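As an illustration of what such a list amounts to: DockerSpawner's `volumes` option takes a dict mapping host paths to bind/mode entries, so a read-only list can be expanded mechanically. This is a sketch; the helper name is ours and not part of the spawner's API.

```python
# Hypothetical expansion of a volumes_ro list into DockerSpawner's
# `volumes` dict format: {host_path: {"bind": container_path, "mode": ...}}.

def volumes_ro_to_mounts(paths):
    """Mount each host path read-only at the identical container path."""
    return {p: {"bind": p, "mode": "ro"} for p in paths}

mounts = volumes_ro_to_mounts([
    "/opt/bioinformatic_software/",
    "/media/daten/software/",
])
print(mounts["/media/daten/software/"])
# {'bind': '/media/daten/software/', 'mode': 'ro'}
```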
We also need to tell Jupyter what kind of server this is by setting the server_type variable.
- “local”
Jupyter will be served only on the local machine over the insecure HTTP protocol. If you wish to use this server as-is but open it up to the entire network, change the c.JupyterHub.ip variable near the bottom to “0.0.0.0”.
- “https”
Jupyter will be served over the internet using the secure HTTPS protocol. You will need to configure the c.JupyterHub.ssl_cert and c.JupyterHub.ssl_key variables with the fullchain and privkey of the HTTPS certificate you will get from certbot. See the HTTPS Certification section later.
- “proxy”
Jupyter will be served over the internet through a secure proxy. Users will not connect directly to this machine; they connect first to a proxy device, which tunnels all requests to the machine. Certificates do not matter here, since all certification is performed on the proxy machine and not on the Jupyter machine. You will need to configure the c.JupyterHub.bind_url variable to point to the HTTP proxy address and port. See the Proxy Machine section later.
The Pharma2-53 machine does not allow direct outside connections (see the Firewall section previously). So either you make a few exceptions to allow port 80 (http) and port 443 (https) in the firewall, or we use the proxy option:
server_type = "proxy"
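To make the three modes concrete, here is a sketch of how the server_type flag could branch inside jupyterhub_config.py. The function is ours for illustration; the exact handling in the shipped config file may differ, and the certificate paths and port are the ones assumed elsewhere in this guide.

```python
# Illustrative (not the shipped config) branching on server_type.
from types import SimpleNamespace

def apply_server_type(c, server_type):
    if server_type == "local":
        c.JupyterHub.ip = "127.0.0.1"   # "0.0.0.0" to open to the whole network
    elif server_type == "https":
        c.JupyterHub.ssl_cert = "/etc/letsencrypt/live/www.example.com/fullchain.pem"
        c.JupyterHub.ssl_key = "/etc/letsencrypt/live/www.example.com/privkey.pem"
    elif server_type == "proxy":
        # Stay on localhost; the reverse SSH tunnel and the web proxy
        # handle the outside world (port 58001, as set up later).
        c.JupyterHub.bind_url = "http://127.0.0.1:58001"
    else:
        raise ValueError(f"unknown server_type: {server_type}")

# Stand-in for the real traitlets config object:
c = SimpleNamespace(JupyterHub=SimpleNamespace())
apply_server_type(c, "proxy")
print(c.JupyterHub.bind_url)   # http://127.0.0.1:58001
```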
This section describes how we configure the resources offered to users: the recommended CPU and memory profiles with maximum limits, the Docker images users may choose, and the per-user overrides.
Here we set five resource profiles that users can choose from, defined by how many CPU cores and how many GB of RAM they may consume.
c.JupyterHub.spawner_class.resource_profiles = {
## These are maximum LIMITs to which a Docker Image can run.
## - At the same time, you can PREALLOCATE resources, see the preallocate
## subentry in the user_profiles
"Tiny" : {"cpu_limit": 1, "mem_limit": 2},
"Small" : {"cpu_limit": 2, "mem_limit": 4},
"Normal" : {"cpu_limit": 5, "mem_limit": 10},
"Large" : {"cpu_limit": 10, "mem_limit": 40},
"Extreme": {"cpu_limit": 36, "mem_limit": 80}
}
These are maximum limits; users can manually select whatever resources they want within their allowed resource profiles.
Users can also have “preallocated” cores and memory, meaning that at minimum a certain number of cores and amount of memory will be allocated for them.
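For orientation, a chosen profile has to be translated into the values DockerSpawner understands (cpu_limit as a number of cores, mem_limit as a string such as "4G"). The sketch below is illustrative; the names are ours, not the actual DockerSystemProfileSpawner internals.

```python
# Hypothetical translation of a resource profile entry into
# DockerSpawner-style limit values.

resource_profiles = {
    "Tiny":   {"cpu_limit": 1, "mem_limit": 2},
    "Small":  {"cpu_limit": 2, "mem_limit": 4},
    "Normal": {"cpu_limit": 5, "mem_limit": 10},
}

def profile_to_spawner_limits(name, profiles):
    """Translate one profile entry into DockerSpawner-style limits."""
    entry = profiles[name]
    return {
        "cpu_limit": float(entry["cpu_limit"]),
        "mem_limit": f"{entry['mem_limit']}G",  # e.g. 4 -> "4G"
    }

print(profile_to_spawner_limits("Small", resource_profiles))
# {'cpu_limit': 2.0, 'mem_limit': '4G'}
```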
Here we define three Docker images (each containing a jupyter-lab install), and the URLs to retrieve them.
You can find more Jupyter Docker “stacks” in the jupyter-docker-stacks documentation.
c.JupyterHub.spawner_class.docker_profiles = {
## These correspond to quay.io images, but see
## https://jupyter-docker-stacks.readthedocs.io/en/latest/using/selecting.html#jupyter-base-notebook
## for more
##
## Basic, users rely on their conda installations for software
"SingleUser" : "quay.io/jupyterhub/singleuser:main",
"BaseNotebook" : "quay.io/jupyter/base-notebook",
## Includes R, Python, and Julia at the system level, as well as their conda installations.
"DataScience" : "quay.io/jupyter/datascience-notebook:latest"
## Add others
##
## To prevent users complaining of slow startup times, pull the required
## image first, and then run Jupyter.
## e.g. sudo docker pull <URL>
}
The first time these images are fetched, they will take some time, so it is better to pre-emptively pull them before starting the server, so that the Docker containers don’t need to wait. You can fetch them with the docker pull command shown in the comment above.
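If you have several images configured, a small helper of our own (not part of the repository) could loop over the docker_profiles dict and pull each one. The `runner` parameter is injectable so the pull commands can be inspected without touching Docker.

```python
# Pre-pull every configured image before the first user spawns a container.
import subprocess

docker_profiles = {
    "SingleUser":   "quay.io/jupyterhub/singleuser:main",
    "BaseNotebook": "quay.io/jupyter/base-notebook",
    "DataScience":  "quay.io/jupyter/datascience-notebook:latest",
}

def prefetch_images(profiles, runner=subprocess.run):
    """Run `docker pull` for each image URL; returns the commands issued."""
    commands = []
    for url in profiles.values():
        cmd = ["sudo", "docker", "pull", url]
        commands.append(cmd)
        runner(cmd)
    return commands

# prefetch_images(docker_profiles)  # actually pulls; needs sudo + docker
```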
These are the individual user restrictions. Below we define two profiles, “default” and “memo”. All users use the “default” profile unless explicitly named.
All keywords are named to be compliant with the DockerSpawner API.
c.JupyterHub.spawner_class.user_profiles = {
## Docker profiles permitted per user.
##
## The "default" entry MUST exist. These are the docker profiles
## permitted to any user who isn't explicitly listed below. The
## first entry in the list, is the preferred profile first offered
## to the user in the selection screen.
##
"default" : {
"allowed_resources": ["Normal", "Tiny", "Small", "Large", "Extreme"],
"allowed_docker": ["SingleUser", "BaseNotebook", "DataScience"],
"host_homedir_format_string" : "/media/daten/{username}",
## maximum guaranteed resources for default users
## - if the requested are smaller than the resource profile
## then these are scaled down to that profile.
"max_preallocate" : {"cpu_guarantee" : 5, "mem_guarantee": 10 }},
## User overrides
"memo" : { "allowed_resources" : ["Normal", "Tiny", "Small"],
##"allowed_docker" : ["SingleUser"], ## must be an array, not string or tuple
"max_preallocate" : {"cpu_guarantee" : 2, "mem_guarantee": 4 },
##"host_homedir_format_string" : "/opt/jupyterhub/user_home/jupyter_users/{username}"}
## Note that conda only works when home directories are set...
"host_homedir_format_string" : "/home/{username}"}
##
## Note: The allowed profile with the largest RAM and largest
## number of CPUs is the upper limit on what the HTML sliders will
## permit.
}
By default all users are allowed to use all the resource profiles defined above, via the allowed_resources variable. Notice how user “memo” can only use three of those profiles.
Similarly, one can define allowed Docker images via the allowed_docker variable. Since the user “memo” does not have this defined, he defaults to whatever the “default” profile specifies for that variable.
The host_homedir_format_string must contain the placeholder “{username}” string in its path, and it defines where the home directories of the users are, along with their conda environments. The user “memo” has his home directory at /home/memo, which is different from the /media/daten/memo path that would otherwise have been derived from the default profile.
The max_preallocate variable specifies the minimum preallocation of resources guaranteed for a user. These resources can then grow up to whatever resource profile the user chooses when spawning a kernel.
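The fallback behaviour described above can be pictured as a simple dict merge: take the mandatory "default" entry and overlay any user-specific overrides. This is a sketch of the lookup semantics, not the actual spawner code.

```python
# Illustrative per-user profile resolution: unset keys fall back to "default".

user_profiles = {
    "default": {
        "allowed_resources": ["Normal", "Tiny", "Small", "Large", "Extreme"],
        "allowed_docker": ["SingleUser", "BaseNotebook", "DataScience"],
        "host_homedir_format_string": "/media/daten/{username}",
    },
    "memo": {
        "allowed_resources": ["Normal", "Tiny", "Small"],
        "host_homedir_format_string": "/home/{username}",
    },
}

def resolve_profile(username, profiles):
    """Merge a user's overrides on top of the mandatory 'default' entry."""
    merged = dict(profiles["default"])
    merged.update(profiles.get(username, {}))
    return merged

memo = resolve_profile("memo", user_profiles)
print(memo["allowed_docker"])   # no override, so the default list applies
print(memo["host_homedir_format_string"].format(username="memo"))  # /home/memo
```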
With your config file set up, it is now time to test the server:
cd /opt/jupyterhub/jupyterhub-pharma253
source venv_jupyter_metrics/bin/activate
sudo -E env PATH=$PATH /opt/jupyterhub/jupyterhub-pharma253/venv_jupyter_metrics/bin/jupyterhub
If you’re lucky, things should just work and you should be able to visit the JupyterHub login page (see the messages printed to the console).
If you need to test the server and make changes, note that it’s always useful to purge all autogenerated files, stop and remove all Docker containers, and remove the local configs of any affected users.
This is typically a combination of:
sudo docker ps -a ## see all containers
sudo docker container stop $(sudo docker ps -a -q) ## stop all containers
sudo docker container rm $(sudo docker ps -a -q) ## remove all containers
##
rm jupyterhub_cookie_secret jupyterhub.sqlite ## delete the database and cookie
##
rm -rf ~/.jupyter ~/.local/share/jupyter ## remove your local jupyter configs if testing on your account
rm -rf /home/randomuser/.jupyter ## do the same for any users you tested on
rm -rf /home/randomuser/.local/share/jupyter
JupyterHub doesn’t really keep its own log files, but you can view what is happening at the Jupyter level by monitoring the realtime output of the sudo -E env PATH=$PATH /opt/jupyterhub/jupyterhub-pharma253/venv_jupyter_metrics/bin/jupyterhub command, or, if you invoked JupyterHub via systemd, you can view the logs via:
sudo journalctl -u jupyterhub --since -5m ## see the last 5 minutes
The Docker logs are more verbose, especially when some users are unable to start their servers. Each user has a container usually named jupyter-<username>, and you can view which Docker containers have been spawned via the sudo docker ps -a command.
sudo docker logs jupyter-<username>
should tell you where the errors start.
If you are happy with the installation, then you can modify the system/etc/systemd/system/jupyterhub.service file in the repository, copy it to /etc/systemd/system/, and start it with sudo systemctl start jupyterhub. Please modify the virtual environment paths in the file first.
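For orientation, a minimal jupyterhub.service could look like the sketch below. All paths assume the venv location used in this guide; the file shipped in the repository's system folder is the authoritative version.

```ini
[Unit]
Description=JupyterHub with metrics
After=network.target docker.service

[Service]
User=root
Environment="PATH=/opt/jupyterhub/jupyterhub-pharma253/venv_jupyter_metrics/bin:/usr/bin:/bin"
WorkingDirectory=/opt/jupyterhub/jupyterhub-pharma253
ExecStart=/opt/jupyterhub/jupyterhub-pharma253/venv_jupyter_metrics/bin/jupyterhub -f /opt/jupyterhub/jupyterhub-pharma253/jupyterhub_config.py
Restart=on-failure

[Install]
WantedBy=multi-user.target
```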
To get a live readout of the JupyterHub logs when invoked via systemd, run:
sudo journalctl --follow -u jupyterhub
We have a running Jupyter, at least when hosted directly on the machine itself. But if you’re running through a proxy, then this needs to be set up before the machine can be accessed from the outside world.
The proxy communicates with the internet, and tunnels these outside connections to the host machine (running Jupyter).
In a schematic:
Users --> Internet --(1)-> ProxyDevice <--(2)--> HostDevice (Jupyter)
The host machine needs to establish a permanent connection to the proxy. There are many ways to do this, but the easiest and most secure is via a reverse SSH connection.
ssh -i ~localuser/.ssh/id_rsa -p 51122 \
-o ServerAliveInterval=60 -o ExitOnForwardFailure=yes \
-R 58001:127.0.0.1:58001 \
proxyuser@proxydevice vmstat 120
The above will create an SSH connection from localuser on the host machine to the proxydevice machine (change the address), as a user called proxyuser on the proxy device. It is assumed that the SSH port on the proxy machine is 51122. If not, change this too.
The proxy port is 58001 on both machines, meaning that port 58001 on the host maps to port 58001 on the proxy. Whatever the host sends to address “127.0.0.1:58001” will be received on the proxy at its port 58001.
This builds the (2) connection in the above schematic.
This can be implemented as a systemd service. Please see the system/etc/systemd/system folder in the repository for the proxy-tunnel-pharma53.service file. You just need to modify it to your tastes, copy it to /etc/systemd/system/proxy-tunnel-pharma53.service, and then start it via sudo systemctl start proxy-tunnel-pharma53
The proxy machine can now receive traffic from the host, but it still needs to map the internet to the designated port 58001. To do this, we need to run a secure web server.
The first step to being secure is to get a certificate from some web authority who can tell others that you are who you say you are.
So the way this all works is:
- You own a domain from some registrar; tell the registrar where to point your domain.
Let’s say you own the domain example.com, which you bought from godaddy.com. You need to log in to your godaddy.com account, go to your domain, and point it to the IP address of whichever machine is reachable from the internet.
- Tell a certificate authority to give you a secure certificate for your website.
The way this works is that, on your side, you run a script on the internet-facing machine (e.g. the proxy device) requesting a certificate from some certificate authority (e.g. “give me a certificate for example.com”).
The authority checks the IP address of the request against the IP address you entered at your registrar. If the addresses match, the authority gives you the certificate.
One good certificate authority owned by some good people is LetsEncrypt. We will use their certbot to request new certificates:
sudo certbot certonly --standalone -d www.example.com
(again, change the example to a domain you actually own)
If it worked, this should install certificates to /etc/letsencrypt/live/www.example.com on your machine.
Once we have the certificates, we can set up the webserver and proxy all requests to the host, which is listening on port 58001.
There are two main ones to choose from: Caddy (easy) or Nginx (stable).
You only need one. I can recommend Caddy due to sheer ease, but if something is failing on the proxy side of things, it can’t hurt to try Nginx.
Put this inside your Caddyfile at /etc/caddy/Caddyfile (modify the website to whatever website you own):
www.example.com {
reverse_proxy localhost:58001
}
Then enable the service: sudo systemctl start caddy
If all works fine, skip nginx.
If all did not work fine with Caddy, then try the nginx config:
#user http;
worker_processes auto;
events {
worker_connections 1024;
}
http {
include mime.types;
default_type application/octet-stream;
sendfile on;
keepalive_timeout 65;
map $http_upgrade $connection_upgrade {
default upgrade;
'' close;
}
server {
if ($host = www.example.com) {
return 301 https://$host$request_uri;
} # managed by Certbot
listen 80;
server_name www.example.com;
# Redirect the request to HTTPS
return 302 https://$host$request_uri;
}
# HTTPS server to handle JupyterHub
server {
server_name www.example.com;
listen 443 ssl; # managed by Certbot
ssl_certificate /etc/letsencrypt/live/www.example.com/fullchain.pem; # managed by Certbot
ssl_certificate_key /etc/letsencrypt/live/www.example.com/privkey.pem; # managed by Certbot
## Allow Jupyter to send large data packets
client_max_body_size 0;
access_log /var/log/host.access.log;
## commented out previously
ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
ssl_prefer_server_ciphers on;
##ssl_dhparam /etc/ssl/certs/dhparam.pem;
ssl_ciphers 'ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:AES:CAMELLIA:DES-CBC3-SHA:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!aECDH:!EDH-DSS-DES-CBC3-SHA:!EDH-RSA-DES-CBC3-SHA:!KRB5-DES-CBC3-SHA';
ssl_session_timeout 1d;
## end commented out previously
ssl_session_cache shared:SSL:50m;
ssl_stapling on;
ssl_stapling_verify on;
add_header Strict-Transport-Security max-age=15768000;
# Managing literal requests to the JupyterHub frontend
location / {
proxy_pass http://127.0.0.1:58001/; ## again, check the proxy port.
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header Host $host;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
# websocket headers
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection $connection_upgrade;
proxy_set_header X-Scheme $scheme;
proxy_buffering off;
}
# Managing requests to verify letsencrypt host
location ~ /.well-known {
allow all;
}
}
}
Change the domain where necessary, and check the proxy ports, and then start the service
sudo systemctl start nginx
We will be restricting individual Docker containers¹ for each user later, but we also want to set a global limit on Docker in general so that the rest of the OS still has some resources for itself.
1: A “Docker image” is a read-only filesystem snapshot, and a “Docker container” uses an image to create a running environment; these containers correspond to Jupyter kernels.
We control the main Docker process/daemon by making it a child of a control group (cgroup) which has resource quotas.
We want to limit the total system resources that Docker uses; it shouldn’t try to use 100% of everything.
To do this, we use systemd slices. See the docker_limit.slice file in the repo and modify it to your needs:
[Unit]
Description=Slice that limits docker resources
Before=slices.target
[Slice]
CPUAccounting=true
CPUQuota=7000%
## We use 70 cores max and leave 2 cores free
MemoryAccounting=true
MemoryHigh=230G
## We leave 20GB free for the system
## Copy this file to /etc/systemd/system/docker_limit.slice
## and start/enable it
There should be an example in the system/etc/systemd/system folder. Modify it to your needs and copy it to /etc/systemd/system/docker_limit.slice.
We do not start it ourselves; it gets applied automatically when Docker starts, but only if you modify the following Docker config file at /etc/docker/daemon.json:
{
"storage-driver": "overlay2",
"cgroup-parent": "docker_limit.slice"
}
Once containers are running, you can verify that the limits are in place by invoking
sudo systemd-cgtop
and witnessing that all Docker processes are children of the docker_limit.slice group.
The Jupyter Docker images offered by jupyter-docker-stacks provide Python, R, and Julia notebooks. These are fine if users have R installations specified by their own conda environments, but many don’t.
For this reason I put together a new docker image that offers R, Python, Bash kernels and offers some help text on how to use conda environments.
It needs to be built via:
cd /opt/jupyterhub/jupyterhub-pharma253
sudo docker buildx build -t bash-python-r docker-image
This will take some time to build the 2.9 GB image, but afterwards it should be visible in the list of Docker images under the name bash-python-r:
sudo docker image ls
You can then specify this Docker image by URL, docker.io/library/bash-python-r, and add it to the config.
The templates folder extends the Jinja2 templating system and some customizations have been made for the Freiburg Pharmacology dept.
Customization is split into two folders:
- Templates
This extends the Jinja2 system, and ensures that common motifs such as metric charts can be enabled on many pages.
- Static
These are the CSS, images, and Javascript resources. Every time that Jupyter is started (with the custom Dockerspawner), it directly copies over these resources into the virtual environment.
Development is in the beta stage at the moment, though things look relatively stable.
To upgrade the server:
- Make a backup copy of your jupyterhub_config.py file.
This copy should be outside of the jupyterhub-pharma253 repository, as any changes from the upstream repository will overwrite your config changes.
- Stash your current changes.
This will reset your config file (again, you should have backed this up somewhere else):
cd /opt/jupyterhub/jupyterhub-pharma253
git stash
- Pull upstream changes for both the JupyterHub Metrics and Pharma253 repositories:
cd /opt/jupyterhub/jupyterhub-pharma253
git pull
cd /opt/jupyterhub/jupyterhub-metrics
git pull
- Reinstall the new JupyterHub into the virtual environment:
cd /opt/jupyterhub/jupyterhub-pharma253
source venv_jupyter_metrics/bin/activate
pip install /opt/jupyterhub/jupyterhub-metrics
After that you can try restarting (or failing that, debugging) JupyterHub.
See Debugging for extra clues.
It could just be that the images have not been fetched yet and require some time to fetch, build, and then launch a container.
To speed this up, you can preload the images via sudo docker pull <image-url>, and then kernel spawning should be much faster.
It could also be that the firewall is blocking Jupyter from talking to Docker. Temporarily disable the firewall (sudo ufw disable) to see if it makes a difference.
Also check the Jupyter logs to see what addresses it is waiting for from the notebooks. If the addresses seem correct, then check the logs of docker container to see if it’s transmitting to the right addresses.
Check that the host_homedir_format_string matches where the user’s home directory actually is. E.g. if I have a user called jbloggs, and my host_homedir_format_string is “/media/daten/{username}”, then Docker expects that user’s home directory to be /media/daten/jbloggs
Sometimes what actually happens is that the user’s home directory is /media/daten/joebloggs instead. You can verify this by:
sudo su jbloggs ## login as that user
cd ~ ## change to home
pwd ## observe the location of home
If the home directory is not where Docker is expecting it to be, then either:
- Specify a user override in the jupyter_config for host_homedir_format_string, OR
- Change that user’s home directory to be compliant with the default:
## Remove the desired directory if it already exists.
## If it's empty, this command will succeed. If not, move any sensitive data OUT first.
sudo rmdir /media/daten/jbloggs
## Move (-m) the user's home directory (-d) from whatever it is now
## to the new location.
## - This can be a LONG PROCESS if you are moving between disks/partitions.
## - If the user is logged in, tell them to LOG OUT before you do this.
sudo usermod -m -d /media/daten/jbloggs jbloggs
After that, notebooks should spawn fine.
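Rather than checking users one by one, a small checker of our own (not part of the repository) can compare the configured format string against the real home directories in /etc/passwd before anyone hits the error at spawn time.

```python
# Hypothetical helper: flag users whose real home directory does not
# match what host_homedir_format_string predicts.
import pwd

def find_homedir_mismatches(usernames, fmt="/media/daten/{username}"):
    """Return (user, expected, actual) triples where the paths differ."""
    mismatches = []
    for user in usernames:
        expected = fmt.format(username=user)
        actual = pwd.getpwnam(user).pw_dir
        if actual.rstrip("/") != expected.rstrip("/"):
            mismatches.append((user, expected, actual))
    return mismatches

# e.g. find_homedir_mismatches(["jbloggs"]) would flag the user if their
# real home is /media/daten/joebloggs instead of /media/daten/jbloggs.
```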
This is to do with the host_homedir_format_string and the image_homedir_format_string, which are both internal spawner variables in the DockerSpawner API.
The first tells Jupyter where the home directory for a user exists on the system, and the second tells Docker where to “place” it inside the container.
It is better that these two match, so I have enabled this internally such that the image_homedir_format_string is always equal to the host_homedir_format_string.
If all the home directory paths are set correctly, but docker logs are still showing some weird paths, then make sure you properly stop and remove all containers related to a user and try again.
sudo docker ps -a ## look for containers matching a username
sudo docker container stop <id>; sudo docker container rm <id>;
One common issue that I see in the sudo docker logs jupyter-dbloggs output is that /media/daten/davidbloggs/.local could not be found, for user dbloggs. The problem is clearly that the host_homedir_format_string for that user expects the home directory to be at /media/daten/dbloggs/ instead.
Either set the host_homedir_format_string for that user with a config override, or move that user’s home directory to the correct location (via sudo usermod -m -d /media/daten/dbloggs/ dbloggs), then stop their container, remove it, and restart the server.
If your users have more than one conda environment and want to install multiple kernels, normally all you need to do is:
sudo su thatuser
source ~/.bashrc ## just in case conda is not found
micromamba activate someenv
R
library(IRkernel)
IRkernel::installspec(displayname="someenv", name="someenv")
I tend to find that people only set the displayname but not the name, and this leads to an existing kernel being overwritten.
If other issues arise, please make a PR or email me.