About This Document

This document was created in October 2021 as part of a Trusted CI engagement. The intent was to review existing deployment documentation for Jupyter Notebook / Server with a focus on security-related instructions, and to suggest modifications and additions to help users secure their Jupyter deployment on a single-user system.


Security in notebook documents

Note

The section below originates from a separate Configuration section Security in the Jupyter notebook server.

As Jupyter notebooks become more popular for sharing and collaboration, the potential for malicious people to attempt to exploit the notebook for their nefarious purposes increases. IPython 2.0, upon which Jupyter is built, introduced a security model to prevent execution of untrusted code without explicit user input. The context of the problem and the details of this model are described below.

The paradox

The whole point of Jupyter is arbitrary code execution. We have no desire to limit what can be done with a notebook, which would negatively impact its utility.

Unlike other programs, a Jupyter notebook document includes output. Unlike other documents, that output exists in a context that can execute code (via Javascript).

The security problem we need to solve is that no code should execute just because a user has opened a notebook that they did not write. Like any other program, once a user decides to execute code in a notebook, it is considered trusted, and should be allowed to do anything.

Our security model

The central question of whether content should be trusted revolves around the question of whether the current user generated it. In an untrusted notebook, only outputs generated by the current user are trusted. Aside from that:

  • Untrusted HTML is always sanitized
  • Untrusted Javascript is never executed
  • HTML and Javascript in Markdown cells are never trusted
  • HTML and Javascript in output generated by others are never trusted

The details of trust

Trust applies both to output from executed code and to the notebook in its entirety. If the notebook as a whole is trusted, then all output is trusted as well. Otherwise, only output generated by the user during the current session is trusted.

When a notebook is executed and saved, a signature is computed from a digest of the notebook's contents plus a secret key. This is stored in a database, writable only by the current user. By default, this is located at:

~/.local/share/jupyter/nbsignatures.db  # Linux
~/Library/Jupyter/nbsignatures.db       # OS X
%APPDATA%/jupyter/nbsignatures.db       # Windows

Each signature represents a series of outputs which were produced by code the current user executed, and are therefore trusted. The signature persists beyond the current session only if every output in the notebook is trusted: either the current user generated all of them, or all untrusted outputs have been removed via Clear Output or replaced by the user re-executing the cells that produced them.

When you open a notebook, the server computes its signature and checks whether it is in the database. If a match is found, all HTML and Javascript output in the notebook will be trusted at load; otherwise, the notebook as a whole will be untrusted.
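The sign-and-check cycle described above can be sketched with Python's standard library. This is a simplified illustration, not the code Jupyter itself uses: the function names, the in-memory set standing in for the signature database, and the choice of HMAC-SHA256 over a canonical JSON dump are all assumptions made for the example.

```python
import hashlib
import hmac
import json


def compute_signature(notebook: dict, secret: bytes) -> str:
    """Return a keyed hex digest of the notebook contents."""
    # Serialize deterministically so that equal notebooks give equal digests.
    payload = json.dumps(notebook, sort_keys=True, separators=(",", ":"))
    return hmac.new(secret, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def trust(notebook: dict, secret: bytes, signature_db: set) -> None:
    """Record the notebook's current signature as trusted."""
    signature_db.add(compute_signature(notebook, secret))


def is_trusted(notebook: dict, secret: bytes, signature_db: set) -> bool:
    """Check whether the notebook's current signature is already recorded."""
    return compute_signature(notebook, secret) in signature_db
```

In this sketch, any change to the notebook contents, or the use of a different secret key, produces a signature that is not in the database, so the notebook is treated as untrusted.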

Explicit trust

Sometimes re-executing a notebook to generate trusted output is not an option, either because dependencies are unavailable, or it would take a long time. Users can explicitly trust a notebook in two ways:

  • At the command-line, with:

    jupyter trust /path/to/notebook.ipynb
    
  • After loading the untrusted notebook, with File / Trust Notebook

These two methods simply load the notebook, compute a new signature, and add that signature to the user's database.

Javascript and CSS in Markdown cells

While never officially supported, it had become common practice in past versions of IPython to put hidden Javascript or CSS styling in Markdown cells, so that they would not be visible on the page but would change the page behavior or rendering. Since Markdown cells are now sanitized, all Javascript (including click event handlers, etc.) and CSS will be stripped.

Since this is no longer an option, styling the notebook can instead be done via either custom.css or CSS in HTML output. The latter only has an effect if the notebook is trusted, because otherwise the output will be sanitized just like Markdown.

Collaboration

When collaborating on a notebook through sharing a file, people probably want to see the outputs produced by their colleagues' most recent executions. Since each collaborator's key will differ, this will result in each share starting in an untrusted state. There are three basic approaches to this:

  • re-run notebooks when you get them (not always viable)
  • explicitly trust notebooks via jupyter trust or the notebook menu (easy, but required after each change)
  • share a notebook signatures database, and use configuration dedicated to the collaboration while working on the project (only works with shared storage)

To share a signatures database among users, you can configure:

c.NotebookNotary.data_dir = "/path/to/signature_dir"

to specify a non-default path to the SQLite database (of notebook hashes, essentially). This generally only works well on a shared machine, as SQLite doesn't work well on NFS.
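For the shared database to be usable, every collaborator needs write access to the directory while other users on the machine should have none. The following is a minimal sketch of one way to prepare such a directory on a POSIX system. It assumes the collaborators share a Unix group that already owns the directory; the function name and the specific mode are illustrative choices, not something Jupyter requires.

```python
import os
import stat


def prepare_shared_signature_dir(path: str) -> int:
    """Create the directory if needed and restrict it to owner and group."""
    os.makedirs(path, exist_ok=True)
    # rwx for owner and group, nothing for others. The setgid bit (the
    # leading 2) asks that new files inherit the directory's group.
    os.chmod(path, 0o2770)
    # Return the resulting permission bits so the caller can verify them.
    return stat.S_IMODE(os.stat(path).st_mode)
```

The directory passed to this function would be the same path given to c.NotebookNotary.data_dir.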

Installing the Notebook Server

The official Jupyter installation instructions mention that Jupyter can be installed with either user privileges or administrative (root) privileges. As a general security rule, if an application doesn't require root privileges, it should be installed at the user level, and this applies to Jupyter as well.

If you're considering installing the Jupyter notebook with root privileges in order to provide multi-user access, install JupyterHub instead. See also Running on a Multi-User Machine below.

Remote Access to a Notebook

Note

The section below originates from Running a Notebook Server.

Note

By default, a notebook server runs locally at 127.0.0.1:8888 and is accessible only from localhost. You may access the notebook server from the browser using http://127.0.0.1:8888.

General Computer Security

Before you consider enabling remote access to your personal machine, you should ensure that your computer adheres to some basic security principles. There are many articles available on this topic (just do an internet search for "secure your computer"). Some typical security measures include the following.

  • Use a strong, unique password for your account.
  • Enable a 'screensaver' which automatically locks your account after a short interval.
  • Ensure your operating system firewall is enabled.
  • Keep all software, including the operating system, up to date.
  • Perform regular backups of essential data to external, offline storage.
  • Use anti-virus / anti-malware software.

Remote Access and Firewall Settings

Remotely accessing your personal machine typically requires opening a port in your firewall to allow outside connections. In some cases, your local router may also need to be configured to allow outside connections. The port opened depends on the remote access method used. There are several methods for remote access.

  • Run software designed to remotely view your desktop. Software such as Microsoft Remote Desktop, Apple Remote Desktop, VNC, and TeamViewer enable you to view your computer's desktop and automatically open the necessary firewall port. This is an easy solution with a disadvantage that your network connection needs to be fast enough to handle sending a graphical representation of your desktop.
  • Use SSH port forwarding (tunneling). By running an SSH server on your desktop, you can use port forwarding to set up an SSH tunnel to access your Jupyter server using a local browser proxy. Configuring the SSH server requires some technical expertise, and setting up the SSH tunnel requires an SSH client on the remote machine, so this solution is more complex than running a remote desktop server, but the amount of data sent over the network is less.
  • Open the firewall to allow remote access to a secure HTTPS port. Before considering this solution, ensure your Jupyter server is configured to use TLS/SSL. DO NOT expose your Jupyter server on the 'http' port since all traffic is viewable to anyone watching. This is covered in more detail below.
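As an illustration of the SSH port forwarding option, the sketch below builds the ssh command that forwards a local port to the notebook server's default address on the desktop. The host name and user name are hypothetical placeholders, and the helper function is only a convenience for this example; -N and -L are standard OpenSSH client options.

```python
from typing import List


def ssh_tunnel_command(
    host: str, user: str, local_port: int = 8888, remote_port: int = 8888
) -> List[str]:
    """Build an ssh command that forwards local_port to the notebook server."""
    return [
        "ssh",
        "-N",  # do not run a remote command, only forward ports
        "-L",
        # local_port on this machine -> 127.0.0.1:remote_port on the desktop
        f"{local_port}:127.0.0.1:{remote_port}",
        f"{user}@{host}",
    ]


# Hypothetical host and user, shown only to illustrate the resulting command.
print(" ".join(ssh_tunnel_command("desktop.example.org", "alice")))
```

While the tunnel is running on the remote machine, a browser on that machine can reach the notebook at http://127.0.0.1:8888, and only the SSH port needs to be open in the firewall.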

Note

The section below originates from Running a Notebook Server.

Important

This is not the multi-user server you are looking for. This document describes how you can run a public server with a single user. This should only be done by someone who wants remote access to their personal machine. Even so, doing this requires a thorough understanding of the setup's limitations and security implications. If you allow multiple users to access a notebook server as it is described in this document, their commands may collide, clobber, and overwrite each other.

If you want a multi-user server, the official solution is JupyterHub. To use JupyterHub, you need a Unix server (typically Linux) running somewhere that is accessible to your users on a network. This may run over the public internet, but doing so introduces additional security concerns.

Trade-offs with Remote Access

As noted above, a Jupyter notebook server only runs locally by default and cannot be accessed from outside the machine on which it’s running. The notebook server can be configured to listen on other interfaces so that it can be accessed remotely, but this convenience introduces security risks that should be carefully considered and mitigated.

Notebook servers should not have remote access enabled with the default settings. By default, traffic between a user’s browser and notebook server is unencrypted, meaning anyone able to see this traffic can take over a user session by sniffing the password or authentication token.
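A quick way to reason about whether a given listen address exposes the server is to check whether it is a loopback address. The sketch below is a hypothetical helper, not part of Jupyter; it assumes that the wildcard values "*" and the empty string mean "all interfaces", and it treats any value it cannot parse as an IP address as potentially exposed.

```python
import ipaddress


def listens_beyond_localhost(ip_setting: str) -> bool:
    """Return True if the listen address would accept non-local connections."""
    if ip_setting in ("", "*"):
        # Wildcard values: listen on all interfaces.
        return True
    if ip_setting == "localhost":
        return False
    try:
        return not ipaddress.ip_address(ip_setting).is_loopback
    except ValueError:
        # Not an IP literal (for example a host name): assume it is exposed.
        return True
```

If this returns True for the address you intend to use, the password and TLS/SSL steps described in this document should be in place before the server is started.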

Configuration for Secure Remote Access

The following steps are covered in more detail below and should be implemented prior to opening remote access to the machine.

  • Set a notebook password
  • Enable SSL/TLS for encrypted communications
  • [Optional] Enable domain name (#TODO)

Note

The section below originates from Running a Notebook Server.

Setting a Notebook Password

By default, Jupyter notebook servers generate a token for authentication on startup. This is inconvenient for remote access as the token changes each time the notebook server is started and creates a dependency on having access to the hosting machine to get the token even while accessing the notebook from another machine.

Instead, the notebook can be configured to use a password. This can be done automatically the first time the notebook is accessed, via the command line, or by creating a hashed password and manually updating the notebook configuration file.

Automatic Password Setup in Browser

Note

The section below originates from Running a Notebook Server.

As of notebook 5.3, the first time you log in using a token, the notebook server should give you the opportunity to set up a password from the user interface.

You will be presented with a form asking for the current token, as well as your new password; enter both and click on Login and setup new password.

The next time you need to log in, you'll be able to use the new password instead of the login token. Otherwise, follow the procedure to set a password from the command line.

Note

The ability to change the password at first login time may be disabled by integrations by setting --NotebookApp.allow_password_change=False.

Automatic Password Setup on Command Line

Starting at notebook version 5.0, you can enter and store a password for your notebook server with a single command. jupyter notebook password will prompt you for your password and record the hashed password in your jupyter_notebook_config.json.

$ jupyter notebook password
Enter password:  ****
Verify password: ****
[NotebookPasswordApp] Wrote hashed password to /Users/you/.jupyter/jupyter_notebook_config.json

This can be used to reset a lost password, or to change your password if you believe your credentials have been leaked. Changing your password will invalidate all logged-in sessions after a server restart.

Setting a password on the command line will store the hash in jupyter_notebook_config.json, while a manually created hash should be stored in jupyter_notebook_config.py. The .json configuration options take precedence over the .py ones, so automatic passwords will always take precedence over ones calculated with a manual hash.

Manual Password Setup

A hashed password can also be manually calculated and added to the notebook configuration file. Create the hash using the function notebook.auth.security.passwd:

In [1]: from notebook.auth import passwd
In [2]: passwd()
Enter password:
Verify password:
Out[2]: 'sha1:67c9e60bb8b6:9ffede0825894254b2e042ea597d771089e11aed'

Caution!

When called with no arguments, notebook.auth.security.passwd will prompt you to enter and verify your password, as in the above code snippet. Although the function can also be passed a string as an argument, such as passwd('mypassword'), please do not pass a string as an argument inside an IPython session, as it will be saved in your input history.
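The hash shown in the example output has three colon-separated fields, which can be read as algorithm, salt, and digest. The sketch below illustrates how a salted hash in that layout can be produced and checked with the standard library. It is a simplified stand-in written for this document, not the notebook package's own code; the function names and the default algorithm are assumptions, and the real function may differ in details.

```python
import hashlib
import hmac
import secrets


def make_hash(passphrase: str, algorithm: str = "sha256") -> str:
    """Return 'algorithm:salt:digest' for the given passphrase."""
    salt = secrets.token_hex(6)  # 12 random hex characters
    digest = hashlib.new(algorithm, (passphrase + salt).encode("utf-8")).hexdigest()
    return f"{algorithm}:{salt}:{digest}"


def check_hash(stored: str, passphrase: str) -> bool:
    """Verify a passphrase against a stored 'algorithm:salt:digest' string."""
    try:
        algorithm, salt, digest = stored.split(":", 2)
        candidate = hashlib.new(
            algorithm, (passphrase + salt).encode("utf-8")
        ).hexdigest()
    except ValueError:
        # Malformed string or unknown algorithm name.
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(candidate, digest)
```

Because only the salted digest is stored, the configuration file never contains the password itself. In a script, the passphrase could be read with getpass.getpass() so that it does not end up in a shell or IPython history.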

You can then add the hashed password to your jupyter_notebook_config.py. By default, this file is located in the Jupyter folder in your home directory, ~/.jupyter. For example:

c.NotebookApp.password = u'sha1:67c9e60bb8b6:9ffede0825894254b2e042ea597d771089e11aed'

Automatic password setup will store the hash in jupyter_notebook_config.json while this method stores the hash in jupyter_notebook_config.py. The .json configuration options take precedence over the .py one, thus the manual password may not take effect if the json file has a password set.
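Because of this precedence, it can be useful to check whether the .json file already contains a password before relying on a manual hash in the .py file. The sketch below is a hypothetical helper for that check. It assumes the .json file stores the hash under a NotebookApp section with a password key, mirroring the c.NotebookApp.password setting; verify this against the file on your own system.

```python
import json
import os


def json_config_sets_password(config_dir: str) -> bool:
    """Return True if jupyter_notebook_config.json in config_dir has a password."""
    path = os.path.join(config_dir, "jupyter_notebook_config.json")
    try:
        with open(path, encoding="utf-8") as handle:
            config = json.load(handle)
    except FileNotFoundError:
        # No .json file, so nothing overrides the .py configuration.
        return False
    # Assumed layout: {"NotebookApp": {"password": "<hash>"}}
    return bool(config.get("NotebookApp", {}).get("password"))
```

A typical call would pass the expanded ~/.jupyter directory, for example json_config_sets_password(os.path.expanduser("~/.jupyter")).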

Using TLS/SSL for encrypted communication

Certificate-based encryption using TLS/SSL should be used to protect communication between a user’s browser and the notebook server. Multiple options exist for deploying a certificate. Self-signed certificates are the fastest and cheapest to deploy but are less secure. Fully trusted certificates can be provisioned through a local certificate authority if available, purchased from a certificate provider, or obtained through the free service Let’s Encrypt. These certificates require a fully-qualified domain name, however. See the documentation below for more information on using self-signed certificates and Let’s Encrypt.

Regardless of how the certificate is provisioned, the Jupyter notebook can be started on the command line in secure protocol mode by setting the certfile option to the certificate (e.g. mycert.pem) and the keyfile option to the private key, using the command:

$ jupyter notebook --certfile=mycert.pem --keyfile=mykey.key

Alternatively, for a more permanent solution, the configuration file for the notebook can be modified to include these values:

c.NotebookApp.certfile = u'/absolute/path/to/your/certificate/fullchain.pem'
c.NotebookApp.keyfile = u'/absolute/path/to/your/certificate/privkey.pem'
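Before pointing the notebook server at a certificate and key, it can help to confirm that the two files load together. The following is a minimal sketch using Python's ssl module; the function name is an assumption for this example, and the check only shows that the files are readable and that the key matches the certificate, not that browsers will trust the certificate.

```python
import ssl


def certificate_pair_loads(certfile: str, keyfile: str) -> bool:
    """Return True if the certificate and private key load as a matching pair."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    try:
        # Raises if a file is missing or unreadable, or if the private key
        # does not match the certificate.
        context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    except OSError:  # ssl.SSLError is a subclass of OSError
        return False
    return True
```

Note that if the private key is protected by a passphrase, load_cert_chain will prompt for it interactively, so this sketch is best suited to unencrypted key files.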

Note

The section below originates from Running a Notebook Server, Let's Encrypt section.

Important

Use 'https'. Keep in mind that when you enable TLS/SSL support, you must access the notebook server over https://, not over plain http://. The startup message from the server prints a reminder in the console, but it is easy to overlook this detail and think the server is for some reason non-responsive.

When using TLS/SSL, always access the notebook server with 'https://'.

Using Self-Signed Certificates

Tip

A self-signed certificate can be generated with openssl. For example, the following command will create a certificate valid for 365 days, with the private key written to mykey.key and the certificate written to mycert.pem:

$ openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout mykey.key -out mycert.pem

When you connect to the notebook server, your browser may warn that your self-signed certificate is insecure or unrecognized, because no trusted certificate authority vouches for it. For this reason, self-signed certificates are not the most secure option available.

Note

Using Safari with HTTPS and an untrusted certificate is known to not work (websockets will fail).

Note

The section below originates from Running a Notebook Server, Let's Encrypt section.

Let's Encrypt is a nonprofit service that provides free SSL/TLS certificates through a global certificate authority (CA).

Unlike most public certificates from a global CA, Let's Encrypt certificates are only valid for ninety days. However, Let's Encrypt provides an easy-to-automate solution for automatically renewing a certificate. See their website for more details.
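To make the ninety-day lifetime concrete, the sketch below computes how many days remain before a certificate expires, given its expiry timestamp in the text format used by Python's ssl module (the same format as the notAfter field of a certificate). The function names and the 30-day renewal threshold are assumptions chosen for illustration; they are not Let's Encrypt requirements.

```python
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days between now and a timestamp such as 'Apr  1 00:00:00 2022 GMT'."""
    if now is None:
        now = time.time()
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def needs_renewal(
    not_after: str, now: Optional[float] = None, threshold_days: float = 30.0
) -> bool:
    """Return True once fewer than threshold_days remain (assumed policy)."""
    return days_until_expiry(not_after, now) < threshold_days
```

A check like this could run periodically alongside whatever renewal tooling is in use, as a safeguard in case automatic renewal silently stops working.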

Running on a Multi-User Machine

Running a notebook server on a machine other people can log into brings its own set of risks because some of the communication among processes on the machine itself occurs unencrypted and may be accessible to other users. This risk can be mitigated by turning on optional security features.

If the notebook will be accessed remotely, all of the guidance in the previous section also applies and should be followed.

Note

The section below originates from Running a Notebook Server.

By default, ZeroMQ TCP (zmq tcp) sockets are used for communication between the notebook client and kernel. A random high port is allocated when the notebook starts up. This port can be identified by looking at the iopub_port value in the ~/.local/share/jupyter/runtime/kernel-*.json file.

While users cannot submit arbitrary commands to another user's kernel, they can easily bind to these sockets and listen by using a tool like tcpdump. On a multi-user machine, this eavesdropping can be mitigated by setting KernelManager.transport to ipc or using --transport ipc on the command line. This switches ZeroMQ to UNIX domain sockets, which apply standard Unix file permissions to the communication sockets, thereby restricting communication to the socket owner.
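To see which transport and ports a running kernel is using, the kernel's connection file can be inspected directly. The sketch below is a hypothetical helper that reads one such JSON file and summarizes the relevant fields. It assumes the file contains a transport entry and port entries named shell_port, iopub_port, stdin_port, control_port, and hb_port, and that a missing transport entry means tcp.

```python
import json

PORT_KEYS = ("shell_port", "iopub_port", "stdin_port", "control_port", "hb_port")


def summarize_connection_file(path: str) -> dict:
    """Report the transport and ports recorded in a kernel connection file."""
    with open(path, encoding="utf-8") as handle:
        info = json.load(handle)
    transport = info.get("transport", "tcp")  # assumed default when absent
    return {
        "transport": transport,
        "ports": {key: info[key] for key in PORT_KEYS if key in info},
        # TCP sockets are the case the mitigation above is meant to address.
        "uses_tcp": transport == "tcp",
    }
```

If uses_tcp is True on a shared machine, switching the transport to ipc as described above is worth considering.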

Reporting Vulnerabilities

Note

The section below originates from Security in the Jupyter notebook.

Reporting security issues

If you find a security vulnerability in Jupyter, including a failure of the code to properly implement the trust model as described or a failure of the model itself, please see https://jupyter.org/security for information on how to report it.