- Runs a pipeline of Jupyter Notebooks
- Prevents multiple instances of the program from running concurrently
- Logs and handles errors
- Emails the user when the pipeline is complete or if an error occurs
Make sure you have pip3 and python3 installed. If you don't, run
sudo apt update
and then
sudo apt-get install python3-pip
Then install the requirements by running
pip3 install -r requirements.txt
and start the program with
python3 main.py
requirements.txt lists every dependency of the program. Right now, it contains the dependencies needed to run the dummy-test notebooks. Whenever a notebook uses an external library, you must add that library to requirements.txt; if one is missing, the pipeline will fail partway through.
Once you've added them, run
pip3 install -r requirements.txt
to install the dependencies into your environment.
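To check which libraries the notebooks import before comparing them against requirements.txt, something like the following sketch may help. It is not part of the project; the function name `imported_modules` is hypothetical, and it assumes the notebooks are standard `.ipynb` JSON files. Keep in mind that an import name is not always the pip package name (for example, `sklearn` is installed as `scikit-learn`), and that standard-library modules will show up in the result too.

```python
import ast
import json


def imported_modules(notebook_json):
    """Collect top-level module names imported by a notebook's code cells.

    `notebook_json` is the text content of an .ipynb file.
    """
    notebook = json.loads(notebook_json)
    modules = set()
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat stores source as a list of lines
            source = "".join(source)
        # Drop IPython magics and shell escapes, which are not valid Python.
        lines = [
            line for line in source.splitlines()
            if not line.lstrip().startswith(("%", "!"))
        ]
        try:
            tree = ast.parse("\n".join(lines))
        except SyntaxError:
            continue  # skip cells that still cannot be parsed
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                modules.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                modules.add(node.module.split(".")[0])
    return modules
```

A possible use is to read each notebook with `open(path).read()`, pass the text to `imported_modules`, and compare the resulting names by hand against the entries in requirements.txt.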
On the server (DigitalOcean, etc.) that you want to run the pipeline on, run the commands
git clone [email protected]:kevinmonisit/notebook-pipeline-runner.git
cd notebook-pipeline-runner
- Make sure the requirements are installed via
pip3 install -r requirements.txt
(and make sure you have Python 3).
- Go to SendGrid.com and create an account. At the start of account registration, you will be asked to create a verified sender and verify the email address you want to send from. This happens before your account is created. Set the Sender Email to the address you want to send alerts from. (Note: You cannot send emails to yourself with SendGrid.com)
- When you create your account, you should see a dashboard.
- On the right, there is a "Settings" button. Click on that.
- Click on "API KEYS".
- On the top right, click "Create API Key".
- Set the API Key Permissions to "Restricted Access", and then give the API key permission for "Mail Send".
- Make sure to copy the API key.
- Go to the file test.env and set SENDGRID_API_KEY= to '<API KEY>'. Make sure the single quotation marks are there if they aren't already.
- Rename the file test.env to .env.
- Fill out the .env file, replacing SENDGRID_FROM_EMAIL and SENDGRID_TO_EMAIL with the email addresses you choose. SENDGRID_FROM_EMAIL should be the address you set up in your SendGrid.com account to send emails from.
- After setting up the .env file, you can run
python3 main.py
from the project directory.
- An email will be sent to SENDGRID_TO_EMAIL stating that pipeline initialization has started.
Note: SendGrid API allows for 100 free emails per day.
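Before the first run, it can be useful to confirm that the .env file has all three values filled in. The sketch below is a standalone check, not how main.py reads its configuration (that is not shown here); `parse_env` and `missing_keys` are hypothetical helper names, and the parser only handles simple `KEY='value'` lines.

```python
REQUIRED_KEYS = ("SENDGRID_API_KEY", "SENDGRID_FROM_EMAIL", "SENDGRID_TO_EMAIL")


def parse_env(text):
    """Parse simple KEY='value' lines into a dict, ignoring blanks and comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one pair of matching single or double quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def missing_keys(values):
    """Return the required keys that are absent, empty, or still the placeholder."""
    return [
        key for key in REQUIRED_KEYS
        if not values.get(key) or values[key] == "<API KEY>"
    ]
```

For example, `missing_keys(parse_env(open(".env").read()))` returns an empty list when every required setting is present.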
Bypass the confirmation prompt by typing
python3 main.py --bypass-confirm
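For readers curious how such a flag can work, here is a minimal sketch using argparse. It is an illustration only; the actual parsing in main.py may differ, and the names `build_parser` and `should_run` as well as the prompt text are assumptions.

```python
import argparse


def build_parser():
    """Build a parser that understands the --bypass-confirm flag."""
    parser = argparse.ArgumentParser(description="Run a pipeline of notebooks.")
    parser.add_argument(
        "--bypass-confirm",
        action="store_true",
        help="skip the interactive confirmation prompt",
    )
    return parser


def should_run(argv, ask=input):
    """Return True if the pipeline should start.

    `ask` defaults to input() and is only called when the flag is absent.
    """
    args = build_parser().parse_args(argv)
    if args.bypass_confirm:
        return True
    return ask("Run the pipeline? [y/N] ").strip().lower() == "y"
```

Skipping the prompt matters for unattended runs (such as a scheduled job), where nobody is present to answer it.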
The process:
[ your personal computer ] -- [transferring requirements.txt and notebooks] --> [ server ]
- Before you run the pipeline, from the environment (personal computer, etc.) where you usually run the pipeline, type
python3 -m pip freeze > requirements.txt
This will create a requirements.txt that contains all the dependencies needed to run the pipeline normally.
- Then, replace the requirements.txt file in the project directory with the one you just created.
- Run
pip3 install -r requirements.txt
to install the dependencies into your environment (assuming this is the server).
- Type
python3 main.py --bypass-confirm
to run the dummy pipeline and make sure everything works. Once that succeeds, you can modify the pipeline to your liking.
- To modify the pipeline, edit the main.py file. It contains the main function, which defines an array called notebooks holding the path to each notebook, in the order they will be run. Modify this array to your liking.
notebooks = ['./notebooks/notebook.ipynb',
'./notebooks/notebook2.ipynb',
'./notebooks/notebook4.ipynb',
'./notebooks/notebookERROR.ipynb',
'./notebooks/notebook4.ipynb'
]
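To illustrate how an ordered list like this can drive a pipeline, here is a rough sketch. It is not the code in main.py: it assumes notebooks are executed through the `jupyter nbconvert` command-line tool and that the run stops at the first failing notebook. The function names `execute_with_nbconvert` and `run_pipeline` are hypothetical.

```python
import subprocess


def execute_with_nbconvert(path):
    """Execute one notebook in place via the nbconvert CLI; raises on failure."""
    subprocess.run(
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", path],
        check=True,  # a non-zero exit status raises CalledProcessError
    )


def run_pipeline(notebooks, execute=execute_with_nbconvert):
    """Run notebooks in order and stop at the first failure.

    Returns (completed, failed), where `failed` is the path of the notebook
    that raised, or None if every notebook finished.
    """
    completed = []
    for path in notebooks:
        try:
            execute(path)
        except Exception:
            return completed, path
        completed.append(path)
    return completed, None
```

Passing the executor as a parameter keeps the ordering logic easy to test without Jupyter installed. With the list above, a run would be expected to stop at notebookERROR.ipynb, assuming that notebook raises as its name suggests.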
In the terminal, run
pwd
to get the path to the directory containing the main.py file.
Then type
crontab -e
This will open the crontab file in your default text editor. It will most likely be Vim.
Press i to enter insert mode, and then add the following line to the file:
0 0 * * * python3 /path/to/main.py --bypass-confirm
Replace /path/to/main.py with the actual path to the main.py file. Then press Esc, type :wq, and press Enter to save and exit the file.
You should be good to go. The program will now run every day at midnight. If you want to modify the date/time at which it runs, you can check out https://crontab.guru/.
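One thing to watch for: cron normally starts jobs from the user's home directory, and the notebooks array uses relative paths such as ./notebooks/notebook.ipynb. If main.py resolves those paths against the current directory (an assumption, since its code is not shown here), a crontab line that first changes into the project directory may be safer. The hypothetical helper below sketches how such a line could be built:

```python
import os
import shlex


def cron_line(main_py, schedule="0 0 * * *"):
    """Build a crontab entry that changes into the project directory first."""
    main_py = os.path.abspath(main_py)
    project_dir, script = os.path.split(main_py)
    return (
        f"{schedule} cd {shlex.quote(project_dir)} "
        f"&& python3 {shlex.quote(script)} --bypass-confirm"
    )
```

Calling `cron_line` with the path you found through pwd, plus main.py, gives a line you can paste into the crontab file in place of the one above.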
To run the test script that verifies the program will not run multiple instances concurrently:
chmod +x test.sh
./test.sh
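For context, one common way to keep a second instance from starting is an exclusive, non-blocking lock on a file. The sketch below shows that general technique on Linux or macOS; it is not necessarily how this program does it, and the function name `acquire_single_instance_lock` and the lock file path are hypothetical.

```python
import fcntl


def acquire_single_instance_lock(lock_path):
    """Try to take an exclusive lock on `lock_path`.

    Returns the open file object holding the lock, or None if another
    holder already has it. Keep the returned object open for as long as
    the program runs; closing it releases the lock.
    """
    handle = open(lock_path, "w")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle
```

A program using this would call it once at startup and exit early when it returns None. The operating system releases the lock when the process ends, even after a crash, so no stale lock needs to be cleaned up.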