This example walks you through time series forecasting with Prophet. Prophet is a forecasting procedure implemented in R and Python. It is fast and provides fully automated forecasts that data scientists and analysts can tune by hand.
{% hint style="info" %} Quick script to run a custom R container on Bacalhau:
bacalhau docker run \
-i ipfs://QmY8BAftd48wWRYDf5XnZGkhwqgjpzjyUG3hN1se6SYaFt:/example_wp_log_R.csv \
ghcr.io/bacalhau-project/examples/r-prophet:0.0.2 \
-- Rscript Saturating-Forecasts.R "/example_wp_log_R.csv" "/outputs/output0.pdf" "/outputs/output1.pdf"
{% endhint %}
To get started, you need to install the Bacalhau client; see more information here.
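If you do not have the client yet, it can typically be installed with the official install script (check the installation docs for the current command; the URL below is the commonly documented one):
curl -sL https://get.bacalhau.org/install.sh | bash
# confirm the client is on your PATH
bacalhau version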
Open RStudio or another R-supported IDE. If you want to run this on a notebook server, make sure you use an R kernel. Prophet is a CRAN package, so you can use install.packages to install the prophet package:
R -e "install.packages('prophet',dependencies=TRUE, repos='http://cran.rstudio.com/')"
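To confirm the package installed correctly, you can optionally load it and print its version from the command line:
R -e "library(prophet); packageVersion('prophet')"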
After installation is finished, you can download the example data that is stored in IPFS:
wget https://w3s.link/ipfs/QmZiwZz7fXAvQANKYnt7ya838VPpj4agJt5EDvRYp3Deeo/example_wp_log_R.csv
The R script below loads the Prophet library and fits a model to the data. First, create directories for the script and its outputs:
mkdir -p outputs
mkdir -p R
Create a new file called Saturating-Forecasts.R and paste the following script into it:
# content of Saturating-Forecasts.R

# Load the Prophet library
library('prophet')

# Read the command-line arguments: input CSV, then the two output PDF paths
args = commandArgs(trailingOnly=TRUE)
args  # echo the arguments for easier debugging

input = args[1]
output = args[2]
output1 = args[3]

# Build the input/output file paths from the arguments
I <- paste("", input, sep ="")
O <- paste("", output, sep ="")
O1 <- paste("", output1, sep ="")

# Read the time series data (columns: ds, y)
df <- read.csv(I)

# First forecast: logistic growth with a saturating maximum (cap)
df$cap <- 8.5
m <- prophet(df, growth = 'logistic')
future <- make_future_dataframe(m, periods = 1826)  # forecast 1826 days (~5 years) ahead
future$cap <- 8.5
fcst <- predict(m, future)
pdf(O)
plot(m, fcst)
dev.off()

# Second forecast: saturating minimum, with both a cap and a floor
df$y <- 10 - df$y
df$cap <- 6
df$floor <- 1.5
future$cap <- 6
future$floor <- 1.5
m <- prophet(df, growth = 'logistic')
fcst <- predict(m, future)
pdf(O1)
plot(m, fcst)
dev.off()
This script performs time series forecasting using the Prophet library in R, taking input data from a CSV file, applying the forecasting model, and generating plots for analysis.
Let's have a look at the command below:
Rscript Saturating-Forecasts.R "example_wp_log_R.csv" "outputs/output0.pdf" "outputs/output1.pdf"
This command uses Rscript to execute the script that was saved to the `Saturating-Forecasts.R` file. The input parameters are the names of the input and output files:

- `example_wp_log_R.csv` - the example data that was previously downloaded
- `outputs/output0.pdf` - the file in which to save the first forecast plot
- `outputs/output1.pdf` - the file in which to save the second forecast plot
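If you run the command locally, you can confirm that the script produced both plots by listing the outputs directory:
ls outputs/
# expected files: output0.pdf  output1.pdf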
To use Bacalhau, you need to package your code in an appropriate format. The developers have already pushed a container for you to use, but if you want to build your own, you can follow the steps below. You can view a dedicated container example in the documentation.
To build your own Docker container, create a Dockerfile, which contains the instructions to build your image.
FROM r-base
# Install the Prophet package from CRAN
RUN R -e "install.packages('prophet',dependencies=TRUE, repos='http://cran.rstudio.com/')"
# Create directories for the script and its outputs
RUN mkdir /R
RUN mkdir /outputs
# Copy the forecasting script into the image and run from the /R folder
COPY Saturating-Forecasts.R R
WORKDIR /R
These commands specify how the image will be built and what extra requirements will be included. We use r-base as the base image and then install the prophet package. We then copy the Saturating-Forecasts.R script into the container and set the working directory to the /R folder.
We will run the docker build command to build the container:
docker build -t <hub-user>/<repo-name>:<tag> .
Before running the command, replace:

- `hub-user` - your Docker Hub username. If you don't have a Docker Hub account, follow these instructions to create one, and use the username of the account you created.
- `repo-name` - the name of the container; you can name it anything you want.
- `tag` - this is not required, but you can use the `latest` tag.
In our case:
docker buildx build --platform linux/amd64 -t ghcr.io/bacalhau-project/examples/r-prophet:0.0.1 .
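Before pushing, you can optionally sanity-check the image locally. The sketch below assumes example_wp_log_R.csv is in your current directory and mirrors the paths used by the Bacalhau job later on:
docker run --rm \
  -v $(pwd)/example_wp_log_R.csv:/example_wp_log_R.csv \
  -v $(pwd)/outputs:/outputs \
  ghcr.io/bacalhau-project/examples/r-prophet:0.0.1 \
  Rscript Saturating-Forecasts.R "/example_wp_log_R.csv" "/outputs/output0.pdf" "/outputs/output1.pdf"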
Next, upload the image to the registry, using your Docker Hub username, repo name and tag:
docker push <hub-user>/<repo-name>:<tag>
In our case:
docker push ghcr.io/bacalhau-project/examples/r-prophet:0.0.1
The following command runs the forecasting script on Bacalhau and generates the results in the outputs directory. It takes approximately 2 minutes to run.
export JOB_ID=$(bacalhau docker run \
--wait \
--id-only \
-i ipfs://QmY8BAftd48wWRYDf5XnZGkhwqgjpzjyUG3hN1se6SYaFt:/example_wp_log_R.csv \
ghcr.io/bacalhau-project/examples/r-prophet:0.0.2 \
-- Rscript Saturating-Forecasts.R "/example_wp_log_R.csv" "/outputs/output0.pdf" "/outputs/output1.pdf")
- `bacalhau docker run`: call to Bacalhau
- `-i ipfs://QmY8BAftd48wWRYDf5XnZGkhwqgjpzjyUG3hN1se6SYaFt:/example_wp_log_R.csv`: mounts the uploaded dataset in the execution. It takes two parts separated by a colon: the IPFS CID (`QmY8BAftd48wWRYDf5XnZGkhwqgjpzjyUG3hN1se6SYaFt`) and the path at which to mount the file inside the job (`/example_wp_log_R.csv`)
- `ghcr.io/bacalhau-project/examples/r-prophet:0.0.2`: the name and tag of the Docker image we are using
- `/example_wp_log_R.csv`: path to the input dataset
- `/outputs/output0.pdf`, `/outputs/output1.pdf`: paths for the output files
- `Rscript Saturating-Forecasts.R`: execute the R script
When a job is submitted, Bacalhau prints out the related job_id. We store that in an environment variable so that we can reuse it later on.
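For example, you can confirm the variable is set before using it in the commands below:
echo "${JOB_ID}"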
Job status: You can check the status of the job using bacalhau job list
.
bacalhau job list --id-filter ${JOB_ID}
When it says Published
or Completed
, that means the job is done, and we can get the results.
Job information: You can find out more information about your job by using bacalhau job describe
.
bacalhau job describe ${JOB_ID}
Job download: You can download your job results directly by using bacalhau job get. Alternatively, you can choose to create a directory to store your results. In the command below, we create a directory (results) and download our job output into it.
rm -rf results && mkdir -p results
bacalhau job get ${JOB_ID} --output-dir results
To view the file, run the following command:
ls results/outputs
You can't natively display PDFs in notebooks, so here are some static images of the PDFs:
output0.pdf
output1.pdf
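If you want to render the plots yourself in a notebook, one option is to convert the PDFs to PNG images first, for example with pdftoppm from poppler-utils (assuming it is installed on your machine):
pdftoppm -png results/outputs/output0.pdf results/outputs/output0
pdftoppm -png results/outputs/output1.pdf results/outputs/output1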
If you have questions or need support or guidance, please reach out to the Bacalhau team via Slack (#general channel).