You can use official Docker containers for each language, like R or Python. In this example, we will use the official R container and run it on Bacalhau.
In this tutorial example, we will run a "hello world" R script on Bacalhau.
To get started, you need to install the Bacalhau client, see more information here
To install R follow these instructions A Installing R and RStudio | Hands-On Programming with R. After R and RStudio are installed, create and run a script called hello.R
:
# hello.R
print("hello world")
Run the script:
Rscript hello.R
Next, upload the script to your public storage (in our case, IPFS). We've already uploaded the script to IPFS and the CID is: QmVHSWhAL7fNkRiHfoEJGeMYjaYZUsKHvix7L54SptR8ie
. You can look at this by browsing to one of the HTTP IPFS proxies like ipfs.io or w3s.link.
Now it's time to run the script on Bacalhau:
export JOB_ID=$(bacalhau docker run \
--wait \
--id-only \
-i ipfs://QmQRVx3gXVLaRXywgwo8GCTQ63fHqWV88FiwEqCidmUGhk:/hello.R \
r-base \
-- Rscript hello.R)
bacalhau docker run
: call to Bacalhaui ipfs://QmQRVx3gXVLaRXywgwo8GCTQ63fHqWV88FiwEqCidmUGhk:/hello.R
: Mounting the uploaded dataset at/inputs
in the execution. It takes two arguments, the first is the IPFS CID (QmQRVx3gXVLaRXywgwo8GCTQ63fHqWV88FiwEqCidmUGhk
) and the second is file path within IPFS (/hello.R
)r-base
: docker official image we are usingRscript hello.R
: execute the R script
When a job is submitted, Bacalhau prints out the related job_id
. We store that in an environment variable so that we can reuse it later on:
The same job can be presented in the declarative format. In this case, the description will look like this:
name: Running a Simple R Script
type: batch
count: 1
tasks:
- name: My main task
Engine:
type: docker
params:
Image: r-base:latest
Entrypoint:
- /bin/bash
Parameters:
- -c
- Rscript /hello.R
InputSources:
- Target: "/"
Source:
Type: urlDownload
Params:
URL: https://raw.githubusercontent.com/bacalhau-project/examples/main/scripts/hello.R
Path: /hello.R
The job description should be saved in .yaml
format, e.g. rhello.yaml
, and then run with the command:
bacalhau job run rhello.yaml
Job status: You can check the status of the job using bacalhau job list
.
bacalhau job list --id-filter ${JOB_ID}
When it says Published
or Completed
, that means the job is done, and we can get the results.
Job information: You can find out more information about your job by using bacalhau job describe
.
bacalhau job describe ${JOB_ID}
Job download: You can download your job results directly by using bacalhau job get
. Alternatively, you can choose to create a directory to store your results. In the command below, we created a directory (results
) and downloaded our job output to be stored in that directory.
rm -rf results && mkdir results
bacalhau job get ${JOB_ID} --output-dir results
To view the file, run the following command:
cat results/stdout
You can generate the job request using bacalhau job describe
with the --spec
flag. This will allow you to re-run that job in the future:
bacalhau job describe ${JOB_ID} --spec > job.yaml
cat job.yaml
If you have questions or need support or guidance, please reach out to the Bacalhau team via Slack (#general channel).