diff --git a/docs/docs/setting-up/workload-onboarding/CUDA/index.md b/docs/docs/setting-up/workload-onboarding/CUDA/index.md
index 3998a37986..de5942fa18 100644
--- a/docs/docs/setting-up/workload-onboarding/CUDA/index.md
+++ b/docs/docs/setting-up/workload-onboarding/CUDA/index.md
@@ -5,8 +5,6 @@ sidebar_position: 10

# Run CUDA programs on Bacalhau

-[![stars - badge-generator](https://img.shields.io/github/stars/bacalhau-project/bacalhau?style=social)](https://github.com/bacalhau-project/bacalhau)
-
### What is CUDA

In this tutorial, we will look at how to run CUDA programs on Bacalhau. CUDA (Compute Unified Device Architecture) is an extension of C/C++ programming. It is a parallel computing platform and programming model created by NVIDIA. It helps developers speed up their applications by harnessing the power of GPU accelerators.

@@ -51,10 +49,8 @@ wget -P inputs https://raw.githubusercontent.com/tristanpenman/cuda-examples/mas

1. **`00-hello-world.cu`**:

```bash
-%%bash
-
# View the contents of the standard C++ program
-cat inputs/00-hello-world.cu
+!cat inputs/00-hello-world.cu

# Measure the time it takes to compile and run the program
%%timeit
@@ -66,8 +62,6 @@ This example represents a standard C++ program that inefficiently utilizes GPU r
2. **`02-cuda-hello-world-faster.cu`**:

```bash
-%%bash
-
# View the contents of the CUDA program with vector addition
!cat inputs/02-cuda-hello-world-faster.cu

@@ -116,6 +110,10 @@ Note that there is `;` between the commands:
`./outputs/hello`: Execution hello binary: You can combine compilation and execution commands.

+:::info
+Note that the CUDA version will need to be compatible with the graphics card on the host machine.
+:::
+
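+As a quick sanity check, you can compare the CUDA toolkit version used for compilation against the maximum CUDA version the driver supports. This is only a sketch; it assumes the NVIDIA driver and CUDA toolkit are installed on the machine that will actually run the job:
+
+```bash
+%%bash
+# Driver version and the highest CUDA version it supports
+nvidia-smi
+# Toolkit (nvcc) version used to compile the programs
+nvcc --version
+```
+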
When a job is submitted, Bacalhau prints out the related `job_id`. We store that in an environment variable so that we can reuse it later on:

```python
diff --git a/docs/docs/setting-up/workload-onboarding/Sparkov-Data-Generation/index.md b/docs/docs/setting-up/workload-onboarding/Sparkov-Data-Generation/index.md
index 0a6b68fafe..4b8f67ab2e 100644
--- a/docs/docs/setting-up/workload-onboarding/Sparkov-Data-Generation/index.md
+++ b/docs/docs/setting-up/workload-onboarding/Sparkov-Data-Generation/index.md
@@ -5,26 +5,21 @@ sidebar_position: 11

# Generate Synthetic Data using Sparkov Data Generation technique

-[![stars - badge-generator](https://img.shields.io/github/stars/bacalhau-project/bacalhau?style=social)](https://github.com/bacalhau-project/bacalhau)
-
## Introduction

-A synthetic dataset is generated by algorithms or simulations which has similar characteristics to real-world data. Collecting real-world data, especially data that contains sensitive user data like credit card information, is not possible due to security and privacy concerns. If a data scientist needs to train a model to detect credit fraud they can use synthetically generated data instead of using real data without compromising the privacy of users.
+A synthetic dataset is generated by algorithms or simulations, and it has characteristics similar to real-world data. Collecting real-world data, especially data that contains sensitive user information such as credit card details, is often not possible due to security and privacy concerns. If a data scientist needs to train a model to detect credit fraud, they can use synthetically generated data instead of real data without compromising the privacy of users.

The advantage of using Bacalhau is that you can generate terabytes of synthetic data without having to install any dependencies or store the data locally.

-In this example, we will generate synthetic credit card transaction data using the Sparkov program and store the results in IPFS.
-
-## TD;LR
-Run Bacalhau on a synthetic dataset.
+In this example, we will learn how to run Bacalhau on a synthetic dataset. We will generate synthetic credit card transaction data using the Sparkov program and store the results in IPFS.

-## Prerequisite
+### Prerequisite

-To get started, you need to install the Bacalhau client, see more information [here](https://docs.bacalhau.org/getting-started/installation)
+To get started, you need to install the Bacalhau client, see more information [here](../../../getting-started/installation.md)

-## Running Sparkov Locally​
+## 1. Running Sparkov Locally​

-To run Sparkov locally, you'll need to clone the repo and install dependencies.
+To run Sparkov locally, you'll need to clone the repo and install dependencies:

@@ -34,14 +29,14 @@ git clone https://github.com/js-ts/Sparkov_Data_Generation/
pip3 install -r Sparkov_Data_Generation/requirements.txt
```

-Go to the Sparkov_Data_Generation directory
+Go to the `Sparkov_Data_Generation` directory:

```python
%cd Sparkov_Data_Generation
```

-Creating a temporary directory to store the outputs
+Create a temporary directory (`outputs`) to store the outputs:

```bash
@@ -49,25 +44,25 @@ Creating a temporary directory to store the outputs
mkdir ../outputs
```

-## Running the script
-
-After the repo image has been pushed to Docker Hub, we can now use the container for running on Bacalhau. To submit a job, run the following Bacalhau command:
-
+## 2. Running the script

```bash
%%bash
python3 datagen.py -n 1000 -o ../outputs "01-01-2022" "10-01-2022"
```

-Below are some of the parameters you need before running the script
+The command above executes the Python script `datagen.py`, passing the following arguments to it:

-- `-n`: Number of customers to generate
+`-n 1000`: Number of customers to generate

-- `-o`: path to store the outputs
+`-o ../outputs`: Path to store the outputs

-- `Start date`: "01-01-2022"
+`"01-01-2022"`: Start date
+
+`"10-01-2022"`: End date
+
+Thus, this command uses a Python script to generate synthetic credit card transaction data for the period from `01-01-2022` to `10-01-2022` and saves the results in the `../outputs` directory.
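+
+For a quick test run, you can generate a much smaller dataset first, for example fewer customers over a shorter period (hypothetical values; the flags are the same as above):
+
+```bash
+%%bash
+python3 datagen.py -n 100 -o ../outputs "01-01-2022" "01-02-2022"
+```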

-- `End date`: "10-01-2022"

To see the full list of options, use:

@@ -77,18 +72,14 @@ To see the full list of options, use:
python datagen.py -h
```

-## Containerize Script using Docker
+## 3. Containerize Script using Docker

-:::info
-You can skip this entirely and directly go to running on Bacalhau.
-:::
-
-If you want any additional dependencies to be installed along with DuckDB, you need to build your own container.
-
-To build your own docker container, create a `Dockerfile`, which contains instructions to build your DuckDB docker container.
+To build your own docker container, create a `Dockerfile`, which contains instructions to build your image:

```
+%%writefile Dockerfile
+
FROM python:3.8

RUN apt update && apt install git

@@ -100,6 +91,8 @@ WORKDIR /Sparkov_Data_Generation/

RUN pip3 install -r requirements.txt
```

+These commands specify how the image will be built and what extra requirements will be included. We use `python:3.8` as the base image, install `git`, clone the `Sparkov_Data_Generation` repository from GitHub, set the working directory inside the container to `/Sparkov_Data_Generation/`, and install Python dependencies listed in the `requirements.txt` file.
+
:::info
See more information on how to containerize your script/app [here](https://docs.docker.com/get-started/02_our_app/)
:::
@@ -107,24 +100,24 @@ See more information on how to containerize your script/app [here](https://docs.

### Build the container

-We will run `docker build` command to build the container;
+We will run the `docker build` command to build the container:

```
docker build -t <hub-user>/<repo-name>:<tag> .
```

-Before running the command replace;
+Before running the command, replace:

-- **hub-user** with your docker hub username, If you don’t have a docker hub account [follow these instructions to create Docker account](https://docs.docker.com/docker-id/), and use the username of the account you created
+**`hub-user`** with your Docker Hub username. If you don’t have a Docker Hub account, [follow these instructions to create one](https://docs.docker.com/docker-id/), and use the username of the account you created

-- **repo-name** with the name of the container, you can name it anything you want
+**`repo-name`** with the name of the container; you can name it anything you want

-- **tag** this is not required but you can use the latest tag
+**`tag`** is not required, but you can use the `latest` tag

In our case:

```
-docker build -t jsacex/sparkov-data-generation
+docker build -t jsacex/sparkov-data-generation .
```

### Push the container
@@ -144,45 +137,40 @@ docker push jsacex/sparkov-data-generation

After the repo image has been pushed to Docker Hub, we can now use the container for running on Bacalhau

-## Running a Bacalhau Job
+## 4. Running a Bacalhau Job

-Now we're ready to run a Bacalhau job. This code runs a job, downloads the results, and prints the stdout.
-
-Copy and paste the following code to your terminal
+Now we're ready to run a Bacalhau job:

```bash
%%bash --out job_id
bacalhau docker run \
---id-only \
---wait \
-jsacex/sparkov-data-generation \
--- python3 datagen.py -n 1000 -o ../outputs "01-01-2022" "10-01-2022"
+    --id-only \
+    --wait \
+    jsacex/sparkov-data-generation \
+    -- python3 datagen.py -n 1000 -o ../outputs "01-01-2022" "10-01-2022"
```

-### Structure of the command
-
-Let's look closely at the command above:
+### Structure of the command

-* `bacalhau docker run`: call to bacalhau
+`bacalhau docker run`: call to Bacalhau
+
+`--id-only`: output only the job id
+
+`--wait`: wait for the job to finish

-* `jsacex/sparkov-data-generation`: the name and the tag of the docker image we are using
+`jsacex/sparkov-data-generation`: the name of the docker image we are using

-* `-o ../outputs "01-01-2022" "10-01-2022`: path to store the outputs, start date and end-date.
+`-- python3 datagen.py -n 1000 -o ../outputs "01-01-2022" "10-01-2022"`: the arguments passed into the container, specifying the execution of the Python script `datagen.py` with specific parameters, such as the amount of data, output path, and time range.

-* `python3 datagen.py -n 1000`: execute Sparktov
-When a job is submitted, Bacalhau prints out the related `job_id`. We store that in an environment variable so that we can reuse it later on.
+When a job is submitted, Bacalhau prints out the related `job_id`. We store that in an environment variable so that we can reuse it later on:

```python
%env JOB_ID={job_id}
```

-## Checking the State of your Jobs
+## 5. Checking the State of your Jobs

-- **Job status**: You can check the status of the job using `bacalhau list`.
+**Job status**: You can check the status of the job using `bacalhau list`.

```bash
@@ -190,9 +178,9 @@ When a job is submitted, Bacalhau prints out the related `job_id`. We store that
bacalhau list --id-filter ${JOB_ID}
```

-When it says `Completed`, that means the job is done, and we can get the results.
+When it says `Published` or `Completed`, that means the job is done, and we can get the results.

-- **Job information**: You can find out more information about your job by using `bacalhau describe`.
+**Job information**: You can find out more information about your job by using `bacalhau describe`.

```bash
@@ -201,23 +189,25 @@ When it says `Completed`, that means the job is done, and we can get the results
bacalhau describe ${JOB_ID}
```

-- **Job download**: You can download your job results directly by using `bacalhau get`. Alternatively, you can choose to create a directory to store your results. In the command below, we created a directory and downloaded our job output to be stored in that directory.
+**Job download**: You can download your job results directly by using `bacalhau get`. Alternatively, you can choose to create a directory to store your results. In the command below, we created a directory (`results`) and downloaded our job output to be stored in that directory.

```bash
%%bash
rm -rf results && mkdir -p results
-bacalhau get $JOB_ID --output-dir results
+bacalhau get ${JOB_ID} --output-dir results
```

-After the download has finished you should see the following contents in the results directory
-## Viewing your Job Output
+## 6. Viewing your Job Output

-To view the file, run the following command:
+To view the contents of the `results/outputs` directory, run the following command:

```bash
%%bash
-ls results/outputs # list the contents of the current directory
+ls results/outputs
```
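+
+To take a quick look at the generated data itself, you can print the first lines of each output file (a sketch; the exact file names depend on the run):
+
+```bash
+%%bash
+head -n 3 results/outputs/*
+```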
+
+## Support
+If you have questions or need support or guidance, please reach out to the [Bacalhau team via Slack](https://bacalhauproject.slack.com/ssb/redirect) (**#general** channel).
\ No newline at end of file
diff --git a/docs/docs/setting-up/workload-onboarding/bacalhau-docker-image/index.md b/docs/docs/setting-up/workload-onboarding/bacalhau-docker-image/index.md
index 15b88221c8..5b0eedb353 100644
--- a/docs/docs/setting-up/workload-onboarding/bacalhau-docker-image/index.md
+++ b/docs/docs/setting-up/workload-onboarding/bacalhau-docker-image/index.md
@@ -5,16 +5,13 @@ description: How to use the Bacalhau Docker image
---

# Bacalhau Docker Image
-
-[![stars - badge-generator](https://img.shields.io/github/stars/bacalhau-project/bacalhau?style=social)](https://github.com/bacalhau-project/bacalhau)
-
This documentation explains how to use the Bacalhau Docker image to run tasks and manage them using the Bacalhau client.

## Prerequisites

To get started, you need to install the Bacalhau client (see more information [here](../../../getting-started/installation.md)) and Docker.

-## 1. Pull the Docker image
+## 1. Pull the Bacalhau Docker image

The first step is to pull the Bacalhau Docker image from the [Github container registry](https://github.com/orgs/bacalhau-project/packages/container/package/bacalhau).

@@ -55,27 +52,32 @@ v1.2.0 v1.2.0

## 3. Running a Bacalhau Job

-To submit a job to Bacalhau, we use the `bacalhau docker run` command:
+In the example below, an Ubuntu-based job runs and prints the message "Hello from Docker Bacalhau!":

```shell
docker run -t ghcr.io/bacalhau-project/bacalhau:latest \
    docker run \
    --id-only \
    --wait \
-    ubuntu:latest -- \
-    sh -c 'uname -a && echo "Hello from Docker Bacalhau!"'
+    ubuntu:latest \
+    -- sh -c 'uname -a && echo "Hello from Docker Bacalhau!"'
```
-In this example, an Ubuntu-based job runs, prints the `Hello from Docker Bacalhau` message, then exits.

### Structure of the command

`ghcr.io/bacalhau-project/bacalhau:latest `: Name of the Bacalhau Docker image

-`--id-only......`: Output only the job id
+`--id-only`: Output only the job id
+
+`--wait`: Wait for the job to finish

`ubuntu:latest.` Ubuntu container

+`--`: Separate Bacalhau parameters from the command to be executed inside the container
+
+`sh -c 'uname -a && echo "Hello from Docker Bacalhau!"'`: The command executed inside the container
+
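+Because the client itself runs inside the container, any other Bacalhau subcommand can be invoked the same way. For example, a sketch of checking a job's status (it assumes you captured the job id printed by the command above):
+
+```shell
+docker run -t ghcr.io/bacalhau-project/bacalhau:latest \
+    list --id-filter <job_id>
+```
+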
Let's have a look at the command execution in the terminal:

```shell
diff --git a/docs/docs/setting-up/workload-onboarding/custom-containers/index.md b/docs/docs/setting-up/workload-onboarding/custom-containers/index.md
index 70f4a63ae1..0c614e48c3 100644
--- a/docs/docs/setting-up/workload-onboarding/custom-containers/index.md
+++ b/docs/docs/setting-up/workload-onboarding/custom-containers/index.md
@@ -9,18 +9,16 @@ sidebar_position: 11

Bacalhau operates by executing jobs within containers. This example shows you how to build and use a custom docker container.

-## TD;LR
-Running Custom Containers in Bacalhau
+### Prerequisites

-## Prerequisite
+1. To get started, you need to install the Bacalhau client, see more information [here](../../../getting-started/installation.md)
+2. This example requires Docker. If you don't have Docker installed, you can install it from [here](https://docs.docker.com/install/). Docker commands will not work on hosted notebooks like Google Colab, but the Bacalhau commands will.

-- To get started, you need to install the Bacalhau client, see more information [here](https://docs.bacalhau.org/getting-started/installation)
-- This example requires Docker. If you don't have Docker installed, you can install it from [here](https://docs.docker.com/install/). Docker commands will not work on hosted notebooks like Google Colab, but the Bacalhau commands will.
+## 1. Running Containers

-## Running Containers in Bacalhau
-
-You're probably used to running docker commands to run a container.
+### Docker Command
+You're likely familiar with executing Docker commands to start a container:

```bash
@@ -28,23 +26,50 @@ You're probably used to running docker commands to run a container.
docker run docker/whalesay cowsay sup old fashioned container run
```

-Bacalhau uses a syntax that is similar to docker and you can use the same containers. The main difference is that input and output data is passed to the container via IPFS, to enable planetary scale. In this example, it doesn't make too much difference except that we need to download the stdout.
-
-The `--wait` flag tells Bacalhau to wait for the job to finish before returning. This is useful in interactive sessions like this, but you would normally allow jobs to complete in the background and use the `list` command to check on their status.
-
-Another difference is that by default Bacalhau overwrites the default entry point for the container so you have to pass all shell commands as arguments to the `run` command after the `--` flag.
-
-### Running a Bacalhau Job
+This command runs a container from the `docker/whalesay` image.
+The container executes the `cowsay sup old fashioned container run` command:
+
+```shell
+Expected output:
+_________________________________
+< sup old fashioned container run >
+ ---------------------------------
+    \
+     \
+      \
+                    ##         .
+              ## ## ##        ==
+           ## ## ## ##       ===
+       /""""""""""""""""___/ ===
+  ~~~ {~~ ~~~~ ~~~ ~~~~ ~~ ~ /  ===- ~~~
+       \______ o          __/
+        \    \        __/
+          \____\______/
+```
+### Bacalhau Command

```bash
%%bash --out job_id
bacalhau docker run --wait --id-only docker/whalesay -- bash -c 'cowsay hello web3 uber-run'
```

-When a job is submitted, Bacalhau prints out the related `job_id`. We store that in an environment variable so that we can reuse it later on.
+This command also runs a container from the `docker/whalesay` image, using Bacalhau. We use the `bacalhau docker run` command to start a job in a Docker container.
+It contains additional flags such as `--wait` to wait for job completion and `--id-only` to return only the job identifier.
+Inside the container, the `bash -c 'cowsay hello web3 uber-run'` command is executed.
+
+When a job is submitted, Bacalhau prints out the related `job_id` (in this example, `7e41b9b9-a9e2-4866-9fce-17020d8ec9e0`):
+
+```shell
+7e41b9b9-a9e2-4866-9fce-17020d8ec9e0
+```
+We store that in an environment variable so that we can reuse it later on.

-You can download your job results directly by using `bacalhau get`. Alternatively, you can choose to create a directory to store your results. In the command below, we created a directory and downloaded our job output to be stored in that directory.
+```python
+%env JOB_ID={job_id}
+```
+
+You can download your job results directly by using `bacalhau get`. Alternatively, you can choose to create a directory to store your results. In the command below, we created a directory (`results`) and downloaded our job output to be stored in that directory.

@@ -59,13 +84,42 @@ Viewing your job output

```bash
%%bash
cat ./results/stdout
+
+Expected output:
+
+ _____________________
+< hello web3 uber-run >
+ ---------------------
+    \
+     \
+      \
+                    ##         .
+              ## ## ##        ==
+           ## ## ## ##       ===
+       /""""""""""""""""___/ ===
+  ~~~ {~~ ~~~~ ~~~ ~~~~ ~~ ~ /  ===- ~~~
+       \______ o          __/
+        \    \        __/
+          \____\______/
```

-## Building Your Own Custom Container For Bacalhau
+Both commands execute cowsay in the `docker/whalesay` container, but Bacalhau provides additional features for working with jobs at scale.
+
+### Bacalhau Syntax
+Bacalhau uses a syntax that is similar to Docker, and you can use the same containers. The main difference is that input and output data is passed to the container via IPFS, to enable planetary scale. In the example above, it doesn't make too much difference except that we need to download the stdout.
+
+The `--wait` flag tells Bacalhau to wait for the job to finish before returning. This is useful in interactive sessions like this, but you would normally allow jobs to complete in the background and use the `bacalhau list` command to check on their status.
+
+Another difference is that by default Bacalhau overwrites the default entry point for the container, so you have to pass all shell commands as arguments to the `run` command after the `--` flag.
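+
+For instance, everything after `--` becomes the command line inside the container, regardless of the image's default entrypoint (a hypothetical illustration):
+
+```bash
+%%bash
+bacalhau docker run ubuntu:latest -- echo "entrypoint overridden"
+```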
+
+## 2. Building Your Own Custom Container For Bacalhau

To use your own custom container, you must publish the container to a container registry that is accessible from the Bacalhau network. At this time, only public container registries are supported.

-To demonstrate this, you will develop and build a simple custom container that comes from an old docker example. I remember seeing cowsay at a Docker conference about a decade ago. I think it's about time we brought it back to life and distribute it across the Bacalhau network.
+To demonstrate this, you will develop and build a simple custom container that comes from an old Docker example. I remember seeing cowsay at a Docker conference about a decade ago. I think it's about time we brought it back to life and distributed it across the Bacalhau network.

```python
@@ -118,7 +172,7 @@ docker build -t ghcr.io/bacalhau-project/examples/codsay:latest . 2> /dev/null
docker run --rm ghcr.io/bacalhau-project/examples/codsay:latest codsay I like swimming in data
```

-Once your container is working as expected then you should push it to a public container registry. In this example, I'm pushing to Github's container registry, but we'll skip the step below because you probably don't have permission.Remember that the Bacalhau nodes expect your container to have a `linux/amd64` architecture.
+Once your container is working as expected, you should push it to a public container registry. In this example, I'm pushing to Github's container registry, but we'll skip the step below because you probably don't have permission. Remember that the Bacalhau nodes expect your container to have a `linux/amd64` architecture.

```bash
@@ -126,7 +180,7 @@ Once your container is working as expected then you should push it to a public c
# docker buildx build --platform linux/amd64,linux/arm64 --push -t ghcr.io/bacalhau-project/examples/codsay:latest .
```

-## Running Your Custom Container on Bacalhau
+## 3. Running Your Custom Container on Bacalhau

Now we're ready to submit a Bacalhau job using your custom container. This code runs a job, downloads the results, and prints the stdout.

@@ -161,4 +215,29 @@ View your job output

```bash
%%bash
cat ./results/stdout
+
+Expected output:
+
+_______________________
+< Look at all this data >
+ -----------------------
+    \
+     \
+      ,,,,_
+   ┌Φ▓╬▓╬▓▓▓W      @▓▓▒,
+  ╠▓╬▓╬╣╬╬▓╬▓▓   ╔╣╬╬▓╬╣▓,
+ __,┌╓═╠╬╠╬╬╬Ñ╬╬╬Ñ╬╬¼,╣╬╬▓╬╬▓╬▓▓▓┐        ╔W_             ,φ▓▓
+,«@▒╠╠╠╠╩╚╙╙╩Ü╚╚╚╚╩╙╙╚╠╩╚╚╟▓▒╠╠╫╣╬╬╫╬╣▓,   _φ╬▓╬╬▓,   ,φ╣▓▓╬╬
+_,φÆ╩╬╩╙╚╩░╙╙░░╩`=░╙╚»»╦░=╓╙Ü1R░│░╚Ü░╙╙╚╠╠╠╣╣╬≡Φ╬▀╬╣╬╬▓▓▓_  ╓▄▓▓▓▓▓▓╬▌
+_,φ╬Ñ╩▌▐█[▒░░░░R░░▀░`,_`!R`````╙`-'╚Ü░░Ü░░░░░░░│││░╚╚╙╚╩╩╩╣Ñ╩╠▒▒╩╩▀▓▓╣▓▓╬╠▌
+'╚╩Ü╙│░░╙Ö▒Ü░░░H░░R ▒¥╣╣@@@▓▓▓  := '`    `░``````````````````````````]▓▓▓╬╬╠H
+ '¬═▄ `░╙Ü░╠DjK` Å»»╙╣▓▓▓▓╬Ñ  -»`       -`      `  ,;╓▄╔╗∞  ~▓▓▓▀▓▓╬╬╬▌
+     '^^^`   _╒Γ   `╙▀▓▓╨                     _, ⁿD╣▓╬╣▓╬▓╜      ╙╬▓▓╬╬▓▓
+          ```└                           _╓▄@▓▓▓╜         `╝╬▓▓╙
+   %φ▄╓_             ~#▓╠▓▒╬▓╬▓▓^        ` ╙╙
+    `╣▓▓▓              ╠╬▓╬▓╬▀`
+      ╚▓▌               '╨▀╜
+```
+
+## Support
+If you have questions or need support or guidance, please reach out to the [Bacalhau team via Slack](https://bacalhauproject.slack.com/ssb/redirect) (**#general** channel).
\ No newline at end of file
diff --git a/docs/docs/setting-up/workload-onboarding/python-script/index.md b/docs/docs/setting-up/workload-onboarding/python-script/index.md
index 946db45b32..a9521b1ae6 100644
--- a/docs/docs/setting-up/workload-onboarding/python-script/index.md
+++ b/docs/docs/setting-up/workload-onboarding/python-script/index.md
@@ -1,24 +1,17 @@
# Scripting Bacalhau with Python
-
-[![stars - badge-generator](https://img.shields.io/github/stars/bacalhau-project/bacalhau?style=social)](https://github.com/bacalhau-project/bacalhau)
-
Bacalhau allows you to easily execute batch jobs via the CLI. But sometimes you need to do more than that. You might need to execute a script that requires user input, or you might need to execute a script that requires a lot of parameters. In any case, you probably want to execute your jobs in a repeatable manner.

This example demonstrates a simple Python script that is able to orchestrate the execution of lots of jobs in a repeatable manner.

-## TD;LR
-Running Python script in Bacalhau
-
-## Prerequisite
+### Prerequisite

-To get started, you need to install the Bacalhau client, see more information [here](https://docs.bacalhau.org/getting-started/installation)
+To get started, you need to install the Bacalhau client, see more information [here](../../../getting-started/installation.md)

## Executing Bacalhau Jobs with Python Scripts

-To demonstrate this example, I will use the data generated from a Ethereum example. This produced a list of hashes that I will iterate over and execute a job for each one.
-
+To demonstrate this example, I will use the data generated from an Ethereum example. This produced a list of hashes that I will iterate over and execute a job for each one.

```python
%%writefile hashes.txt
@@ -30,8 +23,7 @@ bafybeih6te26iwf5kzzby2wqp67m7a5pmwilwzaciii3zipvhy64utikre
bafybeicjd4545xph6rcyoc74wvzxyaz2vftapap64iqsp5ky6nz3f5yndm
```

-Now let's run the following script. You can execute this script anywhere with `python bacalhau.py`.
-
+Now let's create a file called `bacalhau.py`. The script below automates the submission, monitoring, and retrieval of results for multiple Bacalhau jobs in parallel. It is designed for a scenario where there is a file of hashes, each representing a job, and the script manages the execution of these jobs using Bacalhau commands.

```python
%%writefile bacalhau.py
@@ -165,9 +157,9 @@ if __name__ == "__main__":
```

This code has a few interesting features:
-* Change the value in the `main` call to change the number of jobs to execute
-* Because all jobs are complete at different times, there's a loop to check that all jobs have been completed before downloading the results -- if you don't do this you'll likely see an error when trying to download the results
-* When downloading the results, the IPFS get often times out, so I wrapped that in a loop
+1. Change the value in the `main` call (`main("hashes.txt", 10)`) to change the number of jobs to execute.
+2. Because jobs complete at different times, there's a loop to check that all jobs have been completed before downloading the results. If you don't do this, you'll likely see an error when trying to download the results. The `while True` loop is used to monitor the status of jobs and wait for them to complete.
+3. When downloading the results, the IPFS get often times out, so I wrapped that in a loop. The `for i in range(0, 5)` loop in the `getResultsFromJob` function retries the `bacalhau get` operation if it fails to complete successfully, as sketched below.
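+
+The same retry pattern, sketched directly in the shell (hypothetical; it assumes `JOB_ID` is set in the environment):
+
+```bash
+%%bash
+for i in 1 2 3 4 5; do
+    bacalhau get ${JOB_ID} --output-dir results && break
+done
+```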

Let's run it!

@@ -177,18 +169,28 @@ Let's run it!
python bacalhau.py
```

-Hopefully, the results directory contains all the combined results from the jobs we just executed. Here's we're expecting to see CSV files:
+Hopefully, the `results` directory contains all the combined results from the jobs we just executed. Here we're expecting to see CSV files:

```bash
%%bash
-ls -l results
+ls results
+
+Expected Output:
+transactions_00000000_00049999.csv transactions_00150000_00199999.csv
+transactions_00050000_00099999.csv transactions_00200000_00249999.csv
+transactions_00100000_00149999.csv transactions_00250000_00299999.csv
+
```

Success! We've now executed a bunch of jobs in parallel using Python. This is a great way to execute lots of jobs in a repeatable manner. You can alter the file above for your purposes.

-### Next Steps
+## Next Steps

You might also be interested in the following examples:

-* [Analysing Data with Python Pandas](../python-pandas/index.md)
+[Analysing Data with Python Pandas](../python-pandas/index.md)
+
+## Support
+If you have questions or need support or guidance, please reach out to the [Bacalhau team via Slack](https://bacalhauproject.slack.com/ssb/redirect) (**#general** channel).
diff --git a/docs/docs/setting-up/workload-onboarding/rust-wasm/index.md b/docs/docs/setting-up/workload-onboarding/rust-wasm/index.md
index e87616e896..d54169f740 100644
--- a/docs/docs/setting-up/workload-onboarding/rust-wasm/index.md
+++ b/docs/docs/setting-up/workload-onboarding/rust-wasm/index.md
@@ -7,21 +7,17 @@ sidebar_position: 10

[![stars - badge-generator](https://img.shields.io/github/stars/bacalhau-project/bacalhau?style=social)](https://github.com/bacalhau-project/bacalhau)

-Bacalhau supports running jobs as a [WebAssembly (WASM)](https://webassembly.org/) program rather than using a Docker container. This example demonstrates how to compile a [Rust](https://www.rust-lang.org/) project into WebAssembly and run the program on Bacalhau.
-
-## TD;LR
-Run WASM job on Bacalhau
+Bacalhau supports running jobs as a [WebAssembly (WASM)](https://webassembly.org/) program. This example demonstrates how to compile a [Rust](https://www.rust-lang.org/) project into WebAssembly and run the program on Bacalhau.

### Prerequisites

-* To get started, you need to install the Bacalhau client, see more information [here](https://docs.bacalhau.org/getting-started/installation)
-* A working Rust installation with the `wasm32-wasi` target. For example, you can use [`rustup`](https://rustup.rs/) to install Rust and configure it to build WASM targets.
+1. To get started, you need to install the Bacalhau client, see more information [here](../../../getting-started/installation.md).

-For those using the notebook, these are installed in hidden cells below.
+2. A working Rust installation with the `wasm32-wasi` target. For example, you can use [`rustup`](https://rustup.rs/) to install Rust and configure it to build WASM targets. For those using the notebook, these are installed in hidden cells below.

-## Develop a Rust Program Locally
+## 1. Develop a Rust Program Locally

-We can use `cargo` (which will have been installed by `rustup`) to start a new project and compile it.
+We can use `cargo` (which will have been installed by `rustup`) to start a new project (`my-program`) and compile it:

```bash
@@ -29,9 +25,10 @@ We can use `cargo` (which will have been installed by `rustup`) to start a new p
cargo init my-program
```

+
We can then write a Rust program. Rust programs that run on Bacalhau can read and write files, access a simple clock, and make use of pseudo-random numbers. They cannot memory-map files or run code on multiple threads.

-The below program will make use of the Rust `imageproc` create to resize an image through seam carving, based on [an example from their repository](https://github.com/image-rs/imageproc/blob/master/examples/seam_carving.rs).
+The program below will use the Rust `imageproc` crate to resize an image through seam carving, based on [an example from their repository](https://github.com/image-rs/imageproc/blob/master/examples/seam_carving.rs).

```python
@@ -99,7 +96,9 @@ fn main() {
}
```

-We also need to install the `imageproc` and `image` libraries and switch off the default features to make sure that multi-threading is disabled.
+In the `main()` function, an image is loaded, the original is saved, and then a loop is performed to reduce the width of the image by removing "seams." The results of the process are saved, including the original image with drawn seams and a gradient image with highlighted seams.
+
+We also need to install the `imageproc` and `image` libraries and switch off the default features to make sure that multi-threading is disabled (`default-features = false`). After disabling the default features, you need to explicitly specify only the features that you need:

```python
@@ -119,20 +118,21 @@ version = "0.23.0"
default-features = false
```

-We can now build the Rust program into a WASM blob using `cargo`.
+We can now build the Rust program into a WASM blob using `cargo`:

```bash
%%bash
cd my-program && cargo build --target wasm32-wasi --release
```
+This command navigates to the `my-program` directory and builds the project using Cargo with the target set to `wasm32-wasi` in release mode.

-This will generate a WASM file at `./my-program/target/wasm32-wasi/my-program.wasm` which can now be run on Bacalhau.
+This will generate a WASM file at `./my-program/target/wasm32-wasi/release/my-program.wasm` which can now be run on Bacalhau.
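+
+To confirm that the build produced the binary, you can list it (an optional sanity check):
+
+```bash
+%%bash
+ls -lh my-program/target/wasm32-wasi/release/my-program.wasm
+```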

-## Running WASM on Bacalhau
+## 2. Running WASM on Bacalhau

Now that we have a WASM binary, we can upload it to IPFS and use it as input to a Bacalhau job.

-The -i switch allows specifying a URI to be mounted as a named volume in the job, which can be an IPFS CID, HTTP URL, or S3 object.
+The `-i` flag allows specifying a URI to be mounted as a named volume in the job, which can be an IPFS CID, HTTP URL, or S3 object.

For this example, we are using an image of the Statue of Liberty that has been pinned to a storage facility.

@@ -144,14 +144,35 @@ bacalhau wasm run ./my-program/target/wasm32-wasi/release/my-program.wasm _start
-i ipfs://bafybeifdpl6dw7atz6uealwjdklolvxrocavceorhb3eoq6y53cbtitbeu:/inputs
```

-We can now get the results. When we view the files, we can see the original image, the resulting shrunk image, and the seams that were removed.
+### Structure of the Command
+
+`bacalhau wasm run`: call to Bacalhau
+
+`./my-program/target/wasm32-wasi/release/my-program.wasm`: the path to the WASM file that will be executed
+
+`_start`: the entry point of the WASM program, where its execution begins
+
+`--id-only`: this flag indicates that only the identifier of the executed job should be returned
+
+`-i ipfs://bafybeifdpl6dw7atz6uealwjdklolvxrocavceorhb3eoq6y53cbtitbeu:/inputs`: input data volume that will be accessible within the job at the specified destination path
+
+When a job is submitted, Bacalhau prints out the related `job_id`. We store that in an environment variable so that we can reuse it later on:

```python
%env JOB_ID={job_id}
```

+You can download your job results directly by using `bacalhau get`. Alternatively, you can choose to create a directory to store your results. In the command below, we created a directory (`wasm_results`) and downloaded our job output to be stored in that directory.
+
```bash
%%bash
rm -rf wasm_results && mkdir -p wasm_results
@@ -160,6 +181,7 @@ bacalhau get ${JOB_ID} --output-dir wasm_results

## Viewing Job Output

+When we view the files, we can see the original image, the resulting shrunk image, and the seams that were removed.

```python
import IPython.display as display
@@ -201,4 +223,5 @@ display.Image("./wasm_results/outputs/shrunk.png")

![png](index_files/index_20_0.png)

-
+## Support
+If you have questions or need support or guidance, please reach out to the [Bacalhau team via Slack](https://bacalhauproject.slack.com/ssb/redirect) (**#general** channel).
diff --git a/docs/docs/setting-up/workload-onboarding/trivial-python/index.md b/docs/docs/setting-up/workload-onboarding/trivial-python/index.md
index dd45ae180d..97fb6743a5 100644
--- a/docs/docs/setting-up/workload-onboarding/trivial-python/index.md
+++ b/docs/docs/setting-up/workload-onboarding/trivial-python/index.md
@@ -8,63 +8,88 @@ description: How to run a Python file hosted on Bacalhau

[![stars - badge-generator](https://img.shields.io/github/stars/bacalhau-project/bacalhau?style=social)](https://github.com/bacalhau-project/bacalhau)

-This example tutorial serves as an introduction to Bacalhau. Here, you'll be running a Python file hosted on a website on Bacalhau.
+This tutorial serves as an introduction to Bacalhau. In this example, you'll be executing a simple "Hello, World!" Python script hosted on a website on Bacalhau.

-## TD;LR
-A quick guide on how to run a hello world script on Bacalhau

### Prerequisites

-To get started, you need to install the Bacalhau client, see more information [here](https://docs.bacalhau.org/getting-started/installation)
+To get started, you need to install the Bacalhau client, see more information [here](../../../getting-started/installation.md)

-## Creating a Hello World File
+## 1. Running Python Locally

-We'll be using a very simple Python script that displays the [traditional first greeting](https://en.wikipedia.org/wiki/%22Hello,_World!%22_program).
+We'll be using a very simple Python script that displays the [traditional first greeting](https://en.wikipedia.org/wiki/%22Hello,_World!%22_program). Create a file called `hello-world.py`:
+
+```python
+%%writefile hello-world.py
+print("Hello, world!")
+```

```python
%cat hello-world.py
```

-## Submit the workload
+Run the script to print out the output:
+
+```bash
+%%bash
+python3 hello-world.py
+```
+After the script has run successfully locally, we can now run it on Bacalhau.
+
+## 2. Running a Bacalhau Job

-To submit a workload to Bacalhau you can use the `bacalhau docker run` command.
+To submit a workload to Bacalhau, you can use the `bacalhau docker run` command. This command allows passing input data into the container using [content identifier (CID)](https://github.com/multiformats/cid) volumes; we will be using the `--input URL:path` [argument](../../../dev/cli-reference/all-flags.md#docker-run) for simplicity. This results in Bacalhau mounting a *data volume* inside the container. By default, Bacalhau mounts the input volume at the path `/inputs` inside the container.
+
+:::info
+[Bacalhau overwrites the default entrypoint](https://github.com/filecoin-project/bacalhau/blob/v0.2.3/cmd/bacalhau/docker_run.go#L64), so we must run the full command after the `--` argument.
+:::

```bash
%%bash --out job_id
bacalhau docker run \
-    --id-only \
-    --input https://raw.githubusercontent.com/bacalhau-project/examples/151eebe895151edd83468e3d8b546612bf96cd05/workload-onboarding/trivial-python/hello-world.py \
-    python:3.10-slim -- python3 /inputs/hello-world.py
+    --id-only \
+    --input https://raw.githubusercontent.com/bacalhau-project/examples/151eebe895151edd83468e3d8b546612bf96cd05/workload-onboarding/trivial-python/hello-world.py \
+    python:3.10-slim \
+    -- python3 /inputs/hello-world.py
```

-When a job is submitted, Bacalhau prints out the related `job_id`. We store that in an environment variable so that we can reuse it later on.
+### Structure of the command
+
+`bacalhau docker run`: call to Bacalhau
+
+`--id-only`: specifies that only the job identifier (`job_id`) will be returned after executing the container, not the entire output
+
+`--input https://raw.githubusercontent.com/bacalhau-project/examples/151eebe895151edd83468e3d8b546612bf96cd05/workload-onboarding/trivial-python/hello-world.py`: indicates where to get the input data for the container. In this case, the input data is downloaded from the specified URL, which represents the Python script `hello-world.py`.
+
+`python:3.10-slim`: the Docker image that will be used to run the container. In this case, it uses the Python 3.10 image with a minimal set of components (slim).
+
+`--`: this double dash is used to separate the Bacalhau command options from the command that will be executed inside the Docker container.
+
+`python3 /inputs/hello-world.py`: running the `hello-world.py` Python script stored in `/inputs`.
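+
+As a variant, the same greeting could be produced without mounting any input at all by passing the program inline (a hypothetical alternative, not part of this tutorial's flow):
+
+```bash
+%%bash
+bacalhau docker run python:3.10-slim -- python3 -c 'print("Hello, world!")'
+```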
+
+When a job is submitted, Bacalhau prints out the related `job_id`. We store that in an environment variable so that we can reuse it later on:

```python
%env JOB_ID={job_id}
```

-The `bacalhau docker run` command allows passing input data into the container using [content identifier (CID)](https://github.com/multiformats/cid) volumes, we will be using the `-i URL:path` [argument](https://docs.bacalhau.org/all-flags#docker-run) for simplicity. This results in Bacalhau mounting a *data volume* inside the container. By default, Bacalhau mounts the input volume at the path `/inputs` inside the container.

-:::info
-[Bacalhau overwrites the default entrypoint](https://github.com/filecoin-project/bacalhau/blob/v0.2.3/cmd/bacalhau/docker_run.go#L64), so we must run the full command after the `--` argument.
-:::
-
-## Checking the State of your Jobs
-
-- **Job status**: You can check the status of the job using `bacalhau list`.
+## 3. Checking the State of your Jobs
+
+**Job status**: You can check the status of the job using `bacalhau list`.

```bash
%%bash
-bacalhau list --id-filter=${JOB_ID} --no-style
+bacalhau list --id-filter ${JOB_ID} --no-style
```

When it says `Published` or `Completed`, that means the job is done, and we can get the results.

-- **Job information**: You can find out more information about your job by using `bacalhau describe`.
+**Job information**: You can find out more information about your job by using `bacalhau describe`.

```bash
@@ -72,7 +97,7 @@ When it says `Published` or `Completed`, that means the job is done, and we can
bacalhau describe ${JOB_ID}
```

-- **Job download**: You can download your job results directly by using `bacalhau get`. Alternatively, you can choose to create a directory to store your results. In the command below, we created a directory and downloaded our job output to be stored in that directory.
+**Job download**: You can download your job results directly by using `bacalhau get`. Alternatively, you can choose to create a directory to store your results. In the command below, we created a directory (`results`) and downloaded our job output to be stored in that directory.

```bash
@@ -81,18 +106,15 @@ When it says `Published` or `Completed`, that means the job is done, and we can
rm -rf results && mkdir results
bacalhau get ${JOB_ID} --output-dir results
```

-## Viewing your Job Output
+## 4. Viewing your Job Output

To view the file, run the following command:

```bash
-%%bash
+%%bash
cat results/stdout
-
```

-## Need Support?
-
-For questions, and feedback, please reach out to our [forum](https://github.com/filecoin-project/bacalhau/discussions)
+## Support
+If you have questions or need support or guidance, please reach out to the [Bacalhau team via Slack](https://bacalhauproject.slack.com/ssb/redirect) (**#general** channel).