docs(dask): add initial dask docs for users and admins (#212)
Alputer committed Dec 2, 2024
1 parent 761c23e commit e9ec027
Showing 6 changed files with 178 additions and 0 deletions.
1 change: 1 addition & 0 deletions AUTHORS.md
@@ -4,6 +4,7 @@ The list of contributors in alphabetical order:

- [Adelina Lintuluoto](https://orcid.org/0000-0002-0726-1452)
- [Agisilaos Kounelis](https://orcid.org/0000-0001-9312-3189)
- [Alp Tuna](https://orcid.org/0009-0001-1915-3993)
- [Audrius Mecionis](https://orcid.org/0000-0002-3759-1663)
- [Clemens Lange](https://orcid.org/0000-0002-3632-3157)
- [Daan Rosendal](https://orcid.org/0000-0002-3447-9000)
87 changes: 87 additions & 0 deletions docs/administration/configuration/configuring-dask/index.md
@@ -0,0 +1,87 @@
# Configuring Dask

Dask integration in REANA allows users to request a dedicated Dask cluster for their workflow. Each cluster operates independently and provides the computational resources needed to execute the workflow efficiently.

## Enabling Dask

Dask support is disabled by default in REANA. If you would like to enable it so that users can request a Dask cluster for their workflows, set the [`dask.enabled`](https://github.com/reanahub/reana/tree/master/helm/reana) Helm value to `true`.
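
For example, in your Helm values (a minimal sketch):

```yaml
# values.yaml (sketch): allow users to request Dask clusters for their workflows
dask:
  enabled: true
```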

## Configuring Autoscaler

Each Dask cluster in REANA comes with an autoscaler by default. If you would like to disable the autoscaler feature, you can set the [`dask.autoscaler_enabled`](https://github.com/reanahub/reana/tree/master/helm/reana) Helm value to `false`.
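
For example, to keep Dask support enabled while turning off the autoscaler (a minimal sketch of the Helm values):

```yaml
# values.yaml (sketch): disable the per-cluster Dask autoscaler
dask:
  enabled: true
  autoscaler_enabled: false
```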

The autoscaler manages the Dask cluster for your workflow by scaling up to a maximum of N workers when needed and scaling down during less resource-intensive periods. You can define the number of workers (N) in your `reana.yaml` file or use the default worker count set for Dask clusters.

For more details on how the autoscaler works under the hood, you can check the [official Dask Kubernetes Operator autoscaler documentation](https://kubernetes.dask.org/en/latest/operator_resources.html#daskautoscaler).

## Limiting Cluster Memory

The maximum memory allocated for a Dask cluster can be configured using the [`dask.cluster_max_memory_limit`](https://github.com/reanahub/reana/tree/master/helm/reana) Helm value, which is set to `16Gi` by default. This setting defines the upper memory limit that users can request for a cluster, based on the combined memory usage of all workers.

For instance, if `dask.cluster_max_memory_limit` is set to `9Gi`, a user can request a cluster with 3 workers, each utilizing up to `3Gi` of memory. Any configuration exceeding this limit (e.g., 5 workers with `2Gi` each, totaling `10Gi`) will not be permitted.
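For example, the scenario above corresponds to a Helm configuration along these lines (a minimal sketch):

```yaml
# values.yaml (sketch): cap the total memory of any single user-requested Dask cluster at 9Gi,
# e.g. 3 workers with 3Gi each is accepted, while 5 workers with 2Gi each is rejected
dask:
  cluster_max_memory_limit: "9Gi"
```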

## Configuring Default and Maximum Number of Workers

When configuring Dask clusters in REANA, there are two important Helm values to control the number of workers in a cluster:

### [`dask.cluster_default_number_of_workers`](https://github.com/reanahub/reana/tree/master/helm/reana)

This value determines the default number of workers assigned to a Dask cluster if the user does not explicitly specify the `number_of_workers` field in their `reana.yaml` workflow configuration file. Setting this value ensures that all Dask clusters start with a reasonable number of workers to handle typical workloads without requiring user input.

For example:

```yaml
dask:
  cluster_default_number_of_workers: 3
```

In this case, every Dask cluster will start with 3 workers unless a different number is provided in the `reana.yaml` file.
If the cluster administrator does not override the `dask.cluster_default_number_of_workers` variable, it is set to `2` by default.

### [`dask.cluster_max_number_of_workers`](https://github.com/reanahub/reana/tree/master/helm/reana)

This value defines the upper limit on the number of workers a user can request in their `reana.yaml`, even if the total requested memory stays below the `dask.cluster_max_memory_limit`. It acts as a safeguard to prevent users from requesting an excessive number of workers with very low memory allocations (e.g., 100 workers with only 30Mi of memory each).

```yaml
dask:
  cluster_max_number_of_workers: 50
```

In this case, users can request up to 50 workers in their `reana.yaml`. Requests exceeding this maximum will not be accepted, regardless of the memory limits.

If the cluster administrator does not override the `dask.cluster_max_number_of_workers` variable, it is set to `20` by default.

## Configuring Default and Maximum Memory for Single Workers

In addition to managing the number of workers in a Dask cluster, it is crucial to configure memory limits for individual workers to ensure resource efficiency and prevent workloads from exceeding the capacity of your cluster nodes. REANA provides two Helm values for controlling the default and maximum memory allocation per worker:

### [`dask.cluster_default_single_worker_memory`](https://github.com/reanahub/reana/tree/master/helm/reana)

This value sets the default memory allocated to a single worker in a Dask cluster if the user does not specify the `single_worker_memory` field in their `reana.yaml` workflow configuration file.

For example:

```yaml
dask:
  cluster_default_single_worker_memory: "2Gi"
```

In this case, if the user does not explicitly set a memory limit for their workers, each worker in the Dask cluster will be allocated 2Gi of memory by default. This ensures a predictable baseline configuration for workflows.

If the cluster administrator does not override the `dask.cluster_default_single_worker_memory` variable, it is set to `2Gi` by default.

### [`dask.cluster_max_single_worker_memory`](https://github.com/reanahub/reana/tree/master/helm/reana)

This value defines the upper memory limit that a user can request for a single worker in their `reana.yaml`. It acts as a safeguard to prevent users from allocating more memory than the underlying Kubernetes nodes can handle, which could lead to scheduling failures.

For example:

```yaml
dask:
  cluster_max_single_worker_memory: "15Gi"
```

In this case, users can request up to 15Gi of memory for each worker. Any request exceeding this limit (e.g., 20Gi) will be rejected. This ensures that users cannot allocate more memory than is safe for the cluster's infrastructure.

If the cluster administrator does not override the `dask.cluster_max_single_worker_memory` variable, it is set to `8Gi` by default.
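
For reference, here is a consolidated sketch of the Dask-related Helm values covered on this page, shown with the defaults documented above (`dask.enabled` is switched on, since it defaults to `false`):

```yaml
# values.yaml (sketch): Dask-related Helm values with their documented defaults
dask:
  enabled: true                                 # Dask support is disabled by default
  autoscaler_enabled: true                      # per-cluster autoscaler, enabled by default
  cluster_max_memory_limit: "16Gi"              # total memory cap for a single Dask cluster
  cluster_default_number_of_workers: 2          # used when reana.yaml omits number_of_workers
  cluster_max_number_of_workers: 20             # upper bound on requested workers
  cluster_default_single_worker_memory: "2Gi"   # used when reana.yaml omits single_worker_memory
  cluster_max_single_worker_memory: "8Gi"       # upper bound on memory per worker
```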
1 change: 1 addition & 0 deletions docs/administration/configuration/index.md
@@ -12,3 +12,4 @@
- [Configuring TLS certificates](configuring-tls-certificates)
- [Configuring global workspace retention rules](configuring-global-workspace-retention-rules)
- [Configuring GitLab integration](configuring-gitlab-integration)
- [Configuring Dask](configuring-dask)
1 change: 1 addition & 0 deletions docs/administration/index.md
@@ -20,6 +20,7 @@
- [Configuring TLS certificates](configuration/configuring-tls-certificates)
- [Configuring global workspace retention rules](configuration/configuring-global-workspace-retention-rules)
- [Configuring GitLab integration](configuration/configuring-gitlab-integration)
- [Configuring Dask](configuration/configuring-dask)

**Managing REANA platform?**

84 changes: 84 additions & 0 deletions docs/advanced-usage/dask/index.md
@@ -0,0 +1,84 @@
# Dask

REANA supports the integration of Dask clusters to provide scalable, distributed computing capabilities for workflows. This documentation explains how to set up and configure a dedicated Dask cluster for your workflow, query cluster settings and limits, and use Dask features such as the dashboard for monitoring your workflow.

## Setting up a Dask cluster

To configure your Dask cluster, you can set three different variables via `reana.yaml`:

1. **`image` (mandatory)**
Specifies the Docker image to be used by the Dask workers, scheduler, and the job pod that executes your analysis. Ensure the image includes all dependencies required for your workflow.
2. **`number_of_workers` (optional)**
Defines the number of Dask workers for your cluster. If not specified, a default value configured by your REANA cluster administrator will be used. If you request more workers than the allowed maximum, the following error occurs:

```console
$ reana-client run
...
==> ERROR: Cannot create workflow :
The number of requested Dask workers (N) exceeds the maximum limit (M).
```

In such cases, reduce the number of workers and try again.

3. **`single_worker_memory` (optional)**
Sets the amount of memory allocated to each Dask worker. If not specified, a default value configured by your REANA cluster administrator will be used. Requests exceeding the maximum allowed memory per worker will also result in the following error:

```console
$ reana-client run
...
==> ERROR: Cannot create workflow :
The "single_worker_memory" provided in the dask resources exceeds the limit (8Gi).
```

An example configuration:

```yaml
# ...
resources:
  # ...
  dask:
    image: docker.io/coffeateam/coffea-dask-cc7:0.7.22-py3.10-g7f049
    number_of_workers: 3
    single_worker_memory: 4Gi
```

## Querying Dask Settings and Limits

REANA administrators configure certain settings and limits for Dask clusters, such as the maximum cluster memory limit and whether the autoscaler is enabled. You can query them with the `reana-client info` command.

```console
$ reana-client info
...
Dask autoscaler enabled in the cluster: True
The number of Dask workers created by default: 2
The amount of memory used by default by a single Dask worker: 2Gi
The maximum memory limit for Dask clusters created by users: 16Gi
The maximum number of workers that users can ask for the single Dask cluster: 20
The maximum amount of memory that users can ask for the single Dask worker: 8Gi
Dask workflows allowed in the cluster: True
...
```

## Usage

Once you configure your Dask cluster via `reana.yaml`, REANA runs your workflow in an environment where the scheduler's URI is exposed via an environment variable called `DASK_SCHEDULER_URI`. You can use this environment variable as follows:

```python
...
import os
from dask.distributed import Client

# Read the scheduler address provided by REANA and connect to the Dask cluster
DASK_SCHEDULER_URI = os.getenv("DASK_SCHEDULER_URI")
client = Client(DASK_SCHEDULER_URI)
...
```

You can also refer to the [reana-demo-dask-coffea](https://github.com/reanahub/reana-demo-dask-coffea) demo for a full example that uses Dask, including the analysis code and `reana.yaml`.

## Dask Dashboard

You can inspect your analysis and Dask cluster via the Dask dashboard by clicking the following icon.

*(Screenshot placeholder: Dask dashboard icon)*

*(Screenshot placeholder: Dask dashboard view)*
4 changes: 4 additions & 0 deletions docs/advanced-usage/index.md
@@ -34,3 +34,7 @@
**Workspace Retention**

- [Workspace retention](workspace-retention) to automatically remove files after some specified time.

**Dask**

- [Dask](dask) to configure a dedicated Dask cluster for your workflow.
