Commit

Adding back the job files
MichaelHoepler committed Jan 24, 2024
1 parent f718ad5 commit 6dafb3a
Showing 18 changed files with 839 additions and 1 deletion.
5 changes: 4 additions & 1 deletion docs/docs/setting-up/data-ingestion/index.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,7 @@
---
sidebar_label: "Data Ingestion"
sidebar_position: 4
---
# Data ingestion with Bacalhau

This directory contains instructions on how to set up data ingestion in Bacalhau.
92 changes: 92 additions & 0 deletions docs/docs/setting-up/jobs/job-selection.md
@@ -0,0 +1,92 @@
---
sidebar_label: 'Job Selection Policy'
sidebar_position: 2
---

# Job selection policy

When running a node, you can choose which jobs you want to run by using
configuration options, environment variables or flags to specify a job selection
policy.

| Config property | `serve` flag | Default value | Meaning |
|---|---|---|---|
| Node.Compute.JobSelection.Locality | `--job-selection-data-locality` | Anywhere | Only accept jobs that reference data we have locally ("local") or anywhere ("anywhere"). |
| Node.Compute.JobSelection.ProbeExec | `--job-selection-probe-exec` | unused | Use the result of an external program to decide if we should take on the job. |
| Node.Compute.JobSelection.ProbeHttp | `--job-selection-probe-http` | unused | Use the result of an HTTP POST to decide if we should take on the job. |
| Node.Compute.JobSelection.RejectStatelessJobs | `--job-selection-reject-stateless` | False | Reject jobs that don't specify any [input data](../data-ingestion/index.md). |
| Node.Compute.JobSelection.AcceptNetworkedJobs | `--job-selection-accept-networked` | False | Accept jobs that require [network connections](../networking-instructions/networking.md). |

## Job selection probes

If you want more control over making the decision to take on jobs, you can use the `--job-selection-probe-exec` and `--job-selection-probe-http` flags.

These are external programs that are passed the following data structure so that they can make a decision about whether or not to take on a job:

```json
{
  "node_id": "XXX",
  "job_id": "XXX",
  "spec": {
    "engine": "docker",
    "verifier": "ipfs",
    "job_spec_vm": {
      "image": "ubuntu:latest",
      "entrypoint": ["cat", "/file.txt"]
    },
    "inputs": [
      {
        "engine": "ipfs",
        "cid": "XXX",
        "path": "/file.txt"
      }
    ]
  }
}
```

The `exec` probe is a script to run that will be given the job data on `stdin`, and must exit with status code 0 if the job should be run.
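
For illustration, an `exec` probe could be a small script along the following lines. This is a sketch: the acceptance rule (only bid on Docker jobs that declare inputs) and the `should_accept`/`main` helpers are invented for the example, not part of Bacalhau:

```python
"""Sketch of an exec job selection probe (illustrative policy, not Bacalhau's)."""
import json
import sys


def should_accept(request: dict) -> bool:
    # Example policy (invented for illustration): only bid on Docker
    # jobs that declare at least one input.
    spec = request.get("spec", {})
    return spec.get("engine") == "docker" and len(spec.get("inputs", [])) > 0


def main() -> int:
    # Bacalhau passes the job data structure shown below on stdin;
    # exit code 0 means "take the job", anything else rejects it.
    request = json.load(sys.stdin)
    return 0 if should_accept(request) else 1
```

A real probe would end with `sys.exit(main())` and have its path passed to `--job-selection-probe-exec`.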

The `http` probe is a URL to POST the job data to. The job will be rejected if
the HTTP request returns an error status code (e.g. 400 or above).

If the HTTP response is a JSON blob, it should match the [following
schema](https://github.com/bacalhau-project/bacalhau/blob/885d53e93b01fb343294d7ddbdbffe89918db800/pkg/bidstrategy/type.go#L18-L22)
and will be used to respond to the bid directly:

```json
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "shouldBid": {
      "description": "If the job should be accepted",
      "type": "boolean"
    },
    "shouldWait": {
      "description": "If the node should wait for an async response that will come later. If true, `shouldBid` will be ignored",
      "type": "boolean",
      "default": false
    },
    "reason": {
      "description": "Human-readable string explaining why the job should be accepted or rejected, or why the wait is required",
      "type": "string"
    }
  },
  "required": [
    "shouldBid",
    "reason"
  ]
}
```

For example, the following response will reject the job:

```json
{
  "shouldBid": false,
  "reason": "The job did not pass this specific validation: ..."
}
```

If the HTTP response is not a JSON blob, the content is ignored and any non-error status code will accept the job.
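
As a sketch of the HTTP variant, the probe below serves bid responses matching the schema above using only Python's standard library. The bidding rule and the `make_bid`/`ProbeHandler`/`run` names are invented for illustration; they are not part of Bacalhau:

```python
"""Sketch of an HTTP job selection probe (illustrative policy, not Bacalhau's)."""
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_bid(request: dict) -> dict:
    # Example policy (invented for illustration): reject stateless jobs.
    if request.get("spec", {}).get("inputs"):
        return {"shouldBid": True, "reason": "job declares input data"}
    return {"shouldBid": False, "reason": "stateless jobs are not accepted here"}


class ProbeHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Bacalhau POSTs the job data structure shown earlier in this page.
        length = int(self.headers.get("Content-Length", 0))
        bid = json.dumps(make_bid(json.loads(self.rfile.read(length))))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(bid.encode())


def run(port: int = 8080) -> None:
    # Point --job-selection-probe-http at http://<host>:<port>/
    HTTPServer(("", port), ProbeHandler).serve_forever()
```

Because the response includes `shouldBid` and `reason`, it matches the schema above and is used to answer the bid directly.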
4 changes: 4 additions & 0 deletions docs/docs/setting-up/jobs/job-specification/_category_.yml
@@ -0,0 +1,4 @@
label: "Job Specification"
link:
type: doc
id: references/job-specification/job
52 changes: 52 additions & 0 deletions docs/docs/setting-up/jobs/job-specification/constraint.md
@@ -0,0 +1,52 @@
---
sidebar_label: Constraint
---

# Constraint Specification

A `Constraint` represents a condition that must be met for a compute node to be eligible to run a given job. Operators can manually define node labels when starting a node with the `bacalhau serve` command; Bacalhau also supports automatic resource detection and dynamic labeling.

By defining constraints, you can ensure that jobs are scheduled on nodes that have the necessary requirements or conditions.

### `Constraint` Parameters:

- **Key**: The name of the attribute or property to check on the compute node. This could be anything from a specific hardware feature, operating system version, or any other node property.

- **Operator**: Determines the kind of comparison to be made against the `Key`'s value, which can be:
- `in`: Checks if the Key's value exists within the provided list of values.
- `notin`: Ensures the Key's value doesn't match any in the provided list of values.
- `exists`: Verifies that a value for the specified Key is present, regardless of its actual value.
  - `!`: Confirms the absence of the specified Key (i.e., DoesNotExist).
- `gt`: Assesses if the Key's value is greater than the provided value.
- `lt`: Assesses if the Key's value is less than the provided value.
- `=` & `==`: Both are used to compare the Key's value for an exact match with the provided value.
- `!=`: Ensures the Key's value is not the same as the provided value.


- **Values (optional)**: A list of values that the node attribute, specified by the `Key`, is compared against using the `Operator`. This is not needed for operators like `exists` or `!`.

### Example:

Consider a scenario where a job should only run on Linux nodes that have a GPU and are deployed in either eu-west-1 or eu-west-2. The constraints for such a requirement might look like:

```yaml
constraints:
  - key: "hardware.gpu"
    operator: "exists"
  - key: "Operating-System"
    operator: "="
    values: ["linux"]
  - key: "region"
    operator: "in"
    values: ["eu-west-1", "eu-west-2"]
```
In this example, the first constraint checks that the node has a GPU, the second ensures the operating system is linux, and the third restricts the job to nodes deployed in eu-west-1 or eu-west-2.

### Notes:

- Constraints are evaluated as a logical AND, meaning all constraints must be satisfied for a node to be eligible.

- Using too many specific constraints can lead to a job not being scheduled if no nodes satisfy all the conditions.

- It's essential to balance the specificity of constraints with the broader needs and resources available in the cluster.
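
The matching semantics above can be sketched as a small evaluator. This is a hypothetical helper for building intuition (`matches` and `node_is_eligible` are not Bacalhau APIs); note the logical AND across constraints:

```python
"""Sketch of constraint evaluation against a node's labels (illustrative only)."""


def matches(constraint: dict, labels: dict) -> bool:
    key, op = constraint["key"], constraint["operator"]
    values = constraint.get("values", [])
    present = key in labels
    if op == "exists":
        return present
    if op == "!":
        return not present
    if not present:
        return False
    actual = labels[key]
    if op == "in":
        return actual in values
    if op == "notin":
        return actual not in values
    if op in ("=", "=="):
        return actual == values[0]
    if op == "!=":
        return actual != values[0]
    if op == "gt":
        return float(actual) > float(values[0])
    if op == "lt":
        return float(actual) < float(values[0])
    raise ValueError(f"unknown operator: {op}")


def node_is_eligible(constraints: list, labels: dict) -> bool:
    # Constraints combine with logical AND: every one must hold.
    return all(matches(c, labels) for c in constraints)
```

Under this sketch, a node missing even one required label is excluded, which is why over-specific constraints can leave a job unschedulable.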
36 changes: 36 additions & 0 deletions docs/docs/setting-up/jobs/job-specification/input-source.md
@@ -0,0 +1,36 @@
---
sidebar_label: InputSource
---

# InputSource Specification

An `InputSource` defines where and how to retrieve specific artifacts needed for a [`Task`](task), such as files or data, and where to mount them within the task's context. This ensures the necessary data is present before the task's execution begins.

Bacalhau's `InputSource` natively supports fetching data from remote sources like S3 and IPFS and can also mount local directories. It is intended to be flexible for future expansion.

## `InputSource` Parameters:

- **Source** <code>(<a href="./spec-config">SpecConfig</a> : \<required\>)</code>: Specifies the origin of the artifact, which could be a URL, an S3 bucket, or other locations.

- **Alias** `(string: <optional>)`: An optional identifier for this input source. It's particularly useful for dynamic operations within a task, such as dynamically importing data in WebAssembly using an alias.

- **Target** `(string: <required>)`: Defines the path inside the task's environment where the retrieved artifact should be mounted or stored. This ensures that the task can access the data during its execution.

## Usage Examples
```yaml
InputSources:
  - Source:
      Type: s3
      Params:
        Bucket: my_bucket
        Region: us-west-1
    Target: /my_s3_data
  - Source:
      Type: localDirectory
      Params:
        SourcePath: /path/to/local/directory
        ReadWrite: true
    Target: /my_local_data
```
In this example, the first input source fetches data from an S3 bucket and mounts it at `/my_s3_data` within the task. The second input source mounts a local directory at `/my_local_data` and allows the task to read and write data to it.
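
As a quick sanity check on the required fields, here is a hypothetical validator sketch (`validate_input_source` is invented for illustration and is not a Bacalhau API):

```python
"""Illustrative validation of InputSource entries per the parameters above."""


def validate_input_source(entry: dict) -> list:
    """Return a list of problems; an empty list means the entry looks well-formed."""
    problems = []
    source = entry.get("Source")
    if not isinstance(source, dict) or "Type" not in source:
        problems.append("Source with a Type is required")
    if not entry.get("Target"):
        problems.append("Target mount path is required")
    # Alias is optional, so its absence is never a problem.
    return problems
```

Applied to the S3 entry above, the validator would return no problems, since both `Source.Type` and `Target` are present.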
48 changes: 48 additions & 0 deletions docs/docs/setting-up/jobs/job-specification/job.md
@@ -0,0 +1,48 @@
---
sidebar_label: Job
---

# Job Specification

A `Job` represents a discrete unit of work that can be scheduled and executed. It carries all the necessary information to define the nature of the work, how it should be executed, and the resources it requires.

```yaml
Type: batch
Count: 1
Priority: 50
Meta:
  version: "1.2.5"
Labels:
  project: "my-project"
Constraints:
  - Key: Architecture
    Operator: '='
    Values:
      - arm64
  - Key: region
    Operator: '='
    Values:
      - us-west-2
Tasks:
  #...
```

## `Job` Parameters
- **Name** `(string : <optional>)`: A logical name to refer to the job. Defaults to job ID.
- **Namespace** `(string: "default")`: The namespace in which the job is running. `ClientID` is used as a namespace in the public demo network.
- **Type** `(string: <required>)`: The type of the job, such as `batch`, `ops`, `daemon` or `service`. You can learn more about the supported job types in the [Job Types](/topic-guides/job-types) guide.
- **Priority** `(int: 0)`: Determines the scheduling priority.
- **Count** `(int: <required>)`: Number of replicas to be scheduled. This is only applicable for jobs of type `batch` and `service`.
- **Meta** <code>(<a href="./meta">Meta</a> : nil)</code>: Arbitrary metadata associated with the job.
- **Labels** <code>(<a href="./label">Label</a>[] : nil)</code>: Arbitrary labels associated with the job for filtering purposes.
- **Constraints** <code>(<a href="./constraint">Constraint</a>[] : nil)</code>: These are selectors which must be true for a compute node to run this job.
- **Tasks** <code>(<a href="./task">Task</a>[] : \<required\>)</code>: Tasks associated with the job, each defining a unit of work within the job. Today only a single task per job is supported, with plans to extend this in the future.

## Server-Generated Parameters
The following parameters are generated by the server and should not be set directly.
- **ID** `(string)`: A unique identifier assigned to this job, used for distinguishing between jobs with similar names.
- **State** <code>(<a href="./state">State</a>)</code>: Represents the current state of the job.
- **Version** `(int)`: A monotonically increasing version number incremented on job specification update.
- **Revision** `(int)`: A monotonically increasing revision number incremented on each update to the job's state or specification.
- **CreateTime** `(int)`: Timestamp of job creation.
- **ModifyTime** `(int)`: Timestamp of last job modification.
61 changes: 61 additions & 0 deletions docs/docs/setting-up/jobs/job-specification/label.md
@@ -0,0 +1,61 @@
---
sidebar_label: Label
---

# Labels Specification

The `Labels` block within a `Job` specification serves as Bacalhau's mechanism for filtering jobs. By attaching specific labels to jobs, users can filter and manage jobs based on various criteria via both the Command Line Interface (CLI) and Application Programming Interface (API).

## `Labels` Parameters

Labels are essentially key-value pairs attached to jobs, allowing for detailed categorizations and filtrations. Each label consists of a `Key` and a `Value`. These labels can be filtered using operators to pinpoint specific jobs fitting certain criteria.

### Filtering Operators

Jobs can be filtered using the following operators:

- `in`: Checks if the key's value matches any within a specified list of values.
- `notin`: Validates that the key's value isn’t within a provided list of values.
- `exists`: Checks for the presence of a specified key, regardless of its value.
- `!`: Validates the absence of a specified key. (i.e., DoesNotExist)
- `gt`: Checks if the key's value is greater than a specified value.
- `lt`: Checks if the key's value is less than a specified value.
- `=` & `==`: Used for exact match comparisons between the key's value and a specified value.
- `!=`: Validates that the key’s value doesn't match a specified value.

### Example Usage

Filter jobs with a label whose key is "environment" and value is "development":

```shell
bacalhau job list --labels 'environment=development'
```

Filter jobs with a label whose key is "version" and value is greater than "2.0":

```shell
bacalhau job list --labels 'version gt 2.0'
```

Filter jobs with a label "project" existing:

```shell
bacalhau job list --labels 'project'
```

Filter jobs without a "project" label:

```shell
bacalhau job list --labels '!project'
```

### Practical Applications

- **Job Management**: Enables efficient management of jobs by categorizing them based on distinct attributes or criteria.
- **Automation**: Facilitates the automation of job deployment and management processes by allowing scripts and tools to target specific categories of jobs.
- **Monitoring & Analytics**: Enhances monitoring and analytics by grouping jobs into meaningful categories, allowing for detailed insights and analysis.

## Conclusion

The `Labels` block is instrumental in the enhanced management, filtering, and operation of jobs within Bacalhau. By understanding and utilizing the available operators and label parameters effectively, users can optimize their workflow, automate processes, and achieve detailed insights into their jobs.
61 changes: 61 additions & 0 deletions docs/docs/setting-up/jobs/job-specification/meta.md
@@ -0,0 +1,61 @@
---
sidebar_label: Meta
---

# Meta Specification

In both the `Job` and `Task` specifications within Bacalhau, the `Meta` block is a versatile element used to attach arbitrary metadata. This metadata isn't utilized for filtering or categorizing jobs; there's a separate [`Labels`](./label) block specifically designated for that purpose. Instead, the `Meta` block is instrumental for embedding additional information for operators or external systems, enhancing clarity and context.

## `Meta` Parameters in Job and Task Specs

The `Meta` block is comprised of key-value pairs, with both keys and values being strings. These pairs aren't constrained by a predefined structure, offering flexibility for users to annotate jobs and tasks with diverse metadata.

### User-Defined Metadata

Users can incorporate any arbitrary key-value pairs to convey descriptive information or context about the job or task.

#### Example:

```json
"Meta": {
  "project": "frontend",
  "version": "1.2.5",
  "owner": "team-alpha",
  "environment": "development"
}
```

- **project**: Identifies the associated project.
- **version**: Specifies the version of the application or service.
- **owner**: Names the responsible team or individual.
- **environment**: Indicates the stage in the development lifecycle.

## Auto-Generated Metadata by Bacalhau

Beyond user-defined metadata, Bacalhau automatically injects specific metadata keys for identification and security purposes.

### Bacalhau Auto-Generated Keys:

- **bacalhau.org/requester.id**: A unique identifier for the orchestrator that handled the job.
- **bacalhau.org/requester.publicKey**: The public key of the requester, aiding in security and validation.
- **bacalhau.org/client.id**: The ID for the client submitting the job, enhancing traceability.

#### Example:

```json
"Meta": {
  "bacalhau.org/requester.id": "QmfZwnVWYjHSchAVxJqXn18Bvd1cpG2ATRYceBBvUGZf2f",
  "bacalhau.org/requester.publicKey": "CAASpgIwggEiMA0GCSqG...BcyEhfEZKnAgMBAAE=",
  "bacalhau.org/client.id": "dfadea67ab6d8c65761c3d879119e11f157923036f945d969d19a51066dc663a"
}
```

### Implications and Utility

- **Identification**: The metadata aids in uniquely identifying jobs and tasks, connecting them to their originators and executors.

- **Context Enhancement**: Metadata can supplement jobs and tasks with additional data, offering insights and context that aren't captured by standard parameters.

- **Security Enhancement**: Auto-generated keys like the requester's public key contribute to the secure handling and execution of jobs and tasks.

While the `Meta` block is distinct from the [`Labels`](./label) block used for filtering, its contribution to providing context, security, and traceability is integral in managing and understanding the diverse jobs and tasks within the Bacalhau ecosystem effectively.