Adds bacalhau support #1

Open: wants to merge 5 commits into main

rossjones commented Jan 12, 2024

Adds a destination node for Bacalhau that can submit jobs to a local requester node using the orchestrator API.

Each instance of the destination node is configured with the location of the jobstore, a local directory containing .yaml files that describe jobs. It should also be given the job name (one of the .yaml files), so it might be configured as follows:

```
jobstore: /tmp/jobstore
job: process
```

This loads /tmp/jobstore/process.yaml as the job specification. Once loaded, the job specification can be templated using Handlebars syntax, so that {{key}} is replaced with the value of the 'key' field in the input message.
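To make the configuration concrete, here is a minimal, std-only Rust sketch of the two steps just described: resolving `<jobstore>/<job>.yaml` and substituting `{{key}}` placeholders from a string-to-string map. The helper names `job_spec_path` and `render_placeholders` are hypothetical, and the substitution is a deliberately simplified stand-in for Handlebars, not the PR's implementation. In particular, Handlebars HTML-escapes `{{key}}` output by default (the triple-stash form `{{{key}}}` does not), which matters if message values contain characters such as `&` or quotes; this sketch does not model that.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical helper: resolve `<jobstore>/<job>.yaml` from the two settings.
fn job_spec_path(jobstore: &str, job: &str) -> PathBuf {
    Path::new(jobstore).join(format!("{job}.yaml"))
}

/// Hypothetical, simplified stand-in for Handlebars: replaces `{{key}}`
/// (surrounding spaces allowed) with `data["key"]`. Missing keys render as
/// an empty string; an unterminated `{{` is copied through unchanged.
/// No HTML escaping, helpers, or block expressions.
fn render_placeholders(template: &str, data: &HashMap<String, String>) -> String {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    while let Some(start) = rest.find("{{") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        match after.find("}}") {
            Some(end) => {
                let key = after[..end].trim();
                if let Some(value) = data.get(key) {
                    out.push_str(value);
                }
                rest = &after[end + 2..];
            }
            None => {
                // No closing braces: keep the remainder verbatim.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}

fn main() {
    let path = job_spec_path("/tmp/jobstore", "process");
    println!("{}", path.display());

    let mut message = HashMap::new();
    message.insert("key".to_string(), "hello".to_string());
    println!("{}", render_placeholders("echo {{key}}", &message));
}
```

With `jobstore: /tmp/jobstore` and `job: process`, `job_spec_path` yields `/tmp/jobstore/process.yaml` on Unix-like systems.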

TODO

  • Extract all fields from the message into a hashmap for render/submit
  • Clean up logging so that we can see what was executed

Summary by CodeRabbit

  • New Features

    • Introduced a new "Bacalhau" destination that submits jobs to a Bacalhau requester node.
    • Added functionality to submit and manage jobs through a new "Bacalhau" destination.
    • Enabled configuration settings for the "Bacalhau" destination in the example configuration file.
  • Documentation

    • Updated example configurations to illustrate how to set up the new "Bacalhau" destination.
  • Refactor

    • Updated dependencies and added new modules to support the "Bacalhau" functionality.
  • Chores

    • Updated .gitignore to exclude the .vscode/ directory.

We really only expect one row in the incoming message, so we convert that row to a string-to-string hashmap to pass as the arguments for the job submission. We do this by a roundabout route (DataFrame -> RecordBatch -> JSON -> HashMap).
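The single-row expectation can be sketched as follows. Instead of the DataFrame -> RecordBatch -> JSON -> HashMap route described above, this std-only sketch uses a hypothetical `Cell` enum as a stand-in for column values and converts the row straight into `HashMap<String, String>`. It returns an error when the message holds zero rows or several rows rather than silently picking one; whether the PR behaves the same way is not stated. The column names in `main` are also hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified cell type standing in for an Arrow/DataFrame value.
#[derive(Debug, Clone, PartialEq)]
enum Cell {
    Null,
    Int(i64),
    Float(f64),
    Text(String),
}

impl Cell {
    /// Stringify a cell so it can be used as a template argument.
    fn to_arg(&self) -> String {
        match self {
            Cell::Null => String::new(),
            Cell::Int(v) => v.to_string(),
            Cell::Float(v) => v.to_string(),
            Cell::Text(s) => s.clone(),
        }
    }
}

/// Convert a message that should hold exactly one row into
/// string-to-string template arguments keyed by column name.
fn single_row_to_args(
    columns: &[&str],
    rows: &[Vec<Cell>],
) -> Result<HashMap<String, String>, String> {
    let row = match rows {
        [row] => row,
        _ => return Err(format!("expected exactly one row, got {}", rows.len())),
    };
    if row.len() != columns.len() {
        return Err(format!(
            "row has {} cells but there are {} columns",
            row.len(),
            columns.len()
        ));
    }
    Ok(columns
        .iter()
        .zip(row)
        .map(|(name, cell)| (name.to_string(), cell.to_arg()))
        .collect())
}

fn main() {
    let columns = ["input_path", "threshold"];
    let rows = vec![vec![Cell::Text("/data/in.csv".to_string()), Cell::Float(0.5)]];
    match single_row_to_args(&columns, &rows) {
        Ok(args) => println!("{args:?}"),
        Err(e) => eprintln!("{e}"),
    }
}
```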

coderabbitai bot commented Apr 12, 2024

Walkthrough

The changes introduce a new Bacalhau section module that adds a Bacalhau destination type. They add new configuration, update dependencies, and register the destination in the runtime. The changes span multiple files, setting up the structures, API calls, and job-handling logic needed to integrate the Bacalhau module.

Changes

| Files | Changes |
| --- | --- |
| .gitignore | Added the .vscode/ directory to the ignore list. |
| Cargo.toml, myceliald/Cargo.toml, .../bacalhau/Cargo.toml | Updated dependencies and added new modules and crates. |
| common/src/lib.rs, myceliald/config.example.toml | Added Bacalhau destination configurations and variants. |
| myceliald/src/constructors/..., myceliald/src/runtime.rs | Introduced constructors and runtime setup for the Bacalhau destination. |
| pipe/section/section_impls/bacalhau/src/... | Developed the Bacalhau section implementation, including API, destination handling, and job store management. |
| pipe/section/section_impls/bacalhau/testdata/process.yaml | Added a new batch process configuration for task handling. |

🐰✨🎉
A hop, a skip, a code deploy,
In the land of Git, there's much to enjoy.
With Bacalhau in tow, we set the stage,
For data tales to script on our page.
Let's munch on carrots, and cheer hooray,
For new code paths we pave today! 🥕🌟


Recent Review Details

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between e25fdc0 and d6426c2.
Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
Files selected for processing (14)
  • .gitignore (1 hunks)
  • Cargo.toml (1 hunks)
  • common/src/lib.rs (2 hunks)
  • myceliald/Cargo.toml (1 hunks)
  • myceliald/config.example.toml (1 hunks)
  • myceliald/src/constructors/bacalhau.rs (1 hunks)
  • myceliald/src/constructors/mod.rs (1 hunks)
  • myceliald/src/runtime.rs (1 hunks)
  • pipe/section/section_impls/bacalhau/Cargo.toml (1 hunks)
  • pipe/section/section_impls/bacalhau/src/api.rs (1 hunks)
  • pipe/section/section_impls/bacalhau/src/destination.rs (1 hunks)
  • pipe/section/section_impls/bacalhau/src/jobstore.rs (1 hunks)
  • pipe/section/section_impls/bacalhau/src/lib.rs (1 hunks)
  • pipe/section/section_impls/bacalhau/testdata/process.yaml (1 hunks)
Files skipped from review due to trivial changes (1)
  • pipe/section/section_impls/bacalhau/Cargo.toml
Additional comments not posted (13)
pipe/section/section_impls/bacalhau/src/lib.rs (2)

1-3: The module declarations for api, destination, and jobstore are correctly defined. This setup modularizes the functionality related to the bacalhau node, which is good for maintainability.


5-5: The StdError type alias is well-defined, providing a common error type across the module. This is useful for error handling consistency.

pipe/section/section_impls/bacalhau/testdata/process.yaml (1)

1-10: The YAML structure for the job configuration is well-defined with clear and necessary fields such as Name, Type, Namespace, Count, and Tasks. This structure should facilitate easy job specification and modification.

myceliald/src/constructors/mod.rs (1)

1-1: The addition of the bacalhau module to the list of public modules in constructors/mod.rs is correctly implemented. This change is necessary for integrating the new bacalhau functionality into the system.

.gitignore (1)

16-16: The addition of the .vscode/ directory to the .gitignore file is appropriate. This prevents user-specific settings from being tracked in version control, which is a best practice.

Cargo.toml (1)

9-9: The addition of the pipe/section/section_impls/bacalhau module to the workspace members in Cargo.toml is correctly implemented. This ensures that the new module is recognized as part of the project's workspace, which is necessary for build and dependency management.

myceliald/src/constructors/bacalhau.rs (1)

6-28: The destination_ctor function is well-implemented with robust error handling. It correctly checks for the existence of the jobstore directory and handles potential configuration errors. This function is crucial for setting up the bacalhau destination with the necessary configurations.
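The kind of constructor-time validation described here might look like the following std-only sketch. `validate_bacalhau_config` and `CtorError` are hypothetical names; the PR's actual `destination_ctor` signature and error type are not shown in this review.

```rust
use std::fmt;
use std::path::{Path, PathBuf};

/// Hypothetical error type for constructor-time configuration problems.
#[derive(Debug)]
enum CtorError {
    MissingJobstore(PathBuf),
    NotADirectory(PathBuf),
    EmptyJobName,
}

impl fmt::Display for CtorError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CtorError::MissingJobstore(p) => write!(f, "jobstore {} does not exist", p.display()),
            CtorError::NotADirectory(p) => write!(f, "jobstore {} is not a directory", p.display()),
            CtorError::EmptyJobName => write!(f, "job name must not be empty"),
        }
    }
}

impl std::error::Error for CtorError {}

/// Check the `jobstore` and `job` settings before building the destination,
/// so misconfiguration is reported at startup rather than on the first message.
fn validate_bacalhau_config(jobstore: &str, job: &str) -> Result<PathBuf, CtorError> {
    let dir = Path::new(jobstore);
    if !dir.exists() {
        return Err(CtorError::MissingJobstore(dir.to_path_buf()));
    }
    if !dir.is_dir() {
        return Err(CtorError::NotADirectory(dir.to_path_buf()));
    }
    if job.trim().is_empty() {
        return Err(CtorError::EmptyJobName);
    }
    Ok(dir.to_path_buf())
}

fn main() {
    match validate_bacalhau_config("/tmp/jobstore", "process") {
        Ok(dir) => println!("using jobstore {}", dir.display()),
        Err(e) => eprintln!("invalid bacalhau config: {e}"),
    }
}
```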

pipe/section/section_impls/bacalhau/src/api.rs (1)

24-49: The submit function is well-implemented with comprehensive error handling and proper use of asynchronous programming. The use of reqwest for making HTTP requests and handling responses is appropriate. This function is essential for submitting jobs to the orchestrator API.
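For readers unfamiliar with the orchestrator API, the request that `submit` sends can be sketched at the HTTP level using only the standard library (the PR itself uses reqwest). The endpoint `PUT /api/v1/orchestrator/jobs` and the default port 1234 are assumptions based on Bacalhau's v1 API and should be checked against the Bacalhau API documentation for the version in use; `build_put_job_request`, `status_code`, and `send` are hypothetical helpers, and the JSON body in `main` is a placeholder.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Build a raw HTTP/1.1 PUT request carrying a JSON job-submission body.
/// The path is an assumption about Bacalhau's orchestrator API.
fn build_put_job_request(host: &str, port: u16, json_body: &str) -> String {
    format!(
        "PUT /api/v1/orchestrator/jobs HTTP/1.1\r\n\
         Host: {host}:{port}\r\n\
         Content-Type: application/json\r\n\
         Content-Length: {len}\r\n\
         Connection: close\r\n\
         \r\n\
         {json_body}",
        len = json_body.len()
    )
}

/// Extract the numeric status code from a raw HTTP response, if well formed.
fn status_code(response: &str) -> Option<u16> {
    let status_line = response.lines().next()?;
    let mut parts = status_line.split_whitespace();
    if !parts.next()?.starts_with("HTTP/") {
        return None;
    }
    parts.next()?.parse().ok()
}

/// Send the request and read the whole response (the server closes the
/// connection because of `Connection: close`).
fn send(host: &str, port: u16, json_body: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect((host, port))?;
    stream.write_all(build_put_job_request(host, port, json_body).as_bytes())?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}

fn main() {
    let request = build_put_job_request("127.0.0.1", 1234, r#"{"example":true}"#);
    print!("{request}");
}
```

A caller would treat anything other than a 2xx `status_code(&send(...)?)` as a submission failure, which is the role reqwest's response handling plays in the PR.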

myceliald/Cargo.toml (1)

14-21: The updates to the dependencies in myceliald/Cargo.toml are correctly implemented. The addition of the thiserror crate is appropriate for error handling, and the modifications to the pipe dependency ensure that the project uses the correct versions. These changes are necessary for the new functionality introduced by the bacalhau node.

myceliald/src/runtime.rs (1)

18-21: The addition of "bacalhau_destination" to the registry is correctly implemented and follows the established pattern for registering section constructors.
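The registration pattern referred to here, mapping a section name such as `"bacalhau_destination"` to a constructor function, can be illustrated with a small std-only sketch. `Section`, `Ctor`, `build_registry`, and `construct` are hypothetical stand-ins; myceliald's real registry types are not shown in this review.

```rust
use std::collections::HashMap;

type StdError = Box<dyn std::error::Error + Send + Sync + 'static>;

/// Hypothetical stand-in for a constructed section.
#[derive(Debug)]
struct Section {
    kind: &'static str,
    config: String,
}

/// Constructor signature: raw config in, section (or error) out.
type Ctor = fn(&str) -> Result<Section, StdError>;

fn bacalhau_destination_ctor(config: &str) -> Result<Section, StdError> {
    if config.trim().is_empty() {
        return Err("empty bacalhau config".into());
    }
    Ok(Section { kind: "bacalhau_destination", config: config.to_string() })
}

/// Register each section constructor under the name used in configuration.
fn build_registry() -> HashMap<&'static str, Ctor> {
    let mut registry: HashMap<&'static str, Ctor> = HashMap::new();
    registry.insert("bacalhau_destination", bacalhau_destination_ctor);
    registry
}

/// Look up a constructor by name and run it.
fn construct(registry: &HashMap<&'static str, Ctor>, name: &str, config: &str) -> Result<Section, StdError> {
    let ctor = registry
        .get(name)
        .ok_or_else(|| format!("no constructor registered for {name:?}"))?;
    ctor(config)
}

fn main() {
    let registry = build_registry();
    match construct(&registry, "bacalhau_destination", "job = \"process\"") {
        Ok(section) => println!("built {section:?}"),
        Err(e) => eprintln!("{e}"),
    }
}
```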

myceliald/config.example.toml (1)

91-97: The configuration for the "Bacalhau" destination is correctly added and follows the established pattern. However, consider verifying the path format (c:/temp/jobstore) for cross-platform compatibility if the system is expected to run on different operating systems.

Verification successful

The script confirmed that the example configuration uses a Windows-style path (c:/temp/jobstore). If the system is expected to run on other operating systems, verify that this path format works there.

Scripts Executed

The following scripts were executed for the analysis:


Script:

```bash
#!/bin/bash
# Check if there are any platform-specific configurations that might affect the Bacalhau destination.
rg --type toml 'jobstore' | grep 'c:/'
```

Length of output: 96
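The cross-platform concern is concrete in Rust: `Path::is_absolute` treats `c:/temp/jobstore` as relative on Unix-like systems, while on Windows `/tmp/jobstore` is not absolute either, because it lacks a drive prefix. This std-only sketch shows one way to make the behaviour explicit by resolving relative jobstore paths against a base directory; `resolve_jobstore` and the base `/etc/mycelial` are hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Resolve a configured jobstore path. Relative paths (which, on Unix,
/// include "c:/temp/jobstore") are joined onto `base`.
fn resolve_jobstore(raw: &str, base: &Path) -> PathBuf {
    let path = Path::new(raw);
    if path.is_absolute() {
        path.to_path_buf()
    } else {
        base.join(path)
    }
}

fn main() {
    let base = Path::new("/etc/mycelial"); // hypothetical config directory
    for raw in ["c:/temp/jobstore", "/tmp/jobstore"] {
        println!(
            "{raw}: absolute = {}, resolved = {}",
            Path::new(raw).is_absolute(),
            resolve_jobstore(raw, base).display()
        );
    }
}
```

On Linux this reports `absolute = false` for `c:/temp/jobstore` and resolves it to `/etc/mycelial/c:/temp/jobstore`, which is probably not what a user copying the example configuration expects.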

pipe/section/section_impls/bacalhau/src/destination.rs (1)

34-41: The submit_job method in Bacalhau is well-implemented with appropriate error handling for the rendering process and job submission. This method ensures that errors are properly managed and propagated.
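The render-then-submit flow, with each failure wrapped in context before being propagated, can be sketched generically. Here a hypothetical `submit_job` function takes the render and submit steps as closures so the sketch stays self-contained; its signature differs from the PR's method on `Bacalhau`.

```rust
use std::collections::HashMap;

type StdError = Box<dyn std::error::Error + Send + Sync + 'static>;

/// Render the named job with `args`, then submit the rendered body.
/// Each step's error is prefixed with context and propagated with `?`.
fn submit_job<R, S>(
    job: &str,
    args: &HashMap<String, String>,
    render: R,
    submit: S,
) -> Result<String, StdError>
where
    R: Fn(&str, &HashMap<String, String>) -> Result<String, StdError>,
    S: Fn(&str) -> Result<String, StdError>,
{
    let body = render(job, args).map_err(|e| format!("rendering job {job:?}: {e}"))?;
    let response = submit(&body).map_err(|e| format!("submitting job {job:?}: {e}"))?;
    Ok(response)
}

fn main() {
    let mut args = HashMap::new();
    args.insert("key".to_string(), "value".to_string());
    let result = submit_job(
        "process",
        &args,
        |job, args| Ok(format!("{job} with {} argument(s)", args.len())),
        |body| Err(format!("no requester reachable for: {body}").into()),
    );
    match result {
        Ok(id) => println!("submitted: {id}"),
        Err(e) => eprintln!("{e}"),
    }
}
```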

common/src/lib.rs (1)

164-170: The BacalhauDestinationConfig struct is correctly defined and includes all necessary fields for configuring the Bacalhau destination. The struct follows the established pattern and uses appropriate serde attributes.



coderabbitai bot left a comment


Actionable comments posted: 3

Comment on lines +19 to +49
```rust
impl JobStore {
    pub fn new(jobstore: impl Into<String>) -> Result<Self, StdError> {
        let path = PathBuf::from(jobstore.into());
        let entries = path.read_dir()?;

        let mut handlebars = Handlebars::new();
        for e in entries {
            let entry = e?;
            let meta = entry.metadata()?;
            if !meta.is_file() {
                continue;
            }

            let entryname = entry.file_name();
            let fname = entryname.to_str().unwrap();
            if fname.ends_with("yaml") || fname.ends_with("yml") {
                let p = entry.path();
                let name = p.file_stem().unwrap();

                handlebars.register_template_string(
                    name.to_str().unwrap(),
                    &*fs::read_to_string(entry.path())
                        .expect("should have been able to read the file"),
                )?;
            }
        }

        Ok(Self {
            templates: handlebars,
        })
    }
```

Consider handling potential errors more gracefully in the new method of JobStore. Currently, the method uses unwrap and expect, which could lead to panics if the data is not as expected. It would be safer to return an error in these cases.

```diff
- let fname = entryname.to_str().unwrap();
- let name = p.file_stem().unwrap();
+ let fname = entryname.to_str().ok_or("Filename not valid UTF-8")?;
+ let name = p.file_stem().ok_or("File stem not available")?;
```

Suggested change
```rust
impl JobStore {
    pub fn new(jobstore: impl Into<String>) -> Result<Self, StdError> {
        let path = PathBuf::from(jobstore.into());
        let entries = path.read_dir()?;
        let mut handlebars = Handlebars::new();
        for e in entries {
            let entry = e?;
            let meta = entry.metadata()?;
            if !meta.is_file() {
                continue;
            }
            let entryname = entry.file_name();
            let fname = entryname.to_str().ok_or("Filename not valid UTF-8")?;
            if fname.ends_with("yaml") || fname.ends_with("yml") {
                let p = entry.path();
                let name = p.file_stem().ok_or("File stem not available")?;
                handlebars.register_template_string(
                    name.to_str().unwrap(),
                    &*fs::read_to_string(entry.path())
                        .expect("should have been able to read the file"),
                )?;
            }
        }
        Ok(Self {
            templates: handlebars,
        })
    }
}
```
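The suggested change still calls `unwrap` on `name.to_str()` and `expect` on the file read. A std-only sketch of the same directory scan with every failure returned as an error follows; `load_job_templates` is a hypothetical helper that collects template text keyed by file stem, which the real code would then pass to `register_template_string`. It also matches the extension exactly via `Path::extension`, whereas `fname.ends_with("yaml")` would also accept a name like `notes.notyaml`.

```rust
use std::collections::HashMap;
use std::fs;
use std::path::Path;

type StdError = Box<dyn std::error::Error + Send + Sync + 'static>;

/// Load every `*.yaml` / `*.yml` regular file in `dir`, keyed by file stem.
/// All I/O and encoding problems are returned as errors; nothing panics.
fn load_job_templates(dir: &Path) -> Result<HashMap<String, String>, StdError> {
    let mut templates = HashMap::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        if !entry.metadata()?.is_file() {
            continue;
        }
        let path = entry.path();
        // Compare the whole extension, not a string suffix.
        let is_yaml = matches!(
            path.extension().and_then(|ext| ext.to_str()),
            Some("yaml" | "yml")
        );
        if !is_yaml {
            continue;
        }
        let name = path
            .file_stem()
            .and_then(|stem| stem.to_str())
            .ok_or_else(|| format!("file name is not valid UTF-8: {}", path.display()))?;
        let body = fs::read_to_string(&path)
            .map_err(|e| format!("reading {}: {e}", path.display()))?;
        templates.insert(name.to_string(), body);
    }
    Ok(templates)
}

fn main() {
    match load_job_templates(Path::new("/tmp/jobstore")) {
        Ok(templates) => {
            for name in templates.keys() {
                println!("loaded job template {name}");
            }
        }
        Err(e) => eprintln!("failed to load jobstore: {e}"),
    }
}
```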

Comment on lines +51 to +63
```rust
pub fn render(&self, name: String, data: &HashMap<String, String>) -> Result<String, StdError> {
    let text = self.templates.render(&name, &json!(data))?;
    let job: Value = serde_yaml::from_str(&text).unwrap();

    let request = PutJobRequest {
        job,
        idempotency_token: None,
        namespace: None,
        headers: None,
    };

    Ok(serde_json::to_string(&request).unwrap())
}
```

In the render method of JobStore, consider handling the potential error from YAML parsing instead of using unwrap. This change would make the method more robust by properly handling cases where the rendered template is not valid YAML.

```diff
- let job: Value = serde_yaml::from_str(&text).unwrap();
+ let job: Value = serde_yaml::from_str(&text).map_err(|e| StdError::from(e))?;
```

Suggested change
```rust
pub fn render(&self, name: String, data: &HashMap<String, String>) -> Result<String, StdError> {
    let text = self.templates.render(&name, &json!(data))?;
    let job: Value = serde_yaml::from_str(&text).map_err(|e| StdError::from(e))?;
    let request = PutJobRequest {
        job,
        idempotency_token: None,
        namespace: None,
        headers: None,
    };
    Ok(serde_json::to_string(&request).unwrap())
}
```
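Two details about this suggestion: the `map_err(|e| StdError::from(e))` is redundant, because `?` already converts any error implementing `Error + Send + Sync + 'static` into `Box<dyn Error + Send + Sync>`, and the remaining `serde_json::to_string(&request).unwrap()` could likewise use `?`. This assumes `StdError` is an alias for that boxed type; the review only says lib.rs defines a `StdError` alias. The std-only sketch below demonstrates the conversion with integer parsing standing in for `serde_yaml::from_str`; the function names are hypothetical.

```rust
type StdError = Box<dyn std::error::Error + Send + Sync + 'static>;

/// Explicit conversion, mirroring `.map_err(|e| StdError::from(e))?`.
fn parse_count_explicit(text: &str) -> Result<u32, StdError> {
    let n = text.trim().parse::<u32>().map_err(|e| StdError::from(e))?;
    Ok(n)
}

/// Same behaviour: `?` calls `From::from` on the error automatically.
fn parse_count(text: &str) -> Result<u32, StdError> {
    Ok(text.trim().parse::<u32>()?)
}

/// If an explicit conversion is written anyway, adding context is more useful.
fn parse_count_with_context(text: &str) -> Result<u32, StdError> {
    text.trim()
        .parse::<u32>()
        .map_err(|e| -> StdError { format!("invalid Count {text:?}: {e}").into() })
}

fn main() {
    for input in ["3", "three"] {
        println!(
            "{:?} / {:?}",
            parse_count(input).ok(),
            parse_count_with_context(input).err().map(|e| e.to_string())
        );
    }
}
```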

Comment on lines +26 to +31
```rust
impl Bacalhau {
    pub fn new(job: impl Into<String>, jobstore: impl Into<String>) -> Self {
        Self {
            job: job.into(),
            jobstore: JobStore::new(jobstore).expect("should be able to create jobstore"),
        }
```

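In the new method of Bacalhau, consider returning an error instead of calling expect, so that a jobstore that cannot be created is reported to the caller rather than causing a panic. Because `?` cannot be used in a function that returns `Self`, the return type must also change to `Result<Self, StdError>`, the struct literal must be wrapped in `Ok(...)`, and callers must handle the returned `Result`.

```diff
- pub fn new(job: impl Into<String>, jobstore: impl Into<String>) -> Self {
+ pub fn new(job: impl Into<String>, jobstore: impl Into<String>) -> Result<Self, StdError> {
- jobstore: JobStore::new(jobstore).expect("should be able to create jobstore"),
+ jobstore: JobStore::new(jobstore)?,
```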

Suggested change
```rust
impl Bacalhau {
    pub fn new(job: impl Into<String>, jobstore: impl Into<String>) -> Result<Self, StdError> {
        Ok(Self {
            job: job.into(),
            jobstore: JobStore::new(jobstore)?,
        })
    }
}
```
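With a fallible constructor, construction failures flow to the caller instead of aborting the process. This std-only sketch uses hypothetical stand-ins: a `JobStore` that only checks the directory exists, and a caller `build_destination` in place of the PR's constructor.

```rust
use std::path::PathBuf;

type StdError = Box<dyn std::error::Error + Send + Sync + 'static>;

/// Hypothetical stand-in for the PR's JobStore: it only validates the directory.
struct JobStore {
    root: PathBuf,
}

impl JobStore {
    fn new(jobstore: impl Into<String>) -> Result<Self, StdError> {
        let root = PathBuf::from(jobstore.into());
        if !root.is_dir() {
            return Err(format!("jobstore {} is not a directory", root.display()).into());
        }
        Ok(Self { root })
    }
}

struct Bacalhau {
    job: String,
    jobstore: JobStore,
}

impl Bacalhau {
    /// Fallible constructor: a bad jobstore becomes an `Err`, not a panic.
    fn new(job: impl Into<String>, jobstore: impl Into<String>) -> Result<Self, StdError> {
        Ok(Self {
            job: job.into(),
            jobstore: JobStore::new(jobstore)?,
        })
    }
}

/// Hypothetical caller: propagates the constructor error with `?`.
fn build_destination(job: &str, jobstore: &str) -> Result<Bacalhau, StdError> {
    let destination = Bacalhau::new(job, jobstore)?;
    Ok(destination)
}

fn main() {
    match build_destination("process", "/tmp/jobstore") {
        Ok(d) => println!("job {} from {}", d.job, d.jobstore.root.display()),
        Err(e) => eprintln!("cannot build bacalhau destination: {e}"),
    }
}
```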
