How to run parallel pipelines #2097

eduardo-fp-romao · 2023-09-04T15:10:17Z

Hello o/

I'm having a little trouble configuring a pipeline with the parallel processor. I have 4 custom processors: "employee_mapper", "employee_reporter", "info_mapper", "info_reporter". The processor "employee_reporter" needs to run after "employee_mapper" (we pass the mapper's result to the reporter), and the same goes for the "info" pair, but the "employee" group can run in parallel with the "info" group. I tried something like this:

pipeline:
  threads: 4
  processors:
    - parallel:
        pipeline:
          processors:
            - type: employee_mapper
            - type: employee_reporter
        pipeline:
          processors:
            - type: info_mapper
            - type: info_reporter

Unfortunately I get this error:

Configuration file read error: ..\config\benthos.yaml: (18,1) yaml: unmarshal errors:
  line 23: mapping key "pipeline" already defined at line 19

How can I configure the pipeline?

Answered by mihaitodor

mihaitodor · 2023-09-04T17:52:26Z

mihaitodor
Sep 4, 2023
Collaborator

Hey @eduardo-fp-romao 👋 I think you're confusing a few things here. The pipeline is a top-level config section which runs the sequence of processors under pipeline.processors against batches of messages emitted by your input(s), with up to pipeline.threads batches being processed in parallel. If you don't configure any batching, then individual messages are treated as batches of size 1. Additionally, you can use a parallel processor in your pipeline.processors. It takes one or more child processors under its processors field, and its cap field lets you control how many individual messages from a batch it processes in parallel.
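
To make the two levels of parallelism concrete, here is a minimal sketch. It assumes the custom processors from the question are registered as plugins and can be referenced the way the question writes them (`type: ...`). The `cap` value of 2 is an arbitrary illustration, and the parallel processor only makes a difference when a batch contains more than one message.

```yaml
pipeline:
  # Up to 4 batches from the input are processed at the same time.
  threads: 4
  processors:
    - parallel:
        # Illustrative value: at most 2 messages of a batch in flight at once.
        cap: 2
        processors:
          # Child processors are applied to each message of the batch.
          # Plugin names are taken from the question.
          - type: employee_mapper
          - type: employee_reporter
```

Note that the parallelism here is across messages: for any single message, the child processors still run one after the other.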

In your case, if the employee_reporter processor needs to run after the employee_mapper processor, then it's fine to have them one after the other in pipeline.processors. The goal of parallelism in Benthos is to process messages in parallel rather than to run multiple processors in parallel for a given message. There are ways to achieve that if needed, but I haven't needed to do anything like that so far.
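
A minimal sketch of that layout, using the processor names from the question. It assumes each custom processor leaves the message in a shape the next one can work with, which depends on how the plugins are written and isn't shown in this thread.

```yaml
pipeline:
  # Messages (batches of size 1 without batching) are spread over 4 threads.
  threads: 4
  processors:
    # Each message passes through these in order, so every reporter
    # runs after its mapper.
    - type: employee_mapper
    - type: employee_reporter
    # Assumption: info_mapper can still read what it needs after the
    # employee processors have run.
    - type: info_mapper
    - type: info_reporter
```

If the two groups really have to run concurrently for a single message, the workflow processor is, as far as I know, the closest fit: each group would go in its own branch, and branches that don't depend on each other are executed in parallel.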

1 reply

eduardo-fp-romao Sep 5, 2023
Author

Oh... I think I understand, so basically for each batch of messages I can execute multiple tasks. In my case I have 1 message that will be mapped to multiple objects, so I can do something like this:

pipeline:
  threads: 4
  processors:
    - type: get_message # example
    - parallel:
        processors:
          - type: employee_mapper
          - type: info_mapper
    - type: employee_reporter
    - type: info_reporter

or even

pipeline:
  threads: 4
  processors:
    - type: get_message # example
    - parallel:
        processors:
          - type: employee_mapper
          - type: info_mapper
    - parallel:
        processors:
          - type: employee_reporter
          - type: info_reporter

And then use the cache resource to pass the right arguments to each reporter.
