
Draft: Create workflow_v2 processor #16

Closed
wants to merge 29 commits

Conversation


@jem-davies jem-davies commented Jun 2, 2024

This Draft Pull Request is to request interim feedback on the proposed solution to issue #17

A new Benthos processor, `workflow_v2`, has been created here.

Some example Benthos config files have been included in `./benthos/cmd/benthos`:

  • `config_old.yaml` — runs the example DAG from the workflow processor docs using the `workflow` processor
  • `config.yaml` — runs the example DAG from the workflow processor docs using the new `workflow_v2` processor
The `config_old.yaml` will output:

```
INFO STAGE A FINISHED
INFO STAGE C FINISHED
INFO STAGE B FINISHED
INFO STAGE E FINISHED
INFO STAGE D FINISHED
INFO STAGE F FINISHED
```

The `config.yaml` will output:

```
INFO STAGE A FINISHED
INFO STAGE C FINISHED
INFO STAGE E FINISHED
INFO STAGE F FINISHED
INFO STAGE B FINISHED
INFO STAGE D FINISHED
```

Notice that E will start and finish before B is started.
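The scheduling idea behind this increased parallelism can be sketched in plain Go: each stage waits only on its direct parents, so stages on independent paths (like E and B) can overlap. This is an illustrative stand-alone sketch, not the processor code itself; only the stage names and edges follow the example DAG from the workflow docs.

```go
package main

import (
	"fmt"
	"sync"
)

// runDAG launches one goroutine per stage. Each goroutine blocks until all of
// the stage's direct parents have closed their "done" channel, then records
// its own completion and signals its children. Completion order between
// independent stages is nondeterministic, but dependency order always holds.
func runDAG(deps map[string][]string) []string {
	done := make(map[string]chan struct{}, len(deps))
	for name := range deps {
		done[name] = make(chan struct{})
	}

	var (
		mu    sync.Mutex
		order []string
		wg    sync.WaitGroup
	)
	for name, parents := range deps {
		wg.Add(1)
		go func(name string, parents []string) {
			defer wg.Done()
			for _, p := range parents {
				<-done[p] // wait only for direct dependencies
			}
			mu.Lock()
			order = append(order, name)
			mu.Unlock()
			close(done[name]) // signal children
		}(name, parents)
	}
	wg.Wait()
	return order
}

func main() {
	// The example DAG: A feeds B and C, B feeds D, C feeds E, D and E feed F.
	deps := map[string][]string{
		"A": nil, "B": {"A"}, "C": {"A"},
		"D": {"B"}, "E": {"C"}, "F": {"D", "E"},
	}
	for _, s := range runDAG(deps) {
		fmt.Printf("STAGE %s FINISHED\n", s)
	}
}
```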

I would greatly appreciate some feedback regarding the proposed solution before I spend more time on the things that I know I still need to do:

  • unit tests
  • writing docs
  • implementing the ability to restart DAG execution at particular places, described in the workflow documentation under structured metadata
  • logic to process batches with more than one message (currently missing)

Also, I am quite new to Go, so I could well have made some newbie mistakes.

Specific questions that I have are:

  • Should this be a new processor (i.e. a v2), or should the existing `workflow` be altered?
  • The ability to infer the DAG from the `request_map` fields of the branch: is this something that the v2 needs to have?

Signed-off-by: Jem Davies <[email protected]>
@gregfurman gregfurman left a comment


Looking great, Jem! Very happy with the changes + documentation for this. Couple of things:

  • Remove all those test files in the cmd/bento directory
  • Some grammar/phrasing nits
  • Small concern re: the channel + goroutine batching functionality -- am worried this could introduce a memory leak
  • One or two concerns about race conditions when operating on branches concurrently

Comment on lines +25 to +32
```go
func (w *workflowBranchMapV2) Close(ctx context.Context) error {
	for _, c := range w.Branches {
		if err := c.Close(ctx); err != nil {
			return err
		}
	}
	return nil
}
```

Should we maybe be using a mutex here?

Also, I am thinking we should try to close all branches as opposed to returning an error early if a single Close() fails -- concerned about memory leaks.


```go
// Locks all branches contained in the branch map and returns the latest DAG, a
// map of resources, and a func to unlock the resources that were locked. If
// any error occurs in locked each branch (the resource is missing, or the DAG
```

Suggested change
```diff
-// any error occurs in locked each branch (the resource is missing, or the DAG
+// an error occurs in any locked branch (the resource is missing, or the DAG
```


```go
func validateDAG(graph map[string][]string) bool {
	// Status maps to track the state of each node:
	// 0 = unvisited, 1 = visiting, 2 = visited
```

nit: consider making these values enums. IMO it'll make the function easier to follow

```go
const (
	Unvisited NodeState = iota
	Visiting
	Visited
)
```
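For illustration, a validateDAG along these lines could use the enum in a three-colour depth-first search, rejecting the graph when a back edge (cycle) is found. This is a hedged sketch, not the PR's implementation; the graph maps each node to the nodes it depends on.

```go
package main

import "fmt"

type NodeState int

const (
	Unvisited NodeState = iota
	Visiting
	Visited
)

// validateDAG returns true when the dependency graph has no cycles.
// A node seen in the Visiting state during its own descendants' traversal
// means we followed a back edge, i.e. a cycle.
func validateDAG(graph map[string][]string) bool {
	state := make(map[string]NodeState, len(graph))
	var visit func(string) bool
	visit = func(n string) bool {
		switch state[n] {
		case Visiting:
			return false // back edge: cycle detected
		case Visited:
			return true // already fully explored
		}
		state[n] = Visiting
		for _, dep := range graph[n] {
			if !visit(dep) {
				return false
			}
		}
		state[n] = Visited
		return true
	}
	for n := range graph {
		if !visit(n) {
			return false
		}
	}
	return true
}

func main() {
	acyclic := map[string][]string{"A": nil, "B": {"A"}, "C": {"B"}}
	fmt.Println(validateDAG(acyclic)) // true
	cyclic := map[string][]string{"A": {"B"}, "B": {"A"}}
	fmt.Println(validateDAG(cyclic)) // false
}
```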

```go
Description(`
## workflow vs workflow_v2

The workflow_v2 processor is an evolution of the original `+"[`workflow` processor][processors.workflow]"+`. The two key differences are: a change to the way the topology of branch processors are defined & an enhancement that increases the parallelism of the DAG execution. Also, the original workflow processor has some features such as: implicitly creating the DAG based upon the request_map & result_map field of the branch processors, which have been dropped in workflow_v2.
```

nit: believe "and" makes more sense here than "&" but feel free to ignore this lol

Suggested change
```diff
-The workflow_v2 processor is an evolution of the original `+"[`workflow` processor][processors.workflow]"+`. The two key differences are: a change to the way the topology of branch processors are defined & an enhancement that increases the parallelism of the DAG execution. Also, the original workflow processor has some features such as: implicitly creating the DAG based upon the request_map & result_map field of the branch processors, which have been dropped in workflow_v2.
+The workflow_v2 processor is an evolution of the original `+"[`workflow` processor][processors.workflow]"+`. The two key differences are: a change to the way the topology of branch processors are defined and an enhancement that increases the parallelism of the DAG execution. Also, the original workflow processor has some features such as: implicitly creating the DAG based upon the request_map and result_map field of the branch processors, which have been dropped in workflow_v2.
```

```go
Description("A [dot path](/docs/configuration/field_paths) indicating where to store and reference structured metadata about the workflow_v2 execution.").
	Default("meta.workflow_v2"),
service.NewObjectMapField(wflowProcFieldBranchesV2, workflowv2BranchSpecFields()...).
	Description("An object of named [`branch` processors](/docs/components/processors/branch) that make up the workflow_v2."))
```

Suggested change
```diff
-Description("An object of named [`branch` processors](/docs/components/processors/branch) that make up the workflow_v2."))
+Description("An object named [`branch` processors](/docs/components/processors/branch) that make up the workflow_v2."))
```

Comment on lines 469 to 471
```go
resultsBodge := make([]*message.Part, msg.Len())
xxx := mssge.results[0]
resultsBodge[mssge.mssgPartID] = xxx[0]
```

nit: I assume these variables are going to be renamed 🫠


```go
batchResultChan := make(chan collector)

go func() {
```

Can this functionality be placed in a separate function?


```go
branchMsg, branchSpans := tracing.WithChildSpans(w.tracer, eid, propMsg.ShallowCopy())

go func(i int, id string) {
```

Same comment as above: maybe place the goroutine code in a separate function


```go
go func() {
	for {
		mssge := <-batchResultChan
```

Think this may cause a memory leak since this channel is never closed. The <-batchResultChan will always block and cause the loop to never finish, and the goroutine to never clear.

I think you should create another channel called doneCh and run a for-select loop, where the function returns if/when doneCh <- true is called; otherwise the rest of the function runs as normal in the <-batchResultChan case.
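Something like the following sketch (with a stand-in collector type) shows the pattern: a done channel ends the loop, with one final non-blocking sweep so in-flight results aren't dropped when shutdown and delivery race:

```go
package main

import "fmt"

// collector is a stand-in for the PR's batch-result type.
type collector struct {
	mssgPartID int
}

// collect drains batchResultChan until doneCh is closed. Without the done
// case, <-batchResultChan would block forever once senders stop, leaking the
// goroutine. The nested select with a default clause sweeps any results that
// were still queued when doneCh fired.
func collect(batchResultChan <-chan collector, doneCh <-chan struct{}) []collector {
	var results []collector
	for {
		select {
		case m := <-batchResultChan:
			results = append(results, m)
		case <-doneCh:
			for {
				select {
				case m := <-batchResultChan:
					results = append(results, m)
				default:
					return results
				}
			}
		}
	}
}

func main() {
	ch := make(chan collector)
	done := make(chan struct{})
	go func() {
		for i := 0; i < 3; i++ {
			ch <- collector{mssgPartID: i}
		}
		close(done) // all sends delivered (unbuffered channel), safe to stop
	}()
	fmt.Println(len(collect(ch, done))) // 3
}
```

Closing doneCh (rather than sending `doneCh <- true`) lets the signal reach the loop even if no one is actively receiving at that instant.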


I imagine all these files should be discarded/removed from your branch

@jem-davies jem-davies closed this Sep 1, 2024