What is the recommended practice for dealing with functions that have multiple outputs (or list inputs)? #1589

Andrew-S-Rosen · 2023-04-21T20:59:45Z


Question

What is considered best practice for dealing with a multi-step electron lattice where you need to pass only part of the output from electron 1 to electron 2?

Example: Multiple Outputs

Consider the following scenario:

import covalent as ct


def add(val1, val2):
    return {"output": val1 + val2, "name": "add"}


def mult(val1, val2):
    return {"output": val1 * val2, "name": "mult"}


@ct.lattice
def workflow(val1, val2):
    job1 = ct.electron(add)
    job2 = ct.electron(mult)

    result1 = job1(val1, val2)
    result2 = job2(result1["output"], val2)
    return result2


dispatch_id = ct.dispatch(workflow)(1, 2)
result = ct.get_result(dispatch_id, wait=True)
print(result)

This works fine, but the resulting workflow graph contains an extra node where Covalent has to call .__getitem__. This is obviously a very cheap step, but I want to be careful if, for instance, I set Slurm as my executor, because I don't want to queue up a compute task that takes 0 seconds (although I'm not sure how I would get around that). Is this the recommended approach? I just wanted to check. Of course, a better solution altogether would probably be to make mult() take a dict as input and index it within the function itself, but that can sometimes be less intuitive.
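The alternative mentioned above, where mult() receives the whole dictionary and indexes it itself, can be sketched in plain Python as follows. The name mult_from_dict is hypothetical, and the functions are left undecorated so the sketch runs without a Covalent server; in the workflow they would be wrapped with ct.electron as in the example above, and because result1 is passed through whole, no separate indexing step should be needed.

```python
def add(val1, val2):
    return {"output": val1 + val2, "name": "add"}


def mult_from_dict(previous, val2):
    # Hypothetical variant of mult(): it receives the whole dictionary returned
    # by add() and does the indexing itself, inside the task body.
    return {"output": previous["output"] * val2, "name": "mult"}


def workflow(val1, val2):
    # Plain-Python stand-in for the lattice; in Covalent these calls would be
    # electrons, and result1 would be handed to the next task without indexing.
    result1 = add(val1, val2)
    return mult_from_dict(result1, val2)


print(workflow(1, 2))  # {'output': 6, 'name': 'mult'}
```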

I also tried the following example out of curiosity, to see if it made a difference. Perhaps unsurprisingly in hindsight, it made the "problem" worse, because now two entries must be queried.

import covalent as ct


def add(val1, val2):
    return val1 + val2, "add"


def mult(val1, val2):
    return val1 * val2, "mult"


@ct.lattice
def workflow(val1, val2):
    job1 = ct.electron(add)
    job2 = ct.electron(mult)

    out1, _ = job1(val1, val2)
    result2 = job2(out1, val2)
    return result2


dispatch_id = ct.dispatch(workflow)(1, 2)
result = ct.get_result(dispatch_id, wait=True)
print(result)


santoshkumarradha · 2023-05-02T18:25:50Z

Maintainer

Hey @arosen93, finally getting back to this. I'm happy to provide a more comprehensive explanation. Let's dive into this interesting topic.

The Underlying Issue

The problem arises due to two main requirements of the Covalent framework:

Electrons should be capable of returning any and all Python objects.
The Lattice should act as a robust compiler for Electrons, without requiring users to specify the number of output objects.
As a result, operations on not-yet-computed Electron outputs inside a Lattice, such as sum([electron_output1, electron_output2, ...]) or electron_output['data'], must also be Electrons. Performing such operations on arbitrary Python objects once the electron_output future has been computed requires unpickling them, handling potential errors, and dealing with environment dependencies, which are exactly the things Electrons are built to handle.
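The mechanism can be illustrated with a toy model in plain Python. The Node class, the unpack() helper, and deferred_sum() below are hypothetical and are not Covalent's internals; they only show why indexing, unpacking, or summing a result that does not exist yet has to be recorded as extra graph nodes instead of being executed immediately.

```python
class Node:
    """Toy stand-in for an electron future inside a lattice (not Covalent's class)."""

    def __init__(self, graph, label, parents=()):
        self.graph = graph
        self.label = label
        self.parents = list(parents)
        graph.append(self)  # every recorded operation becomes a node

    def __getitem__(self, key):
        # The value does not exist yet, so indexing can only be recorded
        # as another node that runs after the parent has finished.
        return Node(self.graph, f"__getitem__[{key!r}]", [self])

    def unpack(self, n):
        # Hypothetical helper: unpacking needs the number of outputs, and each
        # element becomes its own indexing node. Plain `a, b = node` would not
        # work here, because the proxy cannot know how many values exist.
        return [self[i] for i in range(n)]


def deferred_sum(nodes):
    # sum() over futures is likewise deferred: this node must later load
    # (unpickle) every parent result before it can add them together.
    return Node(nodes[0].graph, "sum", nodes)


graph = []
job1 = Node(graph, "add")
out = job1["output"]          # one extra node
a, b = job1.unpack(2)         # two extra nodes
total = deferred_sum([a, b])  # one more node
print([n.label for n in graph])
```

Only "add" corresponds to user code here; the other four nodes are the kind of auxiliary Electrons the answer describes.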

In older versions of Covalent, these Electrons used the default local executor and therefore ran on the Covalent server itself. This meant the server needed every dependency required to unpickle the Electron result objects, which can be a big ask when Covalent runs on a remote self-hosted server rather than locally.

Current Solution: The Workflow Executor

To address this issue, Covalent now has a Lattice-level parameter called workflow_executor, which tells the Lattice where to run these auxiliary Electrons. If you are not running Covalent locally, the recommended setup is a low-compute machine that has the necessary dependencies installed. However, there is a known bug where arithmetic operations on Electrons, such as summing them, do not respect the workflow_executor; it will be resolved soon.
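As a rough mental model of what workflow_executor decides, the placement rule can be sketched as below. The Task class, the route() function, and the executor names "slurm" and "local" are illustrative assumptions, not Covalent's scheduler; in Covalent the setting is given at the lattice level, and because of the known bug above, arithmetic helper nodes may not follow this rule yet.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    name: str
    executor: Optional[str] = None  # executor the user requested, if any
    auto_generated: bool = False    # True for helper nodes such as __getitem__


def route(tasks, default_executor, workflow_executor):
    """Return {task name: executor} under this toy placement rule."""
    placement = {}
    for task in tasks:
        if task.auto_generated:
            # Helper nodes created by the Lattice go to the lightweight
            # workflow-level executor rather than, e.g., a Slurm queue.
            placement[task.name] = workflow_executor
        else:
            placement[task.name] = task.executor or default_executor
    return placement


tasks = [
    Task("add", executor="slurm"),
    Task("__getitem__['output']", auto_generated=True),
    Task("mult", executor="slurm"),
]
print(route(tasks, default_executor="local", workflow_executor="local"))
```

Under this rule the zero-second indexing step from the question would no longer be queued on Slurm.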

How to Work Around the Problem

Short-term approach: One way to avoid this issue is to have an Electron return a single object, such as an instance of a custom data class, which is then unpacked (or operated on) within the downstream Electrons rather than inside the Lattice. Keep in mind that any operation performed on an Electron inside a Lattice is actually performed on the Electron's future, whose contents the Lattice knows nothing about, so it is always wise to defer such operations to the next Electron in the chain. Here's an example:

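import covalent as ct


def transform(X, y):
    # Placeholder preprocessing step (hypothetical; substitute your own)
    return [2 * x for x in X], list(y)


@ct.electron
def task1(data):
    X, y = data  # unpacking happens inside the Electron, not the Lattice
    return len(X)

@ct.electron
def gen_data(X, y):
    data = transform(X, y)
    return data  # returned as a single object

@ct.electron
def sum_y(data):
    return sum(data[1])  # indexing and summing also happen inside an Electron

@ct.lattice
def workflow(X, y):
    data = gen_data(X, y)
    result = task1(data)
    result_sum = sum_y(data)
    return result, result_sum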

instead of

@ct.electron
def task1(X, y):
    ...

@ct.electron
def gen_data(X, y):
    X, y = transform(X, y)
    return X, y

@ct.lattice
def workflow(X, y):
    # This unpacking becomes an Electron: the Lattice doesn't know how many objects
    # are returned, so it has to iterate over the unpickled result (X, y).
    X, y = gen_data(X, y)
    result = task1(X, y)
    # This is also converted into an Electron, even though the user didn't define one,
    # because y has to be unpickled before it can be summed.
    result_sum = sum(y)
    ...
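Since the advice mentions a custom data class, here is the same pattern using a dataclass instead of a bare tuple. Dataset, transform(), and the sample numbers are hypothetical, and the functions are plain Python so the sketch runs without a Covalent server; in a real workflow gen_data, task1, and sum_y would carry @ct.electron and workflow would carry @ct.lattice.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Dataset:
    # Hypothetical container returned by gen_data() as ONE object, so the
    # lattice never has to unpack or index it.
    X: List[float]
    y: List[float]


def transform(X, y):
    # Placeholder preprocessing step (assumed, not from the thread).
    return [2 * x for x in X], list(y)


def gen_data(X, y):
    X_new, y_new = transform(X, y)
    return Dataset(X=X_new, y=y_new)


def task1(data):
    # Field access happens here, inside the task body.
    return len(data.X)


def sum_y(data):
    return sum(data.y)


def workflow(X, y):
    # Plain-Python stand-in for the lattice; only whole objects are passed along.
    data = gen_data(X, y)
    return task1(data), sum_y(data)


print(workflow([1.0, 2.0, 3.0], [0.5, 1.5, 2.0]))  # (3, 4.0)
```

One design consideration: the class must be importable wherever the result is unpickled, which is the same dependency concern raised earlier for the server and the workflow_executor.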

Medium- to long-term solution: Covalent will soon introduce "task packing", which lets users pack multiple tasks to be shipped to the same executor instance. This will let Covalent automatically pack these trivial, serially connected Electrons in the background so that they run on the same executor as their parent, without using the workflow_executor. This is ideal because the parent is already expected to have all the environment requirements needed to unpickle and work on these results. The feature will make the workflow_executor obsolete, letting Covalent handle everything automatically without requiring users to follow specific patterns. This matters because unpickling large result objects can put stress on memory, sometimes requiring the workflow_executor to be a high-memory machine.
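Task packing is described here as an upcoming feature, so the following is only a toy illustration of the grouping idea, not Covalent's implementation. The function pack_trivial_tasks() and its inputs are hypothetical: each cheap helper node is placed in the same group as the task that produced its input, so it would run where that result already lives, including chains of helpers connected serially.

```python
def pack_trivial_tasks(tasks, parent_of, trivial):
    """Group each trivial task with the task that produced its input.

    tasks: task names in topological order.
    parent_of: maps a trivial task to its single upstream task.
    trivial: set of task names treated as cheap helpers (e.g. __getitem__).
    """
    group_of = {}  # task name -> name of the group it was packed into
    groups = {}    # group name -> tasks shipped to one executor instance
    for name in tasks:
        if name in trivial and parent_of.get(name) in group_of:
            # Ship the helper together with its parent's group.
            group = group_of[parent_of[name]]
        else:
            group = name
            groups[group] = []
        group_of[name] = group
        groups[group].append(name)
    return groups


groups = pack_trivial_tasks(
    ["add", "getitem_output", "mult"],
    parent_of={"getitem_output": "add"},
    trivial={"getitem_output"},
)
print(groups)  # {'add': ['add', 'getitem_output'], 'mult': ['mult']}
```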

Hope this helps clear things up on your end; if not, let's keep the discussion flowing.

@cjao, anything more to add?
@araghukas often runs into this setup and may be able to share how he uses it, if it differs from this pattern.

2 replies

santoshkumarradha May 2, 2023
Maintainer

On a side note, the workflow_executor is also used for building dynamic workflows with sublattices 🌟

Andrew-S-Rosen May 2, 2023
Author

Thank you so much for this super clear reply! This all makes a lot of sense and I now understand what's going on underneath the hood (pretty much what I suspected). I hadn't seen the workflow_executor argument, but that pretty much solves it! And the longer term solution of task packing sounds even better. All very awesome to hear!
