Improvements to Capability inference and task execution #32

Yadunund · 2024-01-22T13:54:55Z

Yadunund
Jan 22, 2024
Maintainer

@koonpeng originally drafted the idea below (copy pasting from elsewhere)

The Old (Current) Architecture

When the workcell orchestrator receives a Task, it looks at the type field to determine which capability is responsible for handling that task. Capabilities must call a function to register themselves as the handler for a task type. The task type also controls the behavior tree (BT) used to execute the task.
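A minimal sketch of the lookup described above, for illustration only. The table names, `register_task_handler`, `resolve` and the capability and file names are hypothetical stand-ins, not the actual orchestrator API.

```python
from typing import Dict, Tuple

# Hypothetical stand-ins for the orchestrator's internal tables.
task_handlers: Dict[str, str] = {}  # task type -> name of the handling capability
behavior_trees: Dict[str, str] = {"dispense_item": "dispense_item.xml"}  # task type -> BT


def register_task_handler(task_type: str, capability: str) -> None:
    """Called by a capability to claim a task type."""
    task_handlers[task_type] = capability


def resolve(task_type: str) -> Tuple[str, str]:
    """Return the (capability, BT) pair selected by the task's type field."""
    return task_handlers[task_type], behavior_trees[task_type]


register_task_handler("dispense_item", "DispenseItemCapability")
handler, bt = resolve("dispense_item")
```

Nothing in this lookup ties the nodes inside the BT to the capability that claimed the task type.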

Problems

Because task handlers need to be registered by a capability, handling a new task type always requires a new capability.
- The (dummy) generic capability attempts to work around this by registering all task types for the BTs it can find.
The BT is provided by the user and may contain BT nodes not registered by the capability that handles the task type.
- This is actually a side effect that allows the generic capability to work. Because the generic capability does not register any BT nodes, it must use BT nodes registered by other capabilities to do any meaningful work.
There is a circular "information flow": capabilities determine the task types supported, but task types determine the BT used, which determines the capabilities required. Notice that a capability needs to register itself as the handler for a task, but the BT used to execute the task may not even use that capability!

Capability 2.0

The new capability system attempts to solve the problems of the old system mainly by making the dependencies between task types, BTs and capabilities stricter and better defined. In the new system,

Capabilities no longer register the task types they support; they only register BT nodes.
The task types supported are now based solely on the existence of a BT with the same name.
- As a result, the generic capability is no longer needed and will be removed.
The capabilities required to perform a task are determined by the BT nodes in the BT used to execute the task.
Capabilities can no longer depend on other capabilities. This is no longer needed because the capabilities needed to perform a task are determined from the BT (previously we needed this because a task could only be registered to a single Capability).
- As a result, CapabilityGroup is no longer needed and will be removed.
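To make the new information flow concrete, here is a rough sketch that derives the required capabilities from the nodes found in a BT XML file. The node-to-capability table, the set of control nodes, and the `PickItem` node are assumptions for illustration; only `DispenseItem` appears in this discussion.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Set

# Hypothetical table built from what each capability registered.
NODE_TO_CAPABILITY: Dict[str, str] = {
    "DispenseItem": "dispense",
    "PickItem": "manipulation",
}
# Structural tags that are not provided by any capability.
CONTROL_NODES: Set[str] = {"root", "BehaviorTree", "Sequence", "Fallback"}


def required_capabilities(bt_xml: str) -> Set[str]:
    """Collect the capabilities whose BT nodes appear in the tree."""
    caps: Set[str] = set()
    for elem in ET.fromstring(bt_xml).iter():
        if elem.tag in CONTROL_NODES:
            continue
        # Raises KeyError if no capability registered this node.
        caps.add(NODE_TO_CAPABILITY[elem.tag])
    return caps


def supported_task_types(bts: Dict[str, str]) -> Set[str]:
    """A task type is supported if a BT with the same name exists."""
    return set(bts)


BT_XML = """
<root>
  <BehaviorTree ID="pick_and_dispense">
    <Sequence>
      <PickItem/>
      <DispenseItem item="coke"/>
    </Sequence>
  </BehaviorTree>
</root>
"""
bts = {"pick_and_dispense": BT_XML}
```

Here `required_capabilities(BT_XML)` yields both capabilities, and the only supported task type is the name of the BT.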

Essentially, we break the link between capabilities and task types and make the information flow more straightforward. Breaking this link does lose some information needed by some of the features. The next few sections go over the impact of this change and how we can work around it.

How do we determine if a task can be performed?

In the old system, because a capability registers itself as the handler for a task type, we can use that to determine which capability to ask whether a task can be performed, and whether we should fall back to the generic capability.

One way to implement that in the new system is to go through all the capabilities used in the BT and check that they can perform the task. However, this presents a problem, as a BT may only use a subset of the nodes registered by a capability. Because Capability::can_perform_task is currently tied to the capability and not to the BT node, we cannot get a reliable answer. In the initial version of the new system, this function will be removed to keep the changes minimal; a workcell will always be able to perform a task if there is a corresponding BT. A workaround for the phase 2 POC is to have the MES node transform the task type depending on the label type needed. See the section on future plans for proposals on how we can implement this function more reliably.

Capability 2.1 and beyond

How do we determine if a task can be performed?

Instead of checking whether a capability can perform a task, we need to check whether a BT node can perform a task. To do that, we can implement a new type of BT node, tentatively called BTNodeWithConstraint. It will contain a virtual method that returns whether a task can be performed.
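A language-neutral sketch of that idea, written in Python for brevity. The method name `can_perform`, the `stocked_items` argument and the `bt_can_perform` helper are hypothetical; the real node would be a C++ BT node class.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable


class BTNodeWithConstraint(ABC):
    """Tentative base type: a BT node that can say whether a task is doable."""

    @abstractmethod
    def can_perform(self, task_data: Dict[str, Any]) -> bool:
        """Return True if this node could run for the given task data."""


class DispenseItem(BTNodeWithConstraint):
    def __init__(self, stocked_items: Iterable[str]) -> None:
        self.stocked = set(stocked_items)

    def can_perform(self, task_data: Dict[str, Any]) -> bool:
        # Only the task data is consulted here, not the node's input ports.
        return task_data.get("item") in self.stocked


def bt_can_perform(nodes: Iterable[BTNodeWithConstraint],
                   task_data: Dict[str, Any]) -> bool:
    """A task is doable only if every constrained node in the BT agrees."""
    return all(node.can_perform(task_data) for node in nodes)


nodes = [DispenseItem(stocked_items=["coke", "water"])]
```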

However, there is another consideration: sometimes whether a BT node can be performed cannot be determined from the task data alone. In some cases, the input ports of the node may be crucial in determining whether it can be performed. For these cases, the only way to (more) reliably know whether something can be performed is to dry run the BT. The downside is that this can be slow, and since checking whether a task can be performed is part of the bidding process, it may slow down the whole system.

Perhaps the better solution would be to leave the responsibility for such constraint checks to the user. We allow workcells to be launched with certain constraint keys defined in ROS params. A new constraint field in tasks will also be needed, listing the constraint keys required for the task. Then we simply check that the constraint keys are met. We should probably implement something like this regardless of which other solutions are chosen, as I can imagine some kind of deny list being very useful to users.
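The key check itself could be as small as a subset test. A sketch, assuming constraint keys are plain strings; the key names and the function name are made up for illustration.

```python
from typing import Iterable


def constraints_met(workcell_keys: Iterable[str], task_keys: Iterable[str]) -> bool:
    """True if every constraint key required by the task is offered by the workcell."""
    return set(task_keys) <= set(workcell_keys)


# Keys the workcell was launched with (would come from ROS params).
workcell_keys = ["label_type_a", "cold_storage"]
```

A task with no constraint keys is always accepted by this check, since the empty set is a subset of any set.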

Input ports or task data?

In general, a BT node can receive inputs from either the task data or BT input ports. Mixing these sources can hurt compatibility between capabilities and lead to more confusion.

The advantage of receiving inputs from task data is that it simplifies writing the BT. The major disadvantage is that the structure of the data is fixed. Take the DispenseItem BT node, for example: if it receives the item to dispense from the task data, the data must be a string. What if we want to dispense multiple items? One way around it would be to allow an array, but then the node still needs to know which element in the array to use; this could be received from an input port. The problem with this is that we are leaving the ability to dispense multiple items in a task up to the author of the capability. If the BT node cannot destructure an array, then there is no way to create a task that dispenses multiple items.

Another option would be to allow tasks to break down into sub tasks. In the first implementation, we can have a BT node that loads and runs another BT depending on the sub task type. The downside is that we cannot check whether this sub task can be performed, as this container BT node is not tied to any capability. More sophisticated solutions may involve recursively going down the list of sub tasks and discovering the capabilities used, and maybe even some bidding process with the sub tasks. This also ties into the future plan to unify system and workcell orchestrators.

On the other hand, we can instead standardize on receiving inputs from BT ports. For that to work, we need a BT node that can destructure a YAML document and expose it to the blackboard. This also allows us to break the dependency on the structure of task data: the data can be in any structure, and it is up to the BT author to pass the correct field to the BT node. This also ties in to how we determine whether a task can be performed: since all of the inputs would come from BT input ports, we need to do a dry run to determine whether a task can be performed.
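As a rough illustration of such a destructuring node, the sketch below copies selected fields of already-parsed task data onto a blackboard. The blackboard is modelled as a plain dict, and the `destructure` function, the path format and the field names are all assumptions.

```python
from typing import Any, Dict, Sequence, Union

Path = Sequence[Union[str, int]]


def destructure(task_data: Any, mapping: Dict[str, Path],
                blackboard: Dict[str, Any]) -> None:
    """Copy selected fields of the task data onto the blackboard.

    `mapping` maps a blackboard key to a path of dict keys / list indices
    into the (already parsed) YAML task data.
    """
    for key, path in mapping.items():
        value = task_data
        for part in path:
            value = value[part]  # KeyError/IndexError if the field is missing
        blackboard[key] = value


# Task data in a structure chosen by the BT author, not by the BT node.
task_data = {"order": {"items": ["coke", "water"], "station": "A1"}}
blackboard: Dict[str, Any] = {}
destructure(
    task_data,
    {"first_item": ("order", "items", 0), "station": ("order", "station")},
    blackboard,
)
```

A node such as DispenseItem would then read `first_item` through an input port rather than parsing the task data itself.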

Yadunund · 2024-01-22T13:55:20Z

Yadunund
Jan 22, 2024
Maintainer Author

I agree with some of the points here but would strongly prefer if BT nodes did not do any task validation. This would make them tied to hardware and potentially not reusable across deployments. I see BT nodes as building blocks that are application independent that simply perform an operation on the inputs to return an output. Most of our action nodes should just be packaging input data into ROS 2 msgs and making necessary action/service calls to get results.
Right now, many of our BT nodes rely on parsing work order data stored in the blackboard. This also adds to the reusability issue of the BT nodes. So would adding a virtual member to the BT node base class that all BT nodes must implement.

Here's my proposal to handle all of this:

BT plugin system

All BT nodes will additionally be defined such that they can be instantiated via the BT plugin system (different from pluginlib). These BT nodes can be registered to the factory using this macro. See example from nav2.

Hardware registration will register BT nodes

We will implement a registration step for all hardware components in a workcell. They will register their presence with the workcell orchestrator and as part of the registration they will pass a list of BT plugin library names to the orchestrator. The orchestrator will then register these plugins similar to how nav2's BT engine does it as seen here, ie, by relying on the BT plugin system.
The workcell orchestrator maintains a list of all the hardware nodes that have registered. We could either reuse or enhance the IsTaskDoable service so that the workcell orchestrator queries all the registered hardware nodes to see if the task is doable (see below on how this check will be performed). The hardware nodes will thus receive the task data during this checking step, and will also receive all the data in the form of a string payload field in any Service/Action request msgs.
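A simplified sketch of this registration flow, with ROS services replaced by direct function calls. The class and method names and the plugin library names are hypothetical; actual plugin loading is omitted.

```python
from typing import Callable, Dict, List


class WorkcellOrchestrator:
    def __init__(self) -> None:
        # hardware name -> callable standing in for its IsTaskDoable service
        self.hardware: Dict[str, Callable[[str], bool]] = {}
        # BT plugin library names collected from registrations
        self.bt_plugins: List[str] = []

    def register_hardware(self, name: str, bt_plugins: List[str],
                          is_task_doable: Callable[[str], bool]) -> None:
        """Record the hardware node and the BT plugin libraries it reported."""
        self.hardware[name] = is_task_doable
        for lib in bt_plugins:
            if lib not in self.bt_plugins:
                self.bt_plugins.append(lib)

    def is_task_doable(self, payload: str) -> bool:
        """Forward the string payload to every registered hardware node."""
        return all(check(payload) for check in self.hardware.values())


orchestrator = WorkcellOrchestrator()
orchestrator.register_hardware(
    "dispenser", ["dispenser_bt_nodes"], lambda payload: "coke" in payload)
orchestrator.register_hardware(
    "robot_arm", ["manipulation_bt_nodes"], lambda payload: True)
```

As written, every registered node is queried, so a single negative answer makes the whole workcell decline the task.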

How hardware will check capability

We will define two classes in nexus_capabilities:

CapabilityReasoner: A pure abstract class with a virtual member that will check whether a task is doable. Deployments can implement this class on a per hardware basis or have a single implementation for all hardware nodes. Either way, the member will parse task data to determine whether the task can be done.
HardwareNode: A class which itself derives from LifecycleNode and from which hardware facing nodes will in turn derive. This is a concrete class which will set up all interfaces needed for registration. It will rely on two fixed ROS params for configuration:
bt_plugins: A list of strings, each being the library name of a registered BT plugin.
reasoner_plugin: The fully qualified name of the CapabilityReasoner implementation to load via pluginlib and on which to call configure().
This class will define service callbacks for registration and for checking if a task can be done. The former will pass the list of bt_plugins to the workcell orchestrator, which will register them as described above. The latter will query the reasoner for task doability.

Hardware facing nodes will inherit HardwareNode regardless of the type of hardware they are interfacing with, i.e., both TransporterNode and RobotControllerNode inherit HardwareNode, but each will load different plugins for communicating with the specific hardware. This way these nodes can be reused regardless of the hardware, while also making both the communication plugin AND capability reasoning dynamically configurable via pluginlib.
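The class layout above might look roughly like this. It is sketched in Python with the LifecycleNode base and pluginlib loading left out: the reasoner is passed in as an object instead of being loaded from the reasoner_plugin param. `KeywordReasoner`, the method names and the library names are invented for the example.

```python
from abc import ABC, abstractmethod
from typing import List


class CapabilityReasoner(ABC):
    """Pure abstract interface: decides whether a task is doable."""

    @abstractmethod
    def is_task_doable(self, task_data: str) -> bool:
        """Parse the task data and report whether the task can be done."""


class HardwareNode:
    """Concrete base for hardware facing nodes."""

    def __init__(self, bt_plugins: List[str], reasoner: CapabilityReasoner) -> None:
        self.bt_plugins = bt_plugins  # would come from the bt_plugins param
        self.reasoner = reasoner      # would be loaded via reasoner_plugin

    def registration_request(self) -> List[str]:
        """BT plugin library names sent to the orchestrator on registration."""
        return list(self.bt_plugins)

    def handle_is_task_doable(self, task_data: str) -> bool:
        """Callback for the doability check: delegate to the reasoner."""
        return self.reasoner.is_task_doable(task_data)


class TransporterNode(HardwareNode):
    pass


class RobotControllerNode(HardwareNode):
    pass


class KeywordReasoner(CapabilityReasoner):
    """Hypothetical reasoner: doable if the task data mentions a keyword."""

    def __init__(self, keyword: str) -> None:
        self.keyword = keyword

    def is_task_doable(self, task_data: str) -> bool:
        return self.keyword in task_data


transporter = TransporterNode(["transporter_bt_nodes"], KeywordReasoner("pallet"))
```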

We would get rid of the Capability base class that we have now.

How `task_type` in WorkOrder Request will be handled

Two approaches come to mind.

Treat task_type field purely as the name of BT to load. The actual capability checking will happen by calling IsTaskDoable service for each hardware node registered and passing the entire work order data to the hardware during this step. If the BT with name task_type exists and all the hardware can perform the task, the workcell says it can perform the task.

Con: It is possible that not all hardware needed for the BT nodes in the BT has registered, but the orchestrator will still respond favorably. The solution here would be to leave the responsibility to the SI to ensure BT names are unique within each workcell and map to the process that the workcell can perform.

Replace or augment task_type to represent a list of "service types", e.g. task_type = {"manipulation", "gripper", "dispenser", "detection"}. The various hardware nodes would register their capabilities as one of the same categories (this could tie in to the product catalog in the workcell builder). The workcell orchestrator will only check IsTaskDoable for all the hardware nodes if the registered capabilities match the list in task_type. This step is the "service type checker"; the IsTaskDoable service call would be the "constraint checker".
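One possible reading of this second approach, as a two-stage check. "Match" is interpreted here as "every requested service type is registered by some hardware node", which is an assumption; the `RegisteredHardware` record and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set


@dataclass
class RegisteredHardware:
    name: str
    service_types: Set[str]
    is_task_doable: Callable[[str], bool]  # stands in for the IsTaskDoable service


def workcell_can_perform(task_type: Iterable[str], payload: str,
                         hardware: List[RegisteredHardware]) -> bool:
    # Stage 1, "service type checker": all requested types must be registered.
    registered: Set[str] = set()
    for hw in hardware:
        registered |= hw.service_types
    if not set(task_type) <= registered:
        return False
    # Stage 2, "constraint checker": ask the hardware nodes via IsTaskDoable.
    return all(hw.is_task_doable(payload) for hw in hardware)


hardware = [
    RegisteredHardware("arm", {"manipulation", "gripper"}, lambda p: True),
    RegisteredHardware("dispenser", {"dispenser"}, lambda p: "coke" in p),
]
```

With this hardware, a task requesting "detection" is rejected in the first stage without any IsTaskDoable call being made.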

Orchestrators can load other BT nodes via the BT plugin system

All other decorator, sequence or action nodes can be loaded via the same BT plugin system. The orchestrator can have its own bt_plugins ROS param to load these as discussed above.
This way BT nodes can be reused across deployments. More importantly we can define custom BT nodes in other private pkgs but have the workcell load them without relying on any specific capability to do so. This would also address the need to always define a capability if we want to register certain BT nodes.

1 reply

koonpeng Jan 24, 2024
Collaborator

I'm still not quite sure if it is better to have hardware register themselves or have them be discoverable and have the orchestrator auto-register them. The downside of the former is that we naturally need an unregister endpoint as well. In any case, either of these will be a great addition.

Right now, many of our BT nodes rely on parsing work order data stored in the blackboard. This also adds to the reusability issue of the BT nodes. So would adding a virtual member to the BT node base class that all BT nodes must implement.

I agree with this, not giving BT nodes direct access to the task data is a good idea.

We will implement a registration step for all hardware components in a workcell. They will register their presence with the workcell orchestrator and as part of the registration they will pass a list of BT plugin library names to the orchestrator. The orchestrator will then register these plugins similar to how nav2's BT engine does it as seen here, ie, by relying on the BT plugin system.

I don't think it is a good idea to have the hardware control what BT libs are loaded. The hardware doesn't know what BT libs are available, and hardware nodes may be running on different PCs, making deployments a lot more complex. Also, if the hardware node is provided directly by the vendor, it would "lock" the BT lib that we can use. Even if the hardware node provides a way to configure this, I don't think it is a good idea to manage configs at the hardware level. Instead, I think hardware should be identifiable by their capabilities, and the workcell orchestrator then chooses what BT libs to load based on some configuration. This way, we can onboard new hardware with zero config changes. Alternatively, the workcell orchestrator can just load BT libs without considering the hardware available; it would just error out when attempting to execute a BT node if there is no hardware that can handle it.

CapabilityReasoner: A pure abstract class with a virtual member that will check whether a task is doable. Deployments can implement this class on a per hardware basis or have a single implementation for all hardware nodes. Either way, the member will parse task data to determine whether the task can be done.

I think a deployment should not have to write any code. We should have a way to provide configs to customize the reasoner to some degree; only in the most extreme cases should we require a custom plugin. Also, in most cases, deployments should not have to modify the reasoner when they change hardware.

Treat task_type field purely as the name of BT to load. The actual capability checking will happen by calling IsTaskDoable service for each hardware node registered and passing the entire work order data to the hardware during this step. If the BT with name task_type exists and all the hardware can perform the task, the workcell says it can perform the task.

I don't see a way to make this work in practice. In order for a task to be possible, all hardware must report that they can perform the task, even if the task is not related to the hardware. It is very reasonable for a workcell to perform a task without using all its hardware. The only way I see this working is if the reasoner is tied to the workcell instead of the hardware; if so, it would make more sense to make CapabilityReasoner its own node instead of a plugin to HardwareNode.

Replace or augment task_type to represent a list of "service types". Eg task_type = {"manipulation", "gripper", "dispenser", "detection"}. The various hardware nodes would register their capabilities as one of the same categories (this could tie in the product catalog in the workcell builder). The workcell orchestrator will only check IsTaskDoable for all the hardware nodes if the registered capabilities matches the list in task_type. This step is the "service type checker", the IsTaskDoable service call would be the "constraint checker".

In this alternative solution, how do we determine which BT to load?

I also don't quite understand the relationship between the reasoner and task data. We don't want BT nodes to parse task data, as that would hurt reusability. That would mean that the params we send to a hardware node may not be related to the task data, so the reasoner cannot possibly determine if the task can be done.

e.g. With the BT <DispenseItem item="coke" />, the task data does not contain what item we want to dispense UNLESS we have the BT node parse task data.
