Commit 3cda641

Added new document for the runtime, and added comments.
1 parent 046e113 commit 3cda641

6 files changed

+1006
-0
lines changed

docs/Runtime.md

Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
1+
# Glow Runtime
2+
3+
The Glow runtime is responsible for adding and running networks on Glow.
4+
Below is a high-level view of the runtime architecture. It consists of five key components: HostManager, DeviceManager, Partitioner, Provisioner, and Executor.
5+
6+
# Data Structures:
7+
The runtime uses a few key data structures, which can be found in RuntimeTypes.h. They are discussed below.
8+
9+
### DeviceInfo:
10+
This structure contains information about the device that the partitioner will use to partition a network. This includes things like available memory and computation capability.
11+
12+
### DeviceConfig:
13+
A base class used in configuring a DeviceManager. It is meant to contain information that allows the DeviceManager to uniquely identify the device and initialize it.
14+
15+
### DAG
16+
When a network is partitioned, its partitions and their relations are modeled in a DAG. The DAG contains the information for the entire network.
17+
18+
### DAGNode
19+
The DAGNode is a single node in the DAG and contains everything needed to run a partition and model the partition's dependencies.
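To make the relationship between the two structures concrete, here is a minimal sketch of a DAG that owns its nodes, where each node tracks its parents and children. The field names and the helpers `addPartition` and `addDependency` are hypothetical and are not taken from RuntimeTypes.h; the real structures carry more information than shown here.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the runtime's DAG types.
struct DAGNode {
  std::string name;                // name of the partition (sub-function)
  std::vector<DAGNode *> parents;  // partitions that must finish first
  std::vector<DAGNode *> children; // partitions that consume this one's outputs
};

struct DAG {
  DAGNode root;                                // entry point for the whole network
  std::vector<std::unique_ptr<DAGNode>> nodes; // owns every partition node
};

// Adds a partition to the DAG and returns a non-owning pointer to it.
inline DAGNode *addPartition(DAG &dag, const std::string &name) {
  dag.nodes.push_back(std::make_unique<DAGNode>());
  dag.nodes.back()->name = name;
  return dag.nodes.back().get();
}

// Records that `child` depends on the outputs of `parent`.
inline void addDependency(DAGNode *parent, DAGNode *child) {
  parent->children.push_back(child);
  child->parents.push_back(parent);
}
```

A network split into two sequential partitions would then be modeled by adding both partitions and linking root to the first and the first to the second.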
20+
21+
# Components
22+
The runtime is composed of a few key elements: HostManager, Partitioner, Provisioner, Executor, and DeviceManager.
23+
24+
![](glow_runtime.svg)
25+
26+
### Host Manager:
27+
The HostManager is the container for the other components. It serves as the interface externally, handling network init and run requests. The HostManager routes a request through the other components and stores the executionDAG for each network family member.
28+
29+
### Partitioner:
30+
31+
This component is responsible for dividing up the provided network into sub-networks that can be run on multiple devices. It receives a Module and DeviceInfo from the HostManager and lowers the module. Then it does the partitioning based on hardware constraints and heuristics to optimize execution time. It outputs a list of executionDAGs, one per network.
32+
33+
### Provisioner:
34+
35+
The Provisioner takes in the list of executionDAGs and assigns sub-functions to specific devices. The Provisioner compiles each sub-function and stores the results in a map. It then passes a pointer to the compiledFunction and a Module reference to the DeviceManager to initialize the function on the device. Finally, it fills in the remaining fields of the executionDAGs and returns them in a list to the HostManager.
36+
37+
### Executor:
38+
39+
The Executor handles the execution of the network. It walks the executionDAG, calling execution of each sub-network in accordance with its dependencies. It also handles allocating contexts for the sub-networks and moving one sub-network's outputs to the inputs of another.
40+
41+
### Device Manager:
42+
43+
The DeviceManager is a host-side abstraction of a device. The manager handles initializing the device, collecting constants and preparing the device for execution, and executing a network on the device. It also handles unloading networks from the device. There is one backend-specific DeviceManager per backend type.
44+
45+
# Device API:
46+
47+
There is a pairing between the backend and the device manager. The backend is provided a module which contains the computation graph. It returns a backend-specific compiled function which inherits from compiledFunction and contains the instructions to run the network on the device. The compiledFunction serves as a container/abstraction for device-specific executable code. The specific format is left to the backend implementation.
48+
49+
The backend-specific device manager inherits from DeviceManager and consumes the compiledFunction generated by its matching backend. The DeviceManager knows how to copy the device-specific code to the device and initialize it for execution. The DeviceManager also handles execution, which means loading the inputs to the device, waiting for the device to signal that the outputs are ready, and copying outputs from the device.
50+
51+
![](backend_dm_api.svg)
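The backend/device manager pairing described above can be sketched with a toy example. Everything here is an assumption made for illustration: the method names (`compile`, `addNetwork`, `runNetwork`, `evictNetwork`), their signatures, and the `Toy*` classes are hypothetical and do not reproduce Glow's actual interfaces. The "compiled code" is reduced to a single scale factor so that the flow stays visible.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical, simplified interfaces; names and signatures are illustrative only.
struct Module {}; // placeholder for the computation graph

// Container for device-specific executable code.
class CompiledFunction {
public:
  virtual ~CompiledFunction() = default;
};

// Compiles a module into a backend-specific CompiledFunction.
class Backend {
public:
  virtual ~Backend() = default;
  virtual std::unique_ptr<CompiledFunction> compile(const Module &m) const = 0;
};

// Host-side abstraction of a device: loads, runs, and unloads networks.
class DeviceManager {
public:
  virtual ~DeviceManager() = default;
  virtual bool addNetwork(const std::string &name, CompiledFunction *f) = 0;
  virtual bool runNetwork(const std::string &name, int input, int &output) = 0;
  virtual void evictNetwork(const std::string &name) = 0;
};

// Toy backend/device pair: the "compiled code" is just a scale factor.
class ToyFunction : public CompiledFunction {
public:
  explicit ToyFunction(int s) : scale(s) {}
  int scale;
};

class ToyBackend : public Backend {
public:
  std::unique_ptr<CompiledFunction> compile(const Module &) const override {
    return std::make_unique<ToyFunction>(2);
  }
};

class ToyDeviceManager : public DeviceManager {
public:
  bool addNetwork(const std::string &name, CompiledFunction *f) override {
    // Only functions produced by the matching backend are accepted.
    auto *toy = dynamic_cast<ToyFunction *>(f);
    if (!toy)
      return false;
    loaded_[name] = toy;
    return true;
  }
  bool runNetwork(const std::string &name, int input, int &output) override {
    auto it = loaded_.find(name);
    if (it == loaded_.end())
      return false;
    output = input * it->second->scale; // stands in for copy-in, run, copy-out
    return true;
  }
  void evictNetwork(const std::string &name) override { loaded_.erase(name); }

private:
  std::map<std::string, ToyFunction *> loaded_; // non-owning pointers
};
```

The `dynamic_cast` in `addNetwork` is one way to express that a device manager only understands the output of its own backend; a real implementation may enforce the pairing differently.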
52+
53+
# Network Initialization:
54+
Below we have a diagram illustrating the process of adding a network to the Runtime.
55+
- A Module containing some functions to be added is provided to the HostManager.
56+
- The HostManager passes this Module along to the Partitioner which partitions the network.
57+
- The DAGs output from the Partitioner are then passed to the Provisioner which handles actual device allocation.
58+
- The DeviceManager handles preparing the device to run the function. This includes allocating memory on the device, copying constants from the Module to the device, and loading the function on the device.
59+
60+
![](network_init.svg)
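The initialization steps can be approximated in a few lines. This is only a sketch under strong assumptions: all types and functions below (`Function`, `Partition`, `Device`, `partitionModule`, `provision`, `addNetwork`) are hypothetical, the partitioning heuristic is a naive split by memory size, and device selection is first-fit. The real Partitioner and Provisioner use hardware constraints and heuristics that are not modeled here.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins; none of these are Glow's real types.
struct Function { std::string name; std::size_t memoryBytes; };
struct Module { std::vector<Function> functions; };
struct Partition { std::string name; std::size_t memoryBytes; int deviceID; };
struct Device { std::size_t availableBytes; std::vector<std::string> loaded; };

// Partitioner stand-in: split each function into chunks that fit on one device.
std::vector<Partition> partitionModule(const Module &m, std::size_t deviceBytes) {
  std::vector<Partition> parts;
  if (deviceBytes == 0)
    return parts; // nothing can be placed on a device with no memory
  for (const Function &f : m.functions) {
    std::size_t remaining = f.memoryBytes;
    int index = 0;
    while (remaining > 0) {
      std::size_t chunk = std::min(remaining, deviceBytes);
      parts.push_back(Partition{f.name + "_part" + std::to_string(index++), chunk, -1});
      remaining -= chunk;
    }
  }
  return parts;
}

// Provisioner stand-in: first-fit assignment of partitions to devices.
bool provision(std::vector<Partition> &parts, std::vector<Device> &devices) {
  for (Partition &p : parts) {
    for (std::size_t d = 0; d < devices.size(); ++d) {
      if (devices[d].availableBytes >= p.memoryBytes) {
        // DeviceManager stand-in: reserve memory and "load" the function.
        devices[d].availableBytes -= p.memoryBytes;
        devices[d].loaded.push_back(p.name);
        p.deviceID = static_cast<int>(d);
        break;
      }
    }
    if (p.deviceID < 0)
      return false; // no device could hold this partition
  }
  return true;
}

// HostManager stand-in: route the Module through the stages in order.
bool addNetwork(const Module &m, std::vector<Device> &devices,
                std::vector<Partition> &out) {
  if (devices.empty())
    return false;
  out = partitionModule(m, devices.front().availableBytes);
  return provision(out, devices);
}
```

With three devices of 100 bytes each, a 250-byte function would be split into three partitions and placed one per device under these assumptions.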
61+
62+
# Network Execution:
63+
For execution we have a similar diagram stepping through the network execution process.
64+
- The HostManager is provided a network name and ExecutionContext.
65+
- The HostManager passes the ExecutionContext on to the Executor, along with the DAG for the network.
66+
- The Executor calls into the DeviceManager to kick off execution for all partitions which have no unmet dependencies. It proceeds to walk the DAG and execute each partition as its dependencies are met.
67+
68+
![](network_run.svg)
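The DAG walk in the last step resembles a standard topological traversal: start with the partitions that have no unmet dependencies, and release each child once all of its parents have run. The sketch below is a sequential approximation under stated assumptions; `DAGNode` is a simplified hypothetical struct, `executeDAG` is not a Glow function, and the `runOnDevice` callback stands in for the call into the DeviceManager. It assumes every node reachable through `children` is also listed in `nodes`.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified partition node.
struct DAGNode {
  std::string name;
  std::vector<DAGNode *> parents;
  std::vector<DAGNode *> children;
};

// Runs each partition after all of its parents have run and returns the
// order in which partitions were executed.
std::vector<std::string>
executeDAG(const std::vector<DAGNode *> &nodes,
           const std::function<void(const DAGNode &)> &runOnDevice) {
  std::unordered_map<const DAGNode *, std::size_t> unmet; // pending parents
  std::queue<const DAGNode *> ready;
  for (const DAGNode *n : nodes) {
    unmet[n] = n->parents.size();
    if (n->parents.empty())
      ready.push(n); // no dependencies: can start immediately
  }

  std::vector<std::string> order;
  while (!ready.empty()) {
    const DAGNode *n = ready.front();
    ready.pop();
    runOnDevice(*n); // stand-in for the DeviceManager call
    order.push_back(n->name);
    for (const DAGNode *c : n->children) {
      if (--unmet[c] == 0)
        ready.push(c); // last dependency met: partition is now runnable
    }
  }
  return order;
}
```

This version runs partitions one at a time for clarity. Independent partitions could be dispatched concurrently, with the counter decremented as each completion is signaled.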
