Process sets - do we need a more precise definition for MPI - 5 #24
Comments
So things have been evolving with other groups - I can summarize them here in case it helps. The primary driver really is to dissociate processes (i.e., executing pieces of code) from resources (i.e., things that processes use to execute). The reason is that they wanted to define a process set and a collection of resources as separate entities, and then create various combinations of them at some later time.

A process set is therefore defined à la PMIx - it is a given collection of application definitions, each application consisting of some specified number of instances of executable code. We allow the specification to be abstract - e.g., instead of saying that the process set consists of N instances of a given application executable, you can say that it consists of M instances per resource type. You can name them for ease of reference - and we do allow dynamic definitions.

A resource set consists of a collection of allocated resources. The scheduler hands out a default resource set when the allocation is initially made, but the user may subdivide that into as many resource sets as they like. The user can also define abstract resource sets - i.e., resource sets that do not consist of specific resources (e.g., all of nodeA, 2 GPUs from nodeB) but instead specify an abstracted collection of resources (e.g., three dedicated nodes, 2 GPUs from a non-dedicated node). You can name these as well.

A compute set is defined by combining a process set with a resource set - and as you'd expect, you can name these for reference. This defines an executable unit that can be launched by the RTE via something like the PMIx_Spawn API, where the RTE is responsible for mapping the process set onto the resource set. The final result is called a compute block - i.e., a compute set that has been mapped, launched, and is executing.

Traditional programming models simply asked to launch a process set, usually described on the command line (as opposed to formally calling it a process set). PMIx_Spawn knows that a request to launch a process set with no specified resource set is equivalent to using the default resource set, so a mechanism for handling current codes is easy to support.

In some programming models, it is really convenient to think in terms of compute sets as opposed to the traditional application, since the compute set is capable of performing a complex task (e.g., modeling propagation of a specific crack), essentially acting like an object-based module. We are therefore exploring how best to reflect these definitions in PMIx - e.g., passing them to the PMIx_Spawn API for launch, or to the PMIx_Connect API to couple compute blocks together.

All still in its infancy, so nothing written in stone. Hope that helps provide some thoughts for your discussion.
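To make the terminology above concrete, here is a minimal Python sketch of how the four terms relate. It is purely illustrative and is not the PMIx API: every class and function name (`AppDef`, `ProcessSet`, `ResourceSet`, `ComputeSet`, `make_compute_set`, `map_round_robin`) is hypothetical, resources are reduced to node names, and the round-robin mapping is an assumed stand-in for whatever policy the RTE actually applies.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class AppDef:
    """One application definition: an executable and an instance count."""
    executable: str
    instances: int


@dataclass(frozen=True)
class ProcessSet:
    """A named collection of application definitions (no resources attached)."""
    name: str
    apps: Tuple[AppDef, ...]


@dataclass(frozen=True)
class ResourceSet:
    """A named collection of allocated resources (here: just node names)."""
    name: str
    nodes: Tuple[str, ...]


@dataclass(frozen=True)
class ComputeSet:
    """A process set combined with a resource set: a launchable unit."""
    name: str
    procs: ProcessSet
    resources: ResourceSet


def make_compute_set(name: str, procs: ProcessSet,
                     resources: Optional[ResourceSet],
                     default: ResourceSet) -> ComputeSet:
    # No resource set given: fall back to the default one, which mirrors
    # the traditional "just launch these processes" request.
    return ComputeSet(name, procs, resources if resources is not None else default)


def map_round_robin(cset: ComputeSet) -> List[Tuple[int, str, str]]:
    """Assumed mapping policy: assign ranks to nodes in round-robin order.

    The returned (rank, executable, node) list stands in for the mapped
    state of a compute block; nothing is actually launched here.
    """
    placement = []
    rank = 0
    for app in cset.procs.apps:
        for _ in range(app.instances):
            node = cset.resources.nodes[rank % len(cset.resources.nodes)]
            placement.append((rank, app.executable, node))
            rank += 1
    return placement


default_rset = ResourceSet("default", ("node0", "node1"))
pset = ProcessSet("solver", (AppDef("crack_model", 3), AppDef("viz", 1)))
cset = make_compute_set("job", pset, None, default_rset)
block = map_round_robin(cset)
```

The point of the sketch is only that `ProcessSet` carries no resource information and `ResourceSet` carries no process information; the two meet in `ComputeSet`, and only the mapped result corresponds to a compute block.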
My summary / perspective of the previous message (Thanks Ralph!):
Now to MPI: It's probably best to stick to the same abstract definition as PMIx, since MPI shouldn't be about the particular computing resources used. Here's a first shot: A "process set" refers to a set of executions of MPI processes.
I think that makes sense. You might also want to define an "execution block" or "execution set" which would be the combination of the processes and the resources that are allocated for their use. Can't say "that they are using" as that can be ephemeral - but "allocated for their use" should have some clearer meaning to both MPI and the runtime.
The MPI-4 standard does not define precisely what a process set is, but rather has this wording
and
An external agent (PMIx) being used by some MPI implementations defines a process set as:
In the PMIx world, there is not a direct association of system resources with a process set. System resources in PMIx land aren't defined in terms of process sets.
We may wish to bear this in mind as we consider MPI methods for requesting additional resources (in PMIx land an allocation).