Threading Model #388 (Draft)

PhilMiller wants to merge 1 commit into base: main
Conversation

@PhilMiller (Contributor) commented May 24, 2023

Document what semantics we actually have around use of multiple threads calling Kokkos

The foundational principles I think we have are that

  • We should build on the C++ memory model, since we're directly interacting with it at points (particularly, View::operator() from host, and equivalent memory access in buffers that we deep_copy to/from)
  • An execution space instance is very similar to a C++ thread in terms of relevant semantics
  • All the stuff Kokkos does can be decomposed into a few fundamental operations that play key semantic roles (a rough sketch of how these pieces fit together follows this list)
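
A minimal sketch of how those pieces line up in code (the example is mine, not part of the PR; the partitioning call and the specific operations are just illustrative):

```c++
#include <Kokkos_Core.hpp>

int main(int argc, char* argv[]) {
  Kokkos::ScopeGuard guard(argc, argv);  // Initialization / Finalization
  using Exec = Kokkos::DefaultExecutionSpace;

  // Two execution space instances: semantically much like two C++ threads,
  // each executing its own ordered stream of submitted work.
  auto inst = Kokkos::Experimental::partition_space(Exec(), 1, 1);

  Kokkos::View<double*> d("d", 100);       // Managed Construction (Memory Allocation)
  auto h = Kokkos::create_mirror_view(d);  // host-accessible buffer

  Kokkos::parallel_for(                    // Parallel Dispatch on inst[0]
      "fill", Kokkos::RangePolicy<Exec>(inst[0], 0, 100),
      KOKKOS_LAMBDA(int i) { d(i) = i; });

  Kokkos::deep_copy(inst[0], h, d);        // Deep Copy enqueued on the same instance
  inst[0].fence();                         // Synchronization: the above happens-before here

  double x = h(0);  // Data Access via View::operator(); ordinary C++ memory
  (void)x;          // model rules govern this host access
  return 0;
}
```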

@PhilMiller (Contributor, Author) commented
Tagged @msimberg because the HPX backend exposes fairly unique behaviors that I definitely don't understand, so it will probably require its own treatment here.

Tagged @keitaTN and @nmm0 because of their work (as I recall it) on describing an operational semantics for Kokkos

@masterleinad (Contributor) left a comment


I think the most interesting question still to answer is whether we serialize kernels on all backends for the same execution space instance or not.


A multi-threaded program structured such that there is a *happens-before* relationship between each call to perform a *Fundamental Operation* will behave equivalently to a single-threaded program that performs the same sequence of *Fundamental Operations*. (Note: This is analogous to ``MPI_THREAD_SERIALIZED``)

.. Do we actually want to guarantee that every Fundamental Operation is serializing? Should that just mean that we don't require call sites to have *happens-before* relationships, or should they also internally create such *happens-before* relationships? I.e. that the calling threads *synchronize-with* each other at those points?
Contributor:


That's a key question. My understanding is that we want to serialize parallel dispatch to the same execution space instance, but I don't think we want to promise anything with respect to data access outside of kernels.
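
For concreteness, a hypothetical sketch (mine, not from the PR) of the case being discussed: two host threads submitting kernels to the same execution space instance, with a *happens-before* relationship (here via a mutex) between the two submission calls:

```c++
#include <Kokkos_Core.hpp>
#include <mutex>
#include <thread>

using Exec = Kokkos::DefaultExecutionSpace;

void submit(const Exec& exec, Kokkos::View<int*> v, int value) {
  // Parallel Dispatch to the shared instance; whether such kernels are
  // serialized on the instance by every backend is the open question above.
  Kokkos::parallel_for("submit",
                       Kokkos::RangePolicy<Exec>(exec, 0, v.extent(0)),
                       KOKKOS_LAMBDA(int i) { v(i) = value; });
}

int main(int argc, char* argv[]) {
  Kokkos::ScopeGuard guard(argc, argv);
  Exec exec;  // one instance shared by both host threads
  Kokkos::View<int*> v("v", 1 << 20);
  std::mutex m;

  auto worker = [&](int value) {
    std::lock_guard<std::mutex> lock(m);  // happens-before between the two calls
    submit(exec, v, value);
  };
  std::thread t1(worker, 1), t2(worker, 2);
  t1.join();
  t2.join();

  exec.fence();  // Synchronization before the host inspects v
  return 0;
}
```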


*Global Synchronization* creates a *happens-before* relationship between the completion of every *Fundamental Operation* on any *Execution Space Instance* that *happens-before* the *Global Synchronization* and the thread that performs the *Global Synchronization*.

.. Should the above actually be *synchronizes-with*?
Contributor:


Is there really much of a difference when we talk about fence?
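
An illustrative sketch (mine, not from the PR) of the local vs. global flavor of fence that the question is about:

```c++
#include <Kokkos_Core.hpp>

void fence_flavors() {
  using Exec = Kokkos::DefaultExecutionSpace;
  auto inst = Kokkos::Experimental::partition_space(Exec(), 1, 1);

  Kokkos::View<double*> a("a", 1000), b("b", 1000);
  Kokkos::parallel_for("ka", Kokkos::RangePolicy<Exec>(inst[0], 0, 1000),
                       KOKKOS_LAMBDA(int i) { a(i) = 1.0; });
  Kokkos::parallel_for("kb", Kokkos::RangePolicy<Exec>(inst[1], 0, 1000),
                       KOKKOS_LAMBDA(int i) { b(i) = 2.0; });

  inst[0].fence();  // local: only work submitted to inst[0] happens-before here
  Kokkos::fence();  // global: work on *every* instance happens-before the
                    // calling thread continuing past this point
}
```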


* Managed Construction
Managed construction of a Kokkos View performs a *Memory Allocation*, potentially followed by a *Parallel Dispatch* to initialize the memory (depending on whether ``WithoutInitializing`` was passed), potentially followed by a *Synchronization* (if no execution space instance was passed, so that allocation and initialization *happen-before* any subsequent operation that may reference the ``View``'s memory).
.. Do we want that to be *Global Synchronization* or *Local Synchronization*?
Contributor:


We effectively do a device-wide (or at least execution space-wide) synchronization at the moment, see https://github.com/kokkos/kokkos/blob/5d81422daea73f5a2a69771cc0dfafc19f785003/core/src/Cuda/Kokkos_CudaSpace.cpp#L160-L205. The intent is to make sure that memory can't be accessed before allocation is complete and thus it should be (IMHO) enough to fence the active execution space instance on the current thread.
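
For reference, a sketch (mine, not from the PR) of the construction variants the quoted passage distinguishes, as I understand the current behavior:

```c++
#include <Kokkos_Core.hpp>

void construction_variants(const Kokkos::DefaultExecutionSpace& exec) {
  // Allocation + initializing Parallel Dispatch + a fence, since no
  // execution space instance was passed:
  Kokkos::View<double*> a("a", 1000);

  // Allocation only; no initializing kernel is launched:
  Kokkos::View<double*> b(
      Kokkos::view_alloc(Kokkos::WithoutInitializing, "b"), 1000);

  // Allocation + initialization submitted to 'exec' with no blanket fence;
  // later work touching 'c' must be ordered after 'exec' by the caller:
  Kokkos::View<double*> c(Kokkos::view_alloc(exec, "c"), 1000);
}
```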

Comment on lines +24 to +30
* *Initialization*

.. Not just Kokkos::init, but also whatever device-specific or thread-specific stuff we have Legion doing now

* *Finalization*

.. Ditto Initialization
Contributor:


Backends can still only be initialized or finalized once. I'm not quite sure if it's worth mentioning initialization/finalization then. At the very least, we need to clarify what we mean here (execution space instance initialization/finalization may be sensible).
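
A minimal sketch (mine) of the process-lifetime vs. instance-lifetime distinction, assuming the usual once-per-program initialize/finalize:

```c++
#include <Kokkos_Core.hpp>

int main(int argc, char* argv[]) {
  // Initialization/Finalization happen exactly once per program run...
  Kokkos::ScopeGuard guard(argc, argv);
  {
    // ...whereas execution space instances can be created and destroyed
    // freely in between, which may be the granularity this section needs.
    Kokkos::DefaultExecutionSpace exec;
    exec.fence();
  }
  return 0;
}
```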

Comment on lines +53 to +55
* *Data Access*
``View::operator()``, to memory that is accessible from the host.

Contributor:


Not quite sure if we want to promise anything about data access outside of kernels.

@PhilMiller (Contributor, Author):


I think we have to, or else we can't suitably address either the usage of unmanaged views or UVM.
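
A sketch (mine, not from the PR) of the two cases mentioned, an unmanaged View over user-owned memory and host-visible shared memory, where host-side *Data Access* semantics have to mean something:

```c++
#include <Kokkos_Core.hpp>
#include <vector>

int main(int argc, char* argv[]) {
  Kokkos::ScopeGuard guard(argc, argv);

  // Unmanaged View wrapping memory that Kokkos did not allocate:
  std::vector<double> buf(100, 0.0);
  Kokkos::View<double*, Kokkos::HostSpace,
               Kokkos::MemoryTraits<Kokkos::Unmanaged>>
      u(buf.data(), buf.size());
  u(0) = 1.0;  // Data Access through View::operator() on the host

  // UVM/shared allocations (a View in a host-and-device-accessible memory
  // space) raise the same issue: a host read of an element written by a
  // kernel is only well-defined once that kernel happens-before the read
  // (fence or equivalent).
  return 0;
}
```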

Comment on lines +108 to +110
* Metadata Query
* Element Access
Element Access performs a Data Access operation.
Contributor:


Not quite sure if we need these.
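
For what it's worth, a tiny sketch (mine) of the distinction as I read it:

```c++
#include <Kokkos_Core.hpp>

void query_vs_access(Kokkos::View<double*, Kokkos::HostSpace> v) {
  auto n = v.extent(0);    // Metadata Query: touches no element data
  auto name = v.label();   // Metadata Query
  if (n > 0) v(0) = 1.0;   // Element Access: performs a Data Access
  (void)name;
}
```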

Backend-Specific Details
------------------------

.. Local or Global synchronizations below?
Contributor:


It might be enough to group backends into synchronous and asynchronous backends, clarifying that kernels submitted by multiple threads are serialized (if we decide to make that promise).

* ``CUDA`` and ``HIP``

* ``HPX``

@masterleinad (Contributor) commented May 25, 2023


We should talk more about parallel dispatch and the behavior of independent threads (without a happens-before relationship between them) accessing the same data.
Possibly also clarifying where we promise that dispatch implies fences (linking to API for parallel_for, parallel_reduce, parallel_scan).
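
To make the fencing point concrete, here is a sketch (mine, not from the PR) of where dispatch does and does not imply blocking today, as I understand the parallel_for/parallel_reduce API docs:

```c++
#include <Kokkos_Core.hpp>

void dispatch_and_fences(const Kokkos::DefaultExecutionSpace& exec) {
  Kokkos::View<double*> v("v", 1000);
  Kokkos::View<double> sum_d("sum_d");  // device-resident result
  double sum_h = 0.0;                   // host scalar result

  // parallel_for: may return before the kernel finishes (no implied fence).
  Kokkos::parallel_for("fill", Kokkos::RangePolicy<>(exec, 0, 1000),
                       KOKKOS_LAMBDA(int i) { v(i) = 1.0; });

  // parallel_reduce into a plain scalar: blocks until the result is ready.
  Kokkos::parallel_reduce("sum_blocking", Kokkos::RangePolicy<>(exec, 0, 1000),
                          KOKKOS_LAMBDA(int i, double& lsum) { lsum += v(i); },
                          sum_h);

  // parallel_reduce into a View: potentially asynchronous; the host must
  // synchronize before reading the result.
  Kokkos::parallel_reduce("sum_async", Kokkos::RangePolicy<>(exec, 0, 1000),
                          KOKKOS_LAMBDA(int i, double& lsum) { lsum += v(i); },
                          sum_d);
  exec.fence();
}
```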

@ajpowelsnl (Contributor) commented

TODO:

  • Champions: @masterleinad, @ajpowelsnl
  • Request reviews for:
    • #4385
    • #6051
    • #6151
  • Discuss Kokkos "thread safety" concepts with team
  • Decide where to document "thread safety" behavior, using this PR as a starting point
