[Multithreading] PROXY_TO_PTHREAD & MAIN_THREAD_EM_ASM causes perf degradation #22570

Open
ravisumit33 opened this issue Sep 15, 2024 · 10 comments

Comments

@ravisumit33
Contributor

ravisumit33 commented Sep 15, 2024

Please include the following in your bug report:

Version of emscripten/emsdk:

emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 3.1.56 (cf90417346b78455089e64eb909d71d091ecc055)
clang version 19.0.0git (https:/github.com/llvm/llvm-project 34ba90745fa55777436a2429a51a3799c83c6d4c)
Target: wasm32-unknown-emscripten
Thread model: posix
InstalledDir: ~/emsdk/upstream/bin

I am trying to port my application from a single-threaded to a multi-threaded environment. I cannot bound the maximum number of threads my application needs at any one time, so I settled on PROXY_TO_PTHREAD. In single-threaded mode, my application worked as follows:

  1. C++ main function does some initialization. After main function exits, we keep the runtime alive.
  2. We have exposed a C++ function to process events coming from the UI.

To port this architecture to a multi-threaded environment, I used PROXY_TO_PTHREAD to create a proxied main thread and kept that thread alive for further processing. I used proxying to forward events coming from the UI to this detached thread. Once done, this thread calls MAIN_THREAD_EM_ASM to send the response back to the main application thread. This is the only MAIN_THREAD_EM_ASM call the detached thread makes; the rest is C++ execution that does not wait on anything else.

Functionally, this model worked well. But during performance analysis I found a degradation of around 200-400 ms. Profiling showed that the detached thread completed its work in time but then waited around 200-400 ms for MAIN_THREAD_EM_ASM to complete, i.e. for the main application thread to receive the response. The main application thread was also completely idle during this time. This can be seen in the screenshot below.

[Screenshot: Untitled design (1)]

Is this performance degradation expected? Is there any other way I could model my app to get away with this? How can I minimise the time taken by the detached thread to send back the response?

@ravisumit33 ravisumit33 changed the title [Multithreading] PROXY_TO_PTHREAD causes perf degradation [Multithreading] PROXY_TO_PTHREAD & MAIN_THREAD_EM_ASM causes perf degradation Sep 15, 2024
@ravisumit33
Contributor Author

@sbc100 @kripken Any thoughts on this?

@sbc100
Collaborator

sbc100 commented Sep 19, 2024

To be clear this is not some kind of regression? i.e. you are not claiming that some previous version of emscripten had a faster version of MAIN_THREAD_EM_ASM?

As far as I know there are no delays built into the proxying system. The call to MAIN_THREAD_EM_ASM should use a postMessage to wake the main thread, which should then use a shared memory futex to wake the secondary thread once it's done.

@tlively are you aware of any reason for such a delay?

@ravisumit33 perhaps you could share an example of a simple program that demonstrates the delay you are talking about?
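
The postMessage-then-futex handoff described above can be modeled in portable C++ to see where the calling thread spends its time. This is only a sketch under stated assumptions, not Emscripten's implementation: a mutex-protected mailbox stands in for postMessage, a condition variable stands in for the shared-memory futex, and the names `MainLoopModel`, `call_sync`, `run_one`, and `demo_round_trip` are hypothetical.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Portable model only. The mailbox plays the role of postMessage and the
// condition variable plays the role of the shared-memory futex.
class MainLoopModel {
 public:
  // Called on the secondary thread: post work to "main", then block until
  // "main" reports that the work has run.
  void call_sync(std::function<void()> work) {
    bool done = false;
    {
      std::lock_guard<std::mutex> lock(mu_);
      mailbox_.push(Task{std::move(work), &done});
    }
    posted_.notify_one();  // stand-in for postMessage to the main thread
    std::unique_lock<std::mutex> lock(mu_);
    finished_.wait(lock, [&done] { return done; });  // stand-in for futex wait
  }

  // Called on the "main" thread: take one task from the mailbox, run it,
  // then wake the blocked caller.
  void run_one() {
    std::unique_lock<std::mutex> lock(mu_);
    posted_.wait(lock, [this] { return !mailbox_.empty(); });
    Task task = std::move(mailbox_.front());
    mailbox_.pop();
    lock.unlock();
    task.work();
    lock.lock();
    *task.done = true;
    lock.unlock();
    finished_.notify_all();  // stand-in for futex wake
  }

 private:
  struct Task {
    std::function<void()> work;
    bool* done;
  };
  std::mutex mu_;
  std::condition_variable posted_;
  std::condition_variable finished_;
  std::queue<Task> mailbox_;
};

// One synchronous round trip: a secondary thread asks "main" to compute
// input + 1 and blocks until the result is available.
int demo_round_trip(int input) {
  MainLoopModel main_loop;
  int result = 0;
  std::thread secondary([&] {
    main_loop.call_sync([&] { result = input + 1; });
  });
  main_loop.run_one();  // the "main" thread services its mailbox once
  secondary.join();
  return result;
}
```

In this model the secondary thread's wait covers everything between posting the task and the main side picking it up in `run_one`. If the real wakeup were delivered late, the trace would plausibly show the same shape: a blocked secondary thread next to a main thread that has not yet started the work.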

@sbc100
Collaborator

sbc100 commented Sep 19, 2024

Are you doing anything on the main UI thread that is likely to be blocking it? i.e. are you doing synchronous proxying to your background thread? i.e. can you give more details on what you mean by "I used proxying to proxy events coming from UI to this detached thread"?

@ravisumit33
Contributor Author

Sorry for not providing complete details about the issue. I am doing an async proxy to the detached thread. My main application thread isn't the main UI thread: I instantiate the wasm module in a web worker.

@ravisumit33
Contributor Author

> To be clear this is not some kind of regression? i.e. you are not claiming that some previous version of emscripten had a faster version of MAIN_THREAD_EM_ASM?
>
> As far as I know there are no delays built into the proxying system. The call to MAIN_THREAD_EM_ASM should use a postMessage to wake the main thread, which should then use a shared memory futex to wake the secondary thread once it's done.
>
> @tlively are you aware of any reason for such a delay?
>
> @ravisumit33 perhaps you could share an example of a simple program that demonstrates the delay you are talking about?

I will try to reproduce this in a simple program. Just to be clear, the delay isn't in proxying from the main application thread to the detached thread. The delay is in receiving the response from the background (detached) thread, which sends the response back synchronously (MAIN_THREAD_EM_ASM).
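
A possible starting point for such a reproduction is sketched below. It is not the reporter's code. It assumes a build along the lines of `emcc repro.cpp -pthread -sPROXY_TO_PTHREAD -o repro.js`, so that `main` runs on a pthread and each `MAIN_THREAD_EM_ASM` call blocks until the main runtime thread has executed the JavaScript body. The `summarize` helper, the `LatencyStats` struct, the sample count, and the `globalThis.lastResponse` assignment are illustrative placeholders.

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

struct LatencyStats {
  double min_ms;
  double max_ms;
  double mean_ms;
};

// Pure helper: summarize round-trip samples given in milliseconds.
LatencyStats summarize(const std::vector<double>& samples) {
  LatencyStats s{0.0, 0.0, 0.0};
  if (samples.empty()) return s;
  s.min_ms = *std::min_element(samples.begin(), samples.end());
  s.max_ms = *std::max_element(samples.begin(), samples.end());
  double sum = 0.0;
  for (double v : samples) sum += v;
  s.mean_ms = sum / static_cast<double>(samples.size());
  return s;
}

#ifdef __EMSCRIPTEN__
#include <emscripten/emscripten.h>

// With PROXY_TO_PTHREAD, this main() runs on a pthread rather than on the
// thread that instantiated the module.
int main() {
  std::vector<double> samples;
  for (int i = 0; i < 20; ++i) {
    double start = emscripten_get_now();
    // Placeholder for "send the response back": blocks this pthread until
    // the main runtime thread has run the JS body.
    MAIN_THREAD_EM_ASM({ globalThis.lastResponse = $0; }, i);
    samples.push_back(emscripten_get_now() - start);
  }
  LatencyStats s = summarize(samples);
  std::printf("MAIN_THREAD_EM_ASM round trip: min %.2f ms, max %.2f ms, mean %.2f ms\n",
              s.min_ms, s.max_ms, s.mean_ms);
  return 0;
}
#endif
```

To match the setup described in this thread, the generated module would need to be loaded inside a web worker rather than on the browser's UI thread; comparing the numbers between the two placements might help narrow down where the wait comes from.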

@sbc100
Collaborator

sbc100 commented Sep 19, 2024

So you have the following JS contexts:

0: The main browser UI thread
1: The worker that starts your wasm program
2: The worker that runs the main function inside a pthread (due to PROXY_TO_PTHREAD).

Is that correct?

@sbc100
Collaborator

sbc100 commented Sep 19, 2024

> I instantiate wasm in a web-worker

I think this aspect could be a clue, since it's not the most common setup. Can you explain a little more about this setup? I assume you create this worker using the normal new Worker API and communicate with it solely through postMessage to/from the main UI browser thread? (i.e. the main UI browser thread doesn't do any shared memory stuff?)

@ravisumit33
Contributor Author

Yes, the list of JS contexts is correct. I create the worker that instantiates wasm using the new Worker API, as you mentioned, and communicate with it through postMessage from the main UI browser thread. The main UI browser thread doesn't do any shared memory stuff.

@ravisumit33
Contributor Author

ravisumit33 commented Sep 19, 2024

I have highlighted the delay with a red rectangle below. As can be seen, the background thread (lower track) is just waiting until the main application thread (upper track) has received the response. The main application thread is also idle during the delay.
[Screenshot: Untitled design (2)]

@tlively
Member

tlively commented Sep 20, 2024

Instead of using MAIN_THREAD_EM_ASM to communicate the results back, can you use emscripten_proxy_callback, emscripten_proxy_callback_with_ctx, emscripten_proxy_promise, or emscripten_proxy_promise_with_ctx? I don't know where the pause could be coming from, but these would be more direct methods of reporting the results.

An example program that demonstrates the issue would certainly be helpful.
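
As a rough illustration of the `emscripten_proxy_callback` route, the sketch below has the main application thread proxy an event to the worker pthread and receive the result in a callback on its own thread, so the worker never blocks on `MAIN_THREAD_EM_ASM`. It is an assumption-laden sketch, not code from this project: `Event`, `process_event`, `on_done`, `on_cancel`, and `handle_ui_event` are hypothetical names, the doubling is a placeholder for real work, and it assumes a build with `-pthread -sPROXY_TO_PTHREAD` and a worker that has returned to its event loop.

```cpp
#include <pthread.h>

struct Event {
  int input;
  int output;
};

// Runs on the worker pthread. Placeholder for the application's real
// event processing.
void process_event(void* arg) {
  Event* e = static_cast<Event*>(arg);
  e->output = e->input * 2;
}

#ifdef __EMSCRIPTEN__
#include <atomic>
#include <emscripten/emscripten.h>
#include <emscripten/proxying.h>

static pthread_t g_worker;                // thread running main() below
static std::atomic<bool> g_ready{false};  // set once g_worker is valid

// Runs back on the thread that called emscripten_proxy_callback, after
// process_event has finished on the worker.
static void on_done(void* arg) {
  Event* e = static_cast<Event*>(arg);
  EM_ASM({ console.log('response', $0); }, e->output);
  delete e;
}

// Runs on the calling thread if the proxied work is cancelled.
static void on_cancel(void* arg) { delete static_cast<Event*>(arg); }

// Called from JS on the main application thread for each UI event.
// Returns 1 if the work was enqueued, 0 otherwise.
extern "C" EMSCRIPTEN_KEEPALIVE int handle_ui_event(int input) {
  if (!g_ready.load()) return 0;  // worker has not started yet
  Event* e = new Event{input, 0};
  int ok = emscripten_proxy_callback(emscripten_proxy_get_system_queue(),
                                     g_worker, process_event, on_done,
                                     on_cancel, e);
  // Assumption: when enqueueing fails, neither callback is invoked.
  if (!ok) delete e;
  return ok;
}

// With PROXY_TO_PTHREAD, main() runs on the worker pthread.
int main() {
  g_worker = pthread_self();
  g_ready.store(true);
  emscripten_exit_with_live_runtime();  // keep the worker alive for events
}
#endif
```

Whether this removes the 200-400 ms wait is not established in this thread; it only changes the shape of the exchange so that no thread blocks while the response is delivered.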


3 participants