Approach for enabling multi client connection #104

Open
kdcyberdude opened this issue Sep 18, 2024 · 4 comments

Comments

@kdcyberdude

kdcyberdude commented Sep 18, 2024

I'd like to explore the best approach for managing multi-client connections in both single and multi-GPU environments.

Often, GPUs are underutilized by a single client, especially when smaller models are in use (e.g., Wav2Vec 2.0 instead of Whisper), models are accessed via APIs (such as GPT-4), or clients remain idle for extended periods. In these cases, I believe it should be possible for multiple clients (at least 3-4) to connect simultaneously and more efficiently utilize the available GPU resources.

I want to discuss how to architect a system where a single model can handle inference requests from multiple clients concurrently, ensuring GPU resources are optimized.

My current thought is that each client should have its own dedicated VAD thread, while the STT, LLM, and TTS threads should be shared across clients. These shared threads could use a queue to handle pending requests, batching them together to process the next group efficiently.
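For concreteness, here is a minimal sketch of the shared-worker idea (the SharedWorker class, the process_batch callback, and the batch/timeout values are placeholders of mine, not code from this repo):

```python
import queue
import threading

class SharedWorker:
    """One worker per stage (STT, LLM, TTS), shared by all clients.

    Requests are (client_id, payload, reply_queue) tuples; the worker drains
    its queue, batches whatever is pending, runs one batched inference call,
    and routes each result back to the client that submitted it.
    """

    def __init__(self, process_batch, max_batch_size=4, poll_timeout=0.05):
        self.requests = queue.Queue()
        self.process_batch = process_batch  # e.g. batched STT/LLM/TTS inference
        self.max_batch_size = max_batch_size
        self.poll_timeout = poll_timeout
        threading.Thread(target=self._loop, daemon=True).start()

    def submit(self, client_id, payload, reply_queue):
        self.requests.put((client_id, payload, reply_queue))

    def _loop(self):
        while True:
            batch = [self.requests.get()]  # block until at least one request
            while len(batch) < self.max_batch_size:
                try:
                    batch.append(self.requests.get(timeout=self.poll_timeout))
                except queue.Empty:
                    break
            payloads = [payload for _, payload, _ in batch]
            results = self.process_batch(payloads)  # one batched GPU call
            for (client_id, _, reply_queue), result in zip(batch, results):
                reply_queue.put((client_id, result))
```

Each client would keep its own VAD thread and submit() finished utterances to the shared STT worker; the STT worker's outputs would then be submitted to the LLM worker, and so on down the pipeline.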

I'd love to hear your thoughts on this approach or any potential improvements.

@NEALWE

NEALWE commented Sep 21, 2024

I have the same idea. I think we should move the VAD part into the client and switch the streaming input to sending complete audio files instead. That would be much easier, right?

@kdcyberdude
Author

@NEALWE, this approach should work and be relatively straightforward to implement. However, there is a potential trade-off in terms of latency.

Also, it's important to note that this won't follow a streaming model. Instead, it will work by making sequential calls to an endpoint for each stage: asr -> llm -> tts.
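A rough sketch of what that non-streaming request flow could look like on the client, once its local VAD has detected the end of an utterance (the endpoint paths, SERVER address, and handle_utterance helper are all hypothetical, not part of this repo):

```python
import requests

SERVER = "http://localhost:8000"  # hypothetical server address

def handle_utterance(audio_bytes: bytes) -> bytes:
    """Client-side flow after local VAD detects end of speech:
    one blocking call per stage, no streaming."""
    text = requests.post(f"{SERVER}/asr", data=audio_bytes).json()["text"]
    reply = requests.post(f"{SERVER}/llm", json={"text": text}).json()["reply"]
    speech = requests.post(f"{SERVER}/tts", json={"text": reply}).content
    return speech  # synthesized audio to play back on the client
```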

@NEALWE

NEALWE commented Sep 24, 2024

What I mean is, you can add a user_id to the input sequences and carry it through the pipeline, then use the user_id to send each response back to the correct client. However, the latency won't be handled very well; I've tried it.
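As an illustration of that routing step, something like the following registry could map each user_id back to its connection (the connections dict and the send() call are hypothetical):

```python
# Registry mapping user_id -> open client connection (e.g. a websocket),
# filled in when each client connects (hypothetical structure).
connections = {}

def dispatch(results):
    """Route each (user_id, audio) result from a batched inference call
    back to the client that submitted it."""
    for user_id, audio in results:
        conn = connections.get(user_id)
        if conn is not None:
            conn.send(audio)  # assumes the connection object has a send() method
```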

@NEALWE

NEALWE commented Sep 24, 2024

@kdcyberdude Of course, you can run another Python script that takes over two more ports to serve another client, but that becomes risky when a lot of people are online.
