-
Hi, I'm writing a service to convert data pushed by clients in bidirectional streaming mode. My code is like:

```rust
use std::pin::Pin;
use tokio_stream::{wrappers::ReceiverStream, Stream, StreamExt};
use tonic::Status;

#[async_trait::async_trait]
impl XXXFacade for XXXServer {
    type ResponseStream = Pin<Box<dyn Stream<Item = Result<Response, Status>> + Send>>;

    async fn handle_request(
        &self,
        // The inbound stream carries Request messages, not Response.
        request: tonic::Request<tonic::Streaming<Request>>,
    ) -> Result<tonic::Response<Self::ResponseStream>, tonic::Status> {
        // Bounded channel bridging the processing task to the response stream.
        let (client_tx, client_rx) = tokio::sync::mpsc::channel(128);
        tokio::spawn(async move {
            let mut stream = request.into_inner();
            while let Some(req) = stream.next().await {
                // process request obj; send responses through client_tx.
            }
            Ok::<_, anyhow::Error>(())
        });
        Ok(tonic::Response::new(
            Box::pin(ReceiverStream::new(client_rx)) as Self::ResponseStream,
        ))
    }
}
```
```rust
tokio::spawn(async move {
    Server::builder()
        .timeout(Duration::from_secs(30))
        .add_service(XXXServer::new(self))
        .serve_with_incoming_shutdown(
            TcpListenerStream::new(listener),
            shutdown_rx.map(drop),
        )
        .await
        .unwrap();
});
```

My problem is: I'm not able to get full CPU usage. I set up a few client pods that generate 100~200 connections in total and push as much synthetic mock data as possible. When I run the service on a 4-core pod, the maximum CPU usage is around 300%~350%, and I'm not able to take it further. On an 8-core pod I'm expecting 800% CPU usage for my test, but I don't know how to get there. Are there any settings I can try? I'm using the latest version of everything.
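The handler above bridges request processing to the response stream through a channel (`client_rx`); if that channel is bounded, a slow response consumer backpressures the processing loop, which caps throughput without saturating the CPU. A std-only sketch of that dynamic, with hypothetical stand-in types (not the actual service code):

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

// Hypothetical stand-ins for the request/response message types.
#[derive(Debug)]
struct Request(u64);
#[derive(Debug, PartialEq)]
struct Response(u64);

// Models the handler shape: requests are consumed on one task, responses
// are pushed into a bounded channel whose receiver becomes the response
// stream. With a bounded buffer, a slow consumer blocks the producer.
fn handle(requests: Vec<Request>, buffer: usize) -> Vec<Response> {
    let (tx, rx) = sync_channel::<Response>(buffer); // bounded, like a flow-control window
    let producer = thread::spawn(move || {
        for req in requests {
            // `send` blocks when the buffer is full: this is the
            // backpressure point that caps throughput.
            tx.send(Response(req.0 * 2)).unwrap();
        }
    });
    let out: Vec<Response> = rx.into_iter().collect();
    producer.join().unwrap();
    out
}

fn main() {
    let reqs: Vec<Request> = (0..5).map(Request).collect();
    println!("{:?}", handle(reqs, 2));
}
```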
Replies: 5 comments 3 replies
-
I would run a flamegraph and potentially an off-CPU flamegraph to see what is stopping the CPU from being saturated.
-
I might know what's going on. When I add test load, at some point the client gets an error and some clients stop pushing data; that's why the throughput cannot go higher. The error I get is:
Client (Java):
Server (I guess it is because the client reconnected):
I'm still trying to figure out where the
-
Then I changed my client to manual flow control (by referencing this: https://github.com/grpc/grpc-java/blob/master/examples/src/main/java/io/grpc/examples/manualflowcontrol/ManualFlowControlClient.java). I suspect there's server-side IO congestion. Does tonic (or hyper) have any settings to tune incoming IO performance?
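For what it's worth, tonic's `Server` builder does expose HTTP/2 flow-control and TCP knobs that affect how fast the server accepts incoming stream data; raising the window sizes is a common first experiment when client streams appear throttled. This is a configuration sketch, not a tested recommendation, and the values are illustrative:

```rust
use std::time::Duration;
use tonic::transport::Server;

// Sketch: flow-control settings on tonic's Server builder. The window
// values below are illustrative assumptions, not tuned numbers.
fn build_server() -> Server {
    Server::builder()
        // Per-stream HTTP/2 flow-control window; a small window can
        // backpressure a single high-throughput client stream.
        .initial_stream_window_size(1024 * 1024)
        // Window shared by all streams on one connection.
        .initial_connection_window_size(4 * 1024 * 1024)
        // Disable Nagle's algorithm to reduce per-write latency.
        .tcp_nodelay(true)
        .timeout(Duration::from_secs(30))
}
```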
-
My bad. Thanks for looking into my question!
Unfortunately, the off-CPU perf data I get isn't very useful.
I keep the client settings the same and change the server code. When I replace the `// process request obj.` line (from the code sample above) with code that polls requests and simply drops them, the throughput can go as high as 900 MB/s. I think that shows the server has reached some limit, so the clients are being backpressured. But in either case the CPU usage is only a little higher than 300%. Don't know why I'm not able to us…
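One hypothesis consistent with "drop-only reaches 900 MB/s but CPU stays near 300%" is that the real per-message work is serialized on too few tasks, so extra cores sit idle. A std-only sketch of fanning work from one queue out to a pool of workers (names are hypothetical; in the actual tonic handler the analogous move is spawning tokio tasks or a worker pool instead of processing inline in the per-connection loop):

```rust
use std::sync::mpsc::sync_channel;
use std::sync::{Arc, Mutex};
use std::thread;

// Fan items from one bounded queue out to N workers so the work (here a
// trivial sum, standing in for real processing) can use multiple cores.
fn process_with_workers(items: Vec<u64>, workers: usize) -> u64 {
    let (tx, rx) = sync_channel::<u64>(64);
    let rx = Arc::new(Mutex::new(rx));
    let total = Arc::new(Mutex::new(0u64));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let rx = Arc::clone(&rx);
            let total = Arc::clone(&total);
            thread::spawn(move || loop {
                // Take the next item; the receiver lock is released
                // before the (stand-in) work happens.
                let item = { rx.lock().unwrap().recv() };
                match item {
                    Ok(v) => *total.lock().unwrap() += v, // stand-in for real work
                    Err(_) => break, // queue closed: no more work
                }
            })
        })
        .collect();
    for it in items {
        tx.send(it).unwrap();
    }
    drop(tx); // close the queue so workers exit
    for h in handles {
        h.join().unwrap();
    }
    let sum = *total.lock().unwrap();
    sum
}

fn main() {
    println!("{}", process_with_workers((1..=10).collect(), 4));
}
```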