Improve performance of UDPMux: Add BatchIO and multiple ports options #608
base: master
Conversation
Force-pushed from 3338e2a to 9330afe.
LGTM! Great work @cnderrauber !!!
Force-pushed from 9330afe to efe2551.
Really cool to see this all come together. Nicely done!
Just a few suggestions on the comments.
Force-pushed from d253938 to 0ad5b09.
Hi @cnderrauber,
great to see progress in this direction :)
A few points:
- This PR breaks the API of UDPMux, so we need to bump the major version of the module. That's something I am open to, but I think we should batch this with other API-breaking changes (like Fix type of CandidatePairState enum #571).
- Can't you simply use AllConnsGetter to get the count of connections?
- I do not see the batched connection interface being used anywhere yet. Do you have, or are you planning to open, a dedicated PR for this?
- It would be good if we can also add those new features to the standard UDPMux. Or is there a reasoning behind adding this only to the MultiUDPMux?
Best regards,
Steffen
I think it can't be used. The
I don't get the point of this question; the batched connection interface is in the pion/transport repo. What do you mean by a dedicated PR?
The UDPMux can have the batch feature simply by passing a PacketConn to its constructor.
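For illustration only, here is a minimal sketch of that point, assuming the pion/ice v2 API where NewUDPMuxDefault accepts any net.PacketConn. The batch-capable conn from pion/transport is not shown; a plain UDP socket stands in for it.

```go
package sketch

import (
	"net"

	"github.com/pion/ice/v2"
)

// newMuxFromConn shows that the standard UDPMux is constructed from a
// net.PacketConn. Swapping the plain socket below for a batch-capable
// PacketConn would give the mux batched I/O without any other change.
func newMuxFromConn() (*ice.UDPMuxDefault, error) {
	conn, err := net.ListenUDP("udp4", &net.UDPAddr{Port: 3478})
	if err != nil {
		return nil, err
	}
	return ice.NewUDPMuxDefault(ice.UDPMuxParams{UDPConn: conn}), nil
}
```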
Sorry, I don't see where the batched connection interface is used in pion/ice. I assume we need another PR to actually use it? Or did I miss something?
Alright. Thanks for the explanation.
Force-pushed from 0ad5b09 to f7b9660.
Improve performance of UDPMux by BatchIO and load balance on ports
Force-pushed from f7b9660 to 280be41.
Merge master branch
Fix panic lint
Force-pushed from 9ca1867 to 6416274.
}()

// Skip IPv6 test on i386
const ptrSize = 32 << (^uintptr(0) >> 63)
Maybe we should check against runtime.GOARCH here?
I think they are identical, and if one day there is a new GOARCH, the test would fail.
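For reference, here is a small sketch contrasting the two checks being discussed: an explicit runtime.GOARCH allow-list versus the pointer-size constant already used in the test. The GOARCH list below is illustrative, not exhaustive.

```go
package sketch

import "runtime"

// ptrSize is 32 on 32-bit platforms and 64 on 64-bit platforms:
// ^uintptr(0)>>63 evaluates to 1 only when uintptr is 64 bits wide.
const ptrSize = 32 << (^uintptr(0) >> 63)

// is32BitByGOARCH needs an explicit list that must be kept up to date
// as new architectures are added to Go.
func is32BitByGOARCH() bool {
	switch runtime.GOARCH {
	case "386", "arm", "mips", "mipsle":
		return true
	default:
		return false
	}
}

// is32BitByPtrSize works for any current or future architecture.
func is32BitByPtrSize() bool {
	return ptrSize == 32
}
```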
Co-authored-by: Steffen Vogel <[email protected]>
Hi @cnderrauber,

(This is a reply to #608 (comment), but I think it's a more general concern which deserves its own comment.)

I am sorry to raise objections about this PR, but I am afraid that it introduces even more complexity into the UDPMux logic. And given the current state, we already have three different types for UDP muxing, and we have not yet achieved some of the initial goals, such as allowing single-port muxing for relayed candidates.

I was under the impression that the PR attempts to introduce a batched connection interface to pion/transport. I am not really that happy with hiding the packet batching/queuing inside the mux. A cleaner implementation would expose a batched connection interface to the user of pion/ice.

Because of the implementation inside the UDP muxing logic, the port balancing and queuing is also limited to configurations which use the mux. How about configurations without a UDPMux? Or relayed candidates?
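For readers following along, this is roughly the kind of interface being argued for, modeled on golang.org/x/net/ipv4, whose ReadBatch/WriteBatch map to recvmmsg/sendmmsg on Linux. The name and shape below are purely illustrative, not the pion/transport or pion/ice API.

```go
package sketch

import (
	"net"

	"golang.org/x/net/ipv4"
)

// BatchPacketConn is a hypothetical shape for an exposed batching API:
// a net.PacketConn that can also move many datagrams per syscall.
type BatchPacketConn interface {
	net.PacketConn
	ReadBatch(ms []ipv4.Message, flags int) (int, error)
	WriteBatch(ms []ipv4.Message, flags int) (int, error)
}
```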
That's true, we already have three different types of UDPMux, but they are all based on UDPMuxDefault, so they will all benefit from the change. And I can't see why this would block other goals like single-port muxing for relayed candidates.
Port balancing and batching are two standalone options to improve the mux throughput; users can choose to enable either of them or both. A mux that only listens on a single port without batching will have poor performance and can't be used in a large-scale production environment.
So port balancing and batching should be limited to the mux.
Don't be sorry! I appreciate you bringing up these points; it's worth discussing the tradeoffs that are being made. Let me share a bit more color on the intent.

The goal of the PR is to improve the performance of Pion when writing out to clients. In benchmarking, we discovered the bottleneck to be the syscalls made during the write path. Benchmarks have shown that it improves write performance by over 100% compared to using standard pion/ice connections without batching. Also, by batching packets to multiple clients together, it helps keep jitter under control, versus queuing sufficient packets per client before calling the write syscall.

On the point of why it would use multiple ports: we've discovered additional bottlenecks when writing out using a single UDP port. Locks within the write path limit the number of cores we can effectively utilize. By allowing multiple ports to be used, we were able to get around some of these bottlenecks.

Now, this is an option that isn't enabled by default, and for those that don't care about squeezing these bits of performance out of the system, the standard UDPMux is still available. I hear you on the complexity it introduces in the code, though I hope what's in this PR is a reasonable set of tradeoffs that doesn't make the interface any more difficult to use.

We are opening the PR here because we thought the work could benefit everyone using Pion. But we'd also respect your feedback if you feel that these changes should be kept in our own repo instead.
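As a concrete illustration of the batching technique described above (not the PR's actual code, which lives in pion/ice and pion/transport), here is a minimal sketch built on golang.org/x/net/ipv4, where WriteBatch turns a whole queue of datagrams, possibly for many different clients, into a single sendmmsg call on Linux.

```go
package sketch

import (
	"net"

	"golang.org/x/net/ipv4"
)

// enqueue appends one datagram, destined for one client, to the batch.
func enqueue(queue []ipv4.Message, payload []byte, dst net.Addr) []ipv4.Message {
	return append(queue, ipv4.Message{
		Buffers: [][]byte{payload},
		Addr:    dst,
	})
}

// flushBatch sends every queued datagram with as few syscalls as
// possible. pc is the wrapper returned by ipv4.NewPacketConn around a
// normal UDP socket; WriteBatch may send only part of the slice, so we
// loop until the queue is drained.
func flushBatch(pc *ipv4.PacketConn, queue []ipv4.Message) error {
	for len(queue) > 0 {
		n, err := pc.WriteBatch(queue, 0)
		if err != nil {
			return err
		}
		queue = queue[n:]
	}
	return nil
}
```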
Hi @davidzhao,

Thanks for the detailed description of the motivation behind this PR. I fully agree with you on these points. Batching can really help here to increase the throughput, and I would also love to profit from this feature in my own app.

There is also a very good article by Tailscale where they describe how they have been able to accelerate the WireGuard Go implementation to achieve over 10 Gbps of throughput in userspace by using batching: https://tailscale.com/blog/more-throughput/

I am only arguing about the API which we use to realize it. I would love to have a batching API also for

Have you thought about abandoning the use of muxes? Using dedicated connections per ICE agent could help you with the load balancing, as you are dealing with a separate socket per connection?

Regarding the port balancing: have you considered using
Apologies for the intrusion, but port balancing on the same address/port through
We are also load balancing on a single port using

Having batching as well would be nice, but I echo the earlier comments regarding the API. It would be nice if the batching wasn't coupled so tightly to the "multi udp mux" variant, but rather something that played nicely with the default UDP mux too.
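For context only: one common OS-level way to spread load on a single UDP address/port across several sockets is SO_REUSEPORT on Linux. The sketch below is a generic illustration of that technique and is not necessarily the mechanism the commenters above are referring to.

```go
package sketch

import (
	"context"
	"net"
	"syscall"

	"golang.org/x/sys/unix"
)

// listenReusePort opens a UDP socket with SO_REUSEPORT set, so several
// sockets bound to the same address/port can share incoming load
// kernel-side. Linux-only as written.
func listenReusePort(addr string) (net.PacketConn, error) {
	lc := net.ListenConfig{
		Control: func(network, address string, c syscall.RawConn) error {
			var soErr error
			if err := c.Control(func(fd uintptr) {
				soErr = unix.SetsockoptInt(int(fd), unix.SOL_SOCKET, unix.SO_REUSEPORT, 1)
			}); err != nil {
				return err
			}
			return soErr
		},
	}
	return lc.ListenPacket(context.Background(), "udp4", addr)
}
```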
@kcaffrey @streamer45 any interest in adding port balancing into the standard UDPMux? It should be straightforward to integrate with batching at that point.

It seems that the primary objection to the PR is the implication that batching will only be available to UDPMuxMulti and nothing else. That's not the case here. Most of the work had been put into

Am I misunderstanding the concerns?
Hello! Just checking on the progress of this PR. We are building a stack (consisting of an SFU) based on Pion and have encountered similar performance bottlenecks on the writeTo pathway; we were actually considering forking and implementing batching ourselves before stumbling on this PR. We would really appreciate knowing about the progress here, as we indeed expect huge performance gains from this. Meanwhile we will port this code and use it as the UDP muxer. And thanks for implementing this awesome PR.
@AshishKumar4 the PR was ready to merge months ago, but several folks took issue with certain design decisions. We could merge it as is, but in trying to be good citizens, we are holding off until @stv0g or others give the 👍. Until then, I'm afraid this PR is stuck here.
@davidzhao Apologies for the late reply. Yeah that should be doable and likely to be more performant than having to go through the
For what it's worth, as I am a mere bystander here, I am supportive of these changes; I was purely adding a data point above. Overall, I think it's inherently challenging to achieve high performance without introducing some complexity somewhere, and leaving it all to the higher layers (e.g. the app side) to deal with is not always feasible. So I find giving the option to enable/disable this sort of enhancement at this level to be a very reasonable compromise.
Performance improvements:
- Add a BatchIO option to NewMultiUDPMuxPort(s), so the UDPMux can use batched writes to improve write throughput.
- Add multiple-port support to MultiUDPMuxDefault; this option listens on multiple ports of a single interface for traffic load balancing (see the usage sketch below).
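A hedged usage sketch of how these options might be wired up. NewMultiUDPMuxFromPort and the read-buffer option exist in pion/ice today; the option names for batching and extra ports added by this PR are not confirmed here and are therefore left as commented placeholders.

```go
package sketch

import (
	"log"

	"github.com/pion/ice/v2"
)

// newMultiMux builds a multi UDP mux on a well-known port. The new
// BatchIO and multiple-ports options from this PR would be passed as
// additional UDPMuxFromPortOption values alongside the existing ones.
func newMultiMux() *ice.MultiUDPMuxDefault {
	mux, err := ice.NewMultiUDPMuxFromPort(3478,
		ice.UDPMuxFromPortWithReadBufferSize(4*1024*1024),
		// Placeholders, NOT confirmed API from this PR:
		//   an option enabling batched (sendmmsg-style) writes,
		//   an option listening on additional ports for load balancing.
	)
	if err != nil {
		log.Fatal(err)
	}
	return mux
}
```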