[Tracking issue] go-libp2p resource manager critical post release fixes #9442
@ajnavarro: are there any other critical follow-ups with the resource manager that we should be tracking?
(I pasted a comment here that I meant to post in #9432. I moved it over: #9432 (comment).)
@BigLep Adding libp2p/go-libp2p#1928 to the list.
After fixing some problems in the routing system when using the parallel router, I'll continue with #9438.
@ajnavarro: I updated the issue description with the latest info and started a PR with some dedicated libp2p resource management docs: #9468
I updated the main issue comment to add all the work being done for RM. Pending work to be reviewed and merged:
A new one for the bucket:
Two more added (theme 13) around zero handling:
Added some deeper possible solutions, after talking with @MarcoPolo, for improving RM integration in Kubo: #9580
I just stumbled into theme 3 in #9432; I'll repeat what I said there:
Wrapping it in quotes to show that it is "verbatim" would probably help as well, to deal with errors foreign to Kubo. Printing the remote node's version (if it's known) might also help with debugging further.
In the latest round of consolidating and organizing the resource manager/accountant work, I'm closing this in favor of #9650.
This is the tracking issue for the streams of work that need to be corrected or fixed as a result of the Kubo 0.17 release, which enabled the libp2p resource manager by default: #8761
Must complete before 0.18 RC
Theme 2: reports of default limits not working well for users
Theme 3: improve RM errors coming from other peers
This is related to reports that a disabled go-libp2p resource manager is still managing resources. In fact, the error message comes from a remote go-libp2p peer that exceeded its own resource limits, and when the message is printed on the local peer it's hard to tell the two apart.
Things we can do:
Theme 4: confusion around "magic values"
This is about "4611686018427388000" looking like a random number when it is actually our "infinity".
Things we can do:
Options not on the table:
Making ipfs swarm limit all go from "4611686018427388000" to "infinity" in its output, because that isn't valid JSON.
Theme 5: be clearer on startup about what limits are being set and why
Add a log message like:
Theme 6: wrong tone about the resource manager getting in the way (being a bug) vs. being a feature
UX angle: make it clear that, in general, this is a feature, not a bug:
Theme 7: clarity around the "error message" meaning
It isn't clear what "system: cannot reserve inbound connection: resource limit exceeded" means. In this example, it means Swarm.ResourceMgr.Limits.System.ConnsInbound has been exceeded.
Things we can do:
Reverse-engineer the message based on https://github.com/libp2p/go-libp2p/blob/master/p2p/host/resource-manager/scope.go so we can map it back to Swarm.ResourceMgr.Limits.$scope.$limit. If we do that, we can then print what the limit value is.
Theme 8: provide actionable advice when resource limits are hit
When a resource limit is hit, we point users to https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr. We could provide a better documentation path that walks someone through debugging this situation.
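Combining the Theme 7 idea (map the error back to Swarm.ResourceMgr.Limits.$scope.$limit) with the actionable advice from Theme 8, a minimal sketch of the mapping could look like this. The lookup tables are hypothetical and partial: only the "system" / "inbound connection" pair is taken from the example message above, and the other message fragments are assumptions that would need to be checked against scope.go. The target names (ConnsInbound, ConnsOutbound, StreamsInbound, StreamsOutbound, Memory) are go-libp2p limit field names.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical, partial tables. Only "system" and "inbound connection" come
// from the example message; the rest are assumptions to verify against
// go-libp2p's p2p/host/resource-manager/scope.go.
var scopeToField = map[string]string{
	"system":    "System",
	"transient": "Transient",
}

var resourceToField = map[string]string{
	"inbound connection":  "ConnsInbound",
	"outbound connection": "ConnsOutbound",
	"inbound stream":      "StreamsInbound",
	"outbound stream":     "StreamsOutbound",
	"memory":              "Memory",
}

// limitKeyFor maps "<scope>: cannot reserve <resource>: resource limit exceeded"
// to a config key such as "Swarm.ResourceMgr.Limits.System.ConnsInbound".
// It returns false when the message does not have that shape.
func limitKeyFor(msg string) (string, bool) {
	parts := strings.SplitN(msg, ": ", 3)
	if len(parts) != 3 || parts[2] != "resource limit exceeded" {
		return "", false
	}
	scope, ok := scopeToField[parts[0]]
	if !ok {
		return "", false
	}
	resource, ok := resourceToField[strings.TrimPrefix(parts[1], "cannot reserve ")]
	if !ok {
		return "", false
	}
	return fmt.Sprintf("Swarm.ResourceMgr.Limits.%s.%s", scope, resource), true
}

func main() {
	key, ok := limitKeyFor("system: cannot reserve inbound connection: resource limit exceeded")
	if ok {
		fmt.Printf("limit %s was exceeded; see https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr\n", key)
	}
}
```

String matching on error text is brittle; if go-libp2p exposed the scope and resource as structured error fields, matching on those would be preferable to parsing the message.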
Theme 9: have the ConnMgr limits be set under Swarm.ResourceMgr.Limits.System.ConnsInbound
As discussed in #9468, this allows low-priority idle connections to be cleaned up to make room for higher-priority connections.
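A minimal sketch of a startup check for this relationship, which also touches the soft/hard interaction from Theme 12: limitsView and checkConnMgrBelowRM are hypothetical names, and the struct is a simplified stand-in for the two real config keys Swarm.ConnMgr.HighWater and Swarm.ResourceMgr.Limits.System.ConnsInbound. The numbers in main are illustrative, not Kubo defaults.

```go
package main

import "fmt"

// limitsView is a hypothetical, simplified view of two Kubo settings:
// Swarm.ConnMgr.HighWater and Swarm.ResourceMgr.Limits.System.ConnsInbound.
type limitsView struct {
	ConnMgrHighWater   int
	SystemConnsInbound int
}

// checkConnMgrBelowRM reports an error when the (soft) connection manager
// high-water mark is not strictly below the (hard) resource manager limit on
// inbound connections. In that case the hard limit can be hit before the
// connection manager starts trimming idle, low-priority connections.
func checkConnMgrBelowRM(c limitsView) error {
	if c.ConnMgrHighWater >= c.SystemConnsInbound {
		return fmt.Errorf("connection manager high-water mark (%d) should be below the resource manager's System.ConnsInbound limit (%d)",
			c.ConnMgrHighWater, c.SystemConnsInbound)
	}
	return nil
}

func main() {
	// Illustrative numbers only, not Kubo defaults.
	fmt.Println(checkConnMgrBelowRM(limitsView{ConnMgrHighWater: 96, SystemConnsInbound: 128})) // <nil>
	fmt.Println(checkConnMgrBelowRM(limitsView{ConnMgrHighWater: 200, SystemConnsInbound: 128}))
}
```

The connection manager's high-water mark covers all connections while ConnsInbound only counts inbound ones, so this comparison is a conservative approximation rather than an exact invariant.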
Things we can do:
Theme 10: resource manager doing its job of protecting a node is alarming
This was brought up throughout #9432: users have generally found the resource manager "ERRORs" spammy. The fact that they are printed as ERRORs also runs counter to our narrative about this being a feature. Should a feature doing its job be an error?
Things we can do:
Theme 11: fix bugs in the swarm stats command
Theme 12: remove additional footguns around (soft) ConnMgr and (hard) ResourceMgr limits and their interactions
Theme 13: clarify and improve handling of zeroes
ipfs swarm limit <scope> --reset sets all other scopes to zero. #9559
Ideally completing before the 0.18 final release
Theme 1: usability issues in entering config
It's too easy for someone to enter invalid config and not get any feedback that they have done so.
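A minimal sketch of two standard encoding/json techniques that could help here and with the zero handling in Theme 13: Decoder.DisallowUnknownFields rejects misspelled keys instead of silently ignoring them, and pointer fields distinguish "not set" (nil) from an explicit 0. baseLimit and decodeStrict are hypothetical names, and the struct is a two-field stand-in, not Kubo's actual config type.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// baseLimit is a hypothetical subset of a limits block. Pointer fields let us
// tell "not set" (nil) apart from an explicit 0, which plain ints cannot.
type baseLimit struct {
	ConnsInbound   *int64
	StreamsInbound *int64
}

// decodeStrict rejects unknown keys (for example the typo "ConnsInbund")
// instead of silently ignoring them, so the user gets immediate feedback.
func decodeStrict(data []byte) (baseLimit, error) {
	var l baseLimit
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	if err := dec.Decode(&l); err != nil {
		return baseLimit{}, err
	}
	return l, nil
}

func main() {
	if _, err := decodeStrict([]byte(`{"ConnsInbund": 10}`)); err != nil {
		fmt.Println("rejected:", err)
	}
	l, _ := decodeStrict([]byte(`{"ConnsInbound": 0}`))
	// An explicit 0 is recorded; the absent field stays nil.
	fmt.Println(l.ConnsInbound != nil, l.StreamsInbound == nil) // true true
}
```

With plain int64 fields, an omitted StreamsInbound and an explicit "StreamsInbound": 0 decode to the same value, which is the kind of ambiguity that lets a reset or partial update silently zero out limits.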
Theme 11: Improve usability in the swarm stats command