binance: Retry keep alive. #2958
Conversation
Force-pushed from f436428 to 7b64763.
Let's do something like this so that we can test.
Also, please look into the LIST_SUBSCRIPTIONS websocket message to see if we can leverage that in a periodic check to catch discrepancies.
Looking into querying the subscriptions.
Sorry, I have not tested this on testnet/mainnet yet, but I should be able to use it on testnet, correct?
Maybe? I haven't found testing mm on testnet very useful.
Maybe a better technique would be: if an orderbook has not received an update within a certain period, say 30 seconds, resync the orderbook. Currently we will only know that an orderbook is stale when checkSubs is run, so it could remain stale for 30 minutes.
This would result in tons of unnecessary reloads on low-volume markets.
I've been using this diff to test on mainnet. I've turned off my internet connection for 20 mins, then turned it back on, and the orderbooks immediately resync without hitting the new code here. The code looks good otherwise and works fine with the simnet test. Would be nice to be able to reproduce the issue though.
Force-pushed from 56ed9ba to e3cf2a8.
@martonp does the order book desync look ok? https://github.com/decred/dcrdex/compare/1a223db8b8ae83b3e4da70b6ddd1adbf421ef027..e3cf2a8358ff0ac4a6e6e61507439a2b1478f56b
When I disconnect for > 3 minutes, then reconnect, there seems to be a deadlock when calling VWAP. I'm looking into it.
client/mm/libxc/binance.go
Outdated
@@ -288,6 +291,11 @@ func (b *binanceOrderBook) Connect(ctx context.Context) (*sync.WaitGroup, error
			if retry != nil { // don't hammer
				continue
			}
		case <-b.disconnectedChan:
It looks like if for some reason the Websocket has not reconnected, but the snapshot request successfully completes, the orderbook will appear synced, but no messages will be coming through.
I think the order book snapshot doesn't happen over websocket? It's a separate secure HTTP request? So we should stop the books from looking synced until the websocket is back up?
How about the last change https://github.com/decred/dcrdex/compare/7833026d42ae23acc577fd07548c97c2158e580f..adef014da6fd9c56e84197ce13a1a5e3a24d0147
So, not allowing sync again until the websocket tells us it is up. If the connect messages came out of order, or we missed one, that would be bad though; unsure if that can happen.
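A toy model of that gating, with invented names (not the actual binance.go code): the book cannot be marked synced while the websocket layer reports down, and a disconnect clears the synced flag, so a snapshot request that completes during an outage cannot make the book look synced on its own.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// syncGate is an illustrative model of gating book sync on websocket state.
type syncGate struct {
	wsUp   atomic.Bool
	synced atomic.Bool
}

func (g *syncGate) wsConnected() { g.wsUp.Store(true) }

// wsDisconnected desyncs the book, so a completed snapshot alone cannot
// leave it looking synced while the socket is down.
func (g *syncGate) wsDisconnected() {
	g.wsUp.Store(false)
	g.synced.Store(false)
}

// markSynced only succeeds while the websocket is up.
func (g *syncGate) markSynced() bool {
	if !g.wsUp.Load() {
		return false
	}
	g.synced.Store(true)
	return true
}

func main() {
	g := &syncGate{}
	fmt.Println(g.markSynced()) // false: socket never came up
	g.wsConnected()
	fmt.Println(g.markSynced()) // true
	g.wsDisconnected()
	fmt.Println(g.synced.Load()) // false: disconnect desyncs
}
```

The sketch assumes the up/down notifications arrive in order; if they can be reordered or dropped, as the comment worries, the flag can end up wrong.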
Force-pushed from 7833026 to adef014.
I don't think this test failure is due to my changes.
client/mm/libxc/binance.go
Outdated
	// will not place new orders.
	connected := cs == comms.Connected
	bnc.booksMtx.RLock()
	defer bnc.booksMtx.RLock()
Suggested change:
-	defer bnc.booksMtx.RLock()
+	defer bnc.booksMtx.RUnlock()
whoops
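For context on why that one-character slip matters: `defer bnc.booksMtx.RLock()` acquires a second read lock on return and never releases either, so each call leaks two read locks and any later writer `Lock()` blocks forever, which is consistent with the VWAP hang reported earlier. A generic sketch of the correct pairing:

```go
package main

import (
	"fmt"
	"sync"
)

var booksMtx sync.RWMutex

// readBooks shows the correct pattern: the deferred call must be the
// matching RUnlock, not another RLock.
func readBooks(f func()) {
	booksMtx.RLock()
	defer booksMtx.RUnlock() // release, don't re-acquire
	f()
}

func main() {
	readBooks(func() { fmt.Println("reading under RLock") })
	// With the buggy defer, this writer Lock would block forever.
	booksMtx.Lock()
	booksMtx.Unlock()
	fmt.Println("writer lock acquired: no leaked read locks")
}
```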
client/mm/libxc/binance.go
Outdated
	select {
	case reconnectC <- struct{}{}:
	default:
	}
What's the reason for this?
I'm not sure. I think this is a batch of changes I got from @buck54321. A natural reconnect shouldn't need a new connection, I guess.
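For reference, the `select`/`default` send above is the standard non-blocking signal pattern in Go: with a buffered channel of capacity one, repeated triggers while a reconnect is already pending coalesce into a single queued signal instead of blocking the caller. A standalone sketch:

```go
package main

import "fmt"

// signalReconnect queues a reconnect request without blocking. It returns
// true if the signal was queued, false if one was already pending (the
// triggers coalesce into one).
func signalReconnect(reconnectC chan struct{}) bool {
	select {
	case reconnectC <- struct{}{}:
		return true
	default:
		return false
	}
}

func main() {
	c := make(chan struct{}, 1)
	fmt.Println(signalReconnect(c)) // true: queued
	fmt.Println(signalReconnect(c)) // false: coalesced with the pending one
	<-c                             // consumer handles the reconnect
	fmt.Println(signalReconnect(c)) // true: can queue again after drain
}
```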
There were a few issues I fixed.. consider these changes: 68008b3 You can test using the I think another good change would be to trigger a reconnect if the list subscriptions request fails, because this probably means that the connection is broken. |
Force-pushed from d2dc7fb to b97d591.
@martonp I added your changes.
The changes in wsconn.go: why not just add the reconnect timer to keepAlive? As it is now, we have to wait for a read or a write for the scheduled reconnect to happen, so it will happen after we want it to, or not at all if we never have another read or write.
client/mm/libxc/binance.go
Outdated
	PingWait:      time.Minute * 4,
	EchoPingData:  true,
	ReconnectSync: func() {
		bnc.log.Debugf("Binance reconnected")
OK not to trigger a full reconnect here, but maybe a call to checkSubs would be prudent.
client/comms/wsconn.go
Outdated
type WsCfg struct {
	// URL is the websocket endpoint URL.
	URL string
Why does this need to be a separate argument to the constructor?
I wanted the URL to only be stored in one place and be able to be updated, and I didn't want to add a mutex for the config or a field in the config. If we don't update the URL, then when there is a read error and it reconnects, it will only reconnect to the first market that was subscribed to.
I wanted the URL to only be stored in one place
I don't really see this as a compelling reason to change the function signature for every consumer. You can still have an unexported field that you update. When we have structs for config settings, we generally try to keep everything contained to the struct.
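Under that convention, the endpoint could stay an unexported field on the connection, updated under its own small mutex, without touching the constructor signature. A hypothetical shape, not the actual WsConn:

```go
package main

import (
	"fmt"
	"sync"
)

// wsConn keeps the endpoint in one place as an unexported field; SetURL
// lets resubscribes repoint it without a new constructor argument.
type wsConn struct {
	urlMtx sync.Mutex
	url    string
}

// SetURL updates the endpoint, e.g. when the set of subscribed markets changes.
func (c *wsConn) SetURL(u string) {
	c.urlMtx.Lock()
	defer c.urlMtx.Unlock()
	c.url = u
}

// currentURL is what the reconnect path would dial, so reconnects follow
// URL updates instead of always redialing the first subscribed market.
func (c *wsConn) currentURL() string {
	c.urlMtx.Lock()
	defer c.urlMtx.Unlock()
	return c.url
}

func main() {
	c := &wsConn{url: "wss://stream.example.com/btcusdt"}
	c.SetURL("wss://stream.example.com/btcusdt/ethusdt")
	fmt.Println(c.currentURL())
}
```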
I made some updates to that commit: a15e541
You're right, but I don't think the timer should be in
Force-pushed from b97d591 to a15e541.
Thanks, those changes look good to me. Added them. They replaced the last commit.
client/comms/wsconn.go
Outdated
We're making a lot of changes to shoehorn the auto-reconnect into the read/readRaw loops. Can you tell me what, functionally, is the difference between the proposed changes for this file (+112/-63) and this alternative take (+51/-6)?
In that change, won't there be an additional read loop running after each reconnect? When we reconnect from handleReadError, the read/readRaw loops will return, but that won't happen during a reconnect. It would work if there was a new context created each time read/readRaw is called, and it is cancelled before sending a struct to reconnectChan.
Ah. Instead of using conn.reconnectCh <- struct{}{}, we could:
conn.wsMtx.Lock()
if conn.ws != nil {
conn.ws.Close()
}
conn.wsMtx.Unlock()
I think, right?
Sending to reconnectCh already calls connect, which closes the old connection, so I guess the read loop would error out already.
This latest change doesn't work. After the first AutoReconnect, it just starts reconnecting over and over, every second.
How about this one: martonp@bd18aaa
There is a generic read function to avoid duplication.
With this commit you can use TestVWAP to test: martonp@f6029d7
This latest change doesn't work.. after the first AutoReconnect it just starts reconnecting over and over every second.
Oh yeah it does, hmm
Added marton changes.
There was nothing wrong with the read loops and we don't need to spawn a new goroutine for every message. Why are we making that change?
The reason for the read loop change is to be able to end the read loop whenever we do an auto reconnect. The reason for the new goroutine is to be able to end the read loop without having to wait for the ReadMessage or ReadJSON call to return.
If we just send a message on reconnectCh, the old read loop will still be running, and it will attempt to reconnect again whenever it encounters an error. We could solve this by creating new contexts for each call to read, but I think it's simplest to only have one read loop running at a time, and only send a message on reconnectCh when the read loop is being ended, either due to an error or an auto reconnect.
Force-pushed from cd816d9 to d089412.
testbinance panic:
Force-pushed from d089412 to 1ea349e.
Removed the wsconn changes for now. They should maybe be a separate PR. Attempting to fix the other issues pointed out by @martonp https://github.com/decred/dcrdex/compare/d089412d7cd1f74acae443b4728d581dad35efb3..1ea349eed8446916cd75718efdf265400db9925d
* handle ws reconnect signal
* binance: Retry keep alive.
* testbinance: Add flappy websocket.
* binance: Check market depth subs.
* binance: Desync books on disconnect.

Co-authored-by: Brian Stafford <[email protected]>
Untested. Attempting to solve a problem with stuck books.