Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Panic when there are no remote hosts #206

Open
erikness-doordash opened this issue Aug 21, 2023 · 3 comments
Open

Panic when there are no remote hosts #206

erikness-doordash opened this issue Aug 21, 2023 · 3 comments

Comments

@erikness-doordash
Copy link
Contributor

erikness-doordash commented Aug 21, 2023

We get a panic in our Rust Envoy filter when we try to open a new gRPC stream to any hostname that DNS can’t find any hosts for. Reproduced with Envoy versions 1.25.2 and 1.28.2, and the latest Rust proxy-wasm version 0.2.1.

Panic happens when:

  1. we point the cluster config at a fake hostname (dns domain is that not resolvable)
  2. we point the cluster config at a DNS hostname that returns 0 healthy hosts (e.g. service scaled down to zero)

There is NOT a panic if there is a server found and it returns gRPC UNAVAILABLE.

Detailed order of events:

  1. We call Context.open_grpc_stream() from this library
  2. Proxy WASM calls proxy_grpc_stream() on the Envoy host
  3. Envoy host receives gRPC unavailable
  4. Proxy WASM receives Status::InternalFailure (host side code / proxy-wasm-rust-sdk code)
    1. As a result the stream’s token ID is not saved to the Proxy WASM Dispatcher.grpc_streams HashMap
  5. Proxy WASM gets a call from host to its method Dispatcher.on_grpc_receive_initial_metadata() with token number of the stream that couldn’t get created
  6. Token ID not in hashmap, PANIC

Similar issue: #153

@erikness-doordash
Copy link
Contributor Author

erikness-doordash commented Aug 21, 2023

PANIC SCENARIO LOGS

starting main dispatch loop
async DNS resolution complete for [HOSTNAME]
DNS refresh rate reset for [HOSTNAME], (failure) refresh rate 5000 ms
initializing Primary cluster [SERVICE_NAME] completed
init manager Cluster [SERVICE_NAME] contains no targets
init manager Cluster [SERVICE_NAME] initialized, notifying ClusterImplBase
adding TLS cluster [SERVICE_NAME]
cm init: init complete: cluster=[SERVICE_NAME] primary=0 secondary=0

[vm->host] env.proxy_grpc_stream(1053140, 17, 1053157, 55, 1053212, 21, 1225984, 68, 1047956)
[C0][S5454292189521443685] cluster '[SERVICE_NAME]' match for URL '[PROTOBUF_API_NAME]'
no healthy host for HTTP connection pool
async http request response headers (end_stream=true):
':status', '200'
'content-type', 'application/grpc'
'grpc-status', '14'
'grpc-message', 'no healthy upstream'
[host->vm] proxy_on_grpc_receive_initial_metadata(1, 2, 0)
[vm->host] env.proxy_log(5, 1226064, 141)
wasm log [VM_NAME]: panicked at 'invalid token_id', /Users/[USER]/.cargo/registry/src/github.com-1ecc6299db9ec823/proxy-wasm-0.2.1/src/dispatcher.rs:431:27
[vm<-host] env.proxy_log return: 0
Function: proxy_on_grpc_receive_initial_metadata failed: Uncaught RuntimeError: unreachable
Proxy-Wasm plugin in-VM backtrace:
  0:  0xd1339 - __rust_start_panic
  1:  0xd10a2 - rust_panic
  2:  0xd1071 - _ZN3std9panicking20rust_panic_with_hook17h911ba172265603dfE
  3:  0xd0137 - _ZN3std9panicking19begin_panic_handler28_$u7b$$u7b$closure$u7d$$u7d$17h7b07dd737a4d8757E
  4:  0xd0061 - _ZN3std10sys_common9backtrace26__rust_end_short_backtrace17ha54585d732ac30f6E
  5:  0xd08ed - rust_begin_unwind
  6:  0xd657a - _ZN4core9panicking9panic_fmt17h6958314cbbe045b2E
  7:  0xdca87 - _ZN4core9panicking13panic_display17hc4d1a67c733e6371E
  8:  0xdc6e7 - _ZN4core9panicking9panic_str17h49f73175e64c8cc3E
  9:  0xdc6b5 - _ZN4core6option13expect_failed17hafbfcb5da25318c1E
item added to deferred deletion list (size=1)
[vm<-host] env.proxy_grpc_stream return: 10
[vm->host] env.proxy_log(2, 1225984, 70)
wasm log [VM_NAME]: grpc stream failed to open, error InternalFailure

@erikness-doordash
Copy link
Contributor Author

erikness-doordash commented Aug 21, 2023

GOOD SCENARIO LOGS (server returns gRPC UNAVAILABLE, no panic)

starting main dispatch loop
async DNS resolution complete for [HOSTNAME]
transport socket match, socket default selected for host with address [IP_ADDR]:[PORT]
DNS hosts have changed for [HOSTNAME]
initializing Primary cluster [SERVICE_NAME] completed
init manager Cluster [SERVICE_NAME] contains no targets
init manager Cluster [SERVICE_NAME] initialized, notifying ClusterImplBase
adding TLS cluster [SERVICE_NAME]
membership update for TLS cluster [SERVICE_NAME] added 1 removed 0
cm init: init complete: cluster=[SERVICE_NAME] primary=0 secondary=0

[vm->host] env.proxy_grpc_stream(1053140, 17, 1053157, 55, 1053212, 21, 1225984, 68, 1047956)
[C0][S5073346782053791857] cluster '[SERVICE_NAME]' match for URL '[PROTOBUF_API_NAME]'
[C0][S5073346782053791857] router decoding headers:
':method', 'POST'
':path', '[PROTOBUF_API_NAME]'
':authority', '[SERVICE_NAME]'
':scheme', 'http'
'te', 'trailers'
'content-type', 'application/grpc'
'x-envoy-internal', 'true'
'x-forwarded-for', '[IP_ADDR_2]'
queueing stream due to no available connections
trying to create new connection
ConnPoolImplBase 0x5d1be7fed80, ready_clients_.size(): 0, busy_clients_.size(): 0, connecting_clients_.size(): 0, connecting_stream_capacity_: 0, num_active_streams_: 0, pending_streams_.size(): 1 per upstream preconnect ratio: 1
creating a new connection
[C0] connecting
[C0] connecting to [IP_ADDR]:[PORT]
[C0] connection in progress
Codec does not have Metadata frame support.
[C0] updating connection-level initial window size to 268435456
not creating a new connection, shouldCreateNewConnection returned false.
[vm<-host] env.proxy_grpc_stream return: 0
[vm->host] env.proxy_log(2, 1225984, 45)
wasm log [VM_NAME]: grpc stream opened, stream_id=2

[vm->host] env.proxy_grpc_send(2, 1225312, 11, 0)
[C0][S5073346782053791857] buffering 16 bytes
[vm<-host] env.proxy_grpc_send return: 0

[C0][S5073346782053791857] upstream headers complete: end_stream=true
[C0][S5073346782053791857] resetting pool request
async http request response headers (end_stream=true):
':status', '200'
'content-type', 'application/grpc'
'grpc-status', '14'
'grpc-message', 'not available'

wasm log [VM_NAME]: on_grpc_stream_close(): stream_id=2, status_code 14

@pi-pi-miao
Copy link

放出来代码,看看,有可能是代码写法的问题

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants