Dealing with unhandled promise rejections in Kibana nodejs process #128016
Replies: 11 comments
-
Pinging @elastic/kibana-operations (Team:Operations) |
Beta Was this translation helpful? Give feedback.
-
Pinging @elastic/kibana-core (Team:Core) |
Beta Was this translation helpful? Give feedback.
-
Doesn't the following prevent unhandled promise rejections from crashing Kibana in 7.16+ https://github.com/elastic/kibana/blob/main/config/node.options#L9? |
Beta Was this translation helpful? Give feedback.
-
@kobelb I missed that we run Kibana with this flag. |
Beta Was this translation helpful? Give feedback.
-
Completely agree. Using @typescript-eslint/no-floating-promises as you've described to eliminate these logic bugs makes total sense to me. |
Beta Was this translation helpful? Give feedback.
-
Hum, are we using this only in production? When trying to start Kibana and ES locally on
|
Beta Was this translation helpful? Give feedback.
-
When we considered whether to crash on unhandled promise rejections with the Node v16 upgrade I created the following dashboard which gives us a good enough metric in my opinion: Also encouraging to see we're already moving in the right direction it seems... |
Beta Was this translation helpful? Give feedback.
-
Yes - node.options is interpreted by bin/kibana. |
Beta Was this translation helpful? Give feedback.
-
Related: we're prioritizing https://github.com/elastic/kibana/issues/117934 so we can lint for unhandled promises. |
Beta Was this translation helpful? Give feedback.
-
Converting this to a discussion since there is no concrete action item here and doesn't need to be tracked as an incomplete task. Happy to move back if you disagree. |
Beta Was this translation helpful? Give feedback.
-
#181456 might finally fix this 🎉 |
Beta Was this translation helpful? Give feedback.
-
Node.js starting from
v16
terminate a node process in the event of an unhandled promise rejection. Kibana in the versions<7.16
log a warning about a promise rejection not being properly handled. Starting from Kibanav7.16+
, it leads to the Kibana server crashing.If we do not begin to prevent this problem, we can face a situation where changing the Kibana code will increase the instability of the Kibana server.
Liniting
The best thing we can do is to catch the problem as early as possible in the development process to eliminate dangerous patterns on the stage of writing the code.
@typescript-eslint/no-floating-promises rule enforces error handling for created promises
To use this rule, it is necessary to provide a plugin-specific
tsconfig.json
file. Starting from #114184, plugins can declare plugin-specific linting configuration. So we can enable the rule gradually in the repository.Testing
There were several complaints that the Kibana server crashed on CI due to a network error:
kibana/x-pack/plugins/monitoring/server/static_globals.ts
Line 77 in 641b622
kibana/x-pack/plugins/security_solution/server/lib/detection_engine/signals/single_search_after.ts
Line 90 in 641b622
Bulk Indexing of signals failed: TypeError: exc.message.includes is not a function name:...
)It's hard to ensure Kibana plugins test all the possible "un-happy paths" during unit testing.
Instead of relying on the developers to do the right thing, we can add elements of chaos testing to a sub-set of functional tests to make sure Kibana is stable and resilient in the case of intermittent network problems.
Monitoring
Right now, we don't have any particular numbers on how often it leads to the Kibana server crash in the production. We are only aware of cases when users create a bug report in the repo or reach out to support to complain about runtime crashes.
Enabling APM on Cloud should help us to start monitoring the number of runtime crashes. @lizozom is it on your radar after we enable monitoring on Cloud in #117492?
Beta Was this translation helpful? Give feedback.
All reactions