-
Notifications
You must be signed in to change notification settings - Fork 977
refactor: confusing timeout handling #3130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: v4
Are you sure you want to change the base?
Conversation
… Brought in nav hooks into the nav timeout and changed the combined timeout description
…rry a timeout if the input is zero or less this is in essence the refactoring of the combined timeout without getting rid of it.
…w test file for http crawler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your contribution!
I have a few questions and code quality ideas for now. Other contributors have worked with the timeouts deeper in the past and might have better insight into this part of Crawlee.
try { | ||
crawlingContext.response = (await this._navigationHandler(crawlingContext, gotoOptions)) ?? undefined; | ||
// Wrap the entire navigation phase in one timeout | ||
await addTimeoutToPromise( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Random thought - it would make more sense to me to do this timeout one level higher in _runRequestHandler
(where _handleNavigation
is called - i.e. here). The syntax would require less nesting and would align with how we deal with timeouts in user defined request handlers - see below:
crawlee/packages/browser-crawler/src/internals/browser-crawler.ts
Lines 506 to 510 in 2b74503
await addTimeoutToPromise( | |
async () => Promise.resolve(this.userProvidedRequestHandler(crawlingContext as LoadedContext<Context>)), | |
this.requestHandlerTimeoutInnerMillis, | |
`requestHandler timed out after ${this.requestHandlerTimeoutInnerMillis / 1000} seconds.`, | |
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would align with how you guys are currently handling the user request handler but the current implementation makes it so _handleNavigation has built in protection if you ever decide to reuse it and _runRequestHandler stays slimmer. If you believe it to be necessary I can make the change. I will flatten _handleNavigation to improve readability.
Co-authored-by: Jindřich Bär <[email protected]>
…eout tests up to standard (maintainer suggestions)
…eout tests up to standard (maintainer suggestions) pt2
Problem:
Confusing timeout logic that causes a user sees a 60s request handler timeout, but the errors will mention 130s (or 100s for HTTP crawlers where navigation timeout is only 30s).
What this PR changes
Impact / Compatibility
Closes #2951
Contributors:
@ezequiel38
@anghel9