@anghel9 anghel9 commented Aug 12, 2025

Problem:

The timeout logic is confusing: a user sees a 60s request handler timeout, but the errors mention 130s (or 100s for HTTP crawlers, where the navigation timeout is only 30s).
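For illustration, the reported numbers can be reproduced with a small sketch. The 10-second buffer and the 60-second browser navigation timeout are assumptions inferred from the figures above (60 + 60 + 10 = 130 and 30 + 60 + 10 = 100); the helper name is hypothetical and not part of Crawlee.

```typescript
// Assumption: the hidden buffer is 10 seconds, inferred from the 130s / 100s figures.
const ASSUMED_BUFFER_SECS = 10;

// Hypothetical helper: the value that ends up in the error message when the
// navigation timeout and the buffer are folded into the request handler timeout.
function reportedTimeoutSecs(navigationTimeoutSecs: number, requestHandlerTimeoutSecs: number): number {
    return navigationTimeoutSecs + requestHandlerTimeoutSecs + ASSUMED_BUFFER_SECS;
}

console.log(reportedTimeoutSecs(60, 60)); // browser crawlers: 130
console.log(reportedTimeoutSecs(30, 60)); // HTTP crawlers: 100
```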

What this PR changes

  • Refactors _handleNavigation in http-crawler.ts and browser-crawler.ts so the navigation timeout covers the entire navigation flow, including navigation hooks.
  • Removes the buffer timeout in both crawlers.
  • Improves BasicCrawler messaging: instead of “requestHandler timed out after 100s/130s,” the error now states that a safety timeout was reached.
  • This safety timeout is disabled by default with the new logic, but can be re-enabled by setting it to any value > 0.
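A minimal sketch of the idea behind these changes, not the actual diff: `withTimeout` is a hypothetical stand-in for the timeout wrapper, and `handleNavigation`, `withSafetyTimeout`, `sleep`, and the error messages are illustrative names and strings only.

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical stand-in for a timeout wrapper. It rejects once the budget is
// spent; it does not cancel the underlying task.
async function withTimeout<T>(task: () => Promise<T>, timeoutMillis: number, message: string): Promise<T> {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<never>((_, reject) => {
        timer = setTimeout(() => reject(new Error(message)), timeoutMillis);
    });
    try {
        return await Promise.race([task(), timeout]);
    } finally {
        clearTimeout(timer);
    }
}

type Hook = () => Promise<void>;

// One navigation timeout covers pre-navigation hooks, the navigation itself,
// and post-navigation hooks, so no separate buffer is needed.
async function handleNavigation(
    preHooks: Hook[],
    navigate: () => Promise<string>,
    postHooks: Hook[],
    navigationTimeoutMillis: number,
): Promise<string> {
    return withTimeout(
        async () => {
            for (const hook of preHooks) await hook();
            const response = await navigate();
            for (const hook of postHooks) await hook();
            return response;
        },
        navigationTimeoutMillis,
        `Navigation timed out after ${navigationTimeoutMillis / 1000} seconds.`,
    );
}

// The outer safety timeout is opt-in: 0 means disabled, any value > 0 enables it.
async function withSafetyTimeout<T>(task: () => Promise<T>, safetyTimeoutMillis = 0): Promise<T> {
    if (safetyTimeoutMillis <= 0) return task();
    return withTimeout(task, safetyTimeoutMillis, `Safety timeout of ${safetyTimeoutMillis / 1000} seconds was reached.`);
}

// Usage: the slow pre-navigation hook alone exhausts the navigation budget,
// so the call rejects with the navigation timeout message.
handleNavigation([() => sleep(200)], async () => 'response', [], 50)
    .catch((error: Error) => console.log(error.message));
```

With this shape, time spent in hooks counts against the same budget as the navigation, which is why workloads that leaned on the old buffer may time out sooner.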

Impact / Compatibility

  • Workloads that implicitly relied on the hidden buffer may now time out sooner. We may want to increase the default navigation timeout to compensate.
  • No behavioral surprises beyond the tighter, more honest timing.

Closes #2951

Contributors:
@ezequiel38
@anghel9

@anghel9 anghel9 changed the title from "Refactor timeout handling" to "refactor: confusing timeout handling" Aug 12, 2025
Contributor

@barjin barjin left a comment

Thank you for your contribution!

I have a few questions and code quality ideas for now. Other contributors have worked with the timeouts in more depth in the past and might have better insight into this part of Crawlee.

try {
crawlingContext.response = (await this._navigationHandler(crawlingContext, gotoOptions)) ?? undefined;
// Wrap the entire navigation phase in one timeout
await addTimeoutToPromise(
Contributor

Random thought - it would make more sense to me to apply this timeout one level higher, in _runRequestHandler (where _handleNavigation is called - i.e. here). The syntax would require less nesting and would align with how we deal with timeouts in user-defined request handlers - see below:

await addTimeoutToPromise(
    async () => Promise.resolve(this.userProvidedRequestHandler(crawlingContext as LoadedContext<Context>)),
    this.requestHandlerTimeoutInnerMillis,
    `requestHandler timed out after ${this.requestHandlerTimeoutInnerMillis / 1000} seconds.`,
);

Author

It would align with how you guys currently handle the user request handler, but with the current implementation _handleNavigation has built-in protection if you ever decide to reuse it, and _runRequestHandler stays slimmer. If you believe the change is necessary, I can make it. Either way, I will flatten _handleNavigation to improve readability.

anghel9 and others added 3 commits August 13, 2025 14:13
Co-authored-by: Jindřich Bär <[email protected]>
…eout tests up to standard (maintainer suggestions)
…eout tests up to standard (maintainer suggestions) pt2