See how the logs change Before and After:

Before: *(log snippet)*

Now: *(log snippet)*
Problem: When running experiments on certain enhancements/changes, analyzing the results manually takes a lot of time. This manual work is also error-prone, and key details can be missed.
Motivation: We can automate the analysis of these crawling experiments by producing machine-readable crawling logs that contain enough information to understand how the crawl was performed. This lets us build ready-made scripts or notebooks and generate crawling reports directly from the logs.
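As a rough illustration of that workflow (the `crawling_logs.jsonl` file name, the one-JSON-object-per-line format, and the `url`/`status` field names are assumptions for the sketch, not something this PR fixes), a small crawl report could then be generated with a few lines of Python:

```python
import json
from collections import Counter

# Hypothetical log file: one JSON object per line, with at least
# "url" and "status" fields (the exact format is an assumption).
LOG_FILE = "crawling_logs.jsonl"


def crawl_report(path: str) -> None:
    """Print a tiny crawl report: unique URL count and status breakdown."""
    statuses: Counter = Counter()
    urls = set()
    with open(path) as f:
        for line in f:
            entry = json.loads(line)
            statuses[entry["status"]] += 1
            urls.add(entry["url"])
    print(f"{len(urls)} unique URLs crawled")
    for status, count in statuses.most_common():
        print(f"  HTTP {status}: {count}")


if __name__ == "__main__":
    crawl_report(LOG_FILE)
```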
Some notes about the implementation:

- Both `url` and `request_url` are recorded, to take into account redirections handled by Zyte API (see the sketch after this list).
- `request_fingerprint` is computed as in https://github.com/scrapinghub/scrapinghub-entrypoint-scrapy, so that the hash matches the ones inside Scrapy Cloud's Request tab.
- The logs shouldn't be limited to `productNavigation` requests, since users might override the spider or introduce other middlewares that filter out some requests based on some criteria; the crawling logs would then not match the actual spider requests.
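To make the first two notes concrete, here's a minimal sketch of building one log record. The `crawl_log_entry()` helper and the record layout are hypothetical, and whether `request.url` and `response.url` differ under Zyte API redirection is an assumption for illustration; only `request_fingerprint()` is an existing Scrapy utility.

```python
# A sketch of one machine-readable log record covering the notes above.
# The crawl_log_entry() helper and the record layout are hypothetical;
# request_fingerprint() is a real Scrapy utility (deprecated since
# Scrapy 2.7 in favour of scrapy.utils.request.fingerprint, but it is
# the hex-string variant whose values match Scrapy Cloud's Request tab).
from scrapy.utils.request import request_fingerprint


def crawl_log_entry(request, response):
    """Build one machine-readable record for a request/response pair."""
    return {
        # URL the spider originally requested.
        "request_url": request.url,
        # Final URL, which may differ when Zyte API follows redirections
        # (an assumption for illustration).
        "url": response.url,
        # Hex SHA-1 fingerprint, joinable against Scrapy Cloud's Request tab.
        "request_fingerprint": request_fingerprint(request),
        "status": response.status,
    }
```

Reusing Scrapy's fingerprint, rather than a custom hash, is what keeps these records joinable against what Scrapy Cloud already shows.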
Other things we can do:

(We can do this in another PR, as this PR attempts to remove slowdowns in how we currently analyze crawling experiments.)