Use the new HttpResponse that replaces ResponseData in web_poet #67

Merged
BurnzZ merged 5 commits into master from responsedata-to-httpresponse on May 7, 2022

Conversation

Contributor

@BurnzZ BurnzZ commented Mar 28, 2022

Reference PR from web_poet: scrapinghub/web-poet#30.

This also uses the small enhancement in scrapinghub/web-poet#33.

Checklist before release:

  • Remove the repository references in setup.py and tox.ini that were added to keep CI from breaking on this PR

@BurnzZ BurnzZ requested a review from kmike March 28, 2022 06:54
@codecov
codecov bot commented Mar 28, 2022

Codecov Report

Merging #67 (a254ba4) into master (0965c68) will not change coverage.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master      #67   +/-   ##
=======================================
  Coverage   95.96%   95.96%           
=======================================
  Files           9        9           
  Lines         372      372           
=======================================
  Hits          357      357           
  Misses         15       15           
Impacted Files                         Coverage Δ
scrapy_poet/__init__.py                100.00% <ø> (ø)
scrapy_poet/middleware.py              100.00% <100.00%> (ø)
scrapy_poet/page_input_providers.py    94.00% <100.00%> (ø)

docs/providers.rst (outdated, resolved)
tests/test_middleware.py (outdated, resolved)

"""Build a ``web_poet.HttpResponse`` instance using a Scrapy ``Response``"""
return [
HttpResponse(
Member

Suggested change
- HttpResponse(
+ web_poet.HttpResponse(

docs/providers.rst (outdated, resolved)
response_data["url"],
response_data["body"],
status=response_data["status"],
headers=HttpResponseHeaders.from_bytes_dict(response_data["headers"]),
Member

Why is HttpResponseHeaders.from_bytes_dict needed here?

Contributor Author

@kmike It's because Scrapy headers look like this:

scrapy_headers = {
    b"Content-Encoding": [b"gzip", b"br"],
    b"Content-Type": [b"text/html"],
    b"content-length": b"648",
}

Reference for the motivation: scrapinghub/web-poet#33 (comment)
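
To illustrate the shape problem described above, here is a minimal standard-library sketch of normalizing Scrapy-style byte headers, where a value may be a single bytes object or a list of them. The helper name `bytes_dict_to_pairs` and the UTF-8 default are assumptions made for illustration; this is not the implementation of `HttpResponseHeaders.from_bytes_dict`.

```python
from typing import Dict, List, Tuple, Union

# A header value may be one bytes object or a list of them.
BytesValue = Union[bytes, List[bytes]]


def bytes_dict_to_pairs(
    headers: Dict[bytes, BytesValue], encoding: str = "utf-8"
) -> List[Tuple[str, str]]:
    """Hypothetical helper: flatten byte headers into (name, value) str pairs."""
    pairs = []
    for name, value in headers.items():
        # Wrap a lone bytes value so both shapes are handled the same way.
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            pairs.append((name.decode(encoding), item.decode(encoding)))
    return pairs


scrapy_headers = {
    b"Content-Encoding": [b"gzip", b"br"],
    b"Content-Type": [b"text/html"],
    b"content-length": b"648",
}

pairs = bytes_dict_to_pairs(scrapy_headers)
```

Repeated headers such as `Content-Encoding` become one pair per value, which is why a plain `dict(str, str)` would not be enough to hold the result.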

Member

If I'm not wrong, the result here is not coming from Scrapy; it's part of the serialization/deserialization for the cache.

Contributor Author

Ahhh I see what you mean. Nice catch @kmike ! 🙌 Fixed this in #72
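
One way to read the point raised in this thread, sketched with the standard library only: once response data has gone through a text-based round trip, its header names and values are already `str`, so a bytes-to-str conversion step has nothing to do. The record layout and the use of JSON are assumptions for illustration; the actual cache format used by the project is not shown in this conversation.

```python
import json

# Hypothetical cached record. The keys mirror the response_data[...] lookups
# in the reviewed snippet, but the storage format itself is an assumption.
response_data = {
    "url": "https://example.com",
    "body": "<html></html>",
    "status": 200,
    "headers": {"Content-Type": "text/html"},
}

# Serialize to text and restore, as a cache might do.
cached = json.dumps(response_data)
restored = json.loads(cached)

# After the round trip the header names and values are plain str.
all_text = all(
    isinstance(name, str) and isinstance(value, str)
    for name, value in restored["headers"].items()
)
```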

Member

@kmike kmike left a comment

The PR looks good @BurnzZ!

@BurnzZ
Contributor Author

BurnzZ commented May 7, 2022

Thanks for the review @kmike ! 🙏 I haven't ticked off the TODO item on this PR about updating the setup.py and tox.ini deps. In any case, the PR in #62 is built on top of this one, so we can look into it again there.

@BurnzZ BurnzZ merged commit 67a788b into master May 7, 2022
@BurnzZ BurnzZ deleted the responsedata-to-httpresponse branch May 7, 2022 08:52