Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix duplicated requests in SC UI #73

Merged
merged 10 commits into from
Sep 19, 2023

Conversation

PyExplorer
Copy link
Contributor

@PyExplorer PyExplorer commented Sep 15, 2023

The fix for two issues from here (scrapy-plugins/scrapy-zyte-api#112):

  • requests in Scrapy Cloud are double counted
  • there are duplicate requests in SC request list
    The idea is to avoid counting requests if the response is DummyResponse class.

@codecov
Copy link

codecov bot commented Sep 15, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.01% 🎉

Comparison is base (a55cc06) 95.53% compared to head (c1bdda9) 95.55%.
Report is 2 commits behind head on master.

❗ Current head c1bdda9 differs from pull request most recent head cedff4c. Consider uploading reports for the commit cedff4c to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master      #73      +/-   ##
==========================================
+ Coverage   95.53%   95.55%   +0.01%     
==========================================
  Files          14       14              
  Lines         739      742       +3     
==========================================
+ Hits          706      709       +3     
  Misses         33       33              
Files Changed Coverage Δ
sh_scrapy/crawl.py 87.14% <100.00%> (+0.28%) ⬆️
sh_scrapy/extension.py 100.00% <100.00%> (ø)
sh_scrapy/middlewares.py 100.00% <100.00%> (ø)

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Comment on lines 63 to 64
if type(response).__name__ == "DummyResponse":
return response
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am personally not a fan of checking like this, for the record, but I am OK with it, and I realize it simplifies test/CI changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1; I was also thinking about a conditional import. But the current approach looks good enough, and it does simplify testing (though we're not testing the real use case explicitly - it's only tested via manual QA).

sh_scrapy/middlewares.py Outdated Show resolved Hide resolved
tests/test_middlewares.py Outdated Show resolved Hide resolved

@dataclass
class DummyResponse:
url: str
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor:

Just so that we're testing as close to scrapy-poet's DummyResponse, what do you think about copying the 3 lines of code from https://github.com/scrapinghub/scrapy-poet/blob/957dc34808e46059a07dc69428d5d4dca6c71ecf/scrapy_poet/api.py#L10-L31 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, not much code there - will copy and add, thank you @BurnzZ

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we might not need all of this code even in scrapy-poet; see scrapinghub/scrapy-poet#99

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kmike @BurnzZ I've changed this to the code from scrapy-poet - np revert it. Is it ok to back this to

 @dataclass
    class DummyResponse:
        url: str

? I think at the moment it will be enough to test the current fix in this repo and not be tied to changes in scrapy-poet.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, I probably wouldn't copy the init method

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyways, it seems it doesn't matter much; no pushback at all on merging as-is.

Copy link

@BurnzZ BurnzZ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@kmike kmike requested a review from elacuesta September 15, 2023 19:28
@@ -60,6 +60,11 @@ def process_request(self, request, spider):
request.meta[HS_PARENT_ID_KEY] = request_id

def process_response(self, request, response, spider):
# This class of response check is intended to fix the bug described here
# https://github.com/scrapy-plugins/scrapy-zyte-api/issues/112
if type(response).__name__ == "DummyResponse":
Copy link
Member

@elacuesta elacuesta Sep 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We just discussed this in a meeting with @kmike.
In addition to the name, can we also check the import path? Something like

type(response).__module__ == "scrapy_poet.api"

or

type(response).__module__.startswith("scrapy_poet")

if we want to avoid problems in case the import path changes.

It probably does not happen often, but I'm concerned that any user-defined DummyResponse will also trigger this code path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @elacuesta , the fix is added.

tests/test_middlewares.py Outdated Show resolved Hide resolved
tests/test_middlewares.py Outdated Show resolved Hide resolved
PyExplorer and others added 2 commits September 19, 2023 18:56
Co-authored-by: Eugenio Lacuesta <[email protected]>
Co-authored-by: Eugenio Lacuesta <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants