Fix duplicated requests in SC UI #73

PyExplorer · 2023-09-15T05:43:18Z

This fixes two issues reported in scrapy-plugins/scrapy-zyte-api#112:

- requests in Scrapy Cloud are double counted
- there are duplicate requests in the SC request list

The idea is to skip counting a request if its response is an instance of the DummyResponse class.
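
To illustrate the idea, here is a minimal sketch, not the actual sh_scrapy middleware. `CountingMiddleware`, `FetchedResponse` and the local `DummyResponse` are hypothetical stand-ins so the example runs without Scrapy or scrapy-poet installed; only the `type(response).__name__` check mirrors the change in this PR.

```python
class DummyResponse:
    """Hypothetical stand-in for scrapy-poet's placeholder response."""

    def __init__(self, url):
        self.url = url


class FetchedResponse:
    """Hypothetical stand-in for a response that was really downloaded."""

    def __init__(self, url):
        self.url = url


class CountingMiddleware:
    """Hypothetical middleware that records one entry per response it sees."""

    def __init__(self):
        self.logged = []

    def process_response(self, request, response, spider):
        # Placeholder responses are passed through without being recorded,
        # so the same URL is not logged twice.
        if type(response).__name__ == "DummyResponse":
            return response
        self.logged.append(response.url)
        return response


mw = CountingMiddleware()
mw.process_response(None, DummyResponse("https://example.com"), None)
mw.process_response(None, FetchedResponse("https://example.com"), None)
print(mw.logged)
```

Only the downloaded response ends up in `mw.logged`; the placeholder is returned untouched.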

codecov · 2023-09-15T05:44:16Z

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.01% 🎉

Comparison is base (a55cc06) 95.53% compared to head (c1bdda9) 95.55%.
Report is 2 commits behind head on master.

❗ Current head c1bdda9 differs from pull request most recent head cedff4c. Consider uploading reports for the commit cedff4c to get more accurate results

Additional details and impacted files

@@            Coverage Diff             @@
##           master      #73      +/-   ##
==========================================
+ Coverage   95.53%   95.55%   +0.01%     
==========================================
  Files          14       14              
  Lines         739      742       +3     
==========================================
+ Hits          706      709       +3     
  Misses         33       33

| Files Changed | Coverage Δ |
|---|---|
| sh_scrapy/crawl.py | `87.14% <100.00%> (+0.28%)` ⬆️ |
| sh_scrapy/extension.py | `100.00% <100.00%> (ø)` |
| sh_scrapy/middlewares.py | `100.00% <100.00%> (ø)` |


Gallaecio · 2023-09-15T07:28:34Z

sh_scrapy/middlewares.py

+        if type(response).__name__ == "DummyResponse":
+            return response


I am personally not a fan of checking like this, for the record, but I am OK with it, and I realize it simplifies test/CI changes.

+1; I was also thinking about a conditional import. But the current approach looks good enough, and it does simplify testing (though we're not testing the real use case explicitly - it's only tested via manual QA).
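
For comparison, the conditional import mentioned above could look roughly like the sketch below. This is the alternative that was not adopted, shown under the assumption that `DummyResponse` is importable from `scrapy_poet.api` (the module referenced elsewhere in this thread); `is_dummy_response` is a hypothetical helper name.

```python
try:
    # Only available when scrapy-poet is installed.
    from scrapy_poet.api import DummyResponse
except ImportError:
    DummyResponse = None


def is_dummy_response(response):
    """Return True only for real scrapy-poet DummyResponse instances."""
    return DummyResponse is not None and isinstance(response, DummyResponse)
```

The trade-off is that an `isinstance` check is stricter than a name comparison, but exercising the positive branch in tests requires scrapy-poet to be installed in CI.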


BurnzZ · 2023-09-15T10:29:30Z

tests/test_middlewares.py

+
+    @dataclass
+    class DummyResponse:
+        url: str


Minor:

Just so that we're testing as close as possible to scrapy-poet's DummyResponse, what do you think about copying the 3 lines of code from https://github.com/scrapinghub/scrapy-poet/blob/957dc34808e46059a07dc69428d5d4dca6c71ecf/scrapy_poet/api.py#L10-L31 ?

Actually, not much code there - will copy and add, thank you @BurnzZ

we might not need all of this code even in scrapy-poet; see scrapinghub/scrapy-poet#99

@kmike @BurnzZ I've changed this to the code from scrapy-poet; it's no problem to revert it. Is it OK to change this back to

    @dataclass
    class DummyResponse:
        url: str

? I think that, for now, this is enough to test the current fix in this repo without being tied to changes in scrapy-poet.

so, I probably wouldn't copy the init method

Anyways, it seems it doesn't matter much; no pushback at all on merging as-is.

BurnzZ

LGTM!

elacuesta · 2023-09-18T14:59:12Z

sh_scrapy/middlewares.py

@@ -60,6 +60,11 @@ def process_request(self, request, spider):
            request.meta[HS_PARENT_ID_KEY] = request_id

    def process_response(self, request, response, spider):
+        # This class of response check is intended to fix the bug described here
+        # https://github.com/scrapy-plugins/scrapy-zyte-api/issues/112
+        if type(response).__name__ == "DummyResponse":


We just discussed this in a meeting with @kmike.
In addition to the name, can we also check the import path? Something like

type(response).__module__ == "scrapy_poet.api"

or

type(response).__module__.startswith("scrapy_poet")

if we want to avoid problems in case the import path changes.

It probably does not happen often, but I'm concerned that any user-defined DummyResponse will also trigger this code path.
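
A small sketch of the combined check suggested here, under stated assumptions: `is_scrapy_poet_dummy` and `make_poet_like_response` are hypothetical helpers, and the scrapy-poet class is imitated by setting `__module__` by hand so the example runs without scrapy-poet installed.

```python
def is_scrapy_poet_dummy(response):
    """Match on both the class name and the package the class comes from."""
    cls = type(response)
    return cls.__name__ == "DummyResponse" and cls.__module__.startswith("scrapy_poet")


class DummyResponse:
    """A user-defined class that merely shares the name."""


def make_poet_like_response():
    # Build a class whose __module__ imitates scrapy-poet's import path.
    cls = type("DummyResponse", (), {"__module__": "scrapy_poet.api"})
    return cls()


print(is_scrapy_poet_dummy(DummyResponse()))
print(is_scrapy_poet_dummy(make_poet_like_response()))
```

The user-defined class is rejected because its module does not start with `scrapy_poet`, while the imitation is accepted. Note that a bare `startswith("scrapy_poet")` would also accept any other package whose name begins with that prefix.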

Thank you @elacuesta, the fix is added.

Co-authored-by: Eugenio Lacuesta <[email protected]>

PyExplorer added 2 commits September 14, 2023 12:16

check name for DummyResponse

819a066

tests for DummyResponse

e7a5865

PyExplorer mentioned this pull request Sep 15, 2023

Some discrepancies in request/response stats scrapy-plugins/scrapy-zyte-api#112

Closed

PyExplorer added 2 commits September 15, 2023 08:49

fix incompatibility with Python<3.7

9f05684

add dataclass import

ab4c623

Gallaecio approved these changes Sep 15, 2023

BurnzZ reviewed Sep 15, 2023

PyExplorer added 3 commits September 15, 2023 15:01

revert sorting imports

9f022be

using original DummyResponse from scrapy-poet

a3bb664

commenting a reason checking DummyResponse

f903f94

BurnzZ approved these changes Sep 15, 2023

kmike approved these changes Sep 15, 2023

wRAR approved these changes Sep 15, 2023

kmike requested a review from elacuesta September 15, 2023 19:28

elacuesta reviewed Sep 18, 2023

scrapy-poet.api as module for DummyResponse

5cf736c

elacuesta reviewed Sep 19, 2023

PyExplorer and others added 2 commits September 19, 2023 18:56

update request's type

baa44fb

Co-authored-by: Eugenio Lacuesta <[email protected]>

update request's type

cedff4c

Co-authored-by: Eugenio Lacuesta <[email protected]>

elacuesta merged commit 0df0d6e into scrapinghub:master Sep 19, 2023

BurnzZ mentioned this pull request Jan 22, 2024

Missing Parent Request #, Duration, and Response Size fields #78

Open
