Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Supporting to_return in web-poet rules #88

Merged
merged 80 commits into from
Jan 30, 2023
Merged

Supporting to_return in web-poet rules #88

merged 80 commits into from
Jan 30, 2023

Conversation

BurnzZ
Copy link
Contributor

@BurnzZ BurnzZ commented Oct 6, 2022

In line with the upcoming development in scrapinghub/web-poet#84.

This is built on top of #89 for now which moves away from the deprecated functionalities of web-poet as introduced in scrapinghub/web-poet#84.

This also addresses #90 .

TODO:

  • move away from deprecated web-poet functionalities
  • tests and adjustments for the new to_return parameter in web-poet's rules.
  • deprecate passing tuples in SCRAPY_POET_OVERRIDES and the Registry with the advent of to_return.
  • pass mypy
  • update setup.py and tox.ini to point to the new web-poet version containing new registry features.

NOTES:

@codecov
Copy link

codecov bot commented Oct 6, 2022

Codecov Report

Merging #88 (1caa0de) into master (399371f) will decrease coverage by 3.05%.
The diff coverage is 100.00%.

❗ Current head 1caa0de differs from pull request most recent head 140239a. Consider uploading reports for the commit 140239a to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master      #88      +/-   ##
==========================================
- Coverage   91.91%   88.87%   -3.05%     
==========================================
  Files          13       14       +1     
  Lines         569      737     +168     
==========================================
+ Hits          523      655     +132     
- Misses         46       82      +36     
Impacted Files Coverage Δ
scrapy_poet/commands.py 47.05% <ø> (ø)
scrapy_poet/utils/mockserver.py 67.44% <ø> (ø)
scrapy_poet/api.py 100.00% <100.00%> (ø)
scrapy_poet/downloadermiddlewares.py 100.00% <100.00%> (ø)
scrapy_poet/injection.py 98.97% <100.00%> (ø)
scrapy_poet/injection_errors.py 100.00% <100.00%> (ø)
scrapy_poet/page_input_providers.py 94.91% <100.00%> (-1.14%) ⬇️
scrapy_poet/utils/__init__.py 100.00% <100.00%> (ø)
scrapy_poet/utils/testing.py 81.57% <100.00%> (ø)
... and 2 more

scrapy_poet/overrides.py Outdated Show resolved Hide resolved
scrapy_poet/api.py Outdated Show resolved Hide resolved
scrapy_poet/api.py Outdated Show resolved Hide resolved
scrapy_poet/injection.py Outdated Show resolved Hide resolved
tests/test_web_poet_rules.py Outdated Show resolved Hide resolved
tests/test_web_poet_rules.py Outdated Show resolved Hide resolved
@BurnzZ BurnzZ changed the base branch from master to web-poet-update October 14, 2022 10:30
has received a different type of item class from the page object.
"""
item, deps = yield crawl_item_and_deps(ReplacedProduct)
assert "raise UndeclaredProvidedTypeError" in caplog.text
Copy link
Contributor Author

@BurnzZ BurnzZ Nov 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should return a more apt exception since UndeclaredProvidedTypeError primarily means that there's no provider available to give the requested dependency (in this case, the item).

However, this scenario is mostly due to an incorrect class that was returned.

Perhaps a better result is raising MalformedProvidedClassesError or creating another exception like IncorrectProvidedClassError.

tests/test_web_poet_rules.py Outdated Show resolved Hide resolved
tests/utils.py Outdated Show resolved Hide resolved
@wRAR
Copy link
Member

wRAR commented Nov 22, 2022

Continuing the today's discussion: my current implementation of a Scrapy command that takes a PO and returns its dependencies and result (like crawl_item_and_deps() from this PR, but integrated into the scrapy.cmdline-provided crawler) uses the following things from this PR: tests.test_web_poet_rules.spider_for(), tests.utils.create_scrapy_settings() + tests.test_web_poet_rules.rules_settings(), tests.utils.CollectorPipeline, tests.utils.InjectedDependenciesCollectorMiddleware. Its implementation looks like this:

        po = load_object(po_name)
        # here we could also support running a specific spider to apply its custom_settings etc
        spider_cls = spider_for(po)
        self.settings.setdict(additional_settings())
        crawler = Crawler(spider_cls, self.settings)
        self.crawler_process.crawl(crawler, url=url)
        self.crawler_process.start()
        items = crawler.spider.collected_items[0]
        deps = crawler.spider.collected_response_deps[0]

So if we keep this implementation, it would need at least CollectorPipeline, InjectedDependenciesCollectorMiddleware and spider_for somewhere in scrapy_poet, not in its tests.

@BurnzZ
Copy link
Contributor Author

BurnzZ commented Nov 23, 2022

Hi @wRAR , I moved most of the test utilities in scrapy_poet/utils/ so we could reuse most of it in the package. 9a00b63

However, I haven't moved spider_for() and rules_settings() since they're quite specific to the type of tests they're used. There are a few variations of this as well. For these, I think we can redefine them in the context we're using them.

@kmike kmike changed the base branch from web-poet-update to master November 24, 2022 07:55
docs/rules-from-web-poet.rst Outdated Show resolved Hide resolved
docs/settings.rst Outdated Show resolved Hide resolved
docs/settings.rst Outdated Show resolved Hide resolved
docs/rules-from-web-poet.rst Outdated Show resolved Hide resolved
docs/settings.rst Outdated Show resolved Hide resolved
scrapy_poet/injection.py Outdated Show resolved Hide resolved
Copy link
Member

@kmike kmike left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 🎉 🚀

@kmike
Copy link
Member

kmike commented Jan 30, 2023

Any idea what's up with the CI @BurnzZ? I tried to restart jobs, but it didn't help. No logs, nothing :)

@kmike kmike merged commit 4e2d850 into master Jan 30, 2023
@wRAR wRAR deleted the new-web-poet branch November 27, 2023 13:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants