WIP: Annotated[Item, PickFields("x", "y")] to decide which fields to populate in callback #111

Draft: wants to merge 6 commits into base: master


BurnzZ
Contributor

@BurnzZ BurnzZ commented Jan 4, 2023

A continuation of #88.

This stems from the idea in scrapinghub/web-poet#77 to use typing.Annotated (see PEP 593).

TODO:

  • consider an exclusion list like NotPickFields
    • how does it work when Annotated[Item, PickFields("x", "y"), NotPickFields("y", "z")]?
      • silent? warning? raise ValueError?
  • avoid returning cached full items as-is when a partial item is requested, and vice versa
  • handle partial item from the request itself (e.g. meta).
  • handle the case when a field in PickFields/NotPickFields is not available in the page object (PO)
  • handle the inconsistencies of Annotated in different Python versions
  • more tests
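
To make the "field not available" item above concrete, here is a minimal sketch of building a partial item from a page object while rejecting unknown field names. It is only an illustration under assumptions: `partial_item` is a hypothetical helper, `BigPage` is a plain class standing in for a real page object, and `dataclasses` is used instead of attrs to keep the snippet standard-library only. It is not the implementation in this PR.

```python
import dataclasses
from typing import Optional


@dataclasses.dataclass
class BigItem:
    x: str
    y: Optional[str] = None


class BigPage:
    """Stand-in for a page object; each method plays the role of a @field."""

    def x(self) -> str:
        return "x"

    def y(self) -> str:
        return "y"


def partial_item(item_cls, page, names):
    """Hypothetical helper: populate only `names` on a new item.

    Raises ValueError if a requested name is not a field of the item class.
    """
    known = {f.name for f in dataclasses.fields(item_cls)}
    unknown = set(names) - known
    if unknown:
        raise ValueError(
            f"Unknown fields for {item_cls.__name__}: {sorted(unknown)}"
        )
    # Only the requested field methods are called; the rest keep defaults.
    return item_cls(**{name: getattr(page, name)() for name in names})


item = partial_item(BigItem, BigPage(), ["x"])
```

Note that picking only `y` would fail in this sketch, because `BigItem` requires `x`. How required fields should behave when they are not picked is one of the open design questions.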

When we're happy with the API:

  • Changelog
  • Docs

@BurnzZ BurnzZ mentioned this pull request Jan 4, 2023
7 tasks
@BurnzZ
Contributor Author

BurnzZ commented Jan 4, 2023

There's still some stuff to do but I think this is ready for review to check if we're happy with the API before proceeding any further.

from typing import Annotated, Optional

import attrs
import scrapy
from scrapy_poet import ItemPage, PickFields, handle_urls, field

@attrs.define
class BigItem:
    x: str
    y: Optional[str] = None

@handle_urls("example.com")
@attrs.define
class BigPage(ItemPage[BigItem]):
    @field
    def x(self) -> str:
        return "x"

    @field
    def y(self) -> str:
        return "y"

class SomeSpider(scrapy.Spider):
    name = "somespider"

    def start_requests(self):
        yield scrapy.Request("https://example.com", self.parse_item)

    def parse_item(self, response, item: Annotated[BigItem, PickFields("x")]):  #👈
        yield item  # should return BigItem(x="x", y=None)
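
For readers unfamiliar with PEP 593, the following is a rough sketch of how the metadata in such an annotation could be read back from a callback. It assumes Python 3.9+ (where `typing.Annotated` is available) and uses a hypothetical stand-in `PickFields` class and a hypothetical `picked_fields` helper; how scrapy-poet actually inspects callbacks may differ.

```python
import dataclasses
from typing import Annotated, Optional, get_args, get_origin, get_type_hints


class PickFields:
    """Hypothetical marker holding the names of the fields to populate."""

    def __init__(self, *names: str):
        self.names = names


@dataclasses.dataclass
class BigItem:
    x: str
    y: Optional[str] = None


def parse_item(response, item: Annotated[BigItem, PickFields("x")]):
    yield item


def picked_fields(callback, param):
    """Return (item class, picked names) for `param`, or None if not annotated."""
    # include_extras=True keeps the Annotated wrapper instead of stripping it.
    hints = get_type_hints(callback, include_extras=True)
    hint = hints.get(param)
    if get_origin(hint) is not Annotated:
        return None
    item_cls, *metadata = get_args(hint)
    for meta in metadata:
        if isinstance(meta, PickFields):
            return item_cls, meta.names
    return None


result = picked_fields(parse_item, "item")
```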

@BurnzZ BurnzZ requested review from Gallaecio, kmike and wRAR January 4, 2023 12:44
@Gallaecio
Member

The API looks good to me.

I think NotPickFields can be a nice API as well. In fact, I imagine it could potentially be used more often than PickFields. And I think we should definitely prevent mixing the two with an error or warning.
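
As an illustration of the "error" option, here is a minimal sketch of resolving the final field list from annotation metadata and refusing to mix the two markers. `PickFields`, `NotPickFields`, and `resolve_fields` are hypothetical stand-ins written for this example; the exception type and message are assumptions, not a decided API.

```python
class PickFields:
    """Hypothetical marker: populate only these fields."""

    def __init__(self, *names: str):
        self.names = names


class NotPickFields:
    """Hypothetical marker: populate every field except these."""

    def __init__(self, *names: str):
        self.names = names


def resolve_fields(all_fields, metadata):
    """Return the field names to populate, preserving the order of `all_fields`."""
    picks = [m for m in metadata if isinstance(m, PickFields)]
    excludes = [m for m in metadata if isinstance(m, NotPickFields)]
    if picks and excludes:
        # Mixing is ambiguous (e.g. "y" both picked and excluded), so fail loudly.
        raise ValueError("PickFields and NotPickFields cannot be combined")
    if picks:
        return [f for f in all_fields if f in picks[0].names]
    if excludes:
        return [f for f in all_fields if f not in excludes[0].names]
    return list(all_fields)


picked = resolve_fields(["x", "y", "z"], [PickFields("x", "y")])
remaining = resolve_fields(["x", "y", "z"], [NotPickFields("y", "z")])
```

A warning instead of an error would only need the `raise` swapped for `warnings.warn`, plus a rule for which marker wins.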

@BurnzZ BurnzZ requested a review from gatufo January 9, 2023 06:30
@BurnzZ BurnzZ force-pushed the to-return-override-docs branch from bc6acb6 to 3c6fdae Compare January 10, 2023 06:46
@BurnzZ BurnzZ changed the base branch from to-return-override-docs to new-web-poet January 10, 2023 09:30
@BurnzZ
Contributor Author

BurnzZ commented Jan 20, 2023

Pausing this as we've decided to move most of the functionality into web-poet. See scrapinghub/web-poet#115.

Base automatically changed from new-web-poet to master November 27, 2023 13:25