remove scrapy-poet registry in lieu of web-poet's registry
BurnzZ committed Jan 10, 2023
1 parent 2611199 commit bc6acb6
Showing 14 changed files with 105 additions and 283 deletions.
82 changes: 44 additions & 38 deletions CHANGELOG.rst
Original file line number Diff line number Diff line change
Expand Up @@ -61,61 +61,67 @@ in page objects and spider callbacks. The following is now possible:
In line with this, the following changes were made:

* Added a new ``scrapy_poet.page_input_providers.ItemProvider`` which makes
the usage above possible.
* Multiple changes to the ``scrapy_poet.PageObjectInputProvider`` base class
which are backward incompatible:

* It now accepts an instance of ``scrapy_poet.injection.Injector`` in its
constructor instead of ``scrapy.crawler.Crawler``. Although you can
still access the ``scrapy.crawler.Crawler`` via the ``Injector.crawler``
attribute.
* ``is_provided()`` is now an instance method instead of a class
method.

* The ``scrapy_poet.injection.Injector``'s attribute and constructor parameter
called ``overrides_registry`` is now simply called ``registry``.
* Added a new :class:`scrapy_poet.page_input_providers.ItemProvider` which
makes the usage above possible.
* Multiple changes to the
:class:`scrapy_poet.page_input_providers.PageObjectInputProvider` base
class which are backward incompatible:

* It now accepts an instance of :class:`scrapy_poet.injection.Injector`
in its constructor instead of :class:`scrapy.crawler.Crawler`. You can
still access the :class:`scrapy.crawler.Crawler` via the
``Injector.crawler`` attribute.
* :meth:`scrapy_poet.page_input_providers.PageObjectInputProvider.is_provided`
is now an instance method instead of a class method.

* The :class:`scrapy_poet.injection.Injector`'s attribute and constructor
parameter called ``overrides_registry`` is now simply called ``registry``.
This is backwards incompatible.
* An item class is now supported by ``scrapy_poet.callback_for`` alongside
the usual page objects. This means that it won't raise a ``TypeError``
anymore when not passing a subclass of ``web_poet.ItemPage``.
* ``scrapy_poet.overrides.OverridesRegistry`` has been deprecated and
overhauled into ``scrapy_poet.registry.OverridesAndItemRegistry``:

* It is now subclassed from ``web_poet.RulesRegistry`` which allows
outright access to its registry methods.
* It now allows retrieval of rules based on the returned item class.
* The registry doesn't accept tuples as rules anymore. Only
``web_poet.ApplyRule`` instances are allowed. The same goes for
``SCRAPY_POET_RULES`` (and the deprecated ``SCRAPY_POET_OVERRIDES``).

* As a result, the following type aliases have been removed:
``scrapy_poet.overrides.RuleAsTuple`` and
``scrapy_poet.overrides.RuleFromUser``
* These changes are backward incompatible.

* New exception: ``scrapy_poet.injector_error.ProviderDependencyDeadlockError``.
* An item class is now supported by :func:`scrapy_poet.callback_for`
alongside the usual page objects. This means that it won't raise a
:class:`TypeError` anymore when not passing a subclass of
:class:`web_poet.pages.ItemPage`.
* New exception: :class:`scrapy_poet.injection_errors.ProviderDependencyDeadlockError`.
This is raised when it's not possible to create the dependencies due to
a deadlock in their sub-dependencies, e.g. due to a circular dependency
between page objects.

Other changes:

* Now requires ``web-poet >= 0.7.0``.
* In line with web-poet's new features, the ``scrapy_poet.overrides`` module
which contained ``OverridesRegistryBase`` and ``OverridesRegistry`` has now
been removed. Instead, scrapy-poet directly uses
:class:`web_poet.rules.RulesRegistry`.

Everything should work pretty much the same, except that
:meth:`web_poet.rules.RulesRegistry.overrides_for` now accepts :class:`str`,
:class:`web_poet.page_inputs.http.RequestUrl`, or
:class:`web_poet.page_inputs.http.ResponseUrl` instead of
:class:`scrapy.http.Request`.

* This also means that the registry doesn't accept tuples as rules anymore.
Only :class:`web_poet.rules.ApplyRule` instances are allowed. The same goes
for ``SCRAPY_POET_RULES`` (and the deprecated ``SCRAPY_POET_OVERRIDES``).
As a result, the following type aliases have been removed:

* ``scrapy_poet.overrides.RuleAsTuple``
* ``scrapy_poet.overrides.RuleFromUser``

These changes are backward incompatible.

* Moved some of the utility functions from the test module into
``scrapy_poet.utils.testing``.
* Documentation improvements.
* Official support for Python 3.11

Deprecations:

* The ``scrapy_poet.overrides`` module has been replaced by
``scrapy_poet.registry``.
* The ``scrapy_poet.overrides.OverridesRegistry`` class is now replaced by
``scrapy_poet.registry.OverridesAndItemRegistry``.
* The ``SCRAPY_POET_OVERRIDES_REGISTRY`` setting has been replaced by
``SCRAPY_POET_REGISTRY``.
* The ``SCRAPY_POET_OVERRIDES`` setting has been replaced by
``SCRAPY_POET_RULES``.
* Official support for Python 3.11


0.6.0 (2022-11-24)
------------------
Expand Down
7 changes: 0 additions & 7 deletions docs/api_reference.rst
Expand Up @@ -43,10 +43,3 @@ Injection errors

.. automodule:: scrapy_poet.injection_errors
:members:

Registry
========

.. automodule:: scrapy_poet.registry
:members:
:show-inheritance:
11 changes: 6 additions & 5 deletions docs/rules-from-web-poet.rst
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ Rules from web-poet
===================

scrapy-poet fully supports the functionalities of :class:`web_poet.rules.ApplyRule`.
It has its own registry called :class:`scrapy_poet.registry.OverridesAndItemRegistry`
It uses the registry from web_poet called :class:`web_poet.rules.RulesRegistry`
which provides functionalities for:

* Returning the page object override if it exists for a given URL.
Expand Down Expand Up @@ -296,9 +296,10 @@ regarding :ref:`rules-item-class-example`.
Registry
========

As mentioned above, scrapy-poet has its own registry called
:class:`scrapy_poet.registry.OverridesAndItemRegistry`.
As mentioned above, scrapy-poet uses the registry from web-poet called
:class:`web_poet.rules.RulesRegistry`.

This registry implementation can be changed if needed. A different registry can
be configured by passing its class path to the ``SCRAPY_POET_REGISTRY`` setting.
Such registries must be a subclass of :class:`scrapy_poet.registry.OverridesRegistryBase`
and must implement the :meth:`scrapy_poet.registry.OverridesRegistryBase.overrides_for` method.
Such registries must be a subclass of :class:`web_poet.rules.RulesRegistry`
to ensure the expected methods and their types are properly accounted for.
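
As a rough illustration of the paragraph above, the sketch below shows the general shape of a custom registry and the setting that selects it. To stay runnable without third-party packages it subclasses a simplified stand-in, ``StandInRegistry``; in a real project the base class would be ``web_poet.rules.RulesRegistry``. The stand-in's prefix-based storage, its ``add()`` method, the ``CountingRegistry`` class, and the ``myproject.registries`` module path are all assumptions made for illustration and are not part of web-poet or scrapy-poet.

```python
from typing import Dict, Mapping


class StandInRegistry:
    """Simplified stand-in for web_poet.rules.RulesRegistry (assumption).

    It stores overrides per URL prefix. The real registry matches URL
    patterns taken from ApplyRule instances instead.
    """

    def __init__(self) -> None:
        self._by_prefix: Dict[str, Dict[type, type]] = {}

    def add(self, url_prefix: str, use: type, instead_of: type) -> None:
        # Hypothetical helper, present only so the sketch can hold data.
        self._by_prefix.setdefault(url_prefix, {})[instead_of] = use

    def overrides_for(self, url: str) -> Mapping[type, type]:
        # Looks up overrides by URL string rather than by request object.
        result: Dict[type, type] = {}
        for prefix, overrides in self._by_prefix.items():
            if url.startswith(prefix):
                result.update(overrides)
        return result


class CountingRegistry(StandInRegistry):
    """Hypothetical custom registry that counts override lookups."""

    def __init__(self) -> None:
        super().__init__()
        self.lookups = 0

    def overrides_for(self, url: str) -> Mapping[type, type]:
        self.lookups += 1
        return super().overrides_for(url)


# In settings.py, the class path would be given as a string
# (the module path here is hypothetical):
SCRAPY_POET_REGISTRY = "myproject.registries.CountingRegistry"
```

Subclassing keeps the inherited lookup behaviour intact and only adds bookkeeping around it, which is the kind of change the subclass requirement is meant to keep safe.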
12 changes: 6 additions & 6 deletions docs/settings.rst
Original file line number Diff line number Diff line change
Expand Up @@ -29,9 +29,9 @@ SCRAPY_POET_RULES
Default: ``None``

Mapping of overrides for each domain. The format of such a ``dict`` mapping
depends on the currently set Registry. The default is currently
:class:`~.OverridesAndItemRegistry`. This can be overriden by the setting below:
``SCRAPY_POET_OVERRIDES_REGISTRY``.
depends on the currently set registry. The default is currently
:class:`web_poet.rules.RulesRegistry`. This can be overridden by the setting below:
``SCRAPY_POET_REGISTRY``.

There are sections dedicated for this at :ref:`intro-tutorial` and
:ref:`rules-from-web-poet`.
Expand All @@ -46,9 +46,9 @@ SCRAPY_POET_REGISTRY

Default: ``None``

Sets an alternative Registry to replace the default :class:`~.OverridesAndItemRegistry`.
To use this, set a ``str`` which denotes the absolute object path of the new
Registry.
Sets an alternative Registry to replace the default
:class:`web_poet.rules.RulesRegistry`. To use this, set a ``str`` which denotes
the absolute object path of the new registry.

More info at :ref:`rules-from-web-poet`.
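
To make the "absolute object path" wording concrete, here is a minimal sketch of how a registry class could be resolved from settings: the new setting is preferred, the deprecated ``SCRAPY_POET_OVERRIDES_REGISTRY`` is the fallback, and a default class is used when neither is set. It uses a plain ``dict`` in place of Scrapy settings and only the standard library. ``DefaultRegistry``, ``load_by_path`` and ``resolve_registry_class`` are hypothetical names for this illustration; scrapy-poet itself relies on Scrapy's ``load_object`` for the path lookup.

```python
from importlib import import_module
from typing import Any, Mapping


class DefaultRegistry:
    """Stand-in for the default registry class (assumption for this sketch)."""


def load_by_path(path_or_obj: Any) -> Any:
    """Return the object at a dotted path such as "package.module.Name".

    Non-string values (e.g. a class passed directly) are returned unchanged.
    """
    if not isinstance(path_or_obj, str):
        return path_or_obj
    module_path, _, name = path_or_obj.rpartition(".")
    if not module_path:
        raise ValueError(f"Not an absolute object path: {path_or_obj!r}")
    return getattr(import_module(module_path), name)


def resolve_registry_class(settings: Mapping[str, Any]) -> Any:
    """Prefer the new setting, then the deprecated one, then the default."""
    configured = settings.get(
        "SCRAPY_POET_REGISTRY",
        settings.get("SCRAPY_POET_OVERRIDES_REGISTRY", DefaultRegistry),
    )
    return load_by_path(configured)


# Any importable dotted path works; a stdlib class is used here purely
# as a placeholder for a real registry class.
registry_cls = resolve_registry_class(
    {"SCRAPY_POET_REGISTRY": "collections.OrderedDict"}
)
```

Note that a plain ``dict`` treats a key explicitly set to ``None`` as present, so this sketch only falls back when the key is missing altogether.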

Expand Down
11 changes: 5 additions & 6 deletions scrapy_poet/downloadermiddlewares.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,8 +9,9 @@
from scrapy import Spider, signals
from scrapy.crawler import Crawler
from scrapy.http import Request, Response
from scrapy.utils.misc import create_instance, load_object
from scrapy.utils.misc import load_object
from twisted.internet.defer import Deferred, inlineCallbacks
from web_poet import RulesRegistry

from .api import DummyResponse
from .injection import Injector
Expand All @@ -22,7 +23,7 @@
RequestUrlProvider,
ResponseUrlProvider,
)
from .registry import OverridesAndItemRegistry
from .utils import create_registry_instance

logger = logging.getLogger(__name__)

Expand Down Expand Up @@ -60,12 +61,10 @@ def __init__(self, crawler: Crawler) -> None:
registry_cls = load_object(
settings.get(
"SCRAPY_POET_REGISTRY",
settings.get(
"SCRAPY_POET_OVERRIDES_REGISTRY", OverridesAndItemRegistry
),
settings.get("SCRAPY_POET_OVERRIDES_REGISTRY", RulesRegistry),
)
)
self.registry = create_instance(registry_cls, settings, crawler)
self.registry = create_registry_instance(registry_cls, crawler)
self.injector = Injector(
crawler,
default_providers=DEFAULT_PROVIDERS,
Expand Down
20 changes: 12 additions & 8 deletions scrapy_poet/injection.py
Original file line number Diff line number Diff line change
Expand Up @@ -12,8 +12,9 @@
from scrapy.statscollectors import StatsCollector
from scrapy.utils.conf import build_component_list
from scrapy.utils.defer import maybeDeferred_coro
from scrapy.utils.misc import create_instance, load_object
from scrapy.utils.misc import load_object
from twisted.internet.defer import inlineCallbacks
from web_poet import RulesRegistry
from web_poet.pages import is_injectable

from scrapy_poet.api import _CALLBACK_FOR_MARKER, DummyResponse
Expand All @@ -24,9 +25,8 @@
UndeclaredProvidedTypeError,
)
from scrapy_poet.page_input_providers import PageObjectInputProvider
from scrapy_poet.registry import OverridesAndItemRegistry, OverridesRegistryBase

from .utils import get_scrapy_data_path
from .utils import create_registry_instance, get_scrapy_data_path

logger = logging.getLogger(__name__)

Expand All @@ -42,11 +42,11 @@ def __init__(
crawler: Crawler,
*,
default_providers: Optional[Mapping] = None,
registry: Optional[OverridesRegistryBase] = None,
registry: Optional[RulesRegistry] = None,
):
self.crawler = crawler
self.spider = crawler.spider
self.registry = registry or OverridesAndItemRegistry()
self.registry = registry or RulesRegistry()
self.load_providers(default_providers)
self.init_cache()

Expand Down Expand Up @@ -138,7 +138,11 @@ def build_plan(self, request: Request) -> andi.Plan:
callback,
is_injectable=is_injectable,
externally_provided=self.is_class_provided_by_any_provider,
overrides=self.registry.overrides_for(request).get,
# Ignore the type since andi.plan expects overrides to be
# Callable[[Callable], Optional[Callable]] but the registry
# returns a more accurate typing for this scenario:
# Mapping[Type[ItemPage], Type[ItemPage]]
overrides=self.registry.overrides_for(request.url).get, # type: ignore[arg-type]
)

@inlineCallbacks
Expand Down Expand Up @@ -360,7 +364,7 @@ def is_provider_requiring_scrapy_response(provider):
def get_injector_for_testing(
providers: Mapping,
additional_settings: Optional[Dict] = None,
registry: Optional[OverridesRegistryBase] = None,
registry: Optional[RulesRegistry] = None,
) -> Injector:
"""
Return an :class:`Injector` using a fake crawler.
Expand All @@ -379,7 +383,7 @@ class MySpider(Spider):
spider.settings = settings
crawler.spider = spider
if not registry:
registry = create_instance(OverridesAndItemRegistry, settings, crawler)
registry = create_registry_instance(RulesRegistry, crawler)
return Injector(crawler, registry=registry)


Expand Down
6 changes: 0 additions & 6 deletions scrapy_poet/overrides.py

This file was deleted.
