feat: Add a poller setting
jpmckinney committed Jul 19, 2024
1 parent 76887a7 commit 276fab5
Showing 3 changed files with 26 additions and 6 deletions.
21 changes: 19 additions & 2 deletions docs/config.rst
@@ -127,7 +127,7 @@ Default
spiderqueue
~~~~~~~~~~~

-The class that stores job queues.
+The class that stores pending jobs.

Default
``scrapyd.spiderqueue.SqliteSpiderQueue``
@@ -151,12 +151,29 @@ Also used by
Poller options
--------------

+.. _poller:
+
+poller
+~~~~~~
+
+The class that tracks capacity for new jobs, and starts jobs when ready.
+
+Default
+``scrapyd.poller.QueuePoller``
+Options
+- ``scrapyd.poller.QueuePoller``. When using the default :ref:`application` and :ref:`launcher` values:
+
+- The launcher adds :ref:`max_proc` capacity at startup, and one capacity each time a Scrapy process ends.
+- The :ref:`application` starts a timer so that, every :ref:`poll_interval` seconds, a job starts if there's capacity: that is, if the number of Scrapy processes that are running is less than the :ref:`max_proc` value.
+
+- Implement your own, using the ``IPoller`` interface
+
.. _poll_interval:

poll_interval
~~~~~~~~~~~~~

-The number of seconds to wait between checking whether the number of Scrapy processes that are running is less than the :ref:`max_proc` value.
+The number of seconds between capacity checks.

Default
``5.0``
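
The capacity rules documented for the default poller can be modelled with a small toy class. This is only a sketch of the described behaviour, assuming one job may start per poll. `ToyPoller`, `poll` and `process_ended` are hypothetical names, not part of Scrapyd or its `IPoller` interface, and the real poller is driven by a timer rather than by direct calls.

```python
# Toy model of the documented capacity rules.
# Every name here is hypothetical; this is not Scrapyd's implementation.
from collections import deque


class ToyPoller:
    def __init__(self, max_proc, pending):
        self.capacity = max_proc       # capacity added at startup (max_proc)
        self.pending = deque(pending)  # jobs waiting to be started
        self.running = []              # jobs currently occupying a process

    def poll(self):
        """Stand-in for one timer tick: start one job if there is capacity."""
        if self.capacity > 0 and self.pending:
            self.capacity -= 1
            job = self.pending.popleft()
            self.running.append(job)
            return job
        return None

    def process_ended(self, job):
        """A process ended, so one unit of capacity is returned."""
        self.running.remove(job)
        self.capacity += 1


poller = ToyPoller(max_proc=2, pending=["a", "b", "c"])
started = [poller.poll() for _ in range(3)]  # the third tick finds no capacity
poller.process_ended("a")                    # frees one unit of capacity
resumed = poller.poll()                      # the waiting job can now start
```

With `max_proc=2`, the third job waits until one of the first two ends, which matches the "running processes less than `max_proc`" condition in the text.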
1 change: 1 addition & 0 deletions docs/news.rst
@@ -13,6 +13,7 @@ Added

- Add a :ref:`status.json` webservice, to get the status of a job.
- Add a :ref:`unix_socket_path` setting, to listen on a Unix socket.
+- Add a :ref:`poller` setting.
- Respond to HTTP ``OPTIONS`` method requests.
- Add environment variables to override common options. See :doc:`config`.

10 changes: 6 additions & 4 deletions scrapyd/app.py
@@ -12,7 +12,6 @@
from scrapyd.basicauth import PublicHTMLRealm, StringCredentialsChecker
from scrapyd.environ import Environment
from scrapyd.interfaces import IEggStorage, IEnvironment, IJobStorage, IPoller, ISpiderScheduler
-from scrapyd.poller import QueuePoller
from scrapyd.scheduler import SpiderScheduler


@@ -43,13 +42,16 @@ def application(config):
unix_socket_path = os.getenv("SCRAPYD_UNIX_SOCKET_PATH") or config.get("unix_socket_path", "")
poll_interval = config.getfloat("poll_interval", 5)

-poller = QueuePoller(config)
scheduler = SpiderScheduler(config)
+app.setComponent(ISpiderScheduler, scheduler)
+
environment = Environment(config)
+app.setComponent(IEnvironment, environment)

+poller_path = config.get("poller", "scrapyd.poller.QueuePoller")
+poller_cls = load_object(poller_path)
+poller = poller_cls(config)
app.setComponent(IPoller, poller)
-app.setComponent(ISpiderScheduler, scheduler)
-app.setComponent(IEnvironment, environment)

jobstorage_path = config.get("jobstorage", "scrapyd.jobstorage.MemoryJobStorage")
jobstorage_cls = load_object(jobstorage_path)
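
The poller now follows the same pattern as `jobstorage`: read a dotted path from the config, resolve it to a class, and instantiate that class with the config. Below is a minimal stdlib-only sketch of that pattern, assuming `load_object` resolves a dotted path to an object, as its use here suggests. `load_dotted_path` is a hypothetical stand-in, not the real `load_object`. A plain dict replaces Scrapyd's config object, and `collections` classes replace the poller classes.

```python
# Hypothetical stand-in for load_object, using only the standard library.
from importlib import import_module


def load_dotted_path(path):
    """Import "package.module.Name" and return the object called Name."""
    module_path, _, name = path.rpartition(".")
    if not module_path:
        raise ValueError(f"not a full dotted path: {path!r}")
    return getattr(import_module(module_path), name)


# A plain dict stands in for Scrapyd's config object (assumption).
config = {"poller": "collections.OrderedDict"}

# Same shape as the new lines in application():
# read the path (with a default), resolve the class, then instantiate it.
poller_path = config.get("poller", "collections.Counter")
poller_cls = load_dotted_path(poller_path)
poller = poller_cls(config)
```

Because the class is resolved at startup from a string, swapping the poller becomes a configuration change rather than a code change, which is what removing the hard-coded `QueuePoller` import achieves.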