docs: Add contributing docs and code comments to explain inter-process communication
jpmckinney committed Jul 22, 2024
1 parent 543c42b commit 4d25ab8
Showing 3 changed files with 27 additions and 0 deletions.
19 changes: 19 additions & 0 deletions docs/contributing.rst
@@ -36,3 +36,22 @@ To install an editable version for development, clone the repository, change to
.. code-block:: shell

   pip install -e .
Developer documentation
-----------------------

Scrapyd starts Scrapy processes. It runs ``scrapy crawl`` in the :ref:`launcher`, and ``scrapy list`` in the :ref:`schedule.json` (to check the spider exists), :ref:`addversion.json` (to return the number of spiders) and :ref:`listspiders.json` (to return the names of spiders) webservices.
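The division of labor above can be sketched as follows. This is a hypothetical helper, not Scrapyd's actual API: it shows how a webservice might run ``scrapy list`` in a subprocess via ``scrapyd.runner``, passing the environment variables that the next section describes.

```python
import os
import subprocess
import sys


def build_env(project, egg_version=None):
    """Copy the parent environment and add the Scrapyd-to-Scrapy variables."""
    env = os.environ.copy()
    env["SCRAPY_PROJECT"] = project
    if egg_version is not None:
        env["SCRAPYD_EGG_VERSION"] = egg_version
    return env


def list_spiders(project, egg_version=None):
    """Run `scrapy list` through scrapyd's runner and return the spider names."""
    proc = subprocess.run(
        [sys.executable, "-m", "scrapyd.runner", "list"],
        env=build_env(project, egg_version),
        capture_output=True,
        text=True,
        check=True,
    )
    return proc.stdout.splitlines()
```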

Environment variables
~~~~~~~~~~~~~~~~~~~~~

Scrapyd uses environment variables to communicate between the Scrapyd process and the Scrapy processes that it starts.

SCRAPY_PROJECT
  The project to use. See ``scrapyd/runner.py``.
SCRAPYD_EGG_VERSION
  The version of the project, to be retrieved as an egg from :ref:`eggstorage` and activated.
SCRAPY_SETTINGS_MODULE
  The Python path to the `settings <https://docs.scrapy.org/en/latest/topics/settings.html#designating-the-settings>`__ module of the project.

  This is usually the module from the `entry points <https://setuptools.pypa.io/en/latest/userguide/entry_point.html>`__ of the egg, but can be the module from the ``[settings]`` section of a :ref:`scrapy.cfg<config-settings>` file. See ``scrapyd/environ.py``.
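For illustration, a ``[settings]`` section of the kind described above might look like this (the project and module names are hypothetical):

```ini
[settings]
; each key is a project name, each value the Python path to its settings module
myproject = myproject.settings
```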
6 changes: 6 additions & 0 deletions scrapyd/environ.py
@@ -30,12 +30,18 @@ def get_settings(self, message):

def get_environment(self, message, slot):
    project = message["_project"]

    env = self.initenv.copy()
    env["SCRAPY_PROJECT"] = project
    # If the version is not provided, then the runner uses the default version, determined by egg storage.
    if "_version" in message:
        env["SCRAPYD_EGG_VERSION"] = message["_version"]
    # Scrapy discovers the same scrapy.cfg files as Scrapyd. So, this is only needed if users are adding [settings]
    # sections to Scrapyd configuration files (which Scrapy doesn't discover). This might lead to strange behavior
    # if an egg project and a [settings] project have the same name (unlikely). Preserved, since committed in 2010.
    if project in self.settings:
        env["SCRAPY_SETTINGS_MODULE"] = self.settings[project]

    return env

def _get_feed_uri(self, message, extension):
2 changes: 2 additions & 0 deletions scrapyd/runner.py
@@ -26,6 +26,8 @@ def activate_egg(eggpath):

    distribution.activate()

    # setdefault() was added in https://github.com/scrapy/scrapyd/commit/0641a57. It's not clear why, since the egg
    # should control its settings module. That said, it is unlikely to already be set.
    os.environ.setdefault("SCRAPY_SETTINGS_MODULE", distribution.get_entry_info("scrapy", "settings").module_name)


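A minimal sketch of the ``setdefault()`` semantics the comment above turns on: assignment happens only when the variable is unset, so a value already present in the environment wins over the egg's entry point (the variable name below is illustrative):

```python
import os

# Ensure a clean slate for the demonstration.
os.environ.pop("DEMO_SETTINGS_MODULE", None)

# The key is absent, so setdefault() assigns the value.
os.environ.setdefault("DEMO_SETTINGS_MODULE", "myproject.settings")
first = os.environ["DEMO_SETTINGS_MODULE"]   # "myproject.settings"

# The key is now present, so this second call is a no-op.
os.environ.setdefault("DEMO_SETTINGS_MODULE", "other.settings")
second = os.environ["DEMO_SETTINGS_MODULE"]  # still "myproject.settings"
```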
