docs: Copy-edit Overview and extract Quickstart
jpmckinney committed Jul 20, 2024
1 parent 94d624f commit 9dfda8c
Showing 5 changed files with 62 additions and 89 deletions.
2 changes: 2 additions & 0 deletions docs/config.rst
@@ -203,6 +203,8 @@ Options

 .. attention:: It is not recommended to use a low interval like 0.1 when using the default :ref:`spiderqueue` value. Consider a custom queue based on `queuelib <https://github.com/scrapy/queuelib>`__.
 
+.. _config-launcher:
+
 Launcher options
 ----------------
 
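
A note on the attention directive in this hunk: `queuelib <https://github.com/scrapy/queuelib>`__ provides the disk-backed queue primitives that a custom queue could build on. The following is a minimal sketch of queuelib alone, assuming ``pip install queuelib``; Scrapyd's spider queue interface is not shown.

.. code-block:: python

   # Minimal sketch of queuelib, the library the attention note suggests
   # building a custom queue on. Assumes `pip install queuelib`; a real
   # Scrapyd spider queue must also implement Scrapyd's queue interface,
   # which is not shown here.
   from queuelib import FifoDiskQueue

   q = FifoDiskQueue("queue-dir")   # persists pending items on disk
   q.push(b"spider1")               # queuelib stores raw bytes
   q.push(b"spider2")
   assert q.pop() == b"spider1"     # FIFO: first pushed, first popped
   q.close()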
45 changes: 15 additions & 30 deletions docs/contributing.rst
@@ -1,52 +1,37 @@
 .. _contributing:
 
 Contributing
 ============
 
-.. important:: Read through the `Scrapy Contribution Docs <http://scrapy.readthedocs.org/en/latest/contributing.html>`__ for tips relating to writing patches, reporting bugs, and project coding style.
-
-These docs describe how to setup and contribute to Scrapyd.
+.. important:: Read through the `Scrapy Contribution Docs <http://scrapy.readthedocs.org/en/latest/contributing.html>`__ for tips relating to writing patches, reporting bugs, and coding style.
 
-Reporting issues & bugs
------------------------
+Issues and bugs
+---------------
 
-Issues should be reported to the Scrapyd project `issue tracker <https://github.com/scrapy/scrapyd/issues>`__ on GitHub.
+Report on `GitHub <https://github.com/scrapy/scrapyd/issues>`__.
 
 Tests
 -----
 
-Tests are implemented using the `Twisted unit-testing framework <https://docs.twisted.org/en/stable/development/test-standard.html>`__. Scrapyd uses ``trial`` as the test running application.
-
-Running tests
--------------
+Include tests in your pull requests.
 
-To run all tests go to the root directory of the Scrapyd source code and run:
+To run unit tests:
 
 .. code-block:: shell
 
-   trial tests
+   pytest tests
 
-To run a specific test (say ``tests/test_poller.py``) use:
+To run integration tests:
 
 .. code-block:: shell
 
-   trial tests.test_poller
-
-Writing tests
--------------
-
-All functionality (including new features and bug fixes) should include a test
-case to check that it works as expected, so please include tests for your
-patches if you want them to get accepted sooner.
-
-Scrapyd uses unit tests, which are located in the `tests <https://github.com/scrapy/scrapyd/tree/master/tests>`__ directory.
-Their module name typically resembles the full path of the module they're testing.
-For example, the scheduler code is in ``scrapyd.scheduler`` and its unit tests are in ``tests/test_scheduler.py``.
+   printf "[scrapyd]\nusername = hello12345\npassword = 67890world\n" > scrapyd.conf
+   mkdir logs
+   scrapyd &
+   pytest integration_tests
 
-Installing locally
-------------------
+Installation
+------------
 
-To install a locally edited version of Scrapyd onto the system to use and test, inside the project root run:
+To install an editable version for development, clone the repository, change to its directory, and run:
 
 .. code-block:: shell
5 changes: 0 additions & 5 deletions docs/deploy.rst
@@ -1,11 +1,6 @@
 Deployment
 ==========
 
-Deploying a Scrapy project
---------------------------
-
-This involves building a `Python egg <https://setuptools.pypa.io/en/latest/deprecated/python_eggs.html>`__ and uploading it to Scrapyd via the `addversion.json <https://scrapyd.readthedocs.org/en/latest/api.html#addversion-json>`_ webservice. Do this easily with the `scrapyd-deploy` command from the `scrapyd-client <https://github.com/scrapy/scrapyd-client>`__ package.
-
 .. _docker:
 
 Creating a Docker image
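
For reference, the paragraph removed above (moved to the Quickstart) refers to the ``addversion.json`` webservice. A hedged sketch of uploading a pre-built egg by hand, assuming the third-party ``requests`` package and an existing ``myproject.egg`` (``scrapyd-deploy`` automates all of this):

.. code-block:: python

   # Sketch: upload a pre-built egg via the addversion.json webservice.
   # Assumes `pip install requests` and that myproject.egg was already
   # built (e.g. by scrapyd-deploy, which automates this whole step).
   import requests

   with open("myproject.egg", "rb") as egg:
       response = requests.post(
           "http://localhost:6800/addversion.json",
           data={"project": "myproject", "version": "1.0"},
           files={"egg": egg},
       )
   print(response.json())  # e.g. {"status": "ok", ...}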
40 changes: 37 additions & 3 deletions docs/index.rst
@@ -1,22 +1,56 @@
 =================
 Scrapyd |release|
 =================
 
 .. include:: ../README.rst
 
-Installation
-------------
+Quickstart
+==========
+
+Install Scrapyd
+---------------
 
 .. code-block:: shell
 
    pip install scrapyd
 
+Start Scrapyd
+-------------
+
+.. code-block:: shell
+
+   scrapyd
+
+See :doc:`overview` and :doc:`config` for more details.
+
+Upload a project
+----------------
+
+This involves building a `Python egg <https://setuptools.pypa.io/en/latest/deprecated/python_eggs.html>`__ and uploading it to Scrapyd via the `addversion.json <https://scrapyd.readthedocs.org/en/latest/api.html#addversion-json>`_ webservice.
+
+Do this easily with the `scrapyd-deploy` command from the `scrapyd-client <https://github.com/scrapy/scrapyd-client>`__ package. Once configured:
+
+.. code-block:: shell
+
+   scrapyd-deploy
+
+Schedule a crawl
+----------------
+
+.. code-block:: shell-session
+
+   $ curl http://localhost:6800/schedule.json -d project=myproject -d spider=spider2
+   {"status": "ok", "jobid": "26d1b1a6d6f111e0be5c001e648c57f8"}
+
+See :doc:`api` for more details.
+
 .. toctree::
    :maxdepth: 2
    :caption: Contents
 
    overview
    config
+   deploy
    api
-   deploy
    contributing
    news
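
As a companion to the ``curl`` call in the new Quickstart, the same crawl can be scheduled from Python's standard library; ``myproject`` and ``spider2`` are the placeholders from the example above.

.. code-block:: python

   # Standard-library equivalent of the Quickstart's curl example:
   # POST form data to schedule.json and read the JSON reply.
   import json
   from urllib.parse import urlencode
   from urllib.request import urlopen

   data = urlencode({"project": "myproject", "spider": "spider2"}).encode()
   with urlopen("http://localhost:6800/schedule.json", data=data) as response:
       print(json.load(response))  # {"status": "ok", "jobid": "..."}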
59 changes: 8 additions & 51 deletions docs/overview.rst
@@ -5,74 +5,31 @@ Overview
 Projects and versions
 =====================
 
-Scrapyd can manage multiple projects and each project can have multiple
-versions uploaded, but only the latest one will be used for launching new
-spiders.
+Scrapyd can manage multiple Scrapy projects. Each project can have multiple versions. The latest version is used by default for starting spiders.
 
-A common (and useful) convention to use for the version name is the revision
-number of the version control tool you're using to track your Scrapy project
-code. For example: ``r23``. The versions are not compared alphabetically but
-using a smarter algorithm (the same `packaging <https://pypi.org/project/packaging/>`__ uses) so ``r10`` compares
-greater to ``r9``, for example.
+The latest version is the alphabetically greatest, unless all version names are `version specifiers <https://packaging.python.org/en/latest/specifications/version-specifiers/>`__ like ``1.0`` or ``1.0rc1``, in which case they are sorted as such.
 
 How Scrapyd works
 =================
 
-Scrapyd is an application (typically run as a daemon) that listens to requests
-for spiders to run and spawns a process for each one, which basically
-executes:
+Scrapyd is a server (typically run as a daemon) that listens for :doc:`api` and :ref:`webui` requests.
+
+The API is especially used to upload projects and schedule crawls. To start a crawl, Scrapyd spawns a process that essentially runs:
 
 .. code-block:: shell
 
    scrapy crawl myspider
 
-Scrapyd also runs multiple processes in parallel, allocating them in a fixed
-number of slots given by the :ref:`max_proc` and :ref:`max_proc_per_cpu` options,
-starting as many processes as possible to handle the load.
+Scrapyd runs multiple processes in parallel, and manages the number of concurrent processes. See :ref:`config-launcher` for details.
 
-In addition to dispatching and managing processes, Scrapyd provides a
-:doc:`api` to upload new project versions
-(as eggs) and schedule spiders. This feature is optional and can be disabled if
-you want to implement your own custom Scrapyd. The components are pluggable and
-can be changed, if you're familiar with the `Twisted Application Framework <https://docs.twisted.org/en/stable/core/howto/application.html>`__
-which Scrapyd is implemented in.
-
-Starting from 0.11, Scrapyd also provides a minimal :ref:`web interface
-<webui>`.
-
-Starting Scrapyd
-================
-
-To start the service, use the ``scrapyd`` command provided in the Scrapy
-distribution:
-
-.. code-block:: shell
-
-   scrapyd
-
-That should get your Scrapyd started.
-
-Scheduling a spider run
-=======================
-
-To schedule a spider run:
-
-.. code-block:: shell-session
-
-   $ curl http://localhost:6800/schedule.json -d project=myproject -d spider=spider2
-   {"status": "ok", "jobid": "26d1b1a6d6f111e0be5c001e648c57f8"}
-
-For more resources see: :doc:`api` for more available resources.
+If you are familiar with the `Twisted Application Framework <https://docs.twisted.org/en/stable/core/howto/application.html>`__, you can essentially reconfigure every part of Scrapyd. See :doc:`config` for details.
 
 .. _webui:
 
 Web interface
 =============
 
-Scrapyd comes with a minimal web interface (for monitoring running processes
-and accessing logs) which can be accessed at http://localhost:6800/
-
-Other options to manage your Scrapyd cluster include:
+Scrapyd has a minimal web interface for monitoring running processes and accessing log files and item feeds. By default, it is available at http://localhost:6800/. Other options to manage Scrapyd include:
 
 - `ScrapydWeb <https://github.com/my8100/scrapydweb>`__
 - `spider-admin-pro <https://github.com/mouday/spider-admin-pro>`__
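
The version-ordering wording added under "Projects and versions" above can be illustrated with the `packaging <https://pypi.org/project/packaging/>`__ library; a sketch assuming ``pip install packaging`` (how Scrapyd applies the ordering internally is not shown).

.. code-block:: python

   # Sketch of the version ordering described above, assuming
   # `pip install packaging`. Alphabetically, "1.0rc1" sorts after "1.0";
   # as version specifiers, the release 1.0 is the latest. Names like
   # "r10" do not parse as version specifiers, so alphabetical order
   # would apply instead.
   from packaging.version import Version

   names = ["1.0rc1", "1.0", "0.9"]
   print(max(names))               # 1.0rc1 -- alphabetical "latest"
   print(max(names, key=Version))  # 1.0 -- version-specifier "latest"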
