From 9dfda8cbd2bb48cf4dafebbb0f8864cf56f56b8b Mon Sep 17 00:00:00 2001 From: James McKinney <26463+jpmckinney@users.noreply.github.com> Date: Sat, 20 Jul 2024 15:24:02 -0400 Subject: [PATCH] docs: Copy-edit Overview and extract Quickstart --- docs/config.rst | 2 ++ docs/contributing.rst | 45 +++++++++++---------------------- docs/deploy.rst | 5 ---- docs/index.rst | 40 ++++++++++++++++++++++++++--- docs/overview.rst | 59 ++++++------------------------------------- 5 files changed, 62 insertions(+), 89 deletions(-) diff --git a/docs/config.rst b/docs/config.rst index 18a09a7b..21820b11 100644 --- a/docs/config.rst +++ b/docs/config.rst @@ -203,6 +203,8 @@ Options .. attention:: It is not recommended to use a low interval like 0.1 when using the default :ref:`spiderqueue` value. Consider a custom queue based on `queuelib `__. +.. _config-launcher: + Launcher options ---------------- diff --git a/docs/contributing.rst b/docs/contributing.rst index cfb3ac07..c5cef0d7 100644 --- a/docs/contributing.rst +++ b/docs/contributing.rst @@ -1,52 +1,37 @@ -.. _contributing: - Contributing ============ -.. important:: Read through the `Scrapy Contribution Docs `__ for tips relating to writing patches, reporting bugs, and project coding style. - -These docs describe how to setup and contribute to Scrapyd. +.. important:: Read through the `Scrapy Contribution Docs `__ for tips relating to writing patches, reporting bugs, and coding style. -Reporting issues & bugs ------------------------ +Issues and bugs +--------------- -Issues should be reported to the Scrapyd project `issue tracker `__ on GitHub. +Report on `GitHub `__. Tests ----- -Tests are implemented using the `Twisted unit-testing framework `__. Scrapyd uses ``trial`` as the test running application. - -Running tests -------------- +Include tests in your pull requests. -To run all tests go to the root directory of the Scrapyd source code and run: +To run unit tests: .. code-block:: shell - trial tests + pytest tests -To run a specific test (say ``tests/test_poller.py``) use: +To run integration tests: .. code-block:: shell - trial tests.test_poller - -Writing tests -------------- - -All functionality (including new features and bug fixes) should include a test -case to check that it works as expected, so please include tests for your -patches if you want them to get accepted sooner. - -Scrapyd uses unit tests, which are located in the `tests `__ directory. -Their module name typically resembles the full path of the module they're testing. -For example, the scheduler code is in ``scrapyd.scheduler`` and its unit tests are in ``tests/test_scheduler.py``. + printf "[scrapyd]\nusername = hello12345\npassword = 67890world\n" > scrapyd.conf + mkdir logs + scrapyd & + pytest integration_tests -Installing locally ------------------- +Installation +------------ -To install a locally edited version of Scrapyd onto the system to use and test, inside the project root run: +To install an editable version for development, clone the repository, change to its directory, and run: .. code-block:: shell diff --git a/docs/deploy.rst b/docs/deploy.rst index bd43afae..2a8b9fd4 100644 --- a/docs/deploy.rst +++ b/docs/deploy.rst @@ -1,11 +1,6 @@ Deployment ========== -Deploying a Scrapy project --------------------------- - -This involves building a `Python egg `__ and uploading it to Scrapyd via the `addversion.json `_ webservice. Do this easily with the `scrapyd-deploy` command from the `scrapyd-client `__ package. - .. 
_docker: Creating a Docker image diff --git a/docs/index.rst b/docs/index.rst index fe727527..eb3655c0 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -1,22 +1,56 @@ +================= Scrapyd |release| ================= .. include:: ../README.rst -Installation ------------- +Quickstart +========== + +Install Scrapyd +--------------- .. code-block:: shell pip install scrapyd +Start Scrapyd +------------- + +.. code-block:: shell + + scrapyd + +See :doc:`overview` and :doc:`config` for more details. + +Upload a project +---------------- + +This involves building a `Python egg `__ and uploading it to Scrapyd via the `addversion.json `_ webservice. + +Do this easily with the `scrapyd-deploy` command from the `scrapyd-client `__ package. Once configured: + +.. code-block:: shell + + scrapyd-deploy + +Schedule a crawl +---------------- + +.. code-block:: shell-session + + $ curl http://localhost:6800/schedule.json -d project=myproject -d spider=spider2 + {"status": "ok", "jobid": "26d1b1a6d6f111e0be5c001e648c57f8"} + +See :doc:`api` for more details. + .. toctree:: :maxdepth: 2 :caption: Contents overview config - deploy api + deploy contributing news diff --git a/docs/overview.rst b/docs/overview.rst index 0a0bb895..7013fa8b 100644 --- a/docs/overview.rst +++ b/docs/overview.rst @@ -5,74 +5,31 @@ Overview Projects and versions ===================== -Scrapyd can manage multiple projects and each project can have multiple -versions uploaded, but only the latest one will be used for launching new -spiders. +Scrapyd can manage multiple Scrapy projects. Each project can have multiple versions. The latest version is used by default for starting spiders. -A common (and useful) convention to use for the version name is the revision -number of the version control tool you're using to track your Scrapy project -code. For example: ``r23``. The versions are not compared alphabetically but -using a smarter algorithm (the same `packaging `__ uses) so ``r10`` compares -greater to ``r9``, for example. +The latest version is the alphabetically greatest, unless all version names are `version specifiers `__ like ``1.0`` or ``1.0rc1``, in which case they are sorted as such. How Scrapyd works ================= -Scrapyd is an application (typically run as a daemon) that listens to requests -for spiders to run and spawns a process for each one, which basically -executes: +Scrapyd is a server (typically run as a daemon) that listens for :doc:`api` and :ref:`webui` requests. -.. code-block:: shell - - scrapy crawl myspider - -Scrapyd also runs multiple processes in parallel, allocating them in a fixed -number of slots given by the :ref:`max_proc` and :ref:`max_proc_per_cpu` options, -starting as many processes as possible to handle the load. - -In addition to dispatching and managing processes, Scrapyd provides a -:doc:`api` to upload new project versions -(as eggs) and schedule spiders. This feature is optional and can be disabled if -you want to implement your own custom Scrapyd. The components are pluggable and -can be changed, if you're familiar with the `Twisted Application Framework `__ -which Scrapyd is implemented in. - -Starting from 0.11, Scrapyd also provides a minimal :ref:`web interface -`. - -Starting Scrapyd -================ - -To start the service, use the ``scrapyd`` command provided in the Scrapy -distribution: +The API is especially used to upload projects and schedule crawls. To start a crawl, Scrapyd spawns a process that essentially runs: .. 
code-block:: shell - scrapyd - -That should get your Scrapyd started. -Scheduling a spider run -======================= - -To schedule a spider run: - -.. code-block:: shell-session + scrapy crawl myspider - $ curl http://localhost:6800/schedule.json -d project=myproject -d spider=spider2 - {"status": "ok", "jobid": "26d1b1a6d6f111e0be5c001e648c57f8"} +Scrapyd runs multiple processes in parallel, and manages the number of concurrent processes. See :ref:`config-launcher` for details. -For more resources see: :doc:`api` for more available resources. +If you are familiar with the `Twisted Application Framework `__, you can essentially reconfigure every part of Scrapyd. See :doc:`config` for details. .. _webui: Web interface ============= -Scrapyd comes with a minimal web interface (for monitoring running processes -and accessing logs) which can be accessed at http://localhost:6800/ - -Other options to manage your Scrapyd cluster include: +Scrapyd has a minimal web interface for monitoring running processes and accessing log files and item feeds. By default, it is available at http://localhost:6800/. Other options to manage Scrapyd include: - `ScrapydWeb `__ - `spider-admin-pro `__
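The Quickstart added above stops at scheduling a crawl. As a rough follow-up sketch, assuming the default bind address and port (``127.0.0.1:6800``) and the ``myproject`` name used in the example, the daemon and its jobs can be inspected with the existing ``daemonstatus.json`` and ``listjobs.json`` endpoints:

.. code-block:: shell

   # Overall daemon health: counts of pending, running, and finished jobs.
   curl http://localhost:6800/daemonstatus.json

   # Jobs for one project; "myproject" is the example project name from the Quickstart.
   curl http://localhost:6800/listjobs.json?project=myproject

With the default configuration, the job ID returned by ``schedule.json`` also appears in the log file path (``logs/<project>/<spider>/<job ID>.log``), which the web interface links to.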