openzim/python-scraperlib

zimscraperlib

License: GPL v3

A collection of Python code to reuse across Python-based scrapers.

Usage

  • This library is meant to be installed via PyPI (zimscraperlib).
  • Pin it to a version range in your requirements, as the API is subject to frequent changes.
  • The API is only expected to remain stable within a given minor version.

Example usage:

zimscraperlib>=1.1,<1.2
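
The pin above keeps a scraper on a single minor release series. As a rough illustration of what that constraint means, the sketch below checks whether a version string belongs to a given minor series. The helper name `within_minor` is hypothetical and not part of zimscraperlib, and the check is a simplification: it ignores pre-release suffixes, which pip handles according to its own version rules.

```python
import re


def within_minor(version: str, minor: str) -> bool:
    """Return True if `version` is in the `minor` series (e.g. "1.1").

    Hypothetical helper for illustration only; pre-release and local
    version suffixes are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"not a recognisable version: {version!r}")
    # Compare numerically so that "1.10" is not mistaken for "1.1".
    found = (int(match.group(1)), int(match.group(2)))
    major_str, minor_str = minor.split(".")
    return found == (int(major_str), int(minor_str))


# Versions allowed and rejected by a `>=1.1,<1.2` pin:
print(within_minor("1.1.3", "1.1"))   # a patch release in the pinned series
print(within_minor("1.2.0", "1.1"))   # the next minor series
```

At runtime, the installed version can be obtained with `importlib.metadata.version("zimscraperlib")` and passed to a check like this one.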

See the functional architecture, software architecture and technical architecture documents for more details on scraperlib (this is a work in progress and not all aspects are covered yet).

Dependencies

  • libmagic
  • wget
  • libzim (auto-installed, not available on Windows)
  • Pillow
  • FFmpeg
  • gifsicle (>=1.92)

macOS

brew install libmagic wget libtiff libjpeg webp little-cms2 ffmpeg gifsicle

Linux

sudo apt install libmagic1 wget ffmpeg \
    libtiff5-dev libjpeg8-dev libopenjp2-7-dev zlib1g-dev \
    libfreetype6-dev liblcms2-dev libwebp-dev tcl8.6-dev tk8.6-dev python3-tk \
    libharfbuzz-dev libfribidi-dev libxcb1-dev gifsicle

Alpine

apk add ffmpeg gifsicle libmagic wget libjpeg
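
Once the system packages are installed, it can be useful to confirm that the command-line tools from the dependency list are actually on `PATH`. The following is a minimal sketch using only the standard library; `missing_binaries` and `REQUIRED_BINARIES` are hypothetical names, not part of zimscraperlib, and the check covers neither libraries such as libmagic or Pillow nor the gifsicle minimum version.

```python
import shutil

# Command-line tools from the dependency list (assumed to be invoked by name).
REQUIRED_BINARIES = ("wget", "ffmpeg", "gifsicle")


def missing_binaries(names=REQUIRED_BINARIES):
    """Return the subset of `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_binaries()
    if missing:
        print("Missing binaries: " + ", ".join(missing))
    else:
        print("All required binaries were found.")
```

Running such a check at scraper start-up would surface a missing tool early rather than partway through a long run.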

Contribution

This project adheres to openZIM's Contribution Guidelines.

This project has implemented openZIM's Python bootstrap, conventions and policies v1.0.2.

pip install hatch
pip install ".[dev]"
pre-commit install
# For tests
invoke coverage

Users

Non-exhaustive list of scrapers using this library (check their status when updating the API):