This provides a Hatch(ling) plugin for common openZIM operations:
- automatically populate common project metadata
- install static files (e.g. external JS dependencies) at build time
This plugin intentionally has few dependencies, using the Python standard library whenever possible and hence limiting footprint to a minimum.
hatch-openzim adheres to openZIM's Contribution Guidelines.
hatch-openzim has implemented openZIM's Python bootstrap, conventions and policies v1.0.1.
Assuming you have an openZIM project, you could use a configuration like the following in your pyproject.toml:
# Use the hatchling build backend, with the hatch-openzim plugin.
[build-system]
requires = ["hatchling", "hatch-openzim"]
build-backend = "hatchling.build"
[project]
name = "MyAwesomeScraper"
requires-python = ">=3.11,<3.12"
description = "Awesome scraper"
readme = "README.md"
# These project metadata fields are dynamic because they are generated by the hatch-openzim
# and version plugins.
dynamic = ["authors", "classifiers", "keywords", "license", "version", "urls"]
# Enable the hatch-openzim metadata hook to generate default openZIM metadata.
[tool.hatch.metadata.hooks.openzim-metadata]
additional-keywords = ["awesome"] # some additional keywords
kind = "scraper" # indicate this is a scraper, so that additional keywords are added
# Additional author #1
[[tool.hatch.metadata.hooks.openzim-metadata.additional-authors]]
name="Bob"
email="[email protected]"
# Additional author #2
[[tool.hatch.metadata.hooks.openzim-metadata.additional-authors]]
name="Alice"
email="[email protected]"
# Enable the hatch-openzim build hook to install files (e.g. JS libs) at build time.
[tool.hatch.build.hooks.openzim-build]
toml-config = "openzim.toml" # optional location of the configuration file
dependencies = [ "zimscraperlib==3.1.0" ] # optional dependencies needed for file installations
NOTA: the `dependencies` attribute is not specific to our hook(s); it is a generic hatch(ling) feature.
Variable | Required | Description |
---|---|---|
`additional-authors` | N | List of authors that will be appended to the automatic one |
`additional-classifiers` | N | List of classifiers that will be appended to the automatic ones |
`additional-keywords` | N | List of keywords that will be appended to the automatic ones |
`kind` | N | If set to `scraper`, scraper keywords will be automatically added as well |
`organization` | N | Override the organization (otherwise detected from the GitHub repository) used to set author and keyword appropriately. Case-insensitive. Supported values are `openzim`, `kiwix` and `offspot` |
`preserve-authors` | N | Boolean indicating that we do not want to set `authors` metadata but use the ones of `pyproject.toml` |
`preserve-classifiers` | N | Boolean indicating that we do not want to set `classifiers` metadata but use the ones of `pyproject.toml` |
`preserve-keywords` | N | Boolean indicating that we do not want to set `keywords` metadata but use the ones of `pyproject.toml` |
`preserve-license` | N | Boolean indicating that we do not want to set `license` metadata but use the one of `pyproject.toml` |
`preserve-urls` | N | Boolean indicating that we do not want to set `urls` metadata but use the ones of `pyproject.toml` |
The metadata hook will set:
- `authors` to `[{"email": "[email protected]", "name": "Kiwix"}]`
- `classifiers` will contain:
  - `License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)`
  - all `Programming Language :: Python :: x` and `Programming Language :: Python :: x.y` matching the `requires-python` constraint
- `keywords` will contain:
  - at least `kiwix`
  - if `kind` is `scraper`, `zim` and `offline`
  - the `additional-keywords` passed in the configuration
- `license` to `{"text": "GPL-3.0-or-later"}`
- `urls` to:
  - `Donate`: `https://www.kiwix.org/en/support-us/`
  - `Homepage`: GitHub repository URL (e.g. `https://github.com/openzim/hatch-openzim`) if the code is a git clone, otherwise `https://www.kiwix.org`
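The keyword rules above can be sketched in a few lines of Python. This is only an illustration: `build_keywords` is a hypothetical helper, not the plugin's actual code, it ignores any keyword derived from the detected organization, and dropping duplicates is an assumption of this sketch.

```python
def build_keywords(kind=None, additional_keywords=()):
    """Hypothetical helper mirroring the keyword rules listed above."""
    keywords = ["kiwix"]  # always present
    if kind == "scraper":
        keywords += ["zim", "offline"]  # scraper-specific keywords
    # Append user-supplied keywords; skipping duplicates is an assumption.
    for keyword in additional_keywords:
        if keyword not in keywords:
            keywords.append(keyword)
    return keywords


print(build_keywords(kind="scraper", additional_keywords=["awesome"]))
# prints ['kiwix', 'zim', 'offline', 'awesome']
```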
Variable | Required | Description |
---|---|---|
`toml-config` | N | Location of the configuration file, defaults to `openzim.toml` |
The detailed configuration of the build hook is done in a TOML file named `openzim.toml` (unless customized via `toml-config`, see above). This file must be placed in your project root folder, next to your `pyproject.toml`.
The build hook supports downloading web resources to various locations at build time.
To configure this, you first have to create a `files` section in the `openzim.toml` configuration and declare its `config` table. The name of the section (`assets` in the example below) is free (do not forget to quote it if you want to use special characters such as `.` in the name).
[files.assets.config]
target_dir="src/hatch_openzim/templates/assets"
execute_after=[
"touch somewhere/something.txt"
]
Variable | Required | Description |
---|---|---|
`target_dir` | Y | Base directory where all downloaded content will be placed |
`execute_after` | N | List of shell commands to execute once all actions (see below) have been executed; commands are executed with `target_dir` as current working directory |
Important: The `execute_after` commands are always executed, no matter how many actions are present or how many actions have been ignored (see below for details about why an action might be ignored).
Nota: The example `execute_after` command (`touch`) is not representative of what you would usually do ^^
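The "always executed" rule can be pictured with the following Python sketch. It is not the plugin's implementation: `run_section`, the shape of the `actions` list and the injectable `run_command` are all assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def run_section(target_dir, actions, execute_after, run_command=None):
    """Run pending actions, then always run the section's execute_after commands.

    `actions` is a list of (relative_target, callable) pairs; this shape is an
    assumption of the sketch, not the plugin's internal representation.
    """
    base = Path(target_dir)
    if run_command is None:
        # Default runner: execute through the shell from inside target_dir.
        def run_command(command):
            subprocess.run(command, shell=True, cwd=base, check=True)

    executed = 0
    for relative_target, perform in actions:
        if (base / relative_target).exists():
            continue  # target already present: the action is ignored
        perform(base / relative_target)
        executed += 1

    # Runs unconditionally, even when every action above was ignored.
    for command in execute_after:
        run_command(command)
    return executed
```

Passing a custom `run_command` (for instance `calls.append`) makes it easy to observe that the commands still run when no action was executed.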
Once this section configuration is done, you will then declare multiple actions. All actions in a given section share the same base configuration declared above.
Three kinds of actions are supported:
- `get_file`: downloads a file to a location
- `extract_all`: extracts all content of a zip file to a location
- `extract_items`: extracts some items of a zip file to some locations
Each action is declared in its own TOML table. Action names are free.
[files.assets.actions.some_name]
action=...
The `get_file` action downloads a file to a location.
Important: If `target_file` is already present, the action is not executed; it is simply ignored.
Variable | Required | Description |
---|---|---|
`action` | Y | Must be "get_file" |
`source` | Y | URL of the online resource to download |
`target_file` | Y | Path to the file target location, relative to the section `target_dir` |
`execute_after` | N | List of shell commands to execute once file installation is completed; commands are executed with the section `target_dir` as current working directory |
You will find a sample below.
[files.assets.actions."jquery.min.js"]
action="get_file"
source="https://code.jquery.com/jquery-3.5.1.min.js"
target_file="jquery.min.js"
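As a rough mental model of this action, here is a minimal Python sketch using only the standard library. The function name and signature are hypothetical and the plugin's real code may differ; the sketch only shows the skip-if-present rule followed by a download.

```python
from pathlib import Path
from urllib.request import urlretrieve


def get_file(section_target_dir, source, target_file):
    """Download `source` unless the target file already exists.

    Returns True when a download happened, False when the action was skipped.
    """
    destination = Path(section_target_dir) / target_file
    if destination.exists():
        return False  # already installed: the action is ignored
    destination.parent.mkdir(parents=True, exist_ok=True)
    urlretrieve(source, str(destination))  # stdlib download, no extra dependency
    return True
```

Calling it twice with the same arguments downloads once and then returns `False`, which is why deleting the target file is the way to force a refresh in this model.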
The `extract_all` action downloads a ZIP and extracts it to a location. Some items of the ZIP content can be removed afterwards.
Important: If `target_dir` is already present, the action is not executed; it is simply ignored.
Variable | Required | Description |
---|---|---|
`action` | Y | Must be "extract_all" |
`source` | Y | URL of the online ZIP to download |
`target_dir` | Y | Path of the directory where ZIP content will be extracted, relative to the section `target_dir` |
`remove` | N | List of glob patterns of ZIP content to remove after extraction (relative to action `target_dir`) |
`execute_after` | N | List of shell commands to execute once files extraction is completed; commands are executed with the section `target_dir` as current working directory |
You will find a sample below.
Nota:
- the ZIP is first saved to a temporary location before extraction, consuming some disk space
[files.assets.actions.chosen]
action="extract_all"
source="https://github.com/harvesthq/chosen/releases/download/v1.8.7/chosen_v1.8.7.zip"
target_dir="chosen"
remove=["docsupport", "chosen.proto.*", "*.html", "*.md"]
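The extraction and `remove` steps can be sketched as follows. This is an illustration only: the function is hypothetical, and it starts from a ZIP that is already on disk, so the download to a temporary location is left out.

```python
import shutil
import zipfile
from pathlib import Path


def extract_all(section_target_dir, zip_file, target_dir, remove=()):
    """Extract a local ZIP into target_dir, then delete entries matching `remove`."""
    destination = Path(section_target_dir) / target_dir
    if destination.exists():
        return False  # target directory already present: the action is ignored
    with zipfile.ZipFile(zip_file) as archive:
        archive.extractall(destination)
    for pattern in remove:
        # Patterns are resolved relative to the action's target_dir.
        for match in list(destination.glob(pattern)):
            if match.is_dir():
                shutil.rmtree(match)
            else:
                match.unlink()
    return True
```

With `remove=["docsupport", "*.html"]`, the `docsupport` directory and every top-level HTML file are deleted once extraction has finished.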
The `extract_items` action extracts a ZIP to a temporary directory, and moves selected items to some locations. Some sub-items of the ZIP content can be removed afterwards.
Important: If any of the `target_paths` is already present, the action is not executed; it is simply ignored.
Variable | Required | Description |
---|---|---|
`action` | Y | Must be "extract_items" |
`source` | Y | URL of the online ZIP to download |
`zip_paths` | Y | List of relative paths in ZIP to select |
`target_paths` | Y | Relative path of the target directory where selected items will be moved (relative to ZIP home folder) |
`remove` | N | List of glob patterns of ZIP content to remove after extraction (must include the necessary `target_paths`, they are relative to the section `target_dir`) |
`execute_after` | N | List of shell commands to execute once ZIP extraction is completed; commands are executed with the section `target_dir` as current working directory |
Nota:
- the `zip_paths` and `target_paths` are matched one-by-one, and must hence have the same length.
- the ZIP is first saved to a temporary location before extraction, consuming some disk space
- all content is extracted before selected items are moved, and the rest is deleted
You will find a sample below.
[files.assets.actions.ogvjs]
action="extract_items"
source="https://github.com/brion/ogv.js/releases/download/1.8.9/ogvjs-1.8.9.zip"
zip_paths=["ogvjs-1.8.9"]
target_paths=["ogvjs"]
remove=["ogvjs/COPYING", "ogvjs/*.txt", "ogvjs/README.md"]
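The one-by-one pairing of `zip_paths` and `target_paths` can be sketched as below. The function is hypothetical and works on a ZIP that is already on disk. It also assumes that each target path resolves against the section `target_dir`, as the `remove` description suggests; treat that as an assumption of this sketch rather than documented behaviour.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def extract_items(section_target_dir, zip_file, zip_paths, target_paths):
    """Extract everything to a temporary directory, then move the selected items."""
    if len(zip_paths) != len(target_paths):
        raise ValueError("zip_paths and target_paths must have the same length")
    base = Path(section_target_dir)
    if any((base / target).exists() for target in target_paths):
        return False  # at least one target already present: skip the whole action
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_file) as archive:
            archive.extractall(tmp)
        # Pair the n-th zip path with the n-th target path.
        for zip_path, target in zip(zip_paths, target_paths):
            destination = base / target
            destination.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(Path(tmp) / zip_path), str(destination))
    # Leaving the `with` block deletes whatever was not moved.
    return True
```

The length check comes first so that a mismatched configuration fails loudly instead of silently dropping the unpaired entries.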
A full example with two distinct sections and three actions in total is below.
Nota: The `touch` command in `execute_after` is not representative of what you would usually do ^^
[files.assets.config]
target_dir="src/hatch_openzim/templates/assets"
execute_after=[
"fix_ogvjs_dist .",
]
[files.assets.actions."jquery.min.js"]
action="get_file"
source="https://code.jquery.com/jquery-3.5.1.min.js"
target_file="jquery.min.js"
execute_after=[
"touch done.txt",
]
[files.assets.actions.chosen]
action="extract_all"
source="https://github.com/harvesthq/chosen/releases/download/v1.8.7/chosen_v1.8.7.zip"
target_dir="chosen"
remove=["docsupport", "chosen.proto.*", "*.html", "*.md"]
[files.videos.config]
target_dir="src/hatch_openzim/templates/videos"
[files.videos.actions.ogvjs]
action="extract_items"
source="https://github.com/brion/ogv.js/releases/download/1.8.9/ogvjs-1.8.9.zip"
zip_paths=["ogvjs-1.8.9"]
target_paths=["ogvjs"]
remove=["ogvjs/COPYING", "ogvjs/*.txt", "ogvjs/README.md"]
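Before running a build, it can help to check a configuration like the one above against the "Required" columns of the tables. The sketch below does that on an already parsed configuration (for example the dict returned by `tomllib.load` on Python 3.11+). `check_config` and `REQUIRED_KEYS` are hypothetical names, and the plugin may validate differently or not at all.

```python
# Required keys per action, copied from the tables in this document.
REQUIRED_KEYS = {
    "get_file": {"action", "source", "target_file"},
    "extract_all": {"action", "source", "target_dir"},
    "extract_items": {"action", "source", "zip_paths", "target_paths"},
}


def check_config(config):
    """Return a list of human-readable problems found in a parsed openzim.toml."""
    problems = []
    for section_name, section in config.get("files", {}).items():
        if "target_dir" not in section.get("config", {}):
            problems.append(f"{section_name}: config.target_dir is required")
        for action_name, action in section.get("actions", {}).items():
            kind = action.get("action")
            label = f"{section_name}.{action_name}"
            if kind not in REQUIRED_KEYS:
                problems.append(f"{label}: unknown action {kind!r}")
                continue
            for key in sorted(REQUIRED_KEYS[kind] - set(action)):
                problems.append(f"{label}: missing {key}")
            if kind == "extract_items":
                # zip_paths and target_paths are matched one-by-one.
                if len(action.get("zip_paths", [])) != len(action.get("target_paths", [])):
                    problems.append(f"{label}: zip_paths and target_paths differ in length")
    return problems
```

An empty list means every section declares `target_dir` and every action carries the keys its kind requires.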