Commit 896ab41

Mokuro 0.2.0 (#91)
* web reader
* remove unused code
* fix
* update version
* fix
* detect broken cache
* update version
* fix
* update favicon
* fix deleting volume
* ignore volumes uploaded with missing files
* fix drag and drop on firefox
* recreate _ocr cache from mokuro file if missing
* update version
* alert user when they upload .mokuro file without corresponding volume data
* fix natural sort for full width numerals
* make drag and drop work on whole catalog popup
* change service worker to network first
* display storage usage
* request persistent storage
* error message on missing paths
* Search for webp files too when constructing img_paths (#86)
* update tests, add help
* bug fix
* delete reader
* change disable_html -> legacy_html; add test for converting legacy _ocr to .mokuro
* update readme
* replace setup.py with pyproject.toml
* exclude ctd from ruff
* format with ruff
* fox mokuro entry point
* update gh workflows
* fix package discovery
* fix missing setuptools dependency (needed for pkg_resources used by CTD)
* add --version flag

---------

Co-authored-by: precondition <[email protected]>
1 parent 5b955aa commit 896ab41

141 files changed (+3146 -2284 lines changed)

.github/workflows/main.yml

+3-5
@@ -10,14 +10,13 @@ permissions:
   contents: read
 
 jobs:
-  build:
+  test:
     runs-on: ubuntu-latest
 
     steps:
       - name: Checkout
         uses: actions/checkout@v3
         with:
-          path: mokuro
           submodules: recursive
 
       - name: Set up Python
@@ -28,9 +27,8 @@ jobs:
       - name: Install dependencies
         run: |
           python -m pip install --upgrade pip
-          pip install pytest
-          pip install -e mokuro
+          pip install -e ".[dev]"
 
       - name: Test
         run: |
-          pytest -v mokuro/tests
+          pytest

.github/workflows/publish-to-pypi.yml

+122
@@ -0,0 +1,122 @@
+name: Publish package to PyPI and TestPyPI
+
+on:
+  push:
+    tags:
+      - 'v[0-9]+.[0-9]+.[0-9]+-[a-zA-Z]+.[0-9]+'
+      - 'v[0-9]+.[0-9]+.[0-9]+-[a-zA-Z]*'
+      - 'v[0-9]+.[0-9]+.[0-9]+'
+
+jobs:
+  build:
+    name: Build dist
+    runs-on: ubuntu-latest
+
+    steps:
+      - uses: actions/checkout@v4
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.x"
+      - name: Install pypa/build
+        run: >-
+          python3 -m
+          pip install
+          build
+          --user
+      - name: Build a binary wheel and a source tarball
+        run: python3 -m build
+      - name: Store the distribution packages
+        uses: actions/upload-artifact@v3
+        with:
+          name: python-package-distributions
+          path: dist/
+
+  publish-to-pypi:
+    name: >-
+      Publish to PyPI
+    if: startsWith(github.ref, 'refs/tags/') # only publish to PyPI on tag pushes
+    needs:
+      - build
+    runs-on: ubuntu-latest
+    environment:
+      name: pypi
+      url: https://pypi.org/p/mokuro # Replace <package-name> with your PyPI project name
+    permissions:
+      id-token: write # IMPORTANT: mandatory for trusted publishing
+
+    steps:
+      - name: Download all the dists
+        uses: actions/download-artifact@v3
+        with:
+          name: python-package-distributions
+          path: dist/
+      - name: Publish to PyPI
+        uses: pypa/gh-action-pypi-publish@release/v1
+
+  github-release:
+    name: >-
+      Sign with Sigstore
+      and upload them to GitHub Release
+    needs:
+      - publish-to-pypi
+    runs-on: ubuntu-latest
+
+    permissions:
+      contents: write # IMPORTANT: mandatory for making GitHub Releases
+      id-token: write # IMPORTANT: mandatory for sigstore
+
+    steps:
+      - name: Download all the dists
+        uses: actions/download-artifact@v3
+        with:
+          name: python-package-distributions
+          path: dist/
+      - name: Sign the dists with Sigstore
+        uses: sigstore/[email protected]
+        with:
+          inputs: >-
+            ./dist/*.tar.gz
+            ./dist/*.whl
+      - name: Create GitHub Release
+        env:
+          GITHUB_TOKEN: ${{ github.token }}
+        run: >-
+          gh release create
+          '${{ github.ref_name }}'
+          --repo '${{ github.repository }}'
+          --notes ""
+      - name: Upload artifact signatures to GitHub Release
+        env:
+          GITHUB_TOKEN: ${{ github.token }}
+        # Upload to GitHub Release using the `gh` CLI.
+        # `dist/` contains the built packages, and the
+        # sigstore-produced signatures and certificates.
+        run: >-
+          gh release upload
+          '${{ github.ref_name }}' dist/**
+          --repo '${{ github.repository }}'
+
+  publish-to-testpypi:
+    name: Publish to TestPyPI
+    needs:
+      - build
+    runs-on: ubuntu-latest
+
+    environment:
+      name: testpypi
+      url: https://test.pypi.org/p/mokuro
+
+    permissions:
+      id-token: write # IMPORTANT: mandatory for trusted publishing
+
+    steps:
+      - name: Download all the dists
+        uses: actions/download-artifact@v3
+        with:
+          name: python-package-distributions
+          path: dist/
+      - name: Publish to TestPyPI
+        uses: pypa/gh-action-pypi-publish@release/v1
+        with:
+          repository-url: https://test.pypi.org/legacy/
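
Note on the tag filters in publish-to-pypi.yml: these are GitHub Actions filter patterns, not regular expressions. As far as GitHub's documented syntax goes, `.` is literal, `+` repeats the preceding character or bracket set, and `*` matches any run of characters other than `/`. A rough way to check which tags would trigger a publish is to translate the three filters by hand into Python regexes. The translation below is an approximation under those assumptions, not something taken from the commit, and `tag_triggers_publish` is a hypothetical helper name.

```python
import re

# Hand-translated regex approximations of the three tag filters in the workflow.
# Assumption: "." is literal and "*" means "any characters except '/'",
# following GitHub's documented filter-pattern syntax.
TAG_PATTERNS = [
    r"v[0-9]+\.[0-9]+\.[0-9]+-[a-zA-Z]+\.[0-9]+",  # e.g. v0.2.0-beta.8
    r"v[0-9]+\.[0-9]+\.[0-9]+-[a-zA-Z][^/]*",      # e.g. v0.2.0-rc
    r"v[0-9]+\.[0-9]+\.[0-9]+",                    # e.g. v0.2.0
]


def tag_triggers_publish(tag: str) -> bool:
    """Return True if the tag fully matches any of the approximated filters."""
    return any(re.fullmatch(pattern, tag) for pattern in TAG_PATTERNS)


if __name__ == "__main__":
    for tag in ["v0.2.0", "v0.2.0-beta.8", "0.2.0", "v0.2"]:
        print(tag, tag_triggers_publish(tag))
```

Under this reading, a tag without the leading `v` or with fewer than three numeric components would not start the workflow.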

.github/workflows/ruff.yml

+8
@@ -0,0 +1,8 @@
+name: Ruff
+on: [ push, pull_request ]
+jobs:
+  ruff:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: chartboost/ruff-action@v1

MANIFEST.in

-3
This file was deleted.

README.md

+36-8
@@ -8,25 +8,32 @@ https://user-images.githubusercontent.com/22717958/164993274-3e8d1650-9be3-457d-
 
 <sup>Demo contains excerpt from [Manga109-s dataset](http://www.manga109.org/en/download_s.html). うちの猫’ず日記 © がぁさん</sup>
 
-mokuro is aimed towards Japanese learners, who want to read manga in Japanese with a pop-up dictionary like [Yomichan](https://github.com/FooSoft/yomichan).
+mokuro is aimed towards Japanese learners, who want to read manga in Japanese with a pop-up dictionary like [Yomitan](https://github.com/themoeway/yomitan).
 It works like this:
 1. Perform text detection and OCR for each page.
-2. After processing a whole volume, generate a HTML file, which you can open in a browser.
-3. All processing is done offline (before reading). You can transfer the resulting HTML file together with manga images to
-another device (e.g. your mobile phone) and read there.
+2. After processing a whole volume, generate a .mokuro file, which contains OCR results and metadata. All processing is done offline (before reading).
+3. Load the .mokuro file together with manga images in [web reader](https://reader.mokuro.app/), which serves both as a manga reader and a catalog for processed series and volumes.
+
+Alternatively, you can still use the old method from mokuro 0.1.*:
+Instead of a .mokuro file, generate an HTML file, which you can open in a browser.
+You can transfer the resulting HTML file together with manga images to another device (e.g. your mobile phone) and read there.
+This method is still supported for backward compatibility, but it is recommended to use the new .mokuro format and the web reader.
+For details, see [Legacy HTML vs. new .mokuro format](#legacy-html-vs-new-mokuro-format).
 
 mokuro uses [comic-text-detector](https://github.com/dmMaze/comic-text-detector) for text detection
 and [manga-ocr](https://github.com/kha-white/manga-ocr) for OCR.
 
 Try running on your manga in Colab: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/kha-white/mokuro/blob/master/notebooks/mokuro_demo.ipynb)
 
 See also:
+- [mokuro-reader](https://github.com/ZXY101/mokuro-reader), a web reader for mokuro, developed by ZXY101
 - [Mokuro2Pdf](https://github.com/Kartoffel0/Mokuro2Pdf), cli Ruby script to generate pdf files with selectable text from Mokuro's html overlay
 - [Xelieu's guide](https://xelieu.github.io/jp-lazy-guide/setupMangaOnPC/), a comprehensive guide on setting up a reading and mining workflow with manga-ocr/mokuro (and many other useful tips)
 
 # Installation
 
-You need Python 3.6 or newer. Please note, that the newest Python release might not be supported due to a PyTorch dependency, which often breaks with new Python releases and needs some time to catch up.
+You need Python 3.6 or newer. Please note, that the newest Python release might not be supported due to a PyTorch dependency,
+which often breaks with new Python releases and needs some time to catch up.
 Refer to [PyTorch website](https://pytorch.org/get-started/locally/) for a list of supported Python versions.
 
 Some users have reported problems with Python installed from Microsoft Store. If you see an error:
@@ -86,11 +93,32 @@ mokuro --parent_dir manga_title/
 ## Other options
 
 ```
---force_cpu - disable GPU
---as_one_file - generate separate css and js files instead of embedding everything in html
---disable_confirmation - run without asking for confirmation
+--pretrained_model_name_or_path: Name or path of the manga-ocr model.
+--force_cpu: Force the use of CPU even if CUDA is available.
+--disable_confirmation: Disable confirmation prompt. If False, the user will be prompted to confirm the list of volumes to be processed.
+--disable_ocr: Disable OCR processing. Generate mokuro/HTML files without OCR results.
+--ignore_errors: Continue processing volumes even if an error occurs.
+--no_cache: Do not use cached OCR results from previous runs (_ocr directories).
+--unzip: Extract volumes in zip/cbz format in their original location.
+--disable_html: Disable legacy HTML output. If True, acts as if --unzip is True.
+--as_one_file: Applies only to legacy HTML. If False, generate separate CSS and JS files instead of embedding them in the HTML file.
+--version: Print the version of mokuro and exit.
 ```
 
+## Legacy HTML vs. new .mokuro format
+
+Before version 0.2.0, mokuro generated a separate HTML file for each processed volume, which caused some usability issues:
+- HTML files contained both the OCR results and the whole web reader GUI, so in order to update the GUI, all volumes needed to be updated with a new mokuro version
+- images were stored separately and linked in HTML files, so any change in the directory structure could break the links
+- transferring the manga to another device required transferring both the HTML files and the images
+- there was no unified GUI for a whole catalog containing multiple volumes
+- on some mobile devices, some workarounds were needed to open HTML files
+
+Starting from version 0.2.0, a new .mokuro format is introduced, which is generated for each volume and contains only the OCR results and metadata necessary for the web reader GUI.
+Web reader is now a separate web app, which can open manga volumes with their associated .mokuro files.
+
+The old HTML format is still generated for backward compatibility, but it will not be developed further, and it is recommended to use the new .mokuro format and the web reader.
+
 # Contact
 For any inquiries, please feel free to contact me at [email protected]
 
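
Note on the "Other options" list in the README diff: the flags can be combined on one command line. The sketch below builds such a call from Python using only flags that appear in the README (`--parent_dir`, `--disable_confirmation`, `--force_cpu`, `--ignore_errors`). It assumes `mokuro` is installed and on `PATH`, and that boolean flags can be passed bare, which is how the `fire` library used in `mokuro/__main__.py` treats a flag with no value. `build_mokuro_command` is a hypothetical helper, not part of mokuro.

```python
import shlex


def build_mokuro_command(parent_dir, force_cpu=False, ignore_errors=False):
    """Build an argument list for the mokuro CLI (hypothetical helper)."""
    # --disable_confirmation is always added so the call can run unattended.
    cmd = ["mokuro", "--parent_dir", parent_dir, "--disable_confirmation"]
    if force_cpu:
        cmd.append("--force_cpu")
    if ignore_errors:
        cmd.append("--ignore_errors")
    return cmd


if __name__ == "__main__":
    cmd = build_mokuro_command("manga_title/", force_cpu=True)
    print(shlex.join(cmd))
    # To actually run it (requires mokuro to be installed):
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with directory names that contain spaces.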

mokuro/__init__.py

+3-3
@@ -1,4 +1,4 @@
-__version__ = '0.1.8'
+from ._version import __version__ as __version__
 
-from mokuro.manga_page_ocr import MangaPageOcr, InvalidImage
-from mokuro.overlay_generator import OverlayGenerator
+from mokuro.manga_page_ocr import MangaPageOcr as MangaPageOcr
+from mokuro.mokuro_generator import MokuroGenerator as MokuroGenerator

mokuro/__main__.py

+1-1
@@ -7,5 +7,5 @@ def main():
     fire.Fire(run)
 
 
-if __name__ == '__main__':
+if __name__ == "__main__":
     main()

mokuro/_version.py

+1
@@ -0,0 +1 @@
+__version__ = "0.2.0-beta.8"
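
Note on `mokuro/_version.py`: keeping `__version__` in its own module means tooling can read the version without importing the rest of the package and its heavy dependencies. How the build actually consumes this file is not shown in this part of the diff. The sketch below is one possible release check under that assumption: it reads the version from the module text with a regex and compares it against a tag of the `v<version>` shape that the publish workflow filters on. Both function names are hypothetical.

```python
import re


def read_version(source: str) -> str:
    """Extract the value assigned to __version__ from module source text."""
    match = re.search(
        r"""^__version__\s*=\s*["']([^"']+)["']""", source, re.MULTILINE
    )
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


def tag_matches_version(tag: str, version: str) -> bool:
    """Check that a release tag is the version string prefixed with 'v'."""
    return tag == f"v{version}"


if __name__ == "__main__":
    version = read_version('__version__ = "0.2.0-beta.8"\n')
    print(version)
    print(tag_matches_version("v0.2.0-beta.8", version))
```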

mokuro/cache.py

+7-7
@@ -6,28 +6,28 @@
 
 class cache:
     def __init__(self):
-        self.root = Path.home() / '.cache' / 'manga-ocr'
+        self.root = Path.home() / ".cache" / "manga-ocr"
         self.root.mkdir(parents=True, exist_ok=True)
 
     @property
     def comic_text_detector(self):
-        path = self.root / 'comictextdetector.pt'
-        url = 'https://github.com/zyddnys/manga-image-translator/releases/download/beta-0.2.1/comictextdetector.pt'
+        path = self.root / "comictextdetector.pt"
+        url = "https://github.com/zyddnys/manga-image-translator/releases/download/beta-0.2.1/comictextdetector.pt"
 
         self._download_if_needed(path, url)
         return path
 
     def _download_if_needed(self, path, url):
         if not path.is_file():
-            logger.info(f'Downloading {url}')
+            logger.info(f"Downloading {url}")
             r = requests.get(url, stream=True, verify=True)
             if r.status_code != 200:
-                raise RuntimeError(f'Failed downloading {url}')
-            with path.open('wb') as f:
+                raise RuntimeError(f"Failed downloading {url}")
+            with path.open("wb") as f:
                 for chunk in r.iter_content(1024):
                     if chunk:
                         f.write(chunk)
-            logger.info(f'Finished downloading {url}')
+            logger.info(f"Finished downloading {url}")
 
 
 cache = cache()
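
Note on `_download_if_needed` in `mokuro/cache.py`: the method streams straight into the final path, so an interrupted download could leave a partial file that passes the `is_file()` check on the next run. One possible hardening, which is not part of this commit, is to download into a temporary file and rename it into place only on success. The sketch below swaps `requests` for the standard-library `urllib.request` to stay self-contained; the function name and structure are illustrative only.

```python
import os
import shutil
import tempfile
import urllib.request
from pathlib import Path


def download_if_needed(path: Path, url: str) -> Path:
    """Download url to path unless path already exists, via a temporary file."""
    if path.is_file():
        return path
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write next to the target so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp, urllib.request.urlopen(url) as response:
            shutil.copyfileobj(response, tmp)
        # Only a completed download ever appears under the final name.
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the partial file so a later call retries the download.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
    return path
```

With this shape, a failed or interrupted transfer leaves nothing at `path`, so the next call starts the download again instead of returning a truncated model file.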

mokuro/env.py

+2-2
@@ -1,5 +1,5 @@
 from pathlib import Path
 
-ASSETS_PATH = Path(__file__).parent / 'assets'
+ASSETS_PATH = Path(__file__).parent / "assets"
 
-assert ASSETS_PATH.is_dir(), f'{ASSETS_PATH} missing'
+assert ASSETS_PATH.is_dir(), f"{ASSETS_PATH} missing"

mokuro/legacy/__init__.py

Whitespace-only changes.
