board_dl

Downloads images/videos from 4chan using Python3.6 or newer.

Supports original filenames.
Supports Unicode names.
Fast parallel download.
If no URL is passed as argument takes the URL from the clipboard.
Can download until 404.
Supports showing files when download finishes.
Supports HTTP and HTTPS links.
Tested with Windows 10 and Linux Ubuntu 18.04.
Can use official API to workaround error 403 "Forbidden" when scraping
Automatic retry on rate limit (error 429 "Too many requests")

Usage

python3 ./4chandl.py --help
usage: 4chandl.py [-h] [--symlink-names] [--force-download]
                  [--after-action AFTER_ACTION] [--until-404]
                  [--retry-delay RETRY_DELAY] [--retries-max RETRIES_MAX]
                  [--save-html SAVE_HTML] [--method METHOD] 
                  [URL]
 
Downloads all media from a 4chan thread.
 
positional arguments:
  URL                   A link to the thread like https://boards.4chan.org/gif/thread/12891600/threadname
 
optional arguments:
  -h, --help            show this help message and exit
  --symlink-names       Creates a subdirectory 'symlinks' linking the parsed
                        original upload filenames with the numbered media
                        files (requires Admin on Windows)  
  --force-download      Downloads the media files again overwriting existing
                        files
  --after-action AFTER_ACTION
                        Can be SHOW_FILES
  --until-404           Keeps downloading all new media until the thread dies
                        or --retry-max is reached
  --retry-delay RETRY_DELAY
                        Delay in seconds in between download rounds. Combine
                        with --until-404. Default=120
  --retries-max RETRIES_MAX
                        Max retries before we give up, even if thread is not
                        404 yet. 0 = no limit. Combine with --until-404.
                        Default=30
  --save-html SAVE_HTML
                        Save html file (just the raw thread file, no
                        dependencies). Default=True (FIXME)
  --method METHOD       Method can either be 'api' or 'crawl'. Default=api

--symlink-names is recommended (works as user on Linux, but requires Administrator privileges on Windows).

Install

On Linux (Ubuntu) install python3 and python3-tk (e.g. sudo apt-get install python3.7 python3-tk)

On newer Windows 10 and Windows 11 installs it is sufficient to install Python 3.9+ from the Windows store and then packages urllib3 and certifi with pip install -r requirements.txt

On Linux (Nixos) you may run this with e.g. nix-shell -p python312Full -p python312Packages.urllib3 -p python312Packages.certifi --run "python 4chandl.py"

Debugging the legacy `crawl` method

Saves a thread.html, check with https://regex101.com/ if the imgReg regex in 4chandl.py is broken due to board changes.

Logfiles: Date, url, number of urls found, [original-filename-long-optional,url,filename-short]

Time: 2018-06-09 16:47:33.921037
https://boards.4chan.org/trv/thread/1417871/best-longstay-cities-for-degenerate-lifestyle
Found 1 media urls

[('D71CBC8C-3CBE-4CCE-AE79-1A142C3DFC7C.jpg', '//is2.4chan.org/trv/1528553461662.jpg', 'D71CBC8C-3CBE-4CCE-AE79-1(...).jpg')]

Name		Name	Last commit message	Last commit date
Latest commit History 53 Commits
.gitignore		.gitignore
4chandl.py		4chandl.py
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

board_dl

Usage

Install

Debugging the legacy `crawl` method

About

Uh oh!

Releases

Packages

Uh oh!

Languages

kwinz/board_dl

Folders and files

Latest commit

History

Repository files navigation

board_dl

Usage

Install

Debugging the legacy crawl method

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Debugging the legacy `crawl` method

Packages