To run the script, the following requirements must be met:
- Python 3.11
- Poetry
- After cloning, run `poetry install`
- Set environment variables:
  - `imgur_cid`: Imgur Client ID (register an application on the Imgur Developer Portal)
  - `reddit_cid`: Your Reddit application client ID (register an application on the Reddit Developer Portal)
  - `reddit_cs`: Your Reddit client secret
  - `reddit_username`: [Optional] Your Reddit username; being signed in provides a higher rate limit
  - `reddit_password`: [Optional] Your Reddit password; only use together with `reddit_username`
- Run the application with `python main.py -d path/to/data subreddit [subreddit ...]`
- The application will tell you if environment variables are missing
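To check your setup before a long run, something like the following could be used. This is only a minimal sketch and not the application's own check: the function names `missing_variables` and `credential_warning` are hypothetical, and the split into required and optional variables is an assumption based on the `[Optional]` markers in the list above.

```python
import os

# Variable names come from the list above. Treating the first three as
# required and the last two as optional is an assumption based on the
# "[Optional]" markers, not on the application's source code.
REQUIRED_VARS = ("imgur_cid", "reddit_cid", "reddit_cs")


def missing_variables(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def credential_warning(env=None):
    """Return a warning if a password is set without a username, else None."""
    env = os.environ if env is None else env
    if env.get("reddit_password") and not env.get("reddit_username"):
        return "reddit_password should only be set together with reddit_username"
    return None


if __name__ == "__main__":
    missing = missing_variables()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    warning = credential_warning()
    if warning:
        print("Warning:", warning)
```

Both functions accept a plain dictionary in place of `os.environ`, which makes them easy to try out without touching your real environment.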
All submissions visited by the script are recorded in the `meta/dupmap.json` file.
Each entry contains the submission ID and the absolute path to the corresponding file.
If you move your data folder to a different location,
make sure to update these paths.
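If you do move the data folder, the stored paths could be updated with a small helper along these lines. This is a sketch under stated assumptions: the exact layout of `dupmap.json` is not documented here, so the code walks the whole JSON structure and rewrites any string value that begins with the old folder, rather than relying on a particular schema. The names `rewrite_paths` and `migrate_dupmap` are hypothetical, and dictionary keys are left untouched.

```python
import json
from pathlib import Path


def _swap_prefix(value, old_root, new_root):
    """Replace old_root at the start of value, matching whole folders only."""
    old_root = old_root.rstrip("/\\")
    if not value.startswith(old_root):
        return value
    rest = value[len(old_root):]
    # Avoid rewriting "/old/database" when the old root is "/old/data".
    if rest and rest[0] not in "/\\":
        return value
    return new_root.rstrip("/\\") + rest


def rewrite_paths(node, old_root, new_root):
    """Recursively rewrite every string value found in parsed JSON data."""
    if isinstance(node, dict):
        return {key: rewrite_paths(value, old_root, new_root) for key, value in node.items()}
    if isinstance(node, list):
        return [rewrite_paths(item, old_root, new_root) for item in node]
    if isinstance(node, str):
        return _swap_prefix(node, old_root, new_root)
    return node


def migrate_dupmap(dupmap_file, old_root, new_root):
    """Rewrite dupmap_file in place, keeping a .bak copy of the original."""
    dupmap_file = Path(dupmap_file)
    original = dupmap_file.read_text(encoding="utf-8")
    backup = dupmap_file.with_name(dupmap_file.name + ".bak")
    backup.write_text(original, encoding="utf-8")
    updated = rewrite_paths(json.loads(original), old_root, new_root)
    dupmap_file.write_text(json.dumps(updated, indent=2), encoding="utf-8")
```

A call such as `migrate_dupmap("path/to/data/meta/dupmap.json", "/old/location", "/new/location")` would then update the file, with the paths here being placeholders for your own. Check the result against the `.bak` copy before deleting it.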
The application requires Python 3.11 because it uses `asyncio.TaskGroup`,
which was first added in that version.
Due to Reddit API rate limits, processing takes around 15 minutes per 1000 posts (1000 is also the maximum number of posts the API returns). Plan for roughly 15-17 minutes for each new subreddit with 1000 posts.
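As a rough planning aid, the estimate can be turned into a quick calculation. The sketch below assumes that subreddits are processed one after another and that each is new and yields about 1000 posts. `estimate_minutes` is a hypothetical helper, not part of the application.

```python
def estimate_minutes(subreddit_count, low_per_subreddit=15, high_per_subreddit=17):
    """Return (low, high) total minutes for a run over new subreddits.

    Assumes sequential processing and about 1000 posts per subreddit,
    using the 15-17 minute range quoted above.
    """
    return (subreddit_count * low_per_subreddit, subreddit_count * high_per_subreddit)


low, high = estimate_minutes(4)
print(f"Expect roughly {low}-{high} minutes")
```

For four new subreddits this gives a range of 60 to 68 minutes.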