chrome-based http proxy for better fetching of original page content #3894
Ivan8or
started this conversation in
Show and tell
I have recently gotten into RSS and have settled on miniflux as my aggregator.
My experience with miniflux itself has been positive so far, but my experience with the sites I am collecting articles from has been less so...
Specifically, I have run into several websites that enforce JavaScript challenges on their articles, I suppose to prevent creatures like us from avoiding their ads and tracking.
After looking around online and being disappointed with the existing options, I decided to take it upon myself to create a web scraper, curlk, which bypasses simple fingerprinting / JS challenges and, very importantly, takes the form of an HTTP proxy.
It is a single Docker application which can be run alongside miniflux and configured either as miniflux's global HTTP_CLIENT_PROXY or as the proxy for individual feeds.
This is how I currently use curlk in my miniflux deployment:
NOTE: You must enable 'Allow self-signed or invalid certificates' for all feeds which use curlk to request HTTPS URLs.
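To check outside miniflux that a proxy like this is reachable, a client can be pointed at it with certificate verification relaxed, which is roughly what the 'Allow self-signed or invalid certificates' setting does for a feed. The following is only a minimal Python sketch using the standard library. The proxy address `http://127.0.0.1:8080` and the function names are assumptions made for illustration and are not taken from the curlk project, so substitute whatever host and port your own container listens on.

```python
import ssl
import urllib.request

# Hypothetical address: replace with the host and port of your own curlk container.
PROXY_URL = "http://127.0.0.1:8080"


def make_unverified_context() -> ssl.SSLContext:
    """Build a TLS context that accepts self-signed or invalid certificates."""
    ctx = ssl.create_default_context()
    # check_hostname has to be switched off before verify_mode can be relaxed.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def build_proxy_opener(proxy_url: str = PROXY_URL) -> urllib.request.OpenerDirector:
    """Return an opener that routes HTTP and HTTPS requests through proxy_url."""
    proxy = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    https = urllib.request.HTTPSHandler(context=make_unverified_context())
    return urllib.request.build_opener(proxy, https)


# Example use (needs a running proxy, so it is left commented out):
# opener = build_proxy_opener()
# with opener.open("https://example.com/", timeout=30) as resp:
#     print(resp.status)
```

Disabling verification like this is only sensible for traffic that goes through a proxy you run yourself. Keep normal verification for everything else.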
You can find the project at https://github.com/Ivan8or/curlk and the latest Docker images at https://hub.docker.com/r/ivan8or/curlk