Releases: EdJoPaTo/website-stalker
v0.12.0
Editors
Two new editors: json_prettify and html_url_canonicalize. 73814fb e51baf0
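A minimal sketch of how the two new editors might appear in a config. It assumes both are written as bare names without options, like html_prettify in the v0.8.0 example further down. The URLs are placeholders and not taken from the release notes.

```yaml
sites:
  # Hypothetical JSON endpoint; json_prettify is assumed to take no options
  - url: "https://example.com/data.json"
    extension: json
    editors:
      - json_prettify
  # Hypothetical HTML page; html_url_canonicalize is assumed to take no options
  - url: "https://example.com/"
    extension: html
    editors:
      - html_url_canonicalize
      - html_prettify
```

Check the README for the exact options before relying on this layout.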
IPv6 vs legacy IPv4
The log output now shows which kind of address was used. e034a70
v0.11.0
Simplify Git Logic
The git part was heavily updated. When running with --commit, the command now aborts when not in a git repo or when the repo is unclean.
Without --commit, git add is no longer run, which simplifies testing out the ideal config in an unclean repo before committing it.
With these changes, all git logic is now handled via libgit2. The git binary is no longer a required dependency. ❇️
- feat(run)!: prevent --commit in a not clean repo 73800f0
- feat!: prevent --commit when not in a git repo 664837c
- fix(run): only git add when --commit 25fa0d8
- fix(git): dont integrate git diff and git status da23989
- feat(run): dont cleanup or reset b75d9f8
- refactor(run): simplify git finishup logic 8efda45
Warn on redirected URLs
Some URLs are redirected before the content is returned. This results in additional traffic and round trips. As this happens every time website-stalker runs, it adds up over time. To reduce traffic, the target of the redirect should be specified directly.
There is now a warning which shows which URL leads where and suggests using the target instead.
- feat: warn on redirected URLs to reduce traffic 4c9136c
Init command
You can now init a directory with a git repo (git init) and a config (website-stalker example-config > website-stalker.yaml) in one neat command:
website-stalker init
- feat(init): provide init folder/repo/config command 9842d9a
Case insensitive site filter
The site filter is now case insensitive. Where you previously had to use website-stalker run EdJoPaTo to run on https://EdJoPaTo.de, you can now also use website-stalker run edjopato.
- feat(cli)!: site filter is now case insensitive 85af5f6
Config format is now fixed
Before, you could use other formats for the config, like website-stalker.toml. In order to simplify the config logic, the config now has to be a YAML file.
- refactor(config)!: simplify 4d5e390
Minor Changes
v0.10.0
html_markdownify
A new editor, html_markdownify, can create Markdown from HTML input. See more details about this new editor in the README. e1798ee
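As a rough illustration, the editor could be chained after a selector like this. The sketch assumes html_markdownify is written as a bare name without options and that md is a sensible extension for the output. The URL is reused from the examples elsewhere in these notes.

```yaml
sites:
  - url: "https://edjopato.de/post/"
    extension: md  # assumption: Markdown output is saved with an md extension
    editors:
      - css_select: article
      - html_markdownify  # assumed to take no options; see the README
```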
html_textify
Now creates at most one empty line between filled lines. db894e9 32fa6d1
Rename editors to be more like functions
Editor names should now make clearer what they do when applied. This is a breaking change: you have to adapt your configs to work with this release. 82cefbc
- html_text → html_textify
- css_selector → css_select
- regex_replacer → regex_replace
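Applied to the example config from the v0.8.0 notes, the rename would look roughly like this. The sketch assumes only the editor names changed and that the selector and remove options stayed the same.

```yaml
sites:
  - url: "https://edjopato.de/post/"
    extension: html
    editors:
      - css_select: article      # was css_selector
      - css_select:              # was css_selector
          selector: .meta
          remove: true
      - html_prettify            # not renamed in this release
```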
v0.9.0
v0.8.0
More generic config file format
Each site in the config file is now more generic. Before, each entry was either an html or a utf8 entry. Now each entry is basically the same.
Each entry has a URL and a file extension which is then used to save the resulting file.
Each site can also have editors. An editor manipulates the content before saving the result.
css_selector and regex_replacer are now editors. The default behavior of html to prettify the content is now an editor too: html_prettify.
Additionally, this update includes a new editor, html_text, which only returns text entries from the HTML.
To give an example:
sites:
  - url: "https://edjopato.de/post/"
    extension: html
    editors:
      - css_selector: article
      - css_selector:
          selector: .meta
          remove: true
      - html_prettify
If you want to see a config migration see this commit.
css_selector remove elements
The css_selector editor can now remove matching HTML elements from the result. This is already included in the example above.
html_text Editor
This editor only returns text entries from the HTML.
To give an example: This will save every h1 heading to the resulting file.
- url: "https://edjopato.de/post/"
  extension: txt
  editors:
    - css_selector: h1
    - html_text
systemd improvements
v0.7.1
v0.7.0
systemd files
Adds a systemd service and timer to be used locally 3a210f2
libgit2
Migrate some functions from running git as a command-line tool towards libgit2.
This should make handling and detecting easier on the code side of things.
Not everything is migrated (yet?). Some outputs, like the git diff, currently work just fine via the command-line tool.
0074611 bf08eb4 20f07d5 9398cd6
This also allows for running from within a subfolder of a git repo 9398cd6