Releases: EdJoPaTo/website-stalker
v0.18.0
html_prettify attribute improvements
Before this release, changes like these occurred regularly:
-<a class="external link">
+<a class="link external">
-<a style="color: white; display: none">
+<a style="display:none;color:white">
This release sorts classes and formats style attributes. This reduces the number of diffs when the host only changes details such as the order of the classes.
It also fits the concept of 'pretty' HTML that this editor aims to produce.
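To illustrate the idea, here is a minimal Rust sketch of such a normalization. It is not the html_prettify implementation: the function names are hypothetical, classes are simply sorted alphabetically, and the style value is only reformatted into a consistent "property: value;" layout. The sketch keeps the declaration order because reordering CSS declarations can change their meaning, and its naive split on ';' does not handle semicolons inside values such as url(...).

```rust
/// Hypothetical helper: sorts whitespace-separated class names so that
/// class="external link" and class="link external" produce the same output.
fn normalize_class(value: &str) -> String {
    let mut classes: Vec<&str> = value.split_whitespace().collect();
    classes.sort_unstable();
    classes.join(" ")
}

/// Hypothetical helper: rewrites a style value into one consistent layout.
/// Declaration order is kept; only whitespace and trailing semicolons change.
fn normalize_style(value: &str) -> String {
    value
        .split(';') // naive: ignores ';' inside url(...) or quoted strings
        .filter_map(|decl| {
            let (prop, val) = decl.split_once(':')?;
            let (prop, val) = (prop.trim(), val.trim());
            if prop.is_empty() {
                None
            } else {
                Some(format!("{prop}: {val};"))
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```

With this, both class variants from the diff above become "external link", and whitespace-only differences in style values disappear.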
support URL queries
Some websites are generated server-side based on the query used. Entries with different queries for the same domain and path are now possible.
Minor Changes
housekeeping, dependency updates, …
v0.17.0
HTML parsing improvements
html_markdownify, html_prettify and html_textify received bugfixes and parsing improvements.
HTML parts are no longer escaped 2989f56, and html_prettify now ensures that text contents are indented 952bde3.
html_markdownify now uses the html2md crate, which implements more features and has fewer strange edge cases 1362a4e.
RSS pubDate
website-stalker now attempts to read the datetime attribute of elements to determine the pubDate of the RSS item.
The purpose of the datetime attribute is to provide a machine-readable format. As parsing dates and times from the many human-readable formats is hard, this is probably the simplest way to add a useful pubDate where possible without over-complicating things.
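A minimal Rust sketch of the idea, using only the standard library. The helper names are hypothetical and this is not website-stalker's code. It deliberately accepts just one shape, a UTC timestamp such as 2022-03-01T12:00:00Z, and converts it into the RFC 2822 date format that RSS uses for pubDate. Anything else, including offsets or date-only values, yields no pubDate. The attribute lookup is a plain string search for illustration; real code would use an HTML parser.

```rust
/// Hypothetical helper: value of the first datetime="..." attribute.
/// Naive string search for illustration only.
fn find_datetime_attr(html: &str) -> Option<&str> {
    let key = "datetime=\"";
    let start = html.find(key)? + key.len();
    let len = html[start..].find('"')?;
    Some(&html[start..start + len])
}

/// Day of the week via Sakamoto's method (0 = Sunday), Gregorian calendar.
fn weekday(year: i32, month: u32, day: u32) -> usize {
    const OFFSETS: [i32; 12] = [0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4];
    let y = if month < 3 { year - 1 } else { year };
    ((y + y / 4 - y / 100 + y / 400 + OFFSETS[(month - 1) as usize] + day as i32) % 7) as usize
}

/// Converts exactly "YYYY-MM-DDTHH:MM:SSZ" into an RFC 2822 date.
/// Other shapes return None; days per month are not validated.
fn to_pub_date(iso: &str) -> Option<String> {
    let b = iso.as_bytes();
    if b.len() != 20 {
        return None;
    }
    for (i, c) in b.iter().enumerate() {
        let ok = match i {
            4 | 7 => *c == b'-',
            10 => *c == b'T',
            13 | 16 => *c == b':',
            19 => *c == b'Z',
            _ => c.is_ascii_digit(),
        };
        if !ok {
            return None;
        }
    }
    // All bytes are ASCII now, so slicing by byte index is safe.
    let year: i32 = iso[0..4].parse().ok()?;
    let month: u32 = iso[5..7].parse().ok()?;
    let day: u32 = iso[8..10].parse().ok()?;
    if year < 1 || !(1..=12).contains(&month) || !(1..=31).contains(&day) {
        return None;
    }
    const DAYS: [&str; 7] = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"];
    const MONTHS: [&str; 12] = [
        "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
    ];
    Some(format!(
        "{}, {:02} {} {} {} +0000",
        DAYS[weekday(year, month, day)],
        day,
        MONTHS[(month - 1) as usize],
        year,
        &iso[11..19]
    ))
}
```

For example, an element like <time datetime="2022-03-01T12:00:00Z"> would give the pubDate "Tue, 01 Mar 2022 12:00:00 +0000", while an element without a usable datetime leaves the item without one.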
Minor Changes
v0.16.0
Automatically assume file extensions
Previously, you configured the desired extension in the config file. It is now inferred automatically from the Content-Type HTTP header and the editors used.
 - url: "https://edjopato.de"
-  extension: md
   editors:
     - html_markdownify
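To illustrate the kind of inference described here, a hedged Rust sketch follows. The mapping table and the rule that the last format-changing editor wins are assumptions made for illustration, not website-stalker's actual logic.

```rust
/// Hypothetical inference: the last editor that changes the output format
/// decides; otherwise the Content-Type header does.
fn assumed_extension(content_type: &str, editors: &[&str]) -> &'static str {
    for editor in editors.iter().rev() {
        match *editor {
            "html_markdownify" => return "md",
            "html_textify" => return "txt",
            "json_prettify" => return "json",
            _ => {} // e.g. css_select or html_prettify keep the current format
        }
    }
    // Drop parameters like "; charset=utf-8"; MIME types are case-insensitive.
    let mime = content_type
        .split(';')
        .next()
        .unwrap_or("")
        .trim()
        .to_ascii_lowercase();
    match mime.as_str() {
        "text/html" => "html",
        "application/json" => "json",
        "application/xml" | "text/xml" => "xml",
        _ => "txt", // assumed fallback
    }
}
```

In this sketch, the config from the diff above would resolve to "md" without needing an explicit extension key.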
Notifications
It's now possible to send notifications on changes via pling.
Notification targets (E-Mail, Slack, Telegram, …) are configured entirely via environment variables, as they mainly contain secrets. Check the pling documentation for which environment variables can be set.
The content of the notification can be customized via the new config key notification_template.
When using GitHub Actions, see their documentation on environment variables and the example repo config, which sends Telegram notifications to this Telegram channel.
1b1977a 14d3837 24b6cd6 8ad53c1
Improvements to website-stalker check
check now shows more details, such as the configured notifications. To prevent leaking secrets, it shows only the number of configured notification targets, not their details.
It's also possible to print or rewrite the current config as YAML.
This is helpful when migrating older configs or checking whether certain environment variables are read correctly.
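Reporting only the number of configured targets can be sketched like this in Rust. The function is hypothetical, and the variable names in the usage note are made up; they are not pling's real environment variables.

```rust
/// Hypothetical helper: counts notification targets whose required
/// environment variables are all set. Only the count is returned, never a value.
fn count_configured(targets: &[&[&str]], is_set: impl Fn(&str) -> bool) -> usize {
    targets
        .iter()
        .filter(|vars| vars.iter().all(|&var| is_set(var)))
        .count()
}
```

Against the real environment, the check could be passed as |var| std::env::var_os(var).is_some(); for example, with targets [&["EXAMPLE_TELEGRAM_TOKEN", "EXAMPLE_TELEGRAM_CHAT"], &["EXAMPLE_SLACK_HOOK"]] and only the two Telegram variables set, the count is 1.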
Minor Changes
v0.15.0
Multiple URLs with same options
You can now specify an array of URLs for a single entry in the config. This way, multiple URLs share the same specified options.
This is especially useful for stalking multiple nearly identical webpages.
For example:
sites:
  - url: "https://edjopato.de/"
    extension: html
  - url: "https://edjopato.de/post/"
    extension: html
Can now also be specified like this:
sites:
  - url:
      - "https://edjopato.de/"
      - "https://edjopato.de/post/"
    extension: html
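A config like this can be modelled as a url field that holds either one URL or a list, expanded into one site per URL. A minimal Rust sketch with hypothetical types, not website-stalker's own (with serde, such a field would typically be an untagged enum); extension stands in for all the shared options.

```rust
/// Hypothetical model of the url key: one URL or a list of URLs.
enum UrlField {
    Single(String),
    Multiple(Vec<String>),
}

/// Hypothetical config entry; extension represents the shared options.
struct Entry {
    url: UrlField,
    extension: String,
}

#[derive(Debug, PartialEq)]
struct Site {
    url: String,
    extension: String,
}

/// Expands one entry into one site per URL, each with the same options.
fn expand(entry: &Entry) -> Vec<Site> {
    let urls: &[String] = match &entry.url {
        UrlField::Single(url) => std::slice::from_ref(url),
        UrlField::Multiple(urls) => urls,
    };
    urls.iter()
        .map(|url| Site {
            url: url.clone(),
            extension: entry.extension.clone(),
        })
        .collect()
}
```

Both config variants above then yield the same two sites.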
Minor Changes
v0.14.0
v0.13.0
Split css_select into css_select and css_remove
This results in simpler configs for removing elements via a CSS selector:
 editors:
-  - css_select:
-      selector: img
-      remove: true
+  - css_remove: img
This is a breaking change and also simplifies the internal logic.
img in html_markdownify
Images are now added to the markdown output.
Images require absolute URLs when the markdown is rendered as HTML, so html_url_canonicalize is helpful here.
If you do not want the images (as before this release), add the editor css_remove: img to your config.
Minor Changes
- fix(git): work in repo without commits yet 7436dff
v0.12.1
v0.12.0
Editors
Two new editors: json_prettify and html_url_canonicalize. 73814fb e51baf0
IPv6 vs legacy IPv4
The log output now shows which kind of address was used. e034a70
v0.11.0
Simplify Git Logic
The git part was heavily updated. When running with --commit, the command now aborts when not in a git repo or when the repo is unclean.
If the repo is unclean and --commit is not given, git add is no longer run, which simplifies trying out the ideal config before committing it.
With these changes, all the git logic is now handled via libgit2. The git binary is no longer a required dependency. ❇️
- feat(run)!: prevent --commit in a not clean repo 73800f0
- feat!: prevent --commit when not in a git repo 664837c
- fix(run): only git add when --commit 25fa0d8
- fix(git): dont integrate git diff and git status da23989
- feat(run): dont cleanup or reset b75d9f8
- refactor(run): simplify git finishup logic 8efda45
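The rules above can be summarized as a small decision function. This Rust sketch is a hypothetical model of the described behaviour, not the actual implementation: it ignores the libgit2 details and only encodes when a run aborts and when it commits.

```rust
/// Hypothetical outcome of a run, derived from the rules described above.
#[derive(Debug, PartialEq)]
enum Plan {
    /// --commit was given, but the directory is not a git repository.
    AbortNotARepo,
    /// --commit was given, but the repository has uncommitted changes.
    AbortUnclean,
    /// Run and commit the resulting changes.
    RunAndCommit,
    /// Run without staging (git add) or committing anything.
    RunOnly,
}

fn plan(commit: bool, in_repo: bool, clean: bool) -> Plan {
    match (commit, in_repo, clean) {
        (true, false, _) => Plan::AbortNotARepo,
        (true, true, false) => Plan::AbortUnclean,
        (true, true, true) => Plan::RunAndCommit,
        (false, _, _) => Plan::RunOnly,
    }
}
```

The last arm is what makes experimenting with a config in an unclean repo safe: without --commit, git is not touched at all in this model.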
Warn on redirected URLs
Some URLs redirect before the content is returned. This results in additional traffic and round trips. As this happens every time website-stalker runs, it adds up over time. To reduce traffic, the redirect target should be specified directly.
There is now a warning that shows which URL redirects where and suggests using the target instead.
- feat: warn on redirected URLs to reduce traffic 4c9136c
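Finding out where a URL ends up amounts to following the redirect chain until a response no longer redirects. A hypothetical Rust sketch that works on a precomputed table (URL to Location header) instead of real HTTP requests; the limit of 10 hops is an assumption that keeps a redirect loop from running forever.

```rust
use std::collections::HashMap;

/// Follows a redirect table until a URL no longer redirects.
/// Returns None for loops or chains longer than the hop limit.
fn final_target<'a>(start: &'a str, redirects: &HashMap<&'a str, &'a str>) -> Option<&'a str> {
    let mut current = start;
    for _ in 0..10 {
        match redirects.get(current) {
            Some(&next) => current = next,
            None => return Some(current),
        }
    }
    None
}
```

A warning is only worth emitting when the final target differs from the configured URL.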
Init command
You can now init a directory with a git repo (git init) and a config (website-stalker example-config > website-stalker.yaml) in one neat command:
website-stalker init
- feat(init): provide init folder/repo/config command 9842d9a
Case insensitive site filter
The site filter is now case insensitive. Where you previously had to use website-stalker run EdJoPaTo to run on https://EdJoPaTo.de, you can now use website-stalker run edjopato.
- feat(cli)!: site filter is now case insensitive 85af5f6
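A simplified Rust sketch of a case-insensitive match. It uses a plain substring check as a stand-in; the actual filter may match differently (for example as a regular expression), so only the case handling is meant to carry over. The function name is hypothetical.

```rust
/// Hypothetical stand-in for the site filter: case-insensitive substring match.
fn matches_filter(url: &str, filter: &str) -> bool {
    url.to_lowercase().contains(&filter.to_lowercase())
}
```

With this, both EdJoPaTo and edjopato select https://EdJoPaTo.de.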
Config format is now fixed
Previously, you could use other config formats such as website-stalker.toml. To simplify the config logic, the config now has to be a YAML file.
- refactor(config)!: simplify 4d5e390
Minor Changes
v0.10.0
html_markdownify
The new editor html_markdownify creates markdown from HTML input. See the README for more details about this editor. e1798ee
html_textify
Now creates at most one empty line between filled lines. db894e9 32fa6d1
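The rule can be sketched as a small Rust function (hypothetical name, not the html_textify code) that collapses runs of empty or whitespace-only lines into a single empty line. Dropping empty lines at the start and end is an additional assumption of this sketch.

```rust
/// Keeps at most one empty line between filled lines; leading and
/// trailing empty lines are dropped.
fn collapse_blank_lines(text: &str) -> String {
    let mut out: Vec<&str> = Vec::new();
    let mut pending_blank = false;
    for line in text.lines() {
        if line.trim().is_empty() {
            // Only remember a gap once some filled line has been emitted.
            pending_blank = !out.is_empty();
        } else {
            if pending_blank {
                out.push("");
            }
            pending_blank = false;
            out.push(line);
        }
    }
    out.join("\n")
}
```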
Rename editors to be more like functions
Editor names should now make it clearer what the editors do when applied. This is a breaking change: you have to adapt your configs for them to work with this release. 82cefbc
- html_text → html_textify
- css_selector → css_select
- regex_replacer → regex_replace