Releases: adbar/courlan
courlan-0.5.0
- more complex language heuristics, use langcodes
- extended blacklists and whitelists
- more precise filters and more efficient code
- support for Python 3.10
courlan-0.4.2
- enhanced cleaning
- fixed language filter
courlan-0.4.1
- keep trailing slashes to avoid redirection
- fixes: normalization and crawlable URLs
courlan-0.4.0
- URL manipulation tools added: extract parts, fix relative URLs
- filters added: language, navigation and crawls
- more robust link handling and extraction
- removed support for Python 3.4
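As a rough illustration of what "extract parts" and "fix relative URLs" can mean in practice, here is a minimal sketch built only on the standard library. The helper names `split_host_and_path` and `resolve_link` are hypothetical and are not taken from courlan; the package's own functions may behave differently.

```python
from urllib.parse import urljoin, urlsplit


def split_host_and_path(url):
    """Split an absolute URL into (scheme://host, path). Hypothetical helper."""
    parts = urlsplit(url)
    base = f"{parts.scheme}://{parts.netloc}"
    # An empty path is treated as the site root.
    path = parts.path or "/"
    return base, path


def resolve_link(page_url, link):
    """Resolve a possibly relative link against the page it was found on."""
    return urljoin(page_url, link)
```

For example, `resolve_link("https://example.org/a/", "page.html")` yields an absolute URL under `/a/`, while a link that is already absolute is returned unchanged.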
courlan-0.3.1
- improve filter precision
courlan-0.3.0
- reduced dependencies: replace requests with bare urllib3, and tldextract with tld for Python 3.6 upwards
- better path and fragment normalization
courlan-0.2.3
- Python 3.9 compatibility
- Simplified imports
- Bug fixes
courlan-0.2.2
- English and German language filters
- Function to detect external links
- Support for domain blacklisting
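The idea behind external-link detection and domain blacklisting can be sketched with the standard library alone. This is an assumption-laden simplification, not courlan's implementation: it compares hostnames after stripping a leading `www.`, whereas a real tool would typically compare registered domains. The names `hostname_of`, `is_external_link` and `is_blacklisted` are hypothetical.

```python
from urllib.parse import urlsplit


def hostname_of(url):
    """Return the lowercased hostname without a leading 'www.' (simplified)."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def is_external_link(url, reference_url):
    """True if the two URLs point to different hosts (naive comparison)."""
    return hostname_of(url) != hostname_of(reference_url)


def is_blacklisted(url, blacklist):
    """True if the URL's host appears in a set of blocked hostnames."""
    return hostname_of(url) in blacklist
```

Because subdomains such as `blog.example.org` count as different hosts here, this sketch is stricter than a domain-level comparison would be.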
courlan-0.2.1
- Less aggressive strict filters
- CLI bug fixed
courlan-0.2.0
- Cleaner and more efficient filtering
- Helper functions to scrub, clean and normalize
- Removed two dependencies with more extensive usage of urllib.parse
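To show the kind of work urllib.parse can take over from third-party packages, here is a minimal normalization sketch. It is an illustration under stated assumptions, not the code shipped in courlan: the function name `normalize` is hypothetical, and the chosen rules (lowercase host, sorted query parameters, dropped fragment, trailing slash kept) are just one plausible set.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize(url):
    """Return a simplified canonical form of an absolute URL. Hypothetical helper."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    # Simplification: lowercases the whole netloc, including any userinfo.
    netloc = parts.netloc.lower()
    # Keep the path as-is (trailing slash included); default to the root.
    path = parts.path or "/"
    # Sort query parameters so equivalent URLs compare equal.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # Drop the fragment, which is not sent to the server.
    return urlunsplit((scheme, netloc, path, query, ""))
```

Sorting the query string and dropping the fragment helps with deduplication, at the cost of treating parameter order as insignificant, which is not true for every site.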