Releases: adbar/courlan
courlan-1.3.2
courlan-1.3.1
courlan-1.3.0
- parsing: validate netloc with port number by @naz-theori in #104
- cleaning: fix handling of apostrophes (#107)
- maintenance: deprecate Python 3.6 & 3.7, add `pyproject.toml` setup file (#59, #105)
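
The port-number validation mentioned above can be illustrated with a minimal standard-library sketch. This is not courlan's actual code; the helper name `has_valid_netloc` is hypothetical, and the check relies only on how `urllib.parse` exposes the port.

```python
from urllib.parse import urlsplit


def has_valid_netloc(url: str) -> bool:
    """Hypothetical helper: True if the URL has a host and, if present, a usable port."""
    try:
        parts = urlsplit(url)
        # Reading .port raises ValueError for non-numeric or out-of-range ports.
        _ = parts.port
    except ValueError:
        return False
    # hostname is None when the network location has no host part.
    return bool(parts.hostname)


print(has_valid_netloc("https://example.org:8080/path"))
print(has_valid_netloc("https://example.org:99999/"))
```

Under these assumptions the first call accepts the URL and the second rejects it because the port exceeds 65535.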
courlan-1.2.0
courlan-1.1.0
- replace `langcodes` by `babel` and use its information on locales (#89, #92)
- simplified and faster code: domain extraction, cleaning, filters and UrlStore (#90, #93, #94, #95)
- UrlStore: better url batches, replace `timelimit` parameter by `time_limit` (#91)
- maintenance: update readme and convert it to markdown (#97)
courlan-1.0.0
courlan-0.9.5
courlan-0.9.4
- new UrlStore functions: `add_from_html()` (#42), `discard()` (#44), `get_unvisited_domains`
- CLI: removed `--samplesize`, use `--sample` with an integer instead (#54)
- added plausibility filter for domains/hosts (#48)
- speedups and more efficient processing (#47, #49, #50)
- fixed handling of relative URLs with @feltcat in #46
- fixed bugs and ensured compatibility (#41, #43, #51, #56)
- official support for Python 3.12
Full Changelog: v0.9.3...v0.9.4
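
For readers unfamiliar with the relative-URL handling referenced in this release, the general idea can be sketched with the standard library. This is an illustration only, not the fix from #46; `resolve_link` is a hypothetical name.

```python
from urllib.parse import urljoin


def resolve_link(base_url: str, link: str) -> str:
    """Hypothetical helper: resolve a possibly relative link against the page URL."""
    # urljoin leaves absolute links untouched and resolves relative ones
    # (including "../" segments) against the base.
    return urljoin(base_url, link)


base = "https://example.org/blog/post.html"
print(resolve_link(base, "../about"))     # https://example.org/about
print(resolve_link(base, "page2.html"))   # https://example.org/blog/page2.html
```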
courlan-0.9.3
- more efficient URL parsing (#33)
- refined link extraction and link filters (#30, #36)
- more efficient normalization (#32)
- more efficient sampling strategy (#31, #35)
- added meta function to clear LRU caches (#34)
- added parallel option in command-line interface (#37, #39)
- added `get_unvisited_domains()` method to `UrlStore` (#40)
Full Changelog: v0.9.2...v0.9.3
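
The idea of a meta function that clears LRU caches can be sketched as follows. This is a generic pattern built on `functools.lru_cache`, not courlan's implementation; `cached_hostname`, `CACHED_FUNCTIONS` and `clear_all_caches` are hypothetical names.

```python
from functools import lru_cache
from urllib.parse import urlsplit


@lru_cache(maxsize=1024)
def cached_hostname(url: str) -> str:
    """Hypothetical cached helper: extract the hostname once per distinct URL."""
    return urlsplit(url).hostname or ""


# Registry of cached functions (assumption: each one is decorated with lru_cache).
CACHED_FUNCTIONS = (cached_hostname,)


def clear_all_caches() -> None:
    """Reset every registered LRU cache, e.g. to free memory in long-running jobs."""
    for func in CACHED_FUNCTIONS:
        func.cache_clear()


cached_hostname("https://example.org/a")
clear_all_caches()
```

Keeping the cached functions in one registry means a single call resets them all, which is useful in long crawls where cached entries would otherwise accumulate.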
courlan-0.9.2
- add blogspot archives to type filter
- maintenance: upgrade urllib3 and review code