Releases: adbar/courlan

courlan-1.3.2

29 Oct 16:40
0d85371
  • UrlStore.get_download_urls(): timelimit removed, fix type hints (#119, 19c580e)
  • extract_links(): deprecate base_url parameter (#121)
  • setup: simplify workflow (#118)

courlan-1.3.1

04 Sep 12:27
54e3ec2
  • UrlStore compression: make bz2 and zlib optional, update pickle protocol (#113)
  • extract_links(): review and document, add deprecation warning for base_url argument (#115)
  • maintenance: add __all__ to init.py and lint code (#116)
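
Making compression modules optional usually comes down to guarded imports with a fallback. The following is a minimal sketch of that pattern, not courlan's actual code: the names `compress_object()` and `decompress_object()` are hypothetical, and the choice of `pickle.HIGHEST_PROTOCOL` is an assumption, since the release note does not say which protocol is used.

```python
import pickle

# bz2 and zlib can be missing from minimal Python builds, so import them defensively.
try:
    import bz2
    HAS_BZ2 = True
except ImportError:
    HAS_BZ2 = False

try:
    import zlib
    HAS_ZLIB = True
except ImportError:
    HAS_ZLIB = False


def compress_object(obj):
    """Pickle an object and compress it with whichever module is available.

    Hypothetical helper: returns a (method, payload) pair so that the
    payload can be restored later with the matching method.
    """
    # Assumption: the highest available pickle protocol is acceptable here.
    data = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    if HAS_BZ2:
        return "bz2", bz2.compress(data)
    if HAS_ZLIB:
        return "zlib", zlib.compress(data)
    return "raw", data


def decompress_object(method, payload):
    """Reverse compress_object() for the given method."""
    if method == "bz2":
        payload = bz2.decompress(payload)
    elif method == "zlib":
        payload = zlib.decompress(payload)
    elif method != "raw":
        raise ValueError(f"unknown compression method: {method}")
    return pickle.loads(payload)
```

Returning the method name alongside the payload keeps the round trip unambiguous when the writing and reading environments offer different modules.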

courlan-1.3.0

25 Jul 14:35
a48713a
  • parsing: validate netloc with port number by @naz-theori in #104
  • cleaning: fix handling of apostrophes (#107)
  • maintenance: deprecate Python 3.6 & 3.7, add pyproject.toml setup file (#59, #105)
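
As an illustration of what validating a netloc with a port number can involve, here is a minimal standard-library sketch. It is not the code merged in #104; `has_valid_port()` is a hypothetical helper that relies on `urllib.parse` raising `ValueError` for non-numeric or out-of-range ports.

```python
from urllib.parse import urlsplit


def has_valid_port(url: str) -> bool:
    """Return True if the URL has a host and either no port or a port in 0-65535."""
    try:
        parts = urlsplit(url)
        # Accessing .port raises ValueError for non-numeric or out-of-range values.
        _ = parts.port
    except ValueError:
        return False
    return bool(parts.hostname)
```

For example, `has_valid_port("https://example.org:8080/path")` is accepted, while a URL with port `99999` or `abc` is rejected.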

courlan-1.2.0

04 Jun 16:39
0549988
  • more compact UrlStore: use bytes instead of str for URL paths (#88)
  • UrlStore maintenance: deprecate timelimit argument (#101)
  • maintenance: simplify code (#103)
  • support for Python 3.13
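
The note on storing URL paths as bytes can be illustrated with a small measurement. This is a sketch under assumptions: `storage_sizes()` is a hypothetical helper, the figures are specific to CPython, and it says nothing about how UrlStore organizes its data internally.

```python
import sys
from typing import Tuple


def storage_sizes(path: str) -> Tuple[int, int]:
    """Return the in-memory size of a URL path as str and as UTF-8 bytes."""
    encoded = path.encode("utf-8")
    return sys.getsizeof(path), sys.getsizeof(encoded)


str_size, bytes_size = storage_sizes("/blog/2024/05/some-article.html")
print(f"str: {str_size} bytes, bytes: {bytes_size} bytes")
```

On CPython, a `bytes` object carries less per-object overhead than a `str` of the same ASCII content, which adds up when a store holds a very large number of paths.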

courlan-1.1.0

30 Apr 11:20
2b11567
  • replace langcodes by babel and use its information on locales (#89, #92)
  • simplified and faster code: domain extraction, cleaning, filters and UrlStore (#90, #93, #94, #95)
  • UrlStore: better url batches, replace timelimit parameter by time_limit (#91)
  • maintenance: update readme and convert it to markdown (#97)

courlan-1.0.0

01 Feb 14:56
1cfb7db
  • license change from GPLv3+ to Apache 2.0 (#81)
  • UrlStore: write() method and load_store() function added (#83)
  • add parameter trailing_slash to keep or discard slashes at the end of URLs (#52)
  • maintenance: fix whitespace in clean_url() (#77), simplify code (#79)

courlan-0.9.5

28 Nov 11:34
b61b1b3
  • IRI to URI normalization: encode path, query and fragments (#58, #60)
  • normalization: strip common trackers (#65)
  • new function is_valid_url() (#63)
  • hardening of domain filter (#64)
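
The two normalization items above can be sketched with the standard library alone. This is an illustration of the general techniques, not courlan's implementation: `strip_trackers()`, `iri_to_uri()` and the `TRACKER_PREFIXES` list are hypothetical, and the prefix list is deliberately incomplete.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit, urlunsplit

# Assumption: a short, illustrative list of tracking-parameter prefixes.
TRACKER_PREFIXES = ("utm_", "fbclid", "gclid")


def strip_trackers(url: str) -> str:
    """Drop query parameters whose names start with a known tracker prefix."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKER_PREFIXES)
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def iri_to_uri(url: str) -> str:
    """Percent-encode non-ASCII characters in path, query and fragment.

    The host is left untouched; internationalized domain names would need
    separate IDNA handling.
    """
    parts = urlsplit(url)
    return urlunsplit(
        parts._replace(
            path=quote(parts.path, safe="/%"),
            query=quote(parts.query, safe="=&%+"),
            fragment=quote(parts.fragment, safe="%"),
        )
    )
```

Keeping `%` in the safe characters avoids double-encoding sequences that are already percent-encoded.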

Full Changelog: v0.9.4...v0.9.5

courlan-0.9.4

06 Sep 15:17
869912c
  • new UrlStore functions: add_from_html() (#42), discard() (#44), get_unvisited_domains()
  • CLI: removed --samplesize, use --sample with an integer instead (#54)
  • added plausibility filter for domains/hosts (#48)
  • speedups and more efficient processing (#47, #49, #50)
  • fixed handling of relative URLs with @feltcat in #46
  • fixed bugs and ensured compatibility (#41, #43, #51, #56)
  • official support for Python 3.12

Full Changelog: v0.9.3...v0.9.4

courlan-0.9.3

31 May 14:41
05c6e20
  • more efficient URL parsing (#33)
  • refined link extraction and link filters (#30, #36)
  • more efficient normalization (#32)
  • more efficient sampling strategy (#31, #35)
  • added meta function to clear LRU caches (#34)
  • added parallel option in command-line interface (#37, #39)
  • added get_unvisited_domains() method to UrlStore (#40)

Full Changelog: v0.9.2...v0.9.3

courlan-0.9.2

02 May 17:02
eb23b9b
  • add blogspot archives to type filter
  • maintenance: upgrade urllib3 and review code