Releases: bitextor/warc2text
v1.2.0
What's Changed
- Add --robotspass shunt for records related to robots.txt by @jelmervdl in #43
- Add --jsonl option by @jelmervdl in #35
- warc2html changes by @ZJaume in #50
- ZSTD compression and compression level support by @ZJaume in #51
- Move JSONL output to --stdout and allow file-based output with JSONL by @ZJaume in #52
Full Changelog: v1.1.0...v1.2.0
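
For readers consuming the JSONL output mentioned above, the following is a minimal sketch of a line-by-line reader. It assumes only that each non-empty line is one standalone JSON value, as the JSON Lines format requires. The helper name `iter_jsonl` and the field `example_field` are hypothetical placeholders, not part of warc2text and not its actual output schema.

```python
import json
from typing import Any, Iterable, Iterator


def iter_jsonl(lines: Iterable[str]) -> Iterator[Any]:
    """Yield one parsed JSON value per non-empty line of JSON Lines input."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines instead of failing on them
        yield json.loads(line)


# Hypothetical sample input; the field name is a placeholder,
# not the schema that warc2text actually emits.
sample = ['{"example_field": "a"}', '', '{"example_field": "b"}']
records = list(iter_jsonl(sample))
```

Because the helper accepts any iterable of strings, it could equally be given `sys.stdin` when the tool's output is piped into a script, or an open text file when the output was written to disk.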
v1.1.0: Merge pull request #36 from jelmervdl/fasttext-option
Changes:
- Add option to use a FastText model as a language identifier
- Records identified by CLD2 as Unknown are now classified as unk instead of being dropped.