Releases: KoljaB/stream2sentence

v0.3.0

14 Dec 17:33
  • combines sentences that are shorter than the specified minimum length with the sentences that follow them (fixes cases where minimum_sentence_length was not honored)
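
The merging behavior described above can be pictured with a small standalone sketch. This is not the library's implementation: the helper name `merge_short_sentences` and the choice to join with a single space are assumptions made purely for illustration.

```python
def merge_short_sentences(sentences, minimum_sentence_length):
    """Merge sentences shorter than the minimum into the sentence that follows."""
    merged = []
    pending = ""  # short text waiting to be joined with the next sentence
    for sentence in sentences:
        candidate = f"{pending} {sentence}".strip() if pending else sentence
        if len(candidate) < minimum_sentence_length:
            pending = candidate  # still too short, keep accumulating
        else:
            merged.append(candidate)
            pending = ""
    if pending:
        merged.append(pending)  # nothing left to merge with, emit as-is
    return merged


print(merge_short_sentences(["Hi.", "How are you today?", "Ok.", "Bye."], 10))
# → ['Hi. How are you today?', 'Ok. Bye.']
```

In this sketch a trailing short remainder is emitted unchanged, since there is no later sentence to attach it to.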

0.2.9

29 Nov 21:57

Enhancements to Sentence Processing

  • Improved buffer handling by ensuring it starts with an alphanumeric character to prevent TTS confusion caused by initial non-phonetic characters.
  • Bug Fix: Resolved an issue where the word counter wasn’t reset after triggering force_first_fragment_after_words, causing processing errors.
  • Increased the default force_first_fragment_after_words threshold from 15 to 30 for better fragment control.

(Bugfix for: KoljaB/RealtimeTTS#223)
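
As a rough illustration of the buffer change in this release, the sketch below trims a buffer so that it starts with an alphanumeric character. The function name `trim_to_first_alnum` is hypothetical, and the library's actual handling may differ in detail.

```python
def trim_to_first_alnum(buffer):
    """Drop leading characters until the buffer starts with an alphanumeric one."""
    for index, char in enumerate(buffer):
        if char.isalnum():
            return buffer[index:]
    return ""  # no alphanumeric character yet, nothing worth sending to TTS


print(trim_to_first_alnum('" - Hello there.'))
# → Hello there.
```

`str.isalnum()` is Unicode-aware, so text that starts with, for example, Chinese characters is left untouched.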

v0.2.7

07 Nov 13:01
  • implements PRs #3, #4
  • fixes #5
  • upgrades stanza, nltk and emoji dependencies to latest versions
  • upgrades nltk to the latest "punkt_tab" tokenizer model

v0.2.5

21 Jul 15:43
  • new parameters for sentence splitting:
    context_size_look_overhead: The number of characters to look beyond the context_size boundary when searching for sentence-splitting characters (improves sentence detection).
    quick_yield_for_all_sentences (bool): If set to True, the generator yields the first fragment of every sentence as quickly as possible (not only the first fragment of the first sentence).
    quick_yield_every_fragment (bool): If set to True, the generator quickly yields not only the first fragment of every sentence, but also every following fragment.

v0.2.4

17 Jul 19:01
  • fixes a bug where some special characters could break sentence splitting (especially with Chinese tokenizers, where sentence splitting could fail)

v0.2.3

21 Mar 19:22
  • new parameters:
    sentence_fragment_delimiters: Characters treated as delimiters when quickly yielding a sentence fragment.
    force_first_fragment_after_words: Forces the first sentence fragment to yield after a specified number of words. Default is 10 words.
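
How the two parameters could interact is sketched below in plain Python. This is a simplified illustration, not the library's code: the function `first_fragment`, its delimiter set, and its argument names are assumptions chosen to mirror the descriptions above.

```python
def first_fragment(chars, fragment_delimiters=".?!;:,\n", force_after_words=10):
    """Return the earliest fragment of a character stream that can be handed on."""
    buffer = ""
    for char in chars:
        buffer += char
        if char in fragment_delimiters and buffer.strip(fragment_delimiters + " "):
            return buffer.strip()  # a delimiter closed the fragment
        if char.isspace() and len(buffer.split()) >= force_after_words:
            return buffer.strip()  # no delimiter in sight, force a cut by word count
    return buffer.strip()  # stream ended before either condition was met


print(first_fragment("Well, this is a test."))
# → Well,
print(first_fragment("one two three four five six", force_after_words=3))
# → one two three
```

The word-count cut exists so that a long opening clause without punctuation does not hold back the first audio output.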

v0.2.2

01 Dec 16:12
  • enable early tokenizer initialization with init_tokenizer

v0.2.1

01 Dec 11:48

Minor bugfix

  • replaced print messages with logging.info for cleaner and more customizable output
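
One practical consequence of this change is that such messages can be shown or hidden through the standard `logging` configuration. The sketch below uses only the standard library and assumes the messages go through the root logger via `logging.info`; the message text and the helper names `ListHandler` and `collect_info_messages` are hypothetical.

```python
import logging


class ListHandler(logging.Handler):
    """Collect log messages in a list so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def collect_info_messages(level):
    """Emit one INFO message through the root logger configured at `level`."""
    root = logging.getLogger()
    handler = ListHandler()
    previous_level = root.level
    root.addHandler(handler)
    root.setLevel(level)
    try:
        logging.info("tokenizer initialized")  # hypothetical message text
    finally:
        root.removeHandler(handler)  # restore the logger for other code
        root.setLevel(previous_level)
    return handler.messages


print(collect_info_messages(logging.INFO))     # → ['tokenizer initialized']
print(collect_info_messages(logging.WARNING))  # → []
```

With the root logger at its default WARNING level the message is suppressed, which is what makes the output quieter than a bare `print`.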

v0.2.0

29 Nov 00:07
  • added the stanza tokenizer to support sentence splitting for more languages (for example, Chinese)
# Requires the stanza tokenizer and its Chinese model to be installed.
from stream2sentence import generate_sentences

text = "我喜欢读书。天气很好。我们去公园吧。今天是星期五。早上好。这是我的朋友。请帮我。吃饭了吗?我在学中文。晚安。"
expected = ["我喜欢读书。", "天气很好。", "我们去公园吧。", "今天是星期五。", "早上好。", "这是我的朋友。", "请帮我。", "吃饭了吗?", "我在学中文。晚安。"]
sentences = list(generate_sentences(text, minimum_sentence_length=2, context_size=2, tokenizer="stanza", language="zh"))
assert sentences == expected
  • added the emoji library to filter emojis out of the stream more precisely (the previous emoji filter did not work well with some languages)