Releases: KoljaB/stream2sentence
v0.3.0
0.2.9
Enhancements to Sentence Processing
- Improved buffer handling by ensuring it starts with an alphanumeric character to prevent TTS confusion caused by initial non-phonetic characters.
- Bug Fix: Resolved an issue where the word counter wasn’t reset after triggering force_first_fragment_after_words, causing processing errors.
- Increased the default force_first_fragment_after_words threshold from 15 to 30 for better fragment control.
(Bugfix for: KoljaB/RealtimeTTS#223)
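The buffer change described above can be pictured with a minimal sketch. This is not the library's code: `trim_to_first_alnum` is a hypothetical helper, and the assumption is simply that everything before the first alphanumeric character is dropped before the text reaches a TTS engine.

```python
def trim_to_first_alnum(buffer: str) -> str:
    """Return the buffer starting at its first alphanumeric character.

    Hypothetical helper for illustration only; stream2sentence's real
    implementation may differ.
    """
    for index, char in enumerate(buffer):
        # str.isalnum() is Unicode-aware, so letters such as CJK
        # characters count as alphanumeric as well.
        if char.isalnum():
            return buffer[index:]
    # No speakable character has arrived yet.
    return ""


print(trim_to_first_alnum('... - "Hello there.'))
```

Under this assumption, a buffer made up only of punctuation yields an empty string, so nothing is handed on until a speakable character arrives.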
v0.2.7
v0.2.5
- New parameters for sentence splitting:
  - context_size_look_overhead: The number of characters to look past the context_size boundaries when detecting sentence-splitting characters (improves sentence detection).
  - quick_yield_for_all_sentences (bool): If set to True, the generator yields the first fragment of every sentence as quickly as possible (not only the first fragment of the first sentence).
  - quick_yield_every_fragment (bool): If set to True, the generator quickly yields not only the first fragment of every sentence, but also every following fragment.
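A hedged sketch of how these parameters might be passed. The keyword names come from the notes above; the values, the `NEW_SPLIT_OPTIONS` dictionary, and the `split_stream` wrapper are assumptions made for illustration. The generator is passed in as an argument so the sketch does not depend on a particular installed version.

```python
# Keyword names are taken from the release notes; the values are
# illustrative assumptions, not documented defaults.
NEW_SPLIT_OPTIONS = {
    "context_size_look_overhead": 12,
    "quick_yield_for_all_sentences": True,
    "quick_yield_every_fragment": True,
}


def split_stream(generate_sentences, text_stream, **overrides):
    """Call a generate_sentences-style function with the new options.

    Hypothetical wrapper: `generate_sentences` is any callable that accepts
    a text stream plus keyword arguments and yields sentences.
    """
    options = {**NEW_SPLIT_OPTIONS, **overrides}
    return list(generate_sentences(text_stream, **options))
```

With the package installed, usage would presumably look like `from stream2sentence import generate_sentences` followed by `split_stream(generate_sentences, "Hello there. How are you?")`.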
v0.2.4
v0.2.3
v0.2.2
v0.2.1
v0.2.0
- Added the stanza tokenizer to support sentence splitting for more languages (for example, Chinese):
text = "我喜欢读书。天气很好。我们去公园吧。今天是星期五。早上好。这是我的朋友。请帮我。吃饭了吗?我在学中文。晚安。"
expected = ["我喜欢读书。", "天气很好。", "我们去公园吧。", "今天是星期五。", "早上好。", "这是我的朋友。", "请帮我。", "吃饭了吗?", "我在学中文。晚安。"]
sentences = list(generate_sentences(text, minimum_sentence_length = 2, context_size=2, tokenizer="stanza", language="zh"))
self.assertEqual(sentences, expected)
- Added the emoji library to filter emojis out of the stream more precisely (the previous emoji filter did not work well with some languages).