Releases · KoljaB/stream2sentence

14 Dec 17:33

KoljaB

v0.3.0

5a88d79

v0.3.0 Latest

Combines sentences that are shorter than the specified minimum length with the sentences that follow (fixes situations where minimum_sentence_length had no effect).
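
The merging behaviour described in this release can be pictured with a small sketch. This is not the library's code: `merge_short_sentences` is a hypothetical helper written for illustration, and only the parameter name minimum_sentence_length comes from the release note. It also assumes that length is measured in characters and that sentences are joined with a single space.

```python
from typing import Iterable, List


def merge_short_sentences(sentences: Iterable[str], minimum_sentence_length: int) -> List[str]:
    """Hypothetical helper: join too-short sentences with the ones that follow."""
    merged: List[str] = []
    pending = ""  # text collected so far that is still below the minimum length
    for sentence in sentences:
        pending = f"{pending} {sentence}".strip()
        if len(pending) >= minimum_sentence_length:
            merged.append(pending)
            pending = ""
    if pending:
        # Nothing follows to merge with, so emit the remainder unchanged.
        merged.append(pending)
    return merged


result = merge_short_sentences(["Hi.", "How are you today?", "Fine."], 10)
```

Here "Hi." is held back and emitted together with the next sentence, while the trailing "Fine." is emitted alone because nothing follows it.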

29 Nov 21:57

KoljaB

v0.2.9

a0f8be1

0.2.9

Enhancements to Sentence Processing

Improved buffer handling by ensuring it starts with an alphanumeric character to prevent TTS confusion caused by initial non-phonetic characters.
Bug Fix: Resolved an issue where the word counter wasn’t reset after triggering force_first_fragment_after_words, causing processing errors.
Increased the default force_first_fragment_after_words threshold from 15 to 30 for better fragment control.

(Bugfix for: KoljaB/RealtimeTTS#223)
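
As a rough illustration of the buffer change in this release, the sketch below trims a buffer so that it starts with an alphanumeric character. `strip_leading_non_alnum` is a hypothetical name, and the real implementation may treat whitespace or specific symbols differently.

```python
def strip_leading_non_alnum(buffer: str) -> str:
    """Hypothetical helper: drop characters until the first alphanumeric one."""
    for index, char in enumerate(buffer):
        if char.isalnum():
            return buffer[index:]
    return ""  # no alphanumeric character yet, so nothing is worth sending to TTS


cleaned = strip_leading_non_alnum('... "Hello there.')
```

Because `str.isalnum` is Unicode-aware, text in non-Latin scripts such as Chinese passes through this check unchanged.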

07 Nov 13:01

KoljaB

v0.2.7

b9329a2

v0.2.7

implements PRs #3, #4
fixes #5
upgrades stanza, nltk and emoji dependencies to latest versions
upgrades nltk to the latest "punkt_tab" model

21 Jul 15:43

KoljaB

v0.2.5

5bcfc78

v0.2.5

new parameters for sentence splitting:
context_size_look_overhead: The number of characters to look beyond the context_size boundary when searching for sentence-splitting characters (improves sentence detection).
quick_yield_for_all_sentences (bool): If set to True, the generator yields the first fragment of every sentence as quickly as possible (not only the first fragment of the first sentence).
quick_yield_every_fragment (bool): If set to True, the generator quickly yields not only the first fragment of every sentence but also every following fragment.

17 Jul 19:01

KoljaB

v0.2.4

7158c04

v0.2.4

Fixes a bug where some special characters could break sentence splitting (especially when using Chinese tokenizers, where splitting could fail).

21 Mar 19:22

KoljaB

v0.2.3

1ba99ba

v0.2.3

new parameters:
sentence_fragment_delimiters: Characters treated as delimiters at which a fragment can be yielded quickly.
force_first_fragment_after_words: Forces the first sentence fragment to yield after a specified number of words. Default is 10 words.
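
How these two parameters could interact is sketched below. This is a simplified illustration under stated assumptions, not the library's algorithm: `first_fragment` is a hypothetical function, words are assumed to be whitespace-separated, and a word only counts once the whitespace after it has arrived.

```python
from typing import Optional


def first_fragment(buffer: str, delimiters: str, force_after_words: int) -> Optional[str]:
    """Hypothetical helper: return the earliest fragment that can be yielded, if any."""
    words_seen = 0
    in_word = False
    for index, char in enumerate(buffer):
        if char in delimiters:
            # A fragment delimiter ends the fragment immediately.
            return buffer[: index + 1]
        if char.isspace():
            if in_word:
                words_seen += 1
                in_word = False
                if words_seen >= force_after_words:
                    # Enough complete words have arrived: force the fragment out.
                    return buffer[:index]
        else:
            in_word = True
    return None  # keep buffering


by_delimiter = first_fragment("Well, this is a test", ",", 10)
by_word_count = first_fragment("one two three four five", ",", 3)
still_waiting = first_fragment("one two", ",", 3)
```

The first call stops at the comma, the second is forced out after three complete words, and the third returns `None` because neither condition has been met yet.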

01 Dec 16:12

KoljaB

v0.2.2

c57896a

v0.2.2

enable early tokenizer initialization with init_tokenizer

01 Dec 11:48

KoljaB

v0.2.1

4dc7231

v0.2.1

Minor bugfix

print messages replaced with logging.info for cleaner, more customizable output
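
Because messages now go through the standard logging module, an application can decide whether to show them. The sketch below uses only the Python standard library. The logger name `example_library` is a stand-in, since the release note does not say which logger name the package uses, so the level is set on the root logger here.

```python
import io
import logging

# Capture log output in memory so the effect is easy to inspect.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))

root = logging.getLogger()
root.addHandler(handler)

# Stand-in for a library module that reports progress via logging.info.
library_logger = logging.getLogger("example_library")

root.setLevel(logging.WARNING)
library_logger.info("hidden: INFO is below the configured level")

root.setLevel(logging.INFO)
library_logger.info("shown: INFO is now enabled")

captured = stream.getvalue()
root.removeHandler(handler)  # leave the root logger as we found it
```

Only the second message reaches the handler, which is the kind of control that plain print calls do not offer.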

29 Nov 00:07

KoljaB

v0.2.0

5820cb6

v0.2.0

added stanza tokenizer to support sentence splitting for more languages (for example Chinese)

from stream2sentence import generate_sentences

# Ten short Chinese sentences; with these settings the last two are yielded together.
text = "我喜欢读书。天气很好。我们去公园吧。今天是星期五。早上好。这是我的朋友。请帮我。吃饭了吗？我在学中文。晚安。"
expected = ["我喜欢读书。", "天气很好。", "我们去公园吧。", "今天是星期五。", "早上好。", "这是我的朋友。", "请帮我。", "吃饭了吗？", "我在学中文。晚安。"]
sentences = list(generate_sentences(text, minimum_sentence_length=2, context_size=2, tokenizer="stanza", language="zh"))
assert sentences == expected

emoji library added to filter emojis out of the stream more precisely (the previous emoji filter did not work well with some languages)
