v4.5.0 #1285

AngledLuffa (Maintainer) · 2022-07-22T23:21:26Z

CoreNLP 4.5.0

Main features are improved lemmatization of English, improved tokenization of both English and non-English flex-based languages, and some updates to tregex, tsurgeon, and semgrex.

All PTB and German tokens are now normalized in PTBLexer (previously only German umlauts were). This makes the tokenizer about 2% slower, but should avoid issues with resume' for example: d46fecd
log4j removed entirely from public CoreNLP (the internal "research" branch still has a use): f05cb54
Fix NumberFormatException showing up in NER models: java.lang.NumberFormatException: Bad number put into wordToNumber #547 5ee2c39
Fix "seconds" in the lemmatizer: e7a073b
Fix double escaping of & in the online demos: 8413fa1
Report the cause of an error if "tregex" is asked for but no parse annotator is added: 4db80c0
Merge ssplit and cleanxml into the tokenize annotator (done in a backwards compatible manner): Cleanxml #1259
Custom tregex pattern, ROOT tregex pattern, and tsurgeon operation for simultaneously moving a subtree and pruning anything left behind, used for processing the Italian VIT treebank in stanza: Add a moveprune operation which prunes an empty node if needed after … #1263
Refactor tokenization of punctuation, filenames, and other entities common to all languages, not just English: 3c40ba3 58a2288 8b97d64
Improved tokenization of number patterns, names with apostrophes such as Sh'reyan, non-American phone numbers, and invisible commas: 9476a8e 6193934 afb1ea8 7c84960
Significant lemmatizer improvements: adjectives & adverbs, along with various other special cases: Ud feats #1266
Include graph & semgrex indices in the results for a semgrex query (will make the results more usable): 45b47e2
Trim words in the NER training process. Spaces can still be inside a word, but random whitespace won't ruin the performance of the models: 0d9e9c8
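
To illustrate why token normalization matters for a word like resume', here is a minimal sketch using only the Java standard library. It assumes the problem case is an "e" followed by a combining acute accent, and it uses Unicode NFC composition purely as an illustration: the release notes do not say which normalization PTBLexer applies, and the class and method names below are hypothetical, not part of CoreNLP.

```java
import java.text.Normalizer;

public class NormalizeDemo {
    // Hypothetical helper: compose a base letter plus combining mark into a
    // single code point (Unicode NFC). This is an assumption for illustration,
    // not the PTBLexer implementation.
    public static String compose(String token) {
        return Normalizer.normalize(token, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // 'e' followed by the combining acute accent (U+0301): 7 chars in total
        String decomposed = "resume\u0301";
        String composed = compose(decomposed);
        // The two strings render the same but differ in length before normalization
        System.out.println(decomposed.length() + " -> " + composed.length()); // prints "7 -> 6"
        System.out.println(composed.equals("resum\u00e9")); // prints "true"
    }
}
```

Without a step like this, the composed and decomposed spellings of the same word are different strings, so a tokenizer or downstream model could treat them as unrelated tokens.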

This discussion was created from the release v4.5.0.

paulk-asert · 2022-07-29T06:53:41Z

Hi, congrats on the release! Is there a plan for publication to Maven central?

AngledLuffa (Maintainer, Author) · 2022-07-29T07:14:18Z

Hopefully in the next few days! We were just making sure there are no horrible bugs
