v4.5.0 #1285
AngledLuffa announced in Announcements
Replies: 2 comments
Hi, congrats on the release! Is there a plan to publish it to Maven Central?
Hopefully in the next few days! We were just making sure there are no horrible bugs.
CoreNLP 4.5.0
The main features are improved lemmatization of English, improved tokenization of both English and non-English flex-based languages, and some updates to tregex, tsurgeon, and semgrex.
All PTB and German tokens are now normalized in PTBLexer (previously only German umlauts were). This makes the tokenizer 2% slower, but should avoid issues with words such as resume': d46fecd
log4j has been removed entirely from public CoreNLP (the internal "research" branch still uses it): f05cb54
Fix a NumberFormatException in NER models (java.lang.NumberFormatException: Bad number put into wordToNumber, #547): 5ee2c39
Fix "seconds" in the lemmatizer: e7a073b
Fix double escaping of & in the online demos: 8413fa1
Report the cause of an error if "tregex" is asked for but no parse annotator is added: 4db80c0
Merge ssplit and cleanxml into the tokenize annotator (done in a backwards-compatible manner): Cleanxml #1259
A custom tregex pattern, a ROOT tregex pattern, and a tsurgeon operation that simultaneously moves a subtree and prunes anything left behind, used for processing the Italian VIT treebank in Stanza: Add a moveprune operation which prunes an empty node if needed after … #1263
Refactor tokenization of punctuation, filenames, and other entities common to all languages, not just English: 3c40ba3 58a2288 8b97d64
Improved tokenization of number patterns, names with apostrophes such as Sh'reyan, non-American phone numbers, and invisible commas: 9476a8e 6193934 afb1ea8 7c84960
Significant lemmatizer improvements for adjectives and adverbs, along with various other special cases: Ud feats #1266
Include graph and semgrex indices in the results of a semgrex query, which should make the results more usable: 45b47e2
Trim words in the NER training process. Spaces can still occur inside a word, but stray whitespace around a word will no longer hurt the performance of the models: 0d9e9c8
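To illustrate the PTBLexer normalization item, here is a minimal Java sketch. It assumes, beyond what the release notes say, that the normalization is Unicode canonical normalization and that resume' stands for a word spelled with a combining accent. It uses the JDK's java.text.Normalizer rather than CoreNLP's own code, and the class NormalizeDemo is hypothetical.

```java
import java.text.Normalizer;

/**
 * Hypothetical illustration, not CoreNLP's PTBLexer: a decomposed accent
 * (base letter + U+0301 COMBINING ACUTE ACCENT) and a precomposed é only
 * compare equal once both are normalized to the same Unicode form.
 */
class NormalizeDemo {
    /** Normalize a token to NFC (canonical composition) using the JDK. */
    static String normalizeToken(String token) {
        return Normalizer.normalize(token, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String decomposed = "resume\u0301";  // "resume" followed by a combining acute accent
        String precomposed = "resum\u00E9";  // "resumé" with the precomposed é character

        // Raw strings differ in both content and length.
        System.out.println(decomposed.equals(precomposed));
        System.out.println(decomposed.length() + " vs " + precomposed.length());

        // After NFC normalization the decomposed form matches the precomposed one.
        System.out.println(normalizeToken(decomposed).equals(precomposed));
    }
}
```

Running main prints false, 7 vs 6, and true: the two spellings look identical on screen but only match after normalization, which is the kind of mismatch token normalization is meant to prevent.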
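For the NER training item, the sketch below shows the general idea of trimming words while keeping spaces that occur inside a word. The tab-separated word/label line format and the TrimTrainingWords class are hypothetical assumptions for illustration; this is not CoreNLP's actual training code.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, not CoreNLP's NER trainer: parses assumed
 * "word<TAB>label" lines, trimming stray whitespace around each word
 * while keeping spaces inside the word.
 */
class TrimTrainingWords {
    record Example(String word, String label) {}

    static List<Example> parse(List<String> lines) {
        List<Example> out = new ArrayList<>();
        for (String line : lines) {
            if (line.isBlank()) continue;  // skip blank sentence-separator lines
            String[] fields = line.split("\t", -1);
            if (fields.length < 2) {
                throw new IllegalArgumentException("Expected word<TAB>label: " + line);
            }
            // strip() removes only leading/trailing whitespace,
            // so an internal space such as in "New York" survives.
            String word = fields[0].strip();
            String label = fields[1].strip();
            out.add(new Example(word, label));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(" Paris \tLOCATION", "New York\tLOCATION", "", "said\tO");
        for (Example e : parse(lines)) {
            System.out.println("[" + e.word() + "] -> " + e.label());
        }
    }
}
```

Here " Paris " becomes "Paris" while "New York" keeps its internal space, so a stray leading or trailing space no longer creates a distinct, unseen word form.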
This discussion was created from the release v4.5.0.