Releases: stanfordnlp/CoreNLP
v4.3.1
v4.3.0
v4.2.2
v4.2.1
- Fix the server serving some links as http instead of https (#1146)
- Improve MWE expressions in the enhanced dependency conversion (1ef9ef9)
- Add the ability for the command-line semgrex processor to handle multiple calls in one process (c9d50ef)
- Fix interaction between discarding tokens in ssplit and assigning NER tags (a803bc3)
- Reduce the size of the SR parser models (not a huge amount, but some) (#1142)
- Various QuoteAnnotator bug fixes (#1135, #1134, #1121, #1118, 9f1b015, #1147)
- Switch to newer istack implementation (#1133)
- Upgrade to newer protobuf (#1150)
- Add a CoNLL-U output format to some of the segmenter code, useful for testing with the official test scripts (c70ddec)
- Fix Turkish locale enums (#1126, stanfordnlp/stanza#580)
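The Turkish locale fix above reflects a well-known JDK pitfall rather than anything CoreNLP-specific: `String.toUpperCase()` uses the default locale, and in a Turkish locale `i` uppercases to dotted capital `İ` (U+0130), so enum lookups built from uppercased strings fail. A minimal plain-JDK sketch of the pitfall and the `Locale.ROOT` fix (the enum name here is illustrative, not from CoreNLP):

```java
import java.util.Locale;

public class TurkishLocaleDemo {
    enum Tag { INIT }

    public static void main(String[] args) {
        // In a Turkish locale, 'i' uppercases to dotted capital İ (U+0130),
        // so an Enum.valueOf lookup on the uppercased string fails.
        String turkish = "init".toUpperCase(new Locale("tr"));
        System.out.println(turkish);                // İNİT, not INIT

        // Locale-independent case conversion keeps enum lookups stable:
        Tag tag = Tag.valueOf("init".toUpperCase(Locale.ROOT));
        System.out.println(tag);                    // INIT
    }
}
```

The usual fix, as here, is to pass `Locale.ROOT` (or `Locale.ENGLISH`) to every case conversion whose result feeds an identifier lookup.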
- Use StringBuilder instead of StringBuffer where possible (#1010)
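The StringBuilder change above is standard JDK advice: `StringBuffer` synchronizes every method, which is pure overhead when the buffer never escapes a single thread. The two classes share the same API, so the swap is mechanical:

```java
public class BuilderDemo {
    public static void main(String[] args) {
        // StringBuffer: every append acquires a lock, wasted on one thread.
        StringBuffer buf = new StringBuffer();
        // StringBuilder: identical API, no synchronization.
        StringBuilder sb = new StringBuilder();
        for (String tok : new String[]{"a", "b", "c"}) {
            buf.append(tok).append(' ');
            sb.append(tok).append(' ');
        }
        System.out.println(sb.toString().trim());                 // a b c
        System.out.println(sb.toString().equals(buf.toString())); // true
    }
}
```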
v4.2.0
Overview
This release features a collection of small bug fixes and updates. It is the first release built directly from the GitHub repo.
Enhancements
- Upgrade libraries (EJML, JUnit, JFlex)
- Add character offsets to Tregex responses from server
- Improve cleaning of treebanks for English models
- Speed up loading of Wikidict annotator
- New utility for tagging CoNLL-U files in place
- Command line tool for processing TokensRegex
Fixes
- Output single token NER entities in inline XML output format
- Add currency symbol part of speech training data
- Fix issues with tree binarizing
Stanford CoreNLP 4.0.0
Overview
The latest release of Stanford CoreNLP includes a major overhaul of tokenization and a large collection of new parsing and tagging models. There are also miscellaneous enhancements and fixes.
Enhancements
- UD v2.0 tokenization standard for English, French, German, and Spanish. That means "new" LDC tokenization for English (splitting on most hyphens) and not escaping parentheses or turning quotes etc. into ASCII sequences by default.
- Upgrade options for normalizing special chars (quotes, parentheses, etc.) in PTBTokenizer
- Have WhitespaceTokenizer support same newline processing as PTBTokenizer
- New mwt annotator for handling multiword tokens in French, German, and Spanish.
- New models with more training data and better performance for tagging and parsing in English, French, German, and Spanish.
- Add French NER
- New Chinese segmentation based on CTB9
- Improved handling of double codepoint characters
- Easier syntax for specifying language specific pipelines and NER pipeline properties
- Improved CoNLL-U processing
- Improved speed and memory performance for CRF training
- Tregex support in CoreSentence
- Updated library dependencies
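The "double codepoint characters" item refers to characters outside the Basic Multilingual Plane, which Java stores as two `char` units (a surrogate pair); code that counts `char`s instead of code points computes wrong lengths and offsets. A plain-JDK illustration of the underlying issue (not CoreNLP code):

```java
public class SurrogatePairDemo {
    public static void main(String[] args) {
        // U+1D538 (𝔸, MATHEMATICAL DOUBLE-STRUCK CAPITAL A) takes two
        // char units in Java (a surrogate pair), followed by a plain 'B'.
        String s = "\uD835\uDD38B";
        System.out.println(s.length());                      // 3 char units
        System.out.println(s.codePointCount(0, s.length())); // 2 characters

        // Iterating char by char would split the pair; iterate by code point:
        s.codePoints().forEach(cp -> System.out.printf("U+%04X%n", cp));
        // U+1D538, U+0042
    }
}
```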
Fixes
- NPE while simultaneously tokenizing on whitespace and sentence splitting on newlines
- NPE in EntityMentionsAnnotator during language check
- NPE in CorefMentionAnnotator while aligning coref mentions with titles and entity mentions
- NPE in NERCombinerAnnotator in certain configurations of models on/off
- Incorrect handling of eolonly option in ArabicSegmenterAnnotator
- Apply named entity granularity change prior to coref mention detection
- Incorrect handling of keeping newline tokens when using Chinese segmenter on Windows
- Incorrect handling of reading in German treebank files
- SR parser crashes when given bad training input
- New PTBTokenizer known abbreviations: "Tech.", "Amb.". Fix legacy tokenizer hack that special-cased 'Alex.' for 'Alex. Brown'
- Fix ancient bug in printing constituency trees with multiple roots
- Fix the parser failing on the word "STOP", which it treated as a special word