The chunker needs punctuation to work properly #29

Open
alexkalderimis opened this issue Sep 13, 2013 · 4 comments

Comments

@alexkalderimis

Using the definitions of tokenize, pos-tag, and chunker from the readme, and 1.5.1 versions of the model files, the following behaviour is observed:

 (-> "I am looking for a good way to annotate this English text."
     tokenize pos-tag chunker phrases)
;; => (["I"] ["am" "looking"] ["for"] ["a" "good" "way"] ["to" "annotate"] ["this" "English" "text"])

;; cf. the same operation, when the text is not full-stop terminated:
 (-> "I am looking for a good way to annotate this English text"
    tokenize pos-tag chunker phrases)
;; => (["I"] ["am" "looking"] ["for"] ["a" "good" "way"] ["to" "annotate"] ["this" "English"])

The pos-tag output seems correct, however.

@dakrone
Owner

dakrone commented Sep 13, 2013

Yeah, this is a known issue documented here: https://github.com/dakrone/clojure-opennlp#known-issues. It's something that the OpenNLP library does, not clojure-opennlp.
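
Until the chunker handles unterminated text, one way to sidestep the dropped final phrase is to make sure the input ends in sentence punctuation before tokenizing. A minimal sketch, assuming a hypothetical helper named ensure-terminal-period (it is not part of clojure-opennlp or OpenNLP) and assuming that appending a full stop is acceptable for the text at hand:

```clojure
(require '[clojure.string :as string])

(defn ensure-terminal-period
  "Hypothetical helper: returns text with trailing whitespace removed and a
  full stop appended when it does not already end in '.', '!' or '?'.
  Blank input is returned as an empty string."
  [text]
  (let [trimmed (string/trimr text)]
    (cond
      (string/blank? trimmed)      ""
      (re-find #"[.!?]$" trimmed)  trimmed
      :else                        (str trimmed "."))))

;; Intended use with the readme definitions from the report above
;; (left as a comment because it needs the OpenNLP model files):
;; (-> "I am looking for a good way to annotate this English text"
;;     ensure-terminal-period tokenize pos-tag chunker phrases)
```

This only normalizes the input string; it does not change how the chunker itself treats punctuation.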

@alexkalderimis
Author

Seems fair - thanks for the reply. And sorry for not spotting that disclaimer.

@kottmann

Would be nice if you could report this to OpenNLP, so it can be fixed in the next version.

@wenxijuji

I think the OpenNLP 1.7.2 version this project is using right now has fixed the punctuation problem. So maybe we can include the end punctuation?

Also, I notice that OpenNLP produces the phrase tag "O", whereas in clojure-opennlp "O" is not incorporated.

4 participants