SPACE is not UPOS #65
Comments
We're adding them in, not filtering them out. The way a spaCy It should be easy to write a custom pipeline component to convert |
Thanks for the quick answer and explanation! I had already assumed this depends more on spaCy than on the mental model behind UPOS. Writing an additional component is not actually my problem (I have already done it). The issue is rather having to iterate over the corpus once more, which slows down the whole annotation step. At the very least, it would be really nice if you could make consumers of this package aware of this behavior. It might seem obvious to people who use spaCy on a daily basis, but not to people like me, who assume it sticks to standards. It should also be acknowledged in spaCy itself, since the link for Token#pos_ is misleading and causes bugs like the one in my case. [1]: Before you ask: during a projection I deal with at least two corpora, and I cannot assume that all components use exactly the same tokenizer, so I need to stick to standards like UPOS rather than concrete implementations like spaCy.
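For readers who hit the same problem, here is a minimal sketch of dropping the non-UPOS tags lazily, in the same loop that consumes the annotated tokens, so no separate pass over the corpus is needed. It only assumes that each token exposes a `pos_` attribute, as spaCy's `Token` does. `FakeToken` and `iter_upos_tokens` are hypothetical names used for illustration and are not part of spaCy or spacy-stanza.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

# The 17 coarse-grained tags defined by Universal Dependencies (UPOS).
# SPACE is intentionally absent: it is a spaCy-specific addition.
UPOS_TAGS = frozenset({
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
})


@dataclass
class FakeToken:
    """Hypothetical stand-in for a spaCy Token; only text and pos_ are used."""
    text: str
    pos_: str


def iter_upos_tokens(tokens: Iterable) -> Iterator:
    """Yield only the tokens whose pos_ is a standard UPOS tag.

    Tokens tagged SPACE (or carrying any other non-UPOS value, including
    an empty string) are skipped. Because this is a generator, the check
    runs while the caller iterates, not in an extra pass.
    """
    for token in tokens:
        if token.pos_ in UPOS_TAGS:
            yield token


# Toy input standing in for one annotated document.
doc = [
    FakeToken("Hello", "INTJ"),
    FakeToken("  ", "SPACE"),
    FakeToken("world", "NOUN"),
    FakeToken("!", "PUNCT"),
]
kept = [(t.text, t.pos_) for t in iter_upos_tokens(doc)]
print(kept)
```

With a real pipeline, the same generator could presumably be wrapped around each `Doc` as it comes out of the pipeline, since a `Doc` iterates over its tokens. Note that skipping tokens changes token positions, which matters if the projection relies on token indices.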
Yes, the spaCy docs could be improved here. If you don't want to convert stanza's |
Thanks again for the quick answer. Plain stanza is currently not an option for this iteration, since it would require a lot of changes to my project, which I cannot afford to make at the moment due to time pressure. The answer I am looking for is more like: "Hey, we already know about that and we are planning to work around it with ...". But it is also okay if the answer is: "Oh, we do that on purpose", which appears to be the case. However, I would be very grateful if this were addressed in the docs. It cost me several hours to figure this out, and I will probably not be the last person who stumbles over it. Anyway, thanks for the help.
Hey,
First of all thanks for the great job!
I am currently using stanza via spaCy for a small annotation projection project. However, while integrating it I realized that spacy-stanza uses a custom Universal POS tag. I think this goes somewhat against the idea of Universal POS tags, and it makes my life harder, since I need another run to filter those tags out.
My questions are: Is there any reason why this wrapper does not filter them out? Is there a solution, workaround, or filter to overcome this?
Thanks for your time!