Takes too long to parse doc results #18
@Joselinejamy

```python
import stanfordnlp
from spacy_stanfordnlp import StanfordNLPLanguage

# `model_dir` points to the downloaded StanfordNLP models,
# `lines` is the list of ~500 input sentences.
snlp = stanfordnlp.Pipeline(processors='tokenize,pos', models_dir=model_dir)
nlp = StanfordNLPLanguage(snlp)

word_count = 0
for doc in nlp.pipe(lines):
    word_count += len(doc)
print(word_count)
```

The main efficiency problem we have at the moment is that we don't have support for batching the predictions and returning a […]
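Since batching isn't supported through the wrapper, one workaround is to call the underlying stanfordnlp pipeline directly, once, over many sentences joined by blank lines, so the neural models see a larger input per call. This is a minimal sketch of that idea, not part of this wrapper's API; `model_dir` and `lines` are assumed to be the same variables as in the snippet above.

```python
import stanfordnlp

# Workaround sketch: one pipeline call over all sentences, separated by blank
# lines, instead of one call per tiny text.
snlp = stanfordnlp.Pipeline(processors='tokenize,pos', models_dir=model_dir)

batched_text = "\n\n".join(lines)   # blank line between the ~500 sentences
doc = snlp(batched_text)            # single pipeline call

word_count = sum(len(sentence.words) for sentence in doc.sentences)
print(word_count)
```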
@honnibal Thank you for the instant response. But when I ran the below code with just spaCy's own model, it took relatively little time, around just 5 seconds.

As per the documentation, the wrapper returns the same English object. So when that same English object is used, why is it taking so much more time? Or has my understanding diverged from what is intended?
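For reference, a minimal sketch of that kind of plain-spaCy comparison, assuming `en_core_web_sm` and the same `lines` list of ~500 sentences (the exact model used may have differed):

```python
import spacy

# Plain spaCy pipeline for comparison; en_core_web_sm is an assumed choice.
nlp = spacy.load("en_core_web_sm")

word_count = 0
for doc in nlp.pipe(lines):   # `lines`: the same ~500 sentences as above
    word_count += len(doc)
print(word_count)
```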
Hi, I'm also seeing a drastic performance decrease when using Stanza. For comparison, here's a project I'm working on where I'm running a number of different parsers over 6000+ sentences. It can be seen that running CoreNLP 3 + CoreNLP 4 + spaCy takes roughly 8 times less time than running CoreNLP 3 + CoreNLP 4 + Stanza through spacy_stanza. Could this be GPU related as well? These tests were run on a CPU, not a GPU.
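One quick way to check whether a GPU is being picked up at all (an illustrative sketch: Stanza falls back to CPU when CUDA isn't visible to PyTorch, and `use_gpu` makes the choice explicit):

```python
import torch
import stanza

# Stanza runs on GPU only if CUDA is available to PyTorch.
print("CUDA available:", torch.cuda.is_available())

# Models must already be downloaded via stanza.download("en").
snlp = stanza.Pipeline(lang="en", processors="tokenize,pos",
                       use_gpu=torch.cuda.is_available())
```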
The […]

And as Matt said above, there's no good batching solution for […]
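To make timing comparisons like the ones above concrete, here is a rough helper that works with any spaCy-compatible pipeline (an illustrative sketch, not code from this thread):

```python
import time

def time_pipeline(nlp, sentences, label="pipeline"):
    """Rough wall-clock timing of nlp.pipe() over a list of sentences (illustrative)."""
    start = time.perf_counter()
    n_tokens = sum(len(doc) for doc in nlp.pipe(sentences))
    elapsed = time.perf_counter() - start
    print(f"{label}: {n_tokens} tokens in {elapsed:.2f}s")
    return elapsed
```

For example, `time_pipeline(nlp_spacy, lines, "spaCy")` versus `time_pipeline(nlp_stanza, lines, "spacy_stanza")` on the same sentences would show where the time goes.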
Hello,

It takes too long to parse the doc object, i.e. to iterate over the sentences and the tokens in them. Is that expected?

The above code takes a few milliseconds (apart from initialisation) to run over 500 sentences, while this takes almost a minute (apart from initialisation) over the same 500 sentences.

P.S.: I have put nlp.pipe() inside a for loop intentionally, to get all the tokens for one sentence even if it gets segmented.
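A rough sketch of the per-sentence pattern described in the P.S. (an assumed reconstruction; `nlp` is the StanfordNLPLanguage object and `lines` the 500 sentences):

```python
# Assumed reconstruction of the slow per-sentence pattern: nlp.pipe() is called
# once per input line, so every call pays the full per-call overhead of the
# StanfordNLP models instead of processing the sentences as one batch.
word_count = 0
for line in lines:
    for doc in nlp.pipe([line]):   # one pipe() call per sentence
        word_count += len(doc)
print(word_count)
```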