Input already tokenized array #72
-
Hello, The text I have is already tokenized as an array of strings. I can misuse the 'as' helper, e.g. as.bow(alreadyTokenizedText), to get a bag of words, but its other functions and the 'its' helper are very limited with this kind of input. Thank you for your time.
-
Hi @Nireas1, We are not entirely clear about the exact problem you are experiencing with winkNLP. The BM25 vectorization and similarity features in winkNLP do accept tokenized text as input. Please refer to the documentation. Could you share your use case along with the problem statement? Thanks,
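To illustrate what a similarity computation over already tokenized input involves, here is a minimal sketch of cosine similarity between two bag-of-words objects (term to count). It is plain JavaScript and does not call winkNLP's own similarity utilities; the function name cosineSimilarity and the sample data are assumptions made for this example only.

```javascript
// Hypothetical helper, not part of winkNLP: cosine similarity between
// two bag-of-words objects that map each term to its count.
function cosineSimilarity(bowA, bowB) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const [term, count] of Object.entries(bowA)) {
    normA += count * count;
    // Only terms present in both bags contribute to the dot product.
    if (Object.prototype.hasOwnProperty.call(bowB, term)) {
      dot += count * bowB[term];
    }
  }
  for (const count of Object.values(bowB)) {
    normB += count * count;
  }
  // An empty bag has no direction, so treat its similarity as 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Sample bags, e.g. built from two pre-tokenized texts.
const score = cosineSimilarity({ the: 2, cat: 1 }, { the: 1, dog: 1 });
console.log(score.toFixed(3));
```

The score is 0 when the two bags share no terms and approaches 1 as their term distributions align.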
-
Hi,
Thanks for elaborating on the requirement.
Your need is unusual for winkNLP, because it is a processing pipeline that handles all annotations through readDoc('input text'). These annotations also include token properties such as stop word, abbreviation, token type, negation and many more. Thus, an array of tokens alone is not sufficient to use the helpers. You will have to follow a hybrid approach in such a case, as you have already tried.
Here are some how-tos on winkNLP.
Cheers,
Rachna
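One possible shape for the hybrid approach mentioned above is to compute the simple statistics directly from the token array and reserve readDoc() for the cases where winkNLP's annotations are needed. The sketch below builds a bag of words in plain JavaScript without winkNLP; the function name bagOfWords and its lowercase option are assumptions made for this example, not part of the library.

```javascript
// Hypothetical helper, not part of winkNLP: counts how often each term
// occurs in an array of tokens produced by an external tokenizer.
function bagOfWords(tokens, { lowercase = true } = {}) {
  // A prototype-less object avoids clashes with tokens such as
  // "constructor" or "toString".
  const bow = Object.create(null);
  for (const token of tokens) {
    const term = lowercase ? token.toLowerCase() : token;
    bow[term] = (bow[term] || 0) + 1;
  }
  return bow;
}

// Sample input standing in for the already tokenized text.
const alreadyTokenizedText = ['The', 'cat', 'sat', 'on', 'the', 'mat'];
const bow = bagOfWords(alreadyTokenizedText);
console.log(bow.the, bow.cat);
```

Properties that depend on winkNLP's annotations, such as stop word flags or negation, are not available from this helper, because the tokens never pass through readDoc().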