I need to search bodies of text for specific words that may not be normalized; that is to (perhaps incorrectly) say they may be plural, singular, or conjugated in some odd way. The same is true of the search query that is compared against each word in the target text. I would like to use the compromise library to solve this problem, perhaps by normalizing both the target word and the query word, and then checking whether they are the same in their most basic form.
The examples for root matches look like they should solve my issue, but the following code does not yield the expected result (a positive match):
The expected output would be "Palatability", but the above produces no search results.
Am I doing something wrong with my implementation?
Thank you for your time, and I do hope this message finds you well.
Edit:
I ran the above "palatability" through a variety of online stemmers, and they correctly reduced it to "palat", but code such as the snippet below does not produce this result. The same is true of "goodness", which is incorrectly left in its non-root form; the root form should be "good".
nlp('palatability').text('root') // produces "palatability", should be "palat"
nlp('goodness').text('root') // produces "goodness", should be "good"
Hey Cal - yep, you're right. There's a soft spot with this 'noun-ing' of verbs and adjectives that I've gone back and forth about a few times.
The problem is not the conjugation, but that some percentage of these just sound silly, and it's hard to machine-learn which ones.
You can see we kept the +'ness' adjective conjugation here, which produces some strangeness itself.
I think the verb+'ability' form may be the same. Browse through our verb-list and try to guess what percent are good-sounding, like 'walkability', and what percent are awkward enough to be wrong, like the forms of 'backfire' or 'baffle'. I don't know. It's an odd problem.
That said, maybe the root lookup should quietly generate these, in order to grab the true-positives, like 'palatability'. It wouldn't be hard, as I think it is a pretty-simple conjugation.
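To make the idea of quietly generating these forms concrete, here is a rough sketch in plain JavaScript. It does not use compromise's API; `toAbilityNoun`, `toNessNoun`, `buildRootLookup`, and `rootOf` are hypothetical names, and the suffix rules are deliberately naive. Note that it maps back to the dictionary word ('palatable'), not to a stemmer-style stem like 'palat'.

```javascript
// Hypothetical helpers for illustration only; not part of compromise.

// Build an '-ability' / '-ibility' noun from a base word (naive rules).
function toAbilityNoun(word) {
  if (word.endsWith('able')) return word.slice(0, -4) + 'ability'; // palatable -> palatability
  if (word.endsWith('ible')) return word.slice(0, -4) + 'ibility'; // flexible -> flexibility
  if (word.endsWith('e')) return word.slice(0, -1) + 'ability';    // use -> usability
  return word + 'ability';                                        // walk -> walkability
}

// Build a '-ness' noun from a base word (naive rules).
function toNessNoun(word) {
  if (word.endsWith('y')) return word.slice(0, -1) + 'iness'; // happy -> happiness
  return word + 'ness';                                       // good -> goodness
}

// Generate every derived form, silly-sounding or not, and map it back to its base.
// The silly ones are never shown to anyone; they only exist as lookup keys.
function buildRootLookup(baseWords) {
  const lookup = new Map();
  for (const base of baseWords) {
    for (const derived of [toAbilityNoun(base), toNessNoun(base)]) {
      if (!lookup.has(derived)) lookup.set(derived, base);
    }
  }
  return lookup;
}

// Return the base word for a derived noun, or the lowercased word if unknown.
function rootOf(word, lookup) {
  const lower = word.toLowerCase();
  return lookup.get(lower) || lower;
}

const lookup = buildRootLookup(['palatable', 'good', 'walk', 'use']);
console.log(rootOf('Palatability', lookup)); // palatable
console.log(rootOf('goodness', lookup));     // good
console.log(rootOf('walkability', lookup));  // walk
```

Because the generated forms are used only for reverse lookup, awkward ones such as 'walkness' cost a map entry but never surface in output.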
Maybe it would help to find, or generate, some data on how big of a problem this is. If there are only 100 cases, we could hard-code them. If it affects half of verbs, maybe we could look at their suffixes for patterns. Otherwise, if verb+'ability' is okay 90% of the time, I can just add it in.
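One way such data could be gathered, as a sketch only: generate verb+'ability' for each verb and count how many of the results appear in some reference vocabulary. The function name `abilityCoverage` is hypothetical, and both inputs below are tiny toy lists made up for illustration; a real run would use the actual verb list and a large word list or corpus.

```javascript
// Hypothetical measurement helper; inputs below are toy data, not real results.
function abilityCoverage(verbs, attested) {
  const hits = [];
  const misses = [];
  for (const verb of verbs) {
    // Naive conjugation: drop a trailing 'e', then add 'ability'.
    const stem = verb.endsWith('e') ? verb.slice(0, -1) : verb;
    const noun = stem + 'ability';
    (attested.has(noun) ? hits : misses).push(noun);
  }
  const ratio = verbs.length === 0 ? 0 : hits.length / verbs.length;
  return { hits, misses, ratio };
}

// Toy inputs: `attested` stands in for a real vocabulary.
const verbs = ['walk', 'use', 'read', 'backfire', 'baffle'];
const attested = new Set(['walkability', 'usability', 'readability']);

const report = abilityCoverage(verbs, attested);
console.log(report.ratio);  // 0.6
console.log(report.misses); // [ 'backfirability', 'bafflability' ]
```

The `misses` list is the interesting part: if it is short it could be hard-coded as exceptions, and if it is long its suffixes could be inspected for patterns.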