chore: use exact for fuzziness threshold 1 #1262
…thub.com/cap-js/cds-dbs into chore-use-exact-for-fuzzinessThreshold-1
// rewrite ref to xpr to mix in search config
// ensure in place modification to reuse .toString method that ensures quoting
e.xpr = [{ ref: e.ref }, fuzzy]
delete e.ref
})
} else {
ref = `${ref} FUZZY MINIMAL TOKEN SCORE ${fuzzyIndex} SIMILARITY CALCULATION MODE 'search'`
if (fuzzyIndex === 1)
ref = `${ref} EXACT MINIMAL SCORE 1 search mode 'text'`
Suggested change:
- ref = `${ref} EXACT MINIMAL SCORE 1 search mode 'text'`
+ ref = `${ref} EXACT`
According to the Java tests, this is sufficient.
@johannes-vogel For the search mode, Java also always wraps the search term in wildcards, like *<term>*. But with placeholders this is not possible.
Does Java use placeholders? Otherwise this opens the door to SQL injection, since the search term comes from the end user?!
My understanding of the discussions is that the $search string is always forwarded directly into the score function. If customers want to use wildcard characters like *, they should include them in their search input, or the application developer has to add them to the request.
@johannes-vogel They are generating prepared statements like SCORE ? IN ... and the value for ? is *<SearchTerm>*
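To illustrate the point about prepared statements, here is a minimal sketch of how a wildcard-wrapped term can be passed as a bound value rather than concatenated into the SQL text. The helper name `scoreWithWildcards`, the `EXACT` option, and the overall statement shape are assumptions for illustration; the exact SQL that Java generates is not shown in this thread. The column is assumed to be a trusted, already quoted identifier.

```javascript
// Hypothetical helper, not part of cds-dbs or the Java runtime.
// The SQL shape loosely follows the "SCORE ? IN ..." pattern mentioned above.
function scoreWithWildcards(column, searchTerm) {
  return {
    // Only a placeholder appears in the SQL text, never the user input itself.
    sql: `SCORE(? IN ${column} EXACT)`,
    // The wildcards are added to the bound value, not to the statement.
    values: [`*${searchTerm}*`],
  }
}

const stmt = scoreWithWildcards('txt', "PR00690415' OR '1'='1")
console.log(stmt.sql)
console.log(stmt.values)
```

Because the user input only ever ends up in `values`, quote characters in the search term cannot change the structure of the statement.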
Does that mean Java deviates from the agreement that search is an arbitrary string that is used as is in the score function? At least it looks that way to me...
@johannes-vogel With EXACT MINIMAL SCORE 1 search mode 'text', the search term will be interpreted as a whole string. E.g. for the string 'this is a test':
search term:
- "this" --> found
- "is a" --> found
- "this test" --> not found
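The three cases above can be reproduced with a deliberately simplified model. This sketch uses a plain substring check, which is an assumption made only to mirror the listed examples; it is not how HANA evaluates EXACT with search mode 'text', and the function name `matchesWholeTerm` is hypothetical.

```javascript
// Simplified model only: a contiguous substring check, not HANA's actual matching.
const matchesWholeTerm = (text, term) => text.includes(term)

const text = 'this is a test'
for (const term of ['this', 'is a', 'this test']) {
  const result = matchesWholeTerm(text, term) ? 'found' : 'not found'
  console.log(`"${term}" --> ${result}`)
}
```

The term "this test" is not found because the two words are not adjacent in the stored string, even though each word occurs on its own.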
Hana colleagues wrote:
SCORE('PR00690415' IN txt FUZZY MINIMAL TOKEN SCORE 1 SIMILARITY CALCULATION MODE 'search')
MINIMAL TOKEN SCORE gets only used on string search (SEARCH MODE 'text').
If SEARCH MODE 'text' is set, full text search is executed.
If not set, only a string-like search without tokenisation is done.
This means your term above is equivalent to:
SCORE('PR00690415' IN txt FUZZY MINIMAL SCORE 0.8 SIMILARITY CALCULATION MODE 'search')
The default value of MINIMAL SCORE is 0.8.
@johannes-vogel How should we proceed from here?
Use EXACT MINIMAL SCORE instead of FUZZY MINIMAL TOKEN SCORE for a fuzziness threshold of 1. Therefore, a fuzziness threshold of 1 now requires an exact match.
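The branching described here can be sketched as a small standalone function. The name `searchClause` and its signature are hypothetical and are not the cds-dbs API; the two option strings are taken from the diff in this PR, and the final form may differ depending on the outcome of the review discussion.

```javascript
// Hypothetical helper for illustration; not the actual cds-dbs implementation.
function searchClause(ref, fuzzyIndex) {
  // A threshold of 1 means an exact match is required, so emit EXACT instead of FUZZY.
  if (fuzzyIndex === 1) {
    return `${ref} EXACT MINIMAL SCORE 1 search mode 'text'`
  }
  // Any other threshold keeps the fuzzy search with the configured token score.
  return `${ref} FUZZY MINIMAL TOKEN SCORE ${fuzzyIndex} SIMILARITY CALCULATION MODE 'search'`
}

console.log(searchClause('"title"', 1))
console.log(searchClause('"title"', 0.7))
```

Returning early for the threshold of 1 keeps the two variants mutually exclusive, so the EXACT options are never appended to a clause that already carries FUZZY options.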