
Contributor

@larsplessing commented Jul 2, 2025

Use `EXACT MINIMAL SCORE` instead of `FUZZY MINIMAL TOKEN SCORE` for a fuzziness threshold of 1.
As a result, a fuzziness threshold of 1 now requires an exact match.

@larsplessing requested a review from BobdenOs July 2, 2025 10:03
```js
    // rewrite ref to xpr to mix in search config
    // ensure in place modification to reuse .toString method that ensures quoting
    e.xpr = [{ ref: e.ref }, fuzzy]
    delete e.ref
  })
} else {
  ref = `${ref} FUZZY MINIMAL TOKEN SCORE ${fuzzyIndex} SIMILARITY CALCULATION MODE 'search'`
  if (fuzzyIndex === 1)
    ref = `${ref} EXACT MINIMAL SCORE 1 search mode 'text'`
```
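A minimal sketch of the clause selection the PR description asks for, assuming that a threshold of 1 should *replace* the `FUZZY` configuration rather than be appended to it (as written in the hunk, the `EXACT` text is added after the `FUZZY` text). `buildFuzzySuffix` and `withSearchConfig` are hypothetical names, not functions from the code base.

```javascript
// Hypothetical helper (not from the PR): picks the search configuration that is
// appended to the quoted column reference inside SCORE(...).
// Assumption: a threshold of 1 replaces the FUZZY clause instead of extending it.
function buildFuzzySuffix(fuzzyIndex) {
  if (fuzzyIndex === 1) {
    // Threshold 1: exact match only, full-text search mode
    return "EXACT MINIMAL SCORE 1 search mode 'text'";
  }
  // Any other threshold keeps the token-based fuzzy configuration
  return `FUZZY MINIMAL TOKEN SCORE ${fuzzyIndex} SIMILARITY CALCULATION MODE 'search'`;
}

// Hypothetical: mirrors the `ref = ...` reassignment in the diff
function withSearchConfig(ref, fuzzyIndex) {
  return `${ref} ${buildFuzzySuffix(fuzzyIndex)}`;
}

console.log(withSearchConfig('"title"', 0.7));
// "title" FUZZY MINIMAL TOKEN SCORE 0.7 SIMILARITY CALCULATION MODE 'search'
console.log(withSearchConfig('"title"', 1));
// "title" EXACT MINIMAL SCORE 1 search mode 'text'
```

With the reviewer's shorter suggestion, only the string returned for `fuzzyIndex === 1` would change.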
Contributor

Suggested change:

```diff
- ref = `${ref} EXACT MINIMAL SCORE 1 search mode 'text'`
+ ref = `${ref} EXACT`
```

According to the Java tests, this is sufficient.

Contributor Author

@larsplessing Jul 3, 2025

@johannes-vogel
For search mode, Java also always wraps the search term in wildcards, like `*<term>*`.
With placeholders, however, this is not possible.

Contributor

Does Java use placeholders? Otherwise this opens the door to SQL injection, since the search term comes from the end user?!

Contributor

My understanding from the discussions is that the `$search` string should always be forwarded as-is to the SCORE function. If customers want to use wildcard characters like `*`, they should include them in their search field, or the application developer has to add them to the request.

Contributor Author

@larsplessing Jul 3, 2025

@johannes-vogel They generate prepared statements like `SCORE ? IN ...`, and the value bound to `?` is `*<SearchTerm>*`.
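A sketch of why binding the term as a parameter avoids the injection concern while still allowing Java-style wildcard wrapping, assuming a driver that takes `?` placeholders plus an array of values. The function `buildScoreQuery`, the `BOOKS`/`TITLE` names, the `wrapWildcards` option and the exact SQL shape (modeled loosely on the `SCORE ? IN ...` form) are all hypothetical.

```javascript
// Hypothetical query builder (names and SQL shape are illustrative assumptions).
// The user-supplied term is never spliced into the SQL text; it travels as a
// bound parameter, so quotes or other SQL syntax inside it have no effect.
function buildScoreQuery(term, { wrapWildcards = false } = {}) {
  // Java-style wrapping happens on the parameter value, not in the SQL string
  const value = wrapWildcards ? `*${term}*` : term;
  // Column and table names come from the model, not from the end user
  const sql =
    'SELECT "ID", SCORE(? IN ("TITLE") EXACT) AS "RANK" FROM "BOOKS" ORDER BY "RANK" DESC';
  return { sql, params: [value] };
}

// As-is forwarding of $search
console.log(buildScoreQuery("O'Reilly"));
// Java-style wrapping of the bound value
console.log(buildScoreQuery("test", { wrapWildcards: true }).params);
```

In both cases the SQL text is identical; only the bound value differs, which is why the wrapping question is independent of the placeholder question.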

Contributor

Does that mean Java deviates from the agreement that the search term is an arbitrary string used as-is in the SCORE function? At least it looks that way to me...

Contributor Author

@larsplessing Jul 31, 2025

@johannes-vogel With `EXACT MINIMAL SCORE 1 search mode 'text'`, the search term is interpreted as a whole string. For example, for the text 'this is a test', the search terms give:

  • "this" --> found
  • "is a" --> found
  • "this test" --> not found
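The three results can be reproduced by a rough model that treats the search term as a phrase: it matches only if its tokens appear consecutively and in order. This assumes whitespace tokenization and case-insensitive comparison; it illustrates the described behavior and is not HANA's actual text-search algorithm. `phraseFound` is a hypothetical name.

```javascript
// Rough model of "EXACT MINIMAL SCORE 1 search mode 'text'" as described:
// the term matches only if its tokens occur consecutively, in order, in the text.
// Assumption: tokens are whitespace-separated and compared case-insensitively.
function tokenize(s) {
  return s.toLowerCase().split(/\s+/).filter(Boolean);
}

function phraseFound(text, term) {
  const doc = tokenize(text);
  const query = tokenize(term);
  if (query.length === 0) return false;
  // Slide a window of query.length tokens over the document tokens
  for (let start = 0; start + query.length <= doc.length; start++) {
    if (query.every((tok, i) => doc[start + i] === tok)) return true;
  }
  return false;
}

for (const term of ["this", "is a", "this test"]) {
  const result = phraseFound("this is a test", term) ? "found" : "not found";
  console.log(`${JSON.stringify(term)} --> ${result}`);
}
```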

Contributor Author

@larsplessing Aug 11, 2025

The HANA colleagues wrote:

`SCORE('PR00690415' IN txt FUZZY MINIMAL TOKEN SCORE 1 SIMILARITY CALCULATION MODE 'search')`

MINIMAL TOKEN SCORE is only used for string search (SEARCH MODE 'text').

If SEARCH MODE 'text' is set, a full-text search is executed.

If it is not set, only a string-like search without tokenisation is done.

This means your term above is equivalent to:

`SCORE('PR00690415' IN txt FUZZY MINIMAL SCORE 0.8 SIMILARITY CALCULATION MODE 'search')`

The default value of MINIMAL SCORE is 0.8.
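The quoted rules can be restated as a small decision function to make the equivalence explicit. This is a hypothetical model (`effectiveFuzzyParams` is an invented name, not a HANA API) that encodes only what the quote says: MINIMAL TOKEN SCORE takes effect only with SEARCH MODE 'text', and MINIMAL SCORE defaults to 0.8.

```javascript
// Hypothetical model of the rules quoted from the HANA colleagues.
const DEFAULT_MINIMAL_SCORE = 0.8; // default MINIMAL SCORE, per the quote

function effectiveFuzzyParams({ minimalScore, minimalTokenScore, searchMode } = {}) {
  const fullText = searchMode === "text";
  return {
    fullText, // true: full-text search with tokenisation; false: string-like search
    minimalScore: minimalScore ?? DEFAULT_MINIMAL_SCORE,
    // MINIMAL TOKEN SCORE is only honored in SEARCH MODE 'text'
    minimalTokenScore: fullText ? minimalTokenScore : undefined,
  };
}

// The quoted term: FUZZY MINIMAL TOKEN SCORE 1 without SEARCH MODE 'text'
const quoted = effectiveFuzzyParams({ minimalTokenScore: 1 });
// ...behaves like FUZZY MINIMAL SCORE 0.8
const equivalent = effectiveFuzzyParams({ minimalScore: 0.8 });
console.log(JSON.stringify(quoted) === JSON.stringify(equivalent)); // true
```

Under this model, the current `FUZZY MINIMAL TOKEN SCORE ${fuzzyIndex}` clause ignores the threshold entirely unless SEARCH MODE 'text' is added.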

@johannes-vogel How should we proceed from here?

@johannes-vogel added the `next release` label (pr to be checked for next release) Jul 14, 2025
3 participants