[WIP] Limited version of spelling correction #1007
This is a prototype version of spelling correction attempting to mirror the client's implementation at https://github.com/gremid/xapian-spelling-suggestions/
For an unknown reason the new unit test fails as follows:
[ RUN ] Suggestion.spellingSuggestions
Resolve redirect
set index
test/suggestion.cpp:835: Failure
Expected equality of these values:
getSpellingSuggestions(a, "Tsunge", 1)
Which is: {}
std::vector<std::string> ({"Zunge"})
Which is: { "Zunge" }
test/suggestion.cpp:841: Failure
Expected equality of these values:
getSpellingSuggestions(a, "Lax", 1)
Which is: {}
std::vector<std::string> ({"Lachs"})
Which is: { "Lachs" }
test/suggestion.cpp:842: Failure
Expected equality of these values:
getSpellingSuggestions(a, "Mont", 1)
Which is: {}
std::vector<std::string> ({"Mond"})
Which is: { "Mond" }
test/suggestion.cpp:845: Failure
Expected equality of these values:
getSpellingSuggestions(a, "Trok", 1)
Which is: {}
std::vector<std::string> ({"Trog"})
Which is: { "Trog" }
test/suggestion.cpp:850: Failure
Expected equality of these values:
getSpellingSuggestions(a, "Son", 1)
Which is: {}
std::vector<std::string> ({"Sohn"})
Which is: { "Sohn" }
test/suggestion.cpp:852: Failure
Expected equality of these values:
getSpellingSuggestions(a, "Grahl", 1)
Which is: { "Stuhl" }
std::vector<std::string> ({"Gral"})
Which is: { "Gral" }
test/suggestion.cpp:861: Failure
Expected equality of these values:
getSpellingSuggestions(a, "aba", 1)
Which is: {}
std::vector<std::string> ({"aber"})
Which is: { "aber" }
test/suggestion.cpp:880: Failure
Expected equality of these values:
getSpellingSuggestions(a, "Füreschein", 1)
Which is: {}
std::vector<std::string> ({"Führerschein"})
Which is: { "F\xC3\xBChrerschein"
As Text: "Führerschein" }
[ FAILED ] Suggestion.spellingSuggestions (280 ms)
4c7c178 to 04d82ad
Codecov Report
❌ Your patch status has failed because the patch coverage (59.25%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.
@@            Coverage Diff             @@
##             main    #1007      +/-   ##
==========================================
+ Coverage   58.13%   58.14%   +0.01%
==========================================
  Files         101      102       +1
  Lines        5384     5462      +78
  Branches     2197     2234      +37
==========================================
+ Hits         3130     3176      +46
- Misses        795      798       +3
- Partials     1459     1488      +29
... and fixed a bug in the test data.
This reduced the count of failures in the Suggestion.spellingSuggestions
unit test from 8 to 1:
[ RUN ] Suggestion.spellingSuggestions
Resolve redirect
set index
test/suggestion.cpp:841: Failure
Expected equality of these values:
getSpellingSuggestions(a, "Lax", 1)
Which is: {}
std::vector<std::string> ({"Lachs"})
Which is: { "Lachs" }
[ FAILED ] Suggestion.spellingSuggestions (260 ms)
The spelling correction "Lax -> Lachs" is not returned because the max
edit distance is capped at (length(query_word) - 1) which reduces our
passed value of the max edit distance argument from 3 to 2.
This problem disappears if the version of libxapian found on Ubuntu
22.04 (libxapian.so.30.11.0) is used instead of the one that we build
ourselves as a base dependency (libxapian.so.30.12.4).
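The arithmetic behind the remaining "Lax -> Lachs" failure can be checked with a small sketch. It is not Xapian's code: `editDistance`, `effectiveMaxEditDistance` and `withinReach` are hypothetical helpers that only model the cap as described above, using a plain byte-wise Levenshtein distance (so ASCII input is assumed).

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Plain Levenshtein distance over bytes (illustrative; ASCII input assumed).
std::size_t editDistance(const std::string& a, const std::string& b) {
  std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
  for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
  for (std::size_t i = 1; i <= a.size(); ++i) {
    cur[0] = i;
    for (std::size_t j = 1; j <= b.size(); ++j) {
      const std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
      cur[j] = std::min({subst, prev[j] + 1, cur[j - 1] + 1});
    }
    std::swap(prev, cur);
  }
  return prev[b.size()];
}

// Hypothetical model of the cap described in the commit message:
// the effective limit is min(requested, length(word) - 1).
std::size_t effectiveMaxEditDistance(const std::string& word,
                                     std::size_t requested) {
  assert(!word.empty());  // this sketch does not define the cap for an empty word
  return std::min(requested, word.size() - 1);
}

// True if `candidate` can still be suggested for `query` under the cap.
bool withinReach(const std::string& query, const std::string& candidate,
                 std::size_t requested) {
  return editDistance(query, candidate) <=
         effectiveMaxEditDistance(query, requested);
}
```

Under these assumptions, "Lax" and "Lachs" are 3 edits apart, while the cap for a 3-letter query is 2, so `withinReach("Lax", "Lachs", 3)` is false even though 3 was requested.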
b530190 to b325bff
@veloman-yunkan Can we close this, since kiwix/libkiwix#1230 supersedes it? Is there anything interesting left here that has not been carried over to kiwix/libkiwix#1230?
Superseded by kiwix/libkiwix#1230
This PR is a less ambitious version of #994 intended to deliver a new feature in a more limited form as soon as possible.
Fixes #731 (will open other issues for future improvements)