Add findnearest_partial and findall_partial #44
Closed
This PR adds two functions that act like `Partial`, except that they also return the indices of the match (or of all matches) along with the closest distance.

I've found this useful for fuzzy matching keywords in long documents (via KeywordSearch.jl, which adds some types and more API on top of these functions). In that context the indices matter because they let you quickly check whether a match looks genuine or spurious, i.e. whether it looks like a typo or like a completely different word.

To get the indices of the match, the collection needs to be indexable, not just iterable. That's more restrictive than the rest of StringDistances, but I think it's necessary for this functionality to work.

Let me know what you think. This code can stay in KeywordSearch if you prefer, but I thought it might be more useful here.
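To make the idea concrete, here is a minimal sketch in Python. It is not the Julia code from this PR and not StringDistances' API. It assumes that `Partial`-style matching means comparing the query against every same-length window of the longer string, and it uses a plain Levenshtein distance normalized by the longer string's length. The names `find_nearest_partial`, `find_all_partial` and the `max_dist` threshold are hypothetical and chosen only for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute, cost 1 each)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb))) # substitution (free if equal)
        prev = curr
    return prev[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the longer string's length (an assumed normalization)."""
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else levenshtein(a, b) / longest


def find_nearest_partial(needle: str, haystack: str) -> tuple[float, int, int]:
    """Hypothetical helper: return (distance, start, stop) of the best window.

    `stop` is exclusive, so haystack[start:stop] is the matched text.
    Brute force: one edit-distance computation per window.
    """
    n = len(needle)
    if n > len(haystack):
        # Simplification for the sketch: compare the whole strings.
        return normalized_levenshtein(needle, haystack), 0, len(haystack)
    best = (float("inf"), -1, -1)
    for start in range(len(haystack) - n + 1):
        # Slicing by index is what requires an indexable collection.
        d = normalized_levenshtein(needle, haystack[start:start + n])
        if d < best[0]:  # strict '<' keeps the earliest window on ties
            best = (d, start, start + n)
    return best


def find_all_partial(needle: str, haystack: str, max_dist: float) -> list[tuple[int, int]]:
    """Hypothetical helper: return (start, stop) for every window within max_dist."""
    n = len(needle)
    return [
        (start, start + n)
        for start in range(len(haystack) - n + 1)
        if normalized_levenshtein(needle, haystack[start:start + n]) <= max_dist
    ]


if __name__ == "__main__":
    doc = "the keywrod appears here"
    d, start, stop = find_nearest_partial("keyword", doc)
    print(d, start, stop, repr(doc[start:stop]))
    print(find_all_partial("keyword", doc, max_dist=0.3))
```

Because the returned range points back into the document, `doc[start:stop]` shows the matched text (`"keywrod"` here), which is what makes it easy to tell a typo from an unrelated word. An iterator alone could still report the best distance, but not where the match sits.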
Closes #29, which was an initial version of this.