Status: Open
Labels: internal, meta-issue
Description
In its current form, the fuzzy join has limitations in match quality, scalability, and runtime performance.
This issue collects related issues aimed at improving the fuzzy join.
- ENH - Allow using a different distance for the nearest neighbors in fuzzy join #869
  - We might want to implement the distance suggested in that issue, or other distance metrics.
- fuzzy_join: dense vs sparse arrays #558
- Add a fuzzy version of the AggJoiner #1289
- Optimize the runtime performance of the fuzzy join
  - This would involve benchmarking and profiling the current version.
- Implement a stateless, numeric-only fuzzy join
  - This would be useful for joins on geographic data, for example. Whatever we implement won't match a dedicated solution, but it should be "good enough" while avoiding heavy dependencies such as geopandas or full GIS databases.
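A stateless, numeric-only fuzzy join could look roughly like the minimal sketch below. It assumes the numeric key columns have already been extracted as arrays. It uses brute-force nearest-neighbour search in NumPy and lets the caller pick the distance, either Euclidean or haversine for (latitude, longitude) pairs, in the spirit of #869. `numeric_fuzzy_join`, its `metric` and `max_dist` parameters, and the helper functions are hypothetical illustrations, not part of skrub's API.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def _euclidean(a, b):
    # Pairwise Euclidean distances between rows of a (n, d) and b (m, d) -> (n, m)
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def _haversine(a, b):
    # Rows hold (latitude, longitude) in degrees; returns great-circle distance in km
    lat1, lon1 = np.radians(a[:, 0])[:, None], np.radians(a[:, 1])[:, None]
    lat2, lon2 = np.radians(b[:, 0])[None, :], np.radians(b[:, 1])[None, :]
    h = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(h, 0.0, 1.0)))


METRICS = {"euclidean": _euclidean, "haversine": _haversine}


def numeric_fuzzy_join(left, right, metric="euclidean", max_dist=None):
    """Hypothetical sketch: match each row of `left` to its nearest row of `right`.

    Stateless: nothing is fitted or stored between calls.
    Returns (indices, distances); an index is -1 when the nearest
    match is farther than `max_dist`.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    # Brute force: builds the full (n_left, n_right) distance matrix
    dist = METRICS[metric](left, right)
    idx = dist.argmin(axis=1)
    best = dist[np.arange(len(left)), idx]
    if max_dist is not None:
        idx = np.where(best <= max_dist, idx, -1)
    return idx, best


# Usage: match two cities to the closest of three points
cities = [(48.8566, 2.3522), (51.5074, -0.1278)]      # Paris, London
points = [(51.5, -0.12), (48.85, 2.35), (40.71, -74.0)]
idx, dist_km = numeric_fuzzy_join(cities, points, metric="haversine")
print(idx)  # → [1 0]
```

The brute-force distance matrix keeps the sketch dependency-light but costs O(n_left × n_right) memory. For larger tables, a tree index such as scikit-learn's `BallTree`, which supports the haversine metric, would be a natural replacement.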