Address discrepancies in docs #36

Merged: 1 commit, Apr 23, 2024
README.md (9 changes: 3 additions & 6 deletions)
@@ -54,7 +54,7 @@ target = rng.rand(100,50)
# fit and get neighbors
k_inst = Kiez()
k_inst.fit(source, target)
-nn_dist, nn_ind = k_inst.kneighbors()
+nn_dist, nn_ind = k_inst.kneighbors(5)
```
Using (A)NN libraries and hubness reduction methods:
``` python
@@ -65,18 +65,16 @@ rng = np.random.RandomState(0)
source = rng.rand(100,50)
target = rng.rand(100,50)
# prepare algorithm and hubness reduction
-algo_kwargs = {"n_candidates": 10}
-k_inst = Kiez(n_neighbors=5, algorithm="Faiss" algorithm_kwargs=algo_kwargs, hubness="CSLS")
+k_inst = Kiez(n_candidates=10, algorithm="Faiss", hubness="CSLS")
# fit and get neighbors
k_inst.fit(source, target)
-nn_dist, nn_ind = k_inst.kneighbors()
+nn_dist, nn_ind = k_inst.kneighbors(5)
```

## Torch Support
Beginning with version 0.5.0 torch can be used, when using `Faiss` as NN library:

```python
-
from kiez import Kiez
import torch
source = torch.randn((100,10))
@@ -89,7 +87,6 @@ Beginning with version 0.5.0 torch can be used, when using `Faiss` as NN library
You can also utilize tensor on the GPU:

```python
-
k_inst = Kiez(algorithm="Faiss", algorithm_kwargs={"use_gpu":True}, hubness="CSLS")
k_inst.fit(source.cuda(), target.cuda())
nn_dist, nn_ind = k_inst.kneighbors()
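For quick reference, the corrected README usage can be assembled into a single runnable snippet. This is only a sketch based on the lines added above; it assumes numpy is installed and that kiez was installed with the Faiss backend:

```python
import numpy as np
from kiez import Kiez

# toy source and target embeddings, as in the README example
rng = np.random.RandomState(0)
source = rng.rand(100, 50)
target = rng.rand(100, 50)

# candidate search via Faiss, hubness reduction via CSLS
k_inst = Kiez(n_candidates=10, algorithm="Faiss", hubness="CSLS")
k_inst.fit(source, target)

# distances and indices of the 5 nearest target neighbors per source entity
nn_dist, nn_ind = k_inst.kneighbors(5)
```

The torch variant shown in the same README section follows the same pattern: torch tensors are passed to `fit`, and `algorithm_kwargs={"use_gpu": True}` moves the Faiss index to the GPU.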
docs/index.rst (10 changes: 3 additions & 7 deletions)
@@ -21,7 +21,7 @@ The central class of kiez serves to bundle all necessary steps to obtain nearest
# fit and get neighbors
k_inst = Kiez()
k_inst.fit(source, target)
-nn_dist, nn_ind = k_inst.kneighbors()
+nn_dist, nn_ind = k_inst.kneighbors(5)

The main feature of kiez lies in the ability to use hubness reduction methods and approximate nearest neighbor (ANN) algorithms. This enables you to profit from the speed advantage of ANN algorithms, while achieving highly accurate nearest neighbor results:

@@ -34,14 +34,10 @@ The main feature of kiez lies in the ability to use hubness reduction methods an
source = rng.rand(100,50)
target = rng.rand(100,50)
# prepare algorithm and hubness reduction
-from kiez.neighbors import HNSW
-hnsw = HNSW(n_candidates=10)
-from kiez.hubness_reduction import CSLS
-hr = CSLS()
+k_inst = Kiez(n_candidates=10, algorithm="Faiss", hubness="CSLS")
# fit and get neighbors
-k_inst = Kiez(n_neighbors=5, algorithm=hnsw, hubness=hr)
k_inst.fit(source, target)
-nn_dist, nn_ind = k_inst.kneighbors()
+nn_dist, nn_ind = k_inst.kneighbors(5)

You can install kiez via pip:

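Likewise, the corrected default-settings example from index.rst, collected into one snippet. This is a sketch assuming only numpy and kiez are installed; the backend is picked automatically, with Faiss preferred when available:

```python
import numpy as np
from kiez import Kiez

rng = np.random.RandomState(0)
source = rng.rand(100, 50)
target = rng.rand(100, 50)

# default settings: the NN backend is chosen automatically
k_inst = Kiez()
k_inst.fit(source, target)
nn_dist, nn_ind = k_inst.kneighbors(5)
```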
docs/source/usage.rst (21 changes: 6 additions & 15 deletions)
@@ -9,8 +9,8 @@ The `Kiez` class enables the usage of different nearest neighbor (NN) algorithms

# via string and arguments as dict
k_inst = Kiez(
algorithm="HNSW",
algorithm_kwargs={"n_candidates": 10},
algorithm="SklearnNN",
n_candidates=10,
hubness="LocalScaling",
hubness_kwargs={"method": "NICDM"},
)
@@ -20,8 +20,8 @@ The `Kiez` class enables the usage of different nearest neighbor (NN) algorithms
from kiez.neighbors import HNSW

k_inst = Kiez(
-algorithm=HNSW,
-algorithm_kwargs={"n_candidates": 10},
+algorithm=SklearnNN,
+n_candidates=10,
hubness=LocalScaling,
hubness_kwargs={"method": "NICDM"},
)
@@ -35,7 +35,7 @@ The `Kiez` class enables the usage of different nearest neighbor (NN) algorithms

# content of 'conf.json' file
# {
# "algorithm": "HNSW",
# "algorithm": "SklearnNN",
# "algorithm_kwargs": {
# "n_candidates": 10
# },
@@ -57,17 +57,10 @@ With your initialized kiez instance you are ready to fit your data and retrieve
source = rng.rand(100,50)
target = rng.rand(100,50)
k_inst.fit(source, target)
-neigh_dist, neigh_ind = k_inst.kneighbors()
+neigh_dist, neigh_ind = k_inst.kneighbors(5)

This will retrieve all nearest neighbors of source entities in the target entities.

-You can also query for specific entities and a specific number of k neighbors:
-
-.. code-block:: python
-
-neigh_dist, neigh_ind = k_inst.kneighbors()
-# get 2 nearest neighbors of the first 5 source entities
-k_inst.kneighbors(source_query_points=source[:5,:], k=2)

Single source case
-------------------
@@ -82,8 +75,6 @@ While the main focus of kiez is to be part of an embedding-based entity resoluti

# get the nearest neighbors of all source entities amongst themselves
k_inst.kneighbors()
-# get 2 nearest neighbors of the first 5 source entities
-k_inst.kneighbors(source_query_points=source[:5,:], k=2)

Evaluation
----------
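For orientation, the string-based instantiation from the updated usage.rst, pieced together into one runnable snippet. A sketch only; it assumes scikit-learn is installed so the SklearnNN backend is available, and it reuses the random data from the earlier examples:

```python
import numpy as np
from kiez import Kiez

# instantiate via strings and keyword arguments, as in the updated usage.rst
k_inst = Kiez(
    algorithm="SklearnNN",
    n_candidates=10,
    hubness="LocalScaling",
    hubness_kwargs={"method": "NICDM"},
)

rng = np.random.RandomState(0)
source = rng.rand(100, 50)
target = rng.rand(100, 50)

# all nearest neighbors of source entities among the target entities
k_inst.fit(source, target)
neigh_dist, neigh_ind = k_inst.kneighbors(5)
```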
kiez/kiez.py (18 changes: 5 additions & 13 deletions)
@@ -23,8 +23,8 @@ class Kiez:

Parameters
----------
-n_neighbors : int, default=5
-number of nearest neighbors used in search
+n_candidates : int, default=10
+number of nearest neighbors used for candidate search
algorithm : :obj:`~kiez.neighbors.NNAlgorithm`, default = None
initialised `NNAlgorithm` object that will be used for neighbor search
If no algorithm is provided :obj:`~kiez.neighbors.Faiss` is used if available else
@@ -53,7 +53,7 @@ class Kiez:
>>> # fit and get neighbors
>>> k_inst = Kiez()
>>> k_inst.fit(source, target)
->>> nn_dist, nn_ind = k_inst.kneighbors()
+>>> nn_dist, nn_ind = k_inst.kneighbors(5)

Using a specific algorithm and hubness reduction

@@ -64,18 +64,10 @@
>>> source = rng.rand(100,50)
>>> target = rng.rand(100,50)
>>> # prepare algorithm and hubness reduction
->>> from kiez.neighbors import NMSLIB
->>> hnsw = NMSLIB(n_candidates=10)
->>> from kiez.hubness_reduction import CSLS
->>> hr = CSLS()
+>>> k_inst = Kiez(n_candidates=10, algorithm="Faiss", hubness="CSLS")
>>> # fit and get neighbors
->>> k_inst = Kiez(n_neighbors=5, algorithm=hnsw, hubness=hr)
>>> k_inst.fit(source, target)
->>> nn_dist, nn_ind = k_inst.kneighbors()
-
-NN and hubness algorithms can also be supplied via string:
-
->>> k_inst = Kiez(algorithm="SklearnNN", hubness="CSLS")
+>>> nn_dist, nn_ind = k_inst.kneighbors(5)

You can investigate which NN algos are installed and which hubness methods are implemented with:

tests/example_conf.json (2 changes: 1 addition & 1 deletion)
@@ -1,5 +1,5 @@
{
"algorithm": "NMSLIB",
"algorithm": "SklearnNN",
"algorithm_kwargs": {
"n_candidates": 10
},
tests/test_kiez.py (4 changes: 3 additions & 1 deletion)
@@ -121,7 +121,9 @@ def test_from_config():
assert isinstance(kiez.hubness, LocalScaling), f"wrong hubness: {kiez.hubness}"
assert kiez.algorithm is not None
assert isinstance(kiez.algorithm, NNAlgorithm)
-assert isinstance(kiez.algorithm, NMSLIB), f"wrong algorithm: {kiez.algorithm}"
+assert isinstance(
+    kiez.algorithm, SklearnNN
+), f"wrong algorithm: {kiez.algorithm}"


def mock_make(name, algorithm_kwargs):