Address discrepancies in docs (#36)
dobraczka authored Apr 23, 2024
1 parent 4c965ba commit f58ed3b
Showing 6 changed files with 21 additions and 43 deletions.
9 changes: 3 additions & 6 deletions README.md
@@ -54,7 +54,7 @@ target = rng.rand(100,50)
# fit and get neighbors
k_inst = Kiez()
k_inst.fit(source, target)
-nn_dist, nn_ind = k_inst.kneighbors()
+nn_dist, nn_ind = k_inst.kneighbors(5)
```
Using (A)NN libraries and hubness reduction methods:
``` python
@@ -65,18 +65,16 @@ rng = np.random.RandomState(0)
source = rng.rand(100,50)
target = rng.rand(100,50)
# prepare algorithm and hubness reduction
-algo_kwargs = {"n_candidates": 10}
-k_inst = Kiez(n_neighbors=5, algorithm="Faiss" algorithm_kwargs=algo_kwargs, hubness="CSLS")
+k_inst = Kiez(n_candidates=10, algorithm="Faiss", hubness="CSLS")
# fit and get neighbors
k_inst.fit(source, target)
-nn_dist, nn_ind = k_inst.kneighbors()
+nn_dist, nn_ind = k_inst.kneighbors(5)
```

## Torch Support
Beginning with version 0.5.0 torch can be used, when using `Faiss` as NN library:

```python
-
from kiez import Kiez
import torch
source = torch.randn((100,10))
@@ -89,7 +87,6 @@ Beginning with version 0.5.0 torch can be used, when using `Faiss` as NN library
You can also utilize tensor on the GPU:

```python
-
k_inst = Kiez(algorithm="Faiss", algorithm_kwargs={"use_gpu":True}, hubness="CSLS")
k_inst.fit(source.cuda(), target.cuda())
nn_dist, nn_ind = k_inst.kneighbors()
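The corrected README snippets configure `hubness="CSLS"`. As background for readers of this diff, a rough numpy sketch of the CSLS idea (cross-domain similarity local scaling: penalize "hub" points by their mean similarity to their k nearest cross-domain neighbors) follows; this illustrates the general technique only, not kiez's implementation, and `csls_rescale` is a hypothetical helper name:

```python
import numpy as np

def csls_rescale(sim, k=10):
    """Rescale a source-x-target similarity matrix with the CSLS formula.

    csls(x, y) = 2*sim(x, y) - r_src(x) - r_tgt(y), where r_* is the mean
    similarity to the k most similar cross-domain points (the hub penalty).
    """
    # mean similarity of each source point to its k most similar targets
    r_src = np.sort(sim, axis=1)[:, -k:].mean(axis=1, keepdims=True)
    # mean similarity of each target point to its k most similar sources
    r_tgt = np.sort(sim, axis=0)[-k:, :].mean(axis=0, keepdims=True)
    return 2 * sim - r_src - r_tgt

rng = np.random.RandomState(0)
source = rng.rand(100, 50)
target = rng.rand(100, 50)
# cosine similarities between all source/target pairs
src_n = source / np.linalg.norm(source, axis=1, keepdims=True)
tgt_n = target / np.linalg.norm(target, axis=1, keepdims=True)
sim = src_n @ tgt_n.T
# 5 neighbors per source entity after hubness-aware rescaling
nn_ind = np.argsort(-csls_rescale(sim), axis=1)[:, :5]
```

On a constant similarity matrix the rescaled scores collapse to zero, which makes the hub penalty easy to sanity-check.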
10 changes: 3 additions & 7 deletions docs/index.rst
@@ -21,7 +21,7 @@ The central class of kiez serves to bundle all necessary steps to obtain nearest
# fit and get neighbors
k_inst = Kiez()
k_inst.fit(source, target)
-nn_dist, nn_ind = k_inst.kneighbors()
+nn_dist, nn_ind = k_inst.kneighbors(5)
The main feature of kiez lies in the ability to use hubness reduction methods and approximate nearest neighbor (ANN) algorithms. This enables you to profit from the speed advantage of ANN algorithms, while achieving highly accurate nearest neighbor results:

@@ -34,14 +34,10 @@ The main feature of kiez lies in the ability to use hubness reduction methods an
source = rng.rand(100,50)
target = rng.rand(100,50)
# prepare algorithm and hubness reduction
-from kiez.neighbors import HNSW
-hnsw = HNSW(n_candidates=10)
-from kiez.hubness_reduction import CSLS
-hr = CSLS()
+k_inst = Kiez(n_candidates=10, algorithm="Faiss", hubness="CSLS")
# fit and get neighbors
-k_inst = Kiez(n_neighbors=5, algorithm=hnsw, hubness=hr)
k_inst.fit(source, target)
-nn_dist, nn_ind = k_inst.kneighbors()
+nn_dist, nn_ind = k_inst.kneighbors(5)
You can install kiez via pip:

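The new `Kiez(n_candidates=10, ...)` signature folds candidate retrieval into the constructor: the backend fetches `n_candidates` neighbors per query, hubness reduction reranks them, and only the `k` passed to `kneighbors(k)` are returned. A minimal numpy sketch of that retrieve-then-rerank pattern (brute force standing in for an ANN index, an identity reranker standing in for hubness reduction; `two_stage_knn` is a hypothetical helper, not kiez code):

```python
import numpy as np

def two_stage_knn(source, target, n_candidates=10, k=5):
    """Retrieve n_candidates per query, rerank, return the top k (k <= n_candidates)."""
    # stage 1: candidate retrieval (brute force here; an ANN index in practice)
    dist = np.linalg.norm(source[:, None, :] - target[None, :, :], axis=2)
    cand = np.argsort(dist, axis=1)[:, :n_candidates]
    # stage 2: rerank the candidates (identity here; hubness reduction in practice)
    cand_dist = np.take_along_axis(dist, cand, axis=1)
    order = np.argsort(cand_dist, axis=1)[:, :k]
    return (np.take_along_axis(cand_dist, order, axis=1),
            np.take_along_axis(cand, order, axis=1))

rng = np.random.RandomState(0)
nn_dist, nn_ind = two_stage_knn(rng.rand(100, 50), rng.rand(100, 50))
```

Keeping `n_candidates` larger than `k` gives the reranking stage room to promote neighbors the raw distances would have ranked too low.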
21 changes: 6 additions & 15 deletions docs/source/usage.rst
@@ -9,8 +9,8 @@ The `Kiez` class enables the usage of different nearest neighbor (NN) algorithms
# via string and arguments as dict
k_inst = Kiez(
-algorithm="HNSW",
-algorithm_kwargs={"n_candidates": 10},
+algorithm="SklearnNN",
+n_candidates=10,
hubness="LocalScaling",
hubness_kwargs={"method": "NICDM"},
)
@@ -20,8 +20,8 @@ The `Kiez` class enables the usage of different nearest neighbor (NN) algorithms
from kiez.neighbors import HNSW
k_inst = Kiez(
-algorithm=HNSW,
-algorithm_kwargs={"n_candidates": 10},
+algorithm=SklearnNN,
+n_candidates=10,
hubness=LocalScaling,
hubness_kwargs={"method": "NICDM"},
)
@@ -35,7 +35,7 @@ The `Kiez` class enables the usage of different nearest neighbor (NN) algorithms
# content of 'conf.json' file
# {
-# "algorithm": "HNSW",
+# "algorithm": "SklearnNN",
# "algorithm_kwargs": {
# "n_candidates": 10
# },
@@ -57,17 +57,10 @@ With your initialized kiez instance you are ready to fit your data and retrieve
source = rng.rand(100,50)
target = rng.rand(100,50)
k_inst.fit(source, target)
-neigh_dist, neigh_ind = k_inst.kneighbors()
+neigh_dist, neigh_ind = k_inst.kneighbors(5)
This will retrieve all nearest neighbors of source entities in the target entities.

-You can also query for specific entities and a specific number of k neighbors:
-
-.. code-block:: python
-    neigh_dist, neigh_ind = k_inst.kneighbors()
-    # get 2 nearest neighbors of the first 5 source entities
-    k_inst.kneighbors(source_query_points=source[:5,:], k=2)

Single source case
-------------------
@@ -82,8 +75,6 @@ While the main focus of kiez is to be part of an embedding-based entity resoluti
# get the nearest neighbors of all source entities amongst themselves
k_inst.kneighbors()
# get 2 nearest neighbors of the first 5 source entities
k_inst.kneighbors(source_query_points=source[:5,:], k=2)
Evaluation
----------
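The single-source example above queries a subset via `source_query_points=source[:5,:], k=2`. A self-contained numpy sketch of such subset queries may help readers of the diff (a hypothetical `kneighbors` helper, not kiez's method; note this sketch keeps each point as its own nearest neighbor, which libraries often exclude in the single-source case):

```python
import numpy as np

def kneighbors(index_points, query_points, k):
    """Return distances and indices of the k nearest index_points per query point."""
    dist = np.linalg.norm(query_points[:, None, :] - index_points[None, :, :], axis=2)
    ind = np.argsort(dist, axis=1)[:, :k]
    return np.take_along_axis(dist, ind, axis=1), ind

rng = np.random.RandomState(0)
source = rng.rand(100, 50)
# single-source case: neighbors of all source entities amongst themselves
all_dist, all_ind = kneighbors(source, source, k=5)
# query only the first 5 source entities for 2 neighbors each
sub_dist, sub_ind = kneighbors(source, source[:5, :], k=2)
```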
18 changes: 5 additions & 13 deletions kiez/kiez.py
@@ -23,8 +23,8 @@ class Kiez:
Parameters
----------
-n_neighbors : int, default=5
-    number of nearest neighbors used in search
+n_candidates : int, default=10
+    number of nearest neighbors used for candidate search
algorithm : :obj:`~kiez.neighbors.NNAlgorithm`, default = None
initialised `NNAlgorithm` object that will be used for neighbor search
If no algorithm is provided :obj:`~kiez.neighbors.Faiss` is used if available else
@@ -53,7 +53,7 @@ class Kiez:
>>> # fit and get neighbors
>>> k_inst = Kiez()
>>> k_inst.fit(source, target)
->>> nn_dist, nn_ind = k_inst.kneighbors()
+>>> nn_dist, nn_ind = k_inst.kneighbors(5)
Using a specific algorithm and hubness reduction
@@ -64,18 +64,10 @@ class Kiez:
>>> source = rng.rand(100,50)
>>> target = rng.rand(100,50)
>>> # prepare algorithm and hubness reduction
->>> from kiez.neighbors import NMSLIB
->>> hnsw = NMSLIB(n_candidates=10)
->>> from kiez.hubness_reduction import CSLS
->>> hr = CSLS()
+>>> k_inst = Kiez(n_candidates=10, algorithm="Faiss", hubness="CSLS")
>>> # fit and get neighbors
->>> k_inst = Kiez(n_neighbors=5, algorithm=hnsw, hubness=hr)
>>> k_inst.fit(source, target)
->>> nn_dist, nn_ind = k_inst.kneighbors()
-NN and hubness algorithms can also be supplied via string:
->>> k_inst = Kiez(algorithm="SklearnNN", hubness="CSLS")
+>>> nn_dist, nn_ind = k_inst.kneighbors(5)
You can investigate which NN algos are installed and which hubness methods are implemented with:
2 changes: 1 addition & 1 deletion tests/example_conf.json
@@ -1,5 +1,5 @@
{
-"algorithm": "NMSLIB",
+"algorithm": "SklearnNN",
"algorithm_kwargs": {
"n_candidates": 10
},
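The updated `tests/example_conf.json` mirrors the config structure shown in `docs/source/usage.rst`. A small stdlib-only sketch of round-tripping such a config file into constructor kwargs follows; how kiez itself loads the file is not shown in this diff (the test name `test_from_config` only suggests a loader exists), so the final commented call is a hypothetical illustration:

```python
import json
import os
import tempfile

# the config shape used by tests/example_conf.json after this commit
conf = {
    "algorithm": "SklearnNN",
    "algorithm_kwargs": {"n_candidates": 10},
    "hubness": "LocalScaling",
    "hubness_kwargs": {"method": "NICDM"},
}

# write the config to disk, then read it back as plain kwargs
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(conf, f)
with open(f.name) as fh:
    loaded = json.load(fh)
os.unlink(f.name)

# with kiez installed this would be roughly: Kiez(**loaded)  # hypothetical usage
```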
4 changes: 3 additions & 1 deletion tests/test_kiez.py
@@ -121,7 +121,9 @@ def test_from_config():
assert isinstance(kiez.hubness, LocalScaling), f"wrong hubness: {kiez.hubness}"
assert kiez.algorithm is not None
assert isinstance(kiez.algorithm, NNAlgorithm)
-assert isinstance(kiez.algorithm, NMSLIB), f"wrong algorithm: {kiez.algorithm}"
+assert isinstance(
+    kiez.algorithm, SklearnNN
+), f"wrong algorithm: {kiez.algorithm}"


def mock_make(name, algorithm_kwargs):
