Update docs #7

dobraczka · 2021-07-28T07:46:52Z

Explain the new possibilities enabled by class-resolver
Incorporate the architecture picture with an explanation
Link to Read the Docs in the README
Give new examples (including for single-source)

cthoyt · 2021-07-28T07:55:19Z

It would be great to add an example where Kiez is applied to embeddings coming from PyKEEN. This tutorial shows how to get embeddings out after training. The following example shows it for TransE:

from pykeen.pipeline import pipeline

results = pipeline(
    model='TransE',
    dataset='Nations',
    epochs=1,  # change this to ~25 for real usage on Nations
)

entity_embeddings = results.model.entity_representations[0]()  # shape: torch.Size([14, 50])
relation_embeddings = results.model.relation_representations[0]()  # shape: torch.Size([55, 50])

These are torch tensors, which can be converted to NumPy ndarrays of the same dimensions with:

# .cpu() is a no-op on CPU tensors and is required if the model was trained on a GPU
entity_embeddings = results.model.entity_representations[0]().detach().cpu().numpy()
relation_embeddings = results.model.relation_representations[0]().detach().cpu().numpy()

A kiez API function could probably do some logic to accept either a pipeline result, model, or pykeen embedding, like:

from typing import Union

from pykeen.models import Model
from pykeen.nn import Embedding
from pykeen.pipeline import PipelineResult


def from_pykeen(model: Union[Model, PipelineResult, Embedding]):
    # Pick the entity representation regardless of which object was passed in
    if isinstance(model, Embedding):
        representation = model
    elif isinstance(model, Model):
        representation = model.entity_representations[0]
    elif isinstance(model, PipelineResult):
        representation = model.model.entity_representations[0]
    else:
        raise TypeError(f"Unsupported input type: {type(model).__name__}")
    # Move to CPU before converting, in case the model was trained on a GPU
    arr = representation().detach().cpu().numpy()
    ...
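
The dispatch logic above can be tried out without installing PyKEEN. The following is a minimal sketch, assuming hypothetical stand-in classes (`FakeEmbedding`, `FakeModel`, `FakePipelineResult`) that only mimic the attributes used in the proposal; they are not PyKEEN classes, and `resolve_representation` is an illustrative name, not an existing Kiez function.

```python
from dataclasses import dataclass
from typing import List, Union

Matrix = List[List[float]]


@dataclass
class FakeEmbedding:
    """Hypothetical stand-in for a PyKEEN embedding: calling it returns the matrix."""

    weights: Matrix

    def __call__(self) -> Matrix:
        return self.weights


@dataclass
class FakeModel:
    """Hypothetical stand-in for a PyKEEN model."""

    entity_representations: List[FakeEmbedding]


@dataclass
class FakePipelineResult:
    """Hypothetical stand-in for a PyKEEN pipeline result."""

    model: FakeModel


def resolve_representation(
    obj: Union[FakeEmbedding, FakeModel, FakePipelineResult],
) -> FakeEmbedding:
    """Return the entity representation for any of the three accepted inputs."""
    if isinstance(obj, FakeEmbedding):
        return obj
    if isinstance(obj, FakeModel):
        return obj.entity_representations[0]
    if isinstance(obj, FakePipelineResult):
        return obj.model.entity_representations[0]
    raise TypeError(f"Unsupported input type: {type(obj).__name__}")


embedding = FakeEmbedding([[0.0, 1.0], [1.0, 0.0]])
model = FakeModel([embedding])
result = FakePipelineResult(model)

# All three entry points resolve to the same underlying representation
for candidate in (embedding, model, result):
    print(type(candidate).__name__, resolve_representation(candidate)())
```

Whichever object the caller has at hand, the same matrix comes out, which is the convenience the proposed function is after.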

dobraczka · 2021-07-30T14:08:18Z

The main purpose of Kiez (for now) is entity resolution: given entity embeddings from two data sources (source and target), find the nearest neighbors of the source entities in the target space.
However, using it to find nearest neighbors within a single source is technically already possible, e.g.:

from kiez import Kiez
source = ...  # get embeddings from somewhere
k_inst = Kiez()
k_inst.fit(source, source)
k_inst.kneighbors()
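
To make the difference between the two-source and single-source cases concrete, here is a brute-force sketch in plain NumPy. It is not how Kiez computes neighbors; `brute_force_kneighbors` is a hypothetical helper and the toy 2-D arrays are made up purely for illustration.

```python
import numpy as np


def brute_force_kneighbors(source: np.ndarray, target: np.ndarray, k: int = 1):
    """Return (distances, indices) of the k nearest target rows for each source row."""
    # Pairwise Euclidean distances, shape (n_source, n_target)
    diff = source[:, None, :] - target[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Indices of the k smallest distances per source row, closest first
    idx = np.argsort(dist, axis=1, kind="stable")[:, :k]
    return np.take_along_axis(dist, idx, axis=1), idx


source = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
target = np.array([[0.1, 0.0], [5.0, 4.0], [1.0, 0.2]])

# Two sources: each source entity is matched against the target space
two_source_dist, two_source_idx = brute_force_kneighbors(source, target, k=1)
print(two_source_idx.ravel())

# Single source: querying a set against itself returns each point as its own
# nearest neighbor (distance 0), so the second column holds the first real neighbor
single_dist, single_idx = brute_force_kneighbors(source, source, k=2)
print(single_idx)
```

In this sketch, a single-source query has to ask for one extra neighbor and drop the first column, which is one reason a dedicated single-source entry point could be more intuitive than passing the same array twice.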

I want to adapt the API to make this use case more intuitive. As part of that, I would add the example you mentioned to the docs and implement the convenience function for PyKEEN.

cthoyt · 2021-07-30T14:48:51Z

Oh, I see. There are a few PyKEEN datasets that are constructed such that they contain two knowledge graphs with support edges linking the same entity in each (like an English and a German version of the same graph, with different completeness), but none of them are directly accessible to give a really good example at the moment.

dobraczka · 2021-08-09T10:09:57Z

I will close this, and we can continue to talk about use cases in #11.

cthoyt · 2021-08-09T11:24:41Z

@dobraczka great! Looking forward to it.

dobraczka added the 📜 documentation label Jul 28, 2021

dobraczka self-assigned this Jul 28, 2021

dobraczka mentioned this issue Jul 30, 2021

Improve single source handling #8

Closed

dobraczka closed this as completed Aug 9, 2021
