Use case: assessing goodness of fit between two PyKEEN models #11
Comments
I'm afraid this would not really work properly, because Kiez assumes that the two embeddings are in the same space; based on that, it performs the hubness reduction and returns the k nearest neighbors of the source entities in the target entity space. However, what you are proposing is a really interesting investigation into how similar the results of different embedding approaches are.
from kiez import Kiez
from pykeen.datasets import Nations
from pykeen.pipeline import pipeline
dataset = Nations()
r1 = pipeline(model="TransE", dataset=dataset)
r2 = pipeline(model="PairRE", dataset=dataset)
k_inst_transe = Kiez()
# old single-source api usage since I haven't released the patch yet
k_inst_transe.fit(
    r1.model.entity_representations[0]().detach().numpy(),
    r1.model.entity_representations[0]().detach().numpy(),
)
k_inst_pairre = Kiez()
k_inst_pairre.fit(
    r2.model.entity_representations[0]().detach().numpy(),
    r2.model.entity_representations[0]().detach().numpy(),
)
transe_k_neighbors = k_inst_transe.kneighbors(return_distance=False)
pairre_k_neighbors = k_inst_pairre.kneighbors(return_distance=False)
def overlap(left_neighbors, right_neighbors):
    # average, over all entities, of the share of neighbors that match position by position
    perfect_matching_rolling_percentage_sum = 0
    for l_ent_neighbors, r_ent_neighbors in zip(left_neighbors, right_neighbors):
        matching = 0
        # a neighbor only counts if it sits at the same rank in both spaces
        for l, r in zip(l_ent_neighbors, r_ent_neighbors):
            if l == r:
                matching += 1
        perfect_matching_rolling_percentage_sum += matching / len(l_ent_neighbors)
    return perfect_matching_rolling_percentage_sum / len(left_neighbors)
print(overlap(transe_k_neighbors, pairre_k_neighbors))
But I think there should be a more clever metric that takes into account how distantly the respective neighbors are ranked. I'd have to think about that one...
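One possible rank-aware option, offered only as a sketch and not as something Kiez provides, is "average overlap": for each depth d from 1 to k, take the share of entities common to both top-d lists, then average over all depths. Agreement near the top of the lists then counts more than agreement near the bottom. The function names below are hypothetical, and the sketch assumes each neighbor list is ordered from nearest to farthest.

```python
import numpy as np


def average_overlap(left_ranked, right_ranked):
    """Average overlap of two ranked neighbor lists (nearest first, by assumption)."""
    k = min(len(left_ranked), len(right_ranked))
    if k == 0:
        return 0.0
    left_seen, right_seen = set(), set()
    total = 0.0
    for depth in range(1, k + 1):
        # grow both top-d sets by one element each
        left_seen.add(int(left_ranked[depth - 1]))
        right_seen.add(int(right_ranked[depth - 1]))
        # agreement at this depth: shared entities divided by depth
        total += len(left_seen & right_seen) / depth
    return total / k


def mean_average_overlap(left_neighbors, right_neighbors):
    """Mean of the per-entity average overlap across all entities."""
    scores = [
        average_overlap(left, right)
        for left, right in zip(left_neighbors, right_neighbors)
    ]
    return float(np.mean(scores))
```

With the arrays from the snippet above, this could be called as `mean_average_overlap(transe_k_neighbors, pairre_k_neighbors)`. Unlike the position-by-position comparison, a neighbor that appears in both lists at slightly different ranks still contributes to the score.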
If I have two different embedding spaces describing the same entities, for example if I train two models on the same dataset in PyKEEN, how can I use Kiez to assess how well they correspond? Or maybe there's a notion of how "good" the Kiez fit is?
A naive idea is that I could iterate through each entity and calculate the overlap coefficient of the nearest neighbors in both embedding spaces, then maybe report the average overlap coefficient. I'm sure I could come up with a few things like this, but I bet you know better! Any ideas appreciated.
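The naive idea above could be sketched roughly as follows. It uses the standard overlap coefficient, |A ∩ B| / min(|A|, |B|), on the neighbor sets of each entity and averages the result. The function name is hypothetical, and the inputs are assumed to be two arrays of neighbor indices with one row per entity, in the same entity order for both embedding spaces.

```python
import numpy as np


def mean_overlap_coefficient(left_neighbors, right_neighbors):
    """Average set-based overlap coefficient of neighbor lists over all entities."""
    scores = []
    for left, right in zip(left_neighbors, right_neighbors):
        left_set = set(map(int, left))
        right_set = set(map(int, right))
        smaller = min(len(left_set), len(right_set))
        if smaller == 0:
            # no neighbors on one side: treat as no overlap
            scores.append(0.0)
            continue
        scores.append(len(left_set & right_set) / smaller)
    return float(np.mean(scores))


# toy example: entity 0 has identical neighbor sets, entity 1 shares one of three
left = np.array([[1, 2, 3], [0, 2, 3]])
right = np.array([[3, 2, 1], [4, 5, 0]])
print(mean_overlap_coefficient(left, right))
```

Because this compares sets, it ignores the order in which the neighbors are ranked.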
I would start with code like this: