when "create_inverse_triples" is set to false for ConvE, the evaluation becomes extremely slow. #1482

Open
3 tasks done
g-yit opened this issue Nov 8, 2024 · 8 comments
Labels
bug Something isn't working

Comments


g-yit commented Nov 8, 2024

Describe the bug

In PyKEEN, when "create_inverse_triples" is set to false for ConvE, the evaluation becomes extremely slow.

How to reproduce

def replicate_conve():
    import pykeen.datasets
    import pykeen.models
    import pykeen.training
    import pykeen.optimizers
    import pykeen.evaluation
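    from pykeen.losses import BCEAfterSigmoidLoss
    # Reporter's own callback (not part of PyKEEN); same import as in the ConvKB script later in this thread
    from tests.ztest.callback.training_callback import ZTrainingCallback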

    # Load the FB15K dataset
    dataset = pykeen.datasets.FB15k(create_inverse_triples=False)
    # Set up the loss function
    loss = BCEAfterSigmoidLoss(reduction='mean')
    # Initialize the ConvE model with the specified parameters
    model = pykeen.models.ConvE(
        embedding_dim=200,
        # input_channels=1,
        output_channels=32,
        embedding_height=10,
        embedding_width=20,
        kernel_height=3,
        kernel_width=3,
        input_dropout=0.2,
        feature_map_dropout=0.2,
        output_dropout=0.3,
        apply_batch_normalization=True,
        entity_initializer='xavier_normal',
        relation_initializer='xavier_normal',
        triples_factory=dataset.training,
        loss=loss
    ).to("cuda")

    # Set up the optimizer
    optimizer = pykeen.optimizers.Adam(
        params=model.get_grad_params(),
        lr=0.001
    )
    # Configure the training loop
    training_loop = pykeen.training.LCWATrainingLoop(
        model=model,
        triples_factory=dataset.training,
        optimizer=optimizer,
    )
    eval_callback = ZTrainingCallback(evaluation_triples=dataset.validation.mapped_triples,
                                      full_test_evaluation_triples=dataset.testing.mapped_triples,
                                      additional_filter_triples=dataset.training.mapped_triples)

    # Train the model
    training_loop.train(
        triples_factory=dataset.training,
        num_epochs=3,
        batch_size=128,
        label_smoothing=0.1,
        use_tqdm_batch=False,
        callbacks=[eval_callback],
    )

    # Evaluate the model
    evaluator = pykeen.evaluation.RankBasedEvaluator(filtered=True)
    results = evaluator.evaluate(
        model=model,
        mapped_triples=dataset.testing.mapped_triples,
        additional_filter_triples=[
            dataset.training.mapped_triples,
            dataset.validation.mapped_triples
        ]
    )

    # Output the results
    print(results.to_dict())

Environment

1.11.1-dev

Additional information

No response

Issue Template Checks

  • This is not a feature request (use a different issue template if it is)
  • This is not a question (use the discussions forum instead)
  • I've read the text explaining why including environment information is important and understand if I omit this information that my issue will be dismissed
g-yit added the bug (Something isn't working) label Nov 8, 2024
Member

mberr commented Nov 8, 2024

Hi @g-yit,

This is unfortunately to be expected: ConvE's interaction function is designed for fast 1:n tail scoring and is quite inefficient for head scoring. In the original paper, they use inverse relations, so they only need tail scoring 😉

Background:

ConvE uses an interaction function of the form $\langle f(h, r), t \rangle$, where $f(h, r)$ is an expensive operation. When you score tails, $h$ and $r$ are fixed, so you compute $f(h, r)$ just once and take inner products with all candidate tails. For head scoring, every candidate head $h$ needs its own $f(h, r)$, so the expensive part runs once per entity.
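The asymmetry can be made concrete with a toy sketch. This is not PyKEEN's or ConvE's actual code: `CountingEncoder`, `score_all_tails`, and `score_all_heads` are hypothetical names, and the element-wise product is only a stand-in for the expensive $f(h, r)$. The sketch just counts how often that stand-in runs in each direction.

```python
from typing import Callable, List

Vector = List[float]


class CountingEncoder:
    """Hypothetical stand-in for the expensive f(h, r); counts its calls."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, h: Vector, r: Vector) -> Vector:
        self.calls += 1
        # Placeholder for the real work (reshape, convolution, projection).
        return [hi * ri for hi, ri in zip(h, r)]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def score_all_tails(f: Callable[[Vector, Vector], Vector],
                    h: Vector, r: Vector, entities: List[Vector]) -> List[float]:
    hr = f(h, r)  # h and r are fixed, so f runs once
    return [dot(hr, t) for t in entities]


def score_all_heads(f: Callable[[Vector, Vector], Vector],
                    r: Vector, t: Vector, entities: List[Vector]) -> List[float]:
    # Every candidate head needs its own f(h, r)
    return [dot(f(h, r), t) for h in entities]


entities = [[float(i), 1.0] for i in range(5)]
relation = [0.5, 2.0]

tail_encoder = CountingEncoder()
tail_scores = score_all_tails(tail_encoder, entities[0], relation, entities)

head_encoder = CountingEncoder()
head_scores = score_all_heads(head_encoder, relation, entities[0], entities)

print(tail_encoder.calls, head_encoder.calls)
```

With inverse triples, a head query (?, r, t) is rewritten as a tail query (t, r_inverse, ?), so only the cheap direction is ever needed.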

Author

g-yit commented Nov 8, 2024

import pykeen.models
import pykeen.training
import pykeen.optimizers
import pykeen.evaluation
from pykeen.datasets import FB15k237
from pykeen.losses import SoftplusLoss
from pykeen.regularizers import PowerSumRegularizer
from pykeen.sampling import BernoulliNegativeSampler
from tests.ztest.callback.training_callback import ZTrainingCallback

# Load the FB15k-237 dataset
dataset = FB15k237()
regularizer = PowerSumRegularizer(
    weight=0.0005,
    p=2.0,
    apply_only_once=True,
    normalize=False
)
# Set up the loss function
loss = SoftplusLoss(reduction='mean')
# Initialize the ConvKB model with the specified parameters
model = pykeen.models.ConvKB(
    embedding_dim=100,
    num_filters=50,
    hidden_dropout_rate=0.0,
    entity_initializer='xavier_uniform',
    relation_initializer='xavier_uniform',
    triples_factory=dataset.training,
    regularizer=regularizer,
    loss=loss,
).to("cuda")

# Set up the optimizer
optimizer = pykeen.optimizers.Adam(
    params=model.get_grad_params(),
    lr=5e-06
)

# Set up the negative sampler
negative_sampler = BernoulliNegativeSampler(
    mapped_triples=dataset.training.mapped_triples
)

# Configure the training loop
training_loop = pykeen.training.SLCWATrainingLoop(
    model=model,
    triples_factory=dataset.training,
    optimizer=optimizer,
    negative_sampler=negative_sampler,
    negative_sampler_kwargs=dict(num_negs_per_pos=1)
)

# Train the model
training_loop.train(
    num_epochs=1,
    batch_size=256,
    triples_factory=dataset.training,
    use_tqdm_batch=False,
)

# Evaluate the model
evaluator = pykeen.evaluation.RankBasedEvaluator(filtered=True)
results = evaluator.evaluate(
    model=model,
    mapped_triples=dataset.testing.mapped_triples,
    additional_filter_triples=[
        dataset.training.mapped_triples,
        dataset.validation.mapped_triples
    ]
)
print(results.to_dict())

When running the above code to train ConvKB, the evaluation step is also very slow.

The graphics card is an NVIDIA RTX 4090 with 24GB of memory.

Member

mberr commented Nov 8, 2024

ConvKB is also very slow in 1:n scoring (even worse than ConvE, where at least one direction is fast), and the default evaluation protocol uses 1:n scoring. You could switch to SampledRankBasedEvaluator for validation metrics; it scores each triple against a sample of negatives rather than against all entities. Note, however, that most metrics cannot be compared directly between the sampled and the full setting. These two papers [0] [1] describe a few metrics that are comparable, and those are also implemented in PyKEEN. Basically, look for metrics with "adjusted" in their name.
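To illustrate why raw ranks from the two settings are not comparable, here is a minimal sketch of ranking against all entities versus a sample. It does not use PyKEEN; `rank_of_true`, `sampled_candidates`, and `toy_score` are hypothetical helpers, and the entity count of 15,000 is only an illustrative figure.

```python
import random
from typing import Callable, Iterable, List


def rank_of_true(score: Callable[[int], float], true_id: int,
                 candidates: Iterable[int]) -> int:
    """1-based rank of true_id: one plus the number of candidates scoring higher."""
    true_score = score(true_id)
    return 1 + sum(1 for c in candidates
                   if c != true_id and score(c) > true_score)


def sampled_candidates(num_entities: int, true_id: int,
                       num_negatives: int, rng: random.Random) -> List[int]:
    """Draw negatives uniformly without replacement, excluding the true entity."""
    pool = [e for e in range(num_entities) if e != true_id]
    return rng.sample(pool, num_negatives)


def toy_score(entity_id: int) -> float:
    # Toy model: entities close to id 40 get the highest scores.
    return -abs(entity_id - 40)


num_entities = 15_000
true_id = 42
rng = random.Random(0)

# Full setting: 15,000 score calls per query.
full_rank = rank_of_true(toy_score, true_id, range(num_entities))

# Sampled setting: 100 negatives plus the true entity.
negatives = sampled_candidates(num_entities, true_id, 100, rng)
sampled_rank = rank_of_true(toy_score, true_id, negatives)

print(full_rank, sampled_rank)
```

The sampled rank can never exceed the full rank, because fewer competitors are scored. That is why unadjusted metrics look better under sampling and only metrics that correct for the number of candidates should be compared.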

Author

g-yit commented Nov 9, 2024

Thank you. I used the configuration file from the experiments directory to build the ConvKB pipeline. There are two issues:

  1. Evaluation takes much longer than training: 1023.29 s versus 566.77 s (see the "times" entries in the results below).
  2. The training results differ significantly from those reported in the paper.

Did I configure something incorrectly? (You can also try the code mentioned above.) Is there any guidance on ConvKB training code that yields the correct results?
The training code is as follows:
replicate_pipeline_from_path(r'E:\code\python\kge\pykeen_zyt\pykeen\src\pykeen\experiments\convkb\nguyen2018_convkb_fb15k237.json', directory="./", replicates=1)

"metrics": {
"both": {
"optimistic": {
"adjusted_arithmetic_mean_rank": 0.49161833830752394,
"adjusted_arithmetic_mean_rank_index": 0.5084529168082013,
"adjusted_geometric_mean_rank_index": 0.8129274434993866,
"adjusted_hits_at_k": 0.04409761990771625,
"adjusted_inverse_harmonic_mean_rank": 0.028990788367274135,
"arithmetic_mean_rank": 3508.025711909189,
"count": 40876.0,
"geometric_mean_rank": 981.909951047591,
"harmonic_mean_rank": 33.689574794872044,
"hits_at_1": 0.00988355024953518,
"hits_at_10": 0.04476954692239945,
"hits_at_3": 0.037723847734611994,
"hits_at_5": 0.03953420099814072,
"inverse_arithmetic_mean_rank": 0.00028506062444330415,
"inverse_geometric_mean_rank": 0.0010184233278551754,
"inverse_harmonic_mean_rank": 0.029682772967268557,
"inverse_median_rank": 0.00039816842524387816,
"median_absolute_deviation": 3584.190863237293,
"median_rank": 2511.5,
"standard_deviation": 3545.3494154993555,
"variance": 12569502.477981621,
"z_arithmetic_mean_rank": 177.82301884691216,
"z_geometric_mean_rank": 164.6389409176076,
"z_hits_at_k": 336.15763297564826,
"z_inverse_harmonic_mean_rank": 545.9140874982392
},
"pessimistic": {
"adjusted_arithmetic_mean_rank": 0.4916184068763812,
"adjusted_arithmetic_mean_rank_index": 0.5084528482297332,
"adjusted_geometric_mean_rank_index": 0.812927421312853,
"adjusted_hits_at_k": 0.04409761990771625,
"adjusted_inverse_harmonic_mean_rank": 0.02899078831481532,
"arithmetic_mean_rank": 3508.0262011938544,
"count": 40876.0,
"geometric_mean_rank": 981.9100673820868,
"harmonic_mean_rank": 33.68957485436971,
"hits_at_1": 0.00988355024953518,
"hits_at_10": 0.04476954692239945,
"hits_at_3": 0.037723847734611994,
"hits_at_5": 0.03953420099814072,
"inverse_arithmetic_mean_rank": 0.0002850605846842533,
"inverse_geometric_mean_rank": 0.0010184232071946706,
"inverse_harmonic_mean_rank": 0.029682772914847125,
"inverse_median_rank": 0.00039816842524387816,
"median_absolute_deviation": 3584.190863237293,
"median_rank": 2511.5,
"standard_deviation": 3545.349755856654,
"variance": 12569504.891352836,
"z_arithmetic_mean_rank": 177.82299486272447,
"z_geometric_mean_rank": 164.63893642425776,
"z_hits_at_k": 336.15763297564826,
"z_inverse_harmonic_mean_rank": 545.9140865104079
},
"realistic": {
"adjusted_arithmetic_mean_rank": 0.4916183617106642,
"adjusted_arithmetic_mean_rank_index": 0.5084528934017807,
"adjusted_geometric_mean_rank_index": 0.8129274160047766,
"adjusted_hits_at_k": 0.04409761990771625,
"adjusted_inverse_harmonic_mean_rank": 0.02899078763356493,
"arithmetic_mean_rank": 3508.02587890625,
"count": 40876.0,
"geometric_mean_rank": 981.9100952148438,
"harmonic_mean_rank": 33.68957562702935,
"hits_at_1": 0.00988355024953518,
"hits_at_10": 0.04476954692239945,
"hits_at_3": 0.037723847734611994,
"hits_at_5": 0.03953420099814072,
"inverse_arithmetic_mean_rank": 0.0002850606106221676,
"inverse_geometric_mean_rank": 0.0010184231214225292,
"inverse_harmonic_mean_rank": 0.029682772234082225,
"inverse_median_rank": 0.0003981684276368469,
"median_absolute_deviation": 3584.190863237293,
"median_rank": 2511.5,
"standard_deviation": 3545.349609375,
"variance": 12569504.0,
"z_arithmetic_mean_rank": 177.82301066090278,
"z_geometric_mean_rank": 164.63893534923432,
"z_hits_at_k": 336.15763297564826,
"z_inverse_harmonic_mean_rank": 545.9140736820508
}
},
"head": {
"optimistic": {
"adjusted_arithmetic_mean_rank": 0.6321478767554082,
"adjusted_arithmetic_mean_rank_index": 0.3679044073597997,
"adjusted_geometric_mean_rank_index": 0.5339996208323922,
"adjusted_hits_at_k": 0.0022226369361912695,
"adjusted_inverse_harmonic_mean_rank": 0.002845870988414374,
"arithmetic_mean_rank": 4448.195958508661,
"count": 20438.0,
"geometric_mean_rank": 2407.67342488841,
"harmonic_mean_rank": 280.3423290052399,
"hits_at_1": 0.0,
"hits_at_10": 0.0029357079949114393,
"hits_at_3": 0.002593208728838438,
"hits_at_5": 0.0027889225951658676,
"inverse_arithmetic_mean_rank": 0.00022481023977533317,
"inverse_geometric_mean_rank": 0.000415338720634983,
"inverse_harmonic_mean_rank": 0.0035670674619433193,
"inverse_median_rank": 0.0002520478890989288,
"median_absolute_deviation": 4028.9715287889735,
"median_rank": 3967.5,
"standard_deviation": 3455.2146644943596,
"variance": 11938508.37773687,
"z_arithmetic_mean_rank": 90.88571043736795,
"z_geometric_mean_rank": 76.47646472821779,
"z_hits_at_k": 11.88184816121705,
"z_inverse_harmonic_mean_rank": 37.5819869458943
},
"pessimistic": {
"adjusted_arithmetic_mean_rank": 0.6321479880095942,
"adjusted_arithmetic_mean_rank_index": 0.3679042960898008,
"adjusted_geometric_mean_rank_index": 0.533999549519833,
"adjusted_hits_at_k": 0.0022226369361912695,
"adjusted_inverse_harmonic_mean_rank": 0.0028458709528019705,
"arithmetic_mean_rank": 4448.196741364126,
"count": 20438.0,
"geometric_mean_rank": 2407.673793184336,
"harmonic_mean_rank": 280.3423318020593,
"hits_at_1": 0.0,
"hits_at_10": 0.0029357079949114393,
"hits_at_3": 0.002593208728838438,
"hits_at_5": 0.0027889225951658676,
"inverse_arithmetic_mean_rank": 0.0002248102002101037,
"inverse_geometric_mean_rank": 0.0004153386571016426,
"inverse_harmonic_mean_rank": 0.0035670674263566728,
"inverse_median_rank": 0.0002520478890989288,
"median_absolute_deviation": 4028.9715287889735,
"median_rank": 3967.5,
"standard_deviation": 3455.2150784679957,
"variance": 11938511.238472598,
"z_arithmetic_mean_rank": 90.88568294964912,
"z_geometric_mean_rank": 76.47645451522848,
"z_hits_at_k": 11.88184816121705,
"z_inverse_harmonic_mean_rank": 37.581986475604225
},
"realistic": {
"adjusted_arithmetic_mean_rank": 0.6321479237315103,
"adjusted_arithmetic_mean_rank_index": 0.3679043603770207,
"adjusted_geometric_mean_rank_index": 0.5339999209348403,
"adjusted_hits_at_k": 0.0022226369361912695,
"adjusted_inverse_harmonic_mean_rank": 0.002845870967054809,
"arithmetic_mean_rank": 4448.1962890625,
"count": 20438.0,
"geometric_mean_rank": 2407.671875,
"harmonic_mean_rank": 280.3423306827129,
"hits_at_1": 0.0,
"hits_at_10": 0.0029357079949114393,
"hits_at_3": 0.002593208728838438,
"hits_at_5": 0.0027889225951658676,
"inverse_arithmetic_mean_rank": 0.00022481022460851818,
"inverse_geometric_mean_rank": 0.0004153389891143888,
"inverse_harmonic_mean_rank": 0.0035670674405992027,
"inverse_median_rank": 0.0002520479029044509,
"median_absolute_deviation": 4028.9715287889735,
"median_rank": 3967.5,
"standard_deviation": 3455.21484375,
"variance": 11938509.0,
"z_arithmetic_mean_rank": 90.88569883092029,
"z_geometric_mean_rank": 76.47650770722639,
"z_hits_at_k": 11.88184816121705,
"z_inverse_harmonic_mean_rank": 37.58198666382428
}
},
"tail": {
"optimistic": {
"adjusted_arithmetic_mean_rank": 0.35493601752362436,
"adjusted_arithmetic_mean_rank_index": 0.6451531573132943,
"adjusted_geometric_mean_rank_index": 0.9249687689860338,
"adjusted_hits_at_k": 0.085971619378208,
"adjusted_inverse_harmonic_mean_rank": 0.0551351505596238,
"arithmetic_mean_rank": 2567.855465309717,
"count": 20438.0,
"geometric_mean_rank": 400.4476446015376,
"harmonic_mean_rank": 17.92163563189566,
"hits_at_1": 0.01976710049907036,
"hits_at_10": 0.08660338584988747,
"hits_at_3": 0.07285448674038555,
"hits_at_5": 0.07627947940111557,
"inverse_arithmetic_mean_rank": 0.00038943001797002886,
"inverse_geometric_mean_rank": 0.0024972053487667344,
"inverse_harmonic_mean_rank": 0.055798478472593796,
"inverse_median_rank": 0.0015503875968992248,
"median_absolute_deviation": 951.8306242805965,
"median_rank": 645.0,
"standard_deviation": 3381.221139328889,
"variance": 11432656.39304455,
"z_arithmetic_mean_rank": 159.73652598187994,
"z_geometric_mean_rank": 132.46303022548187,
"z_hits_at_k": 467.33221826444293,
"z_inverse_harmonic_mean_rank": 740.3253305927404
},
"pessimistic": {
"adjusted_arithmetic_mean_rank": 0.3549360445757312,
"adjusted_arithmetic_mean_rank_index": 0.6451531302574478,
"adjusted_geometric_mean_rank_index": 0.9249687626685364,
"adjusted_hits_at_k": 0.085971619378208,
"adjusted_inverse_harmonic_mean_rank": 0.05513515049031892,
"arithmetic_mean_rank": 2567.8556610235837,
"count": 20438.0,
"geometric_mean_rank": 400.4476782343243,
"harmonic_mean_rank": 17.921635654139724,
"hits_at_1": 0.01976710049907036,
"hits_at_10": 0.08660338584988747,
"hits_at_3": 0.07285448674038555,
"hits_at_5": 0.07627947940111557,
"inverse_arithmetic_mean_rank": 0.0003894299882889001,
"inverse_geometric_mean_rank": 0.0024972051390315317,
"inverse_harmonic_mean_rank": 0.05579847840333757,
"inverse_median_rank": 0.0015503875968992248,
"median_absolute_deviation": 951.8306242805965,
"median_rank": 645.0,
"standard_deviation": 3381.221266794729,
"variance": 11432657.255024955,
"z_arithmetic_mean_rank": 159.73651928299472,
"z_geometric_mean_rank": 132.46302932076503,
"z_hits_at_k": 467.33221826444293,
"z_inverse_harmonic_mean_rank": 740.3253296621515
},
"realistic": {
"adjusted_arithmetic_mean_rank": 0.3549360179991497,
"adjusted_arithmetic_mean_rank_index": 0.6451531568377032,
"adjusted_geometric_mean_rank_index": 0.9249687885809247,
"adjusted_hits_at_k": 0.085971619378208,
"adjusted_inverse_harmonic_mean_rank": 0.055135150511545356,
"arithmetic_mean_rank": 2567.85546875,
"count": 20438.0,
"geometric_mean_rank": 400.4475402832031,
"harmonic_mean_rank": 17.921635647326898,
"hits_at_1": 0.01976710049907036,
"hits_at_10": 0.08660338584988747,
"hits_at_3": 0.07285448674038555,
"hits_at_5": 0.07627947940111557,
"inverse_arithmetic_mean_rank": 0.00038943003164604306,
"inverse_geometric_mean_rank": 0.002497205976396799,
"inverse_harmonic_mean_rank": 0.0557984784245491,
"inverse_median_rank": 0.001550387591123581,
"median_absolute_deviation": 951.8306242805965,
"median_rank": 645.0,
"standard_deviation": 3381.22119140625,
"variance": 11432656.0,
"z_arithmetic_mean_rank": 159.73652586412607,
"z_geometric_mean_rank": 132.46303303162918,
"z_hits_at_k": 467.33221826444293,
"z_inverse_harmonic_mean_rank": 740.3253299471687
}
}
},
"times": {
"evaluation": 1023.292813539505,
"training": 566.7723512649536
}
}
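As a reading aid for the "adjusted" metrics in this output, the sketch below recomputes the adjusted arithmetic mean rank index from two other reported values. It assumes the usual definitions, which are not stated in this thread: adjusted mean rank = MR / E[MR], and index = 1 - (MR - 1) / (E[MR] - 1), where E[MR] is the expected mean rank under random scoring. The input numbers are copied from the "both" / "optimistic" block above.

```python
# Values copied from the "both" / "optimistic" block of the reported metrics.
mean_rank = 3508.025711909189
adjusted_mean_rank = 0.49161833830752394
reported_amri = 0.5084529168082013

# Assumed definition: adjusted mean rank = MR / E[MR], so E[MR] = MR / AMR.
expected_mean_rank = mean_rank / adjusted_mean_rank

# Assumed definition: AMRI = 1 - (MR - 1) / (E[MR] - 1).
# 0 means no better than random ordering, 1 means every true entity ranked first.
amri = 1.0 - (mean_rank - 1.0) / (expected_mean_rank - 1.0)

print(f"expected mean rank under random scoring: {expected_mean_rank:.1f}")
print(f"recomputed AMRI: {amri:.4f} (reported: {reported_amri:.4f})")
```

Because these metrics are normalized by the expected rank for the given number of candidates, they are the ones that stay comparable across candidate-set sizes.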

Author

g-yit commented Nov 9, 2024

When I trained using ConvE and set create_inverse_triples to False, I can understand that the evaluation would be slow, but it is only 2.83 triples/s.

Member

mberr commented Nov 10, 2024

> The training results differ significantly from those reported in the paper.

Did you first train a TransE model and use its weights to initialize the ConvKB ones? That one is easy to miss, and unfortunately not easy to set as default initialization for ConvKB (since it requires training another model first). You can find a config for this training on FB15k237 here. Note that there are some best guesses in there because the paper did not report all hyperparameters for this first run. Once you have the weights, you can use pykeen.nn.init.PretrainedInitializer as initializer.

Even with those, we were not able to reproduce the paper's results, cf. https://arxiv.org/pdf/2006.13365, Table 9 in the appendix.

Member

mberr commented Nov 10, 2024

> When I trained using ConvE and set create_inverse_triples to False, I can understand that the evaluation would be slow, but it is only 2.83 triples/s.

For each triple it needs to score ~15k entities, i.e., it is running 3.31*15k = 49.65k score evaluations per second.
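The arithmetic can be written out as a small sketch. The helper name `score_evaluations_per_second` is hypothetical. The inputs are the 3.31 triples/s and ~15k entities used in the reply; the question above quotes 2.83 triples/s, so the same function is applied to that figure as well.

```python
def score_evaluations_per_second(triples_per_second: float,
                                 num_candidates: int) -> float:
    """Each evaluated triple is scored against every candidate entity."""
    return triples_per_second * num_candidates


num_entities = 15_000  # approximate entity count used in the reply

rate_reply = score_evaluations_per_second(3.31, num_entities)
rate_question = score_evaluations_per_second(2.83, num_entities)

print(f"{rate_reply / 1000:.2f}k score evaluations per second (3.31 triples/s)")
print(f"{rate_question / 1000:.2f}k score evaluations per second (2.83 triples/s)")
```

Either way, the evaluator is performing tens of thousands of score evaluations per second, so the low triples/s figure reflects the size of the candidate set rather than an idle GPU.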

Member

mberr commented Nov 10, 2024

By the way, from your method name replicate_conve it looks like you are trying to reproduce ConvE. If so, you may want to take a look at https://github.com/pykeen/pykeen/blob/master/src/pykeen/experiments/conve/dettmers2018_conve_fb15k237.json. In particular, reproducing the ConvE paper requires the use of inverse triples.
