This repository corresponds to CompanyKG Version 1.x. We have extended this work to Version 2.x, hosted in a new repository. Since 2.x is backward compatible, we recommend submitting issues and pull requests for both Version 1.x and 2.x to the CompanyKG2 repository.
This repository contains the code released alongside the CompanyKG knowledge graph, illustrated in Figure 1 below. For details of the dataset and benchmark experiments, see the official release of the paper and dataset.
There are two main parts to the code release:
- CompanyKG dataset access and task evaluations (see below)
- Benchmark model training and experiments
The code requires:
- Python 3.8
There are also optional dependencies, if you want to be able to convert the KG to one of the data structures used by these packages:
- DGL: dgl
- iGraph: python-igraph
- PyTorch Geometric (PyG): torch-geometric
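Whether these optional packages are present can be checked without importing them. The following is a minimal sketch and not part of the companykg package: the helper name `optional_backends` is hypothetical, and the import names (`dgl`, `igraph`, `torch_geometric`) are assumed to be the module names under which these distributions are normally imported.

```python
from importlib.util import find_spec

# Maps each pip distribution name to the module name it is imported under.
# Assumption: these are the usual import names for the three packages.
OPTIONAL_BACKENDS = {
    "dgl": "dgl",
    "python-igraph": "igraph",
    "torch-geometric": "torch_geometric",
}


def optional_backends():
    """Return {distribution name: True/False} for each optional backend."""
    return {
        dist: find_spec(module) is not None
        for dist, module in OPTIONAL_BACKENDS.items()
    }


# Report which optional graph libraries are available in this environment.
for dist, installed in optional_backends().items():
    print(f"{dist}: {'installed' if installed else 'missing'}")
```

Only the conversions whose backend is reported as installed can be expected to work.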
The companykg Python package provides a data structure to load CompanyKG into memory, convert between different graph representations and run evaluation of trained embeddings or other company-ranking predictors on three evaluation tasks.
To install the companykg package and its Python dependencies, activate a virtual environment (such as Virtualenv or Conda) and run:
pip install -e .
The first time you instantiate the CompanyKG class, if the dataset is not already available (in the default subdirectory or another location you specify), the latest version will be automatically downloaded from Zenodo.
By default, the CompanyKG dataset will be loaded from (and, if necessary, downloaded to) a data subdirectory of the working directory. To load the dataset from this default location, simply instantiate the CompanyKG class:
from companykg import CompanyKG
ckg = CompanyKG()
If you have already downloaded the dataset and want to load it from its current location, specify the path:
ckg = CompanyKG(data_root_folder="/path/to/stored/companykg/directory")
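Because a missing dataset triggers a download, it can be useful to inspect a directory before instantiating the class. The following is a minimal sketch using only the standard library: `dataset_dir_ready` is a hypothetical helper, and treating any non-empty directory as an existing copy of the dataset is an assumption, since the expected file names are not listed here.

```python
from pathlib import Path


def dataset_dir_ready(data_root="./data"):
    """Return True if data_root exists and contains at least one entry.

    Assumption: a non-empty directory is treated as an existing copy of the
    dataset. The files inside it are not validated.
    """
    root = Path(data_root)
    return root.is_dir() and any(root.iterdir())


# Check the default location (a data subdirectory of the working directory).
print(dataset_dir_ready())
```

If the check returns False for the directory you intend to use, instantiating CompanyKG with that location will presumably start the Zenodo download.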
The graph can be loaded with different vector representations (embeddings) of the company description data associated with the nodes: msbert (mSBERT), simcse (SimCSE), ada2 (ADA2) or pause (PAUSE).
ckg = CompanyKG(nodes_feature_type="pause")
If you want to experiment with different embedding types, you can also load embeddings of a different type into an already-loaded graph:
ckg.change_feature_type("simcse")
By default, edge weights are not loaded into the graph. To load them, use:
ckg = CompanyKG(load_edges_weights=True)
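The options above can be gathered into a single call. The sketch below collects them into keyword arguments and rejects unknown embedding names early. `ckg_kwargs` and `load_ckg` are hypothetical helpers; `nodes_feature_type`, `load_edges_weights` and `data_root_folder` are the constructor arguments shown above, and it is an assumption that the constructor accepts all three together.

```python
# Embedding names listed above.
FEATURE_TYPES = ("msbert", "simcse", "ada2", "pause")


def ckg_kwargs(feature_type=None, load_edges_weights=None, data_root_folder=None):
    """Build CompanyKG constructor arguments, omitting anything left unset."""
    if feature_type is not None and feature_type not in FEATURE_TYPES:
        raise ValueError(
            f"unknown feature type {feature_type!r}; expected one of {FEATURE_TYPES}"
        )
    candidates = {
        "nodes_feature_type": feature_type,
        "load_edges_weights": load_edges_weights,
        "data_root_folder": data_root_folder,
    }
    # Unset options are dropped so that the library's own defaults apply.
    return {name: value for name, value in candidates.items() if value is not None}


def load_ckg(**options):
    """Instantiate CompanyKG with validated options (requires companykg)."""
    from companykg import CompanyKG  # lazy import; loading may trigger a download

    return CompanyKG(**ckg_kwargs(**options))


# Inspect the arguments without loading the graph.
print(ckg_kwargs(feature_type="pause", load_edges_weights=True))
```

Keeping the argument construction separate from the actual load makes it possible to validate a configuration without installing the package or downloading the dataset.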
A tutorial showing further ways to use CompanyKG is here.
Implementations of various benchmark graph-based learning models are provided in this repository. To use them, install the ckg_benchmarks Python package, along with its dependencies, from the benchmarks subdirectory. First install companykg as above and then:
cd benchmarks
pip install -e .
Further instructions for using the benchmarks package for model training are provided in the benchmarks README file.
We collect all benchmarking results on this dataset here. You are welcome to reach out to us (via a GitHub issue or the email address shown in our paper) if you would like your experimental results to be included.
- Knorreman reported competitive results using the fastRP algorithm (i.e., sp_auc=85.7%, sr_test_acc=69.2%, R@50=0.353, and R@100=0.430, obtained with different hyper-parameters and initial node embeddings).
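The R@50 and R@100 figures are recall-at-K scores. As a rough illustration of that kind of metric, the sketch below implements the generic definition: the share of known relevant items found in the top K of a ranking. It is not necessarily the exact evaluation code used for this benchmark; `recall_at_k` and the toy ids are hypothetical.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of relevant_ids that appear in the first k entries of ranked_ids."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("relevant_ids must not be empty")
    hits = relevant.intersection(ranked_ids[:k])
    return len(hits) / len(relevant)


# Toy example with made-up company ids (not CompanyKG data).
ranking = ["c7", "c2", "c9", "c4", "c1"]
competitors = {"c2", "c1", "c8"}
print(recall_at_k(ranking, competitors, k=3))
```

A benchmark score of this kind is typically the average of this value over many query companies.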
Cite the paper:
@article{cao2023companykg,
author = {Lele Cao and
Vilhelm von Ehrenheim and
Mark Granroth-Wilding and
Richard Anselmo Stahl and
Drew McCornack and
Armin Catovic and
Dhiana Deva Cavacanti Rocha},
title = {{CompanyKG: A Large-Scale Heterogeneous Graph for Company Similarity Quantification}},
journal = {IEEE Transactions on Big Data},
year = {2024},
doi = {10.1109/TBDATA.2024.3407573}
}
Cite the official release of the CompanyKG dataset on Zenodo:
@article{companykg_2023_8010239,
author = {Lele Cao and
Vilhelm von Ehrenheim and
Mark Granroth-Wilding and
Richard Anselmo Stahl and
Drew McCornack and
Armin Catovic and
Dhiana Deva Cavacanti Rocha},
title = {{CompanyKG Dataset: A Large-Scale Heterogeneous Graph for Company Similarity Quantification}},
month = June,
year = 2023,
publisher = {Zenodo},
version = {1.1},
doi = {10.5281/zenodo.8010239},
url = {https://doi.org/10.5281/zenodo.8010239}
}