- Add the `skip_verify` attribute to the `KG` class to choose whether to skip the verification of entity existence with remote Knowledge Graphs (default: `skip_verify=False`).
- Add `WideSampler` as a new sampling strategy.
- Add `SplitWalker` as a new walking strategy.
- Fix the installation dependencies with `poetry`.
- Fix the cache memory for local Knowledge Graphs.
- Fix validation URL for remote Knowledge Graphs.
- Fix the `HALKWalker` walking strategy.
- Fix the DFS algorithm of `RandomWalker` and `CommunityWalker` to return duplicate walks and prevent a different number of walks for the entities.
- Fix the walk extraction with the `with_reverse` parameter for the different walking strategies.
- Add the `_post_extract` private method in the `Walker` class so that a walking strategy can post-process its walks.
- Replace the default minimum frequency threshold for a hop to be kept by `HALKWalker` (0.001 -> 0.01).
- Drop support for Python 3.7.0.
- Remove `negative=20` and `vector_size=500` for Word2Vec.
- Add initial support for FastText as an embedding technique.
- Fix the `size` hyperparameter, replaced by `vector_size`, in the default dictionary of the `Word2Vec` class.
- Fix random determinism with walking strategies.
- Fix the calculation of walks for duplicate entities in a file.
- Fix the total recovery of entities, walks, literals, and embeddings of a model after several rounds of online learning.
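
To illustrate what random determinism means for a walking strategy, here is a minimal sketch. It is not pyRDF2Vec code: the toy `graph` and the `sample_walk` helper are hypothetical. It only shows that a dedicated `random.Random` instance seeded with a fixed value yields the same walk on every call.

```python
import random

# Toy adjacency list (hypothetical data, not a pyRDF2Vec structure).
graph = {"a": ["b", "c"], "b": ["c", "d"], "c": ["d"], "d": []}


def sample_walk(start, max_depth, random_state=None):
    """Sample one random walk of at most `max_depth` hops from `start`."""
    # A private generator avoids touching the global random state.
    rng = random.Random(random_state)
    walk = [start]
    for _ in range(max_depth):
        neighbors = graph[walk[-1]]
        if not neighbors:  # dead end: stop early
            break
        walk.append(rng.choice(neighbors))
    return tuple(walk)


# Two calls with the same seed return the same walk.
first = sample_walk("a", 3, random_state=42)
second = sample_walk("a", 3, random_state=42)
print(first == second)
```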
- Add the `_update` private method in the `RDF2VecTransformer` class.
- Add the `md5_bytes` attribute in the `CommunityWalker`, `HALKWalker`, `RandomWalker`, and `WLWalker` classes to choose whether an object is hashed with MD5 and how many bytes of the hash to keep.
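
The idea behind `md5_bytes` can be sketched with the standard `hashlib` module. This is an assumption-laden illustration, not the library's implementation: the `hash_hop` helper is hypothetical, and treating `None` as "do not hash" is assumed here rather than stated in the entry above.

```python
import hashlib


def hash_hop(name, md5_bytes=8):
    """Return a truncated MD5 digest of `name`, or `name` itself if hashing is off.

    Assumption: `md5_bytes=None` disables hashing.
    """
    if md5_bytes is None:
        return name
    digest = hashlib.md5(name.encode("utf-8")).digest()
    # Keeping only the first bytes trades collision resistance for memory.
    return digest[:md5_bytes].hex()


print(len(hash_hop("http://example.org/some/very/long/object", md5_bytes=4)))
print(hash_hop("kept-as-is", md5_bytes=None))
```

Truncating the digest shortens long object names at the cost of a higher collision probability, which is why the number of bytes is configurable.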
- Replace the `extract` method in the `Walker` class to return a list of entities with their walks instead of a list of walks.
- Fix the issue with `nest-asyncio` as a dependency.
- Add support for Python 3.9
- Add the `cache` attribute (default: `cachetools.TTLCache(maxsize=1024, ttl=1200)`) to the `KG` class to significantly speed up walk extraction through caching.
- Add the `is_update` hyper-parameter (default: `False`) in the `fit` method of the `Embedder` and `Word2Vec` classes to update an existing vocabulary.
- Add the `literals` attribute (default: `[]`) in the `KG` class to support basic literal extraction.
- Add the `mul_req` attribute (default: `False`) to the `KG` class to speed up the extraction of walks and literals for remote Knowledge Graphs by sending asynchronous requests.
- Add the `n_jobs` attribute (default: `None`) to the `Walker` class to speed up the extraction of walks with multiprocessing.
- Add the `random_state` parameter (default: `None`) to the `Walker` class to better handle random determinism with walking and sampling strategies.
- Add the `verbose` attribute (default: `0`) to the `RDF2VecTransformer` class to display useful debugging information and to measure the time taken by extraction, fitting, and generation of embeddings and literals.
- Add the `with_reverse` parameter (default: `False`) to the `Walker` class to generate more walks and improve the accuracy with `Word2Vec` by including the parents of the entities in the walks.
- Add the possibility to do online learning of a model with the `load` and `save` methods in the `RDF2VecTransformer` class.
- Add validators for class parameter attributes.
- Add the `Connector` generic class to simplify the implementation of new connectors.
- Add the `SPARQLConnector` class to delegate the connection part to the SPARQL endpoint server.
- Add the `Vertex` class with slots to reduce RAM usage.
- Add the `WalkerNotSupported` and `SamplerNotSupported` exceptions in the `Walker` and `Sampler` classes, raised when a walking strategy or a sampling strategy is not supported.
- Add the `_cast_literals` private method to the `KG` class to convert the raw literals of an entity according to their real types.
- Add the `_embeddings`, `_entities`, `_literals`, and `_walks` attributes in the `RDF2VecTransformer` class to be able to get all the embeddings, entities, literals, and walks after the online training of a model.
- Add the `_fill_hops` private method in the `KG` class to fill the entity hops in cache when `mul_req=True` is provided for a remote Knowledge Graph.
- Add the `_get_hops` private method in the `KG` class to get the hops of a vertex for a local Knowledge Graph.
- Add the `_is_support_remote` private attribute (default: `False`) in the `Walker` and `Sampler` classes to restrict the use of walking and sampling strategies to some remote/local Knowledge Graphs.
- Add the `_res2hops` private method in the `KG` class to convert a JSON response from a SPARQL endpoint server to hops.
- Add the `add_walk` method to the `KG` class to simplify the addition of a walk in a Knowledge Graph.
- Add the attr decorator to all classes.
- Add the `examples/online-training` and `examples/literals` files to illustrate the use of online training and literals with `pyRDF2Vec`.
- Add the `fetch_hops` method to the `KG` class to fetch the hops of a vertex on a remote Knowledge Graph.
- Add the `get_pliterals` method to the `KG` class to get the literals of an entity in a local KG based on a chain of predicates.
- Add the `get_walks` method in the `RDF2VecTransformer` class to get the walks of the given entities in a Knowledge Graph.
- Add the `get_weights` method in the `Sampler` class to get the weights of the hops.
- Add the `pyrdf2vec.typings` file to contain aliases for the most commonly used types with mypy.
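
One entry above mentions slots on the `Vertex` class as a way to reduce RAM usage. The sketch below shows the general Python mechanism with a plain `__slots__` declaration. The `SlottedVertex` class and its fields are hypothetical; the library itself may generate slots through its attr decorator instead.

```python
class SlottedVertex:
    """Toy vertex whose instances carry no per-instance `__dict__`."""

    # Declaring __slots__ fixes the set of attributes and removes the
    # per-instance dictionary, which saves memory when many vertices exist.
    __slots__ = ("name", "predicate")

    def __init__(self, name, predicate=False):
        self.name = name
        self.predicate = predicate


vertex = SlottedVertex("http://example.org/entity")
print(hasattr(vertex, "__dict__"))  # no instance dictionary is allocated
```

The trade-off is that attributes not listed in `__slots__` can no longer be attached to an instance at run time.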
- Fix the `get_weight` method in the `PageRankSampler` class to raise an error if it is called before the `fit` method.
- Fix the `remove_edge` method of the `KG` class to also remove the edge of a child for a parent node.
- Fix the addition of predicates in memory for remote Knowledge Graphs.
- Fix the initialization of the `_counts` dictionary with the `PredFreqSampler` and `ObjPredFreqSampler` classes.
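
The `get_weight` fix above follows a common guard pattern: refuse to answer until the object has been fitted. A minimal sketch of that pattern, assuming a hypothetical `ToySampler` class and a `ValueError` (the actual exception type used by the library is not stated in this changelog):

```python
class ToySampler:
    """Hypothetical sampler that must be fitted before it can give weights."""

    def __init__(self):
        self._weights = None  # populated by fit()

    def fit(self, weights):
        # In a real sampler the weights would be computed from a graph.
        self._weights = dict(weights)

    def get_weight(self, hop):
        if self._weights is None:
            raise ValueError("call fit() before get_weight()")
        return self._weights[hop]


sampler = ToySampler()
sampler.fit({"hop-1": 0.25})
print(sampler.get_weight("hop-1"))
```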
- Remove support for Python 3.6
- Remove the `_get_shops` and `_get_rhops` functions in the `KG` class.
- Remove the `id` attribute of the `Vertex` class.
- Remove the `print_walks` method of the `Walker` class.
- Remove the `read_file` method in the `KG` class.
- Remove the `visualise` method in the `KG` class.
- Replace the `HalkWalker` class with `HALKWalker`.
- Replace the `SPARQLWrapper` library with `requests` for synchronous requests and `aiohttp` for asynchronous requests.
- Replace the `WeisfeilerLehmanWalker` class with `WLWalker`.
- Replace the `add_edge`, `add_vertex`, and `remove_edge` methods in the `KG` class to return a boolean value indicating whether the addition/removal of an edge/vertex has been performed.
- Replace the `depth` parameter with `max_depth` for the `Walker` class.
- Replace the `extract_random_community_walks`, `extract_random_community_walks_bfs`, and `extract_random_community_walks_dfs` methods in the `CommunityWalker` class with the `extract_walks`, `_bfs`, and `_dfs` methods.
- Replace the `extract_random_walks`, `extract_random_walks_bfs`, and `extract_random_walks_dfs` methods in the `RandomWalker` class with the `extract_walks`, `_bfs`, and `_dfs` methods.
- Replace the `file_type` attribute in the `KG` class with `fmt`.
- Replace the `get_inv_neighbors` method in the `KG` class with an `is_reverse` parameter (default: `False`) in the `get_neighbors` method.
- Replace the `initialize` method in the `Sampler` class with the use of `@property`.
- Replace the `is_remote` parameter in the `KG` class with automatic link detection based on the http and https prefixes.
- Replace the `last` parameter with `is_last_depth` in the `sample_neighbor` method of the `Sampler` class.
- Replace the `label_predicates` attribute in the `KG` class with `skip_predicates`, which is now a set instead of a list.
- Replace the `pyrdf2vec.graphs.kg.Vertex` class with `pyrdf2vec.graphs.Vertex`.
- Replace the `fit_transform` and `transform` functions in the `RDF2VecTransformer` class to return a tuple containing the list of embeddings and literals.
- Replace the default embedding technique in the `RDF2VecTransformer` class with `Word2Vec`.
- Replace the default hyper-parameters of the `Word2Vec` class with `size=500`, `min_count=0`, and `negative=20`.
- Replace the default list of walkers in the `RDF2VecTransformer` class with `[RandomWalker(2)]`.
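
The automatic link detection that replaces `is_remote` can be pictured as a simple prefix check. The sketch below is only a guess at the idea: `looks_remote` is a hypothetical helper, and the example locations are made up.

```python
def looks_remote(location):
    """Return True when `location` starts with an http or https prefix."""
    # str.startswith accepts a tuple of prefixes.
    return location.startswith(("http://", "https://"))


print(looks_remote("https://example.org/sparql"))  # remote endpoint
print(looks_remote("data/graph.owl"))  # local file
```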
- Add a `verbose` hyper-parameter (default: `False`) to the `fit` method.
- Add basic support for remote Knowledge Graphs through a SPARQL endpoint.
- Add configuration for embedding techniques through the `Embedder` abstract class (currently only Word2Vec is included).
- Add online documentation.
- Add sampling strategies (default: `UniformSampler`) from Cochez et al. to better deal with larger Knowledge Graphs.
- Add static typing for methods.
- Add support for Python 3.6 and 3.7.
- Add the Google Style Python Docstrings.
- Add the `extract_random_walks_dfs` and `extract_random_walks_bfs` methods for the `RandomWalker` class.
- Add the `get_hops` method along with the private `_get_rhops` and `_get_shops` methods in the `KG` class.
- Add three examples (`examples/countries.py`, `examples/mutag.py`, and `examples/samplers.py`) for `pyRDF2Vec`.
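
For readers unfamiliar with depth-first walk extraction, the following sketch enumerates every walk of bounded depth from a root on a toy graph. It is a generic illustration under stated assumptions, not the `RandomWalker` code: the `extract_walks_dfs` function and the adjacency dictionary are hypothetical, and no random sampling is performed.

```python
def extract_walks_dfs(graph, root, max_depth):
    """Return all walks from `root` with at most `max_depth` hops (depth-first)."""
    walks = []
    stack = [(root,)]
    while stack:
        walk = stack.pop()
        neighbors = graph.get(walk[-1], [])
        # A walk is complete at the depth limit or at a dead end.
        if len(walk) - 1 == max_depth or not neighbors:
            walks.append(walk)
            continue
        # Push in reverse so that neighbors are explored in listed order.
        for neighbor in reversed(neighbors):
            stack.append(walk + (neighbor,))
    return walks


toy_graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
print(extract_walks_dfs(toy_graph, "a", 2))
```

A breadth-first variant would replace the stack with a queue, so that all walks of depth 1 are expanded before any walk of depth 2.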
- Replace `graph` with `kg` in the `fit` and `fit_transform` methods of the `RDF2VecTransformer` class.
- Replace `instance` with `entities` in the `transform` and `fit_transform` methods of the `RDF2VecTransformer` class.
- Replace the default values of the Word2Vec hyper-parameters to match the default ones of the `gensim` implementation.
- Replace the `KnowledgeGraph` class with `KG`.
- Replace the `Walker` class to be abstract.
- Replace the `_rdf2vec.py` file with `rdf2vec.py`.
- Replace the `extract_random_community_walks` method in the `CommunityWalker` class to be private.
- Replace the `extract` methods in `walkers` to be private.
- Replace the `graph.py` file with `graphs/kg.py`.
- Replace the `rdf2vec` module with `pyrdf2vec`.
- Replace the `sample_neighbor` method of the `sampler` class with `sample_hop`.
- Replace the imec licence with an MIT licence.
- Remove the `graph` hyper-parameter in the `transform` method of the `RDF2VecTransformer` class.
- Remove the hyper-parameters of `RDF2VecTransformer` in favor of the `embedder` and `walkers` ones.
- Remove the `WildcardWalker` walking strategy.
- Remove the `converter.py` file.
- Remove the `create_kg`, `endpoint_to_kg`, and `rdflib_to_kg` functions in favor of the `location`, `file_type`, and `is_remote` hyper-parameters in `KG` with the `read_file` private method.
- Replace `Vertex.vertex_count` with `itertools.count` in the `Vertex` class.
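
The last entry swaps a hand-maintained counter for `itertools.count`. A rough sketch of that pattern follows; the `CountedVertex` class and its `index` attribute are hypothetical names chosen for illustration, not the library's actual fields.

```python
import itertools


class CountedVertex:
    """Toy vertex that receives a unique, increasing index on creation."""

    # A shared iterator replaces a manually incremented class attribute.
    _counter = itertools.count()

    def __init__(self, name):
        self.name = name
        self.index = next(CountedVertex._counter)


first = CountedVertex("a")
second = CountedVertex("b")
print(second.index - first.index)
```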