The LDBC-SNB Data Generator (DATAGEN) is responsible for providing the datasets used by all the LDBC benchmarks. It is designed to produce directed labeled graphs that mimic the characteristics of real-world graphs. A detailed description of the schema produced by DATAGEN, as well as the format of the output files, can be found in the latest version of the official LDBC SNB specification document.
ldbc_snb_datagen is part of the LDBC project (http://www.ldbc.eu/). ldbc_snb_datagen is licensed under GPLv3; see LICENSE.txt for detailed information about this license.
Datasets
Publicly available datasets can be found at the LDBC-SNB Amazon Bucket. These are the official SNB datasets and were generated using version 0.2.6. They are available in the three officially supported serializers: CSV, CSVMergeForeign and TTL. The bucket is configured in "Requester Pays" mode, so accessing the datasets requires a properly configured AWS client with valid credentials, and transfer costs are billed to the requester's account.
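Because the bucket is in "Requester Pays" mode, each download request has to state explicitly that the requester accepts the charges. The following is a minimal sketch of how that might look in Python with the third-party boto3 library, assuming AWS credentials are already configured. The bucket name and object key in the usage comment are hypothetical placeholders, not the real location of the datasets, and the helper `download_requester_pays` is introduced here for illustration only.

```python
def download_requester_pays(bucket, key, dest, client=None):
    """Download one object from a Requester Pays S3 bucket.

    `client` is any object exposing boto3's S3 `download_file` method.
    When omitted, a default boto3 S3 client is created (requires boto3
    and configured AWS credentials).
    """
    if client is None:
        import boto3  # third-party; imported lazily so the helper stays importable
        client = boto3.client("s3")
    # RequestPayer="requester" acknowledges that the caller pays for the transfer.
    client.download_file(bucket, key, dest, ExtraArgs={"RequestPayer": "requester"})
    return dest


# Usage (hypothetical bucket and key; substitute the real ones):
# download_requester_pays("example-ldbc-snb-bucket", "path/to/dataset.tar.gz", "dataset.tar.gz")
```

Without the `RequestPayer` argument, requests against a Requester Pays bucket are rejected with an access-denied error.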
Community-provided tools
- [Apache Flink Loader](https://github.com/s1ck/ldbc-flink-import): A loader of LDBC datasets for Apache Flink