-
Notifications
You must be signed in to change notification settings - Fork 14
/
defs.tex
51 lines (33 loc) · 7.62 KB
/
defs.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
\chapter*{Definitions}
This section defines fundamental concepts used in the LDBC benchmark terminology. Part of the definitions below are repeated from the LDBC benchmark specification document.
\begin{description}
\item[\ldbcsnb] The Linked Data Benchmark Council's Social Network Benchmark suite which currently consists of the Interactive workload and a preliminary version of the Business Intelligence workload.
\item[System Under Test (SUT)] This is the totality of the hardware and software that participates in a benchmark run, excluding parts that are exclusively used for driving the workload. If the parts driving the workload are collocated on the same operating system instance as the SUT, then this is also considered a part of the SUT. In client-server configurations where the test driver is not on a machine hosting any DBMS function the SUT is not considered to encompass the hardware or software which exclusively serves to drive the test workload.
\item[\datagen] This module is provided by LDBC SNB and produces the standard benchmark datasets to be loaded into the SUT for the benchmark. The data generation phase is not part of running the benchmark.
\item[Test Driver (Benchmark Driver, Driver)] The test driver refers to the parts of the benchmark run that coordinate query execution and, if prescribed by a given benchmark, data loading.
\item[Workload (Benchmark)] This is the totality of the tasks a particular benchmark performs against an SUT. This includes data loading as well as the query/update workload. This does not include preparatory stages such as generating benchmark data with a data generator or transferring the data to the platform constituting the SUT.
The terms workload and benchmark are synonyms in this context.
\item[Time Compression Ratio (TCR)]
This parameter of the Interactive workload compresses (or stretches) durations between operation start times to increase (or decrease) operation rate, thereby allowing systems to reach their maximum throughput for a given workload. The smaller this number is, the higher compression ratio it represents (\eg 2.0 means run benchmark $2\times$ slower, while 0.1 = run benchmark $10\times$ faster). Systems are expected to compete on achieving the \emph{lowest possible} TCR (\ie the highest $\text{TCR}^{-1}=\frac{1}{\text{TCR}}$).
\item[Query mix] The ratio of read and update queries of a workload, and the frequency at which they are issued.
\item[Scale Factor (SF)] The \ldbcsnb is designed to target systems of different size and scale. The scale factor determines the size of the data used to run the benchmark. The scale factor refers to the measured size of the data in Gigabytes when serialised in CsvSingularProjectedFK.
\item[Validation Step] The benchmark specifies a scale factor for which ACID test cases are executed and the query results are compared to a reference result set (\ie expected output). This step is required to use the very same set of queries and data structures (this includes both PDS, IADS and EADS -- defined below) that are used in the actual benchmark runs.
\item[Schema (Database Schema)] A schema is the totality of the non-built-in declarations which are fed into the SUT prior to running a workload. For a relational system, the schema consists of tables, indices, views, materialised views and declarative constraints (\eg foreign key and not null constraints). An ontology for an RDF system counts as a schema if it is loaded on the SUT. An RDF SUT may have no schema at all and still run the workload. However, any declaration or setting (\eg indices) that is not on by default in the SUT, but is used in at least one case of the benchmark run counts as part of the schema.
The schema does not include stored procedures, triggers, or other imperative (procedural) application specific code that may reside on the SUT and could impact the benchmark results. The schema is required to be the same across all benchmark runs using the same scale factor for a given workload.
\item[Primary Data Structure (PDS)]
This is anything that may influence the result of a database query or may be changed by an update of the database. These may be resident in RAM or durable media or both. Examples of data structures are database base tables and adjacency lists.
\item[Implicit Auxiliary Data Structure (IADS)] This is a data structure for providing more efficient access to all or parts of the primary data structure. IADS are created by the DBMS automatically and the system may allow them to be turned off.
\begin{quote}
Some systems, such as many RDF stores have multiple covering indices on the primary data structure. The definition in this case is that the primary data structure consists of all the differently ordered full copies of the base table; a table of subject predicate object graph (SPOG) in the RDF case. In this same instance, Auxiliary data structures comprise any data structure which materialise a subset of the SPOG.
\end{quote}
\item[Explicit Auxiliary Data Structure (EADS)] These are any application or workload profile specific structures that are declared in addition to the PDSs and IADSs managed by the SUT. These duplicate the data and are created with explicit statements. Secondary indices, materialised views, with or without aggregates, are all examples of this in a relational context.
%In an RDF context an EADS is a data structure which holds some of the triples/quads in whole or in part but does not hold some others.
The decision about the used EADS is always part of the schema declaration.
\begin{quote}
In the case of relational systems, an ADS may be an index from primary key values to a heap table, if the system in question has such concepts.
A secondary index of a relational table, in its memory based and durable media based manifestations is an example for EADS. Such a secondary index is not considered an ADS since it must be declared, which makes its creation explicit. An ADS must be implicit and not created by any specific DDL statement or directive. In the case of RDF systems, if the implementation supports user definable index schemes, as long as these are defined once and apply to all triples/quads, such structures are designated as ADS. If an RDF system selectively makes data structures which apply to some quads but not to others, then such structures are designated as EADS.
\end{quote}
\item[SUT-Resident Logic] This is any application specific code that is resident on the SUT, whether by static linking, dynamic loading, JIT, interpretation or any other means of embedding application specific logic into a generic DBMS. Examples of this are stored procedures, hosting Java, CLR or other run times in the SUT process (or processes), loading application specific libraries to extend native functions or data structures etc. A special case is that of a database exclusively accessed via an in-process API. In these cases, any code that is not the test driver or a workload implementation expressed against a generally supported API of the DBMS is deemed SUT resident logic in addition to any other code which may fit the above definitions.
\item[Test Sponsor] The party which initiates an audit of a benchmark implementation over an SUT. This is typically the vendor of a key component of the SUT, \eg DBMS or hardware.
\item[Full Disclosure Report (FDR)] This is a document which allows reproduction of any audited benchmark result by a third party. It contains complete description of the circumstances of the benchmark run, including version and configuration of SUT, dataset and test driver.
\end{description}