Tripod config is typically defined as a JSON file or stream which is passed to the Config class somewhere early in your application, typically in your includes file or front controller:
$conf = json_decode(file_get_contents('tripod_config.json'), true);
\Tripod\Config::setConfig($conf); // set the config, usually read in as JSON from a file
RDF namespaces are defined by a top-level namespaces property: a simple object whose keys are the prefixes and whose values are the namespace URIs.
Example:
{
    "namespaces" : {
        "rdf" : "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
        "dct" : "http://purl.org/dc/terms/"
    }
}
Tripod relies on namespaces for subjects so any subject URIs must have a pre-declared namespace in the config.
TODO: Future versions will detect non-namespaced subjects and assign namespaces to them in config. For this to happen we must start storing the config in the database rather than as an external file.
Tripod supports named graphs. The defaultContext property defines the default named graph to use if one is not specified.
Example:
{
"defaultContext" : "http://example.com"
}
The top-level data_sources property defines and names the data connections for Tripod.
Example:
{
    "data_sources" : {
        "rs1" : {
            "type" : "mongo",
            "connection" : "mongodb://localhost",
            "replicaSet" : ""
        },
        "rs2" : {
            "type" : "mongo",
            "connection" : "mongodb://example.com:27017",
            "replicaSet" : "repset1"
        }
    }
}
Defines the Tripod stores (for Mongo Tripod, these would be databases) and names the pods (e.g. MongoDB collections) Tripod can work with. Also includes the ability to define indexes and OWL-like cardinality rules on predicates within each collection. Each store must declare a data_source.
Example:
This example defines one store with two CBD_ pods, along with associated indexes and cardinality rules.
{
    "stores" : {
        "my_app_db" : {
            "pods" : {
                "CBD_orders" : {
                    "cardinality" : {
                        "dct:created" : 1
                    },
                    "indexes" : {
                        "index1" : {
                            "dct:subject.u" : 1
                        },
                        "index2" : {
                            "rdf:type.u" : 1
                        }
                    }
                },
                "CBD_users" : {
                    "cardinality" : {
                        "foaf:name.l" : 1
                    },
                    "indexes" : {
                        "index1" : {
                            "rdf:type.u" : 1
                        }
                    }
                }
            },
            "data_source" : "mongoCluster1"
        }
    }
}
View specifications define the shape of the materialised views that Mongo manages. For a full explanation of views, read the primer. In short, they mimic the functionality of DESCRIBE- or CONSTRUCT-style SPARQL queries.
The convention for view spec identifiers is to prefix them with v_.
Specs are defined as an array in the store level of the config document:
{
    "stores" : {
        "some_store" : {
            "pods" : {},
            "view_specifications" : [
                {
                    "_id" : "v_spec_1"
                },
                {
                    "_id" : "v_spec_2"
                }
            ]
        }
    }
}
TODO:
- Implement versioned view specifications to allow automatic migration of data that meets an earlier specification
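Drawing on the keywords documented later in this section (from, type, joins), a fuller view specification might look like the following sketch; the pod name CBD_people and the FOAF predicates are illustrative only:

```json
{
    "_id" : "v_people_and_friends",
    "from" : "CBD_people",
    "type" : "foaf:Person",
    "joins" : {
        "foaf:knows" : {
            "from" : "CBD_people"
        }
    }
}
```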
Table specifications define the shape of the tabular data that Mongo manages. For a full explanation of tables, read the primer. In short, they mimic the functionality of SELECT-style SPARQL queries.
The convention for table spec identifiers is to prefix them with t_.
Specs are defined as an array in the store level of the config document:
{
    "stores" : {
        "some_store" : {
            "pods" : {},
            "view_specifications" : [],
            "table_specifications" : [
                {
                    "_id" : "t_spec_1"
                },
                {
                    "_id" : "t_spec_2"
                }
            ]
        }
    }
}
TODO:
- Implement versioned table specifications to allow automatic migration of data that meets an earlier specification
Previous versions of Tripod integrated with ElasticSearch to provide indexing and full-text search. This was removed early on, whilst Tripod was still closed source within Talis, as the complexity was not required. However, some primitive regex-style searching is still provided. For a full explanation of search, read the primer.
The search config is defined at the store level and consists of two parts: the search_provider and the search_specifications.
The provider was intended to allow pluggable implementations of search services (ElasticSearch, straight Lucene, perhaps Solr) but today the only option is MongoSearchProvider.
The search specifications define the shape of the data that underpins searches.
The convention for search spec identifiers is to prefix them with i_.
{
    "stores" : {
        "some_store" : {
            "pods" : {},
            "view_specifications" : [],
            "table_specifications" : [],
            "search_config" : {
                "search_provider" : "MongoSearchProvider",
                "search_specifications" : [
                    {
                        "_id" : "i_spec_1"
                    },
                    {
                        "_id" : "i_spec_2"
                    }
                ]
            }
        }
    }
}
TODO:
- Clean up the search specifications as they are not quite in line with tables and views (notably filter/condition)
- At some point re-instate full-text capability via ElasticSearch or similar.
Each of the specifications above is built from a specification language defined by the keywords below.
The unique identifier of the spec.
Specifies the version of the spec. Unused until versioned specifications are implemented.
Specifies the collection the current operation should be performed on. Within joins this allows you to join data across collections. It is mandatory at the top level of a specification, where it gives the starting collection from which the specification should operate. For example, to join from one collection to another:
{
    "_id" : "v_someview",
    "from" : "CBD_mydata",
    "joins" : {
        "foaf:knows" : {
            "from" : "CBD_myotherdata"
        }
    }
}
If type is defined, it will limit the resources to those that have the specified rdf:type. The value can be a curie string or an array of curie strings. For example:
{
    "_id" : "v_people",
    "type" : ["foaf:Agent", "foaf:Person"],
    "from" : "CBD_people"
}
{
    "_id" : "t_books",
    "type" : "bibo:Book",
    "from" : "CBD_resources"
}
etc.
A property of the joins predicate object; an array of predicate values to pluck from the joined CBD and include in the result. If omitted, all values from the CBD will be included. This allows you to mimic the behaviour of a CONSTRUCT-style SPARQL query by slimming the resultant graph down to only the predicates you specify.
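A minimal sketch, assuming the property is tripod-php's include keyword; the FOAF predicates listed are illustrative:

```json
"joins" : {
    "foaf:knows" : {
        "include" : ["foaf:name", "foaf:mbox"]
    }
}
```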
Joins the current resource to another. The keys of the joins object correspond to the predicate whose object URI you wish to join on. The "right join" will be on the _id property of the joined resource. You can specify the collection to join on with the from property (it defaults to the current collection).
Note: you can only join a URI object (or _id) to an _id.
Example:
{
    "_id" : "t_people",
    "type" : "foaf:Person",
    "from" : "CBD_people",
    "fields" : [
        {
            "fieldName" : "name",
            "predicates" : ["foaf:name"]
        }
    ],
    "joins" : {
        "foaf:knows" : [
            {
                "fieldName" : "knows",
                "predicates" : ["foaf:name"]
            }
        ]
    }
}
A property of the joins predicate object; determines the maximum number of times a join will be performed. Where the number of available matches exceeds maxJoins, there are no guarantees about which of the available matches will be included. The exception is when used with a sequence via followSequence: there, sequence elements rdf:_1..rdf:_{maxJoins} will be included.
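For example, a sketch capping a join at ten matches; the pod and predicate names are illustrative:

```json
"joins" : {
    "foaf:knows" : {
        "from" : "CBD_people",
        "maxJoins" : 10
    }
}
```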
RDF sequences are triple structures with a type of rdf:Seq which enumerate entities with the predicates rdf:_x, e.g. rdf:_1, rdf:_2 etc. These have always been tricky to work with in SPARQL queries.
In Tripod, when joining to a node which is actually a sequence, you would otherwise have to manually join again from the sequence to each element rdf:_1 etc., which is hard because you'd need to know the length of the sequence up-front, and view specs are not dynamic (they are specified in config, not at runtime).
followSequence simplifies this by providing a shortcut that follows sequences and joins automatically until either the last sequence element is reached or maxJoins is exceeded. For example:
"bibo:authorList" : {
    "joins" : {
        "followSequence" : {
            "maxJoins" : 50
        }
    }
}
The properties of the followSequence object are identical in behaviour to those that can be specified in the joins predicate object.
An array of predicates to use in the current action. A few modifier functions, such as lowercase and join, can also be applied to predicates to post-process the resulting values.
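A sketch of the modifier syntax, assuming the nested-object form used by tripod-php's table specs; the field names, glue parameter, and predicates are illustrative assumptions:

```json
"fields" : [
    {
        "fieldName" : "nameLowercase",
        "predicates" : [{
            "lowercase" : {
                "predicates" : ["foaf:name"]
            }
        }]
    },
    {
        "fieldName" : "allTitles",
        "predicates" : [{
            "join" : {
                "glue" : "; ",
                "predicates" : ["dct:title"]
            }
        }]
    }
]
```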
A property of the joins predicate object; specifies a query condition which must be matched for the join to be performed.
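A hypothetical sketch: the Mongo-style query document and the value format are assumptions, though the rdf:type.u field path follows the index examples earlier in this document:

```json
"joins" : {
    "foaf:knows" : {
        "condition" : {
            "rdf:type.u" : "foaf:Person"
        }
    }
}
```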
ttl can be used in view specifications at the top level to indicate the time of expiry of the data. Views generated with a ttl will not have an impact index; that is, write operations will not automatically expire them. Instead, when they are read, Tripod will look at the timestamp of the view's creation and, if it exceeds the ttl, will discard the view, regenerate (and store) a new one, and return that instead.
This is very useful if you have specific volatile views and the freshest data is not always crucial: you can avoid excessive view re-generation by specifying a ttl value which exceeds the mean time between writes to your data.
ttl cannot be used within table specifications, because table rows are often operated on in paged sets: it would be impossible to tell whether table rows on further counts should still exist without paging through the whole set first.
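For example, a sketch of a volatile view spec, assuming ttl is expressed in seconds; the pod names and the dct:creator join are illustrative:

```json
{
    "_id" : "v_recent_orders",
    "from" : "CBD_orders",
    "ttl" : 300,
    "joins" : {
        "dct:creator" : {
            "from" : "CBD_users"
        }
    }
}
```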
"value" defines a function to run on the property data.
Creates a fully qualified URI from the alias value of the current resource. E.g.:
{
    "_id" : "t_foo",
    "from" : "fooCollection",
    "fields" : [
        {
            "fieldName" : "fooLink",
            "predicates" : [""],
            "value" : "link"
        }
    ]
}
would give the fully qualified URI of the base resource in field fooLink
. In a join:
{
    "_id" : "t_foo",
    "from" : "fooCollection",
    "joins" : {
        "foo:bar" : {
            "from" : "barCollection",
            "fields" : [
                {
                    "fieldName" : "barLink",
                    "predicates" : [""],
                    "value" : "link"
                }
            ]
        }
    }
}
link would provide the fully qualified URI of the resource joined at foo:bar in the field barLink.
The "predicates" property is required but ignored, so use an array with a single empty string: [""].