This repository contains code defining new RDF writers for Jena that always serialize Turtle in the same sorted order. It was developed to reduce diff noise when the data is stored in a git repository, and we are confident there are plenty of other use cases where it will be useful.
The repository contains two writers, for the Turtle and TriG formats.
Some arbitrary decisions always have to be made for certain cases. We made the following ones when sorting objects:
- first URIs (sorted), then literals (sorted), then blank nodes
- first rdf:langStrings, then xsd:strings, then numbers, then everything else, sorted by type URI then value
- rdf:langStrings are sorted by lang then value, in the root Unicode collator (not in the locale corresponding to the language)
- numbers are sorted first by value then by type URI ("+1"^^xsd:integer < "1"^^xsd:integer < "+1"^^xsd:nonNegativeInteger < "1.2"^^xsd:float < "2"^^xsd:integer)
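The literal ordering rules above can be illustrated with a small standalone sketch. It does not use the classes of this repository: `Num` and `LangStr` are hypothetical stand-ins for typed literals, `java.text.Collator` with `Locale.ROOT` is assumed as the root Unicode collator, and the final tie-break on the lexical form (which places `"+1"` before `"1"` for the same type) is an assumption inferred from the example order, not a documented rule.

```java
import java.math.BigDecimal;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

final class LiteralOrderSketch {

    static final String XSD = "http://www.w3.org/2001/XMLSchema#";

    /** Hypothetical stand-in for a numeric literal: lexical form plus datatype URI. */
    static final class Num {
        final String lex;
        final String type;
        Num(String lex, String type) { this.lex = lex; this.type = type; }
        // BigDecimal cannot parse INF or NaN; those would need separate handling.
        BigDecimal value() { return new BigDecimal(lex); }
        String type() { return type; }
        String lex() { return lex; }
    }

    /** Hypothetical stand-in for an rdf:langString: text plus language tag. */
    static final class LangStr {
        final String text;
        final String lang;
        LangStr(String text, String lang) { this.text = text; this.lang = lang; }
        String text() { return text; }
        String lang() { return lang; }
    }

    // Numbers: by value, then by type URI; the lexical-form tie-break is an assumption.
    static final Comparator<Num> NUM_ORDER = Comparator
            .comparing(Num::value)
            .thenComparing(Num::type)
            .thenComparing(Num::lex);

    // Root collator, independent of the language tag of each string.
    static final Collator ROOT = Collator.getInstance(Locale.ROOT);

    // Language-tagged strings: by language tag, then by value in the root collator.
    static final Comparator<LangStr> LANG_ORDER = Comparator
            .comparing(LangStr::lang)
            .thenComparing(LangStr::text, ROOT);

    /** Returns the numbers in sorted order, rendered as lexical^^localTypeName. */
    static List<String> sortNumbers(List<Num> in) {
        List<Num> copy = new ArrayList<>(in);
        copy.sort(NUM_ORDER);
        List<String> out = new ArrayList<>();
        for (Num n : copy) {
            out.add(n.lex + "^^" + n.type.substring(XSD.length()));
        }
        return out;
    }

    /** Returns the strings in sorted order, rendered as text@lang. */
    static List<String> sortLangStrings(List<LangStr> in) {
        List<LangStr> copy = new ArrayList<>(in);
        copy.sort(LANG_ORDER);
        List<String> out = new ArrayList<>();
        for (LangStr s : copy) {
            out.add(s.text + "@" + s.lang);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sortNumbers(Arrays.asList(
                new Num("2", XSD + "integer"),
                new Num("1.2", XSD + "float"),
                new Num("+1", XSD + "nonNegativeInteger"),
                new Num("1", XSD + "integer"),
                new Num("+1", XSD + "integer"))));
        System.out.println(sortLangStrings(Arrays.asList(
                new LangStr("b", "en"),
                new LangStr("a", "fr"),
                new LangStr("a", "en"))));
    }
}
```

Comparing numbers through `BigDecimal` makes the comparison independent of the datatype, which is what lets an `xsd:float` sort between two `xsd:integer` values.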
Using maven:
<dependency>
    <groupId>io.bdrc</groupId>
    <artifactId>jena-stable-turtle</artifactId>
    <version>0.7.2</version>
</dependency>
Build and deploy:
mvn clean package
mvn deploy -DperformRelease=true
Then go to https://oss.sonatype.org/ and close and release the staging repository.
// register the STTL writer
Lang sttl = STTLWriter.registerWriter();
// build a map of namespace priorities
SortedMap<String, Integer> nsPrio = ComparePredicates.getDefaultNSPriorities();
nsPrio.put(SKOS.getURI(), 1);
nsPrio.put("http://purl.bdrc.io/ontology/admin/", 5);
nsPrio.put("http://purl.bdrc.io/ontology/toberemoved/", 6);
// build a list of predicates URIs to be used (in order) for blank node comparison
List<String> predicatesPrio = CompareComplex.getDefaultPropUris();
predicatesPrio.add("http://purl.bdrc.io/ontology/admin/logWhen");
predicatesPrio.add("http://purl.bdrc.io/ontology/onOrAbout");
predicatesPrio.add("http://purl.bdrc.io/ontology/noteText");
// pass the values through a Context object
Context ctx = new Context();
ctx.set(Symbol.create(STTLWriter.SYMBOLS_NS + "nsPriorities"), nsPrio);
ctx.set(Symbol.create(STTLWriter.SYMBOLS_NS + "nsDefaultPriority"), 2);
ctx.set(Symbol.create(STTLWriter.SYMBOLS_NS + "complexPredicatesPriorities"), predicatesPrio);
// the base indentation, defaults to 4
ctx.set(Symbol.create(STTLWriter.SYMBOLS_NS + "nsBaseIndent"), 4);
// the minimal predicate width, defaults to 14
ctx.set(Symbol.create(STTLWriter.SYMBOLS_NS + "predicateBaseWidth"), 14);
// longest length for subject to be on the same line with the predicate, defaults to 20
ctx.set(Symbol.create(STTLWriter.SYMBOLS_NS + "longSubject"), 20);
// put multiple objects on separate lines each, defaults to false
ctx.set(Symbol.create(STTLWriter.SYMBOLS_NS + "objectsMultiLine"), false);
// put final dot on new line for named subjects, defaults to false
ctx.set(Symbol.create(STTLWriter.SYMBOLS_NS + "namedDotNewLine"), false);
Graph g = ... ; // fetch the graph you want to write
RDFWriter w = RDFWriter.create().source(g).context(ctx).lang(sttl).build();
w.output( ... ); // write somewhere
Note that for TriG output, you must use the same context namespace as for Turtle: STTLWriter.SYMBOLS_NS.
Set the symbol STTLWriter.SYMBOLS_NS + "onlyWriteUsedPrefixes" to true to only write prefixes that are actually used.
Put the compiled .jar file into the Jena class path and then call
riot --pretty sttl yourfile.ttl
All the code in this repository is under the Apache 2.0 License.
The original parts are Copyright © 2017-2019 Buddhist Digital Resource Center, and the files TurtleShell.java (coming from the Jena repository) and TriGShell.java (extracted from this file) are Copyright © 2011-2017 Apache Software Foundation (ASF); see NOTICE for more information.