When serializing a DatasetGraph into NQ format, I find that all blank nodes with specified labels get a "B" prepended to the label, e.g. a blank node with a label "students" would be serialized as "_:Bstudents".
This is somewhat annoying for my use case: an RML engine needs to follow a particular spec, including filling in blank node patterns.
My current workaround is a regex replacement on the output, but this is far from ideal.
I'd like to suggest more granular control over how the NQ writer (and writers in general) handles blank nodes: give the user an option to preserve the original blank node label without prepending a "B" to it.
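For anyone stuck with the same workaround in the meantime, here is a minimal sketch of a slightly safer regex post-processing step. It uses only the Java standard library, not Jena. The class name `BNodePrefixStripper` and the method `strip` are made up for illustration. It assumes one statement per line, as in N-Quads output, and that every blank node token starts with `_:B`. It skips quoted literals and IRIs so that text inside them is left alone. It only drops the leading "B"; it does not undo any other encoding the writer may apply to a label.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jena.
final class BNodePrefixStripper {
    // Alternatives are tried left to right at each position:
    //   1. a quoted literal (with backslash escapes) -> copied unchanged
    //   2. an IRI in angle brackets                  -> copied unchanged
    //   3. a blank node token starting with "_:B"    -> rest of label in group 1
    private static final Pattern TOKEN =
        Pattern.compile("\"(?:[^\"\\\\]|\\\\.)*\"|<[^>]*>|_:B(\\S+)");

    // Returns the input with the leading "B" removed from every blank node label.
    static String strip(String nquads) {
        return TOKEN.matcher(nquads).replaceAll(r ->
            Matcher.quoteReplacement(
                r.group(1) == null ? r.group() : "_:" + r.group(1)));
    }

    public static void main(String[] args) {
        String line = "_:Bstudents <http://example.org/p> \"mentions _:Bfoo\" .";
        System.out.println(strip(line));
    }
}
```

The blank node in subject position is rewritten, while the `_:Bfoo` inside the string literal is not touched. This is still text surgery on serialized output, so it remains a workaround rather than a fix.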
Blank nodes in data that comes from a parser will have large random numbers as labels. So I'm assuming you are controlling the RDF production and setting the blank node labels yourself.
The RDFWriter builder doesn't currently provide a way to set the NodeFormatter. It would be good to add this.
If you want to read such data in, and preserve the label (with care!), then use RDFParser.create().labelToNode(labelToNode) with LabelToNode.createUseLabelAsGiven(). Your code is responsible for blank node label uniqueness and the rules about what happens on graph merge and reading files multiple times.
For writing: NodeFormatter is the interface for controlling the RDF term output.
To extend RDFWriterBuilder, the low-level per-format interfaces WriterGraphRIOT and WriterDatasetGraphRIOT will need changing.
There are several kinds of writer for the N-Triples/Turtle family of syntaxes (streamed, flat, batching and collecting), and all of them use a NodeFormatter.
At the RDFWriter level, there is no "writer profile" abstraction like the one used when reading (where a node maker, FactoryRDF, is carried by ParserProfile).
N-Quads is the simplest output form. It is streamed and uses WriterStreamRDFPlain.
Below is the code that is used for N-Quads. You could use that, modified at NodeFmtLib.encodeBNodeLabel to just use the label. Be careful - some characters aren't legal in a blank node label string.
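On the point about illegal characters: here is a minimal sketch of a check you could run before writing a label as given. It is plain Java, not Jena code. `BlankNodeLabels`, `isSafeLabel` and `format` are hypothetical names. The pattern is a deliberately conservative ASCII-only subset of the blank node label grammar in N-Triples/N-Quads (the real grammar also allows a range of non-ASCII characters): the first character is a letter, digit or underscore, later characters may also be "-" or ".", and the label must not end with ".".

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jena.
final class BlankNodeLabels {
    // Conservative ASCII subset of the N-Quads BLANK_NODE_LABEL rule:
    // first char: letter, digit or '_'; middle: also '-' and '.'; last: not '.'.
    private static final Pattern SAFE =
        Pattern.compile("[A-Za-z0-9_](?:[A-Za-z0-9_.-]*[A-Za-z0-9_-])?");

    // True if the label can be written after "_:" without any encoding.
    static boolean isSafeLabel(String label) {
        return label != null && SAFE.matcher(label).matches();
    }

    // Writes the label unchanged, or fails loudly instead of emitting bad syntax.
    static String format(String label) {
        if (!isSafeLabel(label)) {
            throw new IllegalArgumentException("Not a safe blank node label: " + label);
        }
        return "_:" + label;
    }

    public static void main(String[] args) {
        System.out.println(format("students"));
        System.out.println(isSafeLabel("has space"));
    }
}
```

Rejecting unsafe labels, rather than silently rewriting them, keeps the output label identical to the one you set, which is the point of preserving labels in the first place.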
Version
4.10.0
Code example that performs the serialization:
Are you interested in contributing a solution yourself?
Perhaps?