Wednesday, October 3, 2018

Improving consistency in TTL and JSON-LD output from Nomisma

In preparation for a new and improved data model for capturing the provenance of data within the Linked Open Data ecosystem (more on that later as I move these updates into production), I have revisited the RDF Turtle and JSON-LD exports from Nomisma.

The new model for provenance, as well as a new method of linking URIs that reflect uncertain mints that may be attributed to known places for Hellenistic Royal Coinages, will include some blank nodes. It isn't necessary to assign a permanent, addressable URI to every possible modification event for a SKOS Concept. The Provenance Statement (dcterms) has a fragment identifier, but individual activities do not.

Up to this point, the Turtle and JSON-LD serializations from Nomisma were executed via XSLT transformation from the canonical RDF/XML data (which does follow a standard model, as these are generated via controlled processes in the XForms engine). However, the complexity of dealing with blank nodes was not handled in the XSLT stylesheets for these alternative serializations, and so I sought to outsource this transformation process to the Python RDFLib library and its JSON-LD plugin.

Getting this working through Orbeon's XML Pipeline system was a little bit tricky. Orbeon has long had a processor for executing scripts on the command line (execute-processor), but it is not well documented. After a morning of trial and error, I have managed to successfully implement the Turtle and JSON-LD transformations through RDFLib.

The config for the processor is generated by XSLT that reads the URL path structure from the HTTP request headers in order to ascertain the Nomisma ID and Concept Scheme. These are passed to the Python script, which uses them to build the absolute path to the RDF/XML file and then serializes that file into the format of choice.
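
To make the path-handling step concrete, here is a rough sketch of the same logic in Python rather than XSLT. It is not the production stylesheet: the URL layout (/<scheme>/<id>.<extension>), the function names, and the character whitelist are all assumptions for illustration. Only the data root is taken from the script in this post.

```python
import posixpath
import re

# Data root as used by the script in this post.
DATA_ROOT = "/usr/local/projects/nomisma-data"

# Assumption: IDs and scheme names only use these characters.
SAFE_TOKEN = re.compile(r"^[A-Za-z0-9_.-]+$")


def parse_request_path(path):
    """Split a hypothetical /<scheme>/<id>.<ext> path into (scheme, id)."""
    parts = [p for p in path.split("/") if p]
    if len(parts) != 2:
        raise ValueError("expected /<scheme>/<id>.<ext>, got: " + path)
    scheme, filename = parts
    # Drop the extension (.jsonld, .ttl, ...) to recover the bare ID.
    concept_id, _extension = posixpath.splitext(filename)
    for token in (scheme, concept_id):
        # Reject anything that could escape the data directory.
        if token.startswith(".") or not SAFE_TOKEN.match(token):
            raise ValueError("unsafe path segment: " + token)
    return scheme, concept_id


def rdf_file_uri(scheme, concept_id):
    """Build the file URI of the canonical RDF/XML document."""
    return "file://" + posixpath.join(DATA_ROOT, scheme, concept_id + ".rdf")
```

For example, under the assumed layout, rdf_file_uri(*parse_request_path("/id/athens.jsonld")) yields the RDF/XML file for 'athens' in the 'id' scheme. Validating the two segments before they reach the command line matters here because they come straight from the request URL.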

The XPL file for the JSON-LD transformation is here.

The simple Python script is also in the Nomisma GitHub repository, under 'script':

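#!/usr/bin/env python
import sys
from rdflib import Graph, plugin
from rdflib.serializer import Serializer

# Read the Nomisma ID and concept scheme passed in by the XML pipeline
concept_id = sys.argv[1]
scheme = sys.argv[2]

# File URI of the canonical RDF/XML document for this concept
rdf_file = "file:///usr/local/projects/nomisma-data/" + scheme + "/" + concept_id + ".rdf"

# Parse the RDF/XML into a graph and print it as JSON-LD
graph = Graph()
graph.parse(rdf_file, format='application/rdf+xml')
print(graph.serialize(format='json-ld', indent=4))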
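
Since both the Turtle and JSON-LD exports go through RDFLib, the two could in principle share one script that takes the output format as a third argument. The following is only a sketch of that idea, not the script in the repository: the short format names, the SERIALIZERS mapping, and the function names are hypothetical, while 'turtle' and 'json-ld' are the serializer names RDFLib itself uses.

```python
import sys

# Hypothetical short names mapped to RDFLib serializer names.
SERIALIZERS = {"ttl": "turtle", "jsonld": "json-ld"}


def resolve_serializer(name):
    """Return the RDFLib serializer name for a short format name."""
    try:
        return SERIALIZERS[name]
    except KeyError:
        raise ValueError("unsupported format: " + name)


def convert(rdf_uri, output_format):
    """Parse an RDF/XML document and serialize it in the requested format."""
    # Imported here so the format lookup can be used without RDFLib installed.
    from rdflib import Graph

    graph = Graph()
    graph.parse(rdf_uri, format="application/rdf+xml")
    return graph.serialize(format=resolve_serializer(output_format))


# Only runs when called as: script.py <id> <scheme> <format>
if __name__ == "__main__" and len(sys.argv) == 4:
    concept_id, scheme, output_format = sys.argv[1:4]
    uri = ("file:///usr/local/projects/nomisma-data/"
           + scheme + "/" + concept_id + ".rdf")
    print(convert(uri, output_format))
```

Rejecting unknown format names up front keeps a mistyped argument from surfacing as a less obvious plugin error inside RDFLib.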
