Thursday, October 4, 2018

Linking uncertain mints to probable matches in Nomisma

We have formalized a new extension to the Nomisma.org data model to link identifiable mints whose geographic location is uncertain to the Nomisma URI for the place most likely to have been the production site for the mint. Usually, when a mint is unknown, we simply assign the http://nomisma.org/id/uncertain_value URI to the nmo:hasMint property in the Nomisma ontology, but several type corpora do contain lists of distinct "uncertain" mints. For example, the Republican mints of Sicily 1 and Sicily 2 have been attributed by Michael Crawford in Roman Republican Coinage (1974).

With the publication of Seleucid Coins Online last December and the impending publication of Ptolemaic Coins Online this fall, we have created, or soon will create, more than 100 additional "uncertain" mints. In many cases, we can use skos:broader to link to a parent region, as the region is usually known to a reasonably high degree of certainty. In some cases, the place itself can be attributed as "probable", and it is useful for research and visualization purposes to capture this relationship.

Since skos:related has not been used in Nomisma to link loosely related concepts internally, we have adopted this property to point to a blank node that carries two further properties: an rdf:value that is the URI of the probable mint, and a un:hasUncertainty of nm:uncertain_value, a URI that is used rather generally throughout Nomisma RDF. In theory, we can expand the number of instances of un:Uncertainty to handle various gradations of certainty/uncertainty.



nm:uncertain_52_sco a nmo:Mint,
        skos:Concept ;
    dcterms:isPartOf nm:greek_numismatics ;
    dcterms:source nm:seleucid_coins_online ;
    skos:broader nm:mesopotamia ;
    skos:changeNote <http://nomisma.org/id/uncertain_52_sco#provenance> ;
    skos:definition "Uncertain Mint 52, perhaps a Subsidiary of Seleucia on the Tigris"@en ;
    skos:inScheme nm: ;
    skos:prefLabel "Uncertain Mint 52"@en ;
    skos:related [ un:hasUncertainty nm:uncertain_value ;
            rdf:value nm:seleuceia_ad_tigrim ] ;
    skos:scopeNote "This concept should be used only in the context of Seleucid coinage."@en .


In addition to this, it is now possible to link an attributable workshop to the broader concept of the mint within its field of numismatics. In Seleucid coinage, for example, there are two distinct workshops of Babylon, Babylon I and II. Both of these workshops link to nm:babylon via skos:broader. In these cases, as with the uncertain mints, the RDF for these concepts does not contain specific geographic coordinates, but it is possible to extract them from the parent or related mint via SPARQL or other API calls.



On the user interface side of things, Numishare has been updated to extract the coordinates from parent or related mints and include them in various geographic data serializations, like KML and GeoJSON. Uncertain mints are styled in gray to differentiate from the usual blue color.
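To illustrate the idea (this is not Numishare's actual code), a GeoJSON feature for a mint might carry a flag that the map layer uses to choose the gray or blue style. The function name, the property names, and the coordinates below are all hypothetical.

```python
import json


def mint_feature(uri, label, lat, lon, uncertain=False):
    """Build a GeoJSON Feature for a mint; property names are illustrative."""
    return {
        "type": "Feature",
        # GeoJSON positions are ordered longitude, then latitude.
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {
            "uri": uri,
            "name": label,
            "uncertain": uncertain,
            # Gray for coordinates inherited from a probable or parent mint.
            "color": "gray" if uncertain else "blue",
        },
    }


# Placeholder coordinates, standing in for values inherited from the probable mint.
feature = mint_feature(
    "http://nomisma.org/id/uncertain_52_sco", "Uncertain Mint 52", 32.5, 44.4, uncertain=True
)
print(json.dumps({"type": "FeatureCollection", "features": [feature]}, indent=2))
```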

See http://numismatics.org/sco/id/sc.2.2365 for example.

Wednesday, October 3, 2018

Major Nomisma.org data model update: provenance

At long last, we have implemented provenance directly within the Nomisma.org RDF data model. This is something the scientific committee had discussed for some time before it was finally implemented. This was no easy task, as it meant reverse engineering the entire editing history from the Nomisma data Github repository in order to establish a chronology of creation dates and significant modifications to the content of the SKOS concepts.

The provenance is encoded primarily in the W3C Provenance Ontology. Each concept now bears a skos:changeNote that points to a dcterms:ProvenanceStatement. This ProvenanceStatement includes a prov:wasGeneratedBy activity for the date of creation and zero or more prov:activity properties that indicate subsequent modifications. Each activity has a timestamp derived from the Github commit history.

When possible, each activity also includes a prov:wasAssociatedWith property that links to a URI in the new http://nomisma.org/editor/ namespace. Any Nomisma ID created at the time of the first Github commit was presumed to have been created by Andy Meadows and/or Sebastian Heath, but it becomes complicated after this. Many IDs minted since August 2015 have been generated by a spreadsheet import mechanism. It is important to be able to link a concept to a Google spreadsheet that created or modified it. We therefore use prov:used to link to the public HTML version of the spreadsheet, and we also include some basic metadata about the spreadsheet (the URIs of the Nomisma editors that contributed to its creation, the description of the spreadsheet, etc.). Try a DESCRIBE SPARQL query for the URI, https://docs.google.com/spreadsheets/d/19N59I8u6CnwDYfSHsr10xDt50fIRp_EyqZP_BGVaH4U/pubhtml, for example.

By collating the Github commit history with all of the known spreadsheet imports, we have been able to link thousands of concepts to a few dozen spreadsheet uploads. Other groups of manually-created IDs in several categories have been attributed to known editors: Medieval and Modern German IDs to Karsten Dahmen and Walter Bloom; Byzantine rulers to Dennis Mathie. Reverse engineering the history of all of the Nomisma IDs took about two weeks. The Nomisma framework codebase was then modified to display provenance in the HTML output, and the back-end editor and import XForms apps were updated to accommodate the creation and updating of provenance events.

Here's an example of an ID created in or before 2012 and subsequently updated by two different spreadsheets:


<http://nomisma.org/id/seleuceia_ad_tigrim#provenance> a dcterms:ProvenanceStatement ;
    prov:activity [ a prov:Activity,
                prov:Modify ;
            dcterms:type "spreadsheet" ;
            prov:atTime "2015-10-24T04:00:03+00:00"^^xsd:dateTime ;
            prov:used <https://docs.google.com/spreadsheets/d/1zg7HnWqYSzVk8oSLythIOvqix0CYf0Xxv6VMb4Edeho/pubhtml> ;
            prov:wasAssociatedWith <http://nomisma.org/editor/egruber> ],
        [ a prov:Activity,
                prov:Modify ;
            dcterms:type "spreadsheet" ;
            prov:atTime "2015-08-26T04:00:03+00:00"^^xsd:dateTime ;
            prov:used <https://docs.google.com/spreadsheets/d/19N59I8u6CnwDYfSHsr10xDt50fIRp_EyqZP_BGVaH4U/pubhtml> ;
            prov:wasAssociatedWith <http://nomisma.org/editor/ameadows> ] ;
    prov:wasGeneratedBy [ a prov:Activity,
                prov:Create ;
            dcterms:description "This is among the original Nomisma XHTML+RDFa fragments, most likely created between 2010-2012 by Sebastian Heath and/or Andy Meadows."@en ;
            dcterms:type "manual" ;
            prov:atTime "2012-10-28T21:43:36+00:00"^^xsd:dateTime ;
            prov:wasAssociatedWith <http://nomisma.org/editor/ameadows>,
                <http://nomisma.org/editor/sfsheath> ] ;
    foaf:topic nm:seleuceia_ad_tigrim .

Importantly, since editor URIs are stored for individual manually-edited events as well as for the creators/contributors of the spreadsheets themselves, it is possible to execute a SPARQL query to extract a list of Nomisma IDs created or modified by an individual contributor to the Nomisma.org project. By connecting an editor to their ORCID URI in the underlying editor RDF (see http://nomisma.org/editor/ameadows), we will be able to generate a DOI that reflects the intellectual contribution of that person to the field of numismatics, and that DOI (as a dataset) will appear on the ORCID profile of a scholar in the same manner as a traditional printed journal article or monograph. This will be an important advancement for Nomisma that I hope might serve as a proof of concept for other Digital Humanities projects.

Improving consistency in TTL and JSON-LD output from Nomisma

In preparation for a new and improved data model for capturing the provenance of data within the Nomisma.org Linked Open Data ecosystem (more on that later as I move these updates into production), I have revisited the RDF Turtle and JSON-LD exports from Nomisma.

The new model for provenance, as well as a new method of linking URIs that reflect uncertain mints that may be attributed to known places for Hellenistic Royal Coinages (see http://nomisma.org/id/uncertain_26_sco), will include some blank nodes. It isn't necessary to assign a permanent, addressable URI to every possible modification event for a SKOS Concept. The dcterms:ProvenanceStatement has a fragment identifier, but individual activities do not.

Up to this point, the Turtle and JSON-LD serializations from Nomisma were executed via XSLT transformation from the canonical RDF/XML data (which does follow a standard model, as these are generated via controlled processes in the XForms engine). However, the complexity of dealing with blank nodes was not handled in the XSLT stylesheets for these alternative serializations, and so I sought to outsource this transformation process to the Python RDFLib library and its JSON-LD plugin.

Getting this working through Orbeon's XML Pipeline system was a little bit tricky. Orbeon has long had a processor for executing scripts on the command line (execute-processor), but it is not well documented. After a morning of trial and error, I have managed to successfully implement the Turtle and JSON-LD transformations through RDFLib.

The config for the processor is generated by XSLT that reads the URL path structure from the HTTP request headers in order to ascertain the Nomisma ID and Concept Scheme. These are passed to the Python script, which uses them to build the absolute path of the RDF/XML file and then serializes that file into the format of choice.
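The path handling itself is done in XSLT, but the logic can be sketched in Python for clarity. The request path shape ('/scheme/id.extension'), the function name rdf_path_for, and the restriction on allowed characters are assumptions made for this illustration; only the data directory is taken from the script below.

```python
import re
from pathlib import PurePosixPath

# Data directory as it appears in the Python script in this post.
DATA_ROOT = "/usr/local/projects/nomisma-data"
# Assumption: schemes and IDs consist of letters, digits, underscores, hyphens.
NAME = re.compile(r"[A-Za-z0-9_-]+")


def rdf_path_for(request_path):
    """Map a path such as '/id/seleuceia_ad_tigrim.jsonld' to its RDF/XML file URI."""
    parts = PurePosixPath(request_path).parts
    if len(parts) != 3 or parts[0] != "/":
        raise ValueError("expected a path of the form /scheme/id.extension")
    scheme = parts[1]
    concept_id = PurePosixPath(parts[2]).stem  # drop the requested extension
    if not (NAME.fullmatch(scheme) and NAME.fullmatch(concept_id)):
        raise ValueError("unexpected characters in scheme or ID")
    return scheme, concept_id, f"file://{DATA_ROOT}/{scheme}/{concept_id}.rdf"


print(rdf_path_for("/id/seleuceia_ad_tigrim.jsonld"))
```

Validating the two path segments before they reach the command line is a precaution added in this sketch; whether the XSLT performs a similar check is not stated here.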

The XPL file for the JSON-LD transformation is here.

The simple Python script is also in the Nomisma Github repository, under the 'script' folder.



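#!/usr/bin/env python
import sys
from rdflib import Graph, plugin
from rdflib.serializer import Serializer

# Command-line arguments: the Nomisma ID and the concept scheme it belongs to
concept_id = sys.argv[1]
scheme = sys.argv[2]

# Absolute file URI of the canonical RDF/XML file for this concept
rdf_file = "file:///usr/local/projects/nomisma-data/" + scheme + "/" + concept_id + ".rdf"

# Parse the RDF/XML and print it re-serialized as JSON-LD
graph = Graph()
graph.parse(rdf_file, format='application/rdf+xml')
print(graph.serialize(format='json-ld', indent=4))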