Wednesday, October 3, 2018

Major Nomisma.org data model update: provenance

At long last, we have implemented provenance directly within the Nomisma.org RDF data model, something the scientific committee has discussed for some time. This was no easy task: it meant reverse engineering the entire editing history from the Nomisma data GitHub repository in order to establish a chronology of creation dates and significant modifications to the content of the SKOS concepts.

The provenance is encoded primarily in the W3C Provenance Ontology. Each concept now bears a skos:changeNote that points to a dcterms:ProvenanceStatement. This ProvenanceStatement includes one prov:wasGeneratedBy activity for the date of creation and zero or more prov:activity properties that indicate subsequent modifications. Each activity has a timestamp derived from the GitHub commit history.

When possible, each activity also includes a prov:wasAssociatedWith property that links to a URI in the new http://nomisma.org/editor/ namespace. Any Nomisma ID that already existed at the time of the first GitHub commit was presumed to have been created by Andy Meadows and/or Sebastian Heath, but attribution becomes more complicated after this. Many IDs minted since August 2015 have been generated by a spreadsheet import mechanism, so it is important to be able to link a concept to the Google spreadsheet that created or modified it. We therefore use prov:used to link to the public HTML version of the spreadsheet, and we also include some basic metadata about the spreadsheet (the URIs of the Nomisma editors that contributed to its creation, the description of the spreadsheet, etc.). Try a DESCRIBE SPARQL query for the URI, https://docs.google.com/spreadsheets/d/19N59I8u6CnwDYfSHsr10xDt50fIRp_EyqZP_BGVaH4U/pubhtml, for example.
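The DESCRIBE query suggested above can be sketched in Python with nothing but the standard library. This is only a minimal sketch: the endpoint address http://nomisma.org/query is an assumption (this post does not state where the SPARQL endpoint lives), and the helper names describe_query and request_url are made up for illustration. The code only builds the query and the request URL; it does not contact the server.

```python
from urllib.parse import urlencode

# Assumed endpoint address; this post does not say where the SPARQL endpoint is.
SPARQL_ENDPOINT = "http://nomisma.org/query"

# Spreadsheet URI quoted in the post.
SPREADSHEET_URI = (
    "https://docs.google.com/spreadsheets/d/"
    "19N59I8u6CnwDYfSHsr10xDt50fIRp_EyqZP_BGVaH4U/pubhtml"
)


def describe_query(uri: str) -> str:
    """Return a SPARQL DESCRIBE query for a single URI."""
    return f"DESCRIBE <{uri}>"


def request_url(query: str, endpoint: str = SPARQL_ENDPOINT) -> str:
    """Build a GET URL carrying the query in the 'query' parameter
    defined by the SPARQL 1.1 Protocol."""
    return f"{endpoint}?{urlencode({'query': query})}"


print(describe_query(SPREADSHEET_URI))
print(request_url(describe_query(SPREADSHEET_URI)))
```

Pasting the first printed line into a SPARQL query form should work just as well as fetching the URL.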

By collating the GitHub commit history with all of the known spreadsheet imports, we have been able to link thousands of concepts to a few dozen spreadsheet uploads. Other groups of manually created IDs in several categories have been attributed to known editors: Medieval and Modern German IDs to Karsten Dahmen and Walter Bloom; Byzantine rulers to Dennis Mathie. This reverse engineering of all of the Nomisma IDs took about two weeks. Beyond that, the Nomisma framework codebase was modified so that the HTML output displays provenance, and the back-end editor and import XForms apps were modified to accommodate the creation and updating of provenance events.

Here's an example of an ID created in or before 2012 and subsequently updated by two different spreadsheets:


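@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix nm: <http://nomisma.org/id/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://nomisma.org/id/seleuceia_ad_tigrim#provenance> a dcterms:ProvenanceStatement ;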
    prov:activity [ a prov:Activity,
                prov:Modify ;
            dcterms:type "spreadsheet" ;
            prov:atTime "2015-10-24T04:00:03+00:00"^^xsd:dateTime ;
            prov:used <https://docs.google.com/spreadsheets/d/1zg7HnWqYSzVk8oSLythIOvqix0CYf0Xxv6VMb4Edeho/pubhtml> ;
            prov:wasAssociatedWith <http://nomisma.org/editor/egruber> ],
        [ a prov:Activity,
                prov:Modify ;
            dcterms:type "spreadsheet" ;
            prov:atTime "2015-08-26T04:00:03+00:00"^^xsd:dateTime ;
            prov:used <https://docs.google.com/spreadsheets/d/19N59I8u6CnwDYfSHsr10xDt50fIRp_EyqZP_BGVaH4U/pubhtml> ;
            prov:wasAssociatedWith <http://nomisma.org/editor/ameadows> ] ;
    prov:wasGeneratedBy [ a prov:Activity,
                prov:Create ;
            dcterms:description "This is among the original Nomisma XHTML+RDFa fragments, most likely created between 2010-2012 by Sebastian Heath and/or Andy Meadows."@en ;
            dcterms:type "manual" ;
            prov:atTime "2012-10-28T21:43:36+00:00"^^xsd:dateTime ;
            prov:wasAssociatedWith <http://nomisma.org/editor/ameadows>,
                <http://nomisma.org/editor/sfsheath> ] ;
    foaf:topic nm:seleuceia_ad_tigrim .
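
The serialization above lists the two modifications before the creation event, so it can help to reorder the activities by timestamp. The following is a rough sketch, with the three activities transcribed by hand into plain Python dictionaries; the dictionary keys and the timeline function are names invented for this illustration and are not part of the Nomisma data model.

```python
from datetime import datetime

# Activities transcribed from the Turtle example above. The keys ("kind",
# "type", "at", "editors") are illustrative names chosen for this sketch.
activities = [
    {"kind": "Modify", "type": "spreadsheet",
     "at": "2015-10-24T04:00:03+00:00", "editors": ["egruber"]},
    {"kind": "Modify", "type": "spreadsheet",
     "at": "2015-08-26T04:00:03+00:00", "editors": ["ameadows"]},
    {"kind": "Create", "type": "manual",
     "at": "2012-10-28T21:43:36+00:00", "editors": ["ameadows", "sfsheath"]},
]


def timeline(events):
    """Return the events sorted oldest to newest by their prov:atTime value."""
    return sorted(events, key=lambda e: datetime.fromisoformat(e["at"]))


for event in timeline(activities):
    # Print the date portion of the timestamp, then the activity details.
    print(event["at"][:10], event["kind"], event["type"], ", ".join(event["editors"]))
```

Sorted this way, the manual creation event comes first, followed by the August 2015 and October 2015 spreadsheet updates.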

Importantly, since editor URIs are stored both for individual manually edited events and for the creators/contributors of the spreadsheets themselves, it is possible to execute a SPARQL query to extract a list of Nomisma IDs created or modified by an individual contributor to the Nomisma.org project. By connecting an editor to their ORCID URI in the underlying editor RDF (see http://nomisma.org/editor/ameadows), we will be able to generate a DOI that reflects the intellectual contribution of that person to the field of numismatics. That DOI (as a dataset) will appear on the scholar's ORCID profile in the same manner as a traditional printed journal article or monograph. This will be an important advancement for Nomisma that I hope might serve as a proof of concept for other Digital Humanities projects.
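
A per-editor query of this kind might look like the sketch below, which builds the SPARQL text in Python. It follows the structure of the Turtle example (foaf:topic, prov:wasGeneratedBy, prov:activity, prov:wasAssociatedWith) and therefore only finds activities that name the editor directly. Reaching editors through spreadsheet metadata would need the properties used on the spreadsheet resources, which are not named in this post, so that path is left out. The function name and the alphanumeric check on editor IDs are assumptions made for illustration.

```python
EDITOR_NAMESPACE = "http://nomisma.org/editor/"

# Doubled braces are literal braces once str.format() has run.
QUERY_TEMPLATE = """\
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT DISTINCT ?concept WHERE {{
  ?statement foaf:topic ?concept ;
             prov:wasGeneratedBy|prov:activity ?activity .
  ?activity prov:wasAssociatedWith <{editor_uri}> .
}}
ORDER BY ?concept
"""


def concepts_by_editor_query(editor_id: str) -> str:
    """Return a SPARQL query listing concepts created or modified by an editor.

    Assumes editor IDs are plain alphanumeric strings such as 'ameadows';
    anything else is rejected so it cannot break out of the IRI brackets.
    """
    if not editor_id.isalnum():
        raise ValueError(f"unexpected editor id: {editor_id!r}")
    return QUERY_TEMPLATE.format(editor_uri=EDITOR_NAMESPACE + editor_id)


print(concepts_by_editor_query("ameadows"))
```

The property path prov:wasGeneratedBy|prov:activity matches both the creation event and any later modification in a single pattern.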
