Thursday, July 18, 2013

Nomisma: Using XForms to Manage and Publish Linked Open Data

One of the main improvements in the newly redesigned Nomisma web architecture is in the administrative backend, which is not visible to the public.  The previous iteration of Nomisma was built on top of open source wiki software.  Each id was an XHTML+RDFa fragment in the filesystem, created and edited through the wiki.  There was no validation, and hand-coding XHTML fragments occasionally led to human error: invalid XML documents that broke page loads or RDF distillation.  We needed to move to a more stable and scalable infrastructure.

The XHTML+RDFa fragments remain a part of the new architecture of Nomisma, now maintained in a GitHub repository.  The fragments are edited in an XForms interface with the Orbeon processor, which enables not only editing of XML but also a variety of REST interactions: getting data from and posting data to the Apache Fuseki RDF triplestore and SPARQL endpoint, and posting data to the Solr search index, which powers the Atom feed.

While the XForms web forms handle the simplest of XHTML templates, such as those for authorities, mints, regions, etc., they do not yet handle editing of the more complex data models, such as those for IGCH hoards (like http://nomisma.org/id/igch0200) or coin types (for example, http://nomisma.org/id/rrc-174.1).  However, hoards and coin types are the least likely to be manually edited, so the editing interface is most useful for those numismatic concepts which are most likely to be enhanced with additional labels and references to other linked open data identifiers (like VIAF or Pleiades ids).

Validation


One of the main features of XForms is advanced validation.  The @typeof attribute in the XHTML root div is tied to a drop-down menu.  The values in this menu are generated dynamically before the form has finished loading (on xforms-model-construct-done), directly from a SPARQL query that retrieves all of the nm:numismatic_term ids in Nomisma:

PREFIX rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX nm:      <http://nomisma.org/id/>
PREFIX skos:    <http://www.w3.org/2004/02/skos/core#>
SELECT ?uri ?label WHERE {
  ?uri rdf:type <http://nomisma.org/id/numismatic_term> .
  ?uri skos:prefLabel ?label .
  FILTER (lang(?label) = "en")
}
ORDER BY ASC(?label)

A similar query is passed from Orbeon to the endpoint to generate an XForms instance for nm:field_of_numismatics (e.g., Greek Numismatics, Roman Numismatics, etc.).  Languages (xml:lang in the div) are also tied to an instance which contains every ISO language code and label.
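As a rough illustration of how such a list of options could be built outside of Orbeon, the Python sketch below parameterizes the query above by class URI and converts a response in the standard SPARQL JSON results layout into (uri, label) pairs. The function names and the sample response are assumptions made for illustration; in Nomisma itself this work happens inside the XForms model.

```python
# Hypothetical helpers; the query text mirrors the one shown above.
QUERY_TEMPLATE = """PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?uri ?label WHERE {{
  ?uri rdf:type <{class_uri}> .
  ?uri skos:prefLabel ?label .
  FILTER (lang(?label) = "en")
}}
ORDER BY ASC(?label)"""


def build_query(class_uri):
    """Return the label query for every id typed as class_uri."""
    return QUERY_TEMPLATE.format(class_uri=class_uri)


def options_from_results(results):
    """Convert SPARQL JSON results into a list of (uri, label) tuples."""
    return [
        (row["uri"]["value"], row["label"]["value"])
        for row in results["results"]["bindings"]
    ]


# Made-up sample response (not real Nomisma data) in the SPARQL JSON layout.
sample = {
    "head": {"vars": ["uri", "label"]},
    "results": {"bindings": [
        {"uri": {"type": "uri", "value": "http://nomisma.org/id/example"},
         "label": {"type": "literal", "xml:lang": "en", "value": "Example"}},
    ]},
}

options = options_from_results(sample)
```

Calling build_query("http://nomisma.org/id/field_of_numismatics") would produce the second query mentioned above, under the assumption that it differs from the first only in the class URI.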


XForms bindings and XPath also ensure that other requirements of the XHTML document are met: there must be an English preferred label, labels cannot be blank, no language may be repeated among the preferred labels, latitude and longitude must be decimal values between -180 and 180, and related links must be valid URIs.
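The rules listed above can be restated as a small Python sketch. This is not the XForms binding code itself, only the same constraints written as a plain function. The function name, the error messages, the treatment of coordinates as optional, and the rough URI test (a scheme plus a host) are all assumptions made for illustration.

```python
from urllib.parse import urlparse


def validate(pref_labels, lat=None, lon=None, links=()):
    """Return a list of error messages; an empty list means the record is valid.

    pref_labels is a list of (language_code, label_text) pairs.
    """
    errors = []
    langs = [lang for lang, _ in pref_labels]

    if "en" not in langs:
        errors.append("missing English preferred label")
    if any(not text.strip() for _, text in pref_labels):
        errors.append("blank label")
    if len(langs) != len(set(langs)):
        errors.append("repeated language among preferred labels")

    # The post gives -180..180 as the accepted range for both coordinates.
    # Treating a missing coordinate as acceptable is an assumption.
    for name, value in (("latitude", lat), ("longitude", lon)):
        if value is None:
            continue
        try:
            number = float(value)
        except ValueError:
            errors.append(name + " is not a decimal")
            continue
        if not -180 <= number <= 180:
            errors.append(name + " out of range")

    for link in links:
        parts = urlparse(link)
        # Rough stand-in for a URI check: require a scheme and a host.
        if not (parts.scheme and parts.netloc):
            errors.append("invalid URI: " + link)

    return errors
```

In the form, the equivalent of an empty error list is what enables the save button.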

One of the new features of this interface is the interaction between XForms and DBpedia.  Labels in languages not already present in a Nomisma id can be imported from DBpedia RDF.  The XForms submission is fairly straightforward:

<xforms:submission id="get-dbpedia-rdf" action="http://dbpedia.org/data/{instance('control-instance')/dbpedia}.rdf" ref="instance('dbpedia')"
    replace="instance" method="get">
    <xforms:message ev:event="xforms-submit-error" level="modal">Failed to get Dbpedia RDF.</xforms:message>
    <xforms:action ev:event="xforms-submit-done" xxforms:iterate="instance('dbpedia')//rdfs:label">
        <xxforms:variable name="lang" select="@xml:lang"/>
        <xforms:action if="not(instance('doc')/xhtml:div[@property='skos:prefLabel'][@xml:lang=$lang])">
            <xforms:insert context="instance('doc')" nodeset="./xhtml:div[@property='skos:prefLabel'][last()]" origin="instance('prefLabel-template')"/>
            <xforms:setvalue ref="instance('doc')/xhtml:div[@property='skos:prefLabel'][last()]" value="context()"/>
            <xforms:setvalue ref="instance('doc')/xhtml:div[@property='skos:prefLabel'][last()]/@xml:lang" value="$lang"/>
        </xforms:action>
    </xforms:action>
</xforms:submission>
Thus it is easy to rapidly incorporate new labels into Nomisma to facilitate multilingual interfaces in other projects which depend on it for data (like OCRE and the UVA collection).
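For readers less familiar with XForms actions, the logic of the submission above (iterate over every rdfs:label, skip languages that already have a preferred label, append the rest) can be approximated in Python as follows. The function name and the sample RDF/XML are made up for illustration, and unlike the XForms version this sketch also skips labels that carry no language tag.

```python
import xml.etree.ElementTree as ET

RDFS_LABEL = "{http://www.w3.org/2000/01/rdf-schema#}label"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def import_labels(pref_labels, rdf_xml):
    """Add labels from rdf_xml for languages missing from pref_labels.

    pref_labels maps language code -> label and is updated in place.
    Returns the list of language codes that were added.
    """
    added = []
    for label in ET.fromstring(rdf_xml).iter(RDFS_LABEL):
        lang = label.get(XML_LANG)
        # Mirror the if= test in the submission: skip languages already present.
        if lang and lang not in pref_labels:
            pref_labels[lang] = label.text or ""
            added.append(lang)
    return added


# Tiny made-up RDF/XML document standing in for a DBpedia response.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description rdf:about="http://dbpedia.org/resource/Rome">
    <rdfs:label xml:lang="en">Rome</rdfs:label>
    <rdfs:label xml:lang="de">Rom</rdfs:label>
    <rdfs:label xml:lang="fr">Rome</rdfs:label>
  </rdf:Description>
</rdf:RDF>"""

labels = {"en": "Rome"}          # the id already has an English label
added = import_labels(labels, SAMPLE)
```

Here the existing English label is left untouched and only the German and French labels are appended.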

Workflow


Since the ids need to be maintained in GitHub in the long term, the editing workflow requires loading and saving XHTML+RDFa fragments in the filesystem rather than through the REST interface of an XML database like eXist.

The workflow is as follows:
  • Load existing id from filesystem or create new one
  • Edit the id
  • Save id. When the document is valid, the save button becomes enabled, and clicking the save button initiates several processes:
  1. Serialize the ids to XML and save back to the filesystem
  2. Serialize the XHTML+RDFa into RDF
  3. Using SPARQL/Update, POST the RDF back into the endpoint.  Because POST only adds triples for a subject (e.g., http://nomisma.org/id/rome) and never removes the existing ones, re-posting an edited id would create duplicate triples, so the subject must first be flushed from the endpoint before the RDF is sent to Fuseki.  Therefore the following SPARQL/Update request must be sent to the endpoint before the newly edited RDF is inserted (wonky and unintuitive, but necessary with SPARQL/Update):
DELETE {?s ?p ?o} WHERE { <http://nomisma.org/id/rome> ?p ?o . ?s ?p ?o . FILTER (?s = <http://nomisma.org/id/rome>) } 
  • After the RDF is updated in the endpoint, the XHTML+RDFa is serialized into a Solr XML document and posted into the search index (for the Atom feed, although we may implement faceted search/browse eventually). After the doc is sent, a commit is sent to Solr.
  • Finally, a nightly cron job adds new files into the GitHub repo, and then changes are committed and pushed into GitHub.  Another job then runs to generate RDF dumps of the Nomisma data, which are available on the nomisma.org home page.
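The flush-then-insert step in the workflow above could also be generated programmatically. The Python sketch below is an alternative formulation, not the code used in Nomisma: it builds a single SPARQL 1.1 Update request that uses the DELETE WHERE shorthand followed by INSERT DATA, with the two operations separated by a semicolon. The function names are hypothetical, the triples are assumed to be supplied in N-Triples syntax, and the example triple is illustrative only.

```python
def delete_subject_update(subject_uri):
    """SPARQL Update that removes every triple whose subject is subject_uri."""
    # Minimal guard so the URI cannot break out of the <...> delimiters.
    if any(ch in subject_uri for ch in '<>"{} \n\t'):
        raise ValueError("not a usable URI: %r" % subject_uri)
    return "DELETE WHERE { <%s> ?p ?o }" % subject_uri


def insert_data_update(ntriples):
    """SPARQL Update that inserts the given N-Triples statements."""
    return "INSERT DATA {\n%s\n}" % ntriples.strip()


def replace_subject(subject_uri, ntriples):
    """Return one update request: flush the subject, then add the new triples."""
    return delete_subject_update(subject_uri) + " ;\n" + insert_data_update(ntriples)


# Illustrative triple only; real data would come from the RDFa distillation.
update = replace_subject(
    "http://nomisma.org/id/rome",
    '<http://nomisma.org/id/rome> '
    '<http://www.w3.org/2004/02/skos/core#prefLabel> "Rome"@en .',
)
```

The resulting string would be sent to the endpoint's update service. Whether combining both operations in one request suits a given Fuseki setup is something to verify rather than assume.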
This is the gist of the editing workflow in the new version of Nomisma.  I plan to improve the XHTML+RDFa editing templates to support a greater degree of complexity in the data model.  Additionally, I aim to create an administrative interface to better manage datasets provided by other institutions.  The endpoint includes not only Nomisma ids, but RDF provided by OCRE, UVA, CHRR, the ANS, and a portion of the Berlin coinage for Augustus.  I want to be able to get VoID RDF files from new data contributors and do consistency checks on RDF dumps before ingesting them into Fuseki.  I also want to be able to delete or update all triples from a single institution.  This functionality will come eventually.  It will become a higher priority once there are more contributors of numismatic data to Nomisma.

All the code discussed above is, of course, open source: https://github.com/ewg118/nomisma/tree/master/xforms
