Tuesday, November 7, 2017

Numishare supports OpenRefine reconciliation APIs for OCRE, PELLA, and CRRO

After building a reconciliation service for Nomisma concepts, I began working on applying the same methodologies to creating an OpenRefine reconciliation API for coin type corpora projects published in the Numishare platform. The API has been extended to support suggestions for properties. These properties are facet/string (exact match) or text (keyword anywhere) fields for mints, rulers, denominations, etc. that have been indexed into Apache Solr. It may be possible to extend this property list to dates, legends, or other indexed fields.
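As a rough illustration of how such properties narrow a match, here is a minimal Python sketch of a single reconciliation query. The query shape (`query`, `limit`, and `properties` entries with `pid` and `v`) follows the general OpenRefine reconciliation protocol. The property identifiers `mint` and `denomination`, the helper name `build_query`, and the sample values are assumptions for illustration and may not match the names Numishare actually exposes.

```python
import json

def build_query(ref, limit=3, **properties):
    """Build one reconciliation query; keyword arguments become property constraints."""
    query = {"query": ref, "limit": limit}
    if properties:
        # Each constraint pairs a property id (pid) with the value (v) to match.
        query["properties"] = [
            {"pid": pid, "v": value} for pid, value in sorted(properties.items())
        ]
    return query

# Hypothetical spreadsheet row: a type reference plus two disambiguating columns.
q = build_query("RIC 15", mint="Rome", denomination="Denarius")
print(json.dumps({"q0": q}, indent=2))
```

OpenRefine assembles these objects itself once extra columns are mapped to properties in the reconciliation dialog; the sketch only shows what ends up on the wire.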

Property suggestion API is derived from available Solr facet fields

Test Case: University of Graz Roman imperial coins

I received a spreadsheet of about 2,000 Roman imperial coins with RIC numbers and emperors from Elisabeth Steiner at the University of Graz. I performed some cleanup of the RIC numbers and normalized the emperor list to English preferred labels via the Nomisma OpenRefine reconciliation service (more details below). About half of the coins normalized to OCRE IDs on the first pass (which took 45 seconds), but the majority of non-matches fell into two categories: RIC numbers that OCRE had split into separate URIs because of differences in denomination, and coins from RIC volumes 6-8, where the numbering restarts by mint rather than by ruler. To ameliorate these issues, I got an updated spreadsheet that contained columns for mint and denomination.

Filter for uncertain attributions, 'od.'

My workflow was as follows:


Tuesday, October 31, 2017

ANS web projects, IIIF deployed on new server

In other big news today, with the consolidation of Nomisma and ANS digital projects on numismatics.org onto the same dedicated server (which is much more powerful than the separate cloud servers each domain previously ran on), our IIIF image server (Loris) and presentation APIs are now running in production. IIIF functionality extends beyond simple zooming and manifests for single objects in our own collection (e.g., http://numismatics.org/collection/1944.100.39026) to an entirely new array of features, some of which have been described in previous posts:

  • There are now 160,000 photographed objects in the ANS collection, the high res photos of which are available through IIIF. Of these, more than 55,000 are Greek and Roman coins linked to types defined in OCRE, CRRO, and PELLA. Like our other partners that publish IIIF, the zoomable images are available on the coin type landing page, but the ANS adds tremendous coverage in these domains. See dozens of our coins linked to Price 4.
  • Manifests for coin types are generated dynamically by a combination of NUDS typological metadata + SPARQL query results for associated physical specimens with IIIF service metadata. The manifest is linked at the top of the page, along with a link to view the manifest. These sorts of manifests are the jumping-off point for annotating symbols and monograms on coins.
  • The ANS Archives support IIIF through TEI (digitized coin hoard notebooks), EAD, and MODS resources (photographs). Photographs linked to ancient places defined in Pleiades can be ingested into Pelagios. More here: http://eaditor.blogspot.com/2017/10/eaditor-now-supports-ead-and-mods-to.html. So far, the Newell notebooks are the only digital resources featuring annotations.
  • Rainer Simon is reindexing the ANS coins linked to ancient places via Nomisma->Pleiades concordances into Peripleo. These extend beyond the 55,000 Greco-Roman coins linked to OCRE, CRRO, and PELLA to include all ancient coins linked to Nomisma IDs for mints. See https://twitter.com/aboutgeo/status/925292397986279425. Of the 140,000 coins in MANTIS linked to Pleiades places, about 83,000 have been photographed/provide IIIF service metadata to Pelagios.

Nomisma launches OpenRefine reconciliation service

I recently received a spreadsheet of Classical and Archaic Greek coin hoard data, atomized and parsed for content in the Inventory of Greek Coin Hoard textual descriptions. I loaded this spreadsheet into OpenRefine to break it down further: to split the Authority column into separate columns for mints, regions, and rulers, parse uncertain attributions (looking for question marks), separate the numeric count of coins from denomination abbreviations, etc. The Authority column itself contained mostly mints, all of which are represented by concepts defined on Nomisma.org. The easiest way to reconcile this list would be to run it against an OpenRefine reconciliation API, which did not exist, so I built one between Friday and Monday.

The new service is now listed among the Nomisma APIs, available at http://nomisma.org/apis/reconcile.



The API does not support every possible optional service yet, but it supports the most useful ones for normalizing data to Nomisma concepts. Here's what it does do:

  • The main reconciliation service, returning the basic service JSON when there are no query parameters. Both the 'query' and 'queries' HTTP parameters are supported. These query parameters are parsed into one or more Solr queries to yield a response.
  • The Preview API, for displaying a little HTML popup when hovering the mouse over a reconciled candidate (a simple serialization from the concept RDF/XML into HTML).
  • The Entity Suggest API, which allows a user to type in a new term, yielding an autosuggest response (this is also Solr serialized into JSON). The suggest flyout is also supported, which is a serialization of the concept RDF/XML into a tiny JSON snippet to be displayed when hovering over the autosuggested term.
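For readers who want to call the main reconciliation service outside OpenRefine, the following Python sketch shows the two ends of a batch request: encoding the `queries` parameter and picking the top candidate from a response. The request and response shapes follow the general OpenRefine reconciliation protocol; the helper names, the sample ids, and the scores are made up for illustration, and the HTTP call to http://nomisma.org/apis/reconcile itself is left out.

```python
import json
from urllib.parse import urlencode

def encode_batch(values):
    """Encode values as the 'queries' form parameter: a JSON object keyed q0, q1, ..."""
    queries = {f"q{i}": {"query": v} for i, v in enumerate(values)}
    return urlencode({"queries": json.dumps(queries)})

def best_matches(response, min_score=0.0):
    """Map each query key to its top-scoring candidate id, or None if nothing qualifies."""
    picked = {}
    for key, body in response.items():
        candidates = [
            c for c in body.get("result", []) if c.get("score", 0) >= min_score
        ]
        top = max(candidates, key=lambda c: c.get("score", 0), default=None)
        picked[key] = top["id"] if top else None
    return picked

# Illustrative response in the protocol's shape; ids and scores are invented.
sample = {
    "q0": {"result": [
        {"id": "athens", "name": "Athens", "score": 9.1, "match": True},
        {"id": "athens_other", "name": "Athens (other)", "score": 4.2, "match": False},
    ]},
    "q1": {"result": []},
}
print(encode_batch(["Athens", "Corinth"]))
print(best_matches(sample))
```

Values that come back as None are the ones that still need manual review or a new concept ID.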

These are the most vital services, which enabled me to normalize about 3,500 of 4,000 lines of the CSV to existing Nomisma mints, regions, and rulers. About half of the remaining 500 are multiple mints of the same place name that need to be verified on a case-by-case basis (by checking the IGCH or Coin Hoards record). After that, the only non-reconciled values remaining are rulers that do not have Nomisma IDs yet. We can extract a facet list of these values, generate the Nomisma IDs, and then reconcile the list.

I spent about two days working on these APIs, and then did almost 90% of the matching in five minutes.


Wednesday, October 4, 2017

University of Graz joins Nomisma partner consortium

The Institute of Ancient History and Classical Antiquities at the University of Graz houses a collection of nearly 4000 antique coins. Nearly 300 of these are Roman Republican coins identified with Michael Crawford's Roman Republican Coinage numbers. Working with Elisabeth Steiner, who is responsible for the digital archaeological collections, we were able to link these to URIs for RRC types defined in Coinage of the Roman Republic Online and ingest them into the Nomisma.org SPARQL endpoint. Additionally, the photos of the coins are available through IIIF services, and the coin metadata (canonically stored as TEI files in a Fedora repository) are transformed dynamically and served as Nomisma-compliant RDF directly from their information system.

There are several thousand Roman imperial coins with Roman Imperial Coinage references, but some work remains in normalizing emperor + RIC numbers to OCRE URIs.

You can see an example of two University of Graz coins linked to RRC 282/1 here.

Wednesday, September 13, 2017

SPARQL-derived IIIF Manifests for Coin Types

Building on Numishare's new IIIF manifest functionality, I have extended this feature to support the generation of a manifest for the IIIF resources for physical coins associated with coin types. The coin type manifest is generated in much the same way as other queries for related coins: through a SPARQL query, except that the results are restricted to physical specimens connected to IIIF services, according to the EDM extension.

The query is here: https://gist.github.com/ewg118/d4833fc0f61c17f84c723dcb3a7fe901

The metadata for the manifest are still derived from the typological information within the coin type NUDS document, but each canvas object is transformed from the SPARQL results for each coin.

The challenge here is that different organizations have different standards for packaging their obverse and reverse photographs. Most of our IIIF partners (including the ANS) have separate image files for the obverse and reverse, but several (like Harvard Art Museums) have combined the obverse and reverse photos into a single image file. In order to present the most cohesive and intuitive display of the manifest within a IIIF viewer, I opted to represent each coin rather than each image as a canvas (and incorporated collection, identifier, weight, diameter, and axis metadata within the canvas). Therefore, I experimented with generating a canvas as wide as the combined width of the obverse and reverse images and placing both images on the same canvas, with the reverse image as a segment whose top left corner aligns with the top right corner of the obverse image. This is done by specifying the placement and dimensions of each image on the canvas:

"on": "http://gallica.bnf.fr/ark:/12148/btv1b10323768v#xywh=2646,0,2646,2646"

However, I have found that neither Universal Viewer nor Mirador supports this functionality out of the box yet, even though it is valid according to the IIIF Presentation API. It does work in Masahide Kanzaki's IIIF viewer built on OpenSeadragon.
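The side-by-side placement described above comes down to simple arithmetic on the image dimensions. The Python sketch below is not Numishare's XSLT, only an illustration of the calculation. The function name is hypothetical, and the obverse height of 2646 pixels is an assumption, since the example fragment only reveals the obverse width and the reverse dimensions.

```python
def side_by_side(canvas_uri, obverse_size, reverse_size):
    """Lay out obverse and reverse on one canvas, reverse to the right of obverse."""
    ob_w, ob_h = obverse_size
    rv_w, rv_h = reverse_size
    return {
        # The canvas is as wide as both images and as tall as the taller one.
        "width": ob_w + rv_w,
        "height": max(ob_h, rv_h),
        # Media fragments: x, y, width, height of each image on the canvas.
        "obverse_on": f"{canvas_uri}#xywh=0,0,{ob_w},{ob_h}",
        "reverse_on": f"{canvas_uri}#xywh={ob_w},0,{rv_w},{rv_h}",
    }

layout = side_by_side(
    "http://gallica.bnf.fr/ark:/12148/btv1b10323768v",
    obverse_size=(2646, 2646),  # height assumed for the example
    reverse_size=(2646, 2646),
)
print(layout["reverse_on"])
# http://gallica.bnf.fr/ark:/12148/btv1b10323768v#xywh=2646,0,2646,2646
```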



You can see an example of the Price 112 manifest rendered here: http://www.kanzaki.com/works/2016/pub/image-annotator?u=http://app1.numismatics.org/pella/manifest/price.112

IIIF coverage will be expanded significantly across PELLA, OCRE, and CRRO once the new version of Mantis goes live in the next one or two weeks. There is now a link at the top of each coin type page (when applicable) to the IIIF manifest.

Thursday, August 31, 2017

Extending Numishare for IIIF

The American Numismatic Society is in the process of migrating both its own production web applications and Nomisma.org to a central dedicated Rackspace server from the Rackspace cloud. The new server has a lot more resources, which will make our projects more stable and efficient. Furthermore, the extra storage, CPU, and RAM will now make it possible to publish high resolution images through IIIF. All of our high resolution images have been transferred to the new server (over 420GB), and when we flip the switch on the domain names over to the new server, Mantis will feature high resolution zooming of the obverse and reverse images of about 150,000 objects in its collection. The RDF exports from Mantis for both Nomisma.org and Pelagios will include IIIF service metadata conforming to the Europeana Data Model extension (already detailed in older posts). This means that the high-res coverage of Roman Republican coinage, Roman Imperial coinage, and the coinage of Alexander the Great will increase by 60,000 coins across CRRO, OCRE, and PELLA.

Updating NUDS and Numishare

The NUDS XSD schema has namespaced METS for capturing image links for thumbnails and reference images for both the obverse and reverse of the coin. A mets:file[@USE='iiif'] has been added into the mets:fileGrp for both the obverse and reverse. The mets:FLocat URL points to the IIIF service. This is a simple XML model modification that needed to be addressed in Numishare's code in several contexts:

IIIF images rendered in Mantis

  • If there's a mets:file[@USE='iiif'] detected when rendering the HTML page for an object, the obverse and reverse IIIF services are rendered with Leaflet (above)
  • As above, in the NUDS->RDF serialization, the EDM IIIF extension includes an edm:WebResource, svcs:Service to capture IIIF service metadata
  • The Solr->Nomisma, Pelagios RDF for large data exports will now include IIIF service metadata since the service URI for each image is indexed into Solr
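The detection step can be sketched in a few lines of Python (Numishare itself does this in XSLT). The sample fragment is hypothetical: the `USE="obverse"` and `USE="reverse"` values on mets:fileGrp and the example.org URLs are assumptions for illustration, while the mets:file[@USE='iiif'] and mets:FLocat pattern is the one described above.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical fragment: only the obverse carries an IIIF service here.
SAMPLE = """
<mets:fileSec xmlns:mets="http://www.loc.gov/METS/"
              xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileGrp USE="obverse">
    <mets:file USE="iiif">
      <mets:FLocat LOCTYPE="URL" xlink:href="http://example.org/iiif/obv"/>
    </mets:file>
  </mets:fileGrp>
  <mets:fileGrp USE="reverse">
    <mets:file USE="thumbnail">
      <mets:FLocat LOCTYPE="URL" xlink:href="http://example.org/thumb/rev.jpg"/>
    </mets:file>
  </mets:fileGrp>
</mets:fileSec>
"""

def iiif_services(file_sec):
    """Return {side: IIIF service URL} for every fileGrp that has an iiif file."""
    services = {}
    for grp in file_sec.findall("mets:fileGrp", NS):
        loc = grp.find("mets:file[@USE='iiif']/mets:FLocat", NS)
        if loc is not None:
            services[grp.get("USE")] = loc.get(XLINK_HREF)
    return services

services = iiif_services(ET.fromstring(SAMPLE.strip()))
print(services)
```

An empty result would mean the page falls back to the ordinary reference images.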

 

Manifests

I have authored an XSLT stylesheet that transforms the NUDS XML document for a physical object into a JSON manifest. This transformation process extracts metadata from the NUDS typological and physical descriptions and generates a sequence of two canvases: one for the obverse and the other for the reverse.
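To make the target structure concrete, here is a minimal Python sketch of a manifest with one sequence and two canvases, following the general layout of the IIIF Presentation API 2.x. It is not the output of the XSLT: the URI patterns, the function names, the image dimensions, and the level1 image profile are all assumptions for illustration.

```python
import json

def canvas(manifest_uri, side, service_uri, width, height):
    """One canvas per side, painted with a single image from an IIIF image service."""
    canvas_uri = f"{manifest_uri}/canvas/{side}"  # hypothetical URI pattern
    return {
        "@id": canvas_uri,
        "@type": "sc:Canvas",
        "label": side,
        "width": width,
        "height": height,
        "images": [{
            "@type": "oa:Annotation",
            "motivation": "sc:painting",
            "on": canvas_uri,
            "resource": {
                "@id": f"{service_uri}/full/full/0/default.jpg",
                "@type": "dctypes:Image",
                "service": {
                    "@context": "http://iiif.io/api/image/2/context.json",
                    "@id": service_uri,
                    "profile": "http://iiif.io/api/image/2/level1.json",  # assumed level
                },
            },
        }],
    }

def manifest(manifest_uri, label, sides):
    """Wrap the canvases in a single sequence inside a manifest."""
    return {
        "@context": "http://iiif.io/api/presentation/2/context.json",
        "@id": manifest_uri,
        "@type": "sc:Manifest",
        "label": label,
        "sequences": [{
            "@type": "sc:Sequence",
            "canvases": [canvas(manifest_uri, *side) for side in sides],
        }],
    }

doc = manifest(
    "http://example.org/manifest/coin-1",
    "Example coin",
    [("obverse", "http://example.org/iiif/obv", 3000, 3000),
     ("reverse", "http://example.org/iiif/rev", 3000, 3000)],
)
print(json.dumps(doc, indent=2))
```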

This XSLT compiles a variable of an XML metamodel by applying templates to NUDS and METS elements. This metamodel is then piped through templates in order to construct JSON. The metamodel includes XML elements such as _array, _object, __@id, __@type, which are then transformed into proper JSON on the output with Orbeon's XML Pipeline Language. The model is similar in concept to how XForms constructs an XML model from a JSON service according to the XForms 2.0 specification.
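The idea of an XML metamodel that serializes to JSON can be illustrated with a small Python sketch. This is a stand-in for the XSLT and XPL pipeline, not a port of it. Because "@" is not legal in an XML element name, the sketch assumes a convention in which a double-underscore prefix on an element name stands for "@" in the JSON key; that convention, the sample document, and the function names are assumptions made for the example.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical metamodel: _object and _array are structural, other elements are keys.
SAMPLE = """<_object>
  <__id>http://example.org/manifest/1</__id>
  <__type>sc:Manifest</__type>
  <label>Example coin</label>
  <sequences>
    <_array>
      <_object><__type>sc:Sequence</__type></_object>
    </_array>
  </sequences>
</_object>"""

def key_for(tag):
    """Assumed convention: a leading double underscore becomes '@' in the JSON key."""
    return "@" + tag[2:] if tag.startswith("__") else tag

def member_value(member):
    """A member holds either a nested _object/_array or plain text."""
    nested = list(member)
    return convert(nested[0]) if nested else (member.text or "").strip()

def convert(el):
    """Turn a metamodel element into the matching Python/JSON value."""
    if el.tag == "_object":
        return {key_for(member.tag): member_value(member) for member in el}
    if el.tag == "_array":
        return [convert(item) for item in el]
    return (el.text or "").strip()

result = convert(ET.fromstring(SAMPLE))
print(json.dumps(result, indent=2))
```

Keeping the intermediate model in XML means the same templates can build it regardless of which NUDS or METS elements are present, with the JSON syntax handled in one place at the end.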

You can view an example of a manifest at http://app1.numismatics.org/search/manifest/1995.11.1648. Here it is rendered in Universal Viewer.

I plan to expand this functionality so that IIIF manifests can be derived from NUDS + SPARQL results for coin types so that it will be possible to see a listing of all available images for a coin type, which will lend itself to more efficient annotation of iconographic features, monograms, and other sorts of symbols. These monograms will link to URIs defined in type corpora. We will be creating the full array of Hellenistic monograms as part of the NEH-funded Hellenistic Royal Coinages project. Ultimately, I would like these monograms to be annotated on images, where available. This isn't a feature of the project that we outlined in our application, but will come as an added bonus.

Tuesday, August 22, 2017

More mapping features for Nomisma SPARQL responses

Edited 23 August 2017: changed getGeoJsonForQuery to query.json and getKmlForQuery to query.kml

The Nomisma.org SPARQL query HTML interface, for both SELECT and CONSTRUCT/DESCRIBE responses, has been extended to generate a map when the query response includes latitude and longitude coordinates (when using the geo:lat and geo:long RDF properties).

The introduction of this new feature necessitated the creation of a new API, query.json, which is essentially an XSLT transformation from either the SPARQL XML response schema or RDF into GeoJSON. The XSLT was extended from existing templates for the other GeoJSON responses for getting the geographic distribution of mints, hoards, or individual finds for a Nomisma concept ID or a coin type.
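The same kind of transformation can be sketched in Python rather than XSLT. The example below reads a response in the standard SPARQL Query Results XML Format and emits a GeoJSON FeatureCollection, skipping rows without coordinates. The variable names `lat` and `long`, the sample data, and the choice to copy the remaining bindings into `properties` are assumptions for illustration, not a description of what query.json returns.

```python
import json
import xml.etree.ElementTree as ET

SR = {"sr": "http://www.w3.org/2005/sparql-results#"}

# Made-up result set: the second row has no coordinates and is skipped.
SAMPLE = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="hoard"/><variable name="lat"/><variable name="long"/></head>
  <results>
    <result>
      <binding name="hoard"><uri>http://example.org/hoard/1</uri></binding>
      <binding name="lat"><literal>37.98</literal></binding>
      <binding name="long"><literal>23.73</literal></binding>
    </result>
    <result>
      <binding name="hoard"><uri>http://example.org/hoard/2</uri></binding>
    </result>
  </results>
</sparql>"""

def to_geojson(xml_text, lat_var="lat", long_var="long"):
    """Convert SPARQL XML results into a GeoJSON FeatureCollection of points."""
    features = []
    for result in ET.fromstring(xml_text).findall("sr:results/sr:result", SR):
        row = {
            b.get("name"): "".join(b.itertext()).strip()
            for b in result.findall("sr:binding", SR)
        }
        if lat_var not in row or long_var not in row:
            continue
        lat = float(row.pop(lat_var))
        lon = float(row.pop(long_var))
        features.append({
            "type": "Feature",
            # GeoJSON positions are ordered longitude, latitude.
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": row,
        })
    return {"type": "FeatureCollection", "features": features}

collection = to_geojson(SAMPLE)
print(json.dumps(collection, indent=2))
```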

The query (get a list of hoards that contained tetradrachm coin types [based on PELLA])


PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX nm: <http://nomisma.org/id/>
PREFIX nmo: <http://nomisma.org/ontology#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT DISTINCT ?hoard ?findspot ?lat ?long WHERE {
  ?type a nmo:TypeSeriesItem ;
          nmo:hasDenomination nm:tetradrachm .
  ?coin nmo:hasTypeSeriesItem ?type ;
        dcterms:isPartOf ?hoard .
  ?hoard a nmo:Hoard ;
           nmo:hasFindspot ?findspot .
  ?findspot geo:lat ?lat ;
            geo:long ?long
}

is therefore rendered as the image below:

Map response for SPARQL query, http://bit.ly/2v36y9t
The GeoJSON response is rendered in Leaflet with the MarkerCluster plugin. Similarly, by replacing SELECT DISTINCT with DESCRIBE, the map will also render when coordinates are available within the RDF output from the SPARQL endpoint. See here.

Furthermore, I introduced a query.kml API, which accepts the same SPARQL query parameter and outputs KML instead of GeoJSON.

The SPARQL results page now has links to download the result set in CSV for SELECT queries and RDF/XML, Turtle, and JSON-LD for CONSTRUCT/DESCRIBE, as well as links to the GeoJSON and KML responses when geographic distributions are available.