Wednesday, January 8, 2020

A closer look at the Nomisma monogram data model

After the launch of more than 1,200 monograms that appear on the coinage of Alexander the Great as part of the PELLA project, I have made some updates to the data maintenance framework that underlies Nomisma.org. These changes enable monogram datasets to be added or removed via revised SPARQL Update queries that incorporate CIDOC CRM properties to link to image files and the W3C PROV ontology for data provenance (similar to the model we already implement for Nomisma concept URIs). Note that the PROV ontology is not used for data provenance in coin type corpora or hoard databases, although this is something we should consider implementing at some point.

Introduction to the Monogram/Symbol data model

Monograms published to the web follow a similar pattern to other SKOS concepts in our ecosystem. The RDF class is nmo:Monogram from the Nomisma ontology. Our ontology will soon be updated to give nmo:Monogram a superclass of CIDOC CRM's E37_Mark. Any other symbol that appears on a coin (letters used as control marks or mint marks, or pictographic symbols such as "torch" or "ram's head") is also an E37_Mark. The definition of a Mark is as follows:

This class comprises symbols, signs, signatures or short texts applied to instances of E24 Physical Man-Made Thing by arbitrary techniques in order to indicate the creator, owner, dedications, purpose, etc.

All symbols are inherently concepts and have one required skos:prefLabel and one skos:definition in English. Like other concepts, they may have a Field of Numismatics (dcterms:isPartOf) or a bibliographic reference (dcterms:source) pointing to Nomisma URIs.

Since all symbols are E37_Marks (directly or indirectly), we are able to use some other CIDOC CRM properties. The crm:P106_is_composed_of property points to constituent letters or symbols. Typically, these constituents are letters, but we have at least a few examples of RIC 10 monograms composed of letters and Christograms. When we publish a new edition of these monograms into OCRE, we will create the Christogram URIs in a new /symbol/ namespace in Nomisma.org. Using property paths, it will be possible to execute a SPARQL query for any monogram that includes a Greek rho, regardless of whether the letter appears directly in the monogram or within a monogram nested inside another monogram.

<http://numismatics.org/pella/symbol/monogram.price.1000>
  a nmo:Monogram, skos:Concept ;
  skos:changeNote 
    <http://numismatics.org/pella/symbol/monogram.price.1000#provenance> ;
  void:inDataset <http://numismatics.org/pella/> ;
  crm:P106_is_composed_of "Κ", "Υ", "Ο" ;
  skos:prefLabel "Price Monogram 1000"@en ;
  skos:definition "Monogram 1000 from M.J. Price, Coinage in the Name of Alexander the 
    Great and Philip Arrhidaeus: A British Museum Catalogue. The monogram contains 
    Κ, Υ, and Ο as identified by Peter van Alfen."@en ;
  dc:source <http://nomisma.org/id/price1991> ;
  dc:isPartOf <http://nomisma.org/id/greek_numismatics> ;
  crm:P165i_is_incorporated_in 
    <http://numismatics.org/symbolimages/pella/monogram.price.1000.svg> .
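As an aside, a minimal Python sketch can illustrate what the transitive crm:P106_is_composed_of query described above achieves; the monogram IDs and letter combinations below are invented for illustration, and the real query would run as a SPARQL property path against the endpoint.

```python
# Sketch: transitive search over crm:P106_is_composed_of, mimicking what
# the SPARQL property path crm:P106_is_composed_of+ does on the endpoint.
# The monogram IDs and letters below are invented for illustration.
composed_of = {
    "monogram.a": ["Ρ", "Κ"],           # contains rho directly
    "monogram.b": ["monogram.a", "Υ"],  # contains rho via a nested monogram
    "monogram.c": ["Δ", "Ε"],           # no rho at any depth
}

def contains_letter(monogram, letter):
    """Return True if the letter occurs in the monogram or in any
    monogram nested inside it, at arbitrary depth."""
    for part in composed_of.get(monogram, []):
        if part == letter or contains_letter(part, letter):
            return True
    return False

matches = sorted(m for m in composed_of if contains_letter(m, "Ρ"))
print(matches)  # → ['monogram.a', 'monogram.b']
```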

We link to one or more digital image files representing an idealized view of the monogram or symbol with the property crm:P165i_is_incorporated_in. At the recommendation of the CRM SIG, this digital image (an SVG file: see Github for the full repository of monogram SVGs) bears the crmdig:D1_Digital_Object class and some additional triples about the license (CC Public Domain Mark), the ORCID of the graphic artist who drew them (Mark Pyzyk), and the MIME type as dcterms:format.

<http://numismatics.org/symbolimages/pella/monogram.price.1000.svg>
  a crmdig:D1_Digital_Object ;
  dc:format "image/svg+xml" ;
  dc:creator <https://orcid.org/0000-0001-7542-4252> ;
  dc:license <https://creativecommons.org/choose/mark/> .


Like Nomisma concepts, these monograms have some provenance metadata, linking them to Peter van Alfen's Nomisma editor URI as the contributor (of the constituent letters) and a link to a source Google Spreadsheet. These monograms were imported into PELLA through a new symbol spreadsheet import functionality implemented in the Numishare platform itself. It operates much like the spreadsheet import in Nomisma, parsing the spreadsheet and transforming rows into RDF files that get stored in Numishare's eXist-db XML database.

While there is a basic interface built into PELLA (and other monogram corpora, when they are published in Numishare) to query by constituent letter via XQuery of the XML database, I ultimately plan to implement a unified interface for this sort of query directly in Nomisma.org, based instead on SPARQL (see this basic example query). This will open the door to querying across many type corpora (e.g., Seleucid and Ptolemaic coinage combined), as well as to exploiting the relationships between letters, monogram URIs, and the coin types linked to those monogram URIs, paving the way to extract lists of mints, authorities, etc. connected to certain letters, and to sort these by chronology or other categories. This is merely the tip of the iceberg: Linked Open Data methodologies make possible new forms of numismatic query that were never previously feasible at this scale.
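A sketch of the kind of query such a unified interface would issue might look like the following; the graph pattern is an assumption based on the model described above (nmo:Monogram typing plus crm:P106_is_composed_of for constituent letters), not the production query.

```python
def monogram_query(letter):
    """Build a SPARQL query for monograms containing a given letter,
    directly or within nested monograms. The graph pattern is a sketch
    based on the model described above, not the production query."""
    return f'''
PREFIX nmo: <http://nomisma.org/ontology#>
PREFIX crm: <http://www.cidoc-crm.org/cidoc-crm/>

SELECT DISTINCT ?monogram WHERE {{
  ?monogram a nmo:Monogram ;
            crm:P106_is_composed_of+ "{letter}" .
}}'''

# The + after the property makes the match transitive, so letters inside
# nested monograms are found as well.
query = monogram_query("Ρ")
print(query)
```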

Friday, December 20, 2019

1200 Hellenistic monograms posted to PELLA, and OCRE updates

As part of the larger NEH-funded Hellenistic Royal Coinages project, we have published over 1,200 monograms (and open access SVGs) to PELLA through a new spreadsheet import mechanism in Numishare that is very much like the one we use for IDs in Nomisma. I have also updated the monograms for RIC 10 that had already been published to OCRE.


The data model follows recent agreements: crm:P106_is_composed_of for constituent letters and crm:P165i_is_incorporated_in for one or more digital images, as recommended by the CIDOC CRM SIG. Non-monogram symbols are typed directly as E37_Mark.

We haven't connected PELLA monograms to Price Alexander coin types yet, but that will be coming pretty soon. You can, however, take a look at what's possible in OCRE (http://numismatics.org/ocre/symbol/monogram.ric.10.marcian.1), with a monogram displaying a map of related mints and a list of coin types.

The /symbols interface has a very simple system of selecting constituent letters, e.g., http://numismatics.org/pella/symbols?symbol=%CE%A0&symbol=%CE%A3
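The percent-encoded parameters in that URL are simply UTF-8-encoded Greek letters passed as repeated "symbol" parameters; a quick Python check shows how Π and Σ are encoded:

```python
from urllib.parse import urlencode

# Each selected constituent letter becomes a repeated "symbol" parameter,
# percent-encoded as UTF-8: Π → %CE%A0, Σ → %CE%A3.
params = urlencode([("symbol", "Π"), ("symbol", "Σ")])
print(params)  # → symbol=%CE%A0&symbol=%CE%A3
```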

I plan to build something more sophisticated in Nomisma.org that utilizes SPARQL over monograms aggregated from disparate datasets, allowing more complex queries by mint, authority, etc., via the connections between monograms and coin types (including geographic visualization).
 
This is the first step of facilitating some major research questions in Greek numismatics.

Merry Christmas!

Thursday, November 7, 2019

New Partners for Nomisma

Several new partners have joined the Nomisma.org numismatic Linked Open Data ecosystem through the database network developed through the Berlin Münzkabinett. This software framework, which is used by about 20 collections in Germany and Austria, now supports the direct-to-Nomisma RDF export detailed in Nomisma.org's documentation. Previously, I had written a PHP script to harvest LIDO XML files (one by one) that were listed in text files from each institution. At one HTTP request per second, it typically took about three hours to generate an RDF export for Berlin that I stored as a static file on the numismatics.org server. Now, it takes only a minute or two to ingest RDF VoID dataset metadata and data dumps directly from the Berlin database.

Now, about three-quarters of the 40 or so collections that contribute data to Nomisma offer direct RDF exports according to our specifications, which is a tremendous advancement toward the sustainability of our ingestion workflow. KENOM offers an OAI-PMH API that I have scripted to harvest, and harvesting from the Bibliothèque nationale de France is a combination of CSV processing and Gallica OAI-PMH harvesting. The remaining partners have been added into Nomisma by writing bespoke scripts for processing CSV into RDF and storing static files on the ANS server (often, this process includes using OpenRefine to map coin type references to URIs). I am hoping that in the next few years we can transition completely to direct RDF ingestion via our VoID specification or Linked Art JSON-LD harvesting, which I have already begun to prototype in the Nomisma.org backend.

New partners include:
  • Augsburg University
  • Konstanz University
  • Mainz University
  • University of Vienna

These add more than 1,000 coins into Nomisma.org, primarily for OCRE and CRRO.

Friday, September 27, 2019

First pass at processing Linked Art JSON-LD to Nomisma RDF

Over the last few weeks, I have been developing a harvester for Linked Art-compliant JSON-LD simultaneously in both Nomisma.org and Kerameikos.org, which share similar frameworks built around Orbeon XForms for manually editing or transforming large quantities of data (usually CSV) to RDF, and for connecting these workflows directly to Apache Solr and a SPARQL endpoint. These new features, in both platforms, load JSON-LD from a URL, transform it into the XForms 2.0 spec's JSON-to-XML model, and then validate and parse it into RDF/XML on the way into the SPARQL endpoint.

I will write something more comprehensive about how this functions specifically on the Greek pottery side of things, but I have successfully tested transforming the Linked Art JSON-LD for a test coin (http://numismatics.org/collection/1944.100.76933.jsonld?profile=linkedart) into the Nomisma.org hybrid data model that is composed of properties and classes from our own numismatic ontology and properties from other ontologies, like Dublin Core Terms and the Europeana Data Model.

This transformation process removes much of the developer-oriented cruft out of the JSON to distill the model specifically to the essential literals and URIs necessary for connecting a coin, its measurements, images, and coin type URIs to the numismatic knowledge graph in the Nomisma.org SPARQL endpoint.

Basically, it performs the following functions:

  • Maps the preferred term for an object to dcterms:title and the accession number to dcterms:identifier
  • Measurements (weight, axis, diameter) are mapped to the correct Nomisma property and validated to ensure that they conform to the correct units. Inches and centimeters will be converted to millimeters for diameter, height, width, and thickness.
  • Images for each "part" (obverse, reverse) are placed into the appropriate nmo:hasObverse or nmo:hasReverse data object as foaf:depiction. IIIF service URIs are expanded into the edm:WebResource and svcs:Service model that we have appropriated from the Europeana Data Model specification.
  • Any top-level "type" (classified_as) that is not a Getty or Nomisma URI is presumed to be a coin type. We would like to discuss this further with the Linked Art community to formalize a method by which we can flag coin type URIs in a more stable and consistent manner.
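The measurement-normalization step can be sketched as follows; the input dicts are a simplified stand-in for Linked Art's dimension structures (not the actual serialization), and the conversion factors are the substance of the example.

```python
# Sketch of the measurement-normalization step. The input dicts are a
# simplified stand-in for Linked Art's dimension structure, not the
# actual serialization.
TO_MM = {"mm": 1.0, "cm": 10.0, "in": 25.4}

def normalize_dimension(value, unit):
    """Convert a linear dimension (diameter, height, width, thickness)
    to millimeters; reject units we cannot validate."""
    if unit not in TO_MM:
        raise ValueError(f"unsupported unit: {unit}")
    return round(value * TO_MM[unit], 2)

dimensions = [
    {"type": "diameter", "value": 1.9, "unit": "cm"},
    {"type": "diameter", "value": 19.0, "unit": "mm"},
]
normalized = [normalize_dimension(d["value"], d["unit"]) for d in dimensions]
print(normalized)  # → [19.0, 19.0]
```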

It should be noted that Linked Art hasn't delved deeply into provenance, which would be necessary for encoding coin hoard URIs and findspot metadata.

You can see the resulting RDF/XML (that would get sent into the Nomisma SPARQL endpoint) here: https://gist.github.com/ewg118/049046755a670c3645689c68c14e794b.

This harvester will be adapted as changes are made to the Linked Art model. We hope that this feature in Nomisma will open the door to more streamlined and consistent aggregation of numismatic materials from the broader museum community, especially as we begin to work on new projects that are relevant to the American Art Collaborative.

Tuesday, September 3, 2019

KENOM Updates in Nomisma.org Projects

The State Museum of Prehistory Halle (Landesmuseum für Vorgeschichte Halle) is the latest partner to join the Nomisma.org Linked Open Data cloud through the KENOM portal of German civic museums. Over 300 coins have been added to OCRE and CRRO from the museum. In total, KENOM has made more than 10,000 coins available to the Nomisma numismatic ecosystem, across every type corpus project published by the American Numismatic Society--including Art of Devastation, to which, until now, no institution besides the American Numismatic Society had contributed. There are now 19 coins from two KENOM-affiliated museums accessible through Art of Devastation.

The Holzthaleben Hoard in the distribution of RIC Claudius Gothicus 18.

The script that harvests LIDO XML from KENOM's OAI-PMH web service has been updated to make use of findspot metadata. About 150 coins are linked to Geonames URIs as single finds and another 100 are linked to two hoard URIs published by KENOM. These will ultimately link to the Oxford Coin Hoards of the Roman Empire project. The hoards are Schwabhausen and Holzthaleben.

Wednesday, August 14, 2019

Recommendations for numismatic spreadsheet standardization

Over the years, we have considerably refined the way in which we organize our spreadsheets for processing into NUDS XML files and upload into the Numishare platform. Our workflow started with Online Coins of the Roman Empire, where numerous interns worked over the course of four years (the final three funded by the NEH) to produce dozens of spreadsheets (typically one per emperor) encompassing more than 40,000 types.

Many of the primary typological categories, such as denomination, mint, and authority, contained Nomisma URIs, while textual categories, e.g., legend and type description, were columns of free text. These spreadsheets (Excel files) were exported to CSV and processed through a PHP script that I wrote to transform each row into a NUDS document, and the resulting batch of files would be uploaded with the eXist-db XML database client into the appropriate Numishare collection. After this, I would manually edit the code in the Admin panel in Numishare to index the most relevant batch of RIC IDs into Solr for the public-facing browse and search interfaces (so as not to reindex an entire collection of 40,000 types when a new or updated spreadsheet might contain only several hundred items).
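The row-to-document transformation described above might be sketched like this in Python rather than PHP; the column and element names are illustrative approximations of NUDS, not the full schema.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Sketch of the row-to-document transformation. Column and element names
# approximate NUDS; they are illustrative, not the full schema.
csv_text = (
    "id,denomination,mint\n"
    "ric.8.rome.291,http://nomisma.org/id/solidus,http://nomisma.org/id/rome\n"
)

def row_to_nuds(row):
    """Transform one spreadsheet row into a minimal NUDS-like document."""
    nuds = ET.Element("nuds")
    ET.SubElement(nuds, "recordId").text = row["id"]
    type_desc = ET.SubElement(nuds, "typeDesc")
    ET.SubElement(type_desc, "denomination", href=row["denomination"])
    ET.SubElement(type_desc, "mint", href=row["mint"])
    return ET.tostring(nuds, encoding="unicode")

docs = [row_to_nuds(r) for r in csv.DictReader(io.StringIO(csv_text))]
print(docs[0])
```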

With the publication of PELLA in 2015, we implemented a key-value stylesheet that enabled us to connect obverse and reverse type description codes to each unique description, with columns for English, French, and German translations. The OCRE PHP script was modified to accommodate this new model. Subsequent type corpora have been published for Ptolemaic and Seleucid coinage, each with a slight variation of yet another PHP script. Furthermore, with partners in the Netherlands, Switzerland, England, and Italy deploying their own Numishare collections for type corpora and/or collections of physical specimens, the wide range of slightly different spreadsheet models requires an ever-growing set of scripts that must be manually maintained. It has long been a goal of mine to implement a standardized spreadsheet import into Numishare itself, modeled on the XForms-based validation and transformation of Google Sheets' Atom XML API implemented several years ago in Nomisma.org.


Mapping Google Sheets columns to NUDS elements


Finally, after about a month of development and testing, a Google Sheets-based spreadsheet import is functional in Numishare. It is primarily focused on type corpora at the moment, as not all of the physical and administrative descriptors have been implemented for mapping spreadsheet column headings of numismatic objects.

Some things remain the same:
  • Typological categories must map to Nomisma URIs
  • References for physical objects can be a coin type URI of some sort, a plain literal, or a combination of a type series and type number separated by a | character. The type series must be a literal or a Nomisma URI for a type series, but I aim to enable support for Zenon bibliographic URIs
  • Parent IDs (skos:broader) and deprecation-related IDs (dcterms:replaces or dcterms:isReplacedBy) must be contained in the spreadsheet.
  • A question mark can trail a Nomisma URI to denote uncertainty. This is parsed in the XForms engine to insert the appropriate uncertainty URI into the NUDS XML. 
  • Columns for symbols/monograms located at certain positions on the obverse and reverse can be mapped to the positions listed in the Numishare config.
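Two of the cell conventions in the list above, the trailing "?" for uncertainty and the "type series|type number" reference, can be sketched as follows; the function names are invented for illustration, and the real parsing happens in the XForms engine.

```python
# Two of the cell conventions above, sketched in Python. Function names
# are invented; the real parsing happens in the XForms engine.
def parse_uri_cell(cell):
    """Split a Nomisma URI cell into (uri, certain); a trailing '?'
    marks the attribution as uncertain."""
    cell = cell.strip()
    if cell.endswith("?"):
        return cell[:-1], False
    return cell, True

def parse_reference(cell):
    """Split a 'type series|type number' reference into its parts;
    a cell without '|' is treated as a plain literal reference."""
    series, sep, number = cell.partition("|")
    return (series, number) if sep else (cell, None)

print(parse_uri_cell("http://nomisma.org/id/rome?"))
print(parse_reference("http://nomisma.org/id/price1991|1000"))
```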

Structured XML produced from a spreadsheet

Other informational requirements must be met in order for the spreadsheet to validate, which means that certain data must be explicit rather than automatically inserted by a script. For example, each NUDS XML document requires a title. This title was typically generated in the PHP script by concatenating a human-readable string with the type number parsed from an ID column. Similarly, all coin types, and all physical specimens NOT linked to a coin type URI, must have an Object Type URI in the spreadsheet, even if that URI for all objects is nm:coin.

All of this normalization can occur in a pre-processing phase in OpenRefine: automatic generation of titles through regular expressions, reconciliation of typological columns to Nomisma URIs through Nomisma's OpenRefine API, etc.
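The title-generation step of that pre-processing might look like this; the ID pattern ("price.1000") and the resulting title format are hypothetical examples, not the actual project conventions.

```python
import re

# Sketch of the title-normalization step: derive a human-readable title
# from an ID column with a regular expression. The ID pattern
# ("price.1000") is hypothetical.
def make_title(type_id):
    m = re.fullmatch(r"price\.(\d+)", type_id)
    if not m:
        raise ValueError(f"unrecognized id: {type_id}")
    return f"Price {m.group(1)}"

print(make_title("price.1000"))  # → Price 1000
```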

This new spreadsheet import also requires type descriptions to be present in the typological spreadsheet, which means rethinking the way in which descriptions are connected to the main typology spreadsheet. Instead of a separate stylesheet spreadsheet of key-value combinations between codes and translations, this stylesheet is incorporated as a second sheet in the typological spreadsheet. It is therefore possible to create a VLOOKUP formula between the unique type description code in the typology sheet and the corresponding column in the description stylesheet (see https://docs.google.com/spreadsheets/d/e/2PACX-1vQoyHYDyh79oJuoW9m2g9BNbnysyVWjl13KQNEyTF5dgXswQwgekXMvIDTAH3onwN35c1P9eXeJAD4w/pubhtml). The type descriptions can thus still be maintained with the ease of making one change to a description in Sheet #2, and the change will immediately propagate into the Atom feed.
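Functionally, that VLOOKUP is a plain key lookup joining the typology sheet to the description sheet; the codes and descriptions below are invented for illustration.

```python
# The VLOOKUP between the typology sheet and the description sheet is a
# plain key lookup; codes and descriptions below are invented.
descriptions = {  # Sheet #2: description code -> English description
    "obv.1": "Head of Herakles right, wearing lion skin headdress",
    "rev.1": "Zeus seated left, holding eagle and sceptre",
}

typology = [  # Sheet #1: each type row references description codes
    {"id": "price.1000", "obv_code": "obv.1", "rev_code": "rev.1"},
]

# Equivalent of =VLOOKUP(code, Sheet2!A:B, 2, FALSE) applied per row:
resolved = [
    {**row,
     "obv_desc": descriptions[row["obv_code"]],
     "rev_desc": descriptions[row["rev_code"]]}
    for row in typology
]
print(resolved[0]["obv_desc"])
```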

VLOOKUP to control type descriptions

I have applied the same logic to concordances. A single concordance sheet can be maintained and propagated across multiple relevant type corpora.

See for example the Svoronos 1904 corpus of Ptolemaic coinage: https://docs.google.com/spreadsheets/d/e/2PACX-1vSSxfdRUvq_PZOlvt3Od1T1gu29wOSQub6DwqQviq1TMRs2gDCWRA4u0i0cqHaHWchJ9Zt3pq03pc0t/pubhtml

This contains a partial concordance between Svoronos numbers and the types from Catharine Lorber's Coins of the Ptolemaic Empire vol I, part I (gold and silver from Ptolemy I - IV as published in Ptolemaic Coins Online).

By eliminating the intermediary scripting and XML upload/indexing process, scholars will be able to use OpenRefine to prepare their data without much technical intervention and publish their type or specimen data into Numishare without significant IT overhead. This alone will save me quite a lot of time: a month of development up front to save at least the same amount of time per year in redundant scripting and OpenRefine data cleaning.

After a spreadsheet is uploaded, it will be indexed directly into Solr, if the types are active (not deprecated by newer URIs) and the indexing option has been enabled.

Full documentation of the spreadsheet upload is forthcoming.

Monday, August 5, 2019

Museum of Fine Arts, Boston joins numismatic linked data cloud

The Museum of Fine Arts, Boston is the newest entrant into the Nomisma.org Linked Open Data cloud, providing data for more than 1,600 Roman Republican and Imperial coins to Coinage of the Roman Republic Online and Online Coins of the Roman Empire. The MFA's collection is particularly strong with respect to late Roman gold pieces, many of which represent the sole specimen available for that typology in OCRE.

Solidus of Constantius II (MFA 65.270), RIC VIII Rome 291.
Of these coins, roughly 1,400 are Imperial and a little over 200 are from the Republican period. The MFA's terms of service are linked from the datasets page in Nomisma.org itself and the contributors pages in OCRE and CRRO.

Data for these coins were provided by Laure Marest, Cornelius and Emily Vermeule Assistant Curator of Greek and Roman Art, and processed through OpenRefine to reconcile against the APIs available in both projects. The resulting CSV was transformed into RDF by a script I wrote and uploaded here and ingested into Nomisma's SPARQL endpoint.