Thursday, November 7, 2019

New Partners for Nomisma

Several new partners have joined the Nomisma.org numismatic Linked Open Data ecosystem via the database network developed through the Berlin Münzkabinett. This software framework, which is used by about 20 collections in Germany and Austria, now supports the direct-to-Nomisma RDF export detailed in Nomisma.org's documentation. Previously, I had written a PHP script to harvest LIDO XML files (one by one) that were listed in text files from each institution. At one HTTP request per second, it typically took about three hours to generate an RDF export for Berlin, which I stored as a static file on the numismatics.org server. Now, it takes only a minute or two to ingest RDF VoID dataset metadata and data dumps directly from the Berlin database.

About three-quarters of the 40 or so collections that contribute data to Nomisma now offer direct RDF exports according to our specifications, which is a tremendous advancement toward sustainability of our ingestion workflow. KENOM offers an OAI-PMH API that I have scripted to harvest, and harvesting from the Bibliothèque nationale de France is a combination of CSV processing and Gallica OAI-PMH harvesting. The remaining partners have been added into Nomisma by writing bespoke scripts for processing CSV into RDF and storing static files on the ANS server (often, this process includes having to use OpenRefine to map coin type references to URIs). I am hoping that in the next few years, we can transition completely to direct RDF ingestion via our VoID specification or Linked Art JSON-LD harvesting, which I have already begun to prototype in the Nomisma.org backend.

New partners include:
  • Augsburg University
  • Konstanz University
  • Mainz University
  • University of Vienna

These add more than 1,000 coins into Nomisma.org, primarily for OCRE and CRRO.

Friday, September 27, 2019

First pass at processing Linked Art JSON-LD to Nomisma RDF

Over the last few weeks, I have been developing a harvester for Linked Art-compliant JSON-LD simultaneously in both Nomisma.org and Kerameikos.org, which share similar frameworks that are built around Orbeon XForms for manually editing or transforming large quantities of data (usually CSV) to RDF, and connecting these workflows directly to Apache Solr and a SPARQL endpoint. These new features, in both platforms, load JSON-LD from a URL, which is transformed into the XForms 2.0 spec's JSON-to-XML model, and is then validated and parsed into RDF/XML on the way into the SPARQL endpoint.

I will write something more comprehensive about how this functions specifically on the Greek pottery side of things, but I have successfully tested transforming the Linked Art JSON-LD for a test coin (http://numismatics.org/collection/1944.100.76933.jsonld?profile=linkedart) into the Nomisma.org hybrid data model that is composed of properties and classes from our own numismatic ontology and properties from other ontologies, like Dublin Core Terms and the Europeana Data Model.

This transformation process removes much of the developer-oriented cruft out of the JSON to distill the model specifically to the essential literals and URIs necessary for connecting a coin, its measurements, images, and coin type URIs to the numismatic knowledge graph in the Nomisma.org SPARQL endpoint.

Basically, it performs the following functions:

  • Maps the preferred term for an object to dcterms:title and the accession number to dcterms:identifier
  • Measurements (weight, axis, diameter) are mapped to the correct Nomisma property and validated to ensure that they conform to the correct units. Inches and centimeters will be converted to millimeters for diameter, height, width, and thickness.
  • Images for each "part" (obverse, reverse) are placed into the appropriate nmo:hasObverse or nmo:hasReverse data object as foaf:depiction. IIIF service URIs are expanded into the edm:WebResource and svcs:Service model that we have appropriated from the Europeana Data Model specification.
  • Any top-level "type" (classified_as) that is not a Getty or Nomisma URI is presumed to be a coin type. We would like to discuss this further with the Linked Art community to formalize a method by which we can flag coin type URIs in a more stable and consistent manner.
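
As a rough illustration of the measurement and coin type steps in the list above, here is a minimal Python sketch. The actual harvester is implemented in XForms, so everything here (the function names, the unit labels, and the two URI prefixes used to recognize Getty and Nomisma terms) is an assumption made for the sake of the example.

```python
# Hypothetical sketch only; the real harvester runs in XForms, not Python.

# Conversion factors to millimeters for linear dimensions (unit labels assumed).
TO_MM = {"mm": 1.0, "cm": 10.0, "in": 25.4}

# URI prefixes assumed to identify Getty and Nomisma vocabularies.
KNOWN_VOCAB_PREFIXES = ("http://vocab.getty.edu/", "http://nomisma.org/")


def to_millimeters(value, unit):
    """Convert a linear measurement to millimeters, rejecting unknown units."""
    if unit not in TO_MM:
        raise ValueError(f"Unsupported unit: {unit}")
    return round(value * TO_MM[unit], 2)


def presumed_coin_types(classified_as):
    """Return top-level classified_as URIs that are neither Getty nor Nomisma."""
    return [
        entry["id"]
        for entry in classified_as
        if "id" in entry and not entry["id"].startswith(KNOWN_VOCAB_PREFIXES)
    ]


# Illustrative input: the Getty term ID below is a placeholder, not a real lookup.
example = [
    {"id": "http://vocab.getty.edu/aat/0000000"},
    {"id": "http://nomisma.org/id/coin"},
    {"id": "http://numismatics.org/ocre/id/ric.1(2).aug.1A"},
]
coin_types = presumed_coin_types(example)
diameter_mm = to_millimeters(2.5, "cm")
```

Because the filter works by exclusion, any unrelated classification URI would also be treated as a coin type, which is why a more explicit flag in the Linked Art model would be preferable.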

It should be noted that Linked Art hasn't delved deeply into provenance, which would be necessary for encoding coin hoard URIs and findspot metadata.

You can see the resulting RDF/XML (that would get sent into the Nomisma SPARQL endpoint) here: https://gist.github.com/ewg118/049046755a670c3645689c68c14e794b.

This harvester will be adapted as changes are made to the Linked Art model. We hope that this feature in Nomisma will open the door to more streamlined and consistent aggregation of numismatic materials from the broader museum community, especially as we begin to work on new projects that are relevant to the American Art Collaborative.

Tuesday, September 3, 2019

KENOM Updates in Nomisma.org Projects

The State Museum of Prehistory Halle (Landesmuseum für Vorgeschichte Halle) is the latest partner to join the Nomisma.org Linked Open Data cloud through the KENOM portal of German civic museums. Over 300 coins have been added to OCRE and CRRO from the State Museum of Prehistory Halle. In total, KENOM has made more than 10,000 coins available to the Nomisma numismatic ecosystem, across every type corpus project published by the American Numismatic Society, including Art of Devastation, to which no one besides the American Numismatic Society had previously contributed. There are 19 coins from two KENOM-affiliated museums made accessible through Art of Devastation.

The Holzthaleben Hoard in the distribution of RIC Claudius Gothicus 18.

The script that harvests LIDO XML from KENOM's OAI-PMH web service has been updated to make use of findspot metadata. About 150 coins are linked to Geonames URIs as single finds and another 100 are linked to two hoard URIs published by KENOM. These will ultimately link to the Oxford Coin Hoards of the Roman Empire project. The hoards are Schwabhausen and Holzthaleben.
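
For readers unfamiliar with OAI-PMH, the sketch below shows the general shape of reading one page of a ListRecords response in Python: collect the record identifiers and check for a resumption token that points to the next page. It is not the harvesting script described above, and the sample response and identifiers are invented; only the OAI-PMH namespace URI comes from the protocol itself.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) for one response page."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # An empty resumptionToken element marks the final page.
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Invented sample page with two records and a token for the next page.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>example:record:1</identifier></header></record>
    <record><header><identifier>example:record:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

ids, next_token = parse_list_records(SAMPLE)
```

A harvester would repeat the request with the returned token until none is given, then pull findspot metadata out of the LIDO payload of each record.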

Wednesday, August 14, 2019

Recommendations for numismatic spreadsheet standardization

Over the years, we have considerably refined the way in which we organize our spreadsheets for processing into NUDS XML files and upload into the Numishare platform. Our workflow started with Online Coins of the Roman Empire, where numerous interns worked over the course of four years (the final three funded by the NEH) to produce dozens of spreadsheets (typically one per emperor) encompassing more than 40,000 types.

Many of the primary typological categories, such as denomination, mint, and authority, contained Nomisma URIs, and textual categories, e.g. legend and type description, were columns of free text. These spreadsheets (Excel files) were exported into CSV and processed through a PHP script that I wrote to transform each row into a NUDS document, and then this batch of files would be uploaded with the eXist-db XML database client into the appropriate Numishare collection. After this, I would manually edit the code in the Admin panel in Numishare to index the most relevant batch of RIC IDs into Solr for the public-facing browse and search interfaces (so as not to reindex an entire collection of 40,000 types when a new or updated spreadsheet might only contain several hundred items).

With the publication of PELLA in 2015, we implemented a key->pair stylesheet that enabled us to connect obverse and reverse type description codes to each unique description, with columns for English, French, and German translations. The OCRE PHP script was modified to accommodate this new model. Subsequent type corpora have been published for Ptolemaic and Seleucid coinage, each with a slight variation of yet another PHP script. Furthermore, with partners in the Netherlands, Switzerland, England, and Italy deploying their own Numishare collections for type corpora and/or collections of physical specimens, the wide range of slightly different spreadsheet models requires an ever more diverse set of scripts that need to be manually maintained. It has long been a goal of mine to implement a standardized spreadsheet import into Numishare itself, modeled on the XForms-based validation and transformation of Google Sheets' Atom XML API implemented several years ago in Nomisma.org.


Mapping Google Sheets columns to NUDS elements


Finally, after about a month of development and testing, a Google Sheets-based spreadsheet import is functional in Numishare. It is primarily focused on type corpora at the moment, as not all of the physical and administrative descriptors have been implemented for mapping spreadsheet column headings of numismatic objects.

Some things remain the same:
  • Typological categories must map to Nomisma URIs
  • References for physical objects can be a coin type URI of some sort, a plain literal, or a combination of a type series and type number separated by a | character. The type series must be a literal or a Nomisma URI for a type series, but I aim to enable support for Zenon bibliographic URIs.
  • Parent IDs (skos:broader) and deprecation-related IDs (dcterms:replaces or dcterms:isReplacedBy) must be contained in the spreadsheet.
  • A question mark can trail a Nomisma URI to denote uncertainty. This is parsed in the XForms engine to insert the appropriate uncertainty URI into the NUDS XML. 
  • Columns for symbols/monograms located at certain positions on the obverse and reverse can be mapped to the positions listed in the Numishare config.
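
Two of the cell conventions in the list above (the | separator in references and the trailing question mark for uncertainty) can be sketched in a few lines of Python. The import itself is handled by the XForms engine, so the function names and return shapes below are assumptions for illustration only.

```python
def parse_reference(cell):
    """Split a reference cell into (type_series, type_number) on the | character.

    A cell without | is returned as (None, cell): a coin type URI or plain literal.
    """
    cell = cell.strip()
    if "|" in cell:
        series, number = (part.strip() for part in cell.split("|", 1))
        return series, number
    return None, cell


def parse_uncertain_uri(cell):
    """Return (uri, is_uncertain), treating a trailing ? as an uncertainty flag."""
    cell = cell.strip()
    if cell.endswith("?"):
        return cell[:-1], True
    return cell, False


# Illustrative cells; the type number is invented.
series, number = parse_reference("http://nomisma.org/id/ric|123")
mint, uncertain = parse_uncertain_uri("http://nomisma.org/id/rome?")
```

In the real import, an uncertain value results in the appropriate uncertainty URI being written into the NUDS XML rather than a boolean flag.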

Structured XML produced from a spreadsheet

Other types of information requirements must be met in order for the spreadsheet to validate, which means that certain data must be explicit and not automatically inserted by a script. For example, each NUDS XML document requires a title. This title was typically generated in the PHP script by some concatenation of a human-readable string with the type number parsed from an ID column. Similarly, all coin types and all physical specimens NOT linked to a coin type URI must have an Object Type URI in the spreadsheet, even if that URI for all objects is nm:coin.

All of this normalization can occur in a pre-processing phase in OpenRefine: automatic generation of titles through regular expressions, reconciliation of typological columns to Nomisma URIs through Nomisma's OpenRefine API, etc.

This new spreadsheet import also requires type descriptions to be present in the typological spreadsheet, which means rethinking the way in which descriptions are connected to the main typology spreadsheet. Instead of a separate stylesheet spreadsheet of key->pair combinations between codes and translations, this stylesheet is incorporated as a second sheet in the typological spreadsheet. It is therefore possible to create a VLOOKUP formula between the unique type description code in the typology sheet and the corresponding column in the description stylesheet (see https://docs.google.com/spreadsheets/d/e/2PACX-1vQoyHYDyh79oJuoW9m2g9BNbnysyVWjl13KQNEyTF5dgXswQwgekXMvIDTAH3onwN35c1P9eXeJAD4w/pubhtml). Therefore, the type descriptions can still be maintained with the ease of making one change to a description in Sheet #2, and the change will immediately propagate into the Atom feed.

VLOOKUP to control type descriptions
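
The effect of the VLOOKUP arrangement can be pictured as a simple keyed join. The Python sketch below is only an analogy for what the spreadsheet formula does; the codes, descriptions, column names, and type IDs are all invented.

```python
# "Sheet 2": description stylesheet keyed by code (all values invented).
descriptions = {
    "O1": {"en": "Head of Zeus right", "fr": "Tête de Zeus à droite"},
    "R1": {"en": "Eagle standing left", "fr": "Aigle debout à gauche"},
}

# "Sheet 1": typology rows that carry only the description codes.
typology = [
    {"id": "type.1", "obverse_code": "O1", "reverse_code": "R1"},
    {"id": "type.2", "obverse_code": "O1", "reverse_code": "R1"},
]


def resolve_descriptions(rows, stylesheet, lang="en"):
    """Attach the description text for each code, as the VLOOKUP column would."""
    resolved = []
    for row in rows:
        resolved.append({
            **row,
            "obverse_description": stylesheet[row["obverse_code"]][lang],
            "reverse_description": stylesheet[row["reverse_code"]][lang],
        })
    return resolved


# One edit to the stylesheet reaches every type that uses the code.
descriptions["O1"]["en"] = "Diademed head of Zeus right"
rows = resolve_descriptions(typology, descriptions)
```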

I have applied the same logic for concordances. A single concordance sheet can be maintained and propagated across multiple relevant type corpora.

See for example the Svoronos 1904 corpus of Ptolemaic coinage: https://docs.google.com/spreadsheets/d/e/2PACX-1vSSxfdRUvq_PZOlvt3Od1T1gu29wOSQub6DwqQviq1TMRs2gDCWRA4u0i0cqHaHWchJ9Zt3pq03pc0t/pubhtml

This contains a partial concordance between Svoronos numbers and the types from Catharine Lorber's Coins of the Ptolemaic Empire vol I, part I (gold and silver from Ptolemy I - IV as published in Ptolemaic Coins Online).

By eliminating the intermediary scripting and XML upload/indexing process, scholars will be able to use OpenRefine to prepare their data without much technical intervention and publish their type or specimen data into Numishare without significant IT overhead. This alone will save me quite a lot of time: a month of development up front to save at least the same amount of time per year in redundant scripting and OpenRefine data cleaning.

After a spreadsheet is uploaded, it will be indexed directly into Solr, if the types are active (not deprecated by newer URIs) and the indexing option has been enabled.

Full documentation of the spreadsheet upload is forthcoming.

Monday, August 5, 2019

Museum of Fine Arts, Boston joins numismatic linked data cloud

The Museum of Fine Arts, Boston is the newest entrant into the Nomisma.org Linked Open Data cloud, providing data for more than 1,600 Roman Republican and Imperial coins to Coinage of the Roman Republic Online and Online Coins of the Roman Empire. The MFA's collection is particularly strong with respect to late Roman gold pieces, many of which represent the sole specimen available for that typology in OCRE.

Solidus of Constantius II (MFA 65.270), RIC VIII Rome 291.

Of these coins, roughly 1,400 are Imperial and a little over 200 are from the Republican period. The MFA's terms of service are linked from the datasets page in Nomisma.org itself and the contributors pages in OCRE and CRRO.

Data for these coins were provided by Laure Marest, Cornelius and Emily Vermeule Assistant Curator of Greek and Roman Art, and processed through OpenRefine to reconcile against the APIs available in both projects. The resulting CSV was transformed into RDF by a script I wrote and uploaded here and ingested into Nomisma's SPARQL endpoint.

Wednesday, July 24, 2019

More than 600 BnF Ptolemaic coins added to PCO

More than 660 Ptolemaic coins from the Bibliothèque nationale de France have been added into the Nomisma.org numismatic Linked Open Data cloud and are accessible through Ptolemaic Coins Online and the broader Hellenistic Royal Coinages umbrella site. There are now about 2,400 Ptolemaic coins in PCO (which includes at this phase the gold and silver coinage of Ptolemy I - IV, ca. 330-200 B.C.), and roughly 75% of these are from the BnF and American Numismatic Society. Therefore, high-resolution, public domain images are available for reuse for these objects through IIIF web services. In total, 572 of 984 Ptolemaic types are linked to at least one photographed specimen, almost 60% of the corpus.

Tetradrachm of Ptolemy IV, CPE I.1, 925.

Friday, June 7, 2019

Upgrades to research context in Nomisma's user interface

After several days of development, I have pushed some significant changes to the Nomisma.org user interfaces regarding additional context for certain types of entities defined in the system. Building on recent advancements that I made in increasing the complexity of typological and metrical visualizations in both Nomisma and the Numishare platform (specifically for Hellenistic Royal Coinages), I have introduced the same sorts of queries for the geographic APIs (that serialize SPARQL queries for mints, findspots, and hoards associated with a Nomisma concept into GeoJSON for display in Leaflet) and the list of related coin types.

Using the relationships inherent in Nomisma's data, we are now able to visualize the geographic distribution of corporate authorities, dynasties, and people appearing on portraits. For example, in order to generate a map illustrating the distribution of mints for the entire Seleucid Empire, the SPARQL query will search for coin types with an nmo:hasAuthority of a person who has an org:hasMembership/org:organization of http://nomisma.org/id/seleucid_empire.
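
A simplified version of such a query can be sketched as a small Python function that builds the SPARQL text. The nmo:hasAuthority and org:hasMembership/org:organization patterns come from the description above; the use of nmo:hasMint to reach the mint, the variable names, and the function itself are assumptions, and the production query is likely more elaborate.

```python
PREFIXES = """PREFIX nmo: <http://nomisma.org/ontology#>
PREFIX org: <http://www.w3.org/ns/org#>
"""


def mints_for_organization_query(org_uri):
    """Build a simplified SPARQL query for the mints of an organization's coin types."""
    return PREFIXES + f"""SELECT DISTINCT ?mint WHERE {{
  ?coinType nmo:hasAuthority ?person ;
            nmo:hasMint ?mint .
  ?person org:hasMembership/org:organization <{org_uri}> .
}}"""


query = mints_for_organization_query("http://nomisma.org/id/seleucid_empire")
```

The org:hasMembership/org:organization step is a SPARQL 1.1 property path, which lets the query hop from the person through the membership node to the organization in a single triple pattern.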

In introducing this new level of complexity, I rewrote a significant portion of the pipelines underlying these URIs. The previous approach relied on a large series of hand-coded SPARQL templates in which small portions were swapped out with simple string replacements. The new system is more generalizable and flexible: XSLT generates a complex XML metamodel for a SPARQL query, which is then serialized by another set of XSLT templates into the SPARQL text that is POSTed to the endpoint.
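
The two-stage idea (build a structured description of the query, then serialize it to text) can be illustrated in Python, with the caveat that the real pipeline uses XSLT throughout and that the element and attribute names below are invented rather than taken from the Nomisma code.

```python
import xml.etree.ElementTree as ET


def build_metamodel(variables, triples):
    """Stage 1: describe a SELECT query as XML (element names are hypothetical)."""
    select = ET.Element("select", {"variables": variables})
    for s, p, o in triples:
        ET.SubElement(select, "triple", {"s": s, "p": p, "o": o})
    return select


def serialize(select):
    """Stage 2: turn the XML description into SPARQL text."""
    lines = [f"SELECT DISTINCT {select.get('variables')} WHERE {{"]
    for triple in select.findall("triple"):
        lines.append(f"  {triple.get('s')} {triple.get('p')} {triple.get('o')} .")
    lines.append("}")
    return "\n".join(lines)


model = build_metamodel(
    "?mint",
    [
        ("?coinType", "nmo:hasAuthority", "?person"),
        ("?coinType", "nmo:hasMint", "?mint"),
    ],
)
sparql = serialize(model)
```

Keeping the query as structured data until the last step is what makes it possible to add or combine patterns programmatically, instead of maintaining a separate hand-written template for each case.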

New interface for http://nomisma.org/id/seleucid_empire, showing related coin types and a map of all Seleucid mints and known IGCH hoards and one single find.


As a result, the following improvements in mapping and/or related coin types have been applied to the following categories of SKOS concept:


Additionally, some updates were made to the distribution and metrical analysis SPARQL templates to query based on portraiture from the ID page (http://nomisma.org/id/faustina_i). This had previously not been possible--one had to use the purpose-built visualization interfaces and select "Portrait" as a facet.

The distribution of deities between Faustina and Antoninus Pius, as generated on the ID page for Faustina.

Now that the generation of GeoJSON for geographic queries has migrated to this new metamodel system, it will be possible to move beyond queries about one particular Nomisma concept to queries that involve more than one parameter (such as those in the distribution and metrical visualization interfaces). That is to say, it will eventually be possible to generate not only a map showing the mints that produced tetradrachms and where tetradrachms have been found in hoards, but also a map showing where the tetradrachms of Ptolemy I were produced and found. While Numishare contains a map interface that enables the display of mints pertaining to a query (driven by Apache Solr, the search index for Numishare), the indexing of findspots into Solr was disabled a year or two ago due to problems with scaling and the wait time for indexing a type corpus as large as OCRE. The next step is to rewrite the Numishare map interface to interact with Nomisma's SPARQL endpoint directly to display mints and findspots (which always reflects the current data ingested into the Nomisma linked data cloud), rather than rely on Solr.

Another major update is looming on the horizon, probably to come within the next few weeks: enhanced data for Roman Imperial persons. Presently, Hellenistic kings have been thoroughly integrated with Nomisma URIs for dynasties and corporate entities, but these are lacking in the Roman world. The Nomisma.org Roman committee is currently working on a revised spreadsheet of people in order to add new dynasties and corporate bodies into the system, as well as start and end dates for the reigns of Roman emperors. This means that we can compare visual motifs between the Julio-Claudians and the Flavians or compare the change in weights of antoniniani between the Gallic Empire and the Roman Empire (Valerian to Gallienus) over the same time period.

Additionally, dynasties and corporate bodies will be introduced as facets in OCRE. We are also working on a spreadsheet of Roman provinces, which will also be introduced as facets in addition to historical regions. The Region facet in OCRE is currently a conflation of provinces and historical regions owing to inconsistencies in RIC's structure.