Wednesday, August 14, 2019

Recommendations for numismatic spreadsheet standardization

Over the years, we have considerably refined the way in which we organize our spreadsheets for processing into NUDS XML files and upload into the Numishare platform. Our workflow started with Online Coins of the Roman Empire, where numerous interns worked over the course of four years (the final three funded by the NEH) to produce dozens of spreadsheets (typically one per emperor) encompassing more than 40,000 types.

Many of the primary typological categories, such as denomination, mint, and authority, contained Nomisma URIs, while textual categories, e.g. legend and type description, were columns of free text. These spreadsheets (Excel files) were exported to CSV and processed through a PHP script that I wrote to transform each row into a NUDS document, and this batch of files was then uploaded into the appropriate Numishare collection with the eXist-db XML database client. After this, I would manually edit the code in the Numishare Admin panel to index only the relevant batch of RIC IDs into Solr for the public-facing browse and search interfaces, so as not to reindex an entire collection of 40,000 types when a new or updated spreadsheet might contain only several hundred items.
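
The PHP script itself isn't reproduced here. As a rough illustration of the row-to-document idea only, here is a minimal Python sketch that turns each CSV row into a simplified conceptual NUDS record. The column headings (ID, Title, Object Type URI, Denomination URI, Material URI) are hypothetical, and the output is deliberately not schema-complete: real NUDS nests mints and authorities under further elements, which this sketch omits.

```python
import csv
import io
import xml.etree.ElementTree as ET

NUDS_NS = "http://nomisma.org/nuds"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical column headings mapped to NUDS typeDesc children that take a Nomisma URI
URI_COLUMNS = {
    "Object Type URI": "objectType",
    "Denomination URI": "denomination",
    "Material URI": "material",
}


def q(name):
    """Qualify an element name with the NUDS namespace."""
    return f"{{{NUDS_NS}}}{name}"


def row_to_nuds(row):
    """Build a simplified (not schema-complete) conceptual NUDS record from one CSV row."""
    nuds = ET.Element(q("nuds"), {"recordType": "conceptual"})
    control = ET.SubElement(nuds, q("control"))
    ET.SubElement(control, q("recordId")).text = row["ID"]
    desc_meta = ET.SubElement(nuds, q("descMeta"))
    ET.SubElement(desc_meta, q("title")).text = row["Title"]
    type_desc = ET.SubElement(desc_meta, q("typeDesc"))
    for column, element in URI_COLUMNS.items():
        uri = (row.get(column) or "").strip()
        if uri:  # empty cells produce no element at all
            ET.SubElement(type_desc, q(element), {
                f"{{{XLINK_NS}}}type": "simple",
                f"{{{XLINK_NS}}}href": uri,
            })
    return nuds


# Usage: one NUDS document per row of an exported CSV
sample_csv = (
    "ID,Title,Object Type URI,Denomination URI\n"
    "example.1,Example Type 1,http://nomisma.org/id/coin,http://nomisma.org/id/denarius\n"
)
documents = [row_to_nuds(row) for row in csv.DictReader(io.StringIO(sample_csv))]
```

Each resulting element could then be serialized with ET.tostring and uploaded in a batch, as in the workflow described above.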

With the publication of PELLA in 2015, we implemented a key-value stylesheet that enabled us to connect obverse and reverse type description codes to each unique description, with columns for English, French, and German translations. The OCRE PHP script was modified to accommodate this new model. Subsequent type corpora have been published for Ptolemaic and Seleucid coinage, each with a slight variation of yet another PHP script. Furthermore, with partners in the Netherlands, Switzerland, England, and Italy deploying their own Numishare collections for type corpora and/or collections of physical specimens, the wide range of slightly different spreadsheet models requires an ever more diverse set of scripts that must be maintained manually. It has long been a goal of mine to implement a standardized spreadsheet import in Numishare itself, modeled on the XForms-based validation and transformation of Google Sheets' Atom XML API that I implemented several years ago in Nomisma.org.


Mapping Google Sheets columns to NUDS elements


Finally, after about a month of development and testing, a Google Sheets-based spreadsheet import is functional in Numishare. It is primarily focused on type corpora at the moment, as mappings from spreadsheet column headings have not yet been implemented for all of the physical and administrative descriptors of numismatic objects.

Some things remain the same:
  • Typological categories must map to Nomisma URIs
  • References for physical objects can be a coin type URI of some sort, a plain literal, or a combination of a type series and type number separated by a | character. The type series must be a literal or a Nomisma URI for a type series, but I aim to enable support for Zenon bibliographic URIs.
  • Parent IDs (skos:broader) and deprecation-related IDs (dcterms:replaces or dcterms:isReplacedBy) must be contained in the spreadsheet.
  • A question mark can trail a Nomisma URI to denote uncertainty. This is parsed in the XForms engine to insert the appropriate uncertainty URI into the NUDS XML. 
  • Columns for symbols/monograms located at certain positions on the obverse and reverse can be mapped to the positions listed in the Numishare config.
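
To make the reference and uncertainty conventions concrete, here is a minimal Python sketch of how such cells might be parsed. It is not the XForms code Numishare actually uses; the function names are hypothetical, and the uncertainty URI (http://nomisma.org/id/uncertain_value) is an assumption about which concept gets inserted.

```python
NOMISMA_ID = "http://nomisma.org/id/"
# Assumption: the concept used to flag uncertainty; the import may insert a different URI
UNCERTAIN_VALUE = "http://nomisma.org/id/uncertain_value"


def parse_uri_cell(cell):
    """Split a trailing '?' off a Nomisma URI and return (uri, certainty URI or None)."""
    value = cell.strip()
    if value.startswith(NOMISMA_ID) and value.endswith("?"):
        return value[:-1], UNCERTAIN_VALUE
    return value, None


def parse_reference(cell):
    """Classify a reference cell as a 'series|number' pair, a URI, or a plain literal."""
    value = cell.strip()
    if "|" in value:
        series, number = (part.strip() for part in value.split("|", 1))
        return {
            "kind": "series_number",
            "series": series,
            "number": number,
            # The series must be a literal or a Nomisma type series URI
            "series_is_uri": series.startswith(NOMISMA_ID),
        }
    if value.startswith(("http://", "https://")):
        return {"kind": "uri", "uri": value}
    return {"kind": "literal", "text": value}
```

Splitting on the first | only keeps a type number intact even if it happens to contain further punctuation.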

Structured XML produced from a spreadsheet

Other information requirements must be met in order for the spreadsheet to validate, which means that certain data must be explicit rather than automatically inserted by a script. For example, each NUDS XML document requires a title. This title was typically generated in the PHP script by concatenating a human-readable string with the type number parsed from an ID column. Similarly, all coin types and all physical specimens NOT linked to a coin type URI must have an Object Type URI in the spreadsheet, even if that URI is nm:coin for every object.
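
A minimal Python sketch of these two row-level checks follows. The column headings (ID, Title, Object Type URI, Coin Type URI) are hypothetical, and the real import may check considerably more.

```python
REQUIRED_COLUMNS = ("ID", "Title")


def validation_errors(row):
    """Return a list of problems that would stop a row from validating (hypothetical headings)."""
    errors = [f"missing {column}" for column in REQUIRED_COLUMNS
              if not row.get(column, "").strip()]
    # Coin types, and specimens not linked to a coin type URI, must state their object type
    linked_to_type = bool(row.get("Coin Type URI", "").strip())
    if not linked_to_type and not row.get("Object Type URI", "").strip():
        errors.append("missing Object Type URI")
    return errors


print(validation_errors({"ID": "example.1", "Title": ""}))
# ['missing Title', 'missing Object Type URI']
```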

All of this normalization can occur in a pre-processing phase in OpenRefine: automatic generation of titles through regular expressions, reconciliation of typological columns to Nomisma URIs through Nomisma's OpenRefine API, etc.
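
As an example of title generation by regular expression, here is a small Python sketch; the same pattern could be applied in an OpenRefine transform. The ID scheme ("<series>.<number>") and the series-to-label mapping are hypothetical.

```python
import re

# Hypothetical ID scheme "<series>.<number>", e.g. "svoronos-1904.1234"
ID_PATTERN = re.compile(r"^(?P<series>[a-z0-9-]+)\.(?P<number>\d+[a-z]?)$")
# Hypothetical mapping from the series part of an ID to a human-readable string
SERIES_LABELS = {"svoronos-1904": "Svoronos (1904)"}


def title_from_id(type_id):
    """Build a display title from an ID column value, failing loudly on unexpected IDs."""
    match = ID_PATTERN.match(type_id)
    if match is None or match["series"] not in SERIES_LABELS:
        raise ValueError(f"cannot derive a title from {type_id!r}")
    return f"{SERIES_LABELS[match['series']]} {match['number']}"


print(title_from_id("svoronos-1904.1234"))  # Svoronos (1904) 1234
```

Raising an error instead of guessing keeps malformed IDs visible during cleanup, before the spreadsheet reaches validation.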

This new spreadsheet import also requires type descriptions to be present in the typological spreadsheet, which means rethinking the way in which descriptions are connected to the main typology spreadsheet. Instead of a separate stylesheet spreadsheet of key-value pairs between codes and translations, the stylesheet is incorporated as a second sheet in the typological spreadsheet. It is therefore possible to create a VLOOKUP formula between the unique type description code in the typology sheet and the corresponding column in the description stylesheet (see https://docs.google.com/spreadsheets/d/e/2PACX-1vQoyHYDyh79oJuoW9m2g9BNbnysyVWjl13KQNEyTF5dgXswQwgekXMvIDTAH3onwN35c1P9eXeJAD4w/pubhtml). Type descriptions can thus still be maintained with the ease of making one change to a description in Sheet #2, and the change will immediately propagate into the Atom feed.
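
In Google Sheets such a formula might look like =VLOOKUP(D2, Descriptions!A:D, 2, FALSE), where the sheet name and column positions are hypothetical and FALSE requests an exact match. The Python sketch below mimics that exact-match lookup outside the spreadsheet: the first row for each code wins, and an unknown code is flagged as #N/A, as VLOOKUP would do. The column names ("Code", "Obverse Code") and language keys are assumptions.

```python
def index_descriptions(description_rows):
    """Index the description sheet by code, keeping the first row per code as VLOOKUP does."""
    index = {}
    for row in description_rows:
        index.setdefault(row["Code"], row)
    return index


def fill_descriptions(type_rows, description_rows, code_column, target_prefix,
                      languages=("en", "fr", "de")):
    """Copy each language's description into the type rows, or '#N/A' if the code is unknown."""
    index = index_descriptions(description_rows)
    for row in type_rows:
        match = index.get(row[code_column])
        for lang in languages:
            row[f"{target_prefix} ({lang})"] = match[lang] if match else "#N/A"
    return type_rows


descriptions = [{"Code": "obv-heracles", "en": "Head of Heracles right, wearing lion skin headdress"}]
types = [{"ID": "example.1", "Obverse Code": "obv-heracles"},
         {"ID": "example.2", "Obverse Code": "obv-unknown"}]
fill_descriptions(types, descriptions, "Obverse Code", "Obverse", languages=("en",))
```

Indexing the description sheet once into a dictionary avoids rescanning it for every type row.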

VLOOKUP to control type descriptions

I have applied the same logic to concordances. A single concordance sheet can be maintained and propagated across multiple relevant type corpora.

See for example the Svoronos 1904 corpus of Ptolemaic coinage: https://docs.google.com/spreadsheets/d/e/2PACX-1vSSxfdRUvq_PZOlvt3Od1T1gu29wOSQub6DwqQviq1TMRs2gDCWRA4u0i0cqHaHWchJ9Zt3pq03pc0t/pubhtml

This contains a partial concordance between Svoronos numbers and the types from Catharine Lorber's Coins of the Ptolemaic Empire vol I, part I (gold and silver from Ptolemy I - IV as published in Ptolemaic Coins Online).

By eliminating the intermediary scripting and XML upload/indexing process, scholars will be able to use OpenRefine to prepare their data without much technical intervention and publish their type or specimen data into Numishare without significant IT overhead. This alone will save me quite a lot of time: a month of development up front to save at least the same amount of time per year in redundant scripting and OpenRefine data cleaning.

After a spreadsheet is uploaded, it will be indexed directly into Solr, if the types are active (not deprecated by newer URIs) and the indexing option has been enabled.
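
The selection rule can be sketched in a few lines of Python. The "Replaced By" column (standing in for dcterms:isReplacedBy) and the flag name are hypothetical.

```python
def rows_to_index(rows, indexing_enabled):
    """Select the IDs of types that would be sent to Solr after an upload."""
    if not indexing_enabled:
        return []
    # A type replaced by a newer URI (dcterms:isReplacedBy) is deprecated and skipped
    return [row["ID"] for row in rows if not row.get("Replaced By", "").strip()]
```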

Full documentation of the spreadsheet upload is forthcoming.

Monday, August 5, 2019

Museum of Fine Arts, Boston joins numismatic linked data cloud

The Museum of Fine Arts, Boston is the newest entrant into the Nomisma.org Linked Open Data cloud, providing data for more than 1,600 Roman Republican and Imperial coins to Coinage of the Roman Republic Online and Online Coins of the Roman Empire. The MFA's collection is particularly strong with respect to late Roman gold pieces, many of which represent the sole specimen available for that typology in OCRE.

Solidus of Constantius II (MFA 65.270), RIC VIII Rome 291.
Of these coins, roughly 1,400 are Imperial and a little over 200 are from the Republican period. The MFA's terms of service are linked from the datasets page in Nomisma.org itself and the contributors pages in OCRE and CRRO.

Data for these coins were provided by Laure Marest, Cornelius and Emily Vermeule Assistant Curator of Greek and Roman Art, and processed through OpenRefine to reconcile against the APIs available in both projects. The resulting CSV was transformed into RDF by a script I wrote and uploaded here and ingested into Nomisma's SPARQL endpoint.
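
The script itself isn't reproduced here. As a rough illustration only, the Python sketch below writes Turtle for reconciled rows. It assumes hypothetical column headings (URI, Accession, Weight, Type URI) and a collection URI supplied by the caller. It uses the Nomisma ontology terms nmo:NumismaticObject, nmo:hasCollection, nmo:hasWeight, and nmo:hasTypeSeriesItem, but the properties the real script emits may differ.

```python
from decimal import Decimal

PREFIXES = "\n".join([
    "@prefix dcterms: <http://purl.org/dc/terms/> .",
    "@prefix nmo: <http://nomisma.org/ontology#> .",
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
])


def turtle_literal(text):
    """Quote a string as a Turtle literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def row_to_turtle(row, collection_uri):
    """Describe one reconciled CSV row as a nmo:NumismaticObject (hypothetical column names)."""
    lines = [
        f"<{row['URI']}> a nmo:NumismaticObject ;",
        f"    dcterms:identifier {turtle_literal(row['Accession'])} ;",
        f"    nmo:hasCollection <{collection_uri}> ;",
    ]
    weight = row.get("Weight", "").strip()
    if weight:
        # Decimal() rejects non-numeric values before they reach the RDF
        lines.append(f'    nmo:hasWeight "{Decimal(weight)}"^^xsd:decimal ;')
    lines.append(f"    nmo:hasTypeSeriesItem <{row['Type URI']}> .")
    return "\n".join(lines)


def csv_to_turtle(rows, collection_uri):
    """Serialize all rows as one Turtle document with a shared prefix header."""
    body = "\n\n".join(row_to_turtle(row, collection_uri) for row in rows)
    return PREFIXES + "\n\n" + body + "\n"
```

The rows could come straight from csv.DictReader over the OpenRefine export.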

Wednesday, July 24, 2019

More than 600 BnF Ptolemaic coins added to PCO

More than 660 Ptolemaic coins from the Bibliothèque nationale de France have been added to the Nomisma.org numismatic Linked Open Data cloud and are accessible through Ptolemaic Coins Online and the broader Hellenistic Royal Coinages umbrella site. There are now about 2,400 Ptolemaic coins in PCO (which at this phase includes the gold and silver coinage of Ptolemy I - IV, ca. 330-200 B.C.), and roughly 75% of these are from the BnF and the American Numismatic Society. As a result, high-resolution, public domain images are available for reuse for these objects through IIIF web services. In all, 572 of the 984 Ptolemaic types are linked to at least one photographed specimen: almost 60% of the corpus.

Tetradrachm of Ptolemy IV, CPE I.1, 925.

Friday, June 7, 2019

Upgrades to research context in Nomisma's user interface

After several days of development, I have pushed some significant changes to the Nomisma.org user interfaces regarding additional context for certain types of entities defined in the system. Building on recent advancements that I made in increasing the complexity of typological and metrical visualizations in both Nomisma and the Numishare platform (specifically for Hellenistic Royal Coinages), I have introduced the same sorts of queries for the geographic APIs (that serialize SPARQL queries for mints, findspots, and hoards associated with a Nomisma concept into GeoJSON for display in Leaflet) and the list of related coin types.
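
To show what serializing SPARQL results into GeoJSON involves, here is a minimal Python sketch over the standard SPARQL 1.1 JSON results format. It assumes the query projects variables named ?place, ?label, ?lat, and ?long; those names, and the "type" property, are illustrative rather than what Nomisma's API actually uses.

```python
import json


def bindings_to_geojson(results, feature_type):
    """Turn SPARQL JSON results with ?place ?label ?lat ?long into a GeoJSON FeatureCollection."""
    features = []
    for binding in results["results"]["bindings"]:
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude]
            "geometry": {
                "type": "Point",
                "coordinates": [float(binding["long"]["value"]), float(binding["lat"]["value"])],
            },
            "properties": {
                "uri": binding["place"]["value"],
                "name": binding["label"]["value"],
                "type": feature_type,  # e.g. "mint", "hoard", "find"
            },
        })
    return {"type": "FeatureCollection", "features": features}


sample = {"head": {"vars": ["place", "label", "lat", "long"]},
          "results": {"bindings": [{
              "place": {"type": "uri", "value": "https://example.org/mint/1"},
              "label": {"type": "literal", "value": "Example Mint"},
              "lat": {"type": "literal", "value": "36.2"},
              "long": {"type": "literal", "value": "36.15"}}]}}
geojson_text = json.dumps(bindings_to_geojson(sample, "mint"))
```

A string like geojson_text is what a Leaflet L.geoJSON layer would consume on the client side.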

Using the relationships inherent in Nomisma's data, we are now able to visualize the geographic distribution of corporate authorities, dynasties, and people appearing on portraits. For example, in order to generate a map illustrating the distribution of mints for the entire Seleucid Empire, the SPARQL query will search for coin types with an nmo:hasAuthority of a person who has an org:hasMembership/org:organization of http://nomisma.org/id/seleucid_empire.

In introducing this new level of complexity, I rewrote a significant portion of the pipelines underlying these URIs to migrate from a large series of hand-coded SPARQL templates in which small portions were replaced with simple string replacements to a more generalizable and flexible system built around using XSLT to generate a complex XML metamodel for a SPARQL query, which is then serialized by another set of XSLT templates into the SPARQL text that is POSTed to the endpoint.
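
The metamodel approach can be illustrated outside XSLT. In the Python sketch below, a query is a plain data structure, and a separate serializer renders it into SPARQL text. It is an analogy to the pipeline described above, not a port of it, and the dictionary layout is hypothetical. The example model encodes the Seleucid mint query, including a UNION branch for types whose authority is the empire itself.

```python
# Hypothetical metamodel: a query is a dict of prefixes, projected variables and
# WHERE patterns; a pattern is either a (subject, predicate, object) tuple or
# ("UNION", [group, group, ...]) where each group is itself a list of patterns.


def serialize_patterns(patterns, indent):
    """Render a list of patterns as SPARQL graph pattern lines."""
    lines = []
    for pattern in patterns:
        if pattern[0] == "UNION":
            blocks = []
            for group in pattern[1]:
                inner = serialize_patterns(group, indent + "  ")
                blocks.append("{\n" + inner + "\n" + indent + "}")
            lines.append(indent + " UNION ".join(blocks))
        else:
            subject, predicate, obj = pattern
            lines.append(f"{indent}{subject} {predicate} {obj} .")
    return "\n".join(lines)


def serialize_query(model):
    """Render the whole metamodel as SPARQL text ready to POST to an endpoint."""
    prefixes = "\n".join(f"PREFIX {p}: <{uri}>" for p, uri in model["prefixes"].items())
    body = serialize_patterns(model["where"], "  ")
    variables = " ".join(model["select"])
    return f"{prefixes}\nSELECT DISTINCT {variables} WHERE {{\n{body}\n}}"


# Mints of coin types whose authority is, or belongs to, the Seleucid Empire
seleucid_mints = {
    "prefixes": {
        "nm": "http://nomisma.org/id/",
        "nmo": "http://nomisma.org/ontology#",
        "org": "http://www.w3.org/ns/org#",
    },
    "select": ["?mint"],
    "where": [
        ("?type", "a", "nmo:TypeSeriesItem"),
        ("UNION", [
            [("?type", "nmo:hasAuthority", "nm:seleucid_empire")],
            [("?type", "nmo:hasAuthority", "?auth"),
             ("?auth", "org:hasMembership/org:organization", "nm:seleucid_empire")],
        ]),
        ("?type", "nmo:hasMint", "?mint"),
    ],
}
print(serialize_query(seleucid_mints))
```

Because each parameter is just another pattern in a list, combining several constraints (a denomination plus an authority, say) means appending patterns instead of writing a new template.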

New interface for http://nomisma.org/id/seleucid_empire, showing related coin types and a map of all Seleucid mints and known IGCH hoards and one single find.


As a result, improvements in mapping and/or related coin types have been applied to the following categories of SKOS concept:


Additionally, some updates were made to the distribution and metrical analysis SPARQL templates to query based on portraiture from the ID page (http://nomisma.org/id/faustina_i). This had previously not been possible--one had to use the purpose-built visualization interfaces and select "Portrait" as a facet.

The distribution of deities between Faustina and Antoninus Pius, as generated on the ID page for Faustina.

Having migrated to this new metamodel system for the generation of GeoJSON for geographic queries, it will be possible to enhance the complexity to iterate beyond queries about one particular Nomisma concept to queries that involve more than one parameter (such as those in the distribution and metrical visualization interfaces). That is to say, it will be possible eventually to not only generate a map showing the mints that produced tetradrachms, and where tetradrachms have been found in hoards, but where the tetradrachms of Ptolemy I have been produced and found. While Numishare contains a map interface that enables the display of mints pertaining to a query (driven by Apache Solr, the search index for Numishare), the indexing of findspots into Solr was disabled a year or two ago due to problems with scaling and the wait time for indexing a type corpus as large as OCRE. The next step is to rewrite the Numishare map interface to interact with Nomisma's SPARQL endpoint directly to display mints and findspots (which always reflects the current data ingested into the Nomisma linked data cloud), rather than rely on Solr.

Another major update is looming on the horizon, probably within the next few weeks: enhanced data for Roman Imperial persons. Presently, Hellenistic kings have been thoroughly integrated with Nomisma URIs for dynasties and corporate entities, but these are lacking in the Roman world. The Nomisma.org Roman committee is currently working on a revised spreadsheet of people in order to add new dynasties and corporate bodies into the system, as well as start and end dates for the reigns of Roman emperors. This will enable us to compare visual motifs between the Julio-Claudians and the Flavians, or to compare the change in weights of antoniniani between the Gallic Empire and the Roman Empire (Valerian to Gallienus) over the same time period.

Additionally, dynasties and corporate bodies will be introduced as facets in OCRE. We are also working on a spreadsheet of Roman provinces, which will also be introduced as facets in addition to historical regions. The Region facet in OCRE is currently a conflation of provinces and historical regions owing to inconsistencies in RIC's structure.

Monday, June 3, 2019

ANS releases Hellenistic Royal Coinages

The American Numismatic Society (ANS) is pleased to announce the launch of a new online resource, Hellenistic Royal Coinages (HRC)(http://numismatics.org/hrc/). A National Endowment for the Humanities funded project based at the ANS in New York City, HRC is a web-based resource for users to learn about, research, and conduct different types of statistical analyses on the coinages produced by the different dynasties and rulers of the ancient Mediterranean and Near East during the Hellenistic period (ca. 323–31 BC). These include the coins struck by (and in the name of) Alexander the Great and those struck by his successors, such as the Seleucids in the Near East and the Ptolemies in Egypt.

The new HRC website serves as a Union Catalogue of existing online resources devoted to Hellenistic coinages and allows users to search across all these sites simultaneously. These sites include: PELLA (http://numismatics.org/pella/), a resource that currently focuses on the coinage in the name of Alexander the Great; Seleucid Coins Online (http://numismatics.org/sco/), a resource devoted to the coinage of the Seleucid dynasty; and Ptolemaic Coins Online (http://numismatics.org/pco/), a resource for the coinage of the Ptolemaic dynasty. In the future we hope to add additional resources for the coinages of other Hellenistic dynasties and rulers including the Antigonid, Attalid, and Bactrian dynasties.

Currently over 31,200 individual coins from seventeen institutions are illustrated and described in the HRC catalogues. While the American Numismatic Society’s collection serves as the core of all these searchable catalogues, thousands of examples are illustrated by links to coins in other major collections including those in the Bibliothèque nationale de France, the British Museum, the Münzkabinett der Staatlichen Museen zu Berlin, and other public collections in the US and Europe.

ANS Executive Director Ute Wartenberg notes that “the HRC website promises to transform the way in which scholars, collectors, and others research and learn about Hellenistic Coinages.”  

The American Numismatic Society, organized in 1858 and incorporated in 1865 in New York State, operates as a research museum under Section 501(c)(3) of the Internal Revenue Code and is recognized as a publicly supported organization under section 170(b)(1)(A)(vi) as confirmed on November 1, 1970.

Wednesday, May 15, 2019

Extending distribution and metrical analyses across corporate entities

I recently pushed some significant changes to the distribution and metrical analysis visualization features, both in the Numishare platform and in Nomisma.org itself, to differentiate personal from corporate authorities when querying typological data.

Previously, the authority could be selected as a query parameter for generating a visualization, but the underlying SPARQL query merely extracted the values associated explicitly with the nmo:hasAuthority property for coin types. This made it impossible to compare one kingdom to another, since a coin type is nearly always linked to its overarching corporate entity only indirectly: the type designates a ruler with nmo:hasAuthority, and the ruler's own Nomisma RDF links that ruler to the corporate entity using the W3C Organization Ontology. For example, Ptolemy I is linked to the Ptolemaic Empire with the following model:


@prefix nm: <http://nomisma.org/id/> .
@prefix org: <http://www.w3.org/ns/org#> .

nm:ptolemy_i org:hasMembership [
    a org:Membership ;
    org:organization nm:ptolemaic_empire ;
    org:role nm:authority
] .


Using this model, we are able to use the Nomisma SPARQL endpoint to extract the distinct corporate entities that minted tetradrachms with the following query:


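PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX nm: <http://nomisma.org/id/>
PREFIX nmo: <http://nomisma.org/ontology#>
PREFIX org: <http://www.w3.org/ns/org#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?kingdom ?label WHERE {
  ?coinType a nmo:TypeSeriesItem ;
            nmo:hasDenomination nm:tetradrachm .
  { ?coinType nmo:hasAuthority ?kingdom }
  UNION
  { ?coinType nmo:hasAuthority ?auth .
    ?auth org:hasMembership/org:organization ?kingdom }
  ?kingdom a foaf:Organization ;
           skos:prefLabel ?label .
  FILTER (langMatches(lang(?label), "en"))
}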


Bear in mind that we have to use a UNION to include coin types in which the corporate authority itself is expressed explicitly with nmo:hasAuthority. This is the case for later Seleucid coinage issued under the authority of the Roman Republic.
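
For anyone who wants to run the query from a script rather than a query form, here is a minimal Python sketch using only the standard library. It POSTs form-encoded SPARQL as described in the SPARQL 1.1 Protocol and reads the standard JSON results format. The endpoint address is an assumption to verify before use, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the address of Nomisma's SPARQL endpoint; check before relying on it
ENDPOINT = "http://nomisma.org/query"


def run_select(query, endpoint=ENDPOINT):
    """POST a SELECT query (SPARQL 1.1 Protocol, form-encoded) and return the parsed JSON results."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        endpoint, data=data, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def kingdoms_from_results(results):
    """Extract sorted (kingdom URI, English label) pairs from SPARQL JSON results."""
    return sorted(
        (binding["kingdom"]["value"], binding["label"]["value"])
        for binding in results["results"]["bindings"]
    )
```

With the query text above stored in a string, kingdoms_from_results(run_select(query_text)) would list the corporate entities, one pair per kingdom.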

Now that we are able to exploit the relationships between people and corporate entities in the Nomisma data, we can begin to construct new queries and visualizations across broader periods of time: for example, to compare the average weights of tetradrachms issued broadly by the Seleucid vs. Ptolemaic Empires over nearly three centuries, or to compare the distribution of deities that appear on Seleucid vs. Ptolemaic coinage (for example, http://numismatics.org/hrc/visualize/distribution?dist=deity&type=percentage&compare=authCorp+nm%3Aptolemaic_empire&compare=authCorp+nm%3Aseleucid_empire).
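
Judging from the example URL, each comparison is passed as a repeated compare parameter whose value is "authCorp" followed by a Nomisma CURIE. Under that assumption, such links can be generated with the standard library; the function name is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "http://numismatics.org/hrc/visualize/distribution"


def distribution_url(dist, corporate_authorities, result_type="percentage"):
    """Build a distribution-visualization URL comparing corporate authorities (given as CURIEs)."""
    params = [("dist", dist), ("type", result_type)]
    # Each comparison is "authCorp <CURIE>"; urlencode turns the space into + and ':' into %3A
    params += [("compare", f"authCorp {curie}") for curie in corporate_authorities]
    return f"{BASE_URL}?{urlencode(params)}"


print(distribution_url("deity", ["nm:ptolemaic_empire", "nm:seleucid_empire"]))
```

Passing a list of pairs instead of a dict is what allows compare to repeat once per corporate entity.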

Distribution of deities as appearing on Seleucid and Ptolemaic coinage

Here we can see the Ptolemaic affinity for Athena compared to the prevalence of Apollo on Seleucid coinage, at least according to the incomplete typological data we have published for the Ptolemaic Empire (so far, Ptolemaic Coins Online covers only the gold and silver coinage through Ptolemy IV). This is one of a number of recent improvements to the query mechanisms in Numishare and Nomisma, and more should be expected in the coming months, especially the querying of legends and monograms.

Tuesday, April 9, 2019

4,450 Seleucid coins from the Bibliothèque nationale de France added to SCO

This morning I received a new spreadsheet from Julien Olivier of the Bibliothèque nationale de France with approximately 6,500 coins connected to URIs defined in PELLA and Seleucid Coins Online. Some 2,000 Alexanders from the BnF have been incorporated in PELLA for quite some time, but we are happy to announce this latest export includes 4,450 coins from the Seleucid Empire. This nearly doubles the number of specimens available in SCO. The ANS has contributed about 4,800 itself. There are now nearly 9,700 physical coins linked to about 2,500 parent types in SCO.

Furthermore, all of the coins from the BnF are photographed and high resolution imagery is available through the IIIF protocol.

SC 379 (Antiochus I tetradrachm) is one of the best represented types.