Thursday, January 7, 2016

OCRE at the AIA/SCS in San Francisco

Andrew Meadows, one of the project managers for the Online Coins of the Roman Empire project, was in San Francisco today to present on a panel of NEH-funded projects in the archaeological realm, as part of the larger AIA/SCS conference. It generated a string of tweets (and replies) from Eric Kansa of OpenContext, which you can read at https://twitter.com/search?f=tweets&vertical=default&q=%23aiascs%20%40menetys&src=typd. Andy's presentation is more or less represented on Google Docs: https://docs.google.com/document/d/1QXSeOSNnV6-Zxe3dU_desqouTDcIEPIcT_XySZ8gMiA/edit. The presentation is effectively a summary of the state of the digital world of numismatics, including involvement in Pelagios and a crowdsourced coin identification project with MicroPasts to facilitate future integration of Portable Antiquities Scheme coins into OCRE.

Thursday, November 19, 2015

The American Numismatic Society Announces the Launch of PELLA

The American Numismatic Society (ANS) is excited to announce the launch of its latest digital platform, PELLA (numismatics.org/pella/), an important new research tool for ancient Greek numismatics that provides a comprehensive, easily accessible online catalogue of the coinage produced by the kings of the Macedonian Argead dynasty (c.700–310 BC). Cataloguing the individual coin types of the kings from Alexander I (ruled 498–454 BC), the first of the Macedonian kings to strike coins, down to Philip III Arrhidaeus (ruled 323–317 BC), PELLA allows users to conduct research on specific types, view examples from multiple collections, conduct statistical analyses of weight and other measurement data, and see maps of where the type was minted and where examples have been found in hoards.

As a linked data project, PELLA connects to the relevant pages within the ANS's collection website, MANTIS (numismatics.org/search/), as well as the Inventory of Greek Coin Hoards Online (coinhoards.org), and incorporates material from other public collections. The current version of PELLA provides links to examples of the coinage (in the name) of Alexander the Great and Philip III Arrhidaeus present in the ANS collection, the Münzkabinett of the State Museums of Berlin, and the British Museum, totaling nearly 10,000 individual coins. PELLA uses the numbering system and typology originally created and published by Martin Price in The Coinage in the Name of Alexander the Great and Philip Arrhidaeus, London 1991, with the addition of modifications that greatly enhance the volume's usefulness as an online resource.

PELLA is made possible by stable numismatic identifiers and linked open data methodologies established by the Nomisma.org project. Coin type data are made available with an Open Database License.

Dr. Peter van Alfen, Margaret Thompson Associate Curator of Greek Coins, commented on the announcement: "The Macedonian kings of the Argead dynasty struck arguably the most influential coinages of the ancient Greek world, so it's appropriate that our first digital project in Greek numismatics focuses on their coinage. We also wanted to provide a specific platform for facilitating research on their coinages, particularly since the ANS holds one of the largest and most important collections of Argead coinage in the world. By being able to link to other important collections, the research potential is significantly enhanced. The ANS is committed to enhancing its online presence and digitizing its collection; PELLA is another example of our progress, and we are proud it will help educate those with general numismatic interests as well as academic researchers."

Friday, November 13, 2015

Aurelian, Tacitus, and Florian Added to OCRE

Three new emperors from RIC V have been added to Online Coins of the Roman Empire: Aurelian, Tacitus, and Florian. This accounts for about 800 new coin types. Additionally, both the Berlin and British Museum collections have been reprocessed to link to these newly-minted URIs. A handful of coins from Berlin have been added, and about 200 coins from the British Museum have been made available. This is the first time BM coins have been added since the introduction of emperors from RIC V, so the range runs from Valerian to Florian.

While the BM has a tremendous number of Crisis of the Third Century coins, not all of their radiates consistently reference RIC numbers, since RIC is pretty out of date with respect to coinage from this period.

We expect to publish a major update to the ANS collection soon. Thousands of coins have recently been photographed, and these images should make their way onto Mantis as early as next week.

Friday, November 6, 2015

Using XForms to transform Google Spreadsheets into RDF

With XML Amsterdam underway and reports of plenty of XForms action at the conference, it occurred to me that I haven't written any blog posts in the last few months about technical advances in Nomisma or other projects (like the Digital Library application, which features an XForms interface for MODS that accepts a PDF upload, sends the PDF into Solr for full-text indexing, and dynamically generates EPUB files from TEI). The ETDPub application will get a full technical write-up in a journal eventually, probably code4lib.

VoID CRUD

On Nomisma, I worked on a few new features in the backend that have greatly reduced my workload. First, I finally implemented a system that makes use of the VoID metadata RDF for data dumps that are contributed into the Nomisma SPARQL endpoint to facilitate the aggregation of coins for OCRE, CRRO, and our other large projects. The VoID RDF is validated to ensure it contains the required title, description, license, etc., and if valid, the data dump is ingested into the SPARQL endpoint. The dump can be refreshed with a click of a button, or a dataset can be removed from the triplestore entirely by passing in a SPARQL/Update query:

PREFIX nmo:     <http://nomisma.org/ontology#>
PREFIX void:    <http://rdfs.org/ns/void#>
PREFIX dcterms: <http://purl.org/dc/terms/>
DELETE {?s ?p ?o} WHERE {
  { ?object void:inDataset <DATASET> ;
    nmo:hasObverse ?s . ?s ?p ?o }
  UNION { ?object void:inDataset <DATASET> ;
    nmo:hasReverse ?s . ?s ?p ?o }
  UNION { ?object void:inDataset <DATASET> ;
    dcterms:tableOfContents ?s . ?s ?p ?o }
  UNION { ?s void:inDataset <DATASET> . ?s ?p ?o }
}


This is a fairly simple workflow, but it isn't yet complete: it only accommodates RDF/XML at the moment (it needs to be expanded for Turtle and JSON-LD), and it does not validate the data dumps themselves. Still, it saves me a lot of time: I can simply click a button in the user interface to re-ingest a dump when new matches are made between coins in that dump and new types published in OCRE--or simply refresh the OCRE dump when we publish new types. Previously, I had to shut down the triplestore for a few minutes, delete the data directory, and then manually upload each RDF dump into the triplestore via the command line.
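As a sketch of how the button-click deletion might be driven programmatically, the SPARQL/Update query above can be templated with the dataset URI and POSTed to a SPARQL 1.1 Update endpoint. Note that the endpoint URL, function names, and urllib usage here are illustrative assumptions, not Nomisma's actual backend code; `<DATASET>` is filled with the contributor's VoID dataset URI.

```python
# Illustrative sketch: template a dataset-removal query and POST it to a
# SPARQL 1.1 Update endpoint. Function names and endpoint handling are
# assumptions, not the actual Nomisma implementation.
from urllib import parse, request

DELETE_TEMPLATE = """PREFIX nmo: <http://nomisma.org/ontology#>
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX dcterms: <http://purl.org/dc/terms/>
DELETE {{ ?s ?p ?o }} WHERE {{
  {{ ?object void:inDataset <{ds}> ; nmo:hasObverse ?s . ?s ?p ?o }}
  UNION {{ ?object void:inDataset <{ds}> ; nmo:hasReverse ?s . ?s ?p ?o }}
  UNION {{ ?object void:inDataset <{ds}> ; dcterms:tableOfContents ?s . ?s ?p ?o }}
  UNION {{ ?s void:inDataset <{ds}> . ?s ?p ?o }}
}}"""

def build_delete_query(dataset_uri: str) -> str:
    """Fill the DATASET placeholder with a contributor's VoID dataset URI."""
    return DELETE_TEMPLATE.format(ds=dataset_uri)

def drop_dataset(endpoint: str, dataset_uri: str) -> int:
    """POST the update (as the 'update' form parameter, per the SPARQL 1.1
    Protocol) and return the HTTP status code."""
    data = parse.urlencode({"update": build_delete_query(dataset_uri)}).encode()
    with request.urlopen(request.Request(endpoint, data=data)) as resp:
        return resp.status
```

The obverse/reverse/tableOfContents branches matter because those blank-ish intermediate resources would otherwise be orphaned when the coin records that point at them are deleted.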

Google Spreadsheets to RDF


The other really significant advancement has reduced my workload significantly with respect to batch publication of new concepts (as RDF) into Nomisma. I would occasionally receive spreadsheets of data to upload into Nomisma, which required me to author a PHP script to transform CSV into RDF, and there were invariably validation problems in the original data.

I spent 1-2 weeks developing an XForms application that could read a Google Spreadsheet (published as an Atom feed) in order to validate the data and import as RDF. First, one begins with a spreadsheet like this.

The user will be presented with an interface like the one below:


The user may map the spreadsheet headings to allowable RDF properties. There are some basic requirements: there must be a Nomisma ID that conforms to xs:anyURI, exactly one English preferred label, and one English definition, and there may be no duplicate languages for preferred labels or definitions. There must be both a latitude and a longitude if uploading mint IDs. That sort of thing. The full list of allowable properties and more specific instructions are at https://github.com/nomisma/framework/wiki/Import-Update-IDs.

After selecting a valid data mapping (XForms bindings), the user may proceed to the next screen, which then validates each row in the spreadsheet to ensure the data values conform to other bindings. For example, there cannot be blank values for English preferred labels, and skos:exactMatch, skos:closeMatch, skos:broader, and the like must be URIs matching the https?:// regular expression (via the XPath matches() function). If everything is valid, the XForms engine will transform the Atom XML into the appropriate RDF/XML model, save to the filesystem (for versioning in Github), post to the SPARQL endpoint, and then transform the RDF/XML into an XML document for indexing into the Solr search index.
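As a rough illustration of the kind of row-level checks described above (the field names, error messages, and function are hypothetical, not the actual XForms bindings), the validation logic might look like:

```python
# Hypothetical sketch of per-row validation; field names are invented
# placeholders for the mapped spreadsheet columns.
import re

URI_PATTERN = re.compile(r"^https?://")  # mirrors the XPath matches() test

def validate_row(row: dict) -> list:
    """Return a list of human-readable validation errors for one row."""
    errors = []
    if not row.get("prefLabel_en", "").strip():
        errors.append("missing English preferred label")
    if not row.get("definition_en", "").strip():
        errors.append("missing English definition")
    # matching properties must hold http(s) URIs
    for field in ("exactMatch", "closeMatch", "broader"):
        for value in row.get(field, []):
            if not URI_PATTERN.match(value):
                errors.append(f"{field}: '{value}' is not an http(s) URI")
    return errors
```

A row passes only when the returned list is empty, which is roughly how the XForms screen gates progression to the RDF serialization step.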

There's a neat additional feature that executes when there's a skos:closeMatch or skos:exactMatch with a Wikipedia or DBpedia URL. An XForms submission queries Wikidata based on the article title to extract titles in other languages (essentially facilitating multilingual interfaces by mapping alternate languages into skos:prefLabel in RDF), as well as matching concepts in other vocabulary systems, like VIAF, the Getty AAT/TGN/ULAN, Geonames, etc. At the end of this process, we can generate some pretty sophisticated RDF that links people to URIs in other systems and models their relationship to a political entity or dynasty with the org ontology, e.g., http://nomisma.org/id/muhammad_ahmad_al-mahdi.
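A hedged sketch of that Wikidata step: the wbgetentities API action and its parameters are real, but the function names and the trimmed sample response below are illustrative, not the actual XForms submission.

```python
# Sketch: look up a Wikidata entity by Wikipedia article title, then pull
# its labels in every language. The wbgetentities parameters are real API
# parameters; the parsing assumes the documented response shape.
from urllib.parse import urlencode

def wikidata_lookup_url(article_title: str, site: str = "enwiki") -> str:
    """Build a wbgetentities request URL keyed on a sitelink title."""
    params = {
        "action": "wbgetentities",
        "sites": site,
        "titles": article_title,
        "props": "labels|sitelinks",
        "format": "json",
    }
    return "https://www.wikidata.org/w/api.php?" + urlencode(params)

def extract_pref_labels(response: dict) -> dict:
    """Map language code -> label for the first entity in the response."""
    entity = next(iter(response["entities"].values()))
    return {lang: obj["value"] for lang, obj in entity.get("labels", {}).items()}

# Trimmed-down example of the documented response shape:
sample = {"entities": {"Q220": {"labels": {
    "en": {"language": "en", "value": "Rome"},
    "de": {"language": "de", "value": "Rom"}}}}}
```

Each language/label pair extracted this way can then be serialized as a language-tagged skos:prefLabel on the Nomisma concept.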

We have created nearly 1,000 new Nomisma concepts this summer through this new spreadsheet import mechanism--a great return on two weeks' labor that frees me from writing data processing scripts and pushes the responsibility for creating and updating IDs onto the numismatic subject specialists.

This import mechanism is, of course, open source: https://github.com/nomisma/framework/blob/master/xforms/import.xhtml

Friday, October 23, 2015

More than 700 Greco-Roman mints updated in Nomisma

Thanks to Ryan Baumann's work creating a concordance between geographic identifiers in the Pleiades Gazetteer of Ancient Places and the Getty Thesaurus of Geographic Names, Dan Pett of the British Museum was able to incorporate these concordances into the Portable Antiquities Scheme database. Dan's Nomisma-Pleiades-TGN concordance R script is on Github.

Dan then emailed the Nomisma listserv with a large CSV document of all mints in the PAS database, with associated Nomisma IDs, Getty, BM, Geonames, dbPedia, Pleiades, etc. I stripped away all of the mints that don't already have Nomisma IDs so that I could upload the CSV into Google Sheets, which makes it possible to import data from the Atom representation of this spreadsheet into the Nomisma RDF. I expanded all of the concordance ID columns into full URIs for the Nomisma spreadsheet validation process, and then successfully updated 721 Greco-Roman mints to add Getty, BM, Geonames, and dbPedia URIs as skos:closeMatch objects. Further, the spreadsheet import process parsed the dbPedia URIs to perform a Wikidata lookup, enabling us to add further concordances extracted from Wikidata--including the Wikidata URI itself, plus GND, BnF, and Freebase identifiers. The Wikidata lookup also adds additional translations as skos:prefLabels drawn from article titles in other languages.
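That column expansion might be sketched like this; the URI templates below are assumptions about each target system's identifier pattern, and the function itself is hypothetical rather than the script actually used.

```python
# Hypothetical sketch: expand bare identifier columns into full URIs so the
# spreadsheet validation accepts them as skos:closeMatch objects. The URI
# templates are assumed patterns for each target system.
URI_TEMPLATES = {
    "pleiades": "http://pleiades.stoa.org/places/{id}",
    "geonames": "http://sws.geonames.org/{id}/",
    "tgn": "http://vocab.getty.edu/tgn/{id}",
    "dbpedia": "http://dbpedia.org/resource/{id}",
}

def expand_row(row: dict) -> dict:
    """Replace bare IDs with full URIs; leave unrecognized columns alone."""
    out = {}
    for column, value in row.items():
        template = URI_TEMPLATES.get(column)
        out[column] = template.format(id=value) if template and value else value
    return out
```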

As a result, we have added more than a dozen new translations for Zeugma and a few additional URIs.

Wednesday, October 14, 2015

ANS Launches Online Catalogue with Dar al-Kutub, the Egyptian National Library

The American Numismatic Society (ANS) is pleased to announce, in collaboration with Dr. Jere Bacharach, Department of History at the University of Washington, and Dr. Sherif Anwar, College of Archaeology, Cairo University, the digital publication of the non-hoard numismatic collection of the Egyptian National Library (http://enl.numismatics.org).

The catalog consists of more than 6,500 objects, ranging from late Roman glassware and pre-Islamic Sasanian coinage to the modern Egyptian coinage of Anwar Sadat. The collection is particularly strong in Medieval Islamic coinage across all major dynasties. The catalog differs from its predecessors in a number of ways. The collection has been photographed in color, with inscriptions read and transcribed from these images. The database includes references to the 1982 catalog of the collection undertaken by Dr. Norman D. Nicol.

The interface is available in both English and Arabic, owing to translations provided by Dr. Sherif Anwar. The multilingual interface is driven by numismatic concepts defined by Nomisma.org. Over the course of this project, more than 700 Islamic entities—people, dynasties, corporate entities, mints, etc.—were created in Nomisma, with labels in English, Arabic, and other languages, forming the technical foundation for the aggregation of other Islamic numismatic collections. Geographic coordinates have been included for the majority of Islamic mints, permitting the mapping of the Egyptian National Library collection.

According to Ethan Gruber, the ANS Director of Data Science, "the effort undertaken in defining Islamic entities in a Linked Open Data environment will make it possible to improve the Islamic department in the ANS database, and may make Islamic type corpora similar to Online Coins of the Roman Empire (http://numismatics.org/ocre/) possible in the future." Like other ANS digital projects, the data are freely available with an Open Database License, and are published in the Numishare framework.

The ANS acknowledges the contributions of the individuals who are named at http://enl.numismatics.org/pages/acknowledgments.

(Image information: Glass – Mamluk, Sultanate of Egypt, CE 1250-1517.6057, Egyptian National Library)

For more information contact Joanne Isaac at 212-571-4470 ext. 112 or isaac@numismatics.org.


The American Numismatic Society, organized in 1858 and incorporated in 1865 in New York State, operates as a research museum under Section 501(c)(3) of the Internal Revenue Code and is recognized as a publicly supported organization under section 170(b)(1)(A)(vi) as confirmed on November 1, 1970.

Friday, October 2, 2015

On Open Data and Numismatic Typologies

edit (2 October 2015, 4PM): I want to make it clear that we have been collaborating with numerous members of the Coins and Medals departments for several years now on a few digital projects, including building a close relationship with the Portable Antiquities Scheme. Data usage concerns have been expressed by a small handful of individuals at the British Museum and are not, as far as I can tell, driven by the Trustees of the British Museum.
 

Can the British Museum make their data available with a Creative Commons license, but then restrict how the data are used?


The short answer is yes.

But the long answer in this case is a bit more complicated. The British Museum has authorized the reuse of their data and images under a CC 4.0 BY-NC-SA license, meaning that anyone has the right to use these data for non-commercial purposes as long as the BM is attributed and the creative works derived from these data and images are likewise freely and openly shared. ANS collaborative projects have always adhered to these requirements. For OCRE, CRRO, and PELLA, we have extracted data from the British Museum SPARQL endpoint and transformed these data into the Nomisma ontology. The full list of datasets is available at http://nomisma.org/datasets, and so one may download the entire BM RDF data dump at once or extract any associated data via the Nomisma SPARQL endpoint. Individual coins are also attributed to their collection throughout the various interfaces in our digital type corpus projects.

As the British Museum license currently stands, we (or anyone) have the right to use these images and data in this manner, without needing to ask the BM for permission to do so.

Only if the BM changed their license to the more restrictive ND (No Derivatives) would they be able to exert absolute control over the reuse of their data. Under ND, the public could only download a dump of their CIDOC-CRM RDF in N-Quads. It would not even be permissible to transform these data into RDF/XML for XSLT processing. One could not match the places in their thesaurus to Pleiades URIs and transform the CIDOC CRM into the Open Annotation model used for the Pelagios project. Nor could one generate CSV out of the data to load into Open Refine or Google Fusion Tables for visualization, or to analyze the data with R. Of course, a CC ND license would obliterate any potential for reuse of British Museum data, and this is certainly why they have not sought to place this draconian license on their data.

What does this have to do with typologies?


All of the numismatic data in the British Museum SPARQL endpoint are open, and nearly every individual specimen contains at least one reference to a coin type number. By poking around the BM data, I was able to figure out that the reference URI containing 'GC30' as a short title refers to Price's The Coinage in the Name of Alexander the Great and Philip Arrhidaeus. I developed a simple SPARQL query that allowed me to extract a list of nearly 3,000 coins from the British Museum that contained Price references. One could extend this query to gather a list of unique Price references rather than objects, and therefore anyone would be able to generate a significant portion of the typologies from the Price catalog. Now, this catalog would not be complete because Price derived some of his typologies from other collections, such as the American Numismatic Society. The BM endpoint also does not contain a full account of all Alexander coins in the BM collection.
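The query itself is not reproduced in this post, but a sketch of what such an extraction might look like follows. The CIDOC-CRM namespace, the P70i_is_documented_in property path, and the string filter are assumptions for illustration; the real query was built by inspecting the endpoint's actual data.

```python
# Hypothetical sketch of a query-building helper for pulling coins that
# cite a given catalogue (e.g., the 'GC30' short title for Price). The
# property path and namespace are assumptions, not the BM's actual model.
def coins_by_reference_query(short_title: str) -> str:
    """Build a SPARQL query for objects documented in a given catalogue."""
    return f"""
PREFIX crm: <http://www.cidoc-crm.org/cidoc-crm/>
SELECT DISTINCT ?coin ?reference WHERE {{
  ?coin crm:P70i_is_documented_in ?reference .
  FILTER regex(str(?reference), "{short_title}")
}}"""
```

Swapping SELECT DISTINCT ?coin for SELECT DISTINCT ?reference is the extension described above: a list of unique catalogue references rather than objects.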

However, these typologies from Price can be derived from descriptions of individual specimens, and the BM CC 4.0 BY-NC-SA license still applies. This raises the question: can the British Museum exert copyright control over typologies published in print when these same typologies can be freely and openly derived from its own collection database?

In fact, it would be possible to derive other typologies not under British Museum copyright by the same mechanisms. The same goes for the ANS database, which is freely and openly available with an Open Database License. Can the British Museum and ANS even include type numbers within their public databases if it is possible to derive typological data that might be under copyright of another publisher? In the United States, data aren't even copyrightable. And the use of reference numbers in databases, specifically, falls within the realm of Fair Use. If we begin to debate whether or not type numbers may even be referenced on the Web, the only real loser in this debate is the general public.

Proof of Concept: Seleucid Coinage, an American Numismatic Society publication


The URI for Houghton and Lorber's Seleucid Coins: A Comprehensive Catalogue Part I Volume I is http://collection.britishmuseum.org/id/bibliography/6336.

Poking around at the CIDOC CRM structure of the coins associated with SC references, I constructed a SPARQL query that would extract most of the typological data from the endpoint. It is a bit messy, as SPARQL XML responses tend to be when expressing triples, so I took this XML response and wrote some basic XSLT to convert it into CSV that better reflects individual typologies.
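The original conversion used XSLT; the same flattening can be sketched in Python with the standard library, run here against a toy response in the standard SPARQL Query Results XML Format (the variable names and values in the sample are invented, not actual BM data).

```python
# Sketch: flatten a SPARQL Query Results XML document into CSV, one row per
# result. The sample response below is invented for illustration.
import csv
import io
import xml.etree.ElementTree as ET

NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

def results_to_csv(xml_text: str) -> str:
    """Convert a SPARQL XML result set into a CSV string."""
    root = ET.fromstring(xml_text)
    variables = [v.get("name") for v in root.findall(".//sr:variable", NS)]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=variables)
    writer.writeheader()
    for result in root.findall(".//sr:result", NS):
        row = {}
        for binding in result.findall("sr:binding", NS):
            # each binding wraps a single uri or literal child element
            row[binding.get("name")] = list(binding)[0].text
        writer.writerow(row)
    return out.getvalue()

SAMPLE_RESPONSE = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="type"/><variable name="obverse"/></head>
  <results>
    <result>
      <binding name="type"><literal>SC 101</literal></binding>
      <binding name="obverse"><literal>Head of Herakles</literal></binding>
    </result>
  </results>
</sparql>"""
```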

There are only 11 Seleucid coins in the BM system with Houghton and Lorber 2002 references, but I was able to generate a CSV file of all the typological data for the SC types. The metadata are strings, but one could easily drop this CSV into Google Spreadsheets to clean up. If we were dealing with a typological dataset that consisted of thousands of types, it could probably be cleaned up in Open Refine in less than an hour, fully linking all concepts to Nomisma URIs.

I have written a number of PHP scripts (like this one) to transform CSV into NUDS, and so one of these scripts could be adapted to transform the Nomisma-linked CSV into NUDS for direct publication in Numishare. It is possible to go from BM SPARQL queries to a fully functional digital type corpus like OCRE in about a day's worth of work.
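A minimal, hypothetical sketch of such a CSV-row-to-NUDS step is below. The element subset and namespace are a simplified assumption about the NUDS schema, nowhere near a complete record, and the function stands in for the PHP scripts mentioned above.

```python
# Hypothetical sketch: build a skeletal NUDS record for one coin type from
# a CSV row. The namespace and element subset are simplified assumptions
# about the NUDS schema, not a complete or validated record.
import xml.etree.ElementTree as ET

NUDS_NS = "http://nomisma.org/nuds"

def row_to_nuds(row: dict) -> ET.Element:
    """Create a minimal conceptual (type) NUDS record from a CSV row."""
    ET.register_namespace("", NUDS_NS)
    nuds = ET.Element(f"{{{NUDS_NS}}}nuds", recordType="conceptual")
    desc = ET.SubElement(nuds, f"{{{NUDS_NS}}}descMeta")
    ET.SubElement(desc, f"{{{NUDS_NS}}}title").text = row["title"]
    type_desc = ET.SubElement(desc, f"{{{NUDS_NS}}}typeDesc")
    obverse = ET.SubElement(type_desc, f"{{{NUDS_NS}}}obverse")
    ET.SubElement(obverse, f"{{{NUDS_NS}}}legend").text = row.get("obv_legend", "")
    return nuds
```

Each record produced this way would then be dropped into a Numishare project folder for indexing and publication.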

So basically, what I have done here is use the BM SPARQL endpoint to extract open data that comprise typologies that have been published by the ANS and are under ANS copyright. I mean, who cares, right?