Using a combination of the University of Cambridge Fitzwilliam Museum's API to query for Roman Republican and Alexander the Great coins and PHP-based screen-scraping of reference numbers and measurement data, I was able to harvest more than 3,500 coins from the Fitzwilliam the other day. The script, which I have published to GitHub, took a few hours to write and another hour or so to run to completion.
The script queries the API and iterates through each page of the JSON response. Since coin type reference numbers are not stored or indexed in their ElasticSearch application, the script must request the HTML page for each record from their public-facing database. It parses the HTML with XPath and uses regular expressions to look for RRC or Price numbers; any matching numbers are then checked against CRRO and PELLA to confirm the validity of the reference.
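To make the parsing step concrete, here is a rough sketch in Python (the published script itself is PHP, so this is an illustration and not the actual code). It assumes references appear in table cells as text like "RRC 44/5" or "Price 3599", and that type URIs follow the patterns `http://numismatics.org/crro/id/rrc-44.5` and `http://numismatics.org/pella/id/price.3599`. The function name `extract_type_uris` and the `confirm` callback are hypothetical; in a real run the callback would be an HTTP lookup against CRRO or PELLA.

```python
import re
import xml.etree.ElementTree as ET

# Assumed reference formats: "RRC 44/5" (Crawford) and "Price 3599".
RRC_PATTERN = re.compile(r"\bRRC\s+(\d+)/(\d+[a-z]?)\b")
PRICE_PATTERN = re.compile(r"\bPrice\s+(\d+[A-Za-z]?)\b")

# Assumed URI patterns for the two type corpora.
CRRO_BASE = "http://numismatics.org/crro/id/"
PELLA_BASE = "http://numismatics.org/pella/id/"


def extract_type_uris(html_fragment, confirm=None):
    """Return candidate CRRO/PELLA URIs found in the table cells.

    html_fragment must be well-formed markup, because ElementTree is an
    XML parser; real-world HTML would need a more forgiving parser.
    confirm is an optional callable(uri) -> bool standing in for the
    validity check against CRRO/PELLA.
    """
    root = ET.fromstring(html_fragment)
    uris = []
    # ".//td" is the limited XPath subset that ElementTree supports.
    for cell in root.findall(".//td"):
        text = "".join(cell.itertext())
        for m in RRC_PATTERN.finditer(text):
            uris.append(f"{CRRO_BASE}rrc-{m.group(1)}.{m.group(2)}")
        for m in PRICE_PATTERN.finditer(text):
            uris.append(f"{PELLA_BASE}price.{m.group(1)}")
    if confirm is not None:
        # Keep only the references the type corpus actually knows about.
        uris = [uri for uri in uris if confirm(uri)]
    return uris
```

Passing the validity check in as a callback keeps the parsing logic testable without any network access.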
The metadata for each record are stored in an array, which is serialized into Nomisma-compliant RDF and a CSV concordance list when the process completes.
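The serialization step could look something like the following Python sketch (again, not the PHP from the published script). The record keys (`uri`, `title`, `type_uri`, `weight`) and the function names are made up for illustration, and the choice of `nmo:NumismaticObject`, `nmo:hasTypeSeriesItem`, `nmo:hasWeight`, and `dcterms:title` is an assumption about a minimal Nomisma-style description, not a full account of what the script emits.

```python
import csv
import io
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NMO = "http://nomisma.org/ontology#"
DCTERMS = "http://purl.org/dc/terms/"


def to_rdf(records):
    """Serialize the accumulated records as RDF/XML (illustrative subset)."""
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("nmo", NMO)
    ET.register_namespace("dcterms", DCTERMS)
    root = ET.Element(f"{{{RDF}}}RDF")
    for rec in records:
        coin = ET.SubElement(
            root, f"{{{NMO}}}NumismaticObject", {f"{{{RDF}}}about": rec["uri"]}
        )
        ET.SubElement(coin, f"{{{DCTERMS}}}title").text = rec["title"]
        # Link the physical coin to its validated type URI.
        ET.SubElement(
            coin,
            f"{{{NMO}}}hasTypeSeriesItem",
            {f"{{{RDF}}}resource": rec["type_uri"]},
        )
        if rec.get("weight") is not None:
            ET.SubElement(coin, f"{{{NMO}}}hasWeight").text = str(rec["weight"])
    return ET.tostring(root, encoding="unicode")


def to_csv(records):
    """Write a two-column concordance of coin URI to type URI."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["coin_uri", "type_uri"])
    for rec in records:
        writer.writerow([rec["uri"], rec["type_uri"]])
    return buf.getvalue()
```

Building both outputs from the same in-memory list means the RDF and the concordance cannot drift apart.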
I hope to tweak the script and re-apply it to Cambridge's Roman imperial coins. This will be a tremendous enhancement to OCRE, as we look to cross the 100,000 coin threshold in that project alone.