But there is a world of difference between being really available, really accessible, really reusable, really capable of elaboration and free republication, and being so in theory only. The difference is the interface. Although Bagnall does not say this, really open data actually does depend on an interface, but an interface very different from the interfaces we have seen up to now. I agree with Bagnall that the interfaces provided by projects are the enemy: they lock away the data in silos. Worse, as the interfaces die, the data locked in them dies too. Interfaces are by far the most vulnerable part of a website to decay, with bits falling off them every time a browser or operating system updates, so this is a major problem. Bagnall is therefore quite right to assert that we must allow anyone who wants to write an interface to do so. But he does not spell out how this is to be done. He speaks in the sentences cited above of “data and code” being “fully exposed.” What does “exposed” mean? And where will the data be? I am also somewhat puzzled by the reference to “code” here (unless, of course, we are speaking of the XML encoding within or attached to the data, which makes it part of the data itself).
Here I am pleased to say that I think we are ahead of Bagnall in developing an architecture for really open data. “Exposed” means an interface, but not an interface such as those we see everywhere. Instead, in the architecture we are developing for the workspace for collaborative editing, the interface is metadata, so constructed as to allow intelligent navigation of the data. A full description of this lies beyond the scope of this paper: Federico Meschini and I will be presenting it as a paper at the next Digital Humanities conference in London. Briefly: Federico and I have developed an ontology of works, documents, and texts, which allows us to identify precisely, down to the level of the individual mark, exactly what texts of what parts of what works are found in just what documents, and exactly what web resources there are out there relating to those texts. Following the lead of NINES (and many others), we have implemented this ontology in OWL (Web Ontology Language) as RDF subject-predicate-object statements, as follows:
The work the Canterbury Tales contains the General Prologue, line 1
The document the Ellesmere manuscript, page 1r, contains an instance of the text of the General Prologue, line 1
The web address (External Link) contains a transcript of the instance of the text of the General Prologue, line 1, as it appears on folio 1r of the Ellesmere manuscript.
The web address (External Link) contains an image of folio 1r of the Ellesmere manuscript
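To make the subject-predicate-object shape of these statements concrete, here is a minimal sketch in plain Python. It is not the OWL implementation itself: every identifier, predicate name, and example.org address below is a hypothetical placeholder chosen for illustration. One modelling assumption is that "contains an instance of the text" is split into two triples, one placing the text instance on a page and one tying it to the line of the work.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One subject-predicate-object statement."""
    subject: str
    predicate: str
    obj: str


# Hypothetical identifiers; the real ontology's URIs are not given in the text.
WORK = "work:CanterburyTales"
LINE = "work:CanterburyTales/GeneralPrologue/line1"
PAGE = "doc:Ellesmere/1r"
TEXT = "text:Ellesmere/1r/GeneralPrologue/line1"
TRANSCRIPT_URL = "http://example.org/transcripts/el-1r-gp1"  # placeholder address
IMAGE_URL = "http://example.org/images/el-1r"                # placeholder address

STATEMENTS: List[Triple] = [
    # The work contains the General Prologue, line 1.
    Triple(WORK, "contains", LINE),
    # The manuscript page carries an instance of the text of that line.
    Triple(TEXT, "onPage", PAGE),
    Triple(TEXT, "instanceOf", LINE),
    # A web address holds a transcript of that text instance.
    Triple(TRANSCRIPT_URL, "transcriptOf", TEXT),
    # A web address holds an image of the page.
    Triple(IMAGE_URL, "imageOf", PAGE),
]


def objects(subject: str, predicate: str) -> List[str]:
    """Return every object asserted for the given subject and predicate."""
    return [t.obj for t in STATEMENTS
            if t.subject == subject and t.predicate == predicate]


if __name__ == "__main__":
    for t in STATEMENTS:
        print(f"{t.subject} --{t.predicate}--> {t.obj}")
```

Keeping the text instance as its own node, separate from both the page and the line, is what lets a transcript attach to "this line in this manuscript" rather than to the line or the page in general.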
Statements such as these, retrieved (let us say) from an RDF store using SPARQL or some equivalent technology, will allow a web browser to find (for example) all pages of manuscripts containing the first line of the Canterbury Tales; then to find images of all these pages; and then to find transcripts of all those lines in those manuscripts, etc. I should add too that we have designed this system to be compatible with the major existing systems of cataloguing documents, works and texts, particularly the FRBR and CIDOC CRM schemes. Thus (as we have imagined it) you could find the Canterbury Tales and thence all the resources relating to it, down to the individual transcript of this line in this manuscript, through your online catalogue.
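The chain of lookups just described (line, then pages, then images and transcripts) can be sketched as simple pattern matching over triples. This is only an illustration in plain Python, standing in for what a SPARQL query against an RDF store would do. The predicate names, the identifiers, the example.org addresses, and the second manuscript "MS-B" are all hypothetical.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

LINE1 = "work:CanterburyTales/GeneralPrologue/line1"
LINE2 = "work:CanterburyTales/GeneralPrologue/line2"

# A tiny in-memory stand-in for an RDF store; all values are placeholders.
STORE: List[Triple] = [
    ("text:El/1r/GP1", "instanceOf", LINE1),
    ("text:El/1r/GP1", "onPage", "doc:Ellesmere/1r"),
    ("text:El/1r/GP2", "instanceOf", LINE2),
    ("text:El/1r/GP2", "onPage", "doc:Ellesmere/1r"),
    ("text:MsB/2r/GP1", "instanceOf", LINE1),
    ("text:MsB/2r/GP1", "onPage", "doc:MS-B/2r"),
    ("http://example.org/images/el-1r", "imageOf", "doc:Ellesmere/1r"),
    ("http://example.org/images/msb-2r", "imageOf", "doc:MS-B/2r"),
    ("http://example.org/transcripts/el-1r-gp1", "transcriptOf", "text:El/1r/GP1"),
    ("http://example.org/transcripts/el-1r-gp2", "transcriptOf", "text:El/1r/GP2"),
    ("http://example.org/transcripts/msb-2r-gp1", "transcriptOf", "text:MsB/2r/GP1"),
]


def match(store: List[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> Iterator[Triple]:
    """Yield triples that fit the pattern; None acts as a wildcard."""
    for t in store:
        if ((s is None or t[0] == s) and (p is None or t[1] == p)
                and (o is None or t[2] == o)):
            yield t


def resources_for_line(store: List[Triple], line: str) -> Dict[str, Dict[str, List[str]]]:
    """Map each page carrying the line to its images and line transcripts."""
    results: Dict[str, Dict[str, List[str]]] = {}
    # Step 1: find every text instance of the requested line.
    for inst, _, _ in match(store, p="instanceOf", o=line):
        # Step 2: find the page each instance sits on.
        for _, _, page in match(store, s=inst, p="onPage"):
            entry = results.setdefault(page, {"images": [], "transcripts": []})
            # Step 3: collect images of that page (skipping ones already listed).
            for img, _, _ in match(store, p="imageOf", o=page):
                if img not in entry["images"]:
                    entry["images"].append(img)
            # Step 4: collect transcripts of this instance of the line.
            for url, _, _ in match(store, p="transcriptOf", o=inst):
                entry["transcripts"].append(url)
    return results


if __name__ == "__main__":
    for page, found in resources_for_line(STORE, LINE1).items():
        print(page, found)
```

Because the caller only follows predicates, any interface that understands the same statements could navigate the same data without going through a project-specific front end, which is the point of treating the metadata as the interface.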