The database management system chosen for the REKn prototype was PostgreSQL. As a standard system commonly used by the academic community, PostgreSQL allows for future collaboration with other researchers and integration with other projects. Because PostgreSQL is open source, custom functions and indexes can be written where the stock system does not supply them. Moreover, PostgreSQL supports scaling and clustering of both database systems and the data they hold. Redundancy is also possible: if one server in a cluster crashes, the others continue processing queries and data uninterrupted.
A similar rationale dictated writing the web service in PHP, since PHP is a commonly used and well-understood language for database access via the Internet, in addition to being open source. The data-entry application likewise relies on open source software: it consists of Perl scripts that use the web service as a database access proxy, Perl being well suited for string processing.
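The proxy arrangement described above can be sketched briefly. The example below is illustrative only and is written in Python rather than the Perl used by the project. The endpoint URL, the JSON payload, and the field names (`action`, `title`, `source`) are all assumptions, since the source does not describe the web service's actual interface.

```python
import json
import urllib.request

# Hypothetical endpoint; the source does not give the real service URL.
SERVICE_URL = "http://localhost/rekn/service.php"


def build_insert_request(title, source, url=SERVICE_URL):
    """Build (but do not send) a POST request asking the service to store one record."""
    # The payload shape is an assumption made for illustration.
    payload = json.dumps(
        {"action": "insert", "title": title, "source": source}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10):
    """Send the request and decode a JSON reply from the service."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

The point of the design is that the client never opens a database connection itself. It only speaks HTTP to the web service, so database credentials and SQL remain on the server side.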
The gathering of primary materials for the knowledgebase was initially accomplished by pulling down content from open-access archives of Renaissance texts, and by requesting materials from various partners (researchers, publishers, scholarly centers) interested in the project. These materials included a total of some 12,830 texts in the public domain or otherwise generously donated by EEBO-TCP (9,533), Chadwyck-Healey (1,820), Text Analysis Computing Tools (311), the Early and Middle English Collections from the University of Virginia Electronic Text Center (273 and 27 respectively), the Brown Women Writers Project (241), the Oxford Text Archive (241), the Early Tudor Textbase (180), Renascence Editions (162), the Christian Classics Ethereal Library (65), Elizabethan Authors (21), the Norwegian University of Science and Technology (8), the Richard III Society (5), the University of Nebraska School of Music (4), Project Bartleby (2), and Project Gutenberg (2). A master list of the primary text titles and their sources is included as Appendix 2. The harvesting and initial integration of these materials took a year, during which time almost 4 gigabytes of files in various formats were standardized into a basic TEI-compliant XML format. Roughly a dozen different implementations of XML, SGML, COCOA, HTML, plain text, and more eclectic encoding systems were accommodated.
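To give a sense of what such standardization involves in the simplest case, the sketch below wraps a plain-text file in a minimal TEI-style skeleton. It is an illustration, not the project's actual conversion code. The function name, the placeholder publication statement, and the rule that blank lines separate paragraphs are all assumptions. The element names follow the general TEI header and body layout, and no namespace handling or schema validation is attempted.

```python
import re
import xml.etree.ElementTree as ET


def plain_text_to_tei(title, source, text):
    """Wrap plain text in a minimal TEI-style document, one <p> per paragraph."""
    root = ET.Element("TEI")

    # Header: title, a placeholder publication statement, and the source archive.
    file_desc = ET.SubElement(ET.SubElement(root, "teiHeader"), "fileDesc")
    title_stmt = ET.SubElement(file_desc, "titleStmt")
    ET.SubElement(title_stmt, "title").text = title
    publication = ET.SubElement(file_desc, "publicationStmt")
    ET.SubElement(publication, "p").text = "Working copy"  # placeholder text
    source_desc = ET.SubElement(file_desc, "sourceDesc")
    ET.SubElement(source_desc, "p").text = source

    # Body: assume blank lines separate paragraphs; collapse inner whitespace.
    body = ET.SubElement(ET.SubElement(root, "text"), "body")
    for block in re.split(r"\n\s*\n", text):
        words = block.split()
        if words:
            ET.SubElement(body, "p").text = " ".join(words)

    # Serialization escapes characters such as "&" and "<" automatically.
    return ET.tostring(root, encoding="unicode")
```

Sources that already carry markup, such as SGML or COCOA, need format-specific parsing before a step like this and are not covered by the sketch.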
For example, accommodating the XML TEI P4 conforming documents obtained from the University of Virginia Electronic Text Center’s Early English Collection required the following three-step process: