The proof-of-concept build could display text data in a variety of forms (plain text, HTML, and PDF) and display images of various formats (Figures 3 and 4). Users could zoom in and out when viewing images, and scale the display when viewing texts (Figure 5). If REKn contained different versions of an object (such as images, transcriptions, or translations), they were linked together in PReE, allowing users to view an image and corresponding text data side-by-side (Figure 6).
This initial version of PReE also offered composition and communication functions, such as the ability for a user to select a portion of an image or text and to save this to a workflow, or the capacity to create and store notes for later use. Users were also able to track their own usage and document views, which could then be saved to the workflow for later use. Similarly, administrators were able to track user access and use of the knowledgebase materials, which might be of interest to content partners (such as academic and commercial publishers) wishing to use the data for statistical analysis.
After the success of our proof of concept, we set out to imagine the next steps of modeling as part of our research program. Indeed, growing interest amongst knowledge providers in applying the concept of a professional reading environment to their databases and similar resources brought us to consider how to expand PReE beyond the confines of REKn. After evaluating our progress to date, we realized that we needed to take what we had learned from the proof of concept and apply that knowledge to new challenges and requirements. Our key focus would be on issues of scalability, functionality, and maintainability.
In the proof-of-concept build, all REKn data was stored in binary fields in a database. While this approach had the benefit of keeping all of the data in one easily accessible place, it raised a number of concerns, the most pressing of which was scalability. Several hundred gigabytes of data can be managed with local infrastructure and ordinary tools. However, we realized that we would have to reconsider our tools once the data reached the range of several terabytes. Careful consideration would also have to be given to indexing and other operations whose processing times might grow disproportionately as the database increased in size.
Even with a good infrastructure, practical limitations on database content are still an important consideration, especially were we to include large corpora (the larger datasets of the Canadian Research Knowledge Network were discussed, for example) or significant sections of the Internet (via thin-slicing across knowledge domain-specific data). Setting practical limitations required us to consider what was essential and what needed to be stored: for example, did we have to store an entire document, or would a URL suffice?
Storing all REKn data in binary fields in a database during the proof-of-concept stage posed additional concerns. Incremental backups, for example, required more complicated scripts that looked through the database to identify newly added rows. Full backups would require a server-intensive process of exporting all of the data in the database. This, of course, could present performance issues should the total database size reach the terabyte range. Equally, distributing the database in its current state amongst multiple servers would be no mean feat.
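The alternative raised above (storing a reference rather than the whole document) can be illustrated with a minimal sketch. It is not REKn's actual schema: the `documents` table, its columns, and the function names are hypothetical, and SQLite stands in for whatever database was used. Each row holds only a location (a file path or URL), a checksum, and a modification time, so an incremental backup reduces to a query on the timestamp instead of a scan through binary fields.

```python
import hashlib
import sqlite3


def create_store(conn):
    """Create a hypothetical metadata table; the content itself lives elsewhere."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS documents (
               id          INTEGER PRIMARY KEY,
               title       TEXT NOT NULL,
               location    TEXT NOT NULL,    -- file path or URL, not the bytes
               sha256      TEXT NOT NULL,    -- checksum to detect changed content
               modified_at INTEGER NOT NULL  -- Unix timestamp of last change
           )"""
    )


def add_document(conn, title, location, payload, modified_at):
    """Record a reference to a document; only the checksum of payload is stored."""
    digest = hashlib.sha256(payload).hexdigest()
    cur = conn.execute(
        "INSERT INTO documents (title, location, sha256, modified_at) "
        "VALUES (?, ?, ?, ?)",
        (title, location, digest, modified_at),
    )
    return cur.lastrowid


def changed_since(conn, last_backup):
    """Return the locations added or modified after the last backup time."""
    rows = conn.execute(
        "SELECT location FROM documents WHERE modified_at > ? ORDER BY id",
        (last_backup,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_store(conn)
    add_document(conn, "Transcription A", "files/a.xml", b"<text/>", 100)
    add_document(conn, "Facsimile B", "https://example.org/b.pdf", b"%PDF", 200)
    # Only the second entry is newer than a backup taken at time 150.
    print(changed_since(conn, 150))
```

Under these assumptions the database stays small enough to export in full, while the files themselves can be copied or distributed with ordinary file-level tools. The cost is that the database no longer guarantees the referenced content still exists, which is why the sketch keeps a checksum alongside each location.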