The digital material generated and used by academic and other research is increasingly held in formally managed digital repositories. Digital repository systems arose in the self-archiving community (for example, arXiv and Cogprints, the latter of which gave rise to the EPrints repository software). In their earlier incarnations they were used to manage relatively simple content: primarily pre-prints and post-prints, and sometimes less formal material such as presentations or lecture notes. A major motivation for setting up and populating such repositories was (and continues to be) to make the results of research available to a wider audience, by encouraging or mandating deposit and by promoting open access principles. Whatever the motivation, from the system's point of view these were individual objects, related to one another only through shared metadata fields.
However, digital repositories have been changing, both in the type of content they hold and in the ways they are used; indeed, these two developments are connected. Repository software has become more sophisticated, allowing complex digital content to be stored in such a way that its internal structure and external context can be explicitly represented, managed and exposed. Institutions are beginning to use these systems to manage research data in a variety of disciplines, including the physical sciences, the social sciences, and the arts and humanities, in part as a result of various programmes funded by the Joint Information Systems Committee (JISC) in the UK.
Such systems allow us to move on from the model of a stand-alone repository, where objects are simply deposited for subsequent access and download. Instead, researchers are developing more sophisticated models in which repositories are integrated components of larger infrastructures, incorporating advanced tools and workflows. They are being used to model complex webs of information and capture scholarly or scientific processes in their entirety, from raw data through to final publications.
Within e-Science communities, much of the focus of data management has been on techniques for efficiently organising and accessing large, distributed data sets, an issue that various flavours of grid middleware have addressed well. The particular challenge raised here, however, is not just size but the very nature of the data, which can be highly diverse, complex, fuzzy and context-dependent, together with the highly interpretative character of research in many disciplines, such as the humanities.
Another issue to be addressed is the silo mentality. Even if data is held in formally managed digital repositories, these are often managed on an institutional basis, resulting in information that is widely dispersed and not easy for researchers to locate and access. Although the repository content is in principle accessible via the internet, it is often held at a “deep” level that is not amenable to traditional discovery techniques. If, as we expect, digital repositories take on a central and pivotal role in the research lifecycle, then there is a clear strategic need to develop methods and tools to enable collaborative research through the coordination and federation of such complex and dispersed resources. This chapter will present case studies of repositories to show the range of ways in which they are used.