Author: Dominic Oldman
Version: First Draft / Work in progress
Status: Confidential - not for further distribution
The ResearchSpace (RS) project has completed an initial requirements gathering exercise in consultation with stakeholders. The next stage of the RS project is to create a detailed functional and technical specification based on those requirements. However, the RS concept itself is well developed and therefore it is possible to provide a realistic idea of how current business requirements would be converted into practical collaborative research tools. This document provides an overview of how RS would work in practice and, as a result, will help stakeholders to provide the additional feedback needed to complete a full specification.
ResearchSpace in Practice
RS combines research tools with social networking applications such as forums, wikis and blogs. The research tools are launched from a custom toolbar embedded into the collaboration application itself and therefore they are easily incorporated into a project’s workflow. Essentially these research tool ‘plug-ins’ provide an additional layer of functionality so that data and digital assets can be searched and edited while engaging in more informal collaboration and communication activity. Conversely, it is the collaboration activity that creates opportunities for more formal data recording.
Setting up a ResearchSpace Site
A project researcher would set up a new project site using a site ‘wizard’ in the same way that, for example, Sakai administrators set up a Sakai course site. This wizard allows the site creator to select the site tools required for their project, as well as configure business and security rules. For example, the wizard will offer a choice of collaboration tools and allow rules for their use to be configured. A rule might, for example, enforce a mandatory metadata input form that contributors are required to complete when uploading a new digital asset to the site, or ensure that activities are completed in a particular sequence.
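The kind of rule the wizard might capture can be sketched in Python. This is an illustration only, under stated assumptions: the SiteConfig class, its fields and the missing_upload_metadata function are hypothetical names, not part of any RS specification. The sketch shows a site configuration that lists mandatory metadata fields and a check that reports which of them an upload form has left blank.

```python
from dataclasses import dataclass, field


@dataclass
class SiteConfig:
    """Hypothetical record of the choices made in the site wizard."""
    name: str
    tools: list[str] = field(default_factory=list)
    # Metadata fields a contributor must complete when uploading an asset.
    required_upload_metadata: list[str] = field(default_factory=list)


def missing_upload_metadata(config: SiteConfig, metadata: dict[str, str]) -> list[str]:
    """Return the mandatory fields that are absent or blank in an upload form."""
    return [
        name
        for name in config.required_upload_metadata
        if not metadata.get(name, "").strip()
    ]


site = SiteConfig(
    name="Example project",
    tools=["forum", "wiki"],
    required_upload_metadata=["title", "creator", "rights"],
)

# "creator" is blank and "rights" is absent, so both are reported.
form = {"title": "Sherd photograph", "creator": " "}
print(missing_upload_metadata(site, form))  # ['creator', 'rights']
```

An upload would only be accepted when the returned list is empty.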
Annotating an Image – Example 1
When contributing information within a collaboration tool the RS toolbar will be available alongside the normal collaboration editor tools. The example below of creating annotations against points or regions on an image illustrates the way in which many research tools would operate. In this case the process would be something like this:
1. Click the “Annotate Image” option on the RS toolbar
2. Use the search screen to locate the image using either image or object metadata
3. Select the correct image from the image light box results pane and choose “Annotate”
4. Select a point or area on the image and type the annotation
5. Click save and choose to import to the current collaboration editing area
The image would be embedded into the discussion board (and would act as a link back to the annotation editor) with the annotations for others to see. The annotations would be saved to the database together with the author’s information for attribution.
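As a rough illustration of what a saved annotation might contain, the following Python sketch records a point or region on an image together with the author and a timestamp. All names here (Region, ImageAnnotation, annotate, the identifier format) are hypothetical, and the list called store merely stands in for the database; the real data model would be settled in the full specification.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Region:
    """A rectangle in image pixel coordinates; a point has zero width and height."""
    x: int
    y: int
    width: int = 0
    height: int = 0


@dataclass(frozen=True)
class ImageAnnotation:
    image_id: str      # identifier of the annotated image (hypothetical scheme)
    region: Region     # the point or area selected by the researcher
    text: str          # the annotation itself
    author: str        # kept with the annotation for attribution
    created: datetime  # time the annotation was saved (UTC)


def annotate(image_id: str, region: Region, text: str, author: str) -> ImageAnnotation:
    """Build an annotation stamped with the current UTC time."""
    return ImageAnnotation(image_id, region, text, author, datetime.now(timezone.utc))


store: list[ImageAnnotation] = []  # stands in for the RS database

note = annotate("image-001", Region(120, 80, 40, 30), "Possible retouching", "Researcher A")
store.append(note)
print(note.author, note.region)
```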
Viewing formal work
Within a collaboration environment like a discussion board, changes to data and digital assets are mixed in with more informal discussion and notes. However, RS might provide a view of all the formal edits that have been saved to the database so that these can be grouped and reviewed independently. This would happen something like this:
1. Select a discussion thread
2. Choose “view data and image edits”
3. Search the view to locate the edit you are interested in and select it
4. Create your own annotation or start a new discussion thread based on the selected data
All other tools such as image comparison, data linking, semantic search, etc. will be available in the same way from the RS toolbar. As new tools are added these will also appear on the toolbar. Not all tools will work within every collaboration application; those that do not will be ‘greyed out’ as appropriate. Other more general tools will be available for more administrative or publishing purposes outside the collaboration applications.
Image Compare – Example 2
Image comparison would work in a similar way to the image annotation tool above except that images could be overlaid and filters applied before annotations are added. This would be based on pixel differences between the two images in the overlay. Again, users would be able to launch the search plug-in within a particular project collaboration area, locate and select the images that they wish to examine and then launch the image comparison tool. Notes and annotations would be saved to the database and be available to other users. The activity would be recorded in the project log, which could be browsed and searched, allowing others with the appropriate rights to make additional edits to the same overlay.
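One simple reading of a comparison based on pixel differences is sketched below in Python. It is an assumption about how such a filter could work, not a description of the eventual tool: images are represented as plain lists of greyscale values, the two images are assumed to be already aligned and of equal size, and the function names and threshold are hypothetical.

```python
def pixel_difference(image_a, image_b):
    """Absolute per-pixel difference of two equally sized greyscale images."""
    same_shape = len(image_a) == len(image_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(image_a, image_b)
    )
    if not same_shape:
        raise ValueError("images must have the same dimensions")
    return [
        [abs(a - b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(image_a, image_b)
    ]


def changed_pixels(diff, threshold):
    """Coordinates (row, column) where the difference exceeds the threshold."""
    return [
        (r, c)
        for r, row in enumerate(diff)
        for c, value in enumerate(row)
        if value > threshold
    ]


# Two tiny 2 x 3 example images, e.g. a normal photograph and an x-ray.
visible = [[10, 10, 200], [10, 10, 200]]
xray = [[12, 90, 200], [10, 95, 198]]

diff = pixel_difference(visible, xray)
print(diff)                                # [[2, 80, 0], [0, 85, 2]]
print(changed_pixels(diff, threshold=20))  # [(0, 1), (1, 1)]
```

The coordinates returned are the natural candidates for regions a researcher might then annotate.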
ResearchSpace applied to British Museum Research Projects
The BM Research Portfolio
The British Museum has approximately 78 ongoing collaborative research projects. These projects study cultures and objects from around the world and are broadly divided into the following groups:
4. Collaborative doctoral awards
A British Museum team, sponsored by the author, is currently conducting a one-year research project into how these curatorial projects operate and, in particular, is investigating the effectiveness of the Museum’s curatorial staff in collaborating with other organisations that may be providing resources, expertise and data. The 78 projects represent partnerships and collaborations with a large number of other organisations and the length of these projects can vary considerably, all the way up to 10 years and beyond, although five years is more common.
Case Study – Naukratis
The Naukratis Project is essentially a project to catalogue and digitise the Naukratis pottery collection at the BM as well as the Naukratis material at the Ashmolean Museum, currently on loan to the BM. In addition, the project aims to collate and integrate identified Naukratis object and image data (pottery and non-pottery objects) from other external bodies into a centralised data resource on the BM website for access by the general public and scholars worldwide. The timetable and scope of the project will be adjusted according to the availability of external funding.
The total number of objects across all the collaborating institutions is estimated at 12,000, and 7,000 of these are already recorded on the British Museum’s ‘Merlin’ collection system. In total there are over 50 organisations that could potentially contribute to this data. A list is supplied at Appendix 3. These museums and other cultural institutions hold anything from a few items to over 200. Many do not have any electronic record of the objects. This list of contributors includes: The Museum of Classical Archaeology, Cambridge; The Louvre, Paris; Akademisches Kunstmuseum, Bonn; Boston MfA; Nicholson Museum, Sydney; and so on.
Case Study – Halaf – A period site in Turkey
This project is primarily research in the field, gathering new data that helps in understanding the BM Halaf-period collection. New archaeological research will look at the time span of the Halaf period. The Domuztepe chronology and geographical interaction with the south (Mesopotamia) and eastern Turkey will provide a better understanding of this early period. The project initially aims to publish details of the excavations in three in agreement with the University of California at Los Angeles (UCLA).
Issues arising from case studies
Although the analysis into Museum research projects is still underway there are some common themes and issues arising from interviews conducted with BM curators leading the BM’s research projects. These come under the following headings:
Participating organisations would find it far easier to contribute data to the project through a central web portal. Currently, in one particular project, an MS Access database is sent to organisations which (due to lack of technical support) has no mechanism for controlling terms or validating data. A large part of the project funding is used to employ people for data input and data cleansing. Many projects have to send data through the post using USB sticks and CD-ROMs. Simply putting together a project presentation can mean emailing and posting material between researchers in different organisations. Automated integration of data from different sources is very rare.
Communicating with a large number of participating organisations is extremely difficult and other organisations can often feel that, although they are being asked to contribute to a project, they are not really part of it. This may affect the motivation of institutions to give their time. An environment that allowed all organisations to participate on an equal footing would provide better motivation to achieve objectives in full. This would be particularly important for projects operating in the developing world where communication links are of variable quality during the day. The ability for institutions in these locations to choose when to access information would be a large improvement in itself.
Projects find it increasingly difficult to store the data from their work, particularly as large digital assets become an important part of research. In many projects there is a real risk of data loss because of varying standards applied to storage, backup and disaster recovery.
Projects that are lucky enough to have dedicated hosting facilities are rarely able to go much beyond simple shared databases and invariably much of the data analysis work is conducted locally in different locations, not necessarily using a consistent set of tools or methods. The effectiveness of any shared environment can be dependent upon IT support and the lack of it can, and does, have a large impact on project progress.
Often researchers are dependent on finding free tools on the internet which are not necessarily the best for the job and mean that the projects are partly dictated by unsuitable software rather than the software that fits the requirements of the project. In addition, free hosted software tools are often spread across different sites meaning that the research infrastructure can be distributed over many different environments with little or no integration.
Some funding authorities will not, as a matter of policy, fund IT tools and infrastructure, and expect organisations to provide IT resources themselves. In any event, many academic staff who submit funding applications do not request sufficient funds to cover future IT costs due to lack of IT knowledge.
ResearchSpace and the Mellon Museums and Art Conservation (MAC)
The Mellon MAC prototypes are:
• The Cranach Digital Archive
• The National Gallery’s Raphael Research Resource
• The Courtauld’s Master of the Fogg collaborative research system
• The RKD’s online Rembrandt resource
The BM project examining internal research projects also aims to interview projects managed by other institutions and this will include the MAC prototype projects. Experience of these projects to date suggests that some of the issues and problems facing BM research projects are commonplace and therefore will be reflected to some degree in these initiatives.
In these cases the projects are currently waiting (after agreeing to forego a certain amount of IT funding) for the development of an IT infrastructure to support continued work. The expectation has been that IT infrastructure support would be provided by the development of the Mellon’s ‘Shared Infrastructure’, proposed in September 2009; this proposal was the original blueprint for ResearchSpace.
Impact of ResearchSpace
The availability of a system that can support many different collaborative projects is likely to have a significant impact on the way in which they are initiated and managed. The availability of the British Museum’s data within the ResearchSpace environment means that many important research projects can be started by academics, independently of the British Museum, with relatively small start-up costs. This supports the British Museum’s strategic objective of being a “Museum of the world, for the world”.
By bringing British Museum research projects into the ResearchSpace environment the system would be exposed to a large number of museums and cultural heritage organisations at an early stage. By virtue of the valuable data store contained within it and a consistent research toolkit, it would therefore be self-promoting and likely to attract large numbers of subscribers, helping to ensure its continued sustainability.
Annex 1 - ResearchSpace Environment Overview
RS would provide an interface between semantic data storage, collaboration applications and research tools. These features can be used independently but to maximise the benefits of RS they should be used together as complementary and integrated elements. The following provides more detail on how these components fit together.
Independently, RS data sources can be accessed by third party software tools that conform to the same open standard. This open standard is based around the Resource Description Framework (RDF) and by hosting data in this format on ResearchSpace, stakeholders (and others that are able to access the data) can query and reuse the data from remote locations. For example, the conservation and scientific data available on the British Museum’s Collection Online system is served from a remote RDF database hosted by the University of Southampton.
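To give a feel for the RDF model, the Python sketch below treats data as simple (subject, predicate, object) statements and answers a query by pattern matching. It is a deliberate simplification: a real deployment would use an RDF store queried through a standard language such as SPARQL, and the http://example.org/ namespace, property names and records are placeholders rather than actual RS or British Museum data.

```python
# Each statement is a (subject, predicate, object) triple, the basic unit of RDF.
EX = "http://example.org/"  # placeholder namespace, not a real RS or BM URI

triples = [
    (EX + "object/1", EX + "title", "Painted bowl"),
    (EX + "object/1", EX + "heldBy", EX + "org/museum-a"),
    (EX + "object/2", EX + "title", "Amphora fragment"),
    (EX + "object/2", EX + "heldBy", EX + "org/museum-b"),
]


def match(statements, subject=None, predicate=None, obj=None):
    """Return the statements that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for s, p, o in statements
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# Which objects are held by museum A?
held_by_a = match(triples, predicate=EX + "heldBy", obj=EX + "org/museum-a")
print([s for s, _, _ in held_by_a])  # ['http://example.org/object/1']
```

Because every contributor's data reduces to the same statement form, records from different institutions can be queried together without first agreeing a common database schema.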
A separate integrated digital asset management system would also be available and interface with RS data. Image metadata would also be stored in the RDF format.
Collaboration tools would be provided by a core Content Management System that provides social networking tools such as central document libraries, discussion boards, wikis and blogs (amongst other collaboration applications). These tools, by themselves, can be used for normal collaborative communication purposes. This is already the case with the Courtauld research system that uses a discussion board as the core collaboration tool. However, when linked to the RS data repositories, these collaboration applications become more powerful channels for collaborative research and can be extended to include additional plug-ins that deal with data and image manipulation and management.
Research tools would be developed to allow users to search, analyse and manipulate content stored using the RDF standard. These tools would allow activities such as semantic searching (using the power of semantic inference and reasoning), annotation, data comparison, image manipulation, and so on. The development of RS tools would continue over time and the initial phase of RS would prioritise a core set of research tools required to perform the most common research activities.
Example Use Case
When all these elements are brought together they form a powerful environment for collaborative scholarly research. Collaboration applications would have access to data and digital assets that can be brought into a shared working space to support and generate group activity. This activity can be specifically directed by the use of workflow tools that can be configured, if required, to provide structure and control to a particular process. In addition, research tools would be available from within a collaborative application so that users can access them directly. Data from the RS repository would inform group activity and the results would be saved to inform other ongoing activities or contribute towards conclusions. This work can then be brought together to form the basis of more formal publication.
A research project wishes to examine some of the works of a particular artist. Data about some of the paintings is held by two organisations that wish to collaborate and share their information. Other organisations also hold relevant data that would provide an important contribution to the research. The two main institutions store structured data in different database systems and export their data to the RS data repository. Data access and security rights are agreed and configured. One researcher starts a discussion thread on a particular painting. By using a search tool the researcher can quickly identify where information is lacking and can initiate group activity to help improve the data. The particular dataset under examination can be identified through a search link or can be imported into the discussion workspace for others to see and comment on.
As the discussion continues, the researchers record some formal annotations against the data. They can do this by using an annotation tool available directly within the forum. The annotations are recorded with the researcher’s identity for attribution purposes and are available to other researchers who search and view the data, and who can record their own annotations should they wish.
The other contributing organisations do not hold their data in a structured format but use the RS gateway interface to input their information manually. As data comes into RS from other organisations, the RS system will inform the members of the research group prompting further examination and searching, perhaps to reveal any new significant links between the existing and new data. Where semantic relationships are not found by the system, researchers can create their own links and relationships and annotate these links to explain their significance.
As more information is generated, missing metadata can be entered against the object records according to a pre-defined workflow system that ensures proper sign off by content owners. This data will be recorded as having been created within RS or, if it cannot be agreed, will be assigned another appropriate status.
The discussion forum will also provide direct access to image manipulation tools. Users can select images imported or uploaded to the project and add metadata, or again create annotations. These image tools can extend to providing comparison functionality allowing annotation of images that have been overlaid, perhaps a normal image and an x-ray. Links to this work can be embedded into the discussion so that others can view the annotations using the same view as the original researcher.
All the embedded links can be grouped and arranged in particular ways to provide other researchers with an insight into the steps taken to analyse the data. To ensure that users do not have to trawl through an entire discussion they can use a search tool, or run a report, to bring required elements together in a more structured way.
Where changes to original data sources have been agreed, authorised institutions can choose to update their own local systems with the new information. However, RS would keep a copy of previous versions of the data.
Some projects may wish to organise activity on RS using a particular method. For example, changes to data and/or annotations may need to go through an authorisation process by one or more senior researchers before they are released for use by others working on the project. This may be to ensure that the research keeps to its brief and does not wander off in directions that are not core.
Another example may require that a certain set of data or an image needs to go through some formal documentation steps before it can be released for general collaborative activity. This would ensure that documentation is completed up front. Also, it may be advantageous for certain individuals to be alerted to changes or updates so that they have an opportunity to comment before the work moves on through the workflow. RS will allow projects to define their workflows and attach people and activities to them.
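A configurable workflow of this kind can be pictured as a small state machine. The Python sketch below is illustrative only: the state names, the rule that sign-off is reserved for a senior researcher, and the advance function are all assumptions made for the example, since each project would define its own steps and roles.

```python
# Hypothetical workflow: each state lists the states it may move to.
TRANSITIONS = {
    "draft": {"submitted"},
    "submitted": {"approved", "rejected"},
    "rejected": {"draft"},
    "approved": {"released"},
    "released": set(),
}

# Moves that only a senior researcher may make (an assumed project rule).
SENIOR_ONLY = {("submitted", "approved"), ("submitted", "rejected")}


def advance(state, new_state, role):
    """Return the new state, or raise if the move is not allowed."""
    if new_state not in TRANSITIONS.get(state, set()):
        raise ValueError(f"cannot move from {state!r} to {new_state!r}")
    if (state, new_state) in SENIOR_ONLY and role != "senior":
        raise PermissionError("sign-off requires a senior researcher")
    return new_state


# An annotation passes through authorisation before release.
state = "draft"
state = advance(state, "submitted", role="researcher")
state = advance(state, "approved", role="senior")
state = advance(state, "released", role="researcher")
print(state)  # released
```

Alerts to named individuals could be attached to the same transitions, so that the people concerned are notified before work moves to the next step.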
Annex 2 - Priority Requirements – Notes
Project Managers representing the current Museums and Art Conservation prototype projects were asked to prioritise the 100 requirements documented in a Requirements Catalogue (based on the project’s design meeting and other materials) and choose their top 20 requirements to help inform the initial development stages of RS. Now complete, these priorities represent a relatively well-known set of research activities, some of which are very well suited to the ResearchSpace environment. The following are general notes on some of these requirements and will be expanded in due course.
High Resolution Images
High resolution image viewing across the internet is normally accomplished using zoom tools like Microsoft Deep Zoom and Zoomify. This type of system stores an image as many layers at different resolutions and divides each layer into tiles. As the user zooms in only the relevant tiles are loaded, reducing download times to the browser.
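The arithmetic behind a tiled image pyramid can be sketched in Python as follows. This uses a simplified scheme chosen for illustration (each layer halves the previous one until the whole image fits in a single tile, with 256-pixel tiles); actual formats such as Deep Zoom and Zoomify differ in detail, and the pyramid function is a hypothetical helper.

```python
import math


def pyramid(width, height, tile_size=256):
    """List (width, height, columns, rows) for each layer, largest first.

    Each layer halves the previous one; the last layer fits in a single tile.
    """
    layers = []
    w, h = width, height
    while True:
        cols = math.ceil(w / tile_size)
        rows = math.ceil(h / tile_size)
        layers.append((w, h, cols, rows))
        if cols == 1 and rows == 1:
            break
        w = max(1, math.ceil(w / 2))
        h = max(1, math.ceil(h / 2))
    return layers


layers = pyramid(4000, 3000)
for w, h, cols, rows in layers:
    print(f"{w} x {h} pixels: {cols} x {rows} = {cols * rows} tiles")
```

For a 4000 x 3000 pixel image this gives five layers, from 192 tiles at full resolution down to a single tile, so a browser showing a zoomed-in detail only needs to fetch the few tiles that cover the visible area.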
The OpenZoom project (www.openzoom.org) provides an open source development kit which includes the ability to overlay the tiled sections with annotations. It is likely that RS would provide a client side tool to prepare high resolution images for online zoom and upload them. An online control would provide viewing and allow annotation so that, as the user zooms in, a note can be made against a particular region of the image. This text would be viewable by others but also saved to the data repository with the appropriate identifying reference.
Annotations can refer to different types of source material from documents and images to database fields. A document annotation would refer to a particular part of a document. An image annotation might refer to a particular region of an image and a data annotation may simply be another entry in the database which is linked to a data record or data field in question.
The W3C currently sponsor a document annotation system using RDF called Annotea. Annotea has various implementations including a standalone client and a Firefox plug-in. Annotations can be stored within a configured RDF store.
Authorship Attribution & Provenance
The RDF format allows layer upon layer of information to be added to a database with a high level of flexibility and with few restrictions. It also provides more than one mechanism for recording attribution and provenance information. These are summarised in the W3C document, http://www.w3.org/2001/12/attributions/
The ResearchSpace project will need to evaluate and choose a Digital Asset Management (DAM) system. Typically these come with some web based image manipulation tools but these will be general in nature rather than geared towards particular research activity. However, metadata management will be a core element of any DAM system and requirements such as light boxes, and certain image comparison tools, may also come within the standard functionality of a DAM system.
The DuraSpace / Fedora project (part funded by the Mellon Foundation) is a possible option which already uses an RDF system to store digital asset metadata. However, this system may be too complex, at least for initial development. Simpler options include using the DAM module bundled with the chosen Content Management System, for example Nuxeo, which is used in the CollectionSpace project: http://www.nuxeo.com/en/products/dam/features
[Annex to be continued]