Proposal to the Andrew W. Mellon Foundation
ResearchSpace Working Prototype (Stage 3)
Author: Dominic Oldman, Principal Investigator & Deputy Head of Information Systems, British Museum
Date: 8th October 2010
Collaboration – A virtual environment which allows scholars to communicate and work together online to achieve common research goals and objectives.
Harmonised data – Collection and conservation data held in a common semantic form so that it can be shared and reused between projects without expensive integration software.
Research tools – The provision of components that can be used independently or with other components for data analysis and manipulation to support a particular collaborative research workflow.
The project is initially aimed at supporting existing Mellon prototype projects initiated under the Museums and Art Conservation program. It would also be able to support a much wider range of cultural heritage projects, which would help to secure the long term sustainability of the environment and ensure that costs are kept to an absolute minimum. This in turn increases its accessibility, particularly for smaller organisations.
The ResearchSpace project is divided into 4 stages.
Stage 1 - Definition of the ResearchSpace vision and requirements.
Stage 2 - Definition of the ResearchSpace specifications and continuing RDF feasibility work.
Stage 3 - Development of a working prototype of the ResearchSpace system.
Stage 4 - Development of a production version of the ResearchSpace system.
Stage 1 has been completed and the deliverables published to the ResearchSpace project site at www.researchspace.org. Stage 2 is currently underway with the support of an Andrew W. Mellon Foundation Officer’s Grant.
This proposal addresses stage 3 of the project and would create a version of the ResearchSpace system that would practically demonstrate some of the main project concepts and objectives. The system would integrate an existing open source collaboration application with an RDF (Resource Description Framework) database, and incorporate a number of research tools. The final selection of these research tools would take place at the end of stage 2 and be informed by the stage deliverables along with a review by ResearchSpace stakeholders. The possible tools are briefly described below. The prototype would be of sufficient robustness to support limited online collaborative research and would be reusable as the foundation for a final production version of ResearchSpace.
The deliverables of stage 2 are listed below in the background to the main proposal. The overall achievement would be:
“..a practical and costed design for a shared technology infrastructure, ready for procurement, development and implementation.” [Stage 2 proposal]
This would entail a more detailed and practical definition of the research tools that ResearchSpace would provide to scholars; a more technical specification describing the interaction of the 3 ResearchSpace elements and ultimately a specification that can be used by software developers to complete stage 3. The provision of a set of research tools, within a consistent and collaborative environment, is an important part of the ResearchSpace vision. A document providing additional information on the intended use of ResearchSpace is provided at Appendix 5.
In addition, stage 2 will provide further information about the use of the Resource Description Framework (RDF) and the CIDOC-CRM (Conceptual Reference Model) ontology, mainly through the conversion and testing of the British Museum’s collection data. This knowledge will also make an important contribution to the final specification. The staged approach of the project aims to reduce risks and ensure that the development of ResearchSpace is not compromised by an approach which is overly innovative.
The Museum was based on the practical principle that the collection should be put to public use and be freely accessible. It was also grounded in the Enlightenment idea that human cultures can, despite their differences, understand one another through mutual engagement. The Museum’s current strategy includes two significant objectives:
To manage and research the collection more effectively
To enhance access to the collection
As such the ResearchSpace project, originally initiated by the Andrew W. Mellon Foundation, has particular resonance for the British Museum. The Museum actively encourages collaborative research with other organisations and facilitates this by continually improving access to the collection and associated knowledge. The Museum’s collection online system was launched in October 2007, providing the scholarly community with the same data that was previously searchable only within the Museum’s own walls. This online database is the culmination of 30 years’ work but is still in its early stages. ResearchSpace would allow organisations like the British Museum to improve collection information by allowing a wider range of experts to participate more easily in its development.
The ResearchSpace project would allow the British Museum, and other organisations with the same ethos of collaboration and accessibility, to reach an unprecedented level of scholarly investigation. By creating an infrastructure that allows data from many different sources to be semantically integrated, and with a set of tools designed to take full advantage, it will uncover knowledge that would be impossible to discover otherwise.
A document issued in August 2009 by the Mellon Foundation’s Museums and Art Conservation programme (MAC), in association with the Research in Information Technology programme, stated the following:
“The most effective way to make each project as productive and sustainable as possible is to reduce the ongoing costs of the technology infrastructure required to support your work. The most effective way to do that is to build a single infrastructure that would support all of the current projects, as well as those that we hope will follow.”
In addressing the data technology to be used in the shared infrastructure the same document concluded,
“To date the projects have explored two options for the back-end: Relational Databases and Resource Definition Framework or RDF. On balance, the Foundation’s recommendation is that the projects choose RDF, an approach that maximizes the chances that scholars will be able to pursue their research in whatever new directions may emerge.”
The Principal Investigator, Dominic Oldman, has taken the original Mellon proposal and defined a more detailed vision of the shared infrastructure now known as ResearchSpace. In addition, the following initiatives have helped develop the project further:
The endorsement by British Museum Directors of a business case for publishing the Museum’s collection in the RDF ‘semantic’ format, as computer readable data, in accord with the Museum’s accessibility strategy.
The delivery of a project successfully publishing conservation and science data on the British Museum’s Collection Online system in RDF format. The data is served alongside data sourced from a traditional database format.
The initiation of a British Museum project to better understand the technology requirements of collaborative research projects using evidence from both internal and external sources.
Advocacy of the project to other museums and institutions.
The shared infrastructure approach, together with the use of semantic technology, would reduce both time and costs (for example, the time and costs involved in setting up a separate IT infrastructure for each project, and savings made by reuse of technology across many projects) as well as remove risks for many projects that would normally have to develop software independently and in isolation. The gradual build up of these ‘community’ research tools and services would allow the Mellon prototype projects (and other research projects) to concentrate on the collection, organisation, and presentation of their data, rather than deal with technology decisions and issues that most of them are not equipped to manage.
ResearchSpace would also be, in principle, available to any cultural heritage organisation, regardless of their funding source, facilitating the creation of a significant and sustainable community of people, data and tools. ResearchSpace would be owned and managed by the organisations that use it rather than being managed by any single institution. These factors would make research proposals more attractive to a whole range of funding organisations and ensure that ResearchSpace is continually updated and refreshed with new data.
It is of particular concern that many collaborative projects rely on the technology and information systems of other organisations. This means that data can be locked away and not accessible to its owners or originators. This creates a problem not only for the data contributor but also for the managers of the project, because in many cases the overhead of managing other organisations’ data cannot be sustained. It also creates a potential data authority risk, threatening the long term sustainability and credibility of the project.
The project objectives and costs have been subject to a peer review and the report is included in this proposal at Appendix 2. In addition the data technology choice has also been independently reviewed and a report appears at Appendix 3.
The system deliverables include;
• The creation of RDF data services within an off-the-shelf Content Management System using an off-the-shelf RDF database management system.
• A selection of research tools that make use of the RDF data services and the inherent benefits that RDF provides.
• Integration of the research tools with the social networking tools (blogs, wikis, discussion forums, etc.) provided by the CMS, such that the tools can be launched from within the CMS and integrate with collaboration activity.
• Migration of data into the system from various projects and organisations.
Since semantic technology is relatively new to many software consultancies, the tender will require that suppliers demonstrate their expertise and experience in both. It is possible that no one supplier possesses all the skills necessary to complete the contract and that the semantic services element is satisfied by a different supplier or that a partnership arrangement is established.
• The supplier has delivered everything that was agreed. – This should be checked against the agreed specification and agreed list of deliverables in the contract.
• The deliverables are of an agreed quality in terms of functionality, usability and performance. – The supplier and project team will develop test scripts and scenarios for testers to use to verify these aspects.
• The supplier is using the agreed standards and approach described in the specification document. The specification will provide guidance on the standards that should be employed to complete a particular function. For example, the specification will refer to the standard to be used in incorporating semantic version control.
The deliverables to be tested will include;
• The system and application software.
• The system, installation, administrative and user documentation.
• The source code and configuration information.
The contract should also specify two formal peer review points, one during the software development stage and one at the end. The review should be conducted by an independent third party (or British Museum programming staff) to establish the following;
• The supplier is on course for completing the contract on time.
• The supplier is working according to the specification and standards specified.
• The software itself complies with coding standards in terms of efficiency and clarity, is self documenting and makes correct use of design patterns.
It is also intended that Martin Doerr, who is currently advising the Museum on the implementation of semantic technology for cultural heritage data, should also review the work relevant to his expertise.
The contract will also specify that the software coding is test driven and that unit tests are incorporated and handed over as part of the final solution.
The software development work will be closely monitored by the Project Manager and QA (Quality Assurance) consultant, and reviewed at regular intervals with the help of project stakeholders. The contract would include the production of acceptance tests based on compliance with the ResearchSpace specifications together with performance and usability targets. The specifications delivered as part of stage 2 will be used as part of the statement of requirements within the invitation to tender (ITT), to which prospective suppliers must respond. A draft scope of the works to be included in the tender is attached at Appendix 6.
The development of the working prototype is likely to uncover issues not anticipated by the requirements and not captured by the feasibility work of stage 2. As a result the project will revisit the specifications and make the appropriate recommendations in preparation for stage 4.
The anticipated model for maintaining ResearchSpace is based on an annual subscription fee. The level of this subscription fee is directly related to the number of organisations who subscribe. This will determine the final hosting and hardware costs, and the size of the support and maintenance contract. Although the working prototype will provide organisations with a more practical demonstration of the system, it will not be known until later how many organisations will sign up, and only rough estimates can be made at this point. The advocacy activities of the Principal Investigator and the support of the British Museum (including the large repository of data from the Museum) are aimed at maximising the number. It is possible that some transitional funding would be sought in stage 4 to cover initial costs while the ResearchSpace system is established within the cultural heritage community.
The current cost of the British Museum’s hosting environment is around £80,000 per year, and between a third and a half of this supports the Collection Online system (approximately £25,000 to £40,000). It is anticipated that the subscription fee for using ResearchSpace (which could potentially replace the Collection Online system) would be a fraction of this figure, providing a cost effective alternative with many additional features. It is expected that ResearchSpace will provide similar cost savings to other organisations.
At the end of Stage 2, when more information is known, the ResearchSpace Principal Investigator will provide different cost models based on different assumptions about the size of the infrastructure and the number of organisations. The stakeholders will then agree on the model to be taken forward.
Although the project has identified various open source tools that could be reused by ResearchSpace, the potential of these has not yet been investigated; this is a task assigned to stage 2 of the project. However, many of the research tool requirements are well known and examples can be found on the internet (and are cited in the Requirements Catalogue). The main difference is that ResearchSpace will develop tools within a consistent and integrated environment. It is essential that ResearchSpace tools use a consistent set of standards and design, and operate within a single collaborative environment integrated directly with the data provided by participating organisations. This also allows managed workflows to be applied to support a variety of different process models.
In conclusion, the ResearchSpace project and development plan may require some adjustments once stage 2 of the project has been completed. This is reflected in some of the activity descriptions.
The following items are in scope for stage 3:
The following items will be in scope if funds are sufficient;
Full attribution and security features.
The software development is made up of the following main elements:
• Research Tool Data Services – This data layer would serve a similar purpose to the data and business layers used in traditional ‘n-tier’ programming. However, because the services are using RDF data, the challenges of retrieving, creating and modifying data are different. It is anticipated that these services will require optimisation to ensure the right level of performance. The service layer might also need to provide parallel read and write functions to the other ResearchSpace databases for ancillary services where the RDF database would not be used. Although the research tools will primarily be using collection and conservation data from the RDF database, it is possible that some ancillary data will not initially be stored as RDF in the working prototype. However, it would be a stage 4 objective to store all data as RDF in the final production version to ensure that data and logic are completely self-contained and portable.
• Digital Asset Management metadata – Digital Assets would be linked through an RDF reference. In future stages the project may look at the use of XMP, if it is ratified as an open standard, or other open standards which have a semantic version (e.g., MPEG-7).
• Research Tools - The separation of the data services means that the developers working on the research tools do not need in-depth knowledge of RDF technology themselves. An observation from other projects is that developers who are not familiar with RDF database technology will come across a different and unfamiliar set of issues when compared to traditional database technologies.
• Discussion Narrative - It is not anticipated that any of the narrative generated as part of the collaborative forums would be saved in the RDF format at this stage, and this would also include the data recording how the narrative cross references with the structured RDF data. In later versions of ResearchSpace tagging of narrative for storage in RDF may be an option (and is already a feature of some CMS systems) but it is the intention to initially keep the working prototype RDF data schema as simple as possible.
Online Data Entry Form – This form would allow users of ResearchSpace to upload data records directly to the ResearchSpace repository. The form would provide a temporary solution to cater for current projects.
Migration Services – The selected software contractor would also be asked to migrate data and images into ResearchSpace. This would include the British Museum’s data as well as data from the MAC projects.
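The separation between the Research Tool Data Services and the research tools described above can be illustrated with a minimal sketch. It is not part of the specification: the class names, the in-memory store standing in for the RDF database, and the `ex:` predicates are all hypothetical and chosen only for illustration. The point shown is that a research tool calls ordinary methods and never handles triples directly.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# A triple is (subject, predicate, object). The "ex:" predicates used below are
# hypothetical placeholders, not terms taken from the CIDOC-CRM.
Triple = tuple[str, str, str]


@dataclass
class InMemoryTripleStore:
    """Stand-in for the RDF database; a prototype would use a real triple store."""

    triples: set[Triple] = field(default_factory=set)

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def match(self, s: str | None = None, p: str | None = None,
              o: str | None = None) -> list[Triple]:
        """Return triples matching the pattern; None acts as a wildcard."""
        return [
            t for t in sorted(self.triples)
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        ]


class ObjectRecordService:
    """Hypothetical data service: the only layer that knows about triples."""

    TITLE = "ex:title"
    PRODUCED_AT = "ex:producedAt"

    def __init__(self, store: InMemoryTripleStore) -> None:
        self._store = store

    def create_record(self, object_id: str, title: str, production_place: str) -> None:
        self._store.add(object_id, self.TITLE, title)
        self._store.add(object_id, self.PRODUCED_AT, production_place)

    def get_record(self, object_id: str) -> dict | None:
        titles = self._store.match(s=object_id, p=self.TITLE)
        if not titles:
            return None
        places = self._store.match(s=object_id, p=self.PRODUCED_AT)
        return {
            "id": object_id,
            "title": titles[0][2],
            "production_place": places[0][2] if places else None,
        }

    def find_by_place(self, place: str) -> list[str]:
        return sorted(t[0] for t in self._store.match(p=self.PRODUCED_AT, o=place))
```

A research tool written against `ObjectRecordService` would continue to work if the in-memory store were replaced by a real RDF database, provided the service methods kept the same behaviour.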
The tender for software development services will use British Museum tendering rules. It is expected that this will include at least two different general software development organisations. The preference would be that one company owns the contract but they would need to demonstrate that they have the required knowledge and resources to cover all development areas. It would be open for a supplier to bid with a partner, or sub-contract certain aspects of the contract, to ensure that all the relevant skills are available.
Semantic Search Tool – The semantic search tool will provide the main mechanism for finding and accessing ResearchSpace data and assets. It is envisaged that this tool will provide a number of different mechanisms for searching the ResearchSpace repository including the ability to assert rules used for inference and display relationships to other data based on initial results.
Data Input Tool – This will allow standard data creation, modification and deletion based on data business rules and taxonomies.
Image Annotation – The ability for researchers to annotate an image either against the whole asset or against a particular point or region of the image. The data would be stored as RDF and be searchable through the semantic search tool. Annotations should be controlled using the CRM (Conceptual Reference Model) framework but should also allow the option for uncontrolled comment. Note: http://www.cidoc-crm.org/docs/fin-paper.pdf & http://www.w3.org/2005/Incubator/mmsem/XGR-vocabularies/#existing-SI & http://eprints.aktors.org/425/01/OntoMedia.pdf
Image Zoom (and Annotation) – Many ResearchSpace users will want to upload large high resolution images and allow colleagues the opportunity to zoom in to reveal detail. The ability to annotate at different levels of zoom would also be useful. The annotations should be available to the user and take him or her to the exact point where the annotation was made.
Data Annotation – The ability to add additional information using alternative ontologies or simple free text.
Note on Annotation – The ResearchSpace annotation requirement for both data and digital assets is a strong one. The stage 2 specification work will look carefully at existing work in this area and particularly the Open Annotation Collaboration project (www.openannotation.org) which is developing an approach to RDF annotation. The specification will be used for the working prototype.
Image Compare – The ability to compare different images in order to uncover areas of interest for annotation. This may involve operations such as:
• Scaling two different images so that they fit one over another.
• The ability to compare images through transparency or pixel difference.
It is also assumed that basic image editing features would be available.
Relationship / Link Editor (internal and external) – The ability, through controlled terms, to link object records and group these relationships so that a defined story or pathway can be recorded for others to follow.
Version Comparison (Track Changes) – It should be possible to see all the different versions of a data field and see who created it.
Geographical Mapping – It should be possible to map data objects according to different types of recorded location, whether that is a production place, a place of birth, the origin of a material and so on. The user should be able to define the scope and scale of the map and the data elements that should be plotted.
Timeline Mapping – The ability to map objects against a graphical timeline. It should be possible to have different modes for recording frequency as well as individual object mapping. For example, it should be possible for a researcher to plot a large number of objects on a timeline to show graphically information such as the number of objects of a particular type found.
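The frequency mode of the Timeline Mapping tool could be approached along the following lines. This is only a sketch under stated assumptions: the function name, the `(object_id, year)` input shape and the 100-year default period are hypothetical, and dates before year 0 are represented as negative integers purely for convenience.

```python
from collections import Counter


def frequency_by_period(objects, period_years=100):
    """Count objects per period.

    `objects` is an iterable of (object_id, year) pairs. The result maps the
    first year of each period to the number of objects dated within it.
    """
    counts = Counter()
    for _object_id, year in objects:
        # Floor division also places negative years in the correct period,
        # e.g. -450 falls in the period starting at -500.
        period_start = (year // period_years) * period_years
        counts[period_start] += 1
    return dict(sorted(counts.items()))


# Hypothetical sample data for illustration only.
sample = [("obj-1", 1510), ("obj-2", 1575), ("obj-3", 1620), ("obj-4", -450)]
print(frequency_by_period(sample))
```

Individual object mapping would plot each pair directly, while the frequency mode would plot the counts returned here as bars along the same timeline.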
The ResearchSpace Requirements Catalogue also implies a number of other tools for collaboration, such as video conferencing and online educational tools.
• Principal Investigator – Dominic Oldman, Deputy Head of Information Systems and IS Development Manager, British Museum. Responsibility for direction, reporting and completion of the project.
• Project Manager – Position to be appointed. Responsibility for the planning, execution, and closing of the project.
• Domain and QA Consultant - Dr Austin Nevin (Andrew W. Mellon Fellow and coordinator for the 'Master of the Fogg Pieta' Pilot Project). Providing expertise on the project’s scholarly requirements, liaising with stakeholders and ensuring the quality of the deliverables.
• Application Support Analyst – Position to be appointed. Providing IT installation and configuration skills.
Software development resources will be provided by a supplier appointed through a tendering exercise.
External suppliers will be provided with the broad project milestones and asked to submit plans for their part of the project, as part of the tender process. The project milestones are as follows:
The following provides a more detailed breakdown of the work: