Shared Infrastructure

Proposal for a Shared Technology Infrastructure

August 25, 2009


Since we last met together in New York on April 23, the Foundation and its advisors have been thinking very hard about the future direction of what began as our pilot project initiative in conservation documentation in digital form. Many of these projects have exceeded original expectations, particularly for the quality and depth of inter-related art historical, technical, archival, and scientific content, such that they can no longer be considered solely in the realm of “conservation documentation.” We are now on the verge of transitioning what began as “the pilots” to a small but robust group of “prototypes” which we hope will serve the field internationally as the basis for new and expanding on-line interdisciplinary research.

As you know, among the Foundation’s primary considerations for continuing to fund any project involving software development is ensuring its sustainability. While the projects to date have been strong in terms of their various accomplishments and institutional commitment, their further development and future success demand that we carefully consider efforts to manage costs and leverage development resources. Writing and maintaining software code is not, and should not become, the primary business of museums or art-history institutes. Consequently, when we meet in London on September 14, we would like to initiate a discussion about the ways in which we could collectively lead these projects to develop in a way that:

  • ensures that they incorporate best practices for production software
  • results in software that is broadly useful and useable
  • leverages other software and software development resources, including Mellon-funded projects that have resulted in applicable tools and models
  • recognizes the commitment to continued development over a potentially long time and formulates a plan to support that

After careful consideration of a number of possible paths forward, we have concluded that we would very much like to see all of the projects brought onto a single, common technology foundation.
In order to frame the September discussion, it is important for us to provide in advance a potential framework for such a path that anticipates the most important questions, so that you and your colleagues can think hard about the plan and come prepared to discuss your reactions. We are extremely eager to discuss these ideas with you, as well as their impact on your institutions and any expectations or concerns you may have about dealing with a larger technology effort.

Proposal for a Shared Technology Infrastructure

The Mellon Prototype Projects

Rather than integrate software development efforts into each of the prototype projects, we propose to “spin off” a separate technical project under your joint governance.

The most effective way to make each project as productive and sustainable as possible is to reduce the ongoing costs of the technology infrastructure required to support your work. The most effective way to do that is to build a single infrastructure that would support all of the current projects, as well as those that we hope will follow. In order to achieve the greatest possible degree of efficiency and economy, as well as to deliver the highest quality results, we propose to contract the software development work out, under your guidance and governance, to a group of exceptional technologists who are already working on museum projects. In effect, they would become your “vendor” for this project.

Another important purpose in spinning off the technology component is to ensure that project staff remain as free as possible to concentrate efforts on the scholarship going forward. As we looked at what would be required to continue individualized technical development for the various projects, it became clear that doing so would not only severely challenge the available resources of some of your institutions, but would also force several of your people into what amounts to two full-time jobs: scholar and technologist. A separate software initiative would still ensure that your technical people oversee the technical aspects of the project; in fact, their participation and governance would be essential. However, unlike the “every-project-for-itself” model, their participation in project-related technical activities would be manageable within the confines of a normal work-week.

We are persuaded that any other strategy (even a reduction to as few as two platforms) would lead to constantly escalating costs and would almost certainly hamper scholarly collaboration over time.

How would the governance work?

Each project participating in the next phase would delegate one individual to serve as a Trustee of the technical project. These appointments would be confirmed by the institution directors, who would guarantee the release time required. Together, these individuals would serve as the Board of Trustees of the technical initiative. We estimate that Trustees would serve about 5 hours/week over the 18-month lifetime of the initiative, with that service likely to come in bursts of activity toward either end of the project, separated by long, quiet periods in the middle. One institution would volunteer to manage and disburse the funds for the technical grant from Mellon under the direction of the Trustees.

Together with other participants from the projects, and with the assistance of technical experts provided by the Research in Information Technology (RIT) program of the Mellon Foundation, the Trustees would work to finalize the specifications for a shared infrastructure and to negotiate delivery of the infrastructure with a chosen vendor. The Trustees would oversee design and development efforts at a high level (involving the individual projects for all testing and acceptance), would provide feedback to the vendor when and as any need for changes may arise, and would authorize payment to the vendor as the work is performed to their satisfaction.

Once the infrastructure software has been delivered and the terms of the technology grant have been fulfilled, the Trustees could disband—although, at the discretion of the institutions, the Board may choose to continue as a means of coordinating further collaborations.

What about the advisory group for software development?

We have already discussed with most of you the creation of an advisory group to oversee software development. The addition of a Board of Trustees does not negate the need for additional expertise; indeed, the constitution of an advisory group can go a long way to helping with technology choices as well as contribute to the sustainability goals. With the structure for consortium governance suggested above, focused on technology in the service of scholarship, an advisory group would continue to have value, and the consortium board would give it a specific locus of responsibility.

We feel that the selection of members of the advisory group should favor practicing technologists rather than “academic” computer scientists (whose bias would inevitably be toward the new and unproven rather than toward reliability and performance). In particular, it would be useful to include software engineers from the CollectionSpace project, as well as those who are building RDF technologies and have the enterprise design and development experience that has yet to come into scope for the prototype projects. [Please see the section “Rationale for RDF Triples” for a separate but related discussion of RDF technology.]

At least one or two advisory board members should be selected from the user community, preferably from among management‐level personnel, to help the entire advisory board better understand the real‐world problems, focused on cultural content, that this software development is attempting to solve.
Given the unmet needs that are specific to the research interests of these prototype projects, the advisory panel might well include expertise in managing images and image manipulation as well as in emerging technologies and community standards in online academic publishing.

Would we have to change how our project works?

The technology project would be designed, with assistance from the RIT Program of the Mellon Foundation, to ensure that every project is able to continue to develop its own scholarly environment without constraint: there would be no effort made to force projects into a common mold, and every effort would be made to preserve the scholarly uniqueness of each project. In fact, one reason for the Board of Trustees is to ensure that the scholars continue to govern the infrastructure, and not vice versa. Projects built on existing relational databases can continue to participate equally in this new formulation, and we plan to explore this aspect in greater detail when we meet in September.

At the same time, however, certain purely internal aspects of the projects must necessarily be reconciled: for instance, a means must be developed for the combined storage of data from multiple projects. These changes would be the responsibility of the Trustees, but they should not be noticeable to anyone other than the technologists working on the project.

Does one infrastructure mean one host?

At present, the cheapest and most collaborative way to sustain these projects going forward is a shared hosting model in which all projects are run on the same infrastructure on the same server. However, technology changes rapidly, and this may not always be true. Even though shared hosting may be a maximally efficient use of resources, other factors may prevent it from being fully realized. Therefore, the technology infrastructure would be built in such a way that it could be used to host one or many projects, and be hosted inside a single museum or remotely, as each project’s management prefers. As for collaboration, any project built on this infrastructure would be able to collaborate with any other project built on the same infrastructure, whether or not they are hosted on the same server, at the same institution, or even on the same continent.

Would splitting off the technical work affect the size of the scholarly grants?

The funding for the technology work comes from the same pool as the scholarly budget. However, the impact is not what you might expect; in fact, splitting off the technology initiative would actually reduce its total share of the budget, as compared to having each project continue its own technical development. This would allow the projects to retain more funding for scholarship—which is one of the primary attractions of the spinoff approach.

What happens to the projects while the new technology is being built?

For the most part, each project would continue to operate and grow on its existing technology infrastructure until the new technology is ready. Any of the projects could volunteer to be pilot-testers for the new infrastructure, and some funding would be available in the technology grant to compensate for the additional costs involved in that activity. For those projects not involved in pilot testing, there would be a brief period of migration over to the new technology once it is finished. At that time, projects would have the option of creating a new appearance/interface to take fuller advantage of their new technology, or of making the transition essentially invisible to the scholars working on the project.

How would we coordinate our shared technology going forward?

We anticipate that several vendors (through Mellon’s RIT program) would be available to support the new technology infrastructure, so that projects could simply purchase services as would be done for any other software product, except that prices could conceivably be lower because the vendors would find themselves in a competitive market. Alternatively, the projects could continue to collaborate on operating, supporting, and enhancing the software, which should keep costs lower still.

What about the user-facing aspects (front-ends) of the projects?

As with back-end development, we would like to see a rationalization in the development of user-facing front-ends, recognizing that there may continue to be differences in features and functions from project to project.

User‐facing functionality changes faster than any other portion of a software project, far faster than back-end technology. Consequently, none of the projects is ever likely to consider itself “finished”, as users will be clamoring for new features almost immediately, leading to iterative front-end development. What the sustaining infrastructure needs is not a single, ideal front‐end, but rather an infrastructure that makes the continual refinement and customization of front-ends to meet evolving scholarly needs as quick, powerful, and inexpensive as possible. Two projects funded by the Mellon Foundation RIT Program, Fluid and FluidEngage, offer that kind of infrastructure as well as a software design community already involved with museum-related projects that can assist in realizing particular design aspirations.

In a shared project infrastructure, the project teams and users would be able to agree, in all likelihood rather quickly and effortlessly, upon (a) the distinctively best features of each of the projects; and (b) the features that would prove problematic in another setting or with another target user population. Any consensus would then become the basis for common user interface elements, remembering that the community of end users is relatively focused in its interests and goals.

Rationale for RDF Triples

To date the projects have explored two options for the back-end: relational databases and the Resource Description Framework, or RDF. On balance, the Foundation’s recommendation is that the projects choose RDF, an approach that maximizes the chances that scholars will be able to pursue their research in whatever new directions may emerge. RDF “triple stores” (the equivalents of relational databases for RDF data) impose no strict requirements on data models and permit models to be revised and even replaced with minimal effort, a set of features well‐suited to the open‐ended nature of scholarly research.
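To make the flexibility of the triple model concrete, the following is a minimal sketch in plain Python, not a real triple store or RDF library. It assumes that each statement is simply a (subject, predicate, object) tuple. The class name ToyTripleStore, the identifiers such as "painting:1", and the predicate names are all hypothetical and chosen only for illustration.

```python
from typing import Iterator, Optional, Set, Tuple

# A triple is one statement: (subject, predicate, object).
Triple = Tuple[str, str, str]


class ToyTripleStore:
    """Toy in-memory store of triples; an illustration only, not real RDF."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        # Any predicate is accepted; there is no schema to alter first.
        self._triples.add((subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> Iterator[Triple]:
        # None acts as a wildcard for that position.
        for s, p, o in sorted(self._triples):
            if subject in (None, s) and predicate in (None, p) and obj in (None, o):
                yield (s, p, o)


store = ToyTripleStore()

# Initial statements (hypothetical data).
store.add("painting:1", "hasTitle", "Portrait of a Man")
store.add("painting:1", "hasSupport", "oak panel")

# Later, research takes a new direction: a new kind of statement is
# recorded without revising any table definition or migrating data.
store.add("painting:1", "examinedBy", "infrared reflectography")
store.add("painting:2", "examinedBy", "infrared reflectography")

# Which subjects have been examined by any technique?
examined = [s for s, _, _ in store.match(predicate="examinedBy")]
print(examined)
```

In a relational design, the "examinedBy" statements would typically require a new column or table before any data could be stored; here they are simply more triples. A production triple store adds persistence, indexing, and a query language on top of this basic idea.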

This approach raises concerns that must be met:

1. RDF is a comparatively new set of technologies, lacking the mature standards of the relational database world and having a pool of competent professionals that is far smaller than the pool for traditional databases.

The labor pool, while still small compared to relational databases, is in absolute terms reasonably large and multinational; moreover, the National Gallery has a relationship with the University of Southampton, which is one of the international leaders in RDF technologies and a nexus for the sorts of skilled labor required to make the project successful.

2. The lack of widely accepted standards for search and retrieval still poses something of a risk.

The pace of improvement is rapid, and the costs of picking what turns out later to be the “wrong” approach can be mitigated substantially by careful system design.

3. Uncertainty has been expressed about scalability, suggesting that current RDF triple‐store technologies may not be able to handle the large number of triples that the prototype projects would generate. RDF is a prolix data standard which translates even comparatively simple relationships into a large number of “triples,” which must be stored, managed, and retrieved expeditiously if an RDF solution is to be workable.

This argument is largely outdated, as the technology is maturing rapidly and growth in scalability has been particularly rapid over the last two years. Currently available triple‐stores, both commercial and open source, provide storage that is more than adequate to the needs of these projects at present, and the growth in performance is accelerating at a pace that should ensure that even massive scholarly adoption of the software produced by the prototype projects would not tax an RDF‐based infrastructure.
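The scalability concern in item 3 is easier to weigh with a sense of how quickly triples accumulate. The sketch below, again in plain Python with hypothetical field names and data, assumes the simplest possible mapping: each field value of a record becomes one triple. Real RDF models often use more triples per relationship than this.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def record_to_triples(subject: str, record: Dict[str, object]) -> List[Triple]:
    """Expand one record into triples, one per field value (simplified mapping)."""
    triples: List[Triple] = []
    for field, value in record.items():
        # A multi-valued field yields one triple per value.
        values = value if isinstance(value, list) else [value]
        for v in values:
            triples.append((subject, field, str(v)))
    return triples


# One modest record for a single painting (hypothetical data).
record = {
    "title": "Portrait of a Man",
    "support": "oak panel",
    "pigments": ["lead white", "azurite", "vermilion"],
    "examinations": ["X-radiograph", "infrared reflectogram"],
}

triples = record_to_triples("painting:1", record)
print(len(triples))  # 7: one row in a table becomes seven stored statements
```

Multiplied across many objects, samples, images, and annotations, counts of this kind are what a triple store must store and retrieve efficiently.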