As Transparent as Infrastructure: On the research of cyberinfrastructure in the humanities

Module by: Geoffrey Rockwell. Edited by: Frederick Moody, Ben Allen

The Shape of Things to Come -- buy from Rice University Press.

“research infrastructure” means equipment, specimens, scientific collections, computer software, information databases, communications linkages and other intangible property used or to be used primarily for carrying on research, including housing and installations essential for the use and servicing of those things. (From the Budget Implementation Act, 1997, c. 26)

In 1997, the Canada Foundation for Innovation (CFI) was established by Act of Parliament to fund the development of research infrastructure. Since then it has committed $5.27 billion for more than 6,600 projects across Canada. This massive investment in research infrastructure was intended to build capacity for innovation, attract and retain top researchers (often from the USA), train graduate students and research staff, foster collaboration, and make sure Canadian institutions made good use of research infrastructure.1 CFI was a welcome and new approach to funding research after the cutbacks of the early 1990s. What was different was that this funding was for something few of us had thought about, let alone applied for: namely, “research infrastructure.”2 We were supposed to apply not to do research but to set up labs that would attract researchers, train graduate students and transform our research. As Robert Giroux, the Past President and CEO of the Association of Universities and Colleges of Canada, put it in a quote on the CFI site,

Before the CFI, the university research enterprise was in very bad state. The CFI re-energized university research in Canada. It brought hope and support. It attracted researchers to Canada and retained Canada's best and brightest. It encouraged state of the art investments in research infrastructure, and Canadian researchers became the envy of researchers from many countries around the world. CFI was the start of a revolution in research funding by the Canadian Government.3

The establishment of CFI and the turn toward infrastructure anticipated a similar reorientation elsewhere in the research world, notably in the United States, where the 2003 “Atkins Report” of the National Science Foundation, entitled Revolutionizing Science and Engineering Through Cyberinfrastructure, promised an extraordinary transformation in research environments, and presumably also research, if there was appropriate investment and organization of cyberinfrastructure (CI). This report was followed by Our Cultural Commonwealth in 2006, which advocated for innovative CI in the humanities and interpretative social sciences. As David Green put it in an introductory article on the issue of CI for the liberal arts, “This is going to be big.”4 He goes on to quote Arden Bement of the National Science Foundation, who wrote that the CI Revolution “is expected to usher in a technological age that dwarfs everything we have yet experienced in its sheer scope and power.”5 Likewise in the UK, a programme for e-Science, as they call infrastructure broadly, was set up in 2001, and in Europe there is a European Strategy Forum on Research Infrastructures that has been developing initiatives.6

All of this activity and planning on our behalf should prompt researchers in the humanities to ask exactly what (cyber)infrastructure is or could be so that we might understand how it might be revolutionary and whether we want this revolution. As Peter Freeman puts it in an article on designing and defining cyberinfrastructure, “Cyberinfrastructure can have many definitions and, to some extent, the definition is in the eye of the beholder.”7 This paper will therefore step back and look at the infrastructure turn, as I call the revolution, before we go too far down that road, so to speak. The paper will look at the idea of research infrastructure in the humanities in four passes:

  • First, a look at traditional infrastructure like roads, to see how it can help us understand what cyberinfrastructure is. How is the call for information infrastructure drawing on our understanding of the infrastructure we know?
  • Second, a look at the characteristics of infrastructure and some of the definitions offered.
  • Third, I will look at the dangers in general and especially the issue of the turn from research to research infrastructure. I will argue that we need to be careful about defining the difference and avoid moving into the realm of infrastructure those things we are still studying.
  • Finally, I will offer the area of text analysis tools as an example of the shifting models of research and infrastructure.

The Infrastructure We Know

How is traditional physical infrastructure paradigmatic for defining and turning to cyberinfrastructure? We are used to considering roads, sewers, and power as infrastructure and we have expectations about their funding and maintenance; how do those known types of infrastructure turn our imagination when it comes to virtual infrastructure?

Connections Between, as in Roads

Visible infrastructure like the roads that make modern life possible seems an obvious candidate paradigm for what information infrastructure should be. When we buy a house, we expect the state to create and maintain the infrastructure that connects our house to others so we can walk or drive to and fro. We use the roads to move bits of physical stuff, including ourselves, from our own space to work, to play, and to the spaces of others. That’s what we pay taxes for.

By analogy, the Internet would seem to be a road system for moving virtual stuff, comparable to what we use for the physical stuff. Nicholas Negroponte in Being Digital played with this switch from atoms to bits. That the Internet was called the “Information Superhighway” in the 1980s and 90s developed the analogy. Al Gore is supposed to have promoted the view that just as his father, former US Senator Albert Gore, promoted the development of an interstate highway system, so he was a champion of the new Information Superhighway that would benefit commerce, education and communication. His signature initiative was, not surprisingly, called the “National Information Infrastructure.” Road jargon has woven itself thoroughly into network jargon with terms like “traffic” and “onramp.” The very visible and useful road system made a perfect analogy for explaining the invisible Internet and its need in the modern state to make connections. Who, after all, could imagine a state without roads?

Applying this infrastructure paradigm to research, we can see how infrastructure is the connective tissue maintained to allow us to collaborate and exchange information. Just as roads are absolutely necessary for movement and economic development, so are information highways needed for virtual movement and electronic business. You can also see where the articulation between infrastructure and research computing is located. The infrastructure connects researchers and other entities like corporations and governments. Anything that is needed to connect more than one person, project, or entity is infrastructure. Anything used exclusively by a project is not.

It is worth pointing out an important difference: the Internet is, in fact, not run like the highway system. Governments do not maintain the Internet, though they regulate it; and despite all the talk about how it is a system designed to bypass interruption, certain larger ISPs like Cogent can effectively block others, leading to blackouts such as when, in March 2008, parts of Canada could not access parts of Sweden thanks to a commercial dispute between Cogent and Telia.8

Utilities that Service, as in Power

We also tend to think of less visible services like electricity and water as infrastructure. Obviously the transmission lines, the water mains, and the sewers are infrastructure in the sense of what is between, but they are not between individuals. They are between us and service providers and generally include the generating stations and the sewage plants. The infrastructure really is the service that provides electricity, provides water and disposes of our waste.

While we can’t imagine civilization without roads of some sort, we can imagine alternatives to government provision of services like power and water. Roads by definition have to be shared to connect. Utilities don’t need to be shared and are thus more likely to be privatized, though there still seems to be an obvious efficiency in maintaining one power grid and one water/sewage system. We could each dispose of our own sewage, generate our own electricity and get our own water, but these functions are better provided as services on a large and efficient scale. The question with such utilities is at what scale they should operate and how regional utilities should interoperate.

Services as infrastructure serve as a second paradigm for research infrastructure, though a more complicated one. There are a number of computing services that we have come to expect as infrastructure beyond the provision of the physical Internet. There are the services like DNS that are needed to make the Internet work; there are the services like e-mail that work over the Internet that we have also come to expect; and there are services like digital libraries that are more efficiently provided centrally, but have not become expectations yet. We can think of a digital library or data service as a utility infrastructure that fuels research rather than as the virtual reflection of the Library as building. Just as our machines need electricity, so our minds need information. Just as library services are a form of expected research infrastructure, so digital library services make sense as research infrastructure.

Much of the turn towards cyberinfrastructure focuses on the development of these large information services. The age of small research projects developing scholarly electronic editions is passing. We can all see the value of shifting from individual editor-run projects maintaining information services to a model where research data services are managed as infrastructure with centralized providers and a professional staff dedicated to the infrastructure. That would let researchers move on, just as they do after publishing a monograph. We expect publishers and libraries to maintain our scholarship after the research, so why not have equivalent service infrastructure to maintain our virtual scholarship?

That we don’t have national digital research data/text archives or libraries despite the decades of development is one of the hurdles that might explain the turn to infrastructure, though I worry the time may have passed for such a utility, as it may be perceived as unnecessary given large-scale commercial services like Google Books. For a sustained discussion of digital library (DL) developments and infrastructure, see Carl Jay Lagoze’s dissertation on Lost Identity: The Assimilation of Digital Libraries into the Web.

A legitimate question to ponder is why the “imposition from above” model was successful in the context of the Internet, but not in DLs. A look at the history on the Internet reveals a key factor that initial deployment and ramp-up occurred within a tightly scoped community, academic institutions and (primarily defense-related) research labs. The infrastructure had a long percolation period in this context before its subsequent mass popularization. This is quite different than the DL infrastructure work, which from the beginning was motivated by visions of widespread grassroots dissemination inspired by scenarios such as that articulated by then Vice President Gore in his “schoolchild in Carthage, Tennessee plugs into the Library of Congress” speeches.9

Organizations that Run, like Governments

When we look closely at civic infrastructure, we see that the physical infrastructure and service infrastructure are dependent on organizations for maintenance and operation. In fact, if it is important that infrastructure last and be open, then the organization that maintains it is more important than the item itself. A good organization that builds and maintains bridges is more important than any one bridge. A bridge might be built, but it won’t be safe to cross if there aren’t regular safety checks and engineering support. It thus follows that good infrastructure includes the management, staffing, ongoing budgets, and support equipment that keep it all working. If we think of the Library as a traditional form of research infrastructure, we can see the importance of professional organization. The buildings and the books are important, but the Library can’t work as infrastructure without professional staff organized and funded to maintain services.

That said, describing organizations as infrastructure seems to push the definition. We tend to think of infrastructure as what you can touch and use, not the maintenance organization. One can see this in the ongoing politics of physical infrastructure renewal, which are staple entertainment for those interested in municipal politics and stimulus packages. On a regular basis there are calls for infrastructure renewal like the dramatic and “hard-hitting” 1983 America In Ruins, which has the ruins of a Roman forum on the cover. The cover says it all: the American Empire will fall apart as the Roman one did if there isn’t the political will to invest in infrastructure renewal. The report, while documenting the state of national infrastructure in the US, starts mostly with political recommendations to create the sustained organization and attention needed.

We might ask why calls for renewal are needed. The reason is that funding bodies like to build new infrastructure, but don’t like to budget for its ongoing maintenance. What funders can see is appreciated; maintaining infrastructure that is so expected that it becomes transparent is a thankless job. Funding new stuff looks progressive; maintaining infrastructure doesn’t impress.

Sustainability and governance are likewise issues for cyberinfrastructure. With CFI it is actually not the researchers who apply, but the universities (with a researcher as project leader). CFI requests ongoing maintenance plans, expects the university to take ownership, and does provide some additional funding, though most feel it is not enough. Edwards et al., in a must-read report that came out of a workshop bringing historians and social scientists to bear on CI, argue the importance of the social to infrastructure:

It is also possible that a tech-centered approach to the challenge of data sharing inclines us toward failure from the beginning, because it leaves untouched underlying questions of incentives, organization, and culture that have in fact always structured the nature and viability of distributed scientific work. Questions of trust loom large here, and run both ways. (Understanding Infrastructure: Dynamics, Tensions, and Design, p. 32)

It is therefore important to think of infrastructure realistically as some mix of hard visible components, softer services, and professionals that operate and maintain the two. I suspect that the largest part of the costs for the cyberinfrastructure proposed for the humanities will go to people, not hardware or buying services. This is despite the perception that when you invest in infrastructure you are buying the hard stuff, like roads.

Policies that Make Interchange, like Standards

Digging another level down, one finds that essential to certain types of cyberinfrastructure are the standards, policies, and procedures that allow us to run the infrastructure. It matters that electricity is provided at a standard and advertised voltage. Governments have zoning laws, policies, and procedures for handling construction, both of the infrastructure they will maintain and of the new developments that others build on that infrastructure.

In computing we see the importance of standards in technologies like the World Wide Web. What makes virtual infrastructure like the web work is not one cable or one web browser, but the W3C standards that let different tools work together. The story we tell about the web as infrastructure is that all it took was HTTP and HTML to spark the collaborative and open development of information infrastructure. This is the lightest type of infrastructure, where there is no material or service base to maintain, but a base of definitions and standards on which others build layers. This is the most attractive paradigm for infrastructure for funders, as it is the least expensive to maintain. Perhaps things like the Text Encoding Initiative Guidelines are the real infrastructure of humanities computing, and consortia like the TEI are the future for light and shared infrastructure maintenance.

Defining Cyberinfrastructure Again

Having looked at paradigms for what infrastructure is, I will now turn to how it is defined, because I am going to argue that the act of defining is a political one that shifts the boundary of what is in and out. The act of defining things as infrastructure positions them as things like roads, utility services, organizations and standards. That in turn triggers expectations about the value and support needed for the infrastructure. Calling something infrastructure is not a neutral act; it turns that thing into something that:

  • is broadly useful to a public,
  • is therefore well enough understood that we are sure it is useful,
  • is confidently expected to foster economic or research activity,
  • should be funded by the public for the public,
  • becomes invisible as its use becomes expected, and
  • is maintained for the long term by some organization that has ongoing funding to maintain the infrastructure.

Now you can see the difference between research infrastructure and research. Research, by contrast, is not expected to be useful, necessarily, and certainly isn’t expected to be useful to a public. Research is about that which we don’t understand, while infrastructure shouldn’t really be experimental. Research is expected to be funded by the public, but we do not expect any one research project to be funded. Nor do we expect to fund a research project for the long term.

Cyberinfrastructure is supposed to foster research. The model can be said to have three parts that together are expected to generate research. The researcher is supplied by research grants to do research on research infrastructure. Well designed and well maintained infrastructure should lead to reduced supply costs and more research from each researcher. A well run system invests in all three (researchers, research supplies and research infrastructure). Investment in research grants supports researchers and pays for supplies. Investments in the infrastructure enhance the productivity of the researchers much as good roads support economic productivity.

This brings me back to definition. In effect, redefining something as infrastructure is a way of moving it from the category of research and therefore changing the urgency of its provision and changing the perception of who should fund it and maintain it. It is, in short, a great way to argue that some organization like a university or government should fund something in perpetuity rather than fund it as a grant would for a particular period and limited group. Calling something cyberinfrastructure distinguishes it from that which only a project needs and which is needed only for the duration of the research.

We can see the redefinition at work in how funding like CFI works. The Social Science and Humanities Research Council (SSHRC), the federal research funding agency for the humanities in Canada, traditionally supported a class of research called “research tools.” SSHRC, on their Standard Research Grants web page, gives the following examples of eligible tools:

  • bibliographies, indices and catalogues of research collections;
  • concordances and dictionaries (refer to SSHRC Research Data Archiving Policy);
  • materials that facilitate access to archival holdings or collections such as repository guides, inventories of a group of manuscripts or of a body of archives, inventories or documentary materials, thematic guides to archival materials, records surveys and special indices;
  • scholarly editions; and
  • data series. (“Apply for Funding - Standard Research Grants”)

When CFI was introduced, the legislation defined “research infrastructure” in a way that included collections, computer software, and information databases. Proposals to CFI from the humanities, like the TAPoR project that I led, argued that certain research tools qualified as research infrastructure. We received funding to buy and set up tools that were the online equivalents to the research tools SSHRC funds. Faced with the significant new funding offered for research infrastructure, we negotiated with CFI to define humanities research tools as infrastructure. SSHRC even supported us in this and they continue to work closely with CFI to articulate the boundaries.

One can also see this redefinition of cyberinfrastructure in the humanities in the Mellon-supported report, Our Cultural Commonwealth. The summary page where you can download the report is dominated by the answer to the question “What is Cyberinfrastructure?” The answer provided is:

“Cyberinfrastructure” is more than just hardware and software, more than bigger computer boxes and wider pipes and wires connecting them. The term was coined by NSF to describe the new research environments in which capabilities of the highest level of computing tools are available to researchers in an interoperable network. These environments will be built, and ACLS feels it is important for the humanities and social sciences to participate in their design and construction. Ed Ayers has commented that much of the work of developing the Valley of the Shadow was analogous to building a printing press when none existed. Effective cyberinfrastructure for the humanities and social sciences will allow scholars to focus their intellectual and scholarly energies on the issues that engage them, and to be effective users of new media and new technologies, rather than having to invent them.
“Cyberinfrastructure” becomes less mysterious once we reflect that scholarship already has an infrastructure. The foundation of that infrastructure consists of the libraries, archives, and museums that preserve information; the bibliographies, finding aids, citation systems, and concordances that make that information retrievable; the journals and university presses that distribute the information; and the editors, librarians, archivists, and curators who link the operation of this structure to the scholars who use it. All of these structures have both extensions and analogues in the digital realm. The infrastructure of scholarship was built over centuries with the active participation of scholars. Cyberinfrastructure will be built more quickly, and so it is especially important to have broad scholarly participation in its construction: after it is built, it will be much harder to shift, alter, or improve its foundations. (ACLS Commission on Cyberinfrastructure summary page)

Note how the commission drew on the work of the 2003 NSF report Revolutionizing Science and Engineering through Cyberinfrastructure. In Our Cultural Commonwealth, they draw on the Atkins report (as the NSF report is known) for a definition of infrastructure, one that to some extent determines the outcome.

In other words, for the Atkins report (and for this one), cyberinfrastructure is more than a tangible network and means of storage in digitized form, and it is not only discipline-specific software applications and project-specific data collections. It is also the more intangible layer of expertise and the best practices, standards, tools, collections and collaborative environments that can be broadly shared across communities of inquiry. (Page 6)

It is also worth noting how the ACLS Commission writes a history of CI, arguing that libraries, finding aids, journals and so on are already existing infrastructure, and that cyberinfrastructure is just the extension of what we expect into the digital realm. Many of these things, like concordances and dictionaries, we would call (following SSHRC) “research tools.” Others, like digital editions of content, I would call just editions. Few would have called them infrastructure except in the weakest sense of something that others build on. What changed was how these things have to be funded. A good print concordance or critical edition can be treated as a project. Once it is done, you print it, sell it to libraries, close down the project and move on. Not so with digital editions or digital tools. They, it seems, need to be maintained perpetually to be accessible at all: you can’t print a bunch of copies, put them in libraries, and let the librarians deal with the maintenance.

Infrastructure is a change in funding model

The reason for this shift is that we have a growing-up problem in the digital humanities, and one that has been noted under a different rubric. The crude way to put it is that we are drowning in our own research poop. The more sophisticated digital works we create, the more there is that has to be maintained and maintained at much greater cost than just shelving a book and occasionally rebinding it. Centers and institutes get to the point that they can't do anything new because maintaining what they have done is consuming all their resources. One way to solve that problem is to convince libraries to take your digital editions, but many of us don’t have libraries with the cyberinfrastructure. Another way to deal with this is to define certain tools as cyberinfrastructure so that they are understood as things that need ongoing support by organizations funded over the long term. If the scale is right we might even have an economy of scale so that we could all pay for a common organization to maintain the commonwealth of infrastructure, and that is one read of what Bamboo is trying to do: determine what things are needed in common for research and then develop a consortium that could develop and sustain them for us at a cost we can afford if spread around.10 A worthy goal that may be too late or just in time, given the fiscal storm that could redefine higher education.

Dangers of Infrastructure

However, there are dangers to such redefinition. This is not the place to discuss all the dangers of infrastructure, so I am going to list a few and focus on one in particular, which is the losing of research to infrastructure. But first, a reminder of some of the usual dangers of infrastructure to offset the almost universal call for more of it:

  • Research infrastructure is not research, just as roads are not economic activity. We tend to forget when confronted by large infrastructure projects that they are not an end in themselves. There is an opportunity cost to investing precious research funds into infrastructure. Every $100,000 lab that lasts four years before needing renewal is equivalent to $25,000 a year for a Ph.D. student to do research for four years.
  • Infrastructure projects can become ends in themselves by developing into an industry that promotes continued investment. To sustain infrastructure there develops a class of people whose jobs are tied to infrastructure investment. You can get situations, as one does in municipal politics, where ongoing infrastructure investment forms a political feedback loop (otherwise called corruption), where politicians spend money on construction because the construction companies reliably provide election funding back.11 The point is that you can get a community invested in maintaining infrastructure not for research, but for their continued existence.
  • Infrastructure needs to be maintained. Any investment in infrastructure carries the expectation that if that infrastructure is useful it can expect reinvestment. That, of course, is the reason for shifting projects from research to infrastructure, because the nature of the project calls for sustained funding, but the fiscal reality is that if every project is treated like infrastructure then at some point there is no loose funding left for new projects. We either build infrastructure and don’t maintain it, as people have argued is happening with our physical infrastructure, or we end up so heavily committed to maintenance that there is no room for research innovation. It seems to me that most funders want, like absent fathers, to seed the infrastructure and then step away from maintaining it by insisting that the applicants have sustainability plans, few of which really work.12
  • Infrastructure can distort the field and alter the ecology of a field. Highways are a good example. The postwar boom in highway building changed our relationship to the car and where we live. Interstate highways were matched by state highways, which were matched by municipal highways, many of which were driven through vibrant neighborhoods so as to make it easy for the middle class to leave town for the suburbs. It is no longer clear that we benefited from the modernist shift to a suburban, detached-house-and-cars lifestyle that was facilitated by massive road building in the 50s and 60s, not to mention the destruction of older neighborhoods, ravines, and gutting of city centers. Much of this expensive infrastructure development was, at the time, perceived as needed for economic modernization. Only those whose neighborhoods were cleaned up and replaced with projects complained. What if North America had seen massive investment in mass transit infrastructure comparable to Europe’s?13 How confident are we that massive research infrastructure won’t likewise change the ecology of research in unpredictable ways?

The danger that should concern researchers is what Understanding Infrastructure calls premature fixing of infrastructure.

Given its relative immaturity and the rapidly changing technological backdrop against which cyberinfrastructure is unfolding, efforts not to prematurely “sink” or “fix” the form and vision of cyberinfrastructure (or distinct cyberinfrastructure projects) should be supported. (p. 42)

When some research tool is redefined as infrastructure, researchers lose control over its formation. Turning a tool into infrastructure shifts responsibility from researchers to the research infrastructure profession. The turn also changes what can get funded. If, for example, translations were to be seen as infrastructure, we might find that we couldn’t get research grants to do them; that would be the responsibility of infrastructure professionals.

Figure 1 (Picture 1.png): Image by Stan Ruecker

Tools are reinvented as we reinterpret

A pernicious version of the argument for shifting things like tool development over to infrastructure holds that we need to stop “reinventing wheels,” as if that were what happens when research tools are redeveloped. The suggestion is that humanists have a tendency to reinvent things uselessly when it would be more efficient to hand the job over to professional software engineers who would do a better job and do it once and for all. Maybe. Setting aside the fact that wheels are reinvented over and over, to fit new models of cars, the reinvention of meaning is exactly what characterizes the humanities. Tools are not used to extract meaning according to objective principles. In the humanities we reinvent ways of making meaning within traditions.14 We are in the maintenance by reinvention and reinterpretation business and we don’t want our methods and tools to become invisible as they are part of the research. To shift tool development from researchers to infrastructure providers is to direct the attention of humanities research away from those methods and tools and to surrender some of the research independence we value. Shifting the boundary that defines what is legitimate research and what isn’t is something humanists should care passionately about, and resist where it constrains inquiry. I can understand the impatience funders may have with the plodding iterative ways of the humanities (do we really need another interpretation of Plato?), but that just means that we humanists have to do a better job explaining the value of reinterpretation rather than allow the organizational boundaries to be moved.

Infrastructure is a boundary interpretation

This brings us to the issue of sustainability. Ironically, it is reinvention that is the most robust form of sustainability in the humanities. The humanities sustain traditions of performance and interpretation not by fixing them in infrastructure but by continually reinventing them. Plato is not an archaeological park which, once surveyed, can be safely preserved for future visitors as philosophical infrastructure complete with an “interpretative center.” He is a source of ongoing, and often creative, reinterpretation, and that is what sustains interest in Plato. The death of Plato would be when we tire of reinventing his tradition and move on. To prematurely turn humanities computing questions, quarrels, inventions, deformations and challenges into infrastructure risks taking them out of the play that is the humanities at the very moment when they matter. As a computing humanist involved in experiments in analytics I struggle with the question of when to let go of the play for the good of the infrastructure of others.

Reinvention is sustained in play not structure

Text Analysis Tools as an Example

The boundary between technological and organizational means of information processing is mobile. It can be shifted in either direction, and technological mechanisms can only substitute for human and organizational ones when the latter are prepared to support the substitution. (Understanding Infrastructure, p. 3)

I give, as an example of boundary moving, a genre of research tool I have been involved in: text analysis tools. I do this in order to demonstrate the fluidity of definition that can move such work from research to infrastructure and back again. I will do it by talking about selected projects that have attempted to develop text analysis tools.

Figure 2 (P 4.png): PRORA output

PRORA. In 1966, the University of Toronto Press published Glickman and Staalman’s Manual for the Printing of Literary Texts and Concordances by Computer. The manual covered the operation of PRORA, which was a mainframe batch concordance-generation tool like OCP. PRORA was developed by a humanist (Glickman) and an engineer to facilitate the preparation of print concordances. The research tool was the print concordance; PRORA was a tool to support researchers like Glickman.
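
For readers who have not used a concordance, the basic operation can be sketched in a few lines of Python. This is only an illustration of the keyword-in-context idea, not a reconstruction of how PRORA worked; the function name `kwic`, the `width` parameter, and the sample sentence are all assumptions made for the example.

```python
import re

def kwic(text, keyword, width=3):
    """Return keyword-in-context lines: each hit with `width` words on either side."""
    # Split the text into words; a real tool would handle markup and hyphenation.
    words = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, word in enumerate(words):
        if word.lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            # Right-align the left context so the keywords line up in a column.
            lines.append(f"{left:>30} [{word}] {right}")
    return lines

# Hypothetical sample text, invented for this sketch.
sample = "The tool was a tool for scholars, and each scholar used the tool differently."
for line in kwic(sample, "tool"):
    print(line)
```

A batch tool in this genre would run something like this over a whole text for every word and print the result; an interactive tool lets the researcher ask for one word at a time and look again.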

Figure 3 (P 5.png): TACT screen

TACT. TACT, released in 1989 by the University of Toronto Centre for Computing in the Humanities, was one of a number of projects that set out to develop an accessible interactive text analysis tool in the 1980s. The tool was, and still is, free for download and the MLA published the manual, Using TACT.15 TACT was designed to be usable on an IBM PC running MS-DOS and is still used with DOS emulators. The model was to build a widely useful and interactive tool that could run on what most humanists had on their desk. No one called tools infrastructure back then; it was just a research tool funded by a university centre. TACT was also one of the first tools designed for interactive interpretation on the PC. You didn’t run a text through a batch process and then study the resulting print concordance; you used the tool in research.16 The tool instantiated changing ideas about how computer-assisted interpretation would take place. Finally, TACT was co-developed by a team of professional programmers and academics associated with the Centre. John Bradley and I adapted it to the web in the TACTweb project in 1998.17

CETH meetings. Susan Hockey, when she was Director of the Center for Electronic Texts in the Humanities (CETH) at Princeton/Rutgers, organized two meetings to develop a coalition, similar to the TEI, to develop the next-generation tool. As Hockey explained in a post to HUMANIST,

For some time, those of us active in humanities computing have felt the need for better and/or more widely accessible text analysis software tools for the humanities. There have been informal discussions about this at a number of meetings, but so far no substantial long-term plan has emerged to clarify exactly what those needs are and to identify what could be done to ensure that humanities scholars have readily-available text analysis tools to serve their computing needs into the next century.18

Much of the discussion, of which I was part, circled around the question of whether we wanted to develop a “garden variety” tool that, like a word processor, could be installed and used by our colleagues easily. The emphasis of this model was on personal computing, ease of use, and a general feature set developed following a needs analysis. In short, what was imagined was a TACT for Windows that was capable of using TEI markup and was so easy to use that we could convert our colleagues. Some of us argued for a web-centric and modular alternative, an alternative that was taken seriously, but what matters here is this vision of what would have been personal analysis software infrastructure. But there was also a second model at work, and that was an organizational model of an international grouping that would plan and develop one universally useful tool. Alas, the project never went anywhere—the time for academic PC tool development had been bypassed by the popularity of the web.

Figures 4 and 5 (P 2.png, P 3.png): TAPoRware input and output screens

TAPoR and TAPoRware. Building on HyperPo, a project Stéfan Sinclair had for a web-based analytical tool, a bunch of us developed a proposal for CFI to develop a Text Analysis Portal for Research, which was funded in 2002.19 TAPoR as a CFI infrastructure project developed all sorts of infrastructure, including text databases, servers and labs at six Canadian universities. Two important components were the TAPoR portal and a set of reference tools, TAPoRware.20 The model was that there should be a portal that allowed people to discover and use tools that could be registered by developers as web services running elsewhere. The portal would give access to a broad collection of atomic tools that could run over the web. It was a deliberate experiment in cyberinfrastructure, as CFI was funding it and they valued innovation. The portal would be a broadly accessible web infrastructure that would encourage research and development of tools by others which could then be “published” through the portal. The portal was built on contract by the professional programmers of Open Sky Solutions in close dialogue with us. Parts of the model worked and parts didn’t. The TAPoRware tools are used around the world, but the portal is complex and clumsy and is therefore being reinvented. Web services aren’t as reliable as they should be and users want simplicity and reliability. My point here is that the model was to keep tool development as research but make the research tools easy to discover and use through portal-like infrastructure. A further paradigm was that tools could be embedded in online texts as small viral badges, thereby hiding the portal and foregrounding the visible text, an experiment we are just embarking on.
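
The portal model described above (developers register tools, users discover and run them) can be made concrete with a toy registry in Python. This is a sketch under stated assumptions, not the TAPoR portal's actual design or API: the class `ToolPortal`, its methods, and the two tool names are hypothetical, and plain Python functions stand in for what would have been remote web services.

```python
from collections import Counter

class ToolPortal:
    """A toy registry: developers register tools, users discover and run them."""

    def __init__(self):
        self._tools = {}

    def register(self, name, description, run):
        # Assumption: in a real portal `run` would wrap a call to a web
        # service hosted elsewhere; here it is just a local function.
        self._tools[name] = {"description": description, "run": run}

    def discover(self, term):
        # Find tools whose name or description mentions the search term.
        term = term.lower()
        return sorted(
            name for name, tool in self._tools.items()
            if term in name.lower() or term in tool["description"].lower()
        )

    def run(self, name, text):
        return self._tools[name]["run"](text)

def word_frequencies(text):
    # Return the three most frequent words with their counts.
    return Counter(text.lower().split()).most_common(3)

portal = ToolPortal()
portal.register("list-words", "Count word frequencies in a text", word_frequencies)
portal.register("word-count", "Count the words in a text", lambda text: len(text.split()))

print(portal.discover("count"))
print(portal.run("word-count", "to be or not to be"))
```

The point of the design is that the portal owns discovery and dispatch while the tools stay small and independently developed, which is also why a portal of this kind is only as reliable as the services registered with it.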

SEASR. The Software Environment for the Advancement of Scholarly Research (SEASR) is a more ambitious infrastructure model that builds on work around text-mining done at the NCSA.21 SEASR has a visual programming environment where programmer-users can develop applications (or flows) that can then be deployed on robust hardware for use in projects. It thus reconciles flexibility (in that programmers can create new components and advanced users can develop new flows) and robust delivery (in that a useful flow can be deployed as infrastructure and integrated into other projects). This model moves the most into the realm of infrastructure to be developed by professional engineers and supported for humanists. Humanists are encouraged to use the components, and, if they are sophisticated, to program their own flows on an infrastructure which generally has to be run by a center as infrastructure. The return is scalability and reliability. Content publishers are also encouraged to develop sophisticated tools in SEASR that can then be integrated into collections or other tools like Zotero. One can imagine how SEASR could be scaled up to the cloud to provide a visual programming and tool delivery platform for humanists.

There is a rich history of modeling tools for interpretation

While I have no doubt done a disservice to all of the projects listed, this quick survey was designed to show how fluid are the boundaries between research and infrastructure when it comes to text analysis tools. Many of these projects didn’t even conceive of themselves as infrastructure projects, but would fit under later definitions. We have been reinventing our tools, but each time based on revisiting the model as to who develops the tool, how it is distributed, where it is run, how much control the researcher has, who is responsible for it, and whether it is research itself. I suspect we are going to keep on reinventing this wheel and experimenting with models as the community matures, but the time may have come for a “die-off” and rationalization that leaves us with fewer, but better maintained models.


Some might read this paper as critical of the turn to cyberinfrastructure as it is a standard move in the humanities to “problematize” some accepted truth as a way of undermining it. My intent was not to declare “gotcha,” but to draw attention to the defining conversation we have to have. Infrastructure is not as transparent as it seems. That the turn to infrastructure is political; that it involves redefining what is research; and that it has dangers doesn’t mean we shouldn’t do it. My point is that we should do it thoughtfully, cognizant of what we may lose and the costs to research. I would go so far as to say that negotiating what is and what isn’t infrastructure is a good way to define what should be supported by whom. That line will shift. It will also vary from one institution to another. I therefore conclude with some suggestions. Most of these are adapted to the humanities from Edwards et al., Understanding Infrastructure: Dynamics, Tensions, and Design, one of the wiser reports in the field.

  • We need to learn from the history and sociology of infrastructure development (as Understanding Infrastructure did). We have colleagues who have studied other infrastructural revolutions; let’s listen to them.
  • We need to support infrastructure experiments. We need to support prototyping in order to test risky models before massive investment. Such experiments should also be designed so that they do not become commitments to infrastructure. Such experiments are and should continue to be legitimate and valued research in humanities computing and library and information science.
  • We should turn to research infrastructure where the infrastructure is not an area of research, but where it is clearly useful for a wide community.
  • We need to recognize the social dimension of infrastructure—it isn’t just stuff and it is rarely neutral. Large-scale investments almost always have opportunity costs and paths not taken. It is better, especially in the humanities where the need for cyberinfrastructure is far from obvious to our colleagues, to go slow and be inclusive than to impose from above.
  • Given a funding climate where long-term investments are less likely, we need to look at collaborative and social models for developing and maintaining infrastructure. The SETI@home project should be our paradigm, not the National Library (or Library of Congress).
  • We need to imagine infrastructure not just for professional researchers at universities, but also for amateur researchers in the community. If we want long-term political investment, we need to open it up to the community.

Infrastructure turns your thinking away to new problems


Atkins, D. E., et al. Revolutionizing Science and Engineering Through Cyberinfrastructure. Report of the National Science Foundation Blue-Ribbon Advisory Panel on Cyberinfrastructure, 2003.

Bement, Arden L. Jr. “Shaping the Cyberinfrastructure Revolution: Designing Cyberinfrastructure for Collaboration and Innovation,” First Monday, 12:6, 2007.

Budget Implementation Act, 1997, c. 26.

Choate, Pat and Susan Walter. America in Ruins: The Decaying Infrastructure. Durham, N.C.: Duke Press Paperbacks, 1983.

Edwards, Paul N., Steven J. Jackson, Geoffrey C. Bowker, and Cory P. Knobel. Understanding Infrastructure: Dynamics, Tensions, and Design, 2007.

European Strategy Forum on Research Infrastructures.

Freeman, Peter A. “Is 'Designing' Cyberinfrastructure—or, Even, Defining It—Possible?” First Monday. 12:6, 2007.

Green, David. “Cyberinfrastructure for Us All: An Introduction to Cyberinfrastructure and the Liberal Arts.” Academic Commons (December 2007).

Glickman, Robert Jay and Gerrit Staalman. Manual for the Printing of Literary Texts and Concordances by Computer. Toronto: University of Toronto Press, 1966.

KPMG. Evaluation of Foundations. Report prepared for the Treasury Board Secretariat of the Government of Canada, 2007.

Lagoze, Carl Jay. Lost Identity: The Assimilation of Digital Libraries into the Web, Ph.D. Thesis. Cornell University, 2010.

Lancashire, Ian, Ed. Using TACT with Electronic Texts. New York: Modern Language Association of America, 1996.

Mackie, Christopher J. “Cyberinfrastructure, Institutions, and Sustainability.” First Monday. 12:6, 2007.

McGann, Jerome and Lisa Samuels. “Deformance and Interpretation,” New Literary History. 30:1, 1999, pages 25-56.

Negroponte, Nicholas. being digital, Alfred A. Knopf, New York, 1995.

Project Bamboo.

Research Councils UK. e-Science.

Singel, Ryan. “ISP Quarrel Partitions Internet,” March 18, 2008.

Smith, John. “A New Environment For Literary Analysis.” Perspectives in Computing. 4:2/3, 1984, pages 20-31.

SSHRC. “Apply for Funding—Standard Research Grants.”

The CFI Story.

Unsworth, J., et al. Our Cultural Commonwealth. Report of the American Council of Learned Societies Commission on Cyberinfrastructure for the Humanities and Social Sciences, 2006.


  1. “The CFI Story” on the CFI web site describes the intentions thus: “CFI support is intended to:
    • strengthen Canada’s capacity for innovation;

    • attract and retain highly skilled research personnel in Canada;

    • stimulate the training of young Canadians through research;

    • promote networking, collaboration, and multidisciplinarity among researchers;

    • ensure the optimal use of research infrastructure within and among Canadian institutions.”

  2. According to a Treasury Board Evaluation of Foundations, “CFI was described, at that time, as an entirely new approach by the government to the support of research and development. From this starting point, involving a once-off investment of $800 million, the federal government went on to create a variety of foundations that either receive conditional grants for disbursement over a finite number of years or to create perpetual endowments that use the income generated by the endowment to fund their disbursement programs and operations.”
  3. See the “Quotes” page on the CFI web site. I doubt all researchers in the humanities would share this view of CFI as revolutionary, but it did, for those of us who applied, change how we thought about research funding.
  4. This is how Green starts his article “Cyberinfrastructure for Us All: An Introduction to Cyberinfrastructure and the Liberal Arts,” which introduces a special issue on the subject. Other articles in the issue are also worth reading.
  5. The Bement quote is from remarks he gave on “Shaping the Cyberinfrastructure Revolution: Designing Cyberinfrastructure for Collaboration and Innovation” which have been published in First Monday.
  6. Regarding e-Science, under “About the UK e-Science Programme,” the UK e-Science web site describes how “The e‑Science Core Programme … has supported the development of generic technologies, such as the software known as middleware that is needed to enable very different resources to work together seamlessly across networks and create computing grids.” For the ESFRI, see the European Strategy Forum on Research Infrastructures web site.
  7. Freeman, “Is ‘Designing’ Cyberinfrastructure — or, Even, Defining It — Possible?” The emphasis is his. In fact, this quote is centered, bolded, italicized and in blue on the web page just in case we miss it.
  8. See Singel, “ISP Quarrel Partitions Internet.”
  9. Page 3, footnote 5 of Lagoze, Lost Identity: The Assimilation of Digital Libraries into the Web.
  10. See the Project Bamboo web site for more on this initiative. It is too early to say exactly what the infrastructure they develop will look like, but the rhetoric is very much about developing tools in the cloud as infrastructure and distributing the costs over a wide consortium.
  11. There is currently a scandal around road construction spending in Montreal. See stories like “Corruption scandal’s web ensnares struggling ADQ” (Rhéal Séguin, The Globe and Mail online, Oct. 28, 2009).
  12. Christopher Mackie has an article, “Cyberinfrastructure, Institutions, and Sustainability,” that reflects on models for maintaining what we build.
  13. While the sins of highway infrastructure investments are not the point of this essay, titles like Twentieth-Century Sprawl: Highways and the Reshaping of the American Landscape (Gutfreund, Owen. New York: Oxford University Press, 2004) suggest that we should beware of the redefining that infrastructure investments can lead to.
  14. McGann and Samuels argue something similar in “Deformance and Interpretation”: “Our deformations do not flee from the question, or the generation, of ‘meaning.’ Rather, they try to demonstrate—the way one demonstrates how to make something, or do something—what Blake here assertively proposes: that ‘meaning’ in imaginative work is a secondary phenomenon, a kind of meta-data...” (p. 48).
  15. Using TACT can now, courtesy of the MLA, be downloaded as a PDF. The software that runs on MS-DOS is also available for download.
  16. ARRAS (Archive Retrieval and Analysis System) was probably the first interactive concordancer. Smith writes about it in “A New Environment For Literary Analysis,” describing his model thus: “ARRAS should not be thought of as a ‘black box’ into which one inserts a text along with a set of commands and out of which one receives a completed analysis. A better analogy is a toolbox containing a set of tools, each designed for a particular task. The ARRAS design always presumes a human inquirer at the center. Thus ARRAS amplifies, rather than replaces, specific perceptual and cognitive functions” (p. 22).
  17. See the TACTweb project web site.
  18. Humanist Discussion Group, Vol. 10, No. 54.
  19. HyperPo is still running (on a TAPoR server). In many ways it has been superseded by Voyeur, which is also being developed by Sinclair.
  20. The portal is currently being moved to the University of Alberta and installed on a High-Performance Computing installation here (another form of infrastructure). TAPoRware, as widely used as it is, is being replaced by Voyeur, which is meant to scale and offer more functionality.
  21. For more on SEASR or to download it, see the SEASR web site. See also the MONK (Metadata Offer New Knowledge) project. MONK has built an interesting interface for users to run tools on text collections.
