Online Humanities Scholarship: The Shape of Things to Come

Book by: Jerome McGann.

Give us editors! Re-inventing the edition and re-thinking the humanities

Module by: Gregory Crane. Edited by: Frederick Moody, Ben Allen

“We are entering a great age of editing.”

—Jerome McGann, at an October 1997 Conference at MIT

Introduction

This paper offers a response to Roger Bagnall’s contribution on Digital Papyrology, but a proper response to this particular topic requires addressing the broader topic behind this workshop: the reinvention of editing in a digital age. More than a decade ago, at a conference at MIT, Jerome McGann remarked in passing that we were entering a great age of editing. These words were not among his prepared remarks—when this programmatic remark was called to his attention several years later, he had forgotten the words but warmly endorsed the sentiment. The papers in this workshop suggest the impending truth of that prophetic remark. Scanning books and generating transcriptions is the incunabular phase of digital publication.1 We need to rethink the goals of editing in the light of the possibilities and challenges of emergent digital media.2 We are not entering—we have already entered and will never leave—a new intellectual space, where the speed and the distance between question and answer is qualitatively different from that for which we were trained.

In a digital world where we can publish video and sound and where we can annotate space, we need to extend our vision of editing beyond linguistic sources. In his paper for this collection, Ken Price talks about “topic-based editing,” of which his own Civil War Washington3 provides one example.4 HyperCities5 illustrates the opportunities of annotating coordinates in space and time, allowing us to trace such events as the turmoil in Tehran after the 2009 Iranian elections and a tumultuous succession of public buildings over the past century in Berlin. Alison Muri’s Grub Street Project6 sets out to bring an entire moment in history to life. If we are to publish documents, especially documents as enmeshed with their material and cultural context as tweets from Tehran or newspapers from eighteenth-century London or nineteenth-century Washington, we need to embed them within rich cultural databases and to imagine our textual annotations as links into geographic, visual, quantitative, and textual data.

Within this essay, I restrict myself to the editing of textual sources, but within that field I understand editing in a very broad sense as making our primary textual sources usable for scholarly work. If we take this as an intellectual model, then a wide range of document-centric publications is relevant. These include not only facsimile, diplomatic, and critical editions but also translations, commentaries, and even specialized lexica and indices—documents that are hypertextual in nature, largely composed of individual annotations and expositions upon named portions of a primary source.7 The boundary between editing in this sense and other categories of publication is, in this case (as in almost any classification task), fuzzy. Essays in expository prose that largely follow the structure of a document to elicit an interpretation should probably be considered as well. At least some such studies would be better served if published as hypertextual guides through a document, directing a reader’s focus to one passage after another and using chunks of argumentation to draw out various features of the primary source and comparanda.8 The instinct for such publications is deep, and early forms of such hypertextual publications have appeared in various guises (PerseusPaths, Walden’s Paths,9 etc.).

1) We need editors—lots of them. We have before us a new model of intellectual life in general and especially within the humanities. We have valued scholarship that is difficult to produce and almost as difficult to understand. When a 2009 tenure track job listing asked for candidates who can support contributions and original research by undergraduates as well as MA students within the field of Classics, almost none of nearly two hundred applicants had been trained to think about what MA-level students, much less undergraduates, could contribute to the field or about what meaningful research they might be able to conduct.10 A few had creative ideas and had even experimented in their teaching but they had done so outside of—and in some measure in spite of—their formal training. Most of those with whom we spoke shifted uncomfortably in their chairs as we pressed them on this point.

We have vast amounts of work before us—far more than a relative handful of salaried academics can accomplish and plenty accessible to our students and to those who love a given subject but maintain a day job doing something else.11 We need to edit the entire record of humanity. Brute digitization provides physical access to digital representations that are qualitatively more useful than anything possible in print—print publication constitutes only a small dimensional reduction of the space in which we now move. At least as important, we have at our disposal a growing set of analytical tools that can make these sources intellectually as well as physically accessible.12 At one end, we can detect not only words and phrases but also ideas in vast collections of data—the bigger the better, in fact.13 The same currents that flatten individual human analysts provide the lift on which many of our algorithms can soar, allowing us to find within vast collections patterns that yield themselves to deep contemplation—and indeed, to the most traditional of intensive reading.14 At the other end, we can now automatically generate background information—a workable commentary—with which to contextualize what we see. And we have begun to attack the greatest of all logistical barriers in intellectual life—the heretofore impenetrable barrier of language. In print culture, we could do nothing with documents in languages that we had not studied. Already today, if we combine machine translation of individual words as well as passages, morphological and syntactic analysis, dictionary lookup, and text mining we can begin to work with sources that were once inaccessible.15

Vast collections and clever services provide a starting point for human analysis. Consider one particular example from my own field. Most Latin literature was produced after, and much builds upon, the tiny surviving corpus of Classical Latin.16 We have an endless supply of intellectually accessible and eminently useful undergraduate and MA-level projects, with our students building upon their training in Classics, analyzing the results of automated systems, and producing introductions, commentaries, and annotated translations of individual documents. We can then publish these as components of increasingly sophisticated digital libraries that can parse their structure and mine the machine-actionable information within them: the scholarly labor applied to each edited document becomes training data that then improves the automated results for the rest of the document in question, as well as the corpus of digital Latin.17 If we move towards community-driven models of updating and preserving such editions, preserving the original contributions within a versioning system but allowing the documents to evolve as their authors pursue their careers, these editions can serve as starting points rather than as fixed and obsolescent snapshots.

If we understand editing as the process of enabling others to think about an object from the past, then the editorial process applies as much to spaces (e.g., the development of the Unter den Linden in Berlin), buildings (e.g., the Brandenburg Gate), and objects (e.g., the Quadriga atop the Brandenburg Gate, a chariot drawn by four horses driven by Victoria, the Roman goddess of victory), as to topics (e.g., the development of Prussian nationalism of which Unter den Linden and the Brandenburg Gate are expressions).

Editors can be as much artists as scholars, for the editor who contextualizes an object directs the reader, listener or viewer to a finite set of data that align themselves into meaningful patterns. Editors in this sense differ from authors insofar as they direct their audiences’ gaze away from themselves and towards the object of contemplation. The point is not to vanish—to vanish is to deceive and to imply a transparency that simply does not exist. Rather, editors must provide the clearest possible account of their own biases.18 In this, they resemble their colleagues in the sciences, who explain how the data was collected and how they conducted their experiments so that others can draw their own conclusions about the strengths and limitations of their work.

2) In a digital age, philologists need to treat our editions as components of larger, well-defined corpora rather than as the raw material for printed page layouts.19 Many may take this as obvious but few have pursued the implications of this general idea. The addition of punctuation, the use of upper case to mark proper names, specialized glossaries, the addition of name and place indices, and even translations prefigure major classes of machine-actionable annotation—interpretations of morphological and syntax analyses, lexical entries, word senses, co-reference, named entities are only a subset of the features we may choose to include as new practices of editing emerge. Even when we turn to the most heavily studied classical Greek and Latin texts, a radically new world is taking shape. We have returned to an age of the editio princeps—not literally the first edition, but the first edition in a medium so distinct from that which preceded it that it constitutes a new beginning. We see before us a great age—indeed, a heroic age, one filled with triumphs and false starts, messy, destabilized and destabilizing, and, above all, dynamic.

The remainder of this paper will focus primarily upon the new forms of editing and their consequences. But before exploring those consequences, we will first outline some basic goals based upon the changing possibilities within the digital culture.

Humanities, Classics, Philology: Assumptions and Implications

Strategic goals are slippery things. The following have proven useful to me and I offer them if only to explain the decisions implicit in what I will describe later. I offer implications that I have drawn from three largely hierarchical perspectives: the humanities as a whole, the study of the Greco-Roman world, and the responsibilities of a Classical Philologist.

Humanities

As a humanist, I seek to advance the intellectual life of humanity. I am not pursuing the cures to dread diseases or developing new sources of energy. To some extent, my work resembles that of the scientist or mathematician developing knowledge without immediate practical applications. But the funders of such basic research point to the long-term utility of such initially abstract activity—the basic research of today needs to be unfettered so that it can stumble upon the practical methods of tomorrow.

If our goal is to support the intellectual life of humanity by making intellectual actions transparent for inspection, then the editorial process, construed as the sustained process of making primary sources intellectually accessible, rises to the fore. The most brilliant hypotheses and argumentation only assume their full value insofar as any human being can drill down behind the exposition and into the evidence.

For me this goal has a very clear consequence. Publishing in conventional journal articles that assume specialized knowledge and that are legally restricted behind subscription gateways has much less appeal than it did when I began my life as a professional scholar a generation ago.20 More generally, I act on the assumption that all human beings should have access to the evidence behind any proposition about their shared cultural heritage. For me, this view is as arbitrary and natural as the idea of universal education. Access has a physical dimension—we cannot make every site and every artifact physically accessible but we can do far more with their digital counterparts. But access also has an intellectual dimension—we need to provide people with the tools that they need to think productively about what they see. To realize physical and intellectual access, we need rights regimes that allow the digital surrogates for human cultural heritage to flow freely and instantaneously back and forth between humans and machines.21

From this perspective, the most important among thousands of books published in Classics in the first decade of the twenty-first century was Christopher Blackwell’s Demos, an e-book that earned its author tenure.22 This book not only provides a survey of Athenian Democracy and its underlying institutions but also systematically provides machine-actionable links between its exposition and open-access versions of relevant primary sources in Greek and English. These links include not only simple citations but also services such as dynamic maps and searches for keywords. In the history of intellectual life this act was as profound as it was simple. Demos was not the first such publication. In the 1980s, Thomas Martin, one of the founders of the Perseus Project, created an Overview of Greek Culture23 that appeared both as a print book published by a university press and as an electronic publication that contained machine-actionable links to primary sources and to dynamic services. Christopher Blackwell was the first to create such a publication as an independent project and to hazard his career upon the experiment.

In my own work, I can point to at least one major practical consequence of this larger goal. There are clear benefits to working with the most up-to-date editions of Greek and Latin source materials, but if we work with primary sources that are legally entailed, whether by contract or copyright law, we bear the cost of losing open access and open content. Open access in this case designates the ability to make a source freely available and to allow anyone without restriction to examine the primary sources on which our work is based. The fact that many of these sources may not have accompanying translations in a language familiar to a particular audience is a subject to which I will return.

I use open content to designate the right for third parties to modify, repurpose and redistribute derivative works from original digital editions freely. Even if we concern ourselves only with professional colleagues who have privileged access to commercial databases, we cannot conduct emerging scholarship without the ability to create derivative works. This includes the production not only of new editions but also of scholarship that augments digital corpora. The Hestia Project24 in the UK, for example, took the geo-coding within the Perseus digital edition of Herodotus as a starting point, augmenting and correcting the automatically generated information from Perseus and creating a new database with which to study the geographic relations within Herodotus.25 They were able to begin this project without formal permissions because the content was available under a Creative Commons license26 and they can now redistribute the results of their work freely. Among the most important fields for much emerging humanities research—certainly for those of us who work with language—are corpus and computational linguistics.27 The operative model is not the entrepreneurial Dickens seeking copyright protection to make money. Historical linguistic sources must be defended as data sets that must circulate as freely as their counterparts in environmental science or astrophysics if our research is to realize its full potential.

In Classical Studies, at least, the benefits of open access and content have, in the view of those most actively working with digital corpora as corpora (as opposed to giant virtual indices), outweighed the benefits of more recent but restricted editions.28 We have chosen to base our corpora upon the best public domain editions. Where later editions contain information that affects debate, members of the community can identify these passages and add their own annotations. If rights holders assume a more pragmatic posture, we can align new editions, even when we depend upon Greek and Latin text generated by OCR, with earlier curated, public domain editions.29

Once we shift from publications that are static in form and unchanging by legal restrictions and into a world of versioned, dynamic linguistic data, then our textual sources become living entities that can evolve. Their current state is only a single datapoint. An edition that provides demonstrably superior information today is strategically inferior to an edition that can improve over time. If members of the community feel that the editions need to be improved, they can create their own versions and/or annotate those that exist.

Classics

As a Classicist, I seek to advance the role that Greco-Roman antiquity can play in the intellectual life of humanity. Several practical consequences emerge if we make this assumption.

First, we need to provide access to the full material and textual record so that we can understand the ancient world properly. This is hardly a new idea: such giants in the foundation of modern Classical Studies as Friedrich Wolf (1759-1824) and August Boeckh (1785-1867) asserted that even the student of Greek and Latin language must pursue the totius antiquitatis cognitio, a knowledge of antiquity in its full range and without compromise.30 Nor is this insistence on a comprehensive view limited only to philologists: Robin Osborne and Susan Alcock assert in the introduction to a collection of essays entitled Classical Archaeology that they seek to create classical archaeologists who “first and foremost…will use the texts along with material archaeology, offering both a context for archaeology and an indication of areas of material culture ideologically understated or repressed.”31

It is one thing to assert the totalitätsideal that many great scholars have espoused. Creating an environment that would support this overwhelming task and that would enable researchers to draw upon more than a minute subset of the data from the material and textual record is a daunting task—but it is possible to achieve quite a bit over time. The major challenges are not logistical or financial—though these are both daunting—but social. We simply have to adapt our priorities and, even if we ourselves continue our traditional research, support those efforts needed, at the least, to support a new generation of research and, if we have a broader view, to enable Greco-Roman culture—not just literature or archaeology—to play the broadest possible role in intellectual life.

In the first decade of its work (c. 1987-1997), my research group developed representative digital collections on Greco-Roman culture, including a substantial amount of original photography and documentation that remains intellectually significant today.32 As digital photography, GPS and GIS systems, CAD, and other technologies became ubiquitous, we shifted our focus back to the textual record, but the early collections within Perseus remain large enough to illustrate the many challenges and to provide us with intellectual hooks through which to connect with other collections that have begun to emerge online. A new generation of digitally adroit scholars such as Thomas Elliott of the Institute for the Study of the Ancient World (ISAW) at New York University,33 Bruce Hartzler of the American School of Classical Studies at Athens,34 and Sebastian Heath of the American Numismatic Society35 have taken the lead in creating archaeological resources that are openly accessible and transparently interoperable.

To further this larger goal, my own group has for years collaborated with the German Archaeological Institute (DAI) and Arachne,36 the Cologne-based central object-database of the DAI.37 We have exchanged extended visits between members of our teams and numerous visits between Boston and Berlin. For us, integrating the collections within Perseus with this much larger and growing library of archaeological data is a very high priority.38 Perseus contains extensive information about thousands of objects and tens of thousands of images—much smaller holdings than those in Arachne but large enough to offer a bridge not just between our collections but between increasing archaeological holdings and increasingly sophisticated infrastructure for the textual record.

Study of the Greco-Roman world includes not only the full textual and material record but also the cultures from which it emerged (e.g., the influence of Near Eastern and Egyptian culture), with which it interacted (e.g., Persia, India, and even China), and to whose cultural heritage it directly contributed (e.g., not only the West but the vast cultural swath from Rabat on the Atlantic to Kandahar, one of the Alexandrias that Alexander founded).

There are practical consequences to integrating both the textual and material records from many different cultures.39 We need to be serious about interoperability and not simply endorse it in the abstract. Interoperability involves face-to-face meetings and conversations, learning about the varying needs and opportunities of disparate communities, and pushing beyond surface difference to commonalities that are not at all obvious. It also means taking the time to understand very different scholarly cultures and even supporting the development of linguistic training—we need classicists proficient in Mandarin and Arabic as well as English, French, German, and Italian.

The consequences for us are significant. Collaboration with the DAI is particularly promising because it supports institutes and research projects around the world and must address archaeology on a global scale. We cannot, however, simply work with colleagues in Western Europe but need to develop on-going scholarly and personal relationships with scholars beyond the dominant networks of scholarly exchange from the twentieth century.

In our own work, we collaborate with colleagues in Cairo, Zagreb, and China. We need not only to deepen these existing collaborations but also to reach out to regions such as South America and India. This involves a great deal of travel and a constant need to expand our own linguistic skills.

Philology

As a philologist, I have a particular responsibility to classical Greek and Latin. From a pragmatic perspective, my goal is to help Greek and Latin sources play the fullest possible role in the intellectual life of humanity—to help as many people as possible to think as creatively as possible about as many Greek and Latin sources as possible.40 This implies access, physical and intellectual, to the widest possible audience today and preservation for audiences of the future.

The surviving corpus of Greek and Latin sources includes, of course, the traditional corpora of texts that survive through literary transmission or on objects such as stone, papyrus, metal, and bone that have survived directly from antiquity. In a digital age, we have access already to more Greek and far more Latin sources produced after the classical period (e.g., 500 CE in the West, 600 CE in the East). The Thesaurus Linguae Latinae (TLL)41 focuses upon an archive of ten million words from ancient Latin authors recorded on paper slips.42 Johannes Ramminger, a TLL researcher, had in January 2008 already collected more than 200 million words of early modern Latin in digital form.43 David A. Smith, a computer scientist from the University of Massachusetts at Amherst, identified approximately 13,000 books whose cataloguing data listed them as being in Latin from among the approximately 1.5 million books available from the Internet Archive.44 While some of these books are multiple editions of classical authors and some are editions of Greek authors with Latin introductions, most of the 1.7 billion words in these books are probably unique Latin texts and thus represent a corpus almost one hundred times larger than the classical corpus. Google has, at present, digitized twelve million books, so the amount of Greek and Latin available in digital form is only going to increase.45

Some sources will always have greater appeal than others—Vergil will always attract more attention than your random nineteenth-century dissertation in Latin—but we must also recognize that much, indeed most, of our new interest in Greek and Latin may emerge from improved access to a rich body of sources that were physically accessible in only a few research libraries and archives. Even when sources outside of the traditional canon were physically available, their intellectual context and their idiosyncratic language made them intellectually inaccessible to all but those few readers who had in fact developed an advanced knowledge of Greek and Latin.

Students of Greek and Latin need to think about breadth as well as depth. We can treat our core authors as corpora on which we can continue to lavish an extraordinary amount of labor. The corpora of many of the most heavily studied authors are relatively small: the Homeric Epics, Greek Drama, Catullus, Vergil, and Horace constitute approximately one million words. Authors such as Plato (600,000 words), Aristotle (1.1 million words), Cicero (1.1 million words), and Livy (570,000 words) are, however, considerably larger and do not lend themselves so readily to the same intensive methods that we can apply to the 35,000 words of Aeschylus. The TLL began work in 1894 on its ten million words from the most heavily studied Latin authors. With a staff of twenty Latinists, the TLL has completed approximately two-thirds of its task. As we move beyond the classical corpus into much larger collections where we lack the editions, commentaries, indices, specialized lexica and other scholarly infrastructure available for the heavily studied authors, we need different methods.

Digital philology and, indeed, much of modern editing, must depend upon two new disciplines with deceptively similar names, often lumped together but complementary in principle and separate in practice: computational and corpus linguistics.

We need computational methods as we confront tasks that overwhelm manual methods.46 As we confront the challenge of editing billions of words, we need editors who can apply automated methods, measure the results by analyzing randomly sampled subsets of the data, and provide large bodies of textual data, of known accuracy, useful for most automated purposes and ready for others to refine, in whole or in part, as time allows and interest dictates.
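As a minimal sketch of what "known accuracy" could mean in practice, the following Python fragment draws a simple random sample of token positions for manual checking and turns the reviewer's tally into an accuracy estimate with a Wilson score interval. The function names, the corpus size, the sample size of 500, and the tally of 480 correct tokens are all invented for illustration; none of them comes from the projects discussed here.

```python
import math
import random


def draw_sample(token_ids, sample_size, seed=0):
    """Pick a simple random sample of token positions for manual checking.

    token_ids must be a sequence (a list or a range); the seed makes the
    sample reproducible so that a second reviewer can check the same tokens.
    """
    rng = random.Random(seed)
    return rng.sample(token_ids, sample_size)


def wilson_interval(correct, checked, z=1.96):
    """Wilson score interval for the share of correct tokens.

    z = 1.96 corresponds to roughly 95% confidence.
    """
    if checked == 0:
        raise ValueError("no tokens were checked")
    p = correct / checked
    denom = 1 + z * z / checked
    centre = (p + z * z / (2 * checked)) / denom
    half = z * math.sqrt(p * (1 - p) / checked
                         + z * z / (4 * checked * checked)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # Hypothetical corpus of one million automatically transcribed tokens.
    sample = draw_sample(range(1_000_000), 500)
    # Hypothetical result of manual review: 480 of the 500 tokens were correct.
    low, high = wilson_interval(correct=480, checked=500)
    print(f"estimated accuracy {480 / 500:.3f}, interval {low:.3f} to {high:.3f}")
```

The design choice worth noting is that the editor publishes the interval along with the text, so that later users can judge whether the data are accurate enough for their purpose or need further correction.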

But important as computation may be, the most important new discipline for classical philology, and for any philology, is corpus linguistics.47 All students of historical languages are, in some sense, corpus linguists, for they are studying corpora that are fixed—we may recover new papyri and inscriptions, but the surviving linguistic record, whether discovered or not, can never expand.

Editing in Classics

Editors of classical texts have for the most part focused on the challenge of reconstructing an original copy text. All scribes make errors and even relatively small error rates add up in a single manuscript. As texts are copied, they accumulate new errors. In some cases, relatively large texts survive with relatively few obvious errors—the 600,000 words of Plato present relatively few problems. In other cases, much shorter texts can become garbled in transmission, especially when only a few manuscripts or even just one survives—the 35,000 words in the seven surviving plays of Aeschylus are notoriously problematic. When print allowed scholars to publish hundreds and thousands of identical copies, transmission stabilized and scholars set out to undo the damage. Classicists published thousands of editions from the editiones principes of the late fifteenth century through the great systematic editions of the nineteenth and twentieth centuries.48

Scholarship was so successful that classicists in the closing decades of the twentieth century shifted their focus away from editing and commentaries and more fully embraced the monograph-oriented publication culture of English and History. While we do find faculty at leading institutions who create editions and commentaries, most of these either work on less developed areas (e.g., Byzantine studies or less commonly read classical authors), or in areas such as papyrology, where new material needs to be edited and published, or came of age in the 1960s and 1970s. Thus a 2008 panel of the American Philological Association (APA) on “Critical Editions in the 21st century,” organized by Cynthia Damon, included scholars who worked on papyri, Medieval Latin, Greek science, and Byzantine literature. James McKeown, an editor of Ovid, and Donald Mastronarde, an editor of Euripides, received their BAs in 1974 and 1971 respectively.

Classicists do not appreciate the amount of scholarly labor at their disposal. The Bryn Mawr Classical Review (BMCR)49 included 3,008 reviews in the five years beginning with 2005 and running through 2010—approximately 600 reviews each year.50 Clearly, there will be some hastily produced works while others will reflect years of study. We might assume, conservatively, that the average item reflects one year of direct labor. For every dollar that we pay a faculty member, we need to assume a dollar for benefits (e.g., health care) and the complex overhead of running an institution of higher learning. If we assume $50,000 as a typical salary, then the cost of a year of academic labor is around $100,000. In other words, even if we ignore all the articles that classicists write and only consider those books that warrant reviews in BMCR, the 600 or so reviews each year represent, by a conservative estimate, more than 50 million dollars of labor. Of 704 entries counted in the 2009 BMCR volume, a quick survey turned up only 28 for editions and commentaries—or around four percent of the total. Another preliminary survey reinforced this figure. An analysis of 100 CVs (from a total of about 180) for a position in Philology advertised in the fall of 2009 revealed only three with an interest in producing editions and commentaries. In effect, classicists as a group have made a cost/benefit decision to allocate less than five percent of their labor to the production of editions and commentaries. Improving the print infrastructure for the fifty million words of Greek and Latin that survive in manuscript transmission through about 500 CE was not a high priority—the benefits were no longer great enough to justify much scholarly labor. Rather, we invested our energy in interpretive articles and monographs.

A number of factors have altered the costs and benefits of editing within the humanities. Each project participating in this workshop reflects, in varying ways, the subsequent recalculations of how to invest scholarly labor.

1) Our editions can reach a larger audience. First and most obviously, we can provide much better physical access to editions and commentaries, disseminating across space and preserving them over time.51 We are now—and have for years been—able to deliver the results of our work to a global audience—reaching hundreds of millions rather than thousands of locations. We also have in our institutional repositories a mechanism to preserve these editions over long periods of time—certainly providing far longer access than the ephemeral print runs common in traditional publication. The guarantor is not the medium—paper vs. digital—but the reorganization of our library priorities.52 We have the resources already in our library collection budgets to pay for dissemination and preservation. The questions are political and social rather than financial or technical.

2) A larger audience can make use of our editions. As we can see with the Advanced Papyrological Information System (APIS)53 and with the Homer Multitext Project,54 digital representations in 2D and 3D of manuscripts, papyri, inscriptions and other written artifacts can provide better visual data than any print facsimile edition could match.55 We can, as Peter Robinson has long demonstrated, encode the textual data in machine-actionable forms that allow us to analyze and visualize variants with greater precision.56 We can link Greek and Latin editions to modern language translations, either produced for the edition or already published elsewhere. We can add as much explanatory material as we have the time to produce and as we consider useful, including visual and textual explanations, static images, and dynamic visualizations.
We can align our primary sources with the material record, not simply as a source for illustrations but to provide contrasting views of the lived world on which the textual and material record shed light.

3) Systematic annotation transforms the editorial process, redefines what readers can expect, and enables editions to interact directly with much larger collections. Our ability to annotate our primary sources has so far outstripped what we could do in print that annotation has evolved into something qualitatively different. We have long had indices of places, but efforts such as the Hestia Project have created machine-actionable databases with which to analyze the spatial relations implicit in our sources. For millennia, students of Greek and Latin have patiently answered such questions as: “What is the main verb?” “What is the subject?” and “What noun does this adjective modify?” When we systematically record the syntactic dependencies into a database and create treebanks of syntactically-analyzed sentences, we can convert impressionistic statements such as “common in tragedy” or “rare in late prose” into statements that are quantifiable and transparent, because we can call up the treebank and examine the decisions behind the numbers.57 We have always embedded our syntactic interpretations in our print editions: each comma and period reflects our interpretation of the language. Now we can make those assumptions explicit and then use them to support new questions and research.58 And, at the same time, the treebanks that support large-scale linguistic analysis can help the reader understand a complex sentence in Plato and thus expand the role that sentence can play in intellectual life. Here, as so often, we find the automated system and human observer interacting symbiotically, with each driving the other.
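A minimal sketch may make the move from impression to inspectable number concrete. It assumes a toy treebank in which every word records its form, the position of its head, and a dependency label; the three invented Latin sentences, the genre tags, the labels (`SBJ`, `OBJ`, `ATR`, `PRED`), and the function `relation_rate` are all illustrative assumptions and do not reproduce the format of any actual treebank.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    form: str      # the word as it appears in the text
    head: int      # 1-based position of the governing word; 0 marks the root
    relation: str  # dependency label (hypothetical tagset)


@dataclass(frozen=True)
class Sentence:
    genre: str     # invented genre tag, for illustration only
    citation: str  # placeholder citation
    tokens: tuple


# Invented sentences; a real treebank would hold curated analyses of real texts.
TREEBANK = [
    Sentence("tragedy", "toy-1", (
        Token("puella", 3, "SBJ"), Token("rosam", 3, "OBJ"), Token("amat", 0, "PRED"))),
    Sentence("tragedy", "toy-2", (
        Token("poeta", 3, "SBJ"), Token("bonus", 1, "ATR"), Token("cantat", 0, "PRED"))),
    Sentence("prose", "toy-3", (
        Token("rex", 4, "SBJ"), Token("urbem", 4, "OBJ"),
        Token("magnam", 2, "ATR"), Token("videt", 0, "PRED"))),
]


def relation_rate(treebank, genre, relation):
    """Count one relation within one genre and keep the evidence behind the count."""
    evidence = []
    total = 0
    for sentence in treebank:
        if sentence.genre != genre:
            continue
        for token in sentence.tokens:
            total += 1
            if token.relation == relation:
                evidence.append((sentence.citation, token.form))
    return len(evidence), total, evidence


matches, total, evidence = relation_rate(TREEBANK, "tragedy", "ATR")
print(f"ATR in tragedy: {matches} of {total} tokens; evidence: {evidence}")
```

The point of returning the evidence alongside the count is the transparency described above: a reader who doubts the number can go back to each annotated word and dispute the individual decision.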
4) We need to shift from lone editors and monumental editions to editors as … editors, who coordinate contributions from many sources and oversee living editions.59 We have vast amounts of work to be done as we create new generations of editions. Automated methods can detect patterns across the billions of words and hundreds of languages, but even automated methods often depend upon carefully curated data. We need diplomatic XML transcriptions of manuscripts and papyri. We need translations into modern languages, not only for human readers but also to support such automated methods as parallel text analysis. We need curated syntactic analysis for our treebanks. We have an endless supply of projects that are challenging but accessible to our students and that will increase in complexity as they move through their careers. They may start by analyzing sentences or distinguishing one Alexander from another but they can also begin to analyze linguistic, stylistic, intellectual, historical and other questions. We will, and must, depend upon a generation of fundamental scholarly work produced by our students, published online, linked to the passages on which they bear, and preserved indefinitely.

5) Digital editing lowers barriers to entry and requires a more democratized and participatory intellectual culture. The decentralization of editing is a necessity that has further consequences. A fall 2009 listing for a tenure track job welcomed “candidates who can support contributions and original research by undergraduates as well as MA students within the field of Classics.” This seemingly innocuous term flummoxed almost all of the 180 or so applicants to this position; most simply ignored it. Some had creative ideas but these were ideas they had developed themselves.
The intellectual culture of Classical studies assumes a long apprenticeship model, with advanced graduate students working their way toward a point where they can publish articles in specialist journals and books in academic presses.60 In a culture of digital editing, our students can begin contributing in tangible ways as soon as they can read Greek: first-year Greek students are already able to distinguish text from commentary in the digitized Venetus A manuscript of Homer.61 Intermediate students of Greek and Latin offer their own analyses of individual sentences for the Greek and Latin treebanks, contributions that are then compared against each other and then added to a public database, with the names of each contributing student attached to each sentence.62 These contributions can develop seamlessly into undergraduate and MA theses of real value and immediate use. When our students publish previously unpublished material or contribute to knowledge bases, we find ourselves in a participatory culture of active learning. Pale clichés about citizenship and democratization suddenly become tangible. There is nothing innovative in having undergraduates contribute to and then conduct research within a field: promising students in the sciences, for example, regularly begin working in laboratories, taking measurements or conducting technical procedures, and then develop experiments of their own. Classics is, or should be, a demanding field, but no more so than the sciences. In the second half of the twentieth century, we developed courses and degrees in classical studies that removed or minimized the burden of mastering Greek and Latin. We are now in a position to create another path through the field, one that can be as challenging as any curriculum we have offered in the past but that also engages our students as collaborators.

6) The emerging digital environment can potentially allow editors to accomplish more with the same degree of effort.
Word processing, high-resolution digital images, and email alone have changed the way in which editors can carry out their traditional tasks. The millions of dollars and euros invested in digital editing should have made traditional editorial practices faster and less expensive. The problem is, of course, that digital tools change what is possible and challenge us to redefine our tasks, making older models inappropriate: feature creep from one perspective, renaissance from another. Adding syntactic analyses for every word in a text, for example, should not significantly increase the overall editorial task, because the real effort lies, or should lie, in thinking about each and every word in the text. If anything, editors should spend more time thinking about different interpretations for each sentence, knowing that they can publish these alternate interpretations in a form that can be visualized or analyzed. Beyond simple calculations of time and money and the new questions we can ask, there is a chance that new scholarly instruments such as treebanks may allow us to make progress with thorny questions of reconstructing even our best-studied texts. We will certainly be able to frame arguments about various readings and conjectures when we have new and extended linguistic comparanda. Even if we avoid the idea of progress in reconstructing classical texts (and in this case, progress is a plausible category because in many cases we are trying to reconstruct a single source), we can at least provide new evidence with which to advance our discussions.

Three Dimensions of Change

The previous section described changes in the cost-benefit calculations that we as scholars make that bear upon the role of editing. This section shifts the focus and examines three dimensions by which to evaluate the current and emerging impact of digital scholarship.
While I have selected several projects as exemplary for particular advances, I do not mean to imply that any one project only represents a single category.

1) Advancing established scholarship. Digital environments only exert long-term change if they first address the well-understood problems and aspirations of scholarship.63 Roger Bagnall’s piece on the rise of digital papyrology documents what is arguably the biggest success story within the digital humanities. Papyrologists from different institutions and nations have collaborated over a period of years to transform their core materials into shared digital collections and to create a functional digital environment for the shared editing of new texts.64 Several factors are at play. Papyrologists have new material that they need to publish. The traditional audience is not huge and thus there are fewer problems with rights holders restricting open-access publication. Most of all, there is what Ulrich Wilcken (1862-1944) long ago called the amicitia papyrologorum,65 a distinctively and consciously collegial relationship among papyrologists and a determination to support each other and their field. The amicitia papyrologorum, imperfect and intermittent as it may be, is nevertheless an extraordinary achievement and has conferred upon papyrology a competitive advantage in the fierce Darwinian struggle of academic disciplines. The rise of Digital Papyrology both reflects and increases this advantage. Technological progress begins with our attitudes towards our colleagues and depends upon our ability to place some shared good above our quarrels and personal advantages.

2) Enabling new research. Every project represented in this workshop is expanding the frontiers of research beyond what was possible in print.
I would highlight a project now getting under way that attacks a very traditional goal but is proceeding in a way that is only possible through the combination of advanced automated methods, very powerful computing and very large collections. The history of Latin is hardly a new subject, but we can place this project on a radically new footing. David A. Smith, who holds a Ph.D. in Computer Science and a B.A. in Classics, is one of the principal investigators for a four-year, $2.5 million project funded by the National Science Foundation (NSF).66 He has downloaded more than 1.5 million books from the Internet Archive and from these has identified twelve thousand whose language is listed as being Latin. The resulting collection contains approximately 1.8 billion words of Latin, almost two hundred times as much Latin as the ten-million-word database on which the TLL has labored for more than a century. If and when we should have access to the twelve million books that Google has already scanned, the collection of available Latin will only increase. Library metadata is, however, rough: we cannot, for example, distinguish the nineteenth-century Latin introduction from an accompanying edition of Cicero, and many books are cited by the date of the published edition (e.g., 1879 in Paris) rather than that of their original creation (e.g., 1623 in Leiden). There are multiple copies of the same author (e.g., ten editions of Horace). Organizing this rough assemblage will provide plenty of opportunity for advanced automated methods. Nevertheless, we now already have in hand the raw materials with which to rewrite the history of the Latin language over the course of two thousand years. Automated analysis with systematic sampling to evaluate error rates redefines the way in which we can conceptualize new research in this subject.
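The phrase “systematic sampling to evaluate error rates” can be illustrated with a short sketch. It assumes a collection of automatically labeled records, a fixed sampling step, and a human check recorded for each sampled item; the record layout, the field names `auto_label` and `checked_label`, and both function names are hypothetical, and the margin uses a simple normal approximation rather than any method the project itself is known to use.

```python
import math


def systematic_sample(items, step, start=0):
    """Take every `step`-th item, beginning at index `start`."""
    if step < 1:
        raise ValueError("step must be at least 1")
    return items[start::step]


def estimate_error_rate(sample, is_correct):
    """Return (rate, margin): the share of sampled items judged wrong
    and a rough 95% margin of error (normal approximation)."""
    n = len(sample)
    if n == 0:
        raise ValueError("sample is empty")
    errors = sum(1 for item in sample if not is_correct(item))
    rate = errors / n
    margin = 1.96 * math.sqrt(rate * (1 - rate) / n)
    return rate, margin


# Invented data: 1,000 records labeled "lat" by an automated process.
# Every seventh record is deliberately given a conflicting hand-checked label.
records = [
    {"id": i, "auto_label": "lat", "checked_label": "lat" if i % 7 else "fra"}
    for i in range(1000)
]

sample = systematic_sample(records, step=10)
rate, margin = estimate_error_rate(
    sample, lambda r: r["auto_label"] == r["checked_label"]
)
print(f"checked {len(sample)} records: error rate {rate:.2f} +/- {margin:.2f}")
```

In practice only the sampled records would need a human check, which is what makes this approach feasible at the scale of billions of words. A fixed step can mislead if errors recur at the same interval, so a real study would want to vary the starting point or the step.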

3) Redefining who can contribute to scholarship. In this regard, Wikipedia remains an historic phenomenon because it has demonstrated a new mode of intellectual production, one that this philologist thought was at best implausible until Roy Rosenzweig confronted my prejudices with evidence and analysis.67 Classicists had developed their own community-driven project with the Suda Online (SOL),68 which has so far produced English translations for more than 27,000 entries from a large tenth-century Byzantine Greek historical encyclopedia of the ancient world. The SOL, however, mobilized professional scholars and included a fairly traditional editorial process.69 The most important project for Classical scholarship in the United States may be the Homer Multitext, because this project demonstrated not only what undergraduates could do in a very complex project but also the effect of participation in this project on their work and on their view of classics. The Homer Multitext defied my own personal expectations as to what undergraduates would do or would find interesting.

Digital editing lowers barriers to access not only for our students but also for students of Greek and Latin beyond the traditional English-speaking and Western European centers of Classical studies. The University of Cairo maintains a well-established and flourishing Classics department: few North American or European institutions can, for example, boast an intermediate Classical Greek class with more than one hundred students. Instruction takes place in Arabic and the departmental website has no translation in English or any other European language. The Classicists at Cairo and elsewhere in the Arabic-speaking world are familiar with scholarship in a range of modern languages, but few indeed of us in Europe or North America are able to read their Arabic publications.

In print culture, Arabic-speaking scholars of the Greco-Roman world had little access to, and less visibility within, the largely English, French, German and Italian publication space of classical scholarship. In a networked world where such knowledge bases as treebanks emerge as pre-eminent channels within which to publish interpretations of literary text, the first language of the scholar becomes less important. We are better positioned to establish new intellectual and collegial relationships across challenging barriers of space, language and culture.

If we are developing a digital edition with morpho-syntactic analyses, disambiguated people and places, and other categories of machine-actionable annotation, our knowledge of Greek and Latin and our understanding of Greco-Roman culture are the most important factors. A diagram of a sentence from Aeschylus looks the same whether the scholar who produced that analysis thinks in English, Arabic or Chinese. The problem of language emerges as we articulate our reasons for choosing that interpretation of the sentence—we may find ourselves constructing hypertextual arguments that emerge, as much as possible, from the way we select and structure evidence. We also need to address the challenge of writing for machine translation. This would benefit from an authoring environment that would prompt us to clarify salient ambiguities as we write (e.g., does “case” refer to a grammatical category or a criminal investigation?).
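As a rough illustration of the kind of authoring aid imagined here, the following sketch scans a draft for words that a glossary marks as ambiguous and returns the senses an author could be asked to choose between. The glossary entries, the function name `find_ambiguities`, and the simple word-matching rule are assumptions made for the example; a working environment would need real lexical resources and would have to handle inflected forms.

```python
import re

# Hypothetical glossary: ambiguous English terms and the senses an author might intend.
AMBIGUOUS_TERMS = {
    "case": ["grammatical case", "legal or criminal case"],
    "mood": ["grammatical mood", "emotional state"],
}


def find_ambiguities(text, glossary=AMBIGUOUS_TERMS):
    """Return (term, character offset, candidate senses) for each glossary hit."""
    prompts = []
    for match in re.finditer(r"[A-Za-z]+", text):
        word = match.group().lower()
        if word in glossary:
            prompts.append((word, match.start(), glossary[word]))
    return prompts


for term, offset, senses in find_ambiguities("The case of the noun is unclear."):
    print(f"'{term}' at offset {offset}: did you mean {' or '.join(senses)}?")
```

Recording the author's choice next to the word would give a translation system, or a human translator, the same disambiguation that a treebank gives for syntax.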

We have an opportunity to transform the community of scholarship and our ability to stimulate, from within the academy, the broad intellectual life of humanity. For Classics, we have an opportunity to create a global discipline that disseminates knowledge and stimulates debate about Greco-Roman antiquity across the linguistic and cultural barriers that we inherited from print culture.

What is to be done?

A great deal has been done over the past twenty-five years. We have established collections that have evolved over a number of years and long outlived initial grants and enthusiasm. Roger Bagnall, Allison Muri, Greg Nagy, and Kenneth Price have described disciplinary projects with overlapping needs. Alan Burdette, Charles Henry, Paolo D’Iorio, Penelope Kaiserlian, and Todd Presner have reported substantive progress towards an infrastructure that can support scholarly activities today and foster innovation over time. But we have a long way to go. We need to think on a far broader scale and move much further beyond our original disciplines if we are to realize the potential of this new medium. We are still far too fragmented and constrained by our disciplinary perspective or the traditions of publication that we have inherited. Much of what we say echoes what we could have heard a generation ago. All of us need to move beyond first-generation efforts and do a better job of transcending our own disciplinary boundaries.

As a philologist with a particular responsibility for linguistic sources, I pose the following questions:

1) Can we manage the linguistic sources of the contemporary world? This includes thousands of languages, time-based media and overwhelming scale.
2) Can we manage the historical record of human language, extending more than four thousand years into the past and stretching, at minimum, from China and Japan to the Atlantic coast of Europe and Africa? Ultimately, the challenge here becomes scarcity, because, however rich our surviving sources may be, they are finite and imperfect. We cannot conduct experiments with native speakers of Classical Greek.
3) How do our linguistic sources relate to the material record? How well can we integrate the texts of Thucydides’ History of the Peloponnesian War or Tom Sawyer with the very different datasets available for the Greek world of the fifth century BCE and North America in the nineteenth century?

None of us can solve or fully understand any of these three questions. We all have our own research projects and must focus our efforts if we are to make tangible progress. Nevertheless, the digital world has no borders and every digital project can potentially interact with every other. My work on the Greek historian Thucydides should combine in unpredictable and interesting ways with work on the Chinese historian Sima Qian, the North African historian Ibn Chaldun, and the memoirs of Ulysses S. Grant: the more recombinant my work, the better its chance not only of surviving but of evolving long after my contribution has ceased. We all know the cliché that we should think globally and act locally. Whatever we do for the world in which we live, we should think about both the global and the local every day in our scholarly work.

References

Alcock, Susan E. and Robin Osborne. (2007). Classical Archaeology. Malden, Ma: Blackwell Publishers.

Association of Research Libraries (ARL). (2009). The Research Library's Role in Digital Repository Services: Final Report of the ARL Digital Repository Issues Task Force. Technical Report. http://www.arl.org/bm~doc/repository-services-report.pdf.

Arms, William Y. and Ronald L. Larsen. (2007). The Future of Scholarly Communication: Building the Infrastructure for Cyberscholarship. Technical Report. http://www.sis.pitt.edu/~repwkshop/SIS-NSFReport2.pdf.

Babeu, Alison, David Bamman, Gregory Crane, Robert Kummer, and Gabriel Weaver. (2007). “Named Entity Identification and Cyberinfrastructure.” Proceedings of the 11th European Conference on Research and Advanced Technology for Digital Libraries (ECDL 2007), pp. 259-270. http://hdl.handle.net/10427/42681.

Bamman, David and Gregory Crane. (2009). “Computational Linguistics and Classical Lexicography.” Digital Humanities Quarterly, 3 (1), http://www.digitalhumanities.org/dhq/vol/003/1/000033.html.

Bamman, David and Gregory Crane. (2008a). “Building a Dynamic Lexicon from a Digital Library.” Proceedings of the 8th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL 2008), pp. 11-20. http://www.perseus.tufts.edu/~ababeu/fp135-bamman.pdf.

Bamman, David and Gregory Crane. (2008b). “The Logic and Discovery of Textual Allusion.” Proceedings of the Second Workshop on Language Technology for Cultural Heritage Data (LaTeCH 2008). http://hdl.handle.net/10427/42685.

Bamman, David, Francesco Mambrini and Gregory Crane. (2009). “An Ownership Model of Annotation: The Ancient Greek Dependency Treebank.” Proceedings of the Eighth International Workshop on Treebanks and Linguistic Theories (TLT8). http://www.perseus.tufts.edu/~ababeu/tlt8.pdf.

Barker, Elton. (2010). “Repurposing Perseus: the Herodotus Encoded Space-Text-Image Archive (HESTIA).” DFG-Perseus Workshop on Historical Texts. http://www.linguistik.hu-berlin.de/institut/professuren/korpuslinguistik/events-en/nehdfg/pdf/hestia-presentation.

Blackwell, Chris and Thomas R. Martin. (2009). “Technology, Collaboration, and Undergraduate Research.” Digital Humanities Quarterly, 3 (1). http://www.digitalhumanities.org/dhq/vol/3/1/000024.html .

Blanke, Tobias. (2010). “From Tools and Services to e-Infrastructure for the Arts and Humanities.” Production Grids in Asia: Applications, Developments and Global Ties, pp. 117-127.

Bodard, Gabriel. (2008). “The Inscriptions of Aphrodisias as Electronic Publication: a User's Perspective and a Proposed Paradigm.” Digital Medievalist, http://www.digitalmedievalist.org/journal/4/bodard/.

Bodard, Gabriel and Juan Garcés. (2009). “Open Source Critical Editions: A Rationale.” In Text Editing, Print and the Digital World. Surrey, England: Ashgate Publishing, pp. 83-98.

Boeckh, August. (1966). “die Erkentniss des Alterthums in seinem ganzen Umfange” in Enzyklopädie und Methodenlehre der philologischen Wissenschaft (edited by Ernst Bratuscheck). Darmstadt.

Boeckh, August. (1968). On Interpretation and Criticism (translated by John Paul Pritchard). Oklahoma: University of Oklahoma Press.

Boschetti, Federico. (2009). “Digital Aeschylus—Breadth and Depth Issues in Digital Libraries.” Workshop on Advanced Technologies for Digital Libraries 2009 (AT4DL 2009). September 2009.

Boschetti, Federico, Matteo Romanello, Alison Babeu, David Bamman, and Gregory Crane. (2009). “Improving OCR Accuracy for Classical Critical Editions.” Proceedings of the 13th European Conference on Research and Advanced Technology for Digital Libraries (ECDL 2009), pp. 156-167. http://www.perseus.tufts.edu/~ababeu/ecdl2009-preprint.pdf.

Calder III, William M. (1991). “How Did Ulrich Von Wilamowitz-Moellendorff Read a Text?” The Classical Journal, 86 (4), (Apr. - May, 1991), pp. 344-352.

Chen, Jiangping, Yuhua Li, and Gang Li. (2006). “The Use of Intelligent Information Access Technologies in Digital Libraries.” Web Information Systems—WISE 2006 Workshops, pp. 239-250.

Clement, Tanya E. (2008). “‘A Thing Not Beginning and Not Ending’: Using Digital Tools to Distant-Read Gertrude Stein’s The Making of Americans.” Literary & Linguistic Computing, 23 (3), pp. 361-381.

Crane, Gregory, Alison Babeu, and David Bamman. (2007). “eScience and the Humanities.” International Journal on Digital Libraries, 7 (1), pp. 117-122. http://hdl.handle.net/10427/42690.

Crane, Gregory, Alison Babeu, David Bamman, Thomas Breuel, Lisa Cerrato, Daniel Deckers, Anke Lüdeling, David Mimno, Rashmi Singhal, David A. Smith, and Amir Zeldes. (2009a). “Classics in the Million Book Library.” Digital Humanities Quarterly, 3 (1). http://www.digitalhumanities.org/dhq/vol/003/1/000034.html.

Crane, Gregory, Alison Babeu, David Bamman, Lisa Cerrato, and Rashmi Singhal. (2009b). “Tools for Thinking: ePhilology and Cyberinfrastructure.” Working Together or Apart: Promoting the Next Generation of Digital Scholarship: Report of a Workshop Cosponsored by the Council on Library and Information Resources and The National Endowment for the Humanities. Washington, D.C.: Council on Library and Information Resources and National Endowment for the Humanities, 2009-03. http://www.clir.org/activities/digitalscholar2/crane11_11.pdf.

Crane, Gregory, David Bamman, Lisa Cerrato, Alison Jones, David Mimno, Adrian Packel, David Sculley, and Gabriel Weaver. (2006). “Beyond Digital Incunabula: Modeling the Next Generation of Digital Libraries.” Proceedings of the 10th European Conference on Digital Libraries (ECDL 2006), pp. 353-366. http://hdl.handle.net/10427/36131.

Crane, Gregory and Chris Blackwell. (2009). “Conclusion: Cyberinfrastructure, the Scaife Digital Library and Classics in a Digital Age.” Digital Humanities Quarterly, 3 (1). http://www.digitalhumanities.org/dhq/vol/3/1/000035.html.

Crane, Gregory, Brent Seales, and Melissa Terras. (2009). “Cyberinfrastructure for Classical Philology.” Digital Humanities Quarterly, 3 (1). http://www.digitalhumanities.org/dhq/vol/3/1/000023.html#.

Deckers, Daniel, Lutz Koll, and Cristina Vertan. (2009). “Representation and Encoding of Heterogeneous Data in a Web Based Research Environment for Manuscript and Textual Studies.” Kodikologie und Paläographie im digitalen Zeitalter - Codicology and Palaeography in the Digital Age. http://kups.ub.uni-koeln.de/volltexte/2009/2962/.

Dué, Casey and Mary Ebbott. (2009). “Digital Criticism: Editorial Standards for the Homer Multitext.” Digital Humanities Quarterly, 3 (1). http://www.digitalhumanities.org/dhq/vol/3/1/000029/000029.html.

Flaten, Arne R. (2009). “The Ashes2Art Project: Digital Models of Fourth-Century BCE Delphi, Greece.” Visual Resources: An International Journal of Documentation, 25 (4), pp. 345-362.

Hanson, Ann E. (2001). “Papyrology: Minding Other People's Business.” Transactions of the American Philological Association, 131, 297-313.

Hillen, Michael. (2007). “Finishing the TLL in the Digital Age: Opportunities, Challenges, Risks.” (Translated by Kathleen Coleman). Transactions of the American Philological Association, 137, pp. 491-495.

Jackson, Mike, Mario Antonioletti, Alastair Hume, Tobias Blanke, Gabriel Bodard, Mark Hedges, and Shrija Rajbhandari. (2009). “Building Bridges between Islands of Data - An Investigation into Distributed Data Management in the Humanities.” e-Science ’09: Fifth IEEE International Conference on e-Science, 2009, pp. 33-39.

Kirschenbaum, Matthew. (2007). “The Remaking of Reading: Data Mining and the Digital Humanities.” NGDM 07: National Science Foundation Symposium on Next Generation of Data Mining and Cyber-Enabled Discovery for Innovation. http://www.cs.umbc.edu/~hillol/NGDM07/abstracts/talks/MKirschenbaum.pdf.

Kummer, Robert. (2006). “Integrating Data from The Perseus Project and Arachne using the CIDOC CRM: An Examination from a Software Developer's Perspective.” Exploring the Limits of Global Models for Integration and Use of Historical and Scientific Information: ICS-FORTH Workshop, Heraklion, Crete: ICS-Forth, 2006. http://www.perseus.tufts.edu/~rokummer/KummerCIDOC2006.pdf.

Lehmberg, Timm, Georg Rehm, Andreas Witt, and Felix. (2008). “Digital Text Collections, Linguistic Research Data, and Mashups: Notes on the Legal Situation.” Library Trends, 57 (1), pp. 52-72.

Lüdeling, Anke and Amir Zeldes. (2008). “Three Views on Corpora: Corpus Linguistics, Literary Computing, and Computational Linguistics.” Jahrbuch für Computerphilologie, 9, pp. 49-178.

Lynch, Clifford. (2006). “Open Computation: Beyond Human-Reader-Centric Views of Scholarly Literatures.” Open Access: Key Strategic, Technical and Economic Aspects, pp. 106-110. http://www.cni.org/staff/cliffpubs/OpenComputation.htm.

Mahoney, Anne. (2009). “Tachypaedia Byzantina: The Suda On Line as Collaborative Encyclopedia.” Digital Humanities Quarterly, 3 (1). http://www.digitalhumanities.org/dhq/vol/3/1/000025.html#.

Monella, Paolo. (2008). “Towards a Digital Model to Edit the Different Paratextuality Levels within a Textual Tradition.” Digital Medievalist, 4, http://www.digitalmedievalist.org/journal/4/monella/.

O’Donnell, Daniel Paul. (2010). “Different Strokes, Same Folk: Designing the Multi-Form Digital Edition.” Literature Compass, 7 (2), pp. 110-119.

Pasanek, Brad and D. Sculley. (2008). “Mining Millions of Metaphors.” Literary and Linguistic Computing, 23 (3), pp. 345-360.

Price, Kenneth M. (2009). “Edition, Project, Database, Archive, Thematic Research Collection: What's in a Name?” Digital Humanities Quarterly, 3 (3). http://digitalhumanities.org/dhq/vol/3/3/000053/000053.html.

Pritchard, David. (2008). “Working Papers, Open Access, and Cyber-infrastructure in Classical Studies.” Literary and Linguistic Computing 23 (2), pp. 149-162. http://ses.library.usyd.edu.au/handle/2123/2226 .

Pybus, John and Ruth Kirkham. (2009). “Experiences of User Involvement in the Construction of a Virtual Research Environment for the Humanities.” 5th IEEE International Conference on E-Science Workshops, pp.135-137.

Rausing, Lisbet. (2010). “Toward a New Alexandria: Imagining the Future of Libraries.” The New Republic (March 2010). http://www.tnr.com/print/article/books-and-arts/toward-new-alexandria.

Riva, Massimo and Vika Zafrin. (2005). “Extending the Text: Digital Editions and the Hypertextual Paradigm.” HYPERTEXT '05: Proceedings of the sixteenth ACM conference on Hypertext and Hypermedia, pp. 205-207.

Robinson, Peter. (2010). “Editing Without Walls.” Literature Compass, 7 (2), pp. 57-61.

Robinson, Peter. (2009). “Towards a Scholarly Editing System for the Next Decades.” Sanskrit Computational Linguistics, pp. 346-357.

Rosenzweig, Roy. (2006). “Can History Be Open Source? Wikipedia and the Future of the Past,” Journal of American History, 93, http://www.historycooperative.org/journals/jah/93.1/pdf/rosenzweig.pdf.

Ruhleder, Karen. (1995). “Reconstructing Artifacts, Reconstructing Work: From Textual Edition to On-Line Databank.” Science, Technology, & Human Values, 20 (1), pp. 39-64.

Schilit, Bill N. and Okan Kolak. (2008). “Exploring a digital library through key ideas.” JCDL ’08: Proceedings of the 8th ACM/IEEE-CS joint conference on Digital libraries, pp. 177-186.

Sennyey, Pongracz, Lyman Ross, and Caroline Mills. (2009). “Exploring the Future of Academic Libraries: A Definitional Approach.” The Journal of Academic Librarianship, 35 (3), pp. 252-259.

Shaw, Ryan, Michael Buckland, and Ray Larson. (2009). “Integrating Tools for Synthesis into Digital Libraries.” Proceedings of JCDL 2009 Workshop: Integrating Digital Library Content with Computational Tools and Services.

Turner, E. G., T. C. Skeat, and J. David Thomas. (1967). “Sir Harold Idris Bell,” The Journal of Egyptian Archaeology, 53, pp. 131-140.

Footnotes

1. For more on the incunabular nature of many early digital publications, see Crane et al. 2006.
2. Literature Compass (http://www3.interscience.wiley.com/journal/123273796/issue) and the Digital Humanities Quarterly (http://digitalhumanities.org/dhq/) have both had special issues dedicated to the future of textual editing and creating digital editions. Similarly, the recently founded InterEdition project (http://www.interedition.eu/) is holding a series of workshops between researchers in the field of textual editing and information technologists to create a roadmap that they hope will lead to an “interoperable supranational infrastructure for digital editions.”
3. http://civilwardc.org
4. For more detail, see Price 2009.
5. http://hypercities.com/
6. http://grubstreetproject.net/
7. For a further discussion of this topic, see Crane, Seales and Terras 2009.
8. “Hypertext editions” that link primary sources (particularly individual works or documents) with a wealth of related materials present a number of challenges of their own and have been discussed by, among many others, O’Donnell 2010 and Riva and Zafrin 2005.
9. http://www.csdl.tamu.edu/walden/
10. A recent article by Blackwell and Martin 2009 explores the potential of undergraduate research for reinvigorating teaching in classics. Another interesting model of undergraduate research in Art History can be found in Flaten 2009.
11. Peter Robinson has also spoken of how this deluge of material has presented digital editing with its greatest opportunity: new collaboration models where scholars, students and the interested public can make contributions to digital editing (Robinson 2010).
12. The development of computational tools to provide greater intellectual access to digital collections is a heavily studied topic; for some recent work see Chen et al. 2006, Shaw et al. 2009.
13. Interesting work in this area has included metaphor discovery (Pasanek and Sculley 2008) and quotation detection (Schilit and Kolak 2008).
14. Text or data mining in historical collections and its potential to support intensive reading has been examined by Kirschenbaum 2007, and Clement 2008.
15. Further detail on these topics can be found in Crane, Babeu and Bamman 2007.
16. To take just one example, the number of neo-Latin editions available online exceeds 30,000, according to the Philological Museum, http://www.philological.bham.ac.uk/bibliography/ .
17. The use of automated technologies to transform traditional reference works for Latin into machine-actionable knowledge sources has also been considered in Bamman and Crane 2009.
18. The need to make the algorithms, decisions and methodologies that an editor used in the creation of a digital edition more transparent has been stressed by Monella 2008 and Bodard and Garcés 2009.
19. Some of the challenges to philology in the age of digital corpora have been examined by Boschetti 2009.
20. A series of arguments against academic scholarship being locked behind commercial paywalls has recently been articulated by Rausing 2010. In the field of classics, Pritchard 2008 has explored recent efforts to make more secondary scholarship available as open access.
21. The need for open content that is accessible both to humans and to computational processes has also been stressed by Lynch 2006, and Arms and Larsen 2007.
22. http://www.stoa.org/projects/demos/home .
23. http://www.perseus.tufts.edu/hopper/text?doc=Perseus:text:1999.04.0009 .
24. http://www.open.ac.uk/Arts/hestia/ .
26. The various kinds of Creative Commons licenses available are described at http://creativecommons.org/. Almost all of the TEI-XML files available from the Perseus Digital Library can be downloaded, and they are all licensed under the Creative Commons NonCommercial ShareAlike 3.0 License.
27. And yet unfortunately, the reuse of many linguistics resources (historical or otherwise) also involves complicated copyright and other legal issues; see Lehmberg et al. 2008 for an overview.
28. The benefits of open access and content for classical studies are pursued at greater length in Crane and Blackwell 2009.
29. We have previously reported on our initial results with OCR in Ancient Greek in Boschetti et al. 2009.
30. August Boeckh, Enzyklopädie und Methodenlehre der philologischen Wissenschaft, edited by Ernst Bratuscheck (Darmstadt 1966, ed. 3) 25: “die Erkentniss des Alterthums in seinem ganzen Umfange” = August Boeckh, On Interpretation and Criticism, translated by John Paul Pritchard (Oklahoma, 1968) 22; William M. Calder III, “How Did Ulrich Von Wilamowitz-Moellendorff Read a Text?” The Classical Journal, Vol. 86, No. 4 (Apr. - May, 1991), pp. 344-352.
31. Alcock and Osborne 2007, p. 8.
32. For a full list of publications relevant to this work, see http://www.perseus.tufts.edu/hopper/about/publications.
33. http://www.nyu.edu/isaw/.
34. http://www.ascsa.edu.gr/.
35. http://numismatics.org/.
36. http://www.arachne.uni-koeln.de/drupal/.
37. http://en.wikipedia.org/wiki/Arachne_%28Archaeological_Database%29.
38. Some initial work on this ongoing project has been reported in Kummer 2006 and Babeu et al. 2007.
39. Indeed, simply integrating several disparate textual and material collections within one domain, such as classics, can be very challenging, as explained by Jackson et al. 2009.
40. For more on the issue of philology in the digital world, see Crane, Seales and Terras 2009 and Crane et al. 2009b. For a specific look at creating an infrastructure that will support the new tasks of digital philology, see Deckers et al. 2009.
42. For more on the TLL, see Hillen 2007.
43. http://www.lrz-muenchen.de/~ramminger/words/start.htm.
44. http://archive.org.
45. The challenges presented by such massive collections to the discipline of classics have been investigated in Crane et al. 2009a.
46. This issue is more thoroughly explored in Bamman and Crane 2009.
47. For a good overview of the relationship between corpus linguistics, computational linguistics and historical corpora, see Lüdeling and Zeldes 2008.
48. For more on the tradition of creating critical editions in classics see Bodard and Garcés 2008 and Monella 2008.
49. http://bmcr.brynmawr.edu/.
50. This figure was calculated by downloading the web pages for each year and counting all the entries except for the twelve monthly “books received” entries.
51. See Bodard 2008 for some of the new opportunities of electronic publication and digital classics, with a particular focus on epigraphy.
52. The need for digital repositories and services that can both preserve and provide sustainable access to the wide range of digital objects now becoming available, and the challenge this poses to the traditional models of research libraries, are widely discussed topics; for some recent work see ARL 2009 and Sennyey et al. 2008.
53. http://www.columbia.edu/cu/lweb/projects/digital/apis/index.html.
54. http://chs.harvard.edu/wa/pageR?tn=ArticleWrapper&bdc=12&mn=1169.
55. For more on the use of APIS and papyrological collections online see Hanson 2001, and for the Homer Multitext Project see Dué and Ebbott 2009.
56. For one of Robinson’s most recent discussions of the creation of digital editions, see Robinson 2009.
57. The Perseus Project has been developing a Latin Dependency Treebank since 2006, and work on an Ancient Greek Dependency Treebank began in 2008. Both treebanks can be downloaded from http://nlp.perseus.tufts.edu/syntax/treebank/ .
58. For example, the Latin Treebank has been utilized in various projects, including the development of a dynamic lexicon (Bamman and Crane 2008a) and the automatic detection of textual allusion (Bamman and Crane 2008b).
59. Robinson 2010 makes a similar argument for needing to harness the contributions of hundreds or thousands in the process of making digital editions.
60. Ruhleder 1995 offered a brief exploration of how digital publication and scholarship might challenge this tradition of apprenticeship in her larger consideration of the impact of the Thesaurus Linguae Graecae on classics as a discipline.
61. More information on this project can be found in Blackwell and Martin 2009.
62. This annotation model is explained further in Bamman, Mambrini and Crane 2009.
63. The importance of this fact, or of the need to build digital infrastructures or virtual research environments that help scholars with both their “traditional tasks” and their “cutting edge” scholarship, has also been echoed in various discussions of humanities cyberinfrastructure; see, for example, Blanke 2010 and Pybus and Kirkham 2009.
64. For more on this work, see the website for the “Integrating Digital Papyrology” project, http://idp.atlantides.org/trac/idp/wiki/ .
65. E. G. Turner, T. C. Skeat, and J. David Thomas, “Sir Harold Idris Bell,” The Journal of Egyptian Archaeology, 53 (1967) 138.
66. #0910165: Collaborative Research: Mining a Million Scanned Books: Linguistic and Structure Analysis, Fast Expanded Search, and Improved OCR.
67. His arguments can be found in Rosenzweig 2006.
68. http://www.stoa.org/sol/
69. For an overview of the SOL and its editorial process, see Mahoney 2009.

