Featured Abstract: “Text, Intertext, and Context: Modeling a Map of Medieval Scholarly Practices”

Malte Rehbein, University of Würzburg

Medieval scholarly practices can be fully understood “erst im Zusammenspiel verschiedener methodischer Zugänge und nur unter Berücksichtigung der Gesamtüberlieferung” (only through the interplay of various methodological approaches and with consideration of the whole written tradition), as Claudine Moulin has pointed out in her analysis of early medieval vernacular glosses [1, p. 76]. Understanding them hence requires a holistic approach that takes into account not only texts but also intertextual relations and contextual information: historical, paleographic, codicological, and bibliographical evidence, together with an appreciation of the processes of textual production and usage. Representing such comprehensive information is challenging because it requires the interplay of different data models. This presentation discusses this challenge through the case study of the 8th-century “Würzburg Saint Matthew” [2], a gospel text with glosses and commentaries compiled from more than forty different sources, which makes it an intriguing, though complicated, object of study [3]. The presentation further outlines considerations on the development of models for presenting this information within a research environment and concludes with an outlook toward a comprehensive visual map of medieval scholarly practices.

[1] Moulin, Claudine (2009): Paratextuelle Netzwerke: Kulturwissenschaftliche Erschließung und soziale Dimensionen der althochdeutschen Glossenüberlieferung. In: Gerhard Krieger (ed.): Verwandtschaft, Freundschaft, Bruderschaft. Soziale Lebens- und Kommunikationsformen im Mittelalter, 56–77.
[2] Rehbein, Malte (2011). “A Data Model for Visualising Textuality — The Würzburg Saint Matthew”. Digital Humanities 2011: Conference Abstracts. Ed. The Alliance of Digital Humanities Organizations. Stanford: Stanford University Library. 204–205.
[3] Cahill, Michael (2002): The Würzburg Matthew: Status Quaestionis. In: Peritia 16, 1–25.

Elisabeth Burr, Universität Leipzig, Germany

“It’s all about Integration and Conceptual Change”

In Romance Philology the ordering of information extracted from sources into knowledge domains has always been part of the research process. However, the ordering of index cards containing such information according to certain categories and the setting up of relations between them have never been regarded as knowledge modelling or as the building of ontologies. Similar things could be said about ‘data’, ‘filtering sources for information’ or ‘markup’. Even today, doing research and presenting research results tend to be seen more as mental processes than as disciplined activities which need to be made explicit and taught. This state of affairs has serious implications for students. Not only are doing research and writing a ‘disciplined’ academic paper not widely taught, but methods of research and of good academic practice are also not conceived of as being of epistemological interest. Instead, they are considered to be mere skills which can be acquired in courses offered by non-academic service centres. Furthermore, as the computer is still looked upon and used as if it were just a modern form of typewriter, no need is seen to teach students, in academic courses, a meaningful way of exploiting computer technologies for their research and writing. If anything, students are encouraged to believe that writing an academic paper is all about ideas, creativity and genius and that the structure and the layout of a paper, the consistency of citations or the integrity of bibliographies, among other things, are mere formalities.
In order to change the concept of doing research and of academic paper writing, and to foster a meaningful exploitation of computer technologies, I have implemented in one of my courses on Romance linguistics a project-oriented approach in which the methods and tools of the Digital Humanities, and the questions they pose, play an important role. When students are involved in the creation of a digital version of a text and are shown how to apply a TEI schema with the help of an XML editor like oXygen, or when they archive information about sources in a database like the one EndNote provides, they can learn a lot about texts, about data and styles, and about systematization and consistency. Furthermore, the linking of information and the building of ontologies make many aspects of scholarly work explicit to them. If, moreover, the work they are doing contributes to a portal like the one which has been created within the framework of the project (see “Von Leipzig in die Romania”, http://www.culingtec.uni-leipzig.de/JuLeipzigRo/) and which can itself be used to write TEI-compliant academic papers, they get the chance not only to develop a different concept of doing research and writing academic papers, but also to conceive of the computer as a device for the manipulation of systematized data, and thus for the modelling of knowledge, rather than as a mere high-tech typewriter.

Featured Abstract: “Tagging in the cloud. A data model for collaborative markup”

Jan Christoph Meister, University of Hamburg

This paper discusses the data model underlying CLÉA, short for “Collaborative Literature Exploration and Annotation”, a Google DH Award funded project based on the CATMA software developed at Hamburg University. The goal of CLÉA is to build a web-based annotation platform supporting multi-user, multi-instance, non-deterministic (and, if required, even contradictory) markup of literary texts in a TEI-conformant approach. Apart from technical considerations, this approach to markup has some more fundamental consequences. First, when one and the same text is marked up from different functional perspectives, markup itself starts to become fluid, allowing researchers to aggregate markup just as we aggregate other meta-texts, namely according to their specific research interest. Second, in addition to this functional enhancement, there is also a social aspect to this new approach: the production of markup becomes a team effort. This paradigm shift from individual expert annotation to an “open workgroup”, crowdsourced approach is based on what we call a “one-to-many” data model that can be implemented using “cloud” technology. “Tagging in the cloud”, therefore, combines three new aspects of text markup: the social, the technological, and the conceptual.
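The “one-to-many” idea can be illustrated with a minimal sketch, assuming one unmodified source text and any number of independent tag layers contributed by different annotators. All class, field, and method names below are hypothetical and do not reflect the actual CATMA or CLÉA implementation; the sample sentence and labels are likewise only illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Tag:
    """One standoff tag: a character range, a label, and its author."""
    start: int
    end: int
    label: str
    annotator: str


@dataclass
class SharedText:
    """One source text (the "one") with many markup layers (the "many")."""
    content: str
    layers: dict = field(default_factory=lambda: defaultdict(list))

    def annotate(self, annotator, start, end, label):
        # The text itself is never modified; tags only point into it.
        if not 0 <= start < end <= len(self.content):
            raise ValueError("tag range lies outside the text")
        self.layers[annotator].append(Tag(start, end, label, annotator))

    def aggregate(self, annotators):
        """Merge the layers of the chosen annotators, sorted by position."""
        merged = [t for a in annotators for t in self.layers.get(a, [])]
        return sorted(merged, key=lambda t: (t.start, t.end))

    def conflicts(self):
        """Return pairs of tags on the same range that carry different labels."""
        by_range = defaultdict(list)
        for tags in self.layers.values():
            for t in tags:
                by_range[(t.start, t.end)].append(t)
        return [
            (a, b)
            for tags in by_range.values()
            for i, a in enumerate(tags)
            for b in tags[i + 1:]
            if a.label != b.label
        ]


text = SharedText("Call me Ishmael.")
text.annotate("anna", 8, 15, "narrator")
text.annotate("ben", 8, 15, "character")   # contradicts anna's tag
text.annotate("ben", 0, 4, "imperative")

for first, second in text.conflicts():
    print(first.annotator, first.label, "vs", second.annotator, second.label)
```

In this sketch contradictory tags are simply stored side by side and reported on request, which is one possible reading of “non-deterministic” markup; a real platform would also need persistence, access control, and TEI export.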

Featured Abstract: “On the Value of Comparing Truly Remarkable Texts”

Gregor Middell, University of Würzburg

Looking at the comparatively short history of editions in the digital medium, one notices that those projects which are highly critical about the edited text invariably end up pushing the boundaries of established practices in text modeling and encoding. For example, this has been the case for the HyperNietzsche edition, which developed its own genetic XML markup dialect, or for the Wittgenstein edition, which went as far as developing its own markup language. The ongoing genetic edition of Goethe’s Faust is no different inasmuch as it makes use of common XML-based encoding practices and de facto standards like the guidelines of the Text Encoding Initiative, but at the same time has felt the need to transcend them so that it can cope with the inherent complexity of modeling its subject matter. Rooted in the tradition of German editorial theory, the Faust edition strives for a strict conceptual distinction between the material evidence of the text’s genesis as found in the archives on the one hand and the interpretative conclusions drawn from this evidence on the other, the latter eventually giving rise to a justified hypothesis of how the text came into being. These two perspectives on the edited text, though complementary, are structured very differently and, moreover, cannot be modeled in their entirety via context-free grammars. It is therefore already hard to encode, validate and process a single perspective via XML concisely and efficiently, let alone both of them in an integrated fashion. Given this problem and the need to solve it in order to meet the expectations of scholarly users towards an edition which in the end claims to be “historical-critical”, the Faust project turned to multiple, parallel encodings of the same textual data, each describing the textual material from one of the desired perspectives. The different encodings then necessarily have to be correlated, which results not in the common compartmentalized model of an edited text but in an integrated, inherently more complex one.
In the work of the Faust project, this crucial task of correlating perspectives on a text is achieved semi-automatically by means of computer-aided collation and a markup document model supporting arbitrarily overlapping standoff annotations. The presentation of both this editorial workflow and its underlying techniques and models might not only be of interest in its own right; it might also contribute to answering a broader question: Can we gradually increase the complexity of our notion of “what text really is” while still being able to rely on encoding practices widely endorsed by the DH community today?
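A minimal sketch of what a document model with arbitrarily overlapping standoff annotations might look like is given below. It assumes that both perspectives have already been aligned to the same character offsets, so the collation step is not covered. The layer names, annotation names, and the `correlate` helper are hypothetical and are not taken from the Faust project's software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A standoff annotation pointing at a character range of a shared text."""
    layer: str   # hypothetical layer name, e.g. "documentary" or "textual"
    name: str
    start: int   # inclusive character offset
    end: int     # exclusive character offset

    def overlaps(self, other):
        # Two half-open ranges overlap if each starts before the other ends.
        return self.start < other.end and other.start < self.end


def correlate(annotations, layer_a, layer_b):
    """Map every annotation of layer_a to the layer_b annotations it overlaps."""
    a_items = [a for a in annotations if a.layer == layer_a]
    b_items = [b for b in annotations if b.layer == layer_b]
    return {a: [b for b in b_items if a.overlaps(b)] for a in a_items}


# A constructed 20-character sample; the offsets are illustrative only.
text = "abcdefghijklmnopqrst"
annotations = [
    Annotation("documentary", "line-1", 0, 12),
    Annotation("documentary", "line-2", 12, 20),
    Annotation("textual", "verse-1", 5, 16),   # crosses the line boundary
    Annotation("textual", "verse-2", 16, 20),
]

links = correlate(annotations, "documentary", "textual")
for line, verses in links.items():
    print(line.name, "->", [v.name for v in verses])
```

Because the annotations live outside the text and refer to it only by offsets, the two layers need not nest inside each other, which is exactly what a single XML hierarchy cannot express directly.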

Featured Abstract: “Comparing representations of and operations on overlap”

Claus Huitfeldt, University of Bergen

Overlapping document structures have been studied by markup theorists for more than twenty years. A large number of solutions have been proposed. Some of the proposals are based on XML, others are not. Some propose alternate serial forms or data models, and some propose stand-off markup. Algorithms for transformations between the different forms have also been proposed. Even so, there are few systematic comparative studies of the various proposals, and there seems to be little consensus on which approach is best.
The aim of the MLCD Overlap Corpus (MOC) is to make it easier to compare the different proposals by providing concrete examples of documents marked up according to a variety of proposed solutions. The examples are intended to range from small, constructed documents to full-length, real texts. We believe that the provision of such parallel representations of the same texts in various formats may serve a number of purposes.
Many of the proposals for markup of overlapping structures are not fully worked out, or not well documented, or known only from scattered examples. Encoding a larger body of different texts according to each of the proposed solutions may help to resolve unclear points or shed new light on difficulties with the proposals themselves.
Running or developing software to perform various operations on the same data represented in different forms may also help in finding out which forms are optimal for which operations. Some operations, even though well understood for non-overlapping data, may turn out not to be clearly defined for overlapping data.
Finally, a parallel corpus may serve as reference data for work on translations between the various formats, for testing conversion algorithms, and for developing performance tests for software.
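As one hypothetical illustration of the kind of conversion such a corpus could help to test, the sketch below turns stand-off ranges into a milestone-style serial form and checks that the plain text survives the round trip. The marker syntax (`<name-start/>`, `<name-end/>`) and the function names are assumptions made for this example; they are not the format of any particular proposal or of the MOC itself.

```python
import re


def to_milestones(text, ranges):
    """Serialize (name, start, end) ranges as empty start/end marker tags."""
    events = []
    for name, start, end in ranges:
        # At equal offsets, end markers (0) sort before start markers (1).
        events.append((start, 1, f"<{name}-start/>"))
        events.append((end, 0, f"<{name}-end/>"))
    events.sort(key=lambda e: (e[0], e[1]))

    out, pos = [], 0
    for offset, _, marker in events:
        out.append(text[pos:offset])   # text between the previous and this marker
        out.append(marker)
        pos = offset
    out.append(text[pos:])
    return "".join(out)


def strip_milestones(serialized):
    """Remove all empty marker tags to recover the plain text."""
    return re.sub(r"<[^<>]+/>", "", serialized)


# Small constructed document: a line and a sentence that overlap.
text = "ab cd ef"
ranges = [("line", 0, 5), ("sentence", 3, 8)]

serialized = to_milestones(text, ranges)
print(serialized)
assert strip_milestones(serialized) == text
```

The sketch assumes that the text contains no angle brackets of its own; a real converter would need escaping, and a real test suite would also compare the recovered ranges, not just the text.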

Featured Abstract: “Modeling Collaboration”

Julia Flanders, Brown University

If collaboration, in practical terms, is predicated on the compatibility of data (expressed variously and debatably as interoperability or interchange), then we can also say that it requires a kind of meta-modeling: that is, a clear expression of the differences and similarities between models. Tools like the TEI customization mechanism offer one approach to this kind of meta-modeling, but many questions require more detailed consideration: the level of precision at which this meta-modeling must take place, the specific vectors of similarity to be expressed, and the meaning or motivations of our customizations. Is it possible to use a mechanism of this kind in a rigorous way to support more effective collaboration?

Featured Abstract: “Virtual Scriptorium St. Matthias”

Andrea Rapp, TU Darmstadt

In order to virtually reconstruct the mediaeval library of the Benedictine Abbey of St. Matthias in Trier, the project is digitizing all of the approximately 500 manuscripts that originate from the Abbey but are now dispersed throughout the world. The integration and modeling of catalog data allows presentation, navigation, research and networking of the library holdings. The modeling poses various challenges for the project: each manuscript is a witness of a text or a work, and it should be associated with information about this specific manuscript / text / work and with critical text editions (e.g. Perseus) or other manuscript witnesses. At the same time, the collection of the library can be seen as an ensemble, which has been collected, maintained and curated with care. The DFG-Viewer is used to present and navigate each manuscript easily, although this tool has been developed primarily for printed works. This decision brings about some problems for data modeling (TEI, METS-MODS). At the same time all data will be incorporated into the virtual research environment TextGrid, where they are released for further processing. On the one hand, this makes it possible to support the scholarly work of individual researchers (or research groups); on the other hand, the virtual scriptorium St. Matthias can be “edited” as a social edition by the research community. One of the key questions will therefore be whether and how the collaborative data modeling can be designed. www.stmatthias.uni-trier.de