Featured Abstract: March 8

Each Monday and Thursday, an abstract from one of the symposium participants will be posted to facilitate discussion.  We welcome your comments!

Featured Abstract: “On the Value of Comparing Truly Remarkable Texts”

Gregor Middell, University of Würzburg

Looking at the comparatively short history of editions in the digital medium, one notices that those projects which take a highly critical approach to the edited text invariably end up pushing the boundaries of established practices in text modeling and encoding. For example, this has been the case for the HyperNietzsche edition, which developed its own genetic XML markup dialect, or for the Wittgenstein edition, which went as far as developing its own markup language. The ongoing genetic edition of Goethe’s Faust is no different inasmuch as it makes use of common XML-based encoding practices and de-facto standards like the guidelines of the Text Encoding Initiative, but at the same time has felt the need to transcend them in order to cope with the inherent complexity of modeling its subject matter. Rooted in the tradition of German editorial theory, the Faust edition strives for a strict conceptual distinction between the material evidence of the text’s genesis as found in the archives on the one hand and the interpretative conclusions drawn from this evidence on the other, the latter eventually giving rise to a justified hypothesis of how the text came into being. These two perspectives on the edited text, though complementary, are structured very differently and moreover cannot be modeled via context-free grammars in their entirety. It is therefore already hard to concisely and efficiently encode, validate and process a single perspective via XML, let alone both of them in an integrated fashion. Given this problem and the need to solve it in order to meet the expectations of scholarly users towards an edition which in the end claims to be “historical-critical”, the Faust project turned to multiple, parallel encodings of the same textual data, each describing the textual material from one of the desired perspectives. The different encodings then necessarily have to be correlated, resulting not in the common compartmentalized model of an edited text but in an integrated, inherently more complex one. In the work of the Faust project, this crucial task of correlating perspectives on a text is achieved semi-automatically by means of computer-aided collation and a markup document model supporting arbitrarily overlapping standoff annotations. The presentation of this editorial workflow as well as its underlying techniques and models might not only be of interest in its own right; it might also contribute to answering a broader question: can we gradually increase the complexity of our notion of “what text really is” while still being able to rely on encoding practices widely endorsed by the DH community today?
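
As a rough illustration of the kind of model the abstract describes, the sketch below (in Python, with invented field names and offsets; it is not the Faust project’s actual implementation) shows how two perspectives on the same base text can be kept as standoff annotations whose ranges are free to overlap:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    start: int   # offset into the shared base text (inclusive)
    end: int     # offset into the shared base text (exclusive)
    layer: str   # which encoding/perspective the annotation belongs to
    name: str    # annotation type within that perspective

# One shared transcription; each perspective annotates it independently.
base_text = "Habe nun, ach! Philosophie, Juristerei und Medizin"

annotations = [
    Annotation(0, 27, "documentary", "line"),   # a material feature of the witness
    Annotation(15, 50, "genetic", "revision"),  # an interpretative claim about the text's genesis
]

# The two ranges overlap (offsets 15-27), which a single XML hierarchy cannot
# express directly; as standoff ranges over one base text they simply coexist
# and can be correlated by interval arithmetic.
def overlaps(a: Annotation, b: Annotation) -> bool:
    return a.start < b.end and b.start < a.end

print(overlaps(annotations[0], annotations[1]))  # True
```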

Featured Abstract: March 5

Each Monday and Thursday, an abstract from one of the symposium participants will be posted to facilitate discussion.  We welcome your comments!

Featured Abstract: “Comparing representations of and operations on overlap”

Claus Huitfeldt, University of Bergen

Overlapping document structures have been studied by markup theorists for more than twenty years. A large number of solutions have been proposed. Some of the proposals are based on XML, others are not; some propose the use of alternate serial forms or data models, and some the use of stand-off markup. Algorithms for transformations between the different forms have also been proposed. Even so, there are few systematic comparative studies of the various proposals, and there seems to be little consensus on what the best approach is.
The aim of the MLCD Overlap Corpus (MOC) is to make it easier to compare the different proposals by providing concrete examples of documents marked up according to a variety of proposed solutions. The examples are intended to range from small, constructed documents to full-length, real texts. We believe that providing such parallel representations of the same texts in a variety of formats may serve a number of purposes.
Many of the proposals for markup of overlapping structures are not fully worked out, not well documented, or known only from scattered examples. Encoding a larger body of different texts according to each of the proposed solutions may help resolve unclarities in, or shed new light on difficulties with, the proposals themselves.
Running or developing software to perform various operations on the same data represented in different forms may also help in finding out which forms are optimal for which operations. Some operations, even though well understood for non-overlapping data, may turn out not to be clearly defined for overlapping data.
Finally, a parallel corpus may serve as reference data for work on translations between the various formats, for testing conversion algorithms, and for developing performance tests for software.
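
To make the comparison concrete, here is a minimal sketch (illustrative only; neither the corpus’s actual data nor its formats) of one overlapping structure held in two of the kinds of representation such a corpus might juxtapose, together with a toy conversion from one to the other:

```python
# One overlapping structure: a quotation that begins inside one sentence
# and ends inside the next.
text = 'He said "it is late. We must go" and left.'

# Form A: standoff ranges over the base text: (element name, start, end)
ranges = [
    ("s",     0, 20),   # 'He said "it is late.'
    ("s",    21, 42),   # 'We must go" and left.'
    ("quote", 8, 32),   # '"it is late. We must go"' -- crosses the sentence boundary
]
for name, start, end in ranges:
    print(name, repr(text[start:end]))

# Form B: a flattened event stream (start/end milestones in document order),
# the shape that milestone- or fragmentation-based XML proposals work with.
def to_events(ranges):
    events = [(start, "start", name) for name, start, end in ranges]
    events += [(end, "end", name) for name, start, end in ranges]
    # at equal offsets, emit end-events before start-events
    return sorted(events, key=lambda e: (e[0], e[1] == "start"))

for offset, kind, name in to_events(ranges):
    print(f"{offset:2d} {kind:5s} {name}")
```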

Featured Abstract: March 1

Each Monday and Thursday, an abstract from one of the symposium participants will be posted to facilitate discussion.  We welcome your comments!

Featured Abstract: “Modeling Collaboration”

Julia Flanders, Brown University

If collaboration, in practical terms, is predicated on the compatibility of data (expressed variously and debatably as interoperability or interchange), then we can also say that it requires a kind of meta-modeling: that is, a clear expression of the differences and similarities between models. Tools like the TEI customization mechanism offer one approach to this kind of meta-modeling, but many questions require more detailed consideration: the level of precision at which this meta-modeling must take place, the specific vectors of similarity to be expressed, and the meaning or motivations of our customizations. Is it possible to use a mechanism of this kind in a rigorous way to support more effective collaboration?

Featured Abstract: Feb. 27

Each Monday and Thursday, an abstract from one of the symposium participants will be posted to facilitate discussion.  We welcome your comments!

Featured Abstract: “Virtual Scriptorium St. Matthias”

Andrea Rapp, TU Darmstadt

In order to virtually reconstruct the mediaeval library of the Benedictine Abbey of St. Matthias of Trier, the project is digitizing all of the approximately 500 manuscripts that originate from the Abbey but are now dispersed throughout the world. The integration and modeling of catalog data allow presentation, navigation, research and networking of the library holdings. The modeling of the project poses various challenges: each manuscript is a witness of a text or a work, and it should be associated with information about this specific manuscript/text/work as well as with critical text editions (e.g. Perseus) or other manuscript witnesses. At the same time, the collection of the library can be seen as an ensemble which has been collected, maintained and curated with care. The DFG-Viewer is used to easily present and navigate each manuscript, although this tool has been developed primarily for printed works; this decision brings about some problems for data modeling (TEI, METS-MODS). At the same time all data will be incorporated into the virtual research environment TextGrid, where they are released for further processing. On the one hand, this makes it possible to support the scholarly work of individual researchers (or research groups); on the other hand, the virtual scriptorium St. Matthias can be “edited” as a social edition by the research community. One of the key questions will therefore be whether and how this collaborative data modeling can be designed. www.stmatthias.uni-trier.de
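
A very rough sketch of the relations mentioned above (illustrative only, with invented class and field names rather than the project’s actual data model): each digitized codex is a witness that should link both to the works it transmits, including their critical editions, and to its presentation in the viewer.

```python
from dataclasses import dataclass, field

@dataclass
class Work:
    title: str
    edition_urls: list[str] = field(default_factory=list)   # e.g. a critical edition in Perseus

@dataclass
class ManuscriptWitness:
    shelfmark: str                                   # where the codex is held today
    works: list[Work] = field(default_factory=list)  # the texts/works this manuscript witnesses
    facsimile_url: str = ""                          # link into the DFG-Viewer presentation

# The roughly 500 dispersed codices can then also be treated as one curated ensemble:
stmatthias_ensemble: list[ManuscriptWitness] = []
```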

Featured Abstract: Feb. 23

Each Monday and Thursday, an abstract from one of the symposium participants will be posted to facilitate discussion.  We welcome your comments!

Featured Abstract: “Where Semantics Lies”

Stephen Ramsay, University of Nebraska

Should the syntax of XML have been scrapped in favor of s-expressions?  This debate, which raged for years and which occasionally reappears, has all the hallmarks of a religious war (Windows vs. Mac, Emacs vs. Vi, big-endian vs. little-endian).  In this talk, I will suggest that while in general this discussion generated more heat than light, it pointed toward an important set of issues that bears on the problem of data modeling in the humanities.  The question isn’t which syntax is superior, but rather, what does it mean for a syntax to have a semantics and (more critically) where does that semantics lie within the overall system?

I begin by claiming that our common definitions of “semantics” (within computer science) are too vague, and offer a definition loosely based on Wittgenstein’s notion of meaning as a function of use.  I then use that definition to distinguish between XML as a syntax that binds its semantics late in the overall computational process, and an s-expression-based language (like Lisp) that defines its semantics early.  I then pose the question: What would it look like if we were to imagine systems that take our present data models and bind them early?

The purpose of this exercise is neither to rekindle this debate, nor even to suggest that the conception of semantics within XML or s-expressions is flawed.  It is, rather, to reimagine our current data models as having options beyond what has been commonly offered — not just data to which we apply algorithms, but data that is itself algorithmic.
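
A rough gloss on the early/late binding distinction above (my illustration in Python, not Ramsay’s own example): in the first sketch the tree is inert data whose meaning is supplied late, by whatever processor walks it; in the second, s-expression style, the ‘elements’ are functions, so constructing the document already fixes what it does.

```python
# Late binding: the markup-like structure is just data; what "emph" means is
# decided only when some processor interprets the tree.
doc = ("p", ["Reader, ", ("emph", ["mark"]), " this well."])

def render(node):
    if isinstance(node, str):
        return node
    name, children = node
    text = "".join(render(child) for child in children)
    return text.upper() if name == "emph" else text   # the semantics live in the processor

# Early binding: the constructors carry their own behaviour, so the "document"
# is itself algorithmic -- data to which the semantics are already bound.
def emph(*children):
    return "".join(children).upper()

def p(*children):
    return "".join(children)

print(render(doc))                                   # Reader, MARK this well.
print(p("Reader, ", emph("mark"), " this well."))    # Reader, MARK this well.
```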

Featured Abstract: Feb. 20

Each Monday and Thursday, an abstract from one of the symposium participants will be posted to facilitate discussion.  We welcome your comments!

Featured Abstract: “What is the Thing that Changes?: Space and Time through the Atlas of Historical County Boundaries”

Douglas Knox, Newberry Library

One would think that modeling historical administrative boundaries would be a straightforward matter, relatively free of the complications of fuzzier phenomena in the humanities. In fact, however, the precision of modeling tools throws the inherent difficulties of modeling administrative change over time and space into sharp relief. This presentation will draw on examples from the Atlas of Historical County Boundaries, an NEH-funded project of the Newberry Library completed in 2010, which documents every change in county boundaries in what is now the United States from colonial times through the year 2000. In addition to reviewing fundamental data modeling decisions of the project, the presentation will explore interesting edge cases, connections and similarities to other kinds of data, implicit models, and alternative ways of approaching the question of what the objects of interest are that we imagine persisting through change over time.
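
As a hedged sketch of the underlying modeling question (my illustration; identifiers, dates and geometry below are placeholders, not the Atlas’s schema), one way to answer “what is the thing that changes?” is to keep a stable county identity and attach dated boundary versions to it:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class BoundaryVersion:
    county_id: str      # the persistent identity we imagine enduring through change
    valid_from: date    # first day this delineation was in force
    valid_to: date      # last day it was in force
    geometry_wkt: str   # the outline itself, e.g. as Well-Known Text

# Placeholder values for illustration only.
versions = [
    BoundaryVersion("example_county", date(1831, 1, 15), date(1839, 3, 3), "POLYGON((...))"),
    BoundaryVersion("example_county", date(1839, 3, 4), date(2000, 12, 31), "POLYGON((...))"),
]

def boundary_on(county_id: str, when: date, versions: list[BoundaryVersion]):
    """Return the delineation in force for a given county on a given day, if any."""
    for v in versions:
        if v.county_id == county_id and v.valid_from <= when <= v.valid_to:
            return v
    return None
```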

Featured Abstract: Feb. 16

Each Monday and Thursday, an abstract from one of the symposium participants will be posted to facilitate discussion.  We welcome your comments!

Featured Abstract: “Digital Literary History and its Discontent”

Fotis Jannidis, University of Wuerzburg

Literary history and digital humanities present themselves to the interested like an unfinished bridge marked by a huge gap the two sides. The side of literary history has been busy discussing the principles histories are constructed by and the demand of ever wider concepts of their subject. On the side of digital literary studies there are various attempts to read ‘a million books’ stretching the notion of ‘reading’ to new limits. Most work has been done on classification, specifically on genre classification (e.g. Mueller or Jockers). Work to close the gap can start from both sides. My talk will discuss some of the concepts underlying contemporary literary histories pushing towards a more formalized description. But this will only be possible for a very small part of the entire enterprise ‘literary history’ thus putting methods of digital literary studies in a subsidiary role. And even when it is possible to describe some aspect (genre, concepts like author, reader etc.) more formalized, most of the time this formal description cannot be applied automatically to larger collections of text. In a self-reflexive turn I will try to analyze how this ‘more formalized description’ is achieved describing thereby the gap between the conceptual modelling done in any humanities research and the demands of a more formal description.

Featured Abstract: Feb. 13

Each Monday and Thursday, an abstract from one of the symposium participants will be posted to facilitate discussion.  We welcome your comments!

Featured Abstract: “Objects, Process, Context in Time and Space – and how we model all this in the Europeana Data Model”

Stefan Gradmann, Humboldt University of Berlin

Once we start modeling complex objects as RDF graphs, as aggregations of web resources in a linked data environment, we quickly get into questions regarding the boundaries of these aggregations, the ways we could describe their provenance, and the ways we could version them together with their context (and what are the boundaries of that ‘context’?). How do we model time and process context in such environments? Herbert van de Sompel has done some initial groundbreaking work in that area with his Memento project – but that is just a first step. We seem to have firmer ground for contextualisation on the spatial side: GeoNames, GeoCoordinates and the like seem to be much more stabilized conceptual areas. Maybe because the denotative aspect is stronger in space than in time?!
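
As a small, hedged illustration of the kind of aggregation in question (written with rdflib; the URIs are invented and the property choice is only a plausible reading of OAI-ORE and EDM, not a normative Europeana example):

```python
from rdflib import Graph, Namespace, URIRef, Literal
from rdflib.namespace import DCTERMS, RDF

ORE = Namespace("http://www.openarchives.org/ore/terms/")
EDM = Namespace("http://www.europeana.eu/schemas/edm/")

g = Graph()
g.bind("ore", ORE)
g.bind("edm", EDM)

# Invented example URIs.
aggregation = URIRef("http://example.org/aggregation/item1")
cultural_object = URIRef("http://example.org/item1")
scan = URIRef("http://example.org/item1/page1.jpg")

g.add((aggregation, RDF.type, ORE.Aggregation))
g.add((aggregation, EDM.aggregatedCHO, cultural_object))      # the object the aggregation is "about"
g.add((aggregation, ORE.aggregates, scan))                    # one of the bundled web resources
g.add((aggregation, DCTERMS.created, Literal("2012-02-13")))  # a provenance-style statement about the aggregation itself

print(g.serialize(format="turtle"))

# The open questions raised above start exactly here: which triples still belong
# to this aggregation, how do we say who asserted them and when, and what counts
# as its "context" once the graph links out into the wider web of data?
```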

Featured Abstract: Feb. 9

Each Monday and Thursday, an abstract from one of the symposium participants will be posted to facilitate discussion.  We welcome your comments!

Featured Abstract: “The Person Data Repository”

Alexander Czmiel, Berlin-Brandenburg Academy of Sciences and Humanities

I will present the data model of the Person Data Repository, a project based at the Berlin-Brandenburg Academy of Sciences and Humanities which pursues a novel approach to structuring heterogeneous biographical data. The approach does not define a person as a single data record, but rather as the compilation of all statements concerning that person. Thus, it is possible to display complementary as well as contradictory statements in parallel, which meets one of the basic challenges of biographical research. In order to satisfy different research approaches and perspectives, the smallest entity of the Person Data Repository is not a person but a single statement about a person, which is called an “aspect” in the data model. An aspect bundles references to persons, places, dates and sources. With appropriate queries it will be possible to create further narrations whose first dimension is not necessarily a person, but possibly also a time span or a certain location. Additionally, all aspects are connected to the corresponding source and to current identification systems such as the LCCN or the German PND. Thus, scientific transparency and compatibility with existing and future systems are guaranteed. To collect and create aspects of a person we built the “Archive-Editor”, a Java-based tool with a user-friendly but powerful interface for the Person Data Repository.
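
A minimal sketch of the “aspect” idea as described above (field names are illustrative assumptions, not the repository’s actual schema): the smallest unit is a single sourced statement, and a “person” is simply the set of aspects that reference that person, so complementary or contradictory statements can coexist.

```python
from dataclasses import dataclass, field

@dataclass
class Aspect:
    statement: str                                    # the claim itself, e.g. "studied law in Leipzig"
    persons: list[str] = field(default_factory=list)  # references to person identifiers (e.g. PND/LCCN)
    places: list[str] = field(default_factory=list)
    dates: list[str] = field(default_factory=list)    # e.g. ISO 8601 dates or spans
    source: str = ""                                  # the source the statement is drawn from

def aspects_for(person_id: str, repository: list[Aspect]) -> list[Aspect]:
    """All statements concerning one person, contradictions and all."""
    return [a for a in repository if person_id in a.persons]
```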

Featured Abstract: Feb. 6

Each Monday and Thursday, an abstract from one of the symposium participants will be posted to facilitate discussion.  We welcome your comments!

Featured Abstract: “Discovering our models: aiming at metaleptic markup applications through TEI customization”

Trevor Munoz, University of Maryland

The activity of data modeling involves generating a description of the structure that data will have in an information system. In practice, for many current humanities text projects, the outlines of these descriptions and the information systems they will work within are already known—the vocabulary of the Text Encoding Initiative (TEI) and some kind of toolchain suited to storing, processing, and retrieving XML. This should not obscure the modeling challenges involved in humanities text projects. Indeed, it is in the investigation and manipulation of these complex systems of information representation—through the customization mechanisms provided by the TEI—that much of the intellectual contribution of “small data” projects to the digital humanities can be found. It is also at this point in a project that the roles of (digital) humanist and librarian are most closely aligned. An examination of the process of developing TEI customizations for several projects will show some of the decisions whereby digital representations of texts become strategic models, and also where the strategic emphases of librarians and humanists for those representations begin to fall out in slightly different ways. As the primary case study among those presented, the development of the Shelley-Godwin Archive project will exhibit how TEI customization as an act of data modeling looks backward to traditions of editing and forward to new kinds of computer-enabled processing, in an attempt to develop a rich, critically engaged record of engagement with an important body of texts.