Featured Abstract: Feb. 27

Each Monday and Thursday, an abstract from one of the symposium participants will be posted to facilitate discussion.  We welcome your comments!

Featured Abstract: “Virtual Scriptorium St. Matthias”

Andrea Rapp, TU Darmstadt

In order to virtually reconstruct the mediaeval library of the Benedictine Abbey of St. Matthias of Trier, the project is digitizing all of the approximately 500 manuscripts that originate from the Abbey but are now dispersed throughout the world. The integration and modeling of catalog data allow for the presentation, navigation, research, and networking of the library holdings. The modeling work faces various challenges: each manuscript is a witness of a text or a work, and it should be associated both with information about this specific manuscript / text / work and with critical text editions (e.g. Perseus) or other manuscript witnesses. At the same time, the collection of the library can be seen as an ensemble, which has been collected, maintained and curated with care. The DFG-Viewer is used to easily present and navigate each manuscript, although this tool has been developed primarily for printed works; this decision brings about some problems for data modeling (TEI, METS-MODS). At the same time all data will be incorporated into the virtual research environment TextGrid, where they are released for further processing. On the one hand, this makes it possible to support the scholarly work of individual researchers (or research groups); on the other hand, the virtual scriptorium St. Matthias can be “edited” as a social edition by the research community. One of the key questions will therefore be whether and how this collaborative data modeling can be designed. www.stmatthias.uni-trier.de
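
The witness/work distinction mentioned above is the core modeling decision. The following minimal Python sketch is my own illustration of one way that relation could be represented; it is not the project's actual TEI/METS-MODS model, and all names and URLs are invented placeholders.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Work:
    """An abstract text or work, independent of any one manuscript."""
    title: str
    edition_urls: List[str] = field(default_factory=list)   # e.g. a critical edition in Perseus

@dataclass
class Manuscript:
    """One physical codex from the dispersed St. Matthias library."""
    shelfmark: str
    holding_institution: str
    facsimile_url: str                                       # digitized images, e.g. shown in the DFG-Viewer
    witnessed_works: List[Work] = field(default_factory=list)

# One codex can transmit several works, and one work can survive in several codices.
work = Work("Example patristic treatise", ["https://example.org/perseus-edition"])
ms = Manuscript("Hs. 00 (placeholder)", "Example library", "https://example.org/facsimile", [work])
print(ms.shelfmark, "witnesses:", [w.title for w in ms.witnessed_works])
```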

Featured Abstract: Feb. 23

Featured Abstract: “Where Semantics Lies”

Stephen Ramsay, University of Nebraska

Should the syntax of XML have been scrapped in favor of s-expressions?  This debate, which raged for years and which occasionally reappears, has all the ring of a religious war (Windows vs. Mac, Emacs vs. Vi, big-endian vs. little-endian).  In this talk, I will suggest that while in general this discussion generated more heat than light, it pointed toward an important set of issues that bears on the problem of data modeling in the humanities.  The question isn’t which syntax is superior, but rather, what does it mean for a syntax to have a semantics and (more critically) where does that semantics lie within the overall system?

I begin by claiming that our common definitions of “semantics” (within computer science) are too vague, and offer a definition loosely based on Wittgenstein’s notion of meaning as a function of use.  I then use that definition to distinguish between XML as a syntax that binds its semantics late in the overall computational process, and an s-expression-based language (like Lisp) that defines its semantics early.  I then pose the question: What would it look like if we were to imagine systems that take our present data models and bind them early?
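
As a loose illustration of this late/early contrast (my own toy sketch, not anything from the talk), compare an XML fragment whose behaviour is supplied by a separate interpreter applied later with a nested-list “s-expression” whose head position names the operation that gives it meaning:

```python
import xml.etree.ElementTree as ET

# Late binding: the XML fragment is inert data; its "meaning" only arrives when
# some downstream program such as render() is applied to it.
doc = ET.fromstring('<line><emph>Arma</emph> virumque cano</line>')

def render(elem):
    if elem.tag == 'emph':
        return '*' + (elem.text or '') + '*'
    return (elem.text or '') + ''.join(render(c) + (c.tail or '') for c in elem)

print(render(doc))                                            # *Arma* virumque cano

# Early binding: in an s-expression (mimicked here with nested lists), the head
# of each expression names the operation, so form and behaviour are tied together
# as soon as the expression is read.
def s_eval(expr):
    if isinstance(expr, str):
        return expr
    op, *args = expr
    if op == 'emph':
        return '*' + ''.join(map(s_eval, args)) + '*'
    if op == 'line':
        return ''.join(map(s_eval, args))
    raise ValueError(f'unknown operator: {op}')

print(s_eval(['line', ['emph', 'Arma'], ' virumque cano']))   # *Arma* virumque cano
```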

The purpose of this exercise is neither to rekindle this debate, nor even to suggest that the conception of semantics within XML or s-expressions is flawed.  It is, rather, to reimagine our current data models as having options beyond what has been commonly offered — not just data to which we apply algorithms, but data that is itself algorithmic.

Featured Abstract: Feb. 20

Featured Abstract: “What is the Thing that Changes?: Space and Time through the Atlas of Historical County Boundaries”

Douglas Knox, Newberry Library

One would think that modeling historical administrative boundaries would be a straightforward matter, relatively free of the complications of fuzzier phenomena in the humanities. In fact, however, the precision of modeling tools throws the inherent difficulties of modeling administrative change over time and space into sharp relief. This presentation will draw on examples from the Atlas of Historical County Boundaries, an NEH-funded project of the Newberry Library completed in 2010, which documents every change in county boundaries in what is now the United States from colonial times through the year 2000. In addition to reviewing the project’s fundamental data modeling decisions, the presentation will explore interesting edge cases, connections and similarities to other kinds of data, implicit models, and alternative ways of approaching the question of what the objects of interest are that we imagine persisting through change over time.
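
One common way to model this situation (a hedged sketch of my own, not the Atlas's actual schema) is to treat the county as a thin, persistent identity whose geometry lives in dated boundary versions; the question the abstract raises is precisely whether that identity or the series of versions is the real object of interest. All values below are purely illustrative.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

@dataclass
class BoundaryVersion:
    valid_from: date
    valid_to: Optional[date]        # None means the boundary is still in force
    polygon_wkt: str                # geometry of the boundary, e.g. as WKT
    change_note: str                # the legal act or event behind the change

@dataclass
class County:
    name: str
    versions: List[BoundaryVersion]

    def boundary_on(self, day: date) -> Optional[BoundaryVersion]:
        """Return the boundary version in force on a given day, if any."""
        for v in self.versions:
            if v.valid_from <= day and (v.valid_to is None or day < v.valid_to):
                return v
        return None

county = County("Example County", [
    BoundaryVersion(date(1831, 1, 1), date(1839, 3, 1),
                    "POLYGON((0 0,2 0,2 2,0 2,0 0))", "created from parent county"),
    BoundaryVersion(date(1839, 3, 1), None,
                    "POLYGON((0 0,1 0,1 2,0 2,0 0))", "territory ceded to a new county"),
])
print(county.boundary_on(date(1835, 7, 4)).change_note)
```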

Featured Abstract: Feb. 16

Featured Abstract: “Digital Literary History and its Discontent”

Fotis Jannidis, University of Wuerzburg

Literary history and digital humanities present themselves to the interested observer like an unfinished bridge marked by a huge gap between the two sides. The side of literary history has been busy discussing the principles by which histories are constructed and the demand for ever wider concepts of their subject. On the side of digital literary studies there are various attempts to read ‘a million books’, stretching the notion of ‘reading’ to new limits. Most work has been done on classification, specifically on genre classification (e.g. Mueller or Jockers). Work to close the gap can start from both sides. My talk will discuss some of the concepts underlying contemporary literary histories, pushing towards a more formalized description. But this will only be possible for a very small part of the entire enterprise of ‘literary history’, thus putting the methods of digital literary studies in a subsidiary role. And even when it is possible to describe some aspect (genre, or concepts like author and reader) more formally, this formal description can most of the time not be applied automatically to larger collections of texts. In a self-reflexive turn I will try to analyze how this ‘more formalized description’ is achieved, thereby describing the gap between the conceptual modelling done in any humanities research and the demands of a more formal description.

Featured Abstract: Feb. 13

Featured Abstract: “Objects, Process, Context in Time and Space – and how we model all this in the Europeana Data Model”

Stefan Gradmann, Humboldt University of Berlin

Once we start modeling complex objects as RDF graphs, as aggregations of web resources in a linked data environment, we quickly run into questions regarding the boundaries of these aggregations of web resources, the ways we could describe their provenance, and the ways we could version them together with their context (and what are the boundaries of that ‘context’?). How do we model time and process context in such environments? Herbert van de Sompel has done some initial groundbreaking work in that area with his Memento project – but that is just a first step. We seem to have firmer ground for contextualisation on the spatial side: GeoNames, GeoCoordinates and the like seem to be much more stabilized conceptual areas. Maybe because the denotative aspect is stronger in space than in time?!
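
A minimal rdflib sketch of the kind of aggregation at issue (my gloss on OAI-ORE/EDM usage with invented example URIs, not code from Europeana itself) makes the open questions tangible: nothing in the graph itself says where the aggregation ends, how it is versioned, or what temporal context it belongs to.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DCTERMS, RDF

ORE = Namespace("http://www.openarchives.org/ore/terms/")
EDM = Namespace("http://www.europeana.eu/schemas/edm/")
EX = Namespace("http://example.org/")           # invented placeholder URIs

g = Graph()
agg, cho, img = EX.aggregation1, EX.object1, EX.image1

g.add((cho, RDF.type, EDM.ProvidedCHO))         # the cultural heritage object being described
g.add((agg, RDF.type, ORE.Aggregation))         # the aggregation bundling its web resources
g.add((agg, EDM.aggregatedCHO, cho))
g.add((agg, ORE.aggregates, img))               # one aggregated web resource (an image)
g.add((agg, DCTERMS.provenance, Literal("Contributed by an example provider")))

# Boundaries, versioning and temporal context are exactly what this graph
# does not yet express.
print(g.serialize(format="turtle"))
```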

Featured Abstract: Feb. 9

Featured Abstract: “The Person Data Repository”

Alexander Czmiel, Berlin-Brandenburg Academy of Sciences and Humanities

I will present the data model of the Person Data Repository, a project based at the Berlin-Brandenburg Academy of Sciences and Humanities that pursues a novel approach to structuring heterogeneous biographical data. The approach does not define a person as a single data record, but rather as a compilation of all statements concerning that person. Thus, it is possible to display complementary as well as contradicting statements in parallel, which addresses one of the basic challenges of biographical research. In order to accommodate different research approaches and perspectives, the smallest entity of the Person Data Repository is not a person, but a single statement about a person, which is called an “aspect” in the data model. An aspect bundles references to persons, places, dates and sources. With appropriate queries it will be possible to create further narratives whose primary dimension is not necessarily a person, but possibly also a time span or a certain location. Additionally, all aspects are connected to their corresponding sources and to current identification systems such as the LCCN or the German PND. Thus, scientific transparency and compatibility with existing and future systems are guaranteed. To collect and create aspects of a person we built the “Archive-Editor”, a Java-based tool with a user-friendly but powerful interface for the Person Data Repository.
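
To illustrate the aspect idea, here is a hypothetical sketch with invented field names and values (not the repository's actual schema): each statement is stored on its own, carrying its person identifiers, places, dates and source, and a “person” is then simply the set of aspects that share an identifier.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Aspect:
    """A single sourced statement about a person."""
    person_ids: List[str]                       # e.g. PND/GND or LCCN identifiers
    claim: str                                  # the statement itself
    places: List[str] = field(default_factory=list)
    dates: List[str] = field(default_factory=list)
    source: str = ""                            # where the statement comes from

# Invented example data; contradicting statements simply coexist as separate aspects.
aspects = [
    Aspect(["pnd:000000000"], "born in town A", ["Town A"], ["1750"], "Source 1"),
    Aspect(["pnd:000000000"], "born in town B according to a later biography",
           ["Town B"], ["1748"], "Source 2"),
]

# A "person" is just the group of aspects sharing an identifier; grouping by
# place or time span instead would yield the other narratives mentioned above.
by_person: Dict[str, List[Aspect]] = {}
for a in aspects:
    for pid in a.person_ids:
        by_person.setdefault(pid, []).append(a)

print(len(by_person["pnd:000000000"]), "aspects recorded for this person")
```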

Featured Abstract: Feb. 6

Featured Abstract: “Discovering our models: aiming at metaleptic markup applications through TEI customization”

Trevor Muñoz, University of Maryland

The activity of data modeling involves generating a description of the structure that data will have in an information system. In practice, for many current humanities text projects, the outlines of these descriptions and the information systems they will work within are already known—the vocabulary of the Text Encoding Initiative (TEI) and some kind of toolchain suited to storing, processing, and retrieving XML. This should not obscure the modeling challenges involved in humanities text projects. Indeed, it is in the investigation and manipulation of these complex systems of information representation—through the customization mechanisms provided by the TEI—that much of the intellectual contribution of “small data” projects to the digital humanities can be found. It is also at this point in a project that the roles of (digital) humanist and librarian are most closely aligned. An examination of the process of developing TEI customizations for several projects will show some of the decisions by which digital representations of texts become strategic models, and also where the strategic emphases of librarians and humanists for those representations begin to fall out in slightly different ways. As the primary case study among those presented, the development of the Shelley-Godwin Archive project will exhibit how TEI customization as an act of data modeling looks backward to traditions of editing and forward to new kinds of computer-enabled processing, in an attempt to develop a rich, critically informed record of engagement with an important body of texts.

Featured Abstract: Feb. 2

Featured Abstract: “To use ontology, or not?”

Paul Caton, King’s College 

Should formal ontology always accompany data modelling? Both are very broad terms, but clearly the part of the former that Dale Jacquette calls “applied scientific ontology” (2002), i.e. the categorisation and organisation of actually existent things, overlaps with the latter. There is plenty of room left, however, for a simple, intuitive data modelling free of any rigorous logical constraints and based instead on common sense and experience of the world. This approach has the advantages of speed and familiarity to recommend it, and tools such as the Entity Authority Transaction Service (EATS) to implement it. When a seemingly straightforward data modelling task becomes unexpectedly awkward because ‘intuitive’ entities and properties fail to elegantly capture a conjunction of particulars, an obvious step is to work back through the series of assumptions that led to those entities and properties. At this point formal ontology begins to exert its attractive force, and it is hard to avoid being drawn in, particularly because it seems as though all answers might be found there. But it is hard to introduce only a little formal ontology; recursive questioning inexorably pulls one beyond the relatively safe applied scientific fringes towards the core of fundamental categories, and that turns out to be a strange and disconcerting place for the philosophically naïve digital humanist (and I am one such). User-friendly tools such as Protégé and widely available upper-level ontologies such as Cyc may give the comforting impression that every concept in one’s data set can be securely grounded somewhere, but a trip into the deeper reaches of ontology quickly gives the lie to that. Answers there are in plenty, but few that agree with each other on even the most basic issues (see, for example, the first chapter of Westerhoff 2005). In this presentation I shall describe a case of unsatisfactory representation in the preparatory data modelling for the digital edition of the new Cambridge Edition of the Works of Ben Jonson, and consider whether the best response is to ‘fudge and forget’ – thereby staying in the open field of the informal, intuitive approach – or to follow the path into the ontological forest: a more honourable strategy, perhaps, but fraught with the risk of going too far in and becoming hopelessly lost.

Framing Questions

  1. What is the relation between the narrow focus of modelling for implementation and the broader view of modelling the nature of a domain?
  2. How do you model an historical happening?
  3. What is a document object (a string of characters, an XML element, a node in a graph, …)?
  4. What are the basic operations on marked-up document objects (deletion, insertion, extraction, …), and how should they be defined?

Featured Abstract: Jan. 30

Featured Abstract: “Schema as architectural blueprint for a model supporting overlap”

Wendell Piez, Mulberry Technologies, Inc.

I believe that schemas will play an important role in any system that offers an adequate data model for humanities (and generalized) text and document processing and provides for arbitrary overlap and multiple concurrent hierarchies (MCH). In particular, I believe that schema languages describing permissible document structures, including permissible overlap, such as Jeni Tennison’s CREOLE, will be a useful and possibly essential tool in bridging the gap between “flat” models such as range-only models (for example, models implemented using standoff annotation in order to assert the presence of ranges without the encumbrance of hierarchies) and models supporting multiple hierarchies such as GODDAG. Additionally, a special problem of models supporting MCH, such as GODDAG, is how to determine the “correct” GODDAG out of the many possible GODDAGs that can feasibly be projected onto a markup instance; and I believe schemas can provide most of a solution here as well.
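
As a point of comparison, here is a minimal sketch (mine, not Piez's) of the “flat” range-only picture: standoff ranges over a base text, where nesting, and hence hierarchy, is only implicit and overlap is unremarkable. A schema for overlap would, on this view, state which kinds of ranges may overlap and which must nest.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Range:
    kind: str
    start: int      # character offsets into the base text
    end: int

text = "Shall I compare thee to a summer's day?"
ranges: List[Range] = [
    Range("line", 0, 39),
    Range("phrase", 8, 25),    # "compare thee to a"
    Range("quote", 21, 39),    # overlaps the phrase: fine here, impossible in a single XML tree
]

def relation(a: Range, b: Range) -> str:
    """Nesting yields an implicit hierarchy; overlap is simply another relation."""
    if a.start <= b.start and b.end <= a.end:
        return "contains"
    if b.start <= a.start and a.end <= b.end:
        return "is contained by"
    if a.end <= b.start or b.end <= a.start:
        return "is disjoint from"
    return "overlaps"

for a in ranges:
    for b in ranges:
        if a is not b:
            print(a.kind, relation(a, b), b.kind)
```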

Framing Question:

Why do we (when do we) need a model that represents multiple concurrent hierarchies as such, as opposed to a simpler model that supports a set of arbitrary range annotations, with hierarchies present only implicitly?

Featured Abstract: Jan. 26

Featured Abstract: “Data modeling for early modern emblems”

Thomas Stäcker, Wolfenbüttel

Emblem studies are unique in a number of respects. First, they do not belong to a single scholarly discipline or academic department, but address a wide range of research areas, be they philological, historical, or art historical. Second, emblems, which ideally consist of a motto, a pictura, and an epigram, constitute intricate combinations of texts and images that appear in a variety of media. Printed emblem books are merely one manifestation of emblems. Emblems can be found in architecture, paintings, manuscripts or majolica. In addition, there are multiple interrelations among these emblem forms. Emblems on paintings may be copied from engravings, or vice versa. Accordingly, emblems can pose challenges for digital humanists.
In order to facilitate data exchange and the capture of emblem data, a common data model was developed by the so-called OpenEmblem Group, an international group of scholars with interests in digital humanities, whose aim is not only to improve data exchange and aggregation, but also to develop new ways of conducting research on emblems by taking advantage of new technologies. In recent years the University of Illinois at Urbana-Champaign, USA, along with the Herzog August Bibliothek, Wolfenbüttel, began building a portal for emblems in a project funded by a joint initiative of the DFG and the NEH. Wolfenbüttel designed an XML schema, the so-called Emblem namespace, based on a set of categories for formally describing emblems developed by Stephen Rawles at the University of Glasgow (the so-called ‘spine of information’). Furthermore, Illinois established a registry for a Handle service allowing the unique identification of each emblem worldwide. Iconclass notations – one of the most important art-historical standards for indexing pictures – were added by experts in Rotterdam (Netherlands) and Marburg (Germany) and delivered via OAI-PMH to Illinois and Wolfenbüttel to enrich their transcriptions of mottos with descriptions of the picturae. Other contributions came from Utrecht (Netherlands) and Munich (Germany).
My statement provides a brief account of the development and components of the data model for emblems and of how the model was made operative. It sketches the outlines of further developments, e.g. the inclusion of SKOS elements for the description of Iconclass notations and, more generally, the adoption of semantic web techniques revolving around issues of persistent and reliable identification and the seamless integration of emblem data in different kinds of contexts. By the same token it shows how the various projects initiated by the OpenEmblem Group point towards future collaborative research in a distributed, web-based environment.
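
As a rough illustration of the kind of record the ‘spine of information’ and the Emblem namespace make exchangeable, the sketch below bundles a global Handle, the transcribed motto and epigram, and Iconclass notations for the pictura. The field names and values are my own placeholders, not the schema's actual elements.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Emblem:
    handle: str                                  # globally unique ID from the Handle registry
    motto: str                                   # transcribed motto
    epigram: str                                 # transcribed epigram, or a pointer to it
    pictura_iconclass: List[str] = field(default_factory=list)   # Iconclass notations for the pictura
    medium: str = "printed book"                 # emblems also appear in paintings, architecture, majolica ...

emblem = Emblem(
    handle="hdl:1234/example-emblem",            # placeholder, not a real Handle
    motto="Festina lente",
    epigram="(epigram transcription would go here)",
    pictura_iconclass=["25F33(EAGLE)"],          # an illustrative Iconclass notation
)
print(emblem.handle, "|", emblem.motto, "|", emblem.pictura_iconclass)
```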

Framing Questions

  1. What will be the role of digital humanities in the future? Will it evolve into specialist knowledge existing outside the various disciplines (for instance, a specialist in medical technology need not be an expert in curing people), or will digital humanities become a constitutive part of the disciplines themselves?
  2. What role will linked (open) data play for stakeholders in the digital community?
  3. In which cases is it reasonable to use RDF techniques, and in which is it not?
  4. Relating to editions: can discipline-dependent tagging be overcome by stand-off markup?
  5. How are data collections in the humanities to be treated? Is there an analogy to research data in the sciences?