Knowledge Organization and Data Modeling in the Humanities: An ongoing conversation


In March 2012, a three-day workshop was held at Brown University on data modeling in the humanities, sponsored by the NEH and the DFG, and co-organized by Fotis Jannidis and Julia Flanders. Attended by approximately 40 experts with diverse disciplinary backgrounds, the event included theoretical presentations, case studies, panels, and wide-ranging open discussion. What we present here is a record of the event, with links to slides, video footage, and transcriptions of all presentations and discussion. In order to open up the conversation to a broader audience, the transcriptions have been extensively annotated to elucidate informal references, and to provide links and glosses on the many projects, tools, standards, people, and specialized terms that were referenced in discussion.

March 14

Keynote presentation: Wendell Piez, “Data Modeling for the Humanities: Three Questions and One Experiment” (paper, slides, video, transcription)

Panel discussion: Data models in humanities theory and practice (video, transcription)

Stephen Ramsay, Laurent Romary, Kari Kraus, Maximilian Schich, Desmond Schmidt, Andrew Ashton; Julia Flanders and Fotis Jannidis (moderators)

Theoretical perspectives I

Case studies: Critical editions

March 15

Open discussion: Key themes (video, transcription)

Case studies: Research ontologies

  • Daniel Pitti, “EAC-CPF” (video, transcription)
  • Stefan Gradmann, “Objects, Process, Context in Time and Space – and how we model all this in the Europeana Data Model” (slides, video, transcription)
  • Trevor Muñoz, “Discovering our models: aiming at metaleptic markup applications through TEI customization” (slides, video, transcription)

Panel discussion: Data modeling and humanities pedagogy (video, transcription)

Elisabeth Burr, Elizabeth Swanstrom, Susan Schreibman, Elena Pierazzo; Julia Flanders, moderator

Theoretical perspectives II


March 16

Open discussion: Key themes (video, transcription)

Case studies: Historical archives

Theoretical perspectives III

Closing keynote presentation: C. M. Sperberg-McQueen (video, transcription)

Continuing the discussion

We know there will be continued interest in the topic of data modeling in the (digital) humanities. For a record of this event, including video footage of all the sessions and links to slides and presentation notes, please visit the workshop site at the Women Writers Project.

Thanks to all who participated in Knowledge Organization and Data Modeling!

Featured Abstract: March 14

Check in frequently this week to view featured abstracts, leading up to the symposium! We welcome your comments.

Featured Abstract: “A theoretically-rich approach to teaching to model”

Elena Pierazzo, King’s College

Modelling is at the heart of most of my teaching: when teaching XML, XSLT, and TEI within an MA in Digital Humanities, you need to provide the students with intellectual challenges as well as technical skills. In fact, modelling can be seen as the intellectual activity which lies at the base of any computational effort, namely the methods and the languages we invent to communicate our understanding of a particular cultural object (such as a text, a statue, a piece of music) to the computer and, via the computer, to the users. Effective modelling depends on a deep analysis and understanding of the object to be modelled, so it is also essential to encourage and train students’ analytical skills as part of introducing them to modelling; the provision of theoretical frameworks within which to conduct the analysis and subsequent modelling has proven to be a highly successful approach with MA and PhD students. The case study to be presented here will be the modelling of texts of manuscripts and of the transmission of texts through centuries, materials and people. Transmission of texts can be seen as an act of communication, and so communication and linguistic theories (particularly those of Shannon-Weaver 1948/63, Berlo 1960, Saussure 1961 and Jakobson 1960) can cast some new light on the way we analyse, model and understand the texts contained in manuscripts as well as their relationship with the author’s intentions and the reader’s experience. The use of such a complex theoretical framework has proven to help students move conceptually from the empirical to the abstract, a process that is fundamental for modelling. My talk will present some considerations and examples of analytical and modelling activities that have been applied to text transmission and used in the classroom at King’s College London.

Featured Abstract: March 14

Featured Abstract: “Taking Modeling Seriously”

Allen Renear, University of Illinois, Urbana-Champaign

There are many kinds of modeling. I am concerned here with the sort of modeling that emphasizes theoretical or epistemic objectives: modeling, that is, that purports to provide an account of how things are in a domain of interest. The demands of this sort of modeling are exacting, and not to everyone’s taste. But the rewards in insight and understanding are worth the effort. This sort of modeling may sound like straightforward philosophical ontology development, and of the usual naïve and realistic sort. Perhaps in a sense it is. However, my focus throughout will not be on self-declared ontologies or ontology design, but rather on examples that have more practical objectives (such as systems design) and are typically carried out in familiar graphic conceptual modeling languages, such as entity relationship diagrams and UML class diagrams. It is these ordinary modeling efforts I will be taking seriously, and in doing that I will thereby be doing some serious modeling of my own. In my experience, the stresses and paradoxes latent in familiar, unpretentious conceptual modeling give us a natural, manageable start in thinking through some of the hardest problems in developing a formal understanding of cultural objects and relationships. In the end, however, I will argue, as you probably suspect, that taking modeling seriously requires specific logic-based formal methods. Serious modeling takes modeling more seriously than it takes itself.

Featured Abstract: March 13

Featured Abstract: “Modeling: Perspectives, Objectives, and Context”

Daniel Pitti, Institute for Advanced Technology in the Humanities, University of Virginia

Humanists, scholars and cultural heritage professionals (archivists, librarians, museum curators, and keepers of sites and monuments) share a common focus on artifacts, objects created by humans that provide the historical evidence for our understanding of what it means to be human. Cultural heritage professionals focus on preserving and facilitating access to selected artifacts, and scholars study the artifacts, from a variety of perspectives, and attempt to analyze and understand them.

Humanists have turned to (and become increasingly comfortable with) information technologies for practical reasons: the technologies allow them to achieve particular professional or scholarly objectives. Broadly speaking, those are preservation and access for the cultural heritage professionals, and analysis and understanding for the scholars. To facilitate achieving their objectives, the scholars and professionals seek to represent an artifact or class of artifacts, whether through descriptive representation (for example, a catalog record) or content representation (for example, a TEI-encoded text). The ways in which any given artifact or class of artifacts can be represented are unlimited, but the mission of the cultural heritage professional and the disciplinary perspective of the scholar narrow the possible representations.

Modeling or representing artifacts digitally involves philosophical issues (metaphysical, epistemological, and even ethical) as well as quite practical ones. A particular technology’s capacity to represent artifacts will limit or direct its application, making it a better or worse servant to our objectives. Economy and efficiency must be factors, both in the processing efficiency of a chosen technology and in the financial and administrative economy of creating and maintaining the representation data. Social context and objectives also have an impact on the modeling design and process. Representations that are created and maintained by a lone scholar, by a small group of two or three working together closely, or by a large distributed community each have their own specific design challenges. Finally, for both scholars and cultural heritage professionals, the intended audience is an important social factor to be considered in the modeling process.

Featured Abstract: March 13

Featured Abstract: “Analyzing linguistic variation: From corpus query towards feature discovery”

Elke Teich, Universität des Saarlandes, Saarbrücken, Germany

In the study of linguistic variation (dialect, sociolect, register), we can distinguish two types of analytical situations: the feature-centric and the variable-centric. In a feature-centric perspective, we start from a given feature (or set of features) and want to derive the variables (e.g., place, social group, user group) associated with the given feature or features. This is a typical situation in dialect studies, where we are interested in the geographical distribution of a particular phonetic realization (e.g., +/- rhoticity and British dialect areas). In a variable-centric perspective, we start from a given variable (e.g., a register) and want to determine the features that are typically associated with that variable. In the feature-centric perspective, we obtain the necessary information (typically a frequency distribution of a feature) by employing a corpus query approach, by means of which we extract the instances of a given feature from an appropriate set of language data. In a variable-centric perspective, we are faced with the problem that we may not know a priori what the relevant features are; instead, we have to find ways of discovering linguistic features potentially suitable for analysis.

In my presentation, I will illustrate these two perspectives with examples from the study of register variation in the scientific domain, looking at selected lexico-grammatical features and the variables of discourse field (scientific discipline) and time (diachronic evolution of registers) (cf. Teich & Fankhauser, 2010; Degaetano et al., 2011; Degaetano & Teich, 2011).
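As a rough illustration of the two perspectives described in this abstract, the following Python sketch runs a corpus query for a given feature (feature-centric) and then ranks candidate features for a given register (variable-centric). Everything in it is an assumption made for illustration: the toy corpus, the function names, and the crude frequency-difference score are hypothetical and are not taken from the studies cited above.

```python
import re
from collections import Counter

# Hypothetical toy corpus: each text is paired with the value of one variable
# (here, the register or scientific discipline it belongs to).
CORPUS = [
    ("physics", "We show that the results are robust. We show the model."),
    ("linguistics", "It is argued that register varies. The data show variation."),
    ("physics", "The results show that the field is uniform."),
]


def feature_distribution(corpus, pattern):
    """Feature-centric: count matches of a given feature per variable value."""
    regex = re.compile(pattern, re.IGNORECASE)
    counts = Counter()
    for variable, text in corpus:
        counts[variable] += len(regex.findall(text))
    return counts


def candidate_features(corpus, variable, top_n=3):
    """Variable-centric: rank words by how much more frequent they are
    (in relative terms) inside the given variable than in the rest."""
    inside, outside = Counter(), Counter()
    for value, text in corpus:
        tokens = re.findall(r"[a-z]+", text.lower())
        (inside if value == variable else outside).update(tokens)
    n_in = sum(inside.values()) or 1
    n_out = sum(outside.values()) or 1
    scores = {w: inside[w] / n_in - outside[w] / n_out for w in inside}
    # Sort by descending score, then alphabetically so ties are stable.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_n]


dist = feature_distribution(CORPUS, r"\bshow\b")
top = candidate_features(CORPUS, "physics")
```

Here `dist` is the frequency distribution of one known feature across registers, while `top` is a list of words that might merit closer study as features of one register. A real study would use a proper corpus query tool and a statistically grounded measure instead of this simple difference.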

Featured Abstract: March 13

Featured Abstract: “Modelling as a centre of Practice and Pedagogy”

Susan Schreibman, Trinity College Dublin

Last year I designed an MPhil in Digital Humanities and Culture for Trinity College Dublin. We are now in the second semester of the first year. Unlike teaching a single DH course that centres on a specific area (from introductory courses to more specific text encoding, digital scholarly editing, web technologies, etc.), the longer timescale of a full year allows for significant cross-fertilisation in understanding how disparate technologies, methodologies, and theories interrelate to comprise the salient core of this new and somewhat abstract discipline of humanities computing.

Many of us have been in this field long enough to know that the technologies we teach our students today will be surpassed and replaced by the yet-to-be-invented. What is more lasting, however, is the understanding of how to model the objects of our contemplation: from the narrative arc of a thematic research collection, to a TEI-encoded document, to a relational database. Models serve as an abstraction of an analogue object, its relationship to other objects, as well as a representation of what we think is important about them. By teaching our students how to model, we give them the tools to represent and re-present the yet-to-be-encountered.

The question then remains: how do we teach this interrelatedness of things and their properties? Should there be a knowledge representation course which covers modelling more abstractly, or should it be covered within context, when teaching subjects such as relational databases, TEI encoding, or virtual world construction? Should knowledge representation be the theme that binds the disparate strands of DH together, or merely a theme? If knowledge representation, as Willard McCarty suggests, is the coherent or cohesible practice that binds all of DH together, then it follows that KR must be at the centre of our pedagogy.

Featured Abstract: March 12

Featured Abstract: “Text, Intertext, and Context: Modeling a Map of Medieval Scholarly Practices”

Malte Rehbein, University of Würzburg

Medieval scholarly practices can be fully understood “erst im Zusammenspiel verschiedener methodischer Zugänge und nur unter Berücksichtigung der Gesamtüberlieferung” (only through the interplay of various methodological approaches and with consideration of the whole written tradition), as Claudine Moulin has pointed out in analyzing early medieval vernacular glosses [1, p. 76]. Understanding them hence requires a holistic approach that takes into account not only texts but also intertextual relations and contextual information: historical, paleographic, codicological, and bibliographical information, as well as an appreciation of processes of textual production and usage. Representing such comprehensive information is challenging as it requires the interplay of different data models. This presentation discusses this challenge through the case study of the 8th-century “Würzburg Saint Matthew” [2]: a gospel text with glosses and commentaries compiled from more than forty different sources, making it an intriguing, though complicated, object of study [3]. The presentation further outlines considerations on the development of models for presenting this information within a research environment and concludes with an outlook toward a comprehensive visual map of medieval scholarly practices.

[1] Moulin, Claudine (2009): Paratextuelle Netzwerke: Kulturwissenschaftliche Erschließung und soziale Dimensionen der althochdeutschen Glossenüberlieferung. In: Gerhard Krieger (Hg.): Verwandtschaft, Freundschaft, Bruderschaft. Soziale Lebens- und Kommunikationsformen im Mittelalter, 56–77.
[2] Rehbein, Malte (2011). “A Data Model for Visualising Textuality — The Würzburg Saint Matthew”. Digital Humanities 2011: Conference Abstracts. Ed. The Alliance of Digital Humanities Organizations. Stanford: Stanford University Library. 204–205.
[3] Cahill, Michael (2002): The Würzburg Matthew: Status Quaestionis. In: Peritia 16, 1–25.

Featured Abstract: March 12

Featured Abstract: “It’s all about Integration and Conceptual Change”

Elisabeth Burr, Universität Leipzig, Germany

In Romance Philology the ordering of information extracted from sources into knowledge domains has always been part of the research process. However, the ordering of index cards containing such information according to certain categories and the setting up of relations between them has never been regarded as knowledge modelling or as the building of ontologies. Similar things could be said about ‘data’, ‘filtering sources for information’ or ‘markup’. Still today, doing research and presenting research results tends to be seen more as mental processes than as disciplined activities which need to be made explicit and taught. This state of affairs has serious implications for students. Not only is doing research and writing a ‘disciplined’ academic paper not widely taught, but methods of research and of good academic practice are also not conceived of as being of epistemological interest. Instead, they are considered to be mere skills which can be acquired in courses offered by non-academic service centres. Furthermore, as the computer is still looked upon and used as if it were just a modern form of typewriter, no need is seen to teach students a meaningful way of exploiting computer technologies for their research and writing in academic courses. If anything, students are encouraged to believe that writing an academic paper is all about ideas, creativity and genius and that the structure and the layout of a paper, the consistency of citations or the integrity of bibliographies, among other things, are formalities.

In order to change the concept of doing research and of academic paper writing and to foster a meaningful exploitation of computer technologies, I have implemented in one of my courses of Romance linguistics a project-oriented approach in which methods and tools of, and questions posed by, the Digital Humanities play an important role. By involving students in the creation of a digital version of a text and showing them how to apply a TEI schema with the help of an XML editor like oXygen, or by getting them to archive the information about sources in a database like the one EndNote provides, students can actually learn a lot about texts, about data and styles, about systematization and consistency. Furthermore, the linking of information and the building of ontologies makes many aspects of scholarly work explicit to them. If the work they are doing contributes, moreover, to a portal like the one which has been created within the framework of the project (see “Von Leipzig in die Romania”), and which can itself be used to write TEI-compliant academic papers, they get the chance not only to develop a different concept of doing research and writing academic papers, but also to conceive of the computer as a device for the manipulation of systematized data and thus also for the modelling of knowledge, and not as a mere high-tech typewriter.

Featured Abstract: March 12

Featured Abstract: “Tagging in the cloud. A data model for collaborative markup”

Jan Christoph Meister, University of Hamburg

This paper discusses the data model underlying CLÉA, short for “Collaborative Literature Exploration and Annotation”, a Google DH Award-funded project based on the CATMA software developed at Hamburg University. The goal of CLÉA is to build a web-based annotation platform supporting multi-user, multi-instance, non-deterministic (and, if required, even contradictory) markup of literary texts in a TEI-conformant approach. Apart from technical considerations, this approach to markup has some more fundamental consequences. First, when one and the same text is marked up from different functional perspectives, markup itself starts to become fluid, allowing researchers to aggregate markup just as we aggregate other meta-texts, namely according to their specific research interest. Second, and in addition to the functional enhancement, there is also a social aspect to this new approach: the production of markup becomes a team effort. This paradigm shift from individual expert annotation to an “open workgroup”, crowd-sourced approach is based on what we call a “one-to-many” data model that can be implemented using “cloud” technology. “Tagging in the cloud”, therefore, combines three new aspects of text markup: the social, the technological, and the conceptual.
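One way to picture a “one-to-many” arrangement of this kind is as a single text with many independent, stand-off annotations attached to it. The Python sketch below is a minimal illustration under that assumption only; the class and method names are hypothetical and do not describe the actual CLÉA or CATMA data model or its TEI serialization.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    start: int       # character offset into the text, inclusive
    end: int         # character offset into the text, exclusive
    tag: str         # the label applied to the span
    annotator: str   # who produced this piece of markup


@dataclass
class AnnotatedText:
    """One text, many annotations: markup lives outside the text itself."""
    text: str
    annotations: list = field(default_factory=list)

    def annotate(self, start, end, tag, annotator):
        if not (0 <= start < end <= len(self.text)):
            raise ValueError("span lies outside the text")
        self.annotations.append(Annotation(start, end, tag, annotator))

    def select(self, annotators=None, tags=None):
        """Aggregate markup according to a specific research interest."""
        return [
            a for a in self.annotations
            if (annotators is None or a.annotator in annotators)
            and (tags is None or a.tag in tags)
        ]

    def conflicts(self):
        """Pairs of annotations that tag the same span differently."""
        pairs = []
        for i, a in enumerate(self.annotations):
            for b in self.annotations[i + 1:]:
                if (a.start, a.end) == (b.start, b.end) and a.tag != b.tag:
                    pairs.append((a, b))
        return pairs


doc = AnnotatedText("Call me Ishmael.")
doc.annotate(8, 15, "narrator", "alice")
doc.annotate(8, 15, "character", "bob")    # contradictory markup is allowed
doc.annotate(0, 4, "imperative", "alice")
```

Because the annotations never modify the text, contradictory readings can coexist, and `select` lets each researcher pull out only the layers relevant to a given question.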