“It’s all about Integration and Conceptual Change”
In Romance Philology the ordering of information extracted from sources into knowledge domains has always been part of the research process. However, the ordering of index cards containing such information according to certain categories and the setting up of relations between them has never been regarded as knowledge modelling or as the building of ontologies. Similar things could be said about ‘data’, ‘filtering sources for information’ or ‘markup’. Still today, doing research and presenting research results tend to be seen more as mental processes than as disciplined activities which need to be made explicit and taught. This state of affairs has serious implications for students. Not only are doing research and writing a ‘disciplined’ academic paper not widely taught, but methods of research and of good academic practice are also not conceived of as being of epistemological interest. Instead, they are considered to be mere skills which can be acquired in courses offered by non-academic service centres. Furthermore, as the computer is still looked upon and used as if it were just a modern form of typewriter, no need is seen to teach students, in academic courses, a meaningful way of exploiting computer technologies for their research and writing. If anything, students are encouraged to believe that writing an academic paper is all about ideas, creativity and genius and that the structure and the layout of a paper, the consistency of citations or the integrity of bibliographies, among other things, are mere formalities.
In order to change the concept of doing research and of academic paper writing and to foster a meaningful exploitation of computer technologies, I have implemented in one of my courses of Romance linguistics a project-oriented approach in which the methods and tools of the Digital Humanities, and the questions they pose, play an important role. When students are involved in the creation of a digital version of a text and shown how to apply a TEI schema with the help of an XML editor like oXygen, or when they archive the information about sources in a database like the one EndNote provides, they can learn a lot about texts, about data and styles, about systematization and consistency. Furthermore, the linking of information and the building of ontologies make many aspects of scholarly work explicit to them. If, moreover, the work they are doing contributes to a portal like the one which has been created within the framework of the project (see “Von Leipzig in die Romania”, http://www.culingtec.uni-leipzig.de/JuLeipzigRo/) and which can itself be used to write TEI-compliant academic papers, they get the chance not only to develop a different concept of doing research and writing academic papers, but also to conceive of the computer as a device for the manipulation of systematized data, and thus for the modelling of knowledge, rather than as a mere high-tech typewriter.
“To Use Ontology, or Not?”
Should formal ontology always accompany data modelling? Both are very broad terms, but clearly the part of the former that Dale Jacquette calls “applied scientific ontology” (2002), i.e. the categorisation and organisation of actually existent things, overlaps with the latter. There is plenty of room left, however, for a simple intuitive data modelling free of any rigorous logical constraints, and based instead on common sense and experience of the world. This approach has advantages of speed and familiarity to recommend it, and tools such as Entity Authority Transaction Service (EATS) to implement it. When a seemingly straightforward data modelling task becomes unexpectedly awkward because ‘intuitive’ entities and properties fail to elegantly capture a conjunction of particulars, an obvious step is to work back through the series of assumptions that led to those entities and properties. At this point formal ontology begins to exert its attractive force, and it is hard to avoid being drawn in, particularly because it seems as though all answers might be found there. But it is hard to introduce only a little formal ontology; recursive questioning inexorably pulls one beyond the relatively safe applied scientific fringes towards the core of fundamental categories, and that turns out to be a strange and disconcerting place for the philosophically naïve digital humanist (and I am one such). User-friendly tools such as Protégé and widely available upper-level ontologies such as Cyc may give a comforting impression that every concept in one’s data set can be securely grounded somewhere, but a trip into the deeper reaches of ontology quickly gives the lie to that. Answers there are plenty, but few that agree with each other on even the most basic issues (see, for example, the first chapter of Westerhoff 2005).
In this presentation I shall describe a case of unsatisfactory representation in the preparatory data modelling for the digital edition of the new Cambridge Edition of the Works of Ben Jonson, and consider whether the best response is to ‘fudge and forget’ – thereby staying in the open field of the informal, intuitive approach – or to follow the path into the ontological forest: a more honourable strategy, perhaps, but fraught with the risk of going too far in and becoming hopelessly lost.
“The Person Data Repository”
I will present the data model of the Person Data Repository, a project based at the Berlin-Brandenburg Academy of Sciences and Humanities, which pursues a novel approach to structuring heterogeneous biographical data. The approach does not define a person as a single data record, but rather as a compilation of all statements concerning that person. Thus, it is possible to display complementary as well as contradictory statements in parallel, which meets one of the basic challenges of biographical research. In order to satisfy different research approaches and perspectives, the smallest entity of the Person Data Repository is not a person, but a single statement about a person, which is named “aspect” in the data model. An aspect bundles references to persons, places, dates and sources. With suitable queries it will be possible to create further narrations, whose first dimension is not necessarily a person, but possibly also a time span or a certain location. Additionally, all aspects are connected to the corresponding source and to current identification systems such as the LCCN or the German PND. Thus, scientific transparency and compatibility with existing and future systems are guaranteed. To collect and create aspects of a person we built the “Archive-Editor”, a Java-based tool with a user-friendly but powerful interface for the Person Data Repository.
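The statement-centred idea can be illustrated with a minimal Python sketch. It is not the repository's actual schema: the class names `Aspect` and `AspectStore`, all field names, and the sample records are invented for illustration. It only shows how a "person" can emerge from a query over aspects, so that contradictory statements can sit side by side.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Aspect:
    """A single statement about one or more persons (field names are hypothetical)."""
    statement: str
    persons: tuple = ()   # identifiers of the persons the statement concerns
    places: tuple = ()
    dates: tuple = ()     # kept as strings so that vague dates remain expressible
    sources: tuple = ()


class AspectStore:
    """Holds aspects only; a 'person' is just the result of a query."""

    def __init__(self):
        self._aspects = []

    def add(self, aspect):
        self._aspects.append(aspect)

    def about_person(self, person_id):
        return [a for a in self._aspects if person_id in a.persons]

    def at_place(self, place):
        return [a for a in self._aspects if place in a.places]


# Invented sample data: two sources disagree about the same person's birth.
store = AspectStore()
store.add(Aspect("born", persons=("person-1",), places=("Berlin",),
                 dates=("1750",), sources=("source-A",)))
store.add(Aspect("born", persons=("person-1",), places=("Potsdam",),
                 dates=("1751",), sources=("source-B",)))
store.add(Aspect("taught", persons=("person-2",), places=("Berlin",),
                 dates=("1790",), sources=("source-C",)))

person_view = store.about_person("person-1")  # both birth statements, in parallel
place_view = store.at_place("Berlin")         # a narration organised by location
```

Because nothing is merged at storage time, the same store can be read by person, by place, or by any other dimension an aspect records.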
If digital scholarship, in practical terms, is predicated on the compatibility of data (expressed variously and debatably as interoperability or interchange), then we can also say that it requires a kind of meta-modeling: that is, a clear expression of the relationships between models. Tools like the TEI, with its customization mechanism, offer a layered approach to data modeling that enables us to represent many different information vectors (including temporal ones). However, many questions require more detailed consideration: the level of precision at which this meta-modeling must take place, the specific vectors of similarity to be expressed, and the meaning or motivations of our customizations. Is it possible to use a mechanism of this kind in a rigorous way to support more effective scholarship and scholarly collaboration?
“Objects, Process, Context in Time and Space – and how we model all this in the Europeana Data Model”
Once we start modeling complex objects as RDF graphs, as aggregations of web resources in a linked data environment, we quickly get into questions regarding the boundaries of these aggregations, the ways we could describe their provenance, and the way we could version them including their context (and what are the boundaries of that ‘context’?). How do we model time and process context in such environments? Herbert van de Sompel has done some initial groundbreaking work in that area with his Memento project – but that is just one first step. We seem to have firmer ground for contextualisation on the spatial side: GeoNames, GeoCoordinates and the like seem to be much more stabilized conceptual areas. Maybe because the denotative aspect is stronger in space than in time?!
“Comparing representations of and operations on overlap”
Overlapping document structures have been studied by markup theorists for more than twenty years. A large number of solutions has been proposed. Some of the proposals are based on XML, others not. Some are proposals for use of alternate serial forms or data models, and some for stand-off markup. Algorithms for transformations between the different forms have also been proposed. Even so, there are few systematic comparative studies of the various proposals, and there seems to be little consensus on what is the best approach.
The aim of the MLCD Overlap Corpus (MOC) is to make it easier to compare the different proposals by providing concrete examples of documents marked up according to a variety of proposed solutions. The examples are intended to range from small, constructed documents to full-length, real texts. We believe that the provision of such different parallel representations of the same texts in various formats may serve a number of purposes.
Many of the proposals for markup of overlapping structures are not fully worked out, or not well documented, or known only from scattered examples. Encoding a larger body of different texts according to each of the proposed solutions may help to resolve unclear points or shed new light on difficulties in the proposals themselves.
Running or developing software to perform various operations on the same data represented in different forms may also help in finding out which forms are optimal for which operations. Some operations, even though well understood for non-overlapping data, may turn out not to be clearly defined for overlapping data.
Finally, a parallel corpus may serve as reference data for work on translations between the various formats, for testing conversion algorithms, and for developing performance tests for software.
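As a small illustration of one family of proposals mentioned above, stand-off markup, the following Python sketch keeps annotations outside the text as character ranges and tests which pairs truly overlap (intersect without nesting). It does not reproduce any format from the corpus; the `Range` class, the function names, and the sample text are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    name: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def overlaps(a, b):
    """True when two ranges intersect but neither contains the other."""
    intersect = a.start < b.end and b.start < a.end
    a_in_b = b.start <= a.start and a.end <= b.end
    b_in_a = a.start <= b.start and b.end <= a.end
    return intersect and not (a_in_b or b_in_a)


def overlapping_pairs(ranges):
    """Names of all pairs that could not be nested in a single tree."""
    return [(a.name, b.name)
            for i, a in enumerate(ranges)
            for b in ranges[i + 1:]
            if overlaps(a, b)]


text = "one two three four"
ranges = [
    Range("line", 0, 7),     # "one two"
    Range("phrase", 4, 13),  # "two three"
    Range("word", 4, 7),     # "two"
]

pairs = overlapping_pairs(ranges)
```

Here only "line" and "phrase" overlap; "word" nests inside both, so it could be expressed in a conventional hierarchy. An operation as simple as this one already has to choose a definition of overlap, which is the kind of question a parallel corpus lets one test across formats.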
“Digital Literary History and its Discontent”
Literary history and digital humanities present themselves to the interested observer like an unfinished bridge with a huge gap between the two sides. The side of literary history has been busy discussing the principles by which histories are constructed and the demand for ever wider concepts of their subject. On the side of digital literary studies there are various attempts to read ‘a million books’, stretching the notion of ‘reading’ to new limits. Most work has been done on classification, specifically on genre classification (e.g. Mueller or Jockers). Work to close the gap can start from both sides. My talk will discuss some of the concepts underlying contemporary literary histories, pushing towards a more formalized description. But this will only be possible for a very small part of the entire enterprise of ‘literary history’, thus putting the methods of digital literary studies in a subsidiary role. And even when it is possible to describe some aspect (genre, concepts like author, reader etc.) in a more formalized way, most of the time this formal description cannot be applied automatically to larger collections of text. In a self-reflexive turn I will try to analyze how this ‘more formalized description’ is achieved, thereby describing the gap between the conceptual modelling done in any humanities research and the demands of a more formal description.
“What is the Thing that Changes?: Space and Time through the Atlas of Historical County Boundaries”
One would think that modeling historical administrative boundaries would be a straightforward matter, relatively free of the complications of more fuzzy phenomena in the humanities. In fact, however, the precision of modeling tools casts the inherent difficulties of modeling administrative change over time and space in sharp relief. This presentation will draw on examples from the Atlas of Historical County Boundaries, an NEH-funded project of the Newberry Library completed in 2010, which documents every change in county boundaries in what is now the United States from colonial times through the year 2000. In addition to reviewing fundamental data modeling decisions of the project, the presentation will explore interesting edge cases, connections and similarities to other kinds of data, implicit models, and alternative ways of approaching the question of what are the objects of interest that we imagine persisting through changes over time.
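One possible answer to the question of what persists can be sketched in Python: treat the county as a named identity that carries a dated series of boundary versions. This is not the Atlas's actual data model. The class names, the string placeholder for geometry, and the sample county and dates are all invented for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class BoundaryVersion:
    effective: date
    geometry: str  # placeholder; a real system would store a polygon


@dataclass
class County:
    """The persisting 'thing': an identity with a history of boundaries."""
    name: str
    versions: list = field(default_factory=list)

    def add_version(self, effective, geometry):
        self.versions.append(BoundaryVersion(effective, geometry))
        self.versions.sort(key=lambda v: v.effective)

    def boundary_on(self, day):
        """Geometry in force on a given day, or None before the county existed."""
        dates = [v.effective for v in self.versions]
        i = bisect_right(dates, day)
        return self.versions[i - 1].geometry if i else None


# Invented example county with two boundary states.
county = County("Example County")
county.add_version(date(1800, 1, 1), "shape-A")
county.add_version(date(1850, 6, 1), "shape-B")
```

The sketch makes one modeling decision explicit: identity is carried by the name, not by the shape. Edge cases such as renamings, mergers, or a county that is abolished and later re-created would strain exactly that decision.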
“Tagging in the cloud. A data model for collaborative markup”
This paper discusses the data model underlying CLÉA, short for “Collaborative Literature Exploration and Annotation”, a Google DH Award funded project based on the CATMA software developed at Hamburg University. The goal of CLÉA is to build a web-based annotation platform supporting multi-user, multi-instance, non-deterministic (and, if required, even contradictory) markup of literary texts in a TEI-conformant approach. Apart from technical considerations, this approach to markup has some more fundamental consequences. First, when one and the same text is marked up from different functional perspectives, markup itself starts to become fluid, allowing researchers to aggregate markup just as we aggregate other meta-texts, namely according to their specific research interest. Second, and in addition to the functional enhancement, there is also a social aspect to this new approach: the production of markup becomes a team effort. This paradigm shift from individual expert annotation to an “open workgroup”, crowdsourced approach is based on what we call a “one-to-many” data model that can be implemented using “cloud” technology. “Tagging in the cloud”, therefore, combines three new aspects of text markup: the social, the technological, and the conceptual.
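A rough Python sketch of what a “one-to-many” arrangement might look like follows: one immutable text, many independently authored tags that can be aggregated by interest and may contradict one another. This is not the CLÉA or CATMA implementation. The class names, the fields, and the sample annotations are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    annotator: str
    tagset: str   # the research perspective the tag belongs to
    label: str
    start: int    # inclusive character offset into the text
    end: int      # exclusive character offset


class AnnotatedText:
    """One text, many markup layers; the text itself is never modified."""

    def __init__(self, text):
        self.text = text
        self.tags = []

    def annotate(self, tag):
        if not 0 <= tag.start < tag.end <= len(self.text):
            raise ValueError("tag range lies outside the text")
        self.tags.append(tag)

    def select(self, tagset=None, annotator=None):
        """Aggregate markup according to a specific research interest."""
        return [t for t in self.tags
                if (tagset is None or t.tagset == tagset)
                and (annotator is None or t.annotator == annotator)]

    def contradictions(self, tagset):
        """Spans that carry more than one label within the same tagset."""
        by_span = {}
        for t in self.select(tagset=tagset):
            by_span.setdefault((t.start, t.end), set()).add(t.label)
        return {span: labels for span, labels in by_span.items() if len(labels) > 1}


# Invented example: two annotators disagree about the same passage.
doc = AnnotatedText("She smiled.")
doc.annotate(Tag("annotator-a", "narration", "reliable", 0, 11))
doc.annotate(Tag("annotator-b", "narration", "unreliable", 0, 11))
doc.annotate(Tag("annotator-a", "grammar", "sentence", 0, 11))

conflicts = doc.contradictions("narration")
```

Contradictory tags are stored rather than rejected; deciding between them is left to whoever aggregates the markup.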
“On the Value of Comparing Truly Remarkable Texts”
Looking at the comparatively short history of editions in the digital medium, one notices that those projects which are highly critical about the edited text invariably end up pushing the boundaries of established practices in text modeling and encoding. For example, this has been the case for the HyperNietzsche edition, which developed its own genetic XML markup dialect, or for the Wittgenstein edition, which went as far as developing its own markup language. The ongoing genetic edition of Goethe’s Faust is no different inasmuch as it makes use of common XML-based encoding practices and de facto standards like the guidelines of the Text Encoding Initiative but at the same time has felt the need to transcend those, so that it can cope with the inherent complexity of modeling its subject matter. Rooted in the tradition of German editorial theory, the Faust edition strives for a strict conceptual distinction between the material evidence of the text’s genesis as found in the archives on the one hand and the interpretative conclusions drawn from this evidence on the other hand, the latter eventually giving rise to a justified hypothesis of how the text came into being. These two perspectives on the edited text, though complementary, are structured very differently and moreover cannot be modeled via context-free grammars in their entirety. Therefore it is already hard to encode, validate and process a single perspective via XML concisely and efficiently, let alone both of them in an integrated fashion. Given this problem and the need to solve it in order to meet the expectations of scholarly users towards an edition which in the end claims to be “historical-critical”, the Faust project turned to multiple, parallel encodings of the same textual data, each describing the textual material from one of the desired perspectives.
Necessarily, the different encodings then have to be correlated, resulting not in the common compartmentalized model of an edited text but in an integrated, inherently more complex one. In the work of the Faust project, this crucial task of correlating perspectives on a text is achieved semi-automatically by means of computer-aided collation and a markup document model supporting arbitrarily overlapping standoff annotations. The presentation of this editorial workflow and of its underlying techniques and models might not only be of interest in its own right; it might also contribute to the answer to a broader question: Can we gradually increase the complexity of our notion of “what text really is” while still being able to rely on encoding practices widely endorsed by the DH community today?
“Discovering our models: aiming at metaleptic markup applications through TEI customization”
The activity of data modeling involves generating a description of the structure that data will have in an information system. In practice, for many current humanities text projects, the outlines of these descriptions and the information systems they will work within are already known: the vocabulary of the Text Encoding Initiative (TEI) and some kind of toolchain suited to storing, processing, and retrieving XML. This should not obscure the modeling challenges involved in humanities text projects. Indeed it is in the investigation and manipulation of these complex systems of information representation, through the customization mechanisms provided by the TEI, that much of the intellectual contribution of “small data” projects to the digital humanities can be found. It is also at this point in a project that the roles of (digital) humanist and librarian are most closely aligned. An examination of the process of developing TEI customizations for several projects will show some of the decisions whereby the digital representations of texts become strategic models, and also where the strategic emphases of librarians and humanists for those representations begin to fall out in slightly different ways. As the primary case study among those presented, the development of the Shelley-Godwin archive project will exhibit how TEI customization as an act of data modeling looks backward to traditions of editing and forward to new kinds of computer-enabled processing in an attempt to develop a rich, critically engaged record of engagement with an important body of texts.
“A theoretically-rich approach to teaching to model”
Modelling is at the heart of most of my teaching: when teaching XML, XSLT and TEI within an MA in Digital Humanities, you need to provide the students with intellectual challenges as well as technical skills. In fact, modelling can be seen as the intellectual activity which lies at the base of any computational effort, namely the methods and the languages we invent to communicate our understanding of a particular cultural object (such as a text, a statue, a piece of music) to the computer and, via the computer, to the users. Effective modelling depends on a deep analysis and understanding of the object to be modelled, so it is also essential to encourage and train students’ analytical skills as part of introducing them to modelling; the provision of theoretical frameworks within which to conduct the analysis and subsequent modelling has proven to be a highly successful approach with MA and PhD students. The case study to be presented here will be the modelling of texts of manuscripts and of the transmission of texts through centuries, materials and people. Transmission of texts can be seen as an act of communication, and so communication and linguistic theories (particularly those of Shannon-Weaver 1948/63, Berlo 1960, Saussure 1961 and Jakobson 1960) can cast some new light on the way we analyse, model and understand the texts contained in manuscripts as well as their relationship with the author’s intentions and the reader’s experience. The use of such a complex theoretical framework has proven to help students move conceptually from the empirical to the abstract, a process that is fundamental for modelling. My talk will present some considerations and examples of analytical and modelling activities that have been applied to text transmission and used in the classroom at King’s College London.
“Data Modeling in the Humanities: Three Questions and One Experiment”
Thinking about data modeling in the humanities leads directly to paradoxical questions regarding digital data, textual media, and their proper or possible relations in a system of representation. Rather than answer these questions directly, I would like to pose three more. What do we mean by “data model”, and in particular, how can a data model be designed to support processes and methods that must be underspecified insofar as they are protean, contested, responsive to exigencies, and themselves objects of investigation? What about markup, and what is the relation of our data model to markup technologies? And what is the potential role of the schema, as an instrument of operations and transformations that can enable open-ended and experimental work? In order to help explore these issues, I will demonstrate a prototype toolkit, parsing a markup syntax capable of representing arbitrary overlapping ranges and providing them with structured annotations.
Daniel Pitti, Institute for Advanced Technology in the Humanities, University of Virginia
“Modeling: Perspectives, Objectives, and Context”
Humanists, scholars and cultural heritage professionals (archivists, librarians, museum curators, and keepers of sites and monuments) share a common focus on artifacts, objects created by humans that provide the historical evidence for our understanding of what it means to be human. Cultural heritage professionals focus on preserving and facilitating access to selected artifacts, and scholars study the artifacts, from a variety of perspectives, and attempt to analyze and understand them.
Humanists have turned to (and become increasingly comfortable with) information technologies for practical reasons: the technologies allow them to achieve particular professional or scholarly objectives. Broadly speaking, those are preservation and access for the cultural heritage professionals, and analysis and understanding for the scholars. To facilitate achieving their objectives, the scholars and professionals seek to represent an artifact or class of artifacts, either through descriptive representation (for example, a catalog record) or through content representation (for example, a TEI-encoded text). The ways in which any given artifact or class of artifacts can be represented are unlimited, but the mission of the cultural heritage professional and the disciplinary perspective of the scholar narrow the possible representations.
Modeling or representing artifacts digitally involves philosophical issues—metaphysical, epistemological, and even ethical issues—as well as quite practical issues. A particular technology’s capacity to represent artifacts will limit or direct its application, making it a better or worse servant to our objectives. Economy and efficiency must be a factor, both in the processing efficiency of a chosen technology, and the financial and administrative economy of creating and maintaining the representation data. Social context and objectives also have an impact on the modeling design and process. Representations that are created and maintained by a lone scholar, a small group of two or three working together closely, or a large distributed community, have their own specific design challenges. Finally, for both scholars and cultural heritage professionals, the desired audience must be an important social factor to be considered in the modeling process.
“Where Semantics Lies”
Should the syntax of XML have been scrapped in favor of s-expressions? This debate, which raged for years and which occasionally reappears, has all the ring of a religious war (Windows vs. Mac, Emacs vs. Vi, big-endian vs. little-endian). In this talk, I will suggest that while in general this discussion generated more heat than light, it pointed toward an important set of issues that bears on the problem of data modeling in the humanities. The question isn’t which syntax is superior, but rather, what does it mean for a syntax to have a semantics and (more critically) where does that semantics lie within the overall system?
I begin by claiming that our common definitions of “semantics” (within computer science) are too vague, and offer a definition loosely based on Wittgenstein’s notion of meaning as a function of use. I then use that definition to distinguish between XML as a syntax that binds its semantics late in the overall computational process, and an s-expression-based language (like Lisp) that defines its semantics early. I then pose the question: What would it look like if we were to imagine systems that take our present data models and bind them early?
The purpose of this exercise is neither to rekindle this debate, nor even to suggest that the conception of semantics within XML or s-expressions is flawed. It is, rather, to reimagine our current data models as having options beyond what has been commonly offered — not just data to which we apply algorithms, but data that is itself algorithmic.
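The contrast between late and early binding can be made concrete with a toy Python sketch. It is only one possible reading of the distinction, not the author's definition: the tuple encoding, the function names, and the tiny operator table `ENV` are all assumptions made for this example.

```python
import operator
from functools import reduce

# Late binding: the tree is inert data. What "sum" means depends entirely
# on which processor is later applied to it.
doc = ("sum", ("num", 2), ("num", 3))


def render(node):
    """One processor: treat the tree as something to display."""
    tag, *children = node
    if tag == "num":
        return str(children[0])
    return "(" + " + ".join(render(c) for c in children) + ")"


def evaluate(node):
    """Another processor: treat the same tree as arithmetic."""
    tag, *children = node
    if tag == "num":
        return children[0]
    return sum(evaluate(c) for c in children)


# Early binding: the head of each expression is already bound to an
# operation by the language itself, so the data is its own algorithm.
ENV = {"+": operator.add, "*": operator.mul}


def s_eval(expr):
    if isinstance(expr, (int, float)):
        return expr
    head, *args = expr
    return reduce(ENV[head], [s_eval(a) for a in args])


shown = render(doc)
value = evaluate(doc)
result = s_eval(("+", 2, ("*", 3, 4)))
```

In the first half, two processors give the same tree two different meanings; in the second, there is only one way to read the expression, because the reading is part of the notation.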
“Virtual Scriptorium St. Matthias”
In order to virtually reconstruct the mediaeval library of the Benedictine Abbey of St. Matthias of Trier, the project is digitizing all of the approximately 500 manuscripts that originate from the Abbey but are now dispersed throughout the world. The integration and modeling of catalog data allow presentation, navigation, research and networking of the library holdings. The modeling poses various challenges. Each manuscript is a witness of a text or a work, so it should be associated with information about this specific manuscript, text or work and with critical text editions (e.g. Perseus) or other manuscript witnesses. At the same time, the collection of the library can be seen as an ensemble which has been collected, maintained and curated with care. The DFG-Viewer is used to present and navigate each manuscript easily, although this tool has been developed primarily for printed works. This decision brings about some problems for data modeling (TEI, METS-MODS). All data will also be incorporated into the virtual research environment TextGrid, where they are released for further processing. On the one hand, this supports the scholarly work of individual researchers (or research groups); on the other hand, the virtual scriptorium St. Matthias can be “edited” as a social edition by the research community. One of the key questions will therefore be whether and how collaborative data modeling can be designed. www.stmatthias.uni-trier.de
“Text, Intertext, and Context: Modeling a Map of Medieval Scholarly Practices”
Medieval scholarly practices can be fully understood “erst im Zusammenspiel verschiedener methodischer Zugänge und nur unter Berücksichtigung der Gesamtüberlieferung” (only through the interplay of various methodological approaches and with consideration of the whole written tradition), as Claudine Moulin has pointed out in analyzing early medieval vernacular glosses. Understanding them hence requires a holistic approach that takes into account not only texts but also intertextual relations and contextual information (historical, paleographic, codicological, bibliographical), together with an appreciation of processes of textual production and usage. Representing such comprehensive information is challenging as it requires the interplay of different data models. This presentation discusses this challenge through the case study of the 8th-century “Würzburg Saint Matthew”: a gospel text with glosses and commentaries compiled from more than forty different sources, making it an intriguing, though complicated, object of study. The presentation further outlines considerations on the development of models for presenting this information within a research environment and concludes with an outlook toward a comprehensive visual map of medieval scholarly practices.
“Taking Modeling Seriously”
There are many kinds of modeling. I am concerned here with the sort of modeling that emphasizes theoretical or epistemic objectives: modeling, that is, that purports to provide an account of how things are in a domain of interest. The demands of this sort of modeling are exacting, and not to everyone’s taste. But the rewards in insight and understanding are worth the effort. This sort of modeling may sound like straightforward philosophical ontology development, and of the usual naïve and realistic sort. Perhaps in a sense it is. However, my focus throughout will not be on self-declared ontologies or ontology design, but rather on examples that have more practical objectives (such as systems design) and are typically carried out in familiar graphic conceptual modeling languages, such as entity relationship diagrams and UML class diagrams. It is these ordinary modeling efforts I will be taking seriously, and in doing that I will thereby be doing some serious modeling of my own. In my experience the stresses and paradoxes latent in familiar unpretentious conceptual modeling give us a natural, manageable start in thinking through some of the hardest problems in developing a formal understanding of cultural objects and relationships. In the end, however, I will argue, as you probably suspect, that taking modeling seriously requires specific logic-based formal methods. Serious modeling takes modeling more seriously than it takes itself.
According to Jerome McGann, the question ‘what is text’ can hardly be answered. It’s like the question ‘how long is the coast of England’. But there is an answer to both questions. The answer lies in the tools you use to measure or “read” and in the degree of granularity you are looking at. Text is what you look at – and how you look at it. Transcription is reading made explicit. The code of what you see – and how you see it. The fine line between text and transcription lies in the reproductive force (Mats Dahlström) that adds to the productive force of every text creation. Transcription is the protocol of a textual perception that is based on the distinction between information and noise. Noise is ignored, information is processed – that means: interpreted. While transcription primarily looks back at a document, it also looks forward to unknown, potentially manifold usages. The most useful transcription must get as close as possible to the documents and to the (various) users at the same time. The current answer to this challenge is the transmedialized text (my proposed neologism): a textual code with a detailed description of the documentary phenomena, enriched with annotative and interpretative information. It is an abstract representation of our reading that can lead to arbitrary forms of presentation of that text (medializations). Text can thus be seen as a scale of incremental steps of processing and interpretation (thus raising the question of objectivity and subjectivity), going from the document to the user, or reaching from visual signs via linguistic codes to semantic assertions. But sometimes the semantics don’t lie in the linguistic codes but in the visual signs. Thus, a comprehensive model takes a circular form rather than a simple line. Taking seriously all approaches towards text and all possible usages leads to a pluralistic notion of text. Some areas in this model are served well by existing technical solutions and standards.
The TEI standard is already pluralistic and allows for the encoding of textual representations at different noise-information borders and on different layers of interpretation. Still, there are textual areas that are not as well supported as others. Recent developments seem to address these spaces: the work of the TEI-SIG on genetic encoding and elements like <tbo> (text bearing object) offer new possibilities for the description of document features, while semantic web approaches and RDF as a standard seem to strengthen the semantic codability of text. Still, it seems unclear whether these semantic approaches can go beyond the level of metadata and really represent what is in the text. This raises the further questions of how textual data and metadata are to be distinguished in a comprehensive model of text, how this model relates to the concepts of FRBR, and how the relationships and interdependencies with other “texts” (as works, expressions, manifestations, items) are to be modeled.
A suitable digital infrastructure that fosters collaboration and sharing of information and functionality in the digital humanities is hard to build within the limitations imposed by embedded markup languages. Although integration with existing data representations is an important design goal, digital humanists have to start thinking about what kinds of features they really need in a digital infrastructure and how they can be implemented in a practical and agreed way.
The development of software in the digital humanities has so far followed the pattern of customised tools for particular projects. The same basic functions (import/export, textual comparison, searching, annotation, linking to images, etc.) are implemented over and over again because the only basis those tools have for sharing is a set of subjectively chosen, subjectively defined and implemented tags. Putting all that common functionality into simple services accessible over the web, and implementing those services using best-practice computer science methods, will allow existing tools such as content management systems and common standards such as TEI to draw on those services, while building only the lightest of dependencies between services and applications. But adding the ability to freely recombine markup, text and images and to allow markup to describe truly overlapping properties requires fundamental changes to the underlying digital technology that go beyond embedded markup. This work is being done as part of the HRIT (humanities resources infrastructure and tools) project at Loyola University Chicago, in collaboration with Digital Variants at Roma3 in Italy, with the Tagore edition at Jadavpur University, India, and with AustLit at the University of Queensland. We’re very interested in listening to reactions to this design, which we think is very flexible and easy to integrate into existing tools and methods.
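To make the notion of truly overlapping properties concrete, here is a minimal sketch of markup kept apart from the text as named character ranges. It is an illustration only and not the HRIT design: the `Property` class, the helper functions and the sample text are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Property:
    """A hypothetical standoff property: a name attached to a character range."""
    name: str
    start: int  # inclusive offset into the text
    end: int    # exclusive offset into the text


def overlaps(a: Property, b: Property) -> bool:
    """True if the ranges intersect and neither one contains the other."""
    intersect = a.start < b.end and b.start < a.end
    a_in_b = b.start <= a.start and a.end <= b.end
    b_in_a = a.start <= b.start and b.end <= a.end
    return intersect and not (a_in_b or b_in_a)


def properties_at(props, offset):
    """Names of all properties that cover a given character offset."""
    return sorted(p.name for p in props if p.start <= offset < p.end)


# Invented sample text; offsets are computed rather than counted by hand.
TEXT = "one two three four five"
line = Property("line", 0, TEXT.index("four"))
quote = Property("quote", TEXT.index("three"), len(TEXT))
PROPS = [line, quote]

if __name__ == "__main__":
    print(overlaps(line, quote))
    print(properties_at(PROPS, TEXT.index("three")))
    print(TEXT[quote.start:quote.end])
```

In this sketch the line and the quotation share the word "three" without either being nested in the other, a situation that a single embedded element hierarchy cannot express directly. Because the text itself is never altered, any number of such property sets could be combined or discarded independently.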
Susan Schreibman, Trinity College Dublin
“Modelling as a centre of Practice and Pedagogy”
Last year I designed an MPhil in Digital Humanities and Culture for Trinity College Dublin. We are now in the second semester of the first year. Unlike teaching a single DH course that centres on a specific area (from introductory courses to more specific text encoding, digital scholarly editing, web technologies, etc.), the longer timescale of a full year allows for significant cross-fertilisation in understanding how disparate technologies, methodologies, and theories interrelate to comprise the salient core of this new and somewhat abstract discipline of humanities computing.
Many of us have been in this field long enough to know that the technologies we teach our students today will be surpassed and replaced by the yet-to-be-invented. What is more lasting, however, is an understanding of how to model the objects of our contemplation: from the narrative arc of a thematic research collection, to a TEI-encoded document, to a relational database. Models serve as an abstraction of an analogue object, its relationship to other objects, as well as a representation of what we think is important about them. By teaching our students how to model, we give them the tools to represent and re-present the yet-to-be-encountered.
The question then remains: how do we teach this interrelatedness of things and their properties? Should there be a knowledge representation course which covers modelling more abstractly, or should it be covered within context, when teaching subjects such as relational databases, TEI encoding, or virtual world construction? Should knowledge representation be the theme that binds the disparate strands of DH together, or merely a theme? If knowledge representation, as Willard McCarty suggests, is the coherent or cohesible practice that binds all of DH together, then it follows that KR must be at the centre of our pedagogy.
“Data modeling for early modern emblems”
Emblem studies are unique in a number of respects. First, they do not belong to a single scholarly discipline or academic department, but address a wide range of research areas, be they philological, historical, or art historical. Second, emblems, which ideally consist of a motto, a pictura, and an epigram, constitute intricate combinations of texts and images that appear in a variety of media. Printed emblem books are merely one manifestation of emblems. Emblems can be found in architecture, paintings, manuscripts or majolica. In addition, there are multiple interrelations among these emblem forms. Emblems on paintings may be copied from engravings, or vice versa. Accordingly, emblems can pose challenges for digital humanists.
In order to facilitate data exchange and the capture of emblem data, a common data model was developed by the so-called OpenEmblem Group, an international group of scholars with interests in digital humanities, whose aim is not only to improve data exchange and aggregation, but also to develop new ways of conducting research on emblems by taking advantage of the new technologies. In recent years the University of Illinois at Urbana-Champaign, USA, along with the Herzog August Bibliothek, Wolfenbüttel, began building a portal for emblems in a project funded by a joint initiative of the DFG/NEH. Wolfenbüttel designed an XML schema, the so-called Emblem namespace, based on a set of categories describing emblems formally, developed by Stephen Rawles at the University of Glasgow (the so-called spine of information). Furthermore, Illinois established a registry for a Handle service allowing the unique identification of each emblem worldwide. Iconclass notations, one of the most important art history standards for indexing pictures, were added by experts in Rotterdam (Netherlands) and Marburg (Germany) and delivered via OAI-PMH to Illinois and Wolfenbüttel to enrich their transcriptions of mottos with descriptions of the picturae. Other contributions came from Utrecht (Netherlands) and Munich (Germany).
My statement provides a brief account of the development and components of the data model for emblems and of how the model was made operative. It sketches the outlines of further developments, e.g. the inclusion of SKOS elements for the description of Iconclass notations, and more generally, the adoption of semantic web techniques revolving around issues of persistent and reliable identification and seamless integration of emblem data in different kinds of contexts. By the same token, it shows how the various projects initiated by the OpenEmblem Group illustrate the way future collaborative research may be carried out in a distributed web-based environment.
“Analyzing linguistic variation: From corpus query towards feature discovery”
In the study of linguistic variation (dialect, sociolect, register), we can distinguish two types of analytical situations: the feature-centric and the variable-centric. In a feature-centric perspective, we start from a given feature (or set of features) and want to derive the variables (e.g., place, social group, user group) associated with it. This is a typical situation in dialect studies, where we are interested in the geographical distribution of a particular phonetic realization (e.g., +/- rhoticity and British dialect areas). In a variable-centric perspective, we start from a given variable (e.g., a register) and want to determine the features that are typically associated with that variable. In the feature-centric perspective, we obtain the necessary information (typically a frequency distribution of a feature) employing a corpus query approach, by means of which we extract the instances of a given feature from an appropriate set of language data. In a variable-centric perspective, we are faced with the problem that we may not know a priori what the relevant features are; instead, we have to find ways of discovering linguistic features potentially suitable for analysis.
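The contrast between the two perspectives can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the method of the studies cited: the corpus is invented, the function names are hypothetical, and a smoothed relative-frequency ratio stands in for the statistical measures a real study would use.

```python
from collections import Counter

# Invented toy corpus: (register, tokenised text) pairs.
CORPUS = [
    ("biology", "the cell was observed and the sample was measured".split()),
    ("biology", "the protein was isolated".split()),
    ("computing", "we propose an algorithm and we evaluate the algorithm".split()),
    ("computing", "we show the result".split()),
]


def feature_distribution(corpus, feature):
    """Feature-centric: relative frequency of one given feature per register."""
    hits, totals = Counter(), Counter()
    for register, tokens in corpus:
        hits[register] += tokens.count(feature)
        totals[register] += len(tokens)
    return {reg: hits[reg] / totals[reg] for reg in totals}


def discover_features(corpus, register, top=3):
    """Variable-centric: rank words by how much more frequent they are
    inside the given register than in the rest of the corpus."""
    inside, outside = Counter(), Counter()
    for reg, tokens in corpus:
        (inside if reg == register else outside).update(tokens)
    n_in, n_out = sum(inside.values()), sum(outside.values())

    def ratio(word):
        # Add-one smoothing keeps words unseen outside the register
        # from causing a division by zero.
        return ((inside[word] + 1) / (n_in + 1)) / ((outside[word] + 1) / (n_out + 1))

    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(inside, key=lambda w: (-ratio(w), w))[:top]


if __name__ == "__main__":
    print(feature_distribution(CORPUS, "was"))
    print(discover_features(CORPUS, "biology"))
    print(discover_features(CORPUS, "computing"))
```

The first function corresponds to a corpus query: the feature is known in advance and only its distribution across registers is computed. The second starts from the register and lets candidate features emerge from the data, which is the situation in which the relevant features are not known a priori.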
In my presentation, I will illustrate these two perspectives with examples from the study of register variation in the scientific domain, looking at selected lexico-grammatical features and the variables of discourse field (scientific discipline) and time (diachronic evolution of registers) (cf. Teich & Fankhauser, 2010; Degaetano et al., 2011; Degaetano & Teich, 2011).