Opening Keynote Address (March 14):
Wendell Piez, “Data Modeling for the Humanities: Three Questions and One Experiment” (paper, slides, video)
[Julia Flanders] I would like to say first that I am very happy to welcome you all to Brown University on behalf of the Women Writers Project and the Center for Digital Scholarship, and also the Brown University Library, and second that Fotis [Jannidis] and I are very happy to welcome you all to this workshop, entitled “Knowledge Organization and Data Modeling in the Humanities,” on behalf of the Center for Digital Editions at the University of Würzburg and the Brown University entities that I just named. So a few words of housekeeping — welcome, warmth, thanks, all the rest of it — before we get going: first of all, a few words of very warm thanks are due to those who supported the workshop in many ways and whose contributions I think we are all going to be enjoying and benefiting from over the course of the next few days. First of all, our major sponsors: the Deutsche Forschungsgemeinschaft and the National Endowment for the Humanities; also the Brown University Library, which is sponsoring our reception tonight; and Brown’s media production services are providing the videotaping and the streaming, and thank you very much, Nick, for being with us; and my colleagues Elizabeth Piette and Gosia Rymsza-Pawlowska, who have been and are about to be a tremendous help to all of us in making this event run smoothly, so you’ll see them at various points throughout the course of the event — Gosia’s over there in the corner (laughs) — so thank you very much. And of course our greatest thanks go to all of you for actually coming all this way — in some cases, tremendous distances — to contribute to this workshop, and we are hoping that it will yield a lot of insight and benefit during the course of the event itself and also afterwards, in the form of the various publications we are going to create from this.
(Event logistics omitted) We are being videotaped, and the video is being streamed live, and do not let that freak you out, but the point is that this is a very small group — necessarily so because of the scope of the funding and the nature of the discussion, which is very much about discussion — but we wanted a way to involve the virtual community, and so the live streaming is one way, and also the video will be archived and made available for future reference so that we can all see ourselves and remember what we said, remember what each other said, and things like that. By virtue of that, and as part of the virtual participation, we have asked a few people in particular — Trevor Muñoz and Kari Kraus — to pay attention to the Twitter stream, which we hope will be vibrant, and to give voice to any questions or comments that come in via that stream from remote participants, and we may also get some comments via email, as well. So anyone who is following Twitter, anybody who does that during the course of conferences, should also feel free to be a surrogate voice for those who are not here; this is experimental — we are all going to be feeling our way, so let us just use our good sense and hopefully it will be fun. And finally, because this is a workshop rather than a conference and we are all more or less known to each other, we are not going to do elaborate, full-bore introductions of each speaker when he or she speaks, but if you want to find a full biography for anyone here, it’s on the website at datasymposium.wordpress.com.
So, with that I will just leave you with a few thoughts about what we are trying to accomplish here. This workshop is really an attempt to think about what “knowledge organization” and “data modeling” in the humanities really mean, and to consider how to bring them into greater visibility within the digital humanities — and within the humanities as a whole — and to understand how our modeling approaches influence or shape humanities research practice. So, the presentations, the panels, and all of this wonderful material that we are bringing into evidence here constitutes what we have to work with, constitutes the material we can grapple with in considering these questions, and the discussion, I think, will have at least as much weight as those more formal interventions, and perhaps more so, because it’s in the discussion that we have a chance to really step back from the specific cases and the particular projects that we are very invested in and think about what they can teach us more broadly about data modeling, how we think about data modeling both in a specific way and also in a more general, metaphorical way. So I urge you as discussants, as questioners, and as presenters to think about what you are saying in that larger light; please take a look at the questions that are included in your schedule and ask those questions and others like them whenever it seems useful, and all of the discussion and presentations will be synthesized by Fotis [Jannidis] and me — our next mission — into a final white paper, which will reflect everything that has gone on here and will be published electronically along with the slides and video and so forth. So with that, I will turn it over to Wendell [Piez].
[Wendell Piez] Thank you, Julia. This is really an honor to be invited to address all of you and the world, virtually, the little piece of the interested world that may be looking at this now or in the future on this topic. It is really quite a moment for us, I think, because we are working on a basis of twenty, thirty, fifty years of successful work in the digital humanities — and it was not always called that, of course — but we are also beginning some amazing things, so I think at this moment this is a fabulous opportunity for all of us and for everybody, really, to be thinking about these questions. I would like to start by thanking everyone for being here, and thanks to Brown and to the NEH and the Deutsche Forschungsgemeinschaft, and to everybody who is helping us to do this work, including our home institutions, of course, because this work, as one of the themes you will see emerging from what I have to say, is by nature collaborative — something that we do together — and that is both interesting and important when we think about what the humanities is, because of the stereotype of the humanities scholar as a very solitary individual. Of course, that has never been the case, but it’s something that has come to be embedded in the culture as a notion, and it is one of the questions we want to be keeping in mind as we go forward with this. Basically, I have plenty of time this morning and I am probably — hopefully — not going to be using it all because I want to open up the opportunity for everybody to respond. What I am going to be doing is presenting a number of slides with illustrations and talking on the theme of data modeling in the humanities, the theme of our workshop, and in the course of that I am going to be showing you a demo.
I hope the demo does not swamp the conversation, because the demo is something we can all be doing, and I hope all of us will be doing that within the course of these next three days as well, but I think it is also important because we need to be concrete; we need to be talking not just about these big ideas but also about what we are actually doing and why that is interesting, rich, and useful.
I have three questions and one experiment. The experiment, of course, is where the demo comes in. The three questions — question number one is “What do we mean by data modeling?” — and I wanted to pose that to you at the beginning because we are going to be asking it for the next three days. And that is a naturally humanistic question, right? I mean, how do we even begin to talk about what we are going to talk about before we know what it is that we are going to talk about, right? This is as old as philosophy, this problem. Secondly, I am going to be asking “What about markup?” This is a question that’s dear to me because I work in the field of markup technologies, and I know that it is also important to many of you, but I pose it partly because I know it is not a critical question to everybody in this room, and I think one of the things we want to be thinking about as we move forward is, “What is the role of markup technologies within digital humanities?” And I pose that as an honest question — it is not a rhetorical question for which I have an answer: “Of course, markup technologies must be the center of the digital humanities.” That is not really what I mean at all. On the other hand, I do find them to be essential for reasons I will try to communicate to you, and even for those of us to whom they are not essential, I think they continue to be very useful and important. So, I am posing that question also as an open question. Then my last question — “What about schemas?” — is kind of the outlier among my questions because it seems like it is really there so you can have three. Well, actually not.
I think in some ways that question is really going to be the focus because when we start looking at my demonstration what you’ll see is that there’s a point where my demonstration kind of leaves off with, “What do we do next?” I am going to have some things to say about schema technologies and the relationship between schema technologies and markup languages, their relationship together, and the larger questions of data modeling. So it is as if each question bears down into a more narrow concern which then also reflects on the larger question. Julia has very kindly offered to help out — sometimes when I get excited my hands fly off in all different directions — and so we have agreed on a code word. The code word is “Julia, I am freaking out.” And if I say that, then Julia will quietly, deftly take control of the mouse and take it from there. So, what do we mean by data model? “Julia, I am freaking out” (laughs). We have got some nice pictures here and I want to do some zooming and stuff. Julia is going to help with that. You will remember that this little icon up there will give you zoom capability.
So, of course, what you see here in the upper right is a Lego model of the Brandenburg Gate, and that is the data model. The Brandenburg Gate can be conceived of, if you like, as an artifact, a historical artifact, and a text, maybe — you might “read” the Brandenburg Gate. And of course, this is a model of the Brandenburg Gate, and it is conceivable that we might learn about the Brandenburg Gate by building this model. We might learn about the Brandenburg Gate itself by building the model, and we might have our curiosity about the Brandenburg Gate and what the Brandenburg Gate is in this place in history and so forth, stimulated by our interaction with this model. So Julia, if you can zoom out again. One of the interesting things about this to me is that when I work in my field with data modeling, I am not actually thinking about building models of the Brandenburg Gate; I am thinking about enabling others to build models of the Brandenburg Gate and other things like the Brandenburg Gate. So from my point of view, the data model that we are dealing with here is actually the stuff on the lower left — the Lego system itself is providing a data model which we can then build models out of. And this is something really important that I want to point out going into this, that when we talk about models we’re already confused about what we are talking about, because we are talking about several things at once. We are talking about a technology that allows us to build things, which gives us a model, these little pieces that interlock together, and there is sort of an internal consistency, an organization, but at the same time a range of possibilities that they offer us. At the same time we are talking about the things we build out of those little pieces, those building blocks. 
And then, if you can zoom out again, this is complicated even further by the fact that we also have rules about how we put these pieces together, for example, and we might conceive that it would be possible to have a set of rules about how to build eighteenth-century neoclassical, monumental architecture in Lego. Not the Brandenburg Gate necessarily, that might be one instance of our schematic neoclassical architecture of the eighteenth century, late-eighteenth century; then we have a scoping problem: are we going to take this all the way back to the Renaissance? We can conceive of this problem of defining what it is that we are going to model as itself a modeling problem. So, of course, in the picture here, what you have is a schematic view of the rules for assembling a Brandenburg Gate. When you buy the Lego set, you get this Brandenburg Gate that they have designed and implemented for you and you can build it and follow these rules, and that’s kind of the schema, of course, but in this particular case it’s the schema for the Brandenburg Gate, it is not just anything. And then, going back to the longer view, finally you have the fourth outlier which is a model of an artifact at a particular point in history, a particular place in time. This is the Brandenburg Gate in 1985, and one of the things that you’ll see when you consider this model is that in order to build this model, I had to go outside of the capabilities, the affordances, of my modeling architecture because I did not have enough Legos to build a wall, so I used alphabet blocks. And then for the sign, I had to go really out there and that’s just something we see, too.
So, in the general theme of data modeling in the humanities we see this issue that we have tools and technologies to build models that we then interact with and learn from, and yet, at the same time, in order to actually achieve our goals — of representation, of publishing, pedagogy, and so forth — we are also being somewhat opportunistic. There is a bricolage aspect of our work, where we are pulling together things that aren’t even within the scope.
So, let us go on to the next slide. In our case, of course, this problem runs head-on into another interesting and profound problem, which is that we work with text; now, of course, we then have the problem of defining what “text” is, and that itself is an interesting problem that you could have a seminar on. Text is not ever really just text. Text has history; text has context. But at the same time, text is something we make stuff out of — text is our building blocks; text is the stuff we use. It is an encoding technology; it always has been an encoding technology. That is one way of defining text: as an encoding technology. The cuneiform brick you see there on the left, which is in the British Museum, is really interesting because it represents quite an early instance of text. One of the interesting things about this is that, I am not a scholar of the ancient Near East by any means, but I have read enough about it to understand that even at the beginning text and markup were deeply interconnected, and this brick has both text and markup on it, and it is hard to tell the difference between those two things. One of the things that we might say when we talk about text is that text is markup, or, you cannot have markup unless you can have text — but there is this sort of interesting problem there. Also on this slide, zooming back out again, you see this photograph of a fabulous early eighteenth-century edition of Petronius.
This picture was taken on the roof of a car on a mountain in West Virginia — in Maryland, actually, across the river from West Virginia. One day I happened to be looking for an out-of-print book and I found this guy on the mountain who sells them on the internet and he happened to have this in his library. I said, “You have to let me take pictures of that. Would you let me take pictures of that?” And this is of course another beautiful illustration of where text takes us, because text is not just text here, right?
In this particular edition, the fragments of Petronius would not make the 600-page volume by themselves. This thing is heavily annotated and edited, and of course, then you have text within the text and you have a lithograph which has text on it, the picture of a monument with text on it, and you just have layers upon layers. And this is part of our problem: that we are simultaneously trying to study and model something at the same time as we are also using that same thing we are studying and modeling in order to achieve our goals. This especially becomes interesting in the digital realm because, of course, digital text is another thing again, because the computer, if nothing else, is a device for encoding and a device for translating between abstract representations — it lends itself very naturally and properly to text processing — and in some ways, a computer is a text processor, fundamentally. So in our case, we have the particularly interesting problem of modeling with digital text.
Let’s look at the next slide: I want to bring to your attention how when we are talking about modeling, part of what we are talking about is the part of our system that we are not going to be thinking about. This gets back to this business of layers: just as experienced readers do not really think about the letters of the alphabet as we read the page, we do not even necessarily think about the words; they are, in a sense, almost communicating themselves directly into our minds as we read. The same thing is true of technologies. Part of the way in which we build these things is to embed their design in such a way that we do not need to be consciously interacting with them, in the way that we do not need to worry about the specifications of the Lego architecture when we use Lego. Interestingly enough, we live in a world where if you want to do that, you can. You can go up on the internet and get this fabulous SVG diagram that shows you the specifications of Lego architecture. I did not draw these beautiful yellow and red blocks, I just pulled that down and it’s a 4K SVG file that I was able to drop in — nice illustration, in a sense, of what we are talking about. So if you do want to improvise with Lego and go outside of the bounds of Lego, then you can find ways of doing that. But nevertheless, we have this thing where if we want to design and build something that we are actually going to be able to use, not just as individuals who are deeply invested but as a community, we are also going to be designing and building systems that have things about them of which we are going to be unconscious. And this is obviously the case with us, right — this is trivial. The fact that many of us can map between ASCII code points and Latin letters is what identifies us as geeks, and most people do not use this and do not need to do this.
Let us get back to the problem of digital text. So, here you have this very interesting idea, and do not think that this spectrum here is by any means formally organized; it is much more impressionistic than it looks. But there is, in a sense, an implicit hierarchy in the way we actually work with text, and in particular I would like to draw your attention to the line between the XML region here and the “graph models” region here, because that is a particular line of interest to me. What is happening here is that as we move up in terms of the constraints that we are willing to enforce and the cost that we are willing to bear in order to build and support an infrastructure, we are also moving up in terms of the power that we get out of the technology. And this particular line, where we cross from multiple hierarchies into one hierarchy, is the line where we begin to get XML, because XML, of course, famously, in and of itself only represents one hierarchy at a time. Now this is not to say that you cannot use XML to represent more than one hierarchy, but you will do that by using XML’s tree structure to represent things that are not trees.
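To make the single-hierarchy point concrete, here is a small Python sketch (my illustration, not anything shown in the talk; the text and offsets are invented): a sentence that straddles a verse-line boundary yields two ranges where neither contains the other, so no single XML tree can make one the parent of the other without workarounds.

```python
# Hypothetical illustration: a sentence straddling a verse-line boundary.
# Ranges are (name, start, end) character offsets over one shared text.
text = "The day was fair. The night fell fast."

line1 = ("line", 0, 27)       # a verse line that ends mid-sentence (invented offsets)
sent2 = ("sentence", 18, 38)  # a sentence that starts inside line1 and ends after it

def contains(a, b):
    """True if range a fully contains range b."""
    return a[1] <= b[1] and b[2] <= a[2]

def overlaps(a, b):
    """True if a and b share text but neither contains the other."""
    return a[1] < b[2] and b[1] < a[2] and not (contains(a, b) or contains(b, a))

# Neither range can nest inside the other, so no single XML tree
# can make one the parent of the other.
print(overlaps(line1, sent2))  # True
```

Overlap in this sense, rather than overlap of mere extent, is exactly what a tree cannot express directly.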
As you go up the scale, you actually get the capability of representing things that are more and more complex, and more and more capable of being optimized for processing within the computer, as well as being optimized in the sense that they come pre-built and pre-packaged so that you can use them without having to develop them yourself, because they are supported by standards. At the same time, there is a certain loss of expressiveness because certain commitments have already been made. And that, in a sense, is almost the fundamental dilemma of the text encoder: we are text people; we want to be on the low end of the scale here, in the sense that we want to be loose and shaggy and improvising, and to use the text to represent the world in arbitrary ways, the way we are used to doing when we write into a text editor or doodle on a page. And yet, at the same time, as long as we are doing just that, we do not get any of the power that we get when we go down and bear down and make those commitments into the supporting technology that lets us actually build applications and deploy technologies.
This is a fundamental trade-off. My answer to that fundamental tradeoff is to say not that we want to be at the low end or that we want to be at the high end; my answer is to say we want to be both. We want to be able to move up and down; we want to be able to start with just XML then compile it into something that is not XML, having made that commitment. Yet at the same time if we are not ready to make that commitment, we want to be able to do something that is much more free-form, improvisational, playing around — and then find out what commitments we are willing to make in order to get what power within what stack or what architecture. I am going to come back to this, but before I move on, notice that within XML you also have certain kinds of choices that you can make in terms of commitments. One of the things, for example, that we have seen that is very interesting and very significant within the last ten, fifteen years is the advent of much more general-purpose text processing on the computer platform apart from the word processing, desktop publishing application. And that comes largely because the development of the XML platform has allowed us to go lower and get more loose and improvisational than, for example, you can do with relational databases, which are much more at the higher end, where you have to make your commitments up front, and having done that, you get the power of the platform’s capability to do operations with that sort of data.
Let us go to the next slide. This is where we get to what I am interested in in my hobby time: imagining if we could get the benefits of markup without the early commitment to a single hierarchy that XML requires of you.
Of course, a number of you know that this is a special interest of mine and have seen me present on this. What I want to show you is my latest work in this area, which Julia is going to get up in the browser here: LMNL — the Layered Markup and Annotation Language — is an experiment, a markup technology designed by myself and a couple of colleagues and cohorts in different parts of the world to give you the capability of markup without requiring that single hierarchy that XML requires of you. So what we have, for example, over on the left — Julia is going to open up the LMNL file — is the very same idea that XML has, where you have got a text and you have got characters in the text that are reserved as markup delimiters, and then, using those characters and following a certain set of rules, you can mark your text. In particular, LMNL has two features that are very important in the context of XML — variances, differences from XML. Number one is this business of allowing overlap, because LMNL does not build a tree out of the markup that it sees. Instead, it uses a “range” model, where the text is conceived of as plain text with an arbitrary set of ranges identified over it. The second feature is that LMNL supports arbitrary annotation, and what we mean by arbitrary annotation at this point is that ranges can also be marked with annotations, and annotations are structurally isomorphic to documents, so annotations can be marked up, and annotations can be structured in just the same way that a document can. It is as if your attributes could support structure. For example, in this instance, you can see that the text start tag ends all the way down the line before the body start tag, and all of that stuff inside of there is all inside of the text start tag, and it represents an annotation over the range that has been identified with that start tag.
So for example, in this case, you can see that metadata for this file, the little bit of metadata, lives very naturally and nicely within an annotation and therefore is reserved out of the text in that way. And of course, it is interesting because of the way annotations work, you get a tree — it is just that the tree is not structuring the text as a whole. The tree is merely the way in which you can elaborate the text in tree structures that then recurse in a tree-like way.
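A minimal sketch of the range-plus-annotation idea just described, assuming invented names and offsets (this is my own data-structure illustration, not LMNL's actual implementation): the text stays plain, ranges over it may overlap freely, and each range can carry annotations that are themselves little document-like structures.

```python
from dataclasses import dataclass, field

@dataclass
class Annotation:
    """Annotations are document-like: they have text and can carry ranges of their own."""
    name: str
    text: str = ""
    ranges: list = field(default_factory=list)

@dataclass
class Range:
    """A named range over the document text; ranges may overlap freely."""
    name: str
    start: int
    end: int
    annotations: list = field(default_factory=list)

@dataclass
class Document:
    text: str
    ranges: list = field(default_factory=list)

    def extent(self, r: Range) -> str:
        """The stretch of plain text a range identifies."""
        return self.text[r.start:r.end]

# Invented example: a verse paragraph and a sentence overlapping over one text.
doc = Document("Ah! happy is the man ...")
vp = Range("vp", 0, 24, annotations=[Annotation("note", "verse paragraph begins")])
s = Range("s", 4, 24)  # overlaps vp without nesting inside it
doc.ranges += [vp, s]
print(doc.extent(s))  # happy is the man ...
```

Note that nothing here forces a tree: the ranges are freestanding, and tree structure appears only inside annotations, which is the point made above.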
Julia is going to go over to the main screen and then jump all the way over to the right side. [Motions to Julia, controlling the program.] Actually, before you do that, I will just comment on what is happening here. What this implementation does is that it actually uses XSLT. I know that drives you crazy, but it uses XSLT to parse LMNL tagging. That is what is going on with “Step 1,” “Step 2,” “Step 3.” [Points to onscreen chart.] It is going through a pipeline where each step is taking the results of the previous step and taking it up another little notch, up into a full-blown model of a LMNL document. And so in that xLMNL that Julia is going to drill into here, you can see that by this point it has gotten to be a much more abstract thing — you can see how the annotations, for example, are nested all within that start tag. And if you scroll down, you will be able to see that you have all these spans that represent the text itself, and the spans are then cross-indexed. And if Julia goes down to the end of the file, you will be able to see that at a certain point, all these spans end and you get ranges, because these elements in the XML are representing actual ranges defined over the text, and the identifiers on the ranges will then index into the spans. There is a certain amount of redundancy here, because this is basically a version two of this format that I built after the earlier experiments, where I decided I would save myself a lot of work by saving some information. So there is a certain amount of redundancy, but it can then be leveraged and taken advantage of in subsequent processing. So if Julia goes back, then you can see the bubble graph over on the right is an SVG representation of the LMNL document. Julia, if you would switch over to the Squiggle application. [Julia switches.]
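The span-and-range layout described here for xLMNL can be sketched roughly as follows (a guess at the general idea, not the actual xLMNL format): cut the text at every range boundary, and let each range cross-index the spans it covers.

```python
def to_spans(text, ranges):
    """Cut text at every range boundary; each range then indexes its spans.
    ranges: list of (name, start, end) tuples; overlap is allowed."""
    cuts = sorted({0, len(text)} | {r[1] for r in ranges} | {r[2] for r in ranges})
    spans = [(cuts[i], cuts[i + 1], text[cuts[i]:cuts[i + 1]])
             for i in range(len(cuts) - 1)]
    # Each range records the indexes of the spans it covers (the cross-indexing).
    index = {}
    for name, start, end in ranges:
        index[(name, start, end)] = [i for i, (s, e, _) in enumerate(spans)
                                     if start <= s and e <= end]
    return spans, index

# Invented toy example: two overlapping ranges over a six-character text.
spans, index = to_spans("abcdef", [("x", 0, 4), ("y", 2, 6)])
# spans: [(0, 2, 'ab'), (2, 4, 'cd'), (4, 6, 'ef')]
# range x covers spans 0 and 1; range y covers spans 1 and 2
```

The "redundancy" mentioned above corresponds here to storing both the spans and the per-range index, so later processing does not have to recompute the coverage.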
This is an SVG browser; Squiggle is part of Batik, an Apache SVG project, and it has the virtue here that, unlike a web browser, it will let us drill in and zoom all the way in, as far as we want to go. So Julia can just grab a bunch of this little text here, and if you — this one is actually a different text. This is a depiction of the novel Frankenstein by Mary Shelley, the 1831 edition. Actually, before we go back to the first example, let me show you something for those of you who are students of this text. I know this is one of the most phenomenally interesting texts from so many points of view for our work. That bubble in the middle is marked “nar”; “nar” in the markup of this text is how I marked “narrative,” “narrative structure.” And one of the interesting things about this novel is that it has nested narratives, and the nested narratives in this novel actually overlap chapters in a really kind of bizarre way, because it starts with letters, and then chapter 1 starts a new narrative, and then around chapter 10 or 11, I forget which, the monster starts speaking. And that bubble in the middle is the creature’s, the created entity’s, five-chapter-long narration of what happened to him and how he educated himself by reading Milton and Volney and so forth and so on. Then at the very end, you see that big bubble, “chapter,” is the long last chapter where Victor Frankenstein leaves off his narration and you pick up again with the frame. In the bar depiction on the left, you see those yellow bits there; those are where quotes occur in the narrative. And so you can actually begin to get a sense of the shape, the architecture, of the way in which this novel works and where you get the density of direct interaction between the characters and so forth. Just imagine what we are going to be able to do when we have the capability to do this kind of very free-form markup over text.
Because narrative structure is only the very beginning of what we are going to be able to mark once we actually have the ability to mark things without locking ourselves into one hierarchy and then using a lot of fancy, difficult workarounds for things that do not really fit into the hierarchy. We are beginning to get a taste of that with LMNL. So if you just hit the forward button, the green arrow there, I believe that will take us to the other one. [Slide changes. Technical issues] This is a narrative poem by Percy Shelley, “Julian and Maddalo”. It is a really fascinating piece of work because it is a Shelley poem and, if you know Shelley, he is a high romantic. I think of it as a Shelley poem wrapped inside of a [Lord] Byron poem wrapped inside of a Shelley poem, because the poem is about a conversation between two characters that represent Shelley and Byron, and the form is actually more Byronic than Shelleyan, because it is rhymed couplets, which is a form that Shelley does not really use. But also the rhythm, the cadences, the diction seem very, very Byronic; the text that Julia grabbed is right out of the middle of this poem, and if you scroll down — well actually, right here — you can see how over on the lower right you have a pilcrow mark that occurs right there. That mark identifies the break in verse paragraphs. And what we have there is a verse paragraph break that happens in the middle of a line. Over at the bubbles on the left side you can see the line groups that represent rhymed couplets. And there is this one tercet in the middle — “Thou sealedst them with many a bare broad word, / And searedst my memory o’er them, — for I heard / And can forget not . . . they were ministered / one after one . . .” — we have a tercet, which is a classic English way of breaking out from the couplet structure and getting this little bit of motion and energy into the text. If you scroll down, [gives directions for onscreen demonstration] all the way down to the end of the poem.
The poem structurally has got this long, long quote, marked as a “Q,” which is a speech by a madman that these two guys go and find in a tower in Venice. After that story, you have these three “VP,” verse paragraphs, where suddenly the tone changes and it becomes much more sedate — well, sedate, no, but — composed. And there is that Byron sense, right? Where in terms of the narrative also suddenly the time jumps, it is suddenly many years later, and you can actually see this structurally because you get these three verse paragraphs; I think two of them are 36 lines long and one of them is 37. Maybe there is a verse paragraph break in the middle of a line again. The poem formally becomes much more regular again at that point. For those of us who are students of poetry and poetic form, to be able to see poems like this and to work with them in this way is really kind of an interesting, fascinating way of doing that classical, old-fashioned humanistic exercise of connecting the poem’s form to its theme, to its thesis. I can show you more about this, but let us go back to the demonstration.
One of the things that I can do is pull XML out from LMNL, which turns out to be really useful, because if you want to format the text it is actually a lot easier to do that out of a tree — not impossible to do it otherwise. And in fact, I think that there is no particular reason why we cannot have technologies that do formatting very fluidly and flexibly and powerfully over a range model. But the technologies we have, and that I know how to use, are XML based. By the way, I should say in passing that although I think what you see is really cool, I am not a programmer. This is done by somebody who is just figuring it out as he goes along. And I hope and expect that people who are serious computer scientists and serious programmers will start looking at the problems that I am just beyond my depth in dealing with — looking at how we are going to actually do this kind of work in a much more sustainable way, where it is not just improvising. What I am doing is that I am working in that XML space, then forcing myself down that slope out into something that is not really XML at all, then pulling myself up again so that I can be comfortable, because this is where I am very comfortable. Let us just pull out from this and try to go back to our general themes.
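One common way to serialize a range model back into XML, so that tree-based formatting tooling can be applied, is to keep one layer as real elements and flatten the non-nesting layers into empty milestone elements. The sketch below illustrates that general technique only; the function, the text, and the offsets are invented for this example and are not Piez’s actual LMNL-to-XML pipeline.

```python
# Serialize overlapping ranges as XML: one layer becomes real elements,
# the other is flattened into empty milestone elements so the result
# is still a single well-formed tree.

text = "first line of text second line here here"
# (name, start, end) character-offset ranges; the sentence "s" crosses
# the break between the two lines "l", so it cannot nest.
ranges = [("l", 0, 18), ("l", 19, 40), ("s", 6, 30)]

def to_milestone_xml(text, ranges, tree_layer="l"):
    events = []
    for name, start, end in ranges:
        if name == tree_layer:
            events += [(start, 0, f"<{name}>"), (end, 1, f"</{name}>")]
        else:  # flatten the non-nesting layer into empty milestones
            events += [(start, 0, f"<{name}-start/>"), (end, 1, f"<{name}-end/>")]
    out, pos = [], 0
    for offset, _, tag in sorted(events):
        out.append(text[pos:offset])
        out.append(tag)
        pos = offset
    out.append(text[pos:])
    return "".join(out)

print(to_milestone_xml(text, ranges))
# <l>first <s-start/>line of text</l> <l>second line<s-end/> here here</l>
```

The milestone pair carries the sentence boundary through the tree without breaking it, which is exactly the kind of workaround a true range model would make unnecessary.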
I think the answer to this question, “Can we conceive of a viable markup regime?”, the answer is, “Well, absolutely yes, we can conceive of it.” Now can we actually build it so that it really works? Well, that is still a question mark. I think you can see some tantalizing suggestions that such a thing is possible. I also think it is important to keep in mind that this is not an either/or thing, right? I mean, if this works, this is going to work because it works along with and complements the technologies we already have. We need to be able to be using that XML space to take advantage of the power of the single hierarchy, even while we’re trying to deal with problems that do not lend themselves to that approach.
Let us go to the next slide. There is a particularly interesting problem that you begin to deal with when you think about processing ranges, and that is that ranges are really, really good for some things and really expensive in other ways. And this is the stuff that I am going to need the computer scientists to help me with. The middle graph here is a tree structure, but you cannot do this tree structure in XML because as you can see — this text here is the opening quatrain of a Rilke sonnet — you see that where you have overlap, where for example you have a sentence that begins in one line and ends in the next, you have this interesting breakage or confusion in the way the structure represents itself. If you zoom back out again, you can see in the marked-up version of the text how you could get from the markup into this kind of tree just reading the markup in what I think was a naive way, where every start tag identifies a node. And then your inheritance, your relationships, the dominance relations in the graph are inferred from the relations between the tags. I think this is interesting because we have talked for a number of years, when talking about the overlap problem in XML, about graph structures — and in particular the GODDAG structure that Michael [Sperberg-McQueen] and Claus Huitfeldt developed, and that a few others have worked on in this space — as a way to try to architect something that gives us the power of graphs and yet also provides for overlap. The problem as I see it is that the tree Julia just showed you is not actually the right tree; the right tree is the one on the lower right — where in fact, of course, this is only part of the tree, because there are going to be further-up structures that these guys hang off of — but the quatrain is neither a child nor a parent of the sentence, even if a sentence contains quatrains. You do not have a child/parent relationship between quatrains and sentences.
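The overlap problem described here can be made concrete by modeling each markup unit as a character-offset range and classifying how pairs of ranges relate. The offsets and labels below are invented for illustration (they are not taken from the Rilke quatrain); the point is only that the "overlaps" case, where neither range contains the other, is exactly the case a single XML hierarchy cannot express.

```python
# Model markup units as (label, start, end) character-offset ranges and
# classify each pair: 'nests' and 'disjoint' fit in one XML tree,
# 'overlaps' does not.

ranges = [
    ("quatrain", 0, 58),
    ("line", 0, 18),
    ("line", 19, 58),
    ("sentence", 9, 45),   # begins in one line, ends in the next
]

def relation(a, b):
    """Classify two ranges: 'nests', 'overlaps', or 'disjoint'."""
    _, a1, a2 = a
    _, b1, b2 = b
    if (a1 <= b1 and b2 <= a2) or (b1 <= a1 and a2 <= b2):
        return "nests"
    if a2 <= b1 or b2 <= a1:
        return "disjoint"
    return "overlaps"  # neither contains the other: impossible in one tree

print(relation(ranges[0], ranges[3]))  # quatrain vs. sentence: nests
print(relation(ranges[2], ranges[3]))  # second line vs. sentence: overlaps
```

Because the sentence and the second line merely overlap, any single-hierarchy encoding has to break one of them apart, which is the "breakage or confusion" visible in the naive tree.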
Sentences and phrases occupy an entirely different hierarchic domain from quatrains and lines. The interesting aspect of this problem to me is how do we get from here to there, right? How do we get from the flat ranges into something which actually represents intellectually the abstraction that we conceive of when we want to design a text that we can work with and actually build optimal processes for query, formatting, or what have you. So this is a research area, and I am by no means qualified to conduct this, so I am going to need help.
Yet at the same time, I think we also do know some of the features, some of the answers to this. If you go to the next slide [next slide], this is where you get schemas, right? What is a schema? In working with schemas in the context of XML, we have learned a lot about how useful schemas actually are for all kinds of different things. In fact, it is sort of interesting how you can reflect on these things — if you can think of another application or area of schemas besides these three I have listed here, by all means buttonhole me and tell me about it because I am trying to comprehend this problem. But schemas do validation in the sense that we apply a schema to a text and get an answer as to whether the text conforms to the schema. And this is very, very important if we are working within an architecture where we need predictability, right? So that we can have a criterion, a set of rules that we can then apply and then we know our document conforms to that. And of course that is the way that we ordinarily present schemas — that they are a way of giving us error messages — but that is actually a kind of transformation, because the document is actually being transformed by virtue of the application of these rules into a set of reports about the state of this document, the relationship of this document to the set of abstract rules.
Then of course the second application is annotation or enhancement, which relates very closely to the first, because one of our sets of annotations might be reports on which rules have been followed or violated. And in the XML space we see this very interesting blurring in application between these two uses of schemas, because schemas are also used to do data-type annotation of documents. So if we have a schema, we can say, “That 1-0-0 over there is a number,” and compile it into a number, make it a double. But this 1-0-0 just happens to be a random string; that one, you do not make a number out of that. Typically that information is not held in the document itself — we use the schema to give us that bit of information — and the amended document is in a sense a transformation result. In the XML space, we call that the “Post-Schema-Validation Infoset,” which is the information set about the document after it has been through the validation process. And finally, the third one is that we use schemas to configure our tools, right? You drop the schema into your editor and your editor knows which tags are valid at any given point, and so forth. And all three of these applications bear directly on this problem of getting from the naive structure to the true or correct or optimal structure. Because in principle it seems to me — and, you know, I say in principle because I do not know the details, right? I just imagine — that there is a role to be played here not by document instances per se, but rather by ancillary information that is aggregated with those documents and presented with those documents in order to make them richer and more capable in our systems.
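The two points here — validation as a transformation into reports, and the schema typing one "100" as a number while leaving another as a string — can be sketched together in a toy form. This is not a real XSD processor; the document, the "schema" of Python types, and the function are all invented for illustration.

```python
# A toy post-schema-validation pass: the schema both validates the
# document (producing a list of reports) and annotates it with datatypes
# (producing a typed, "post-validation" copy).

document = {"price": "100", "serial": "100"}   # raw, untyped strings
schema = {"price": int, "serial": str}         # datatype declarations

def validate(doc, schema):
    reports = []   # validation result: a report on the document
    psvi = {}      # "post-schema-validation" copy with types applied
    for name, value in doc.items():
        typ = schema.get(name)
        if typ is None:
            reports.append(f"{name}: no declaration in schema")
            continue
        try:
            psvi[name] = typ(value)
        except ValueError:
            reports.append(f"{name}: not a valid {typ.__name__}")
    return psvi, reports

typed, reports = validate(document, schema)
print(typed)    # one "100" becomes the number 100, the other stays a string
print(reports)  # an empty report list means the document conforms
```

The same string gets different treatment purely because of what the schema says about it, which is the blurring between validation and annotation described above.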
Finally, on this slide I want to stress how important XML is to this, not because we are going to be using it for everything, but because it gives us a platform on which to work. And I have this nice picture I got off the internet of the International Space Station. Of course, it is a very expensive project. Well, why do we build a space station? Well, we build a space station because we learn about space stations. And we learn about the world. And we learn about space travel. We learn what we are going to need to know when we go from that space station out into something we cannot reach with a space station floating near Earth. I see the technologies we are working with as having a very similar kind of relationship, where you might have the guy who works in the space station saying, “Why would anyone want to go to the moon?” Well, I do not think they actually say that, but that is kind of what I think of whenever I hear an XML markup person say, “Why would anyone ever want to do overlap?” Well, I do not know, but there are people who do, for whatever reason. And of course, they go together, right? XML gives us a fantastic platform, as well as being something that is worthwhile in itself, because we can study the Earth with it. Let us go to the next slide.
So this is a flowchart diagram that a number of you have seen, but it is amended somewhat, because basically what I am imagining here is a transition from the current architecture of document processing technologies, specifically markup technologies, which is really oriented towards what we call “lights out” publishing — oriented towards the seamless and friction-free production of documents within a processing environment to create certain kinds of outputs, probably formatted outputs. And the whole idea there, the role of the schema there, is to provide that enforcement mechanism, that filter, so that you know what is fit to process for the system, and if it is not, then you kick it back and you fix it. Whereas the architecture that I think we really need within the humanities is significantly more complex, for one thing, which means it is going to be harder to build and maintain, but it is not really focused on that final product as its output. It is almost like that final product, the result that you get in that lower right there, is a side effect of the system, in the same way that the output of a program is the side effect of the program. Really, what we have got is that the system is there because it is an opportunity for us to learn about what we are studying; and, with respect to that, schemas and specifications are not a set of external rules that we apply simply to know whether something is good or bad, but are themselves something that we are building and developing — and I think all of us who have worked with markup in the humanities know this very well. Trevor [Muñoz] has put it very nicely when he says that every TEI project is a TEI customization project.
It is not always represented that way, and yet the people who capture the spirit of the TEI — as something which is at the same time a shared community-based platform and also a basis for research — know that that is kind of the idea: that we use it not just to do things the same way as everybody else but to do things in different and interesting and new ways that actually apply to our research problems. So, within the flowchart on the right you have all kinds of opportunities for asking questions and for reengineering things, and the analysis process is really the goal as well as being the means of achieving our goals.
So let us go on to the next slide. The final question I want to leave you with, which stems from the others and which comes back to that question of data modeling, is about what role we should be playing as humanities scholars in this, and this again is a question I do not have the answer to. The answer to “Do scholars get their hands dirty?” is “Well, yeah,” but on the other hand, “Well, no.” Scholars are people who work with people who get their hands dirty, and they get their hands dirty while they are working with the guys who are getting their hands dirty. That is the way it has always been, and I think this is really important to us as we move forward in the next three days, because when we are talking about data modeling we want to keep in mind that we do not necessarily even mean the same thing by data modeling. And so as I finish up here, what I want to remind you is that, you know, I am sort of putting forward these ideas and these issues, but I do not expect that you are going to necessarily agree with me. On the contrary, I would be kind of disappointed if you did, but I am not going to know until you ask your question whether you are asking it because I got it wrong or because you did not know what I was trying to say. You might have your own idea of what I was trying to say, and you might be right, and I need to be corrected — not about what I was saying but about what I thought I was saying when I said it.
I hope that we are going to be doing three days of this, because this is the tradition, and we need to be asking each other — and assuming that, you know, this person is a person who wants to get his hands dirty, but he is working for somebody who does not want to get her hands dirty, and she has a right to work without getting her hands dirty in that particular thing, because she has other things that she is thinking about. Part of what we are doing as builders and workers in this domain is working with people who have other priorities for learning and doing, and deciding to what extent and where they are going to get their hands dirty, and we should do that without prejudging the right way to approach it, because we really benefit from the differences in our approaches that way. So with that I will go to the last slide and invite you to respond.
[Laurent Romary] I take as my entry point one of the tweets from Toma Tasovac (points to fellow audience member), asking why you do not do that in standoff. And I tried, in the last half an hour — I just put that together, so I am organizing your thoughts, so I am trying to tell you what you should be thinking about. (Laughter) Taking the metaphor of Frankenstein: basically, what you are doing with LMNL is a chimera, in the sense that you are playing Frankenstein with markup, trying to merge bits and pieces in one single flat structure (flat on the screen), which is a chimera in a way. And this question of standoff has made me think: well, data modeling is also like […], organizing bits and pieces which have their own autonomy. And standoff is about identifying what I sometimes call crystals — things which you do not want to put together. And this is exactly what you said about sentences. And this makes sense also for scholars, because they have got workflows through which they think about the texts in a certain way and then think about them another way, and you do not want to force all those ways together even if the technology allows you to do so.
[Wendell Piez] Right, absolutely. There is a design problem there as well. And what I have got up here is the xLMNL, which is the form that this is levitated — that is, stepped — into, but this is not standoff. I mean, the answer to Toma’s question is: I am using standoff. It is just that it is not my initial representation. I do not see standoff as being one way to do it versus other ways.
On the contrary, I think that what we need is a system that allows us, just as you are saying, different ways of approaching the representation or the models, even at the same time as it allows more flexibility in the modeling itself. So if standoff is the best, most efficient way for you to manage your data set with all its complexities, by all means do that. I am not trying to say that that is an inappropriate or wrong way to do it. In stressing that point I would like to point out how these ranges index into the text using standoff. That is a standoff representation; it just happens that in this particular case it is internal to the system, because my demonstration is not a demonstration of standoff as such. My demonstration is a demonstration of how to parse a markup syntax into something that you can then work with in these ways. So, I do not really see that as being an opposition at all.
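The idea of ranges indexing into the text is easy to sketch: in a standoff representation, the base text is left untouched and each analytic layer is a separate list of offsets pointing into it. The layers and offsets below are invented for illustration (using a line Piez quoted earlier), not the internal representation of his demonstration.

```python
# A minimal standoff sketch: the text is untouched, and each analytic
# layer lives in its own list of (start, end, label) offsets pointing
# into it. Layers need not nest and can be revised independently.

text = "Thou sealedst them with many a bare broad word"

metrical_layer = [(0, 46, "line")]
syntactic_layer = [(0, 4, "pronoun"), (5, 13, "verb")]

def extract(text, layer):
    """Resolve a standoff layer back into the spans of text it points at."""
    return [(label, text[start:end]) for start, end, label in layer]

print(extract(text, syntactic_layer))  # [('pronoun', 'Thou'), ('verb', 'sealedst')]
```

Because neither layer is embedded in the text, adding a third way of reading the text is just adding a third list, which is the autonomy-of-layers point being made here.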
The other thing that I should stress in that context is that when we designed LMNL, we set out very specifically to conceive of the design of LMNL itself as an abstract data model, irrespective of the syntax. What that means is that you can use standoff and pull it into the LMNL model. You can compile it if you would like. You can use this — what we call sawtooth notation — compiled into LMNL, or you can use XML, and in fact one of the things on my to-do list is to refine the little XML tag set so that you could drop it into arbitrary XML, so that any XML can be parsed into LMNL. As some of you know, I have done experiments on that basis in the past, and they work to do very similar things to this.
A number of you saw the earlier version of this experiment, which is doing sonnets in English. What you have here is this phrase business with the Rilke poem. As you mouse over, you can see that the range indicators pop up. And if you mouse over the bars there, you can see that the particular range is highlighted. And this dramatizes — you can really see, when you line them up — the difference between different sonnets in the way they work; and the earlier version of this was using XML source. I know that LMNL syntax is not going to be the only way to do it. In fact, I suspect that as things get more complicated it is going to get really, really hard to use, because the nesting of annotations gets really confusing to read, and so we are going to need tools, as we have with XML, if we are going to use that syntax on a bigger scale. And similarly we are going to be using standoff notations.
[Syd Bauman] We need tools as in LISP to process them?
[Wendell Piez] As in LISP? Yeah, well we already have LISP. We need people to write the LISP.
[Syd Bauman] But the LMNL LISP functionality will allow for tools for better analysis for getting LMNL to work.
[Wendell Piez] Yeah, well, let us take this off. Yes?
[Julia Flanders] I have a naive question. Just looking at the series of processing stages that lead you from your initial LMNL expression to the–
[Wendell Piez] The LMNL syntax.
[Julia Flanders] Right, exactly. It seems to me that– I guess what I am wondering is, are all those stages of representation one of the above? Or are you– ?
[Wendell Piez] Well they are all XML.
[Julia Flanders] Well in a sense they are all intellectually isomorphic but I am wondering whether the LMNL representation and the XML version you showed us – are we to think of those as being representationally different versions of the same data model or are they different models? Does that make sense?
[Wendell Piez] I think that goes back to my question about what the heck do we mean by data modeling.
[Julia Flanders] Yeah, exactly — I am hoping it does. I guess maybe that is the biggest question, to think about in a more longitudinal way, to put it on the table: at what point, when we manipulate the model, do we get a different model, and to what extent do we think of models as being all the potential for–.
[Wendell Piez] Well the way we work with models is with models of our models.
[Julia Flanders] Okay.
[Elena Pierazzo] I want to go back to what you said about schemas. And I do think that schemas are much more than what you said. It is not just for validation, or something like a workbench for building tools. When I think about it in particular, I do remember what I always say to my students: that [a schema] is a model. A schema is a way of making the model in your head into a sort of format that a computer can understand — the idea you have about the text or a series of texts. It is a bit different […] the encoded text itself, because it is the instance of that model. But actually, to me, the schema presents the model much better than the text itself. So for me the schema is perhaps a part of the modeling, because it is […].
[Wendell Piez] Yeah, I think that is a number of fair points, not just one fair point. You said a number of things about schemas, and in my discussion about schemas in the later part I was really stressing the application of schemas within a processing system. But I also agree with you that schemas are themselves artifacts of interest, and opportunities not just for modeling but for exercising our design sensibilities over the system as a whole. One of the things that pops into my head that I should mention: when we were talking about some of this stuff over beer last night, the question of aesthetics came up, and I do not want to let that completely drop even though I cannot really discuss it at length.
But I think the question of aesthetics as offering us at least implicitly and sometimes more than implicitly a set of principles and design criteria is really important and I think it relates to the thing about schemas because the application of schemas and the role of schemas within the system is exactly on that fulcrum between looking at a particular instance representing something in the world versus abstracting from that and looking at it as something which is more general and a member of a family and a demonstration of an idea. In that sense I think schemas are really critical. Even though I know plenty of people who would disagree with you, that schemas are the correct place to do your modeling. I think that in my Lego analogy – you can sit down with your pile of Legos and sort of put them together but you can also design something and that is what we are talking about when we start talking about schemas. So, I could not agree more really.
[Syd Bauman] Not only could I not agree more — I also want to take that a step further and say that, for some of our smaller brains, the schema is so much a part of our data modeling methodology that some of us are going to have trouble thinking about data models in the world, the non-tree world, outside of [the] XML world–.
[Wendell Piez] Yes, that is probably true, but I want to stress that you build your schema out of something. And I think that we also want to think about the design of the Legos themselves. All of this being said, I want to stress that I hope there are people in this room who do not agree that schemas occupy the central place, and that they will let their voices be heard over the next three days, because we have some of us who agree, but hopefully we are going to be thinking about it.
[Syd Bauman] When those of us who have boxed ourselves into this “schemas is my modeling capability” — when we have constraint languages for whatever it is you might be using — LMNL, GODDAG structures, XP Pro, whatever it is you are using — our modeling capabilities are going to take a big jump.
[Wendell Piez] Yeah I hope so.
[Desmond Schmidt] I just actually had a question about — you probably cannot hear me up there. But you mentioned, if I understand correctly — you are talking about recursive annotations, but you did not justify it, and I would like to know why.
[Wendell Piez] I did not justify it?
[Desmond Schmidt] You did not say why you use it.
[Wendell Piez] Oh, why we have them?
[Desmond Schmidt] Why you want to make the annotations recursive.
[Wendell Piez] Yeah, aesthetics. We thought it was neat. We liked the idea.
[Desmond Schmidt] But it is complicated.
[Wendell Piez] Well, it is sort of a flip answer in some ways, but the more serious answer is that those of us who — at the very beginning this was Jeni Tennison and me — were just looking at it from the point of view of what the people we work with, clients and colleagues, want to do with XML, where we feel particularly strong stresses against XML, things that XML cannot really do. And one of those is the way that attributes in XML are just simply strings, and if you want to provide attributes with any kind of internal semantics you need to do that on your own. And to a certain extent data typing is capable of alleviating that problem, but it involves commitments of its own. And we thought that, given the idea that we have the capability of identifying arbitrary ranges, one of the things that people are going to want to do is make assertions about those ranges that we cannot actually predict or want to constrain going in. And so this idea that the analogue to an attribute — in LMNL, namely, an annotation — would be structured seemed very attractive to us. And it also seemed like a relatively straightforward thing to do, for developing a sort of generalized range model, to allow that model to apply not just to the document as a whole but to the annotations. I do think that if the idea of LMNL gets any application, it might be in exactly those areas where it is actually difficult to define the boundaries of your document, because the document exists in some sort of nebulous form and the aggregation is happening on the fly. And in that operational context, the ability to have an annotation be a fully structured document — which may or may not have its own markup or structure; it could also just be a completely flat string, that is fine too — the capability to let any document be an annotation of any other document could really be useful. So that was sort of our thinking on that. Chris?
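The contrast being drawn — XML attributes as flat strings versus LMNL annotations as structured documents in their own right — can be sketched with nested dictionaries. The example data is invented for illustration and only models the design choice, not LMNL’s actual object model.

```python
# XML-style: the note about a range is one opaque string.
xml_style = {"range": "line", "note": "editorial: see 1818 ed., p. 12"}

# LMNL-style (modeled here as nested dicts): the note is itself a little
# document, with its own content and its own annotations, recursively.
lmnl_style = {
    "range": "line",
    "note": {
        "content": "see the 1818 edition",
        "annotations": {
            "resp": {"content": "editor", "annotations": {}},
            "page": {"content": "12", "annotations": {}},
        },
    },
}

def depth(annotation):
    """How many levels of structure an annotation carries (a flat string: none)."""
    if not isinstance(annotation, dict):
        return 0
    return 1 + max(
        (depth(a) for a in annotation.get("annotations", {}).values()),
        default=0,
    )

print(depth(xml_style["note"]))   # 0 -- a flat string
print(depth(lmnl_style["note"]))  # 2 -- a document within a document
```

Because the recursion bottoms out in either structure or plain strings, nothing forbids an annotation that is a whole document — which is the "any document can annotate any other document" capability described above.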
[Jan Christoph Meister] You have got me thoroughly confused.
[Wendell Piez] Is that good?
[Jan Christoph Meister] You have got me thoroughly confused, and that is good, because I am still stuck dealing with your initial question of, “What do we mean by data model?” Do we mean modeling something by using data? If that is the case, then this is redundant, because show me a model that does not use data. So either it is a nonsensical expression or it must mean something else.
[Wendell Piez] Well maybe we are modeling data.
[Jan Christoph Meister] Okay.
[Wendell Piez] Well I think the answer is both.
[Jan Christoph Meister] But if the latter is the case, then this leads me to… perhaps there is a need to distinguish between two types of modeling: representational modeling, which serves certain pragmatic purposes, and heuristic modeling, which serves acquiring knowledge about that which we model — and of course we can do that recursively; we can model models, etc. And I think that is where the question of the tendency, and the force and the power, of the schema comes into play, where you have to distinguish again. I think there is a strong case for the schema in representational modeling. I am not sure whether there is a strong case for schemas in heuristic modeling, because you are actually trying to find something new. If it is all in the schema already, then you are not going to find anything new; so the more constraints you have defined through your schema, the less powerful this capability of your model is going to be. On the other hand, if you are into representational modeling, then you are well served by those schema texts that just do the trick, because they are powerful. That is where you need them. I do not know whether it is so simple, but–.
[Wendell Piez] Well, I would generally agree, and I would point back to my two flowcharts as a way of getting our heads around that particular kind of stress, because I think that is a good way of characterizing two very important aspects of modeling — why we do it and what it is for — without necessarily trying to be comprehensive about it. That is a very good place to start. At the same time, we do heuristic modeling by way of representational modeling and the other way around, so it is not as if the two are completely discrete either. And this is why, in my second chart, the schema itself is the site of contention. The schema itself needs to be amended and extended over time, because the schema needs to be subjected to the scrutiny of, “Is this adequate — and not just adequate to my representational task, but does this capture the information that I want to capture?” And so at that point the schema gets to be the result of the exercise. It is not an input anymore; it is an output, because it is in effect a codification of the heuristic process. So I think that speaks to Elena’s [Pierazzo] point also, because I do not think Syd [Bauman] has anything to worry about. We are going to continue to worry about schemas plenty. At the same time — I also want to put it on the table — for many people working with this problem of data modeling in the humanities, schemas are sort of a niche topic, because they have at least up to this point been within the context of markup language applications, and we also want to be thinking about the kinds of modeling that we have not been able to do up to this point with markup technologies.
[Maximilian Schich] I would like to go back to Syd’s [Bauman] question. You make the statement that users do not have to think about […] and they are not supposed to think about them — I guess that is what data models are. So one thing you cannot prevent is that every user thinks about them, even if it is not consciously; everybody has an idea about something. To talk about these things we need a shared world view, because if I have to describe what an elephant is, I cannot describe a piece of an elephant. So the question is: right here, we all have some ideas, some very strong, some developed over […], about how data models can work, and people have different paradigms: some people do graphs, some people do XML, some people do [relational databases?], and they can use all of these — just as in science you can map from Mathematica to Python to whatever, and you can probably achieve your goals. Sometimes this is better, sometimes something else is better, but the question is how to translate them. And the problem is, how can you evaluate whether it is actually good or bad? Because that is the key problem: if we are going with this heuristic data model, where we apply the data models to some data, what actually comes out is not an application of the data model; it is actually a structure in the data, which might not actually fit into the […], but as long as we do not measure this, we do not know if it fits. And I think that is a situation that has been going on now — for like thirty years — where we have actually discussed how there should be an ISO standard for this and that, and how we should [record?] this and that, and this and that. Now projects running for ten or twenty years have collected data, and nobody ever checked what the distribution of, say, the link types of this […] is — how many of them actually exist — while we really did spend ten years of discussion on <p>, whatever… [probably a reference to versions of the TEI Guidelines].
[Wendell Piez] Yeah, right. I would like to pick this up with you more afterwards and in between, because you are echoing complaints or concerns or issues that—
[Maximilian Schich] It is a couple of things I find interesting.
[Wendell Piez] Yet at the same time, this is where I am split, because if we do not argue about how to define a <p>, then who will? It reminds me of the long path I took to get to this point. I came home from college in my sophomore year and I decided to major in classics, and my dad went “hrmmmm” for a couple of days, you know, and then finally one day he cheered up and he said, “Well, I suppose somebody has to know Greek.” Bless his heart, because he had enough faith in me. The thing is that it comes back to this thing about the process and the goal, where I could not agree with you more about our need to be somewhat more careful and self-conscious about our resource allocation, with respect to the energy and the time that we put into refining these things — and, you know, at the same time, I think this is how we learn. So I have this issue much more in the instance than I do in general. I have no problem with the schema that allows us to argue all night about how to define a <p>. It is the particular conversation about the <p> that I am tired of. Gregor [Middell], did you have a hand up? Somebody, mhm?
[Stefan Gradmann] I want to go back to his remark about the representing versus heuristic […] in the schema. [long unintelligible passage]
[Wendell Piez] Yeah, well, I hope we hear more about that, because I need more education in RDF and in semantic terminologies. I know I have needed it for years and I still need it. I do think that the same issues are liable to come up in terms of early commitment to certain semantics that then constrain you further down the line. So I take your point, and I am also going to be interested to hear, and to push back.
[Julia Flanders] I would ask everybody who asks questions to speak up, to make sure that the questions are audible to the audio pickup.
[Laurent Romary] Somewhat related to what Elena said, I would like to come back to this idea of scholars getting their hands dirty. I think that even the word "schema" is already too oriented; I mean, we are all somehow spoiled by schemas here. And I also relate this issue to what you said about how every TEI project is a TEI customization project. If we abstract away from this: we experienced exactly the same thing in our classes, like yesterday. The first thing you give students with no technical background, who have never heard of XML and schemas and so on, is a scan of a dictionary entry, and you ask what it is about. And they start basically picking things out, and that struck me: they treated this entry as if it were a generic one, which emphasizes exactly the data modeling problems you will see. And that struck me as exactly what you were talking about: the data modeling process. So there is no digital humanities project without a prior data modeling activity; there is no digital humanities project without the need to write down encoding guidelines, whatever encoding scheme you use. It does not make sense to say, "Oh, I am using the TEI." Okay, but what kind of data do you have? What is the specificity of your data? You need to write down exactly what you mean when you use a <p>. So everyone has to define what <p> is for his or her own project.
[Wendell Piez] I hope that people kind of argue with you this way.
[Maximilian Schich] So, I am going to disagree. What actually happens in that kind of conversation is that almost the key sentence is when somebody says digital humanities projects are not really about collecting data but also about data tools. Which is amazing, because it does not include any project which actually looks at the data, right? You just build tools for somebody else to look at the data.
Actually, I think you could do a myriad of digital humanities projects without defining your own data model, just by looking at the data models and the data other people already have. Just look at the structure of the Bible: how the text is distributed, how the names are distributed, things like that. Or take, say, a hundred Bible projects where people have marked up the Bible with some version of TEI, and compare them.
You do not need your own data model; you can actually compare all of them. My hunch is that you would learn far more about what is going on in the understanding or the interpretation of the Bible than by taking yet another Bible, yet another TEI data model, and doing the markup yourself.
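Schich's suggestion, comparing the encodings that already exist instead of producing yet another one, can be sketched in a few lines. The following is a minimal, hypothetical Python example (the two inline "encodings" are invented stand-ins for different projects' markup of the same verse, not real project data) that tallies the element types each project uses:

```python
from collections import Counter
import xml.etree.ElementTree as ET

def element_distribution(root):
    """Count how often each element type occurs in one encoding."""
    # Strip namespace URIs so TEI and plain XML tally alike.
    return Counter(el.tag.split("}")[-1] for el in root.iter())

# Invented stand-ins for two projects' encodings of the same verse.
encoding_a = ("<text><div><p>In the beginning <name>God</name> "
              "created...</p></div></text>")
encoding_b = ("<text><div><seg>In the beginning <persName>God</persName> "
              "created...</seg></div></text>")

dist_a = element_distribution(ET.fromstring(encoding_a))
dist_b = element_distribution(ET.fromstring(encoding_b))

# The differences in the tallies are themselves data about each
# project's interpretation: one marks names generically, one as persons.
print(dist_a.most_common())
print(dist_b.most_common())
```

Run over a hundred real TEI files, the same comparison of tag distributions would show, without any new data model, where the projects' interpretations diverge.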
[Laurent Romary] So you can definitely agree with me?
[Maximilian Schich] Yeah. You said you need to model beforehand, right? But the model is not necessarily—
[Laurent Romary] No, no, I said you need to put the data modeling activity at the core of your project, not beforehand; it begins as soon as you start. I am going to take your example: you take ten projects working on the Bible with ten encodings. The first thing you should do is forget about tools, forget about technology, and look at the models that are there and compare them. Some went deep into the data there, some less deep, and the same old structure becomes—
[Wendell Piez] I think we agree with what you said.
[Laurent Romary] What I meant –
[Fotis Jannidis] Yes, okay. I think we have three levels of data modeling. One level is the concrete text and the application of markup to that text. Then we have the schema, and I am not sure I understood what you meant; you seemed to imply that this second level is the main level of data modeling. But I think it is just the second level: the schema defines a class of objects. And then you have the third level. In our case that would be, for example, a text encoded in TEI, with XML as the meta-metamodel. You were talking about shortcomings of this meta-metamodel, and I am wondering: why do you jump so quickly from the whole set of levels to this one question of multiple hierarchies? Is it because you have a personal history there, or is this the main problem in working with models?
[Wendell Piez] Well, I think the answer is both. One reason is that I was given this opportunity and this is something I am thinking about. But the other thing is historical: Michael [Sperberg-McQueen] was writing about how texts have multiple hierarchies, so we need a data model that supports hierarchies. At the time that was radical, because everything was flat and there were no hierarchies at all. And so developments were made on the basis of the fact that Michael [Sperberg-McQueen] had a point, and that Allen Renear and others had a point, that hierarchy is part of the picture we need to be looking at. At the time, with SGML, there was an expectation and a hope that SGML itself would address this issue better than it eventually managed to do. What happened, in a sense, was that the development of SGML was hijacked: the genie got out of the bottle, the world got XML, and XML has now grown into this fabulous set of technologies that is extremely useful for many, maybe most, applications; I mean, we can argue about the applications for which it was designed, primarily the publishing application, which is not concerned with the heuristic aspect of modeling. So here we are in the humanities, asking the same questions that were asked twenty years ago. I do not personally know that we need an answer to the question of whether multiple hierarchies, or arbitrary overlap (multiple hierarchies are of course only a subset of the overlap problem, right?), is the core problem. I do not know. But I think it is a problem worth working on, because we know that texts have multiple hierarchies, and we want to represent them, and to represent them more gracefully than we currently can using the technologies that we download off the internet.
That is not to say that an extremely creative person who has an interest in tackling this (or is maybe just obsessed, in my case) cannot think about it and work on it using the current toolkit. But we want it to be as easy to think about this as it is to mark up a simple document in HTML and put it on the internet.
[Desmond Schmidt] [...] Basically, there is a split between those who want overlap represented directly and others who are uncomfortable with that, who feel that the strength of the schema is a powerful thing. These two can coexist quite happily. All you need is an efficient way of translating one into the other and back again, realizing that if you have got overlap, you may have some data loss when moving to a strict hierarchical form because of the poorer representation; but you can move from one to the other with zero data loss into the representation—
[Wendell Piez] Yes, that is exactly what I am trying to get at when I talk about moving up and down the scale. In some cases it is not even data loss, it is simply a loss of efficiency: the representation is a clumsy one. But I think there is going to be a lot of interesting work in this area.
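The translation Schmidt describes can be illustrated with a small sketch. This is not any participant's actual tool: the milestone convention, the function names, and the simplifying assumption that at most one range per name is open at a time are all illustrative. It flattens overlapping standoff ranges into empty milestone tags, a form any strict tree can hold, and then recovers the ranges without loss:

```python
import re

def to_milestones(text, ranges):
    """Flatten overlapping (start, end, name) ranges into empty milestone
    tags; milestones never nest, so the result fits any strict tree."""
    events = []
    for i, (start, end, name) in enumerate(ranges):
        events.append((start, 1, f'<{name}-start n="r{i}"/>'))
        events.append((end, 0, f'<{name}-end n="r{i}"/>'))
    out, pos = [], 0
    # At equal offsets, end milestones (flag 0) come before starts (flag 1).
    for offset, _flag, tag in sorted(events):
        out.append(text[pos:offset])
        out.append(tag)
        pos = offset
    out.append(text[pos:])
    return "".join(out)

def from_milestones(flat):
    """Recover the plain text and the ranges: the round trip loses nothing.
    Simplification: at most one range per name is open at a time."""
    parts, opened, ranges, pos, length = [], {}, [], 0, 0
    for m in re.finditer(r'<(\w+)-(start|end)[^>]*/>', flat):
        chunk = flat[pos:m.start()]
        parts.append(chunk)
        length += len(chunk)
        pos = m.end()
        name, kind = m.group(1), m.group(2)
        if kind == "start":
            opened[name] = length
        else:
            ranges.append((opened.pop(name), length, name))
    parts.append(flat[pos:])
    return "".join(parts), sorted(ranges)

text = "Sing, goddess, the wrath of Achilles"
# A verse line and a phrase that overlap, so they could not nest
# as ordinary elements in a single XML hierarchy.
ranges = [(0, 24, "line"), (15, 36, "phrase")]
flat = to_milestones(text, ranges)
assert from_milestones(flat) == (text, sorted(ranges))  # zero data loss
```

The flattened form is clumsier to query, which is the loss of efficiency mentioned above, but nothing in the text or the ranges is discarded on the way through.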
[Stephen Ramsay] I had this horrifying realization looking at LMNL: it started to occur to me that the data modeling problem of overlapping hierarchies, which I am not sure is the central problem, but one of the problems, is that that problem is written all the way down the computational stack. You know, when I saw what looked to me like four lines of BNF grammar for LMNL, I was looking at that and thinking, "Well, gee, what would it be like to write a BNF grammar for LMNL?" And I thought, well, it would be hell on wheels, because BNF grammars do not deal with this any better than... you see what I mean? And what does a programming language want when it gets to parsing this thing? Well, it wants an abstract—
[Wendell Piez] Well, that is why we use a range model.
[Stephen Ramsay] It does not want a DAG, it wants an AST. There is a sense in which you keep kicking this down the road, which is perhaps just a rut. We are always going to be talking about translating some structure into some simpler structure down the stack, and it may be that the question ultimately is, "How much do you want to be able to model in a single [...]?" Well, Syd [Bauman] mentioned Lisp; well, Lisp wants ASTs, which want single-hierarchy trees, et cetera, et cetera.
[Wendell Piez] Elena [Pierazzo]?
[Elena Pierazzo] It is a bit off the topic of the moment, but there is something you said before that bothers me.
[Wendell Piez] Good!
[Elena Pierazzo] I cannot let that pass: you said something about how, when you read the text on the page, you do not consider the letters. You do not look at the letters.
[Wendell Piez] Oh, I was saying most people, most of the time.
[Elena Pierazzo] Okay, good, because that is the point. It depends on your point of view. It depends on who you are.
[Wendell Piez] Yes, that is right.
[Elena Pierazzo] And actually, one of the foundations of the humanities is that you read everything.
[Wendell Piez] Yes, that is right.
[Elena Pierazzo] A different point of view, a different moment, a different aspect; there is always someone looking back at what you think, but it is not really—
[Wendell Piez] Absolutely right.
[Elena Pierazzo] You cannot really have any—
[Wendell Piez] Well, having made many categorical statements this morning, I appeal to all of you not to take anything I said as intended to be categorical. (Laughter) And there are people who engineer their own Lego sets; by all means. That is one of the best things about the humanities: we have license to do this.
[Elena Pierazzo] I did not mention [this] but this is something I had to say. I know you know it but I have to say it.
[Lisa Swanstrom] I have a question about the beautiful images of the output. [...] I can see this as a fantastic way of teaching this text, Frankenstein. [...] It would be a wonderful visual for students. So I guess I have a couple of questions about the design [outputs?]. The first question is: is the design included in the conversation about the process of design? Why purple? Why bubbles? Why this entire visual approach, which is very effective, but why these choices, and were they built into the initial parameters of the text? And secondly, is there interpretive loss built into it? For example, you have this very tidy picture, these narrative circles that are embedded within the text, and I do not know that I would intuitively see that by looking at the output. So is that something that is built into this process?
[Wendell Piez] It is not all built for the bubbles; I mean, the bubbles are very much the fun part. It is purple because I thought that looked good, and because I tried white and Kim, my wife, clearly liked the purple better. I did mention the word aesthetics a minute ago, and I think that is actually really important here. The larger question of the design of the interface is a separate question, and yet it is not, right? Because you do not need any of this fancy stuff with LMNL and overlap and whatever to get these bubble pictures; all you need is an index into the text that gives you the information you need to draw that thing. So in that respect the bubbles really are a side effect. And yet, on the other hand, I knew very well, as a sometime teacher of literature myself, that this is going to be revelatory to some students, or if not that, then at least extremely interesting as a way of beginning to understand things that we as teachers and readers of literature become aware of and interested in, things which go beyond simply sitting in the armchair reading a good gothic, right? Once you begin to learn what we learn as readers of literature, you learn that a good gothic is even better than you thought. And one of the interesting things about this sort of interface is that it almost mainlines a certain aspect of this information: you can see it directly, and you do not need to go through the same kind of laborious process to begin to understand, for example, how the book is shaped by the relation among the narrations. Now, that gets to the second question, which is a separate question. To take this much further I am going to need a user interface specialist to help me, right? But on the other hand, what each element is to represent goes straight back to the modeling question, because I chose to mark up the narrative structure. I knew as a reader of Frankenstein that that was interesting.
And if I had not known that, I might never have ended up with these particular bubbles; I would have had another set of bubbles that showed something else. And that is where we come back to this thing about the heuristic and the way all of this cycles, the hermeneutic process involved. Because part of what I am interested in, as a spin-off or side effect of my interest in working with the text, is that other people can then share that interest, and they can see that there actually is something there. And I can go do a narrative in rhyming couplets, and they might do a completely different poetic form, and it might look really different and yet also be very illuminating. So it is that idea of having the toolkit. I think we have done tremendous things in the last twenty years to build these toolkits, such that we now have an industry and a culture growing up around us where people are doing this work, being engaged, and learning from it. At the same time, I also know that there is this bed of Procrustes, this way in which the technologies in their current form also really limit us. So I am interested in pushing that forward, pushing that out. And what you are looking at is simply an expression of that interest on my part.
[Julia Flanders] I think we have time for one more question or comment and then we have got a break.
[Maximilian Schich] It is a comment, actually, on the last question. It occurred to me while running the third [...] portion of the NetSci conferences, Arts, Humanities, and Complex Networks [...], bringing humanities and art history people plus information scientists together with the natural scientists to discuss that very problem, and it is very interesting what you can do with it. If you take thousands of bubble charts and you put them on one large poster, you can actually compare types of music.
[Wendell Piez] Absolutely.
[Maximilian Schich] You can say, okay, this is Schoenberg over here and here is Beethoven. And that is something: you do not even need to listen to the music, you can just say, okay, that is interesting to me. So I think that is a necessity, and it also takes us back to this thing. Yes, it is true that the complete computational stack needs trees in order to process, because that is how we do it empirically. But that does not mean we can simply avoid dealing with NP-complete problems like overlapping texts, because you cannot go back to ignoring the elliptical ways planets revolve around the sun simply because it is hard to compute. Right?
[Wendell Piez] Right.
[Julia Flanders] Thank you very much for getting this all started.
[Wendell Piez] Thank you.