Trevor Muñoz, “Discovering our models: aiming at metaleptic markup applications through TEI customization”

Case Studies–Research Ontologies (March 15):

Trevor Muñoz, “Discovering our models: aiming at metaleptic markup applications through TEI customization” (slides, video)


[Trevor Muñoz] I wanted to start off by saying that I hope there’s still some appetite for picking up some of the questions that were left hanging at the end of our wonderful discussion this morning. I’m actually not going to give the talk I thought I was going to give. I’ve kind of massaged it a little bit so as to respond to some of the issues that came up, which were along the same lines that I wanted to talk about anyway. I should also mention at the beginning, as a caveat, that this might feel like something of a thump down from the heights we’ve been scaling so far.

I’ve called my talk “Discovering our Models” and I want to talk about how we go about developing these models by telling the story of doing TEI customization and thinking about the ways in which the data model starts to come into view, or to cohere, as you work through this in a practical sense.

But I want to start with this quote from Wendell [Piez], from his article on the distinction between descriptive and procedural markup. He says that “an effective markup language will work by establishing a self-contained, internally consistent and clear set of categories, perfectly sufficient for handling the data set to which it will be applied, within the range of applications for which it is due. But this ideal is impossible for a truly descriptive language to achieve.” And I think we’ve seen that quite strongly in our discussion so far. “Since the world is not a closed, finite set of phenomena that is liable to such treatment. Metaleptic markup” — which is this category he’s proposing in this article as an excellent place that we might aim at — “gives us the next best thing: it invents its own imagined world, proposing earnestly or ironically that this serves both sides, both accounting for external reality as it is, and creating it as it needs to be.” The idea is that when we’re working with preexisting materials, we’re trying to describe a set of documents we might be working with. We’re both interested in giving rich and provocative and interesting descriptions of those materials inside a system that we’re also thinking of validating, that we’re enforcing certain rules against so that we can do something with these descriptions we’ve created to create new data or future information, future ideas. That, I think, is my, perhaps poor, paraphrase of Wendell [Piez]’s metaleptic markup idea. So the question I want to ask is: how do we get there? And more specifically, how do we get there from here?


This is a page from Frankenstein, a manuscript that both Mary Shelley and Percy Shelley worked on. It’s part of a set of documents that we’re currently working with at MITH, along with our partners at the Bodleian Libraries, the New York Public Library, the Huntington Library, and the Harvard University Library, on a project called the Shelley-Godwin Archive, which is bringing together as many of the extant manuscripts from the Shelley family, principally Percy Shelley and Mary Shelley, as we can get our hands on, and creating a digital archive of these materials—reuniting them from the physical archives where they currently live. For this project, we’re creating a customization of the Text Encoding Initiative suitable for describing these documents that we’re working with.

For those in the room who aren’t familiar with the TEI customization mechanisms, the TEI offers a way to modify the schemas that you can validate your documents against, and you describe your customizations in a TEI document itself, which is then processed against the TEI source files, and you can do things like removing elements, adding new elements, modifying the attributes and other characteristics of your schema. So we’re thinking about creating a customization for these documents, and we were inspired by the recent discussions in the TEI community about representing manuscripts, about the genetic encoding work that’s been done. We have these wonderful images of these very fragile documents, which apparently are very difficult to access in their physical manifestation, that we might focus our encoding on representing as carefully and as interestingly as we could, these physical documents themselves, rather than trying to create a reading text or some conceptual version of the text that was more about the literary and linguistic structures. We wanted to think more about the physical structures, the documentary structures.
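To make that mechanism concrete, here is a sketch of what such a customization can look like in an ODD file. This is an illustrative fragment, not the project's actual customization; the module, element, and attribute choices are invented for the example:

```xml
<!-- Hypothetical ODD fragment: selects some TEI modules, removes an
     element the project will never use, and tightens an attribute.
     Illustrative only, not the Shelley-Godwin Archive's actual ODD. -->
<schemaSpec ident="example_customization" start="TEI">
  <moduleRef key="tei"/>
  <moduleRef key="core"/>
  <moduleRef key="header"/>
  <moduleRef key="textstructure"/>
  <moduleRef key="transcr"/>
  <!-- remove an element we will never use -->
  <elementSpec ident="lg" mode="delete"/>
  <!-- require that every addition name the hand that made it -->
  <elementSpec ident="add" mode="change">
    <attList>
      <attDef ident="hand" mode="change" usage="req"/>
    </attList>
  </elementSpec>
</schemaSpec>
```

Processed against the TEI source files, a specification like this yields a schema (RELAX NG, for instance) that encoders then validate their transcriptions against.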

But what happened as we started to work through the project is that we found ourselves working up from the diplomatic transcription. We took a model very similar to the one Gregor [Middell] was showing yesterday. We have surfaces and zones and lines, and these are again following from the TEI’s current way of modeling this information. We began to put that customization in place and began to work through our model. It’s at that point that things begin to get interesting. You can put enough of a customization in place to get yourself up and working, to make sure that your encoders are perhaps all agreeing on certain outlines of a problem. But then you have to be focusing in more closely on, what are the things that are actually of interest to you? So when we talk about working up from the diplomatic transcription, we started off marking very carefully the lines and all the interesting metamarks you find on these pages, and finding this very careful, diplomatic transcription somewhat unsatisfying for the purposes we were trying to work towards, not to say that it’s not a valid goal in and of itself, but we found that we weren’t really after diplomatic transcriptions. Partly because we’ll have these wonderful images sitting right alongside them. So that’s a factor to consider: the ways that these different data representations sit next to each other and rub up against each other. What happened is that we eventually stopped marking certain metamarks.
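The surface–zone–line model described here follows the TEI's documentary transcription elements. A minimal sketch of a page transcribed this way might look like the following; the coordinates and the text are invented for illustration, not drawn from the archive's actual files:

```xml
<!-- Illustrative documentary transcription: a page as a surface,
     subdivided into zones, each holding topographic lines. -->
<sourceDoc>
  <surface ulx="0" uly="0" lrx="100" lry="150">
    <!-- the main writing block on the page -->
    <zone type="main">
      <line>It was on a dreary night of November</line>
      <line>that I beheld the accomplishment of my toils</line>
    </zone>
    <!-- a marginal zone, e.g. for a later addition -->
    <zone type="left_margin">
      <line>a marginal insertion</line>
    </zone>
  </surface>
</sourceDoc>
```

The point of this modeling style is that it records where writing sits on the physical page rather than the literary structure (paragraphs, chapters) of a reading text.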

I’ll just give one example, which is around deletions: the ways that certain paragraphs or sections of the text are cancelled out. We were encoding information that described exactly how those cancellations looked on the page—where the cross-out lines went, exactly what kind of formation was contained there. And it wasn’t really answering the questions that the scholars who were working on the project were interested in. They were interested in the dialogue that these physical documents represent between Mary Shelley and Percy Shelley during the composition of Frankenstein. We were recording all along both the markings that indicated deletions and the deletions themselves. So you can represent that there’s a metamark indicating a cancellation, and you also have a separate set of tags for indicating that something has in fact been deleted. And those are two slightly different statements. And we found ourselves dropping the statements about the metamarks and the careful descriptions of these drawings on the page. Whether that was a good decision or not, I guess we’ll find out.
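Those two slightly different statements can be made with two different TEI elements. A hedged sketch, with the hand reference and the deleted text invented for illustration:

```xml
<!-- Fragment for illustration; would sit inside a <surface>. -->

<!-- Statement 1: a mark exists on the page, here a cancellation
     drawn as crossed lines over the zone it targets. -->
<metamark function="cancellation" rend="crossed-lines" target="#z1"/>

<!-- Statement 2: the text in that zone has in fact been deleted. -->
<zone xml:id="z1">
  <line><del hand="#pbs">a passage struck out in revision</del></line>
</zone>
```

Dropping the first statement while keeping the second is, roughly, the editorial choice described above: the fact of deletion is retained, the careful description of the drawing on the page is not.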

But we found ourselves then not really being able to say that we were dealing with diplomatic transcriptions. We were dealing with, again, a different selection of features that we hoped would indicate more clearly the interplay of deletions and insertions—the moving around of the texts as a conversation between these two writers. That leads us to a question, and I guess this was where I was having trouble with our conversation this morning: we seemed to be circling around a definition of usefulness, where we were saying that putting less information into the documents made them more useful to more people. I was having trouble with that definition of usefulness. The less information we put in, the more people can use them for their other purposes. Whereas really we’re finding that the scholars in this project wanted to use markup, wanted to use data modeling, to engage in a conversation with other scholars in their field about the development of this text, and that therefore it was actually helpful to us to put more interpretative judgments into the text so that people could disagree with us, could fight against our model. That’s a perfectly valid and scholarly reuse of our material, reuse of our markup.


I think the demands of interchange, or the seeming size of the interchange problem, perhaps distract us from a humanistic approach to this endeavor of doing markup, where we’re not really interested in providing the most minimal, interchangeable representations, but, as Steve [Ramsay] has said about text analysis, we’re interested in provoking more arguments, making more interesting readings. In some cases, that requires very carefully and perhaps elaborately encoding our interpretative judgments right into this supposedly clean data. And I wanted to think about that from the perspective of information science. This is from some work that people at the Center for Informatics Research in Science and Scholarship at the University of Illinois have been doing, where Allen [Renear] works. This is a paper by Carole Palmer, who’s the director of that center, and Nicholas Weber and Melissa Cragin, where they’re talking about some studies that they did of reuse of scientific data in these new, big digital repositories of data. And they’re drawing on an information science researcher called Birger Hjørland, who wrote about something called the epistemic potential of documents, and they’re thinking about how to apply this to collections of data in order to assess whether we’re doing a good job at digital preservation or data curation. I think the interesting thing here is this idea of “transfer of knowledge” and where that happens. Where, in the process of building up a model, we begin to transfer knowledge, and how we think about the different ways of making that happen through the act of modeling. I think this ties in very closely with the conversations we’ve been having about validation. How do we know when it’s good enough? Is that the same question as “How do we know when it’s valid?” That, in turn, is related to this larger question I want to point out from both literary studies and scientific data of the point at which we begin to transfer knowledge.
In our discussion this morning, we got into what I thought was an interesting turn in the conversation about the interaction of a data model and the activities that go on around the data model. Whether there’s a point in distinguishing “What of that is still data modeling?” and “What of that is something else?”

In the process of writing a TEI customization, you can write your ODD file, which is how you encode your customization, and it seems very clear that you’re speaking about the data model—what kinds of things exist in your documentary universe. You’re at the same time keeping track of how people will use the system that you’re building. So it’s less helpful to think about it as a document and more helpful to think about it as a program that has certain interfaces, that has certain behaviors, even at the level of writing your supposed data model. You’re thinking about, if your encoders are working with a program like oXygen, which will read the schema and provide them certain help as they’re trying to encode the document, what kind of information you can put in your model so as to prompt the program that your encoders are using to encode their instance documents to do certain things or not do certain other things. This is, I think, distinct from the level of just what the schema tells them they can and can’t put in. Because there’s a level at which you can specify that behavior on a sort of “valid” or “invalid” level, and then there’s a level you’re trying to govern at more of a behavioral level or community level. I think those interchanges are interesting. The way in which the data model there is working on transferring knowledge in different ways, for two different audiences. This goes to the question of how we document our data models. Working with TEI customization projects, certainly working with the documents in the Shelley-Godwin Archive, the TEI has a wonderful mechanism not only for writing these customizations, but at the same time you’re writing the customizations, you can write the documentation right along with it. You can produce beautiful versions of it in multiple formats and you can do all those wonderful XML things. But that doesn’t seem like fully documenting the data model. You don’t seem to have captured what you set out to do.
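In ODD, that documentation can sit right inside the customization itself. A sketch of how a project might gloss an element for its encoders, with the prose invented for illustration:

```xml
<!-- Hypothetical documentation carried inside the customization. -->
<elementSpec ident="metamark" mode="change">
  <desc>A mark on the page that signals how other text on the page
    is to be read.</desc>
  <remarks>
    <p>Project guideline: record the function of the mark, not a
      detailed description of its shape on the page.</p>
  </remarks>
</elementSpec>
```

Schema-aware editors such as oXygen can surface element descriptions like these as hints while encoding, which is one way a schema prompts behavior beyond the bare valid/invalid distinction.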

I would argue for a richer sense of what it means to document a data model that encompasses things like the validation routines themselves. So, not only the actual schema itself—the instantiation of your conceptual model—but also the programs that run against that model, the programs that run against instance documents encoded against that model, in addition to the prose and the shared behaviors that you work out in encoding the documents that you’re working with in your archive. And to describe the data model you need this broader set of documentation. You need to have the programs. This, I think, ties into the things that Kari [Kraus] was talking about yesterday. When we talk about digital preservation, when we understand things like representation information, we try to draw boundaries around “What are all the things we need to make sure we stored away somewhere in order to preserve this object through time?” Things that look like documents, like TEI documents, are of course, in a sense, programmed. That is, of course, not surprising, but going through the process of customization and understanding the ecosystem within which that customization works will allow us to give fuller descriptions to collaborators such as librarians and archivists of exactly what it is these objects of scholarship are. What it is that they need to model in their systems of archives and libraries and digital repositories.

I want to leave my talk a little short, because I’m very interested in people’s responses to this: thinking about the ways in which customization, the putting-in of actually more information, can spark, through both automated and human means, a richer notion of data modeling, or a data modeling that participates in the humanities in a different way from what we’ve been speaking about before. Not simply the interchange of cultural heritage objects, but a real participation in the humanistic debate that we know from disciplines that are not explicitly digital. I’m interested to hear and to have further conversation around these topics.

[Elena Pierazzo] I like this idea of documenting the data model as well as documenting the encoding schema, the encoding practices. I’ve seen a beautiful realization of this idea of yours. In 2006, at Digital Humanities in Paris, Gautier Poupeau presented his documentation of a data model using METS files, which we have used at King’s for a few projects of ours. Basically what you do is that you can see “Which XML goes against which XSLT to produce which HTML?” “Using which Javascript?” “Which CSS?” So the full processing chain, which was for preservation issues, of course. . . so, in that sense, it’s exactly what you think. We were thinking about a data model, but together with a full documentation of the encoding, which can be very beautifully described by ODD, as you said: not only the schema, how you build it, but how you reuse it and how you encode it, what the meaning of your encoding is and how to get from here to there. But there is also this aspect: how do the beautiful website and stuff work? And why does it work like that? So we experimented with METS; that’s the idea.

[Muñoz] Right, and I think that’s a very interesting example. I think it’s even perhaps a better example than the ones we’ve generated so far with Shelley-Godwin. I think it covers a lot of the things that I was trying to bring up but also points to the absence that I’m still pointing at, which is that I’ve often found when doing a TEI customization, there’s always a certain point at which you stop specifying in greater and greater detail, where you could logically go on. In other words, even the ODD file, even the programs and the behaviors, are perhaps not capturing all the modeling work that’s going on, because part of it is social: it’s not actually enforced by the tools but is part of a sort of reenactment of the project, and looking for other ways to grasp at that information.

[Wendell Piez] . . . You scare me. As I stand over here to the side. [Laughter] The more we invest in the idea that our executables are an important part of our developmental contribution, the more committed we are to the incomprehensibility of our actual contributions. We don’t know how to talk about dynamic markup. We don’t know how to preserve executables except in very difficult and very fraught ways, so, you see, you’re getting us involved.

[Muñoz] I don’t know if Wendell wants to offer hope, but maybe if he does then I’ll go first, because I don’t have anything to offer. I wonder whether. . . Yes, that seems to be true. And yet it seems important to point at the loss that is then incumbent upon us: that maybe, in all the dialogue around interchange and independence from any particular platform, we’ve become too confident that we are saving what we think we’re saving. A clear acknowledgment of the loss of the things we don’t yet know how to preserve is an interesting thing to consider, and perhaps even to mark out in our models that which will be lost under certain strategies of preservation.

[Piez] That counts as hope. . .

[Wendell Piez] I would accept that, and I also think that, to me, among the many wonderful things in what Trevor just presented to us is, for me personally, a reminder of what I was thinking about ten years ago when I wrote that article. And I’ve said it from the beginning: to talk about these things within a tradition of rhetorical theory and poetic theory, and to understand that when we create these things, we’re actually engaged in a kind of rhetoric and a kind of performance. With respect to that, with respect to the problems that Elena and Michael and you are pointing to with regard to documenting the data modeling: that is actually something that has many parts but also has a whole to it. In some respects, the risk of documenting the data modeling (not that that’s a bad thing) is that we move the gravity of our work away from the production itself into the documentation about the production, which is always a risk with the humanities, I think, but it’s something that’s especially kind of ironic and interesting when we’re doing such fabulous things with the texts that we’re working with and the encoding uses of those texts and the performances we drive on top of those encodings. Which also speak to what we’re trying to say, which I think is one of the most important things that you were reminding us of.

This is not something where we’re trying to have the last word, even though we always are—there’s always this idea that “I’m going to say the thing and then it will have been said and everyone will then refer to me.” The reality is that they’re going to talk back. The interesting thing about the transition we’re seeing now in terms of the evolution of the humanities as a whole is that we’re moving away from that notion that any individual scholar is going to have the last final word, or that that’s the proper aspiration of the scholar. Even though we’re in a sense going to privately entertain ourselves with that idea, we also don’t take it very seriously, and we understand that part of the purpose of our trying to have the last word is so that other people can speak back to that. I think that documenting the data model is all very good because it speaks to other specialists, but the data model itself encompasses the work, and the work is something that speaks to a broader audience potentially as well. Let’s come back to that.

[Syd Bauman] I want to admit up front that I don’t know how to model what I’m about to say, because it’s indirect speech. Michael asserted that we have difficulty documenting processes, and I’m wondering why Michael says that. I don’t want to say “What do you mean we, kemosabe?” Because I might have trouble documenting processes. But Michael, I’ve read your documentation of processes and it’s pretty good.

[Sperberg-McQueen] I don’t mean to be mysterious or mystical. I just mean the difference between proving the equivalence of two sets of declarative statements, which is a difficult but soluble problem, and proving the equivalence of two Turing machines, which is a difficult and insoluble problem. It’s not that documentation of processes is a bad thing. It’s obviously a good thing. I feel guilty because I’m horrified by the thought that we’re going to say the way we process this is part of the meaning of our data. But whenever I look at strange data, I want to say “Yeah, can you show me how you’re using this? Because I don’t understand what you’re doing here.” And as soon as they show me the website, it’s like “Yeah, yeah, yeah, now that helped.” So it’s very helpful, but if you give me a Fortran executable or a COBOL executable, this is not a fantastic thing. Twelve years ago, thirteen years ago, there were a lot of people trying to make new versions of existing programs and prove that they would behave exactly the same way, except that they wouldn’t die on January 1st, 2000. And that’s a hard problem. Who knows why? A gap in our logic, a gap in our heads. Talking about the way a process is executed is hard.

[Muñoz] Two things going on there. There’s the concern about being able to assert an equivalence between two processes, which we say is hard and possibly not doable. That seems to be a concern about preservation: that this thing hasn’t been tampered with, it’s still the same thing that someone put in. Then there’s the way that this bears on participation in the humanities, where we’re not really interested in the equivalence of one process to the other but the response of one process to the other. The way that those performances respond to each other. Those are two threads that are going on there.

[Flanders] I’m not sure how to frame this question. I want to get back to what Wendell [Piez] was saying because I think there’s an interesting mapping onto what we think about as different kinds of scholarship or different layers of scholarship. I was very struck by your invoking the performative and rhetorical aspects, because one of the things that struck me in our conversations yesterday was a kind of Matthew Arnold-style distinction, possibly, between the Hellenistic and the Hebraistic, the thinking and the doing, in a kind of crude way. I’ve always been aware of myself as being the kind of person who ought to be a scholar because I’m so bad at doing things in real time. So it’s good that I get to write them down. That’s why events like this always make me nervous because, here it is! It’s all happening in real time.

But I wonder whether, when you say the data is the work and the documenting of the data is in effect some part of another discourse, a meta-discourse that’s aimed at another audience, that piques my interest, because there is this ongoing question about where the scholarship is in digital scholarship. Is it in the creation of the data? Is it in the annotation of the data? Is it in writing out the data? Is it in documenting the data? I feel fond of the idea that in digital humanities, documenting the data is actually the distinctively scholarly aspect, or one distinctively scholarly marker. Not that doing the data isn’t also scholarship, but I wouldn’t want to sequester off the documentation as somehow too expensive an indulgence or too Hellenistic, whatever that means.

[Piez] Right, I’m perfectly with you on that, but at the same time, the proof of the pudding is that it’s delicious. We’re failing to see that. We are failing to see that in humanities stuff, there are projects that are manifestly interesting to people who don’t have their heads in the encoding.

[Flanders] The hack speaks louder than the yack.

[Piez] I think that’s happening for two reasons. Number one, the infrastructure is just strong enough, after all of this work you put into it. It’s strong enough to actually support activity which does speak. We are able to communicate to the larger world even in that humanistic way. It’s interesting that it’s taken us so long, but nevertheless, it is interesting. To the extent that there’s an emphasis within the digital humanities on documenting processes and figuring out how we’re going about doing this, and developing standards to allow others to replicate and build on this information, that’s all really very important, because that’s how this work is sustainable. There has been a way in which, I think, because we have been so long in delivering results that speak for themselves, there’s been this counter-emphasis within the community on doing this as if we’re eating biscuits because we’re hungry. And that’s not a bad thing. On the contrary, I think all of that work, which we do much cheaper, is profoundly important because we’re going to learn how to do our own work better. And yet at the same time, you can’t forget that there is something more meaningful.

[Muñoz] Well, and not to get too lost in the metaphors, but also to veer away from the food, when you’re giving your Lego analogy, I don’t know if anyone else had this problem but in my Lego sets there were always ones that got stuck together and they were kind of clumping, you could never really get them apart again. And yet those became interesting pieces for future constructions. I guess I’m arguing a little bit for the sticky Legos.

[Male audience member] I’m still stuck on this moment from Trevor’s presentation. I kept wondering what they’re thinking, why are they doing this? And I finally decided that what they’re trying to do is to enable a conversation among scholars about their choices (that’s a paraphrase, but I heard some sentence like that in your paper). And I thought, “Okay, so they want to make some choices here, things like ‘this hand here is Percy Shelley’s’ or ‘this cross-out must have been Percy’s. . .’”

[Muñoz] The more interesting examples are: “This was written first and then crossed out.” It’s more about stages, is the interesting thing.

[Same male audience member] As they go along, they’re paring down things, like “only certain things are relevant” or “only certain things in the ODD files, or the TEI, or something, are relevant.” Because you mark it off with this in mind. Is this…? [trails off]

[Muñoz] Right, but I mean, I guess I don’t think I’m agreeing in the sort of universalizing sense you’re headed towards, but…

[Same male audience member] Where I’m headed is, it seems to me that we would at some point have to say why data modeling is the best access to that, the best way to enable that conversation. Because in this case, we’re not talking about enabling future processing. We’re talking about . . . They decide what they’re doing and what it’s about. Someone really has to convince me that ODD customization is the way to enable that conversation, as opposed to saying, “You know, there are these six cruxes in this text where it’s really strange and interesting. We’re going to list them and talk about them in a way where they can talk back.” Is that really vastly inferior or totally flawed, as opposed to saying, no, it’s best done with a TEI file, because that’s an amazing thing to do?

[Muñoz] I don’t think I was trying to make that argument that the TEI encoding model is a better way to do it than the way. . .

[Same male audience member] Well, this is sort of what we’re talking about in the symposium. We want to make the claim that there are certain kinds of intellectual affordances to data modeling. Here’s a case where I didn’t see any kind of actual affordances. I really don’t. I see other kinds.

[Muñoz] Yes, and at the moment, I would say that it’s an open question and we have to get further along in the project before we can really argue back and convince you. I have some confidence.

[Same audience member] Well, I’m not criticizing you.

[Muñoz] No, no. I guess I want to argue that, whether it’s a better way or a less good way, there is an interesting way in which data modeling is another way of approaching this text and communicating our choices about it that opens up further scholarship in a different and perhaps useful and interesting way than just writing about it. In other words, this deletion business…

[Same audience member] But I’m not hearing it. Like how?

[Muñoz] Partly because the enforcement of the system and the way that it interplays with…

[Flanders] Because it puts that set of data . . .

[Different audience member] Can I intrude on the turf here? It’s not necessarily better to have the prose that says that these six cruxes are really interesting. But you know, I’ve read an awful lot of articles that say “You have to pay attention to the cruxes” but don’t commit themselves by saying what they think those are. Whereas, in an encoded text, if you have a parse element, you have a choice. You’ve gotta say “I call this a crux, and this I don’t.” It’s an inability to waffle, on the other side. Some formalisms make it harder for us to waffle. One advantage of those formalisms is that we can use them to keep ourselves accountable. Now, whether that’s exactly where Steve was going…

[Muñoz] That’s along the lines that we’re going, but I think we can acknowledge Steve’s point that yes, we have to prove it.

[Male audience member] I would like to take a slightly different tack. I find it fascinating to acknowledge that the performative aspect of what we do when we deal with data should actually be reflected in the data models. But then of course there comes this question of cost-benefit analysis. How much effort do you want to put into documenting the process and reflecting the process, versus dealing with the primary source object that actually initiated the entire chain of processes, or the network of processes?

And I think we’re obviously not the first to encounter that problem. Just look at the history of literary studies over the past fifty years, where we progressed from a positivist to a structuralist to a poststructuralist, and eventually into a deconstructionist paradigm. Once I’m in the deconstructionist DH paradigm, I don’t give a damn about the original source objects and programs. I’m so obsessed with myself, what I’m doing, I’m writing this. And hey, that might be what really does the trick for me. Whereas somebody else, who might be labeled as an absolute conservative and traditionalist, would say “Listen, that’s not me. I’m actually dealing with a text or a monument or what have you.” I think it’s a question of philosophical premise that we touch upon here. We have to make a decision, each and every one of us, how far along the line we want to go.

The second question, of course, perhaps more problematic, is by what criteria we’re making this cost-benefit analysis. I mean, for example, what you just said to Michael: “If it helps to stop the waffling, I for one am all for it.” But that’s just me. Somebody else might say “Hey, I love waffles.” I think perhaps we have to shift the focus of attention away from just dealing in assumptions and likes and dislikes, and work towards establishing criteria that will enable us to determine how far along this route we want to go and where we say “It’s now becoming not productive. We’re actually circling around ourselves and there’s no progress.”

[Syd] Where’s my tail? Where’s my tail? Where’s my tail?


[Another male audience member] I just wanted to point out that maybe we are not really talking about data modeling at all here. If you have a very traditional printed edition and you say, “I’m not so much interested in what’s really on the page but in what’s happening behind it,” many people would say, “Yeah, but that’s not really an edition.” Maybe we are talking about different sets of standards, and they don’t really relate to the question of data modeling but to what you want to achieve with what you’re doing. Is it to comment on something, or is it an edition that allows others to comment on the text? I think you started off with an edition, as you described it, and then along the way you changed your goal.

[Audience member] Can’t it be both though?

[Audience member] Actually, I think then you’re mixing up things that you shouldn’t, because it’s much easier to point to things if you say, “This is the edition part and this is my commentary part.” Then someone can say, “Okay, I totally disagree with your comments, but thanks for the edition.”

[Flanders] I think also there’s a question of scale at stake here. Trevor’s example shows a single manuscript with six cruxes, or whatever the number is, but say it’s a small number. But if you had been proposing to us, for example, the Proust edition with, God help us, hundreds of thousands of revision sites and so forth, then the question of “What kinds of things get revised here?” becomes the kind of question that you really need the markup for, and that you also potentially need to debate the markup for. I think that much larger case might be an interesting use-case for addressing the question that Steve raised. When someone comes to me as a TEI person and says, “I want to encode this poem so that I can study this poem,” I say, “Well, why don’t you just study the poem? You don’t need TEI for that.” But if they come to me with a hundred thousand poems, it seems like there’s another kind of problem.
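For readers unfamiliar with what a “revision site” looks like in markup, a minimal TEI sketch of a single substitution might run as follows (the text, element choices, and attribute values here are illustrative, not drawn from the talk):

```xml
<!-- One hypothetical revision site: "sad" struck out and "melancholy" added above the line -->
<l>A <subst>
    <del rend="strikethrough">sad</del>
    <add place="above">melancholy</add>
  </subst> reflection on the season.</l>
```

Multiplied across hundreds of thousands of such sites, choices like these elements and attributes become exactly the kind of thing that has to be debated rather than decided ad hoc.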

[Stephen Ramsay] But then you’ve moved into… then you’ve already explicitly made a connection to the processes. This is what’s excruciating for me. Why did that poem move over from “well, we’re interested in these questions” to “no, we need a machine to deal with that”? Where, basically, are we crossing that kind of line? What I disagree with, where Trevor’s waving his hands, where we’re all waving our hands, including me, is that it feels like a big question mark in our conversation. I don’t think in this context we want to treat data modeling and its relation to intellectual criticism and scholarly argument, all of that, as a kind of assumption. It feels like it breaks out at this uncomfortable moment.

[Susan Schreibman] As I was listening to this conversation I was thinking about what Michael said when he said […] I see what you do, and it seems that what we’re talking about is a way to deal with something else, that modeling is […] that we believe there is something else to make visible, that visualization, those other big model processes that allow that modeling representation […] in a way that is not native, in a way that displays the preset model, TEI in that case, I think, […] that deletions are important […] not having that disallows you from understanding, which in the meantime actually […]

[Elena Pierazzo] “Data model” is a very generic term in that sense. Which goes to the question: which of them do you actually put online? Which is the edition? Is it just the XML we have to document, or is it everything else we have to document from the beginning stage? I’ve been saying quite a lot lately that in my opinion the edition is not the XML. It’s not the HTML, it’s not the CSS, it’s not the XSLT: it’s all of them together. You cannot take one without looking at the others, because the XML alone will not convey the way you use the data. It’s very complex. Even to say that you can separate things is very difficult. You cannot say, “I just take this paper, that is where the scholarship is; the final website is not the scholarship,” when the scholarship is only one part of it. These are very complex objects we need to take into consideration. And even our comments, our understanding of it, need to be part of it.

So I’m not sure you can separate them as you were saying: the level of the edition, the level of the editor thinking about the edition, and the doing of the edition. If you go back to the printed book, the consideration of the text and the presentation of the evidence were always part of the idea of editing. It’s just that print gave us a standardized way of presenting these data models and this evidence. Otherwise we are inventing a new one, because our tools are different.

[Male audience member, to Elena] We’re talking about different things here. We’re talking about the interpretation and the edition. It would be another thing to say “We have a different aspect of the edition.”

[Muñoz] Well, I was agreeing with that point while resisting this introduction of the edition as a category, because I don’t know that it was helping. But I think there’s a connection between some of the comments Steve was making, your comment about having measures of quality, or points at which we consider there’s no further dividend to be gained, and Laura’s comment that we’re modeling in hopes that some visualization or analysis will appear that will have made our modeling worthwhile. Rather than continuing to model forward and hoping that it appears, we should think hard about the interaction between our modeling and these kinds of points or standards, reformulated in this new digital space, for what’s good enough and where there’s a loss of traction.

[Wendell Piez] I would just very briefly like to interrupt this circle here. Because I don’t think Fotis is right, but I don’t think you’re wrong either. I think what you’re describing is a particular approach to the discipline of editing, and there are many possible disciplines of editing which can be defended on their own terms which may differ in terms of their approach, but which, on their own terms, are perfectly legitimate, and may also have a disciplinary basis. So, I don’t think we’re really disagreeing on a fundamental level—Elena, you and I, and Trevor—with respect to differentiating between the presentation, editing, commenting, and so on and so forth. But I do think that that kind of distinction is something that is realized in the instance, rather than something we can describe for all time, all editing, up front.

[Douglas Knox] I raised my hand because I wanted to agree and disagree with Fotis [Jannidis]. I did sense from your presentation, Trevor, that you were supposed to be putting your digital humanities hat on, but your literary archives hat came out when you talked about preserving all these contexts. But to an earlier point, others were right in saying that even if we don’t preserve the process, we should understand it as part of the history, as giving context to lots of stuff that is not necessarily captured by it but is enabled by it. I think you could trace this back to book history and say that your commentary may be different from the edition, and you probably want to see it that way, but it assumes a publishing context, it assumes a cost culture, the building of audiences, printing, new media, and that sort of thing, and it’s all of those constraints that shape the rhetorical act of expression.

[Female audience member] Going along with the idea of markup as a rhetorical act, I kept thinking of speech act theory when you were making that argument: an illocutionary act, from J.L. Austin’s How to Do Things with Words, is when in saying something you do something. If you simply list what the debates are, it seems a little more noncommittal. It seems like you have to put a stake in the ground once you’ve actually had to represent it. It feels like more than representation, something more like exegesis. It feels like you’re raising the stakes by actually representing the composition history in some way in your markup. I know you were not thrilled with the example of an edition, but I think of A. E. Housman editing someone like Ovid, where he makes a conjecture and inserts it directly into the text instead of confining it to the apparatus. There’s something kind of dangerous about that that I find appealing.
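The contrast drawn here, between confining a conjecture to the apparatus and inserting it Housman-style into the text, can be made concrete with TEI’s critical-apparatus elements. This is an illustrative sketch only; the witness and editor identifiers and the reading text are invented:

```xml
<!-- Conjecture confined to the apparatus: the transmitted text remains the lemma -->
<app>
  <lem wit="#MS-A">transmitted reading</lem>
  <rdg resp="#editor" type="conjecture">conjectured reading</rdg>
</app>

<!-- Housman-style: the conjecture is promoted into the text as the lemma -->
<app>
  <lem resp="#editor" type="conjecture">conjectured reading</lem>
  <rdg wit="#MS-A">transmitted reading</rdg>
</app>
```

The vocabulary is identical in both encodings; what changes is which reading the edition asserts as its text, which is precisely the “stake in the ground.”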

[Desmond Schmidt] This point has been made, but when you started, you said that you worked on a formal manuscript as opposed to looking at the […] and representing that […] was that still original material that you saw from the document? In fact, what you really saw was the opposite: the user and the user’s needs […] What kind of information do they want out of the text? And then putting that towards the data modeling in deciding what to encode. It came out of the conversation that, in writing software, we have to think in engineering terms about satisfying the user first, rather than thinking very personally about the architecture first and only then asking, there are users we must present to—what about them? They’re actually the first people we should think about.

[Muñoz] Well, I don’t know that a strong either/or answer there satisfies me very well. I think, as Doug was pointing out, I have trouble keeping one hat on versus the other, the DH person versus the librarian. In Carol’s [Palmer] elaboration of this idea in her paper, there’s a much more developed argument about user communities and user needs: how they interact with the ways that scientists, in that case, use the data as they generate it for their own purposes, and how that in turn interacts with reuse by others later. There’s probably a fuzzy area in the middle between those two approaches that needs re-inventing, re-examining, or at the very least further work.
