Theoretical Perspectives III (March 16):
Fotis Jannidis, “Digital Literary History and its Discontent” (video)
[Fotis Jannidis] What I want to talk about is literary history. I’ll skip the first few aspects here. Literary history has been coming into trouble.
About twenty years ago, there were two challenges. One of them came from a poststructuralist point of view, saying that all history is constructed, especially under the aspect of power. [Jean-François] Lyotard’s concept of the grand narrative became famous and put an end to many kinds of history. Another form of criticism came from Hayden White, who said that historical works are a kind of fiction. People are creating fictions by writing fictional narratives about historical events.
There’s been a long debate since then, but in a way, these two attacks destroyed a lot of self-understanding. People got irritated, and there have been some resolutions for this.
In the praxis of literary history, we have two experiments, both done by Americans, actually: one about French literature and another about German literature. The one about German literature was published in 2005. They tried to get rid of all of these grand narratives and they say we’re just talking about little events. So you have a book with, I think, a few hundred little chapters, and each chapter is closely related to an event. This event can be the publication of a book, but often it’s an historical event. Then we have very small articles about a text, but there’s nothing combining them to a history, or whatever.
And there’s another approach to this, saying: “Was there a problem? Sorry, I don’t have the time for this kind of theoretical stuff. I have to write history.” Very often this kind of approach undercuts the theoretical discussions a bit. So, my approach was to say that maybe we can tackle the theoretical problems, but without giving up. . .
So, we’ve had multiple discussions here on constructivisms and social constructions, and I think these are related through this point. The criticism by Hayden White, which I think is actually the more interesting criticism, is . . . if you look at it closely, you see that he says that the source of patterns in the historical representation has to be the historian, because the historian has no direct access to the past. A pattern can be something like the rise, the fall, the decline, whatever, any kind of pattern. So, if there’s a historical representation, a pattern, and he points to the pattern, the historian must be the source of it. It implies, in a way, if you look closer at his writings, that the events and the information behind the events are in a way randomly distributed, so randomly that any kind of pattern emerging from this has to be imposed on it by the people who’re describing what happened there.
Basically, there are two problems with this. First, this is an ontology in itself. It says something about reality, which is quite a strong assumption, actually, and one that I, for example, don’t share: that things are equally or randomly distributed. Another problem is that he was writing about historical events, but as literary historians, we’re talking about texts, and texts are there. It’s nothing hidden in the past, unapproachable for me. The text is very often here on my desk, and I can point to it, read it, and share it with others. So, much of the criticism directed at literary history by people like [David] Wellbery [Slide: A New History of German Literature, 2005] is, in my eyes, misdirected, but it’s a challenge to show whether we can base our assumption that there are patterns in the material on new, maybe more empirically-based sources.
So, that’s the frame I was talking about, and I’m going to talk about two specific problems. The first is something like influence–how to model influence. I’m taking a specific case of influence.
We have a novel by Rousseau, La Nouvelle Héloïse, and we have a novel by Goethe, Die Leiden des jungen Werthers. We have an expression describing this relationship. The source you’re reading now is the book I’m quoting from (it’s my bad translation, and I’m cutting a lot of things short); basically, it’s one of the largest accounts of German literary history. Rousseau’s book was important for sensibility. That’s the argument you find in the book.
Then, you find a description of the reception of the novel and the content of the novel.
Then, [this is] an important sentence: “In this intellectual climate, Die Leiden des jungen Werthers originated.” That’s bad, isn’t it? It’s obviously so sloppy. But, to model it. . . this is a real challenge. That’s what I first thought. If you start to think about it, you say, that’s interesting actually. They’re talking about influence, but influence is a bad metaphor in itself, actually, because there is nothing pouring or flowing into something. The whole concept of influence is rather bad, actually, because the concept imposes. . . the agent in influence here is Rousseau, but actually, Rousseau was in France at that time, and not in Frankfurt writing the novel. So we have to switch things around and say, “we’re not talking about influence but selection.” It’s a proposal by a German sociologist. So, Goethe selected this book as being important for him. But selection, that’s rather too intentional in the narrow meaning of intentional, because he had no say in choosing this. So, obviously this is a very (and I’m not going down [this road], I just wanted to point to this) complex relationship.
“Climate,” so if you look at this closely, it’s a very very good expression of this relationship. It’s like a metaphor, and [it’s] showing that there is something there, a complex relationship, and we could unpack it — “we” being the historian of literary history. But here, he doesn’t have the time or the place to do this. So, he’s just pointing to it, and replacing it with something that’s more clear really loses all of the information which is explicitly not said at the moment.
So, modeling has the challenge here to be explicitly vague. I find this a specific problem because it’s much easier to say “I do know what I’m modeling.” But saying “I know that there’s a complex relationship and I can’t model it at the moment, I can just point in the right direction,” that’s what language is doing all the time.
We don’t have . . . or I can’t see . . . a way to model this at the moment. Replacing it with something more specific means that I quadruple or quintuple the time I have to spend just with this relationship, to the point that I will never finish a literary history at all. So, that’s my first question: what do you do about that?
Second question: Corpus Studies. The wish to base your assertions about literary history on something more empirical: nowadays, you go to corpus studies, and you do something like this [Slide that reads: realismus_einfach Bootstrap Consensus Tree]. This is the result of an R macro. I’m very grateful, I must say, to Eder and Rybicki (Maciej Eder and Jan Rybicki, Stylometry with R, 2011). Many of you have probably seen their presentations at the last Digital Humanities [conference]. They wrote a very nice macro that allows you to put in a group of texts, then say “please calculate,” using some very easy measurements, one of them based on John Burrows’ Delta (“Delta: A Measure of Stylistic Difference and a Guide to Likely Authorship,” 2002).
[One can] use the distance or similarity between them: stylistic similarity based on the most frequent words. In these first few slides, the same color indicates the same author. The first thing you notice is that it’s working rather perfectly. It’s a rather simple measurement, and it’s working perfectly. For humans, who say everything’s so complex, just counting words can’t be doing it right, but it does. You see the red text at a little distance from the others.
That’s the first novel by [Theodor] Fontane, and he wrote the rest forty years later, so it makes sense that there’s a distance here.
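As an illustration that is not part of the talk, the kind of measurement described here can be sketched in a few lines. The sketch below follows the general idea of Burrows’ Delta: take the most frequent words of the whole corpus, turn each text’s relative frequencies into z-scores, and average the absolute differences between two texts. The function names, the toy token lists, and the use of the population standard deviation are assumptions made for the example; this is not the Eder and Rybicki script.

```python
from collections import Counter
from statistics import mean, pstdev


def relative_frequencies(tokens):
    """Map each word to its share of all tokens in one text."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}


def most_frequent_words(texts, n):
    """Return the n most frequent words over the whole corpus."""
    corpus_counts = Counter()
    for tokens in texts.values():
        corpus_counts.update(tokens)
    return [word for word, _ in corpus_counts.most_common(n)]


def z_scores(texts, vocabulary):
    """Standardize each word's relative frequency across all texts."""
    freqs = {name: relative_frequencies(tokens) for name, tokens in texts.items()}
    scores = {name: {} for name in texts}
    for word in vocabulary:
        values = [freqs[name].get(word, 0.0) for name in texts]
        mu = mean(values)
        sigma = pstdev(values)  # assumption: population standard deviation
        for name in texts:
            value = freqs[name].get(word, 0.0)
            scores[name][word] = 0.0 if sigma == 0 else (value - mu) / sigma
    return scores


def burrows_delta(scores, a, b):
    """Mean absolute difference of z-scores between texts a and b."""
    return mean(abs(scores[a][w] - scores[b][w]) for w in scores[a])


# Toy data (hypothetical): A1 and A2 share the same word proportions, B does not.
texts = {
    "A1": "the the the and and of".split(),
    "A2": "the the the and and of the the the and and of".split(),
    "B": "of of of of and the".split(),
}
vocabulary = most_frequent_words(texts, 3)
scores = z_scores(texts, vocabulary)

for pair in [("A1", "A2"), ("A1", "B")]:
    print(pair, round(burrows_delta(scores, *pair), 3))
```

A smaller Delta means the two texts use the most frequent words in more similar proportions. A tree like the one on the slide would be built from the full table of such pairwise distances, which this sketch does not attempt.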
[Slide: Bootstrap Consensus Tree.] Then, I started to throw books of a specific genre together with other books into this. You don’t have to make any sense of this image, but I want to switch to this. It’s the same, but now in violet you have groups that have erotic narratives as their content. At this moment, this tool finds genre. Not only is the “author” a group, but the genre is a group. The texts belonging to one genre are grouped. This is rather amazing. And, you can do the same thing with gender. Here, you’ve got a lot of texts by different authors. Here you have them grouped by the same diagram, just using different colors to express male and female authors just based on most frequent words.
Okay, that’s what’s out there. And, if you know this kind of tool, the problem now for me is: how do these different things relate to each other? What is the data model here? The data model in a more narrow sense is obviously, first, a bag of words, and secondly, the most frequent words. So, the text is conceptualized as a bag of words, and then, as an indicator of stylistic similarity, we use the most frequent words. This is related to a conception or a concept like gender or an erotic novel. And now, this comes back to a discussion we had all the time here. What are we talking about here? Is gender a data model? We often in this discussion . . . we tend to say that every kind of conceptualization is a ‘model’; we have at least three models here, all somehow interrelated.
So, maybe it’s more fruitful (but [it’s] just a proposal I would like to discuss with you) to refer to data models only for things like “bag of words” and “most frequent words,” maybe even only for “bag of words,” because “most frequent words” isn’t even a data model at that moment: that’s in the algorithm, basically. The other things are intellectual models. Then, we have a clear distinction. I’ve learned in cooperation with Julia [Flanders] that it’s very German to have this preference for very clear distinctions.
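As an illustration that is not part of the talk, the distinction proposed here can be made concrete. In the minimal sketch below, the bag of words is a stored structure (word to count, with word order discarded), while “most frequent words” is a selection computed from that structure by a function. The function names, the tokenization rule, and the sample sentence are assumptions made for the example.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Data model: the text reduced to word -> count; word order is discarded."""
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tokens)


def most_frequent_words(bag, n):
    """Algorithmic step: a selection computed from the bag, not stored in it."""
    return [word for word, _ in bag.most_common(n)]


sample = "The text is here on my desk, and the text can be shared."
bag = bag_of_words(sample)
top = most_frequent_words(bag, 2)

# Two different word orders yield the same bag: order is not part of the model.
same_bag = bag_of_words("desk on the text") == bag_of_words("the text on desk")

print(bag)
print(top)
print(same_bag)
```

Concepts such as gender or genre appear nowhere in this structure. They enter only when someone colors or labels the resulting groups, which is the sense in which the talk calls them intellectual models.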
[Julia Flanders] And I admire you for it!
[Jannidis] Thank you. These are the two questions I wanted to pose to you. Thank you.
[Maximilian Schich] You raised a couple of very interesting things. The first thing is that, in the twenty-first century, we’ve learned that there’s no random distribution in anything, so the normal distribution is absolutely not normal. That’s obviously also true for historical events, because if people fight for a border, for example, rivers and mountains are all over the place, so it won’t happen in random places, for example. That’s the problem that the more applied kind of research on complex networks has with actor-network theory (ANT) (Bruno Latour, Reassembling the Social, 2005) or other poststructuralist theoreticians, because they didn’t consider the facts, which we know, right? You can measure historical data today and see how distributions are also not random. It’s not random to discuss now […] for example.
The other thing is this kind of question of the model emerging from the data. There is a lot of modeling in the background of what’s done there, because most of what we convert does refer to something that’s probably the most we convert in the books as related to all the words in a corpus, which corpus scientists call TF-IDF (term frequency, inverse document frequency). There’s a lot of modeling; you can’t do that unless you have some concept about your corpus. Without that, you cannot really start.
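As an illustration that is not part of the discussion, one common textbook form of TF-IDF can be sketched as follows: a word’s weight in a text is its relative frequency in that text multiplied by the logarithm of the number of texts divided by the number of texts containing the word. The function name, the toy corpus, and this particular weighting variant are assumptions; many other variants exist.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """corpus: dict of text name -> list of tokens.

    Returns text name -> {word: weight}, using relative term frequency
    times log(number of texts / number of texts containing the word).
    """
    n_docs = len(corpus)
    doc_freq = Counter()
    for tokens in corpus.values():
        doc_freq.update(set(tokens))  # count each word once per text

    weights = {}
    for name, tokens in corpus.items():
        counts = Counter(tokens)
        weights[name] = {
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
    return weights


# Toy corpus (hypothetical token lists, not real excerpts).
corpus = {
    "text_1": "the heart and the storm".split(),
    "text_2": "the letter and the lake".split(),
}
weights = tf_idf(corpus)
print(weights["text_1"])
```

The point about prior modeling shows up directly in the code: the weights depend on which texts were put into `corpus` in the first place, so a decision about the corpus precedes any measurement.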
The other thing is influence, which has been a huge discussion in art history, for example, about the direction of influence. There’s an excursus against influence by Michael Baxandall which points out that there are way more words which describe the selection direction and not the influence direction. You’re right. It could go both ways. A lot of data disciplines have actually learned, and this makes much more sense, to talk about editing selections. So, if you think about scientometrics, people stopped thinking about bibliographic coupling, and now they talk more about co-citations, because what you have in terms of art history is called the T.S. Eliot effect. Once we glance around at these unfinished sculptures, our notion of Michelangelo changes forever. So, co-citation or selection makes the past a dynamic kind of culture, but if you look at influence and the static past, it’s only changing the future. That’s somehow honest, right, because every one of us has a different concept of the past, so that the past is really there. That’s the thing I think you have to take into account. Then if you bridge there from this last thing of these consensus trees: that’s very similar to another domain in which we have similarity in dependence, which is biology. They construct these trees, and they point out in textbooks that we cannot really find the rules, we don’t know where the origin is for the shark, for example. It’s problematic. Someone goes “it could be somewhere else in this tree.” There is no top of the hierarchy.
[Jannidis] Actually, there is a 0.0 (zero point zero) in the … [doesn’t finish sentence, looking to slide.] …this is based on algorithms invented or developed by people in bioinformatics. There’s one point, to my layman’s understanding, where you can say that everything which is distanced from this point is the same distance from one another. But, as soon as you unroll one of the branches, you can measure that distance from one to another and say, “Okay, I understand, this is visible. This book is here from another.”
Not sure I understood your idea of model, but . . . what I wanted to point out is that modeling would be the generic term that also covers data modeling, and data modeling would be a very specific activity, then. It would allow for the things that are not really . . . they use some kind of classification, some kind of understanding and putting into concepts, but they express it differently and it doesn’t have to have the same requirement as data modeling.
[Allen Renear] So, with respect to the problem of “how do we make progress on modeling when we know we can’t get it exactly right and now it’s time to move on?” Using the example from my presentation, I said that it seemed that according to our analysis FRBR entities should really be roles. But, when you’re doing that, sort of on general principles, it seems like we were kind of anticipating or accommodating the possibility that, say, the text of Moby Dick might have realized some other work. How much time should one spend accommodating that possibility? After all, it exists only with another possibility. Maybe it’s distant future […] It did seem to us that there was a class of modeling improvements that were probably not worth pursuing in actual situations, just like denormalized relational databases are often much more effective for querying, or something like that, so we conceded that denormalized ontologies, which are the ones that weren’t quite right, could often be better and could often be preferred to ones that were exactly right. So one way to proceed is to say “we’re not going to pursue this. We’re not going to figure out exactly how to do it right. We have something that works, we’re confident that it will work […]” Now, that particular case is one where you’re not saying something false. In other cases, it seems like if you don’t solve the problem, you might be saying something false. The ship of Theseus, you just reconcile yourself to the fact that, in principle, there is a removal of part that makes it a different ship, and if it’s a small part it doesn’t. And also assume that you won’t be iterating transitivity of identity a thousand times on your ships, vis a vis your ships, or your texts. And cross your fingers and move on. It’s not the shuttle.
[Jannidis] Maybe I misunderstood this, but…wouldn’t that mean that you have at least one position for how this relationship is modeled? For example, between two books, you can say “I can say at least this about them.” I’m finding that people in the humanities very often tend to use metaphors or other vague descriptions, which are deliberately vague, because that’s their most effective way of conveying so much, but not more information.
[Allen Renear] I would distinguish the issue of metaphor, or of metaphorical or idiomatic usage, from vagueness. I think vagueness is a very precisely identified issue. It’s very important to have vague terms for ordinary communication. And it’s hard to accommodate them in modeling; but vagueness is probably. . . allowing yourself vagueness will probably not have the same kinds of influence on results as trying to represent metaphorical usage in data description languages that represent metaphor as if it were […] There I think you’ll have objects that don’t exist in your ontology.
[Jannidis] Yeah, if you take them literally. I totally agree. You have to translate what the expression refers to. You have to translate it in a way, but then, the question is: how precise can this translation be without losing the intention of “to be read”?
[Speaker 2] I’m not sure exactly what the question is. Precisely. [Laughter from the room.]
[Jannidis] I think you’d like to keep these apart: “vague,” “metaphor.” My point would be to say that metaphors, in the case I’m analyzing, are a way to refer to something by keeping it vague. The relationship we analyze is deliberately vague. You just put away [excluded], actually, that which I would like to include into my description of what these kinds of sentences do in the humanities.
Actually, I wanted to ponder them much ‘better’ [gestures quotations] than when we think of them, because they are obviously not working as clear-cut references as we think they should.
[Speaker] I guess I’ll just say that vagueness is something that no one’s ever managed to free from the paradox and puzzle. It’s essential to communication, obviously. If you have a modeling project that depends upon eliminating it, you’re in trouble. [Laughter in the room again].
[Wendell Piez] I think everything that Fotis says actually speaks to that. I think we’re actually experiencing something really tremendously important here, because what I think this relates to is actually our conversations about preformal versus formal descriptions of our objects or the improvisational stages versus the more formalized and refined stages later on.
You don’t ever want to end the previous stage even while you move into the next stage. The paradox is in the relationship between the stages, because usually when user scholars step back and say “I’m going to describe this as a climate,” the beauty of a metaphor is that it says one thing very specifically, precisely, then it implies very many other things, completely unstated or in the air.
So, it’s not that metaphor is vague. It’s that a metaphor is a way of saying something without even addressing other things, by virtue of applying a different language, a different perspective on the problem, right? So, this isn’t to say that we’re saying something vague; it’s to say that we’re saying something, but we’re deliberately foregoing the opportunity to spend ten years on the question of what we mean by x, y, z, right, and…
[Jannidis] Yes, excuse me, yes. It depends on what we mean when we say something or when we don’t say something vague. A climate expression isn’t vague. All the possible inferences you can make based on this expression, these are varying. This climate could be invisible.
[Piez] Yes. Because that’s unspecified, it’s left between the author and the reader to agree implicitly on the appropriateness of the metaphor without any explicit or spelled out indication on what those implications are taken to mean. In a sense, it’s a determination of the resolution of the image you want to apply. It’s like: I’m going to go all the way down here and pixelate this intricately, and this other thing, I can see the shape of this without ever trying to get down to the details. And I can leave it to my reader to come along with me on this. It relates both to this question about what we choose not to formalize, or what we decide we can’t formalize, and to this question of how we relate to our audience and the nature of performance. In other words, there’s a process that a scholar undergoes which involves selection not only of what to study but how to approach it. Where to emphasize, what to verify.
[Stephen Ramsay] I think terms like gender and influence and authorship are not, and this is just going to restate what Wendell [Piez] said, they’re not vaguely defined or ill-defined. They’re deliberately undefined. They’re deliberately put forth with ever-tentative definitions, and the reason for that is because that is the contested site of the discourse. If you’re going to solve for x where x is gender, you’re, as far as I’m concerned with the project, trying to destroy literary studies as a topic. I’m not joking. The only thing I see in terms of a data model in this is a bunch of key/value pairs that tie words and frequencies [together]. That’s the only data model I see here. Everything else is a function or a process of something else, and at the end of that thing are these words like gender and influence.
This is absolutely mystifying to me. I don’t know how we start with word counts and end with gender. But it may be that, we are hoping that, we feel like there should be some relationship, given what we started with, there should be some clarity at the end of the process, but it doesn’t seem like we’re looking for clarity in questions of things like gender and influence. In other words, when you say something like “influence is a bad metaphor,” you’ve made an absolute and trenchant and marvelous literary critical observation, and you don’t want to destroy that with your algorithms. If your algorithms should shine a light on that, more power to you. But that doesn’t mean that we’re going to end up with more defined terms for things like gender.
[Jannidis] [Pause] Who was that addressed to? [Laughter in the room.]
[Stephen Ramsay] You, I guess. I mean, I get frustrated in analytical discussions because people will say “that can’t possibly define gender.” And I want to say “none of you can define gender.” That’s what our discourse is about.
[Jannidis] Well, that’s one thing that’s mystical about definitions, explications, which is much more useful, but it doesn’t matter here. I wouldn’t say that the numbers and their relationship to each other are a definition or explication of gender. Obviously, the concept of gender is in close relation to what the numbers show us, and that’s the main point here, so that we have the model of the world dividing correlations in two or more groups, and numbers, which for some reason we don’t understand yet, are doing the same with text. I think that’s interesting.
[Ramsay] No, me too. It’s the subject to which I’ve devoted my life. But, I mean, it’s the anxiety that bothers me. I’m not accusing you of anything. I’m not accusing you of it! [Laughter from all.] I think I’m accusing the critics of text analysis.
[Stefan Gradmann] Two comments: The first is general. If we talk about vagueness and ambiguities, I think natural language is best suited […] for expressing them. As to gender and genre, they would be first-class[…] in any RDFS […] they may not be […] a data model, but they would be perfect […] in a propositional model like RDF, which is somewhere between the data model, in a sense, and language. I had a nice conversation with Allen [Renear] yesterday about natural language, and I think this is where we’re going to take this up again.
[Julia Flanders] Your response, or? Ok, another question.
[Maximilian Schich] I would disagree about the key/value pairs you’re talking about. There is a data model. In the data model, there are two keys. One of them is the text and the other is the word. Then there’s a key/value pair for every one of them. You have the frequency of the words in the book and you have the frequency of books for the word, which is a network. Basically, all of the models we’re talking about consist of this kind of basic construction of bipartite networks. If you’re lucky, there’s one thing which is a […] This thing is the thing you do measurements on. In the TF-IDF model, you count words and compare to the average frequency in the total corpus; that’s what’s a clue-in to the climate. That’s the mood […] There’s some universal distribution of the word… “is that an ‘I’ or an ‘A,’” and so… So, the point is that this kind of analysis has a more complete notion of data modeling than most of our models have.
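As an illustration that is not part of the discussion, the bipartite construction described here can be sketched as two views of the same set of edges: one keyed by text (which words occur, and how often) and one keyed by word (in which texts it occurs, and how often). The function name and the toy corpus are assumptions made for the example.

```python
from collections import Counter, defaultdict


def build_bipartite(corpus):
    """corpus: dict of text name -> list of tokens.

    Returns two views of the same text-word edges:
    text -> {word: count} and word -> {text: count}.
    """
    text_to_words = {name: Counter(tokens) for name, tokens in corpus.items()}
    word_to_texts = defaultdict(dict)
    for name, counts in text_to_words.items():
        for word, count in counts.items():
            word_to_texts[word][name] = count
    return text_to_words, dict(word_to_texts)


# Toy corpus (hypothetical token lists).
corpus = {
    "book_1": "the storm and the heart".split(),
    "book_2": "the lake and the letter".split(),
}
text_to_words, word_to_texts = build_bipartite(corpus)

# Frequency of a word in a book, read from either side of the network.
print(text_to_words["book_1"]["the"], word_to_texts["the"]["book_1"])
# Number of books in which a word occurs.
print(len(word_to_texts["storm"]), len(word_to_texts["the"]))
```

Each edge carries a count, and every measurement mentioned in the discussion (most frequent words, document frequency, TF-IDF) can be read off one of the two views.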
[Audience member, Stephen Ramsay] I just don’t think that TF-IDF is a data model.
[Schich] It’s not about data modeling. Conceptually, it presupposes that you have this pair in space. There’s this true relation between word and text. And then in the background, and that’s the thing you should get, too, there’s this new field, right? …Which they say, “Ok, that’s the frequency and the corpus” because they don’t have all the books in the world, right? And I think if you consider that, it’s a very interesting thing to talk about the vagueness of climate. Because, in fact, here, the climate is based on one instance which is the […] and the […], the actions of the […] are not distributed in a […] or an average way…So the idea of the post-structuralist notion of climate based on a few instances is actually not the right one, ok?.
[Off screen] Ok.