Panel Discussion: Data Models in Humanities Theory and Practice (March 14):
Stephen Ramsay, Laurent Romary, Kari Kraus, Maximilian Schich, Desmond Schmidt, Andrew Ashton; Julia Flanders and Fotis Jannidis [moderators] (video)
[Julia Flanders] So, this panel is the first of our two panel discussion sessions, and it’s really intended to help us step back from the opening keynote and propose an agenda for the next three days by pulling out the themes, the topics, the questions, that we collectively, and in particular, this group, would like to suggest are the ones that would be most fruitful for us to pursue, things that we want to make sure are on our radar, things that we want to come back to over and over again. Maybe questions that we are trying to actually answer rather than simply entertaining, I guess.
This [first] panel is focusing on the theory and practice of data modeling in the humanities. The second panel will focus more on pedagogical questions, so we’ll be able to come at data modeling from both perspectives. This is an informal discussion, the panel is sort of primus inter pares, so, you know, everybody should be feeling free to speak.
We’ve proposed a few questions to just get things going and begin the discussion, and I’ll offer you the first of those to start with, and you don’t have to address it in any particular order–whoever wants to speak up first is fine. We’ll let that discussion proceed, and then we can put the next question in when it seems useful.
The question I thought it would be useful to start with here is: what’s distinctive about data modeling, specifically in the context of humanities research and digital humanities research? In other words, what is it that sets humanities data modeling apart from data modeling problems in general? What special challenges and research issues or illuminations does humanities data offer us in thinking about data modeling theory and practice?
With that we can all sort of breathe for a moment and think, and then, whoever would like to start, perhaps, just begin.
[Laurent Romary] I had a discussion with a woman on the plane who wanted to know everything about my life, and once she understood what digital humanities was and what I was about to be doing here, she asked me what benefit digital humanities would get from what's going on with big data in the other scientific fields. And I said no, it's exactly the contrary!
The variety of concepts and domains and objects we manipulate in the humanities will probably make digital humanities very useful for the other fields of science, so that’s a way for me to answer.
And then I'll make it short, because I've only got a few words here: the common concepts across this variety of fields; the abstraction that we need to create across those various fields; the issue of semantic interoperability, which in my view is a major question, but one that has to be seen in context; and then, the final point, data modeling seen as a tension between standardization on the one hand and scholarly freedom on the other. We'll constantly have to mediate between those two extremes.
[Desmond Schmidt] I’d like to answer the first question that you asked there, “what’s distinctive about data modeling in the context of digital humanities research?”
I wasn't sure what you meant by data modeling, and Wendell [Piez] asked this also. I've always understood it to mean, basically, writing XML schemas; that's what the term has always meant to me. But the term is open to free interpretation: it's really modeling data in the general sense. I think it's quite important that the modeling of humanities data should reflect the nature of the material being modeled rather than the nature of the technology being used to represent it.
I think what's distinctive about humanities data is that very often it's not really digital data at all; what's usually meant is the artificial surrogates that we create from real-world artifacts (like manuscripts and books). Documents written in the digital medium, on the other hand, naturally embody the technology they are created with: if I write a document in HTML, its tags are part of my text, not of a surrogate. This act of interpreting features that we see in the physical objects and putting them into the text is the fundamental difference between humanities data modeling and the technologies we use to represent it, which were designed for the formatting exercise of digitizing information for the web.
[Kari Kraus] So I'll build off that. First of all, I think coming up with shared terminology and shared concepts around data modeling is going to be a challenge. But your point that what we model are artificial surrogates of what we're really trying to understand is key. I should say that I adopt a really expansive, ecumenical view of what data modeling is: data can be events, actions, concepts, ideas, as well as things like numbers or strings, which you might consider more characteristically "data."
To give a non-technological example of where I think one problem lies: the surrogates we have to adopt are often necessarily reductionist in nature, and as humanists we always rebel against that. To take one example: I was recently reading an article by an author in the field of information science who was trying to get a sense of how interdisciplinary the field was. iSchools claim to be metadisciplinary and interdisciplinary, so he was trying to get at that in some way. What he did was go to different iSchool websites and look at what disciplines iSchool faculty had gotten their PhDs in. He tallied how many faculty members had degrees in computational science, or natural language processing, or literature, or whatever, created some graphs, and got at the question through the distribution and the statistics.
As humanists, we would think that was incredibly reductionist: he didn't think about things like a particular cognitive skill set, someone who's able to translate across disciplines well, or how much one actually collaborates with scholars in different fields (a faculty member with a PhD in education who actually works with electrical engineers, for instance). So this wide, broad landscape of what might constitute interdisciplinarity was elided entirely in his approach. But I think we're forced to make those kinds of concessions all the time, and that's one difficulty of getting at the question of data modeling in the humanities.
[Maximilian Schich] My hunch about the distinctive properties of data models in the digital humanities is that we have a multiplicity of opinion, which we have to deal with all the time. If we talk about the birth date of some person, or the georeference of some text string which might mean a location, we're in a very different situation than, say, a database for tax records or something. We don't know whether "Paris" is here or there.
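Schich's multiplicity-of-opinion problem translates directly into a modeling decision: the record must hold several competing, sourced assertions rather than a single value. A minimal sketch of that idea in Python (all names, dates, and sources here are hypothetical illustrations, not Schich's own scheme):

```python
from dataclasses import dataclass, field

@dataclass
class Assertion:
    """One scholarly claim about a property, with its provenance."""
    value: str
    source: str

@dataclass
class Person:
    name: str
    # Competing claims coexist instead of overwriting one another.
    birth_dates: list[Assertion] = field(default_factory=list)

painter = Person("Example Painter")
painter.birth_dates.append(Assertion("1571", "baptismal record"))
painter.birth_dates.append(Assertion("1573", "older reference works"))

# A query returns all attested values, not one "true" one.
values = [a.value for a in painter.birth_dates]
```

The point of the design is that disagreement is data: the model records who claimed what, rather than forcing the curator to adjudicate at entry time.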
The second thing is, that brings with it a heterogeneous focus of attention, which is very, very strong in the humanities. Every humanities researcher is interested in something different in a different aspect, right? And that’s also different than, say, a tax record database.
The other thing is that we have a relatively easy way to work with the data, because we don't usually have to deal with privacy issues. If people analyze mobile phone records, you cannot go into the data and discuss the fact that somebody cheats on his wife or something, right? If you talk about Baroque artists, you can actually go to the state archive in Rome, look at the entire set of letters, and discuss their love life. That's also different from biology, where people can't ask p53, the most important protein, how it was today, right? So it's something different.
And the last thing, an illumination, is we can actually look at the application of data models and their structure, which is a source of complexity, and that brings us back basically to having a common field with many other people.
If you look at the linked data cloud, there are data sets coming from all sorts of fields, and right now they all use the same kind of technology to publish them, and you can actually use the same tools to analyze them. That's something really, really game-changing, because all of a sudden you can hang out as an art historian in the physics lab and work with biologists and social scientists and find out something you would never have found out in the Vatican library. And I think that's the awesome part about thinking about data models, beyond the point in time where we used to compete about XML versus Access versus FileMaker versus whatever.
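The game-changing property described here, one representation and one tool set across fields, comes from reducing every dataset to subject-predicate-object triples. Real linked-data work would use RDF tooling such as rdflib or a SPARQL endpoint; the following toy sketch only illustrates the principle, with made-up triples:

```python
# A dataset from any field is just a set of (subject, predicate, object) triples.
triples = {
    ("Bernini", "bornIn", "Naples"),
    ("Bernini", "occupation", "sculptor"),
    ("p53", "type", "protein"),
    ("p53", "regulates", "cell cycle"),
}

def query(triples, s=None, p=None, o=None):
    """Return all triples matching the given pattern; None is a wildcard."""
    return sorted(t for t in triples
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))

# The same function answers art-historical and biological questions.
art = query(triples, s="Bernini")
bio = query(triples, s="p53")
```

The same pattern-matching function serves both domains; that uniformity of representation, rather than any one schema, is what the linked data cloud standardizes.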
[Stephen Ramsay] I just want to restate Kari Kraus's point somewhat more pithily and say that, as a sociological matter, the humanities live out the modernist credo to "make it new." Do something no one else has done, say something no one else has said, see an aspect no one else has seen. And as long as that is true, it is at cross-purposes, it seems to me, with data modeling, which has a different teleology. Which probably explains why I have spent twenty years trying to saw Lego bricks in half and use Super Glue with these XML technologies, and so forth.
Yet at the same time, and I think this may redeem it a bit, it seems to me that constraints are ultimately not constraining, that constraints can be productive; that's the way back into it, it seems to me. But nonetheless, there is a tension there that will always be there, and hopefully will always be a good tension.
[Julia Flanders] This may be a question for the rest of us as well: is there a dark side to humanities data modeling? I mean, we've been very optimistic.
[Jan Christoph Meister] The Force? I would be interested to learn more about your concept of data to begin with. I think Desmond [Schmidt] touched on that, Kari touched on that, and Steve [Ramsay] touched on that as well. What is the distinction between data and phenomena?
[Kari Kraus] Or data and information?
[Jan Christoph Meister] We encounter phenomena traditionally to begin with, we don’t think of a book or a play or a painting as “data” when we encounter it in our real world experience. It’s a phenomenon which we process, which we interpret, buy, sell, whatever.
When we turn it into data, there is this reductionist step that we take, and it's very important to understand where exactly the reduction lies. It has to do with the fact that we are talking about digital data. Data is measurement; we are reading off measurements. We are turning this phenomenon into something that we think of as a set of discrete readings that we can read off an instrument, but we all know that the phenomenon, per se, is not that.
It is analog, it has fuzzy borders, we interpolate all the time, and so forth. And I think that’s the essential tension that we’re being called to when we talk about data modeling, because we tend to forget that there is this initial step that we have to take, and if we bring that back into our consciousness, then we can actually turn data modeling on itself and make it productive.
So, phenomena and information, as you said, Kari: I think those are the two concepts that we have to set alongside data, that we have to juxtapose with data. We have to think in this triple, and I think that might answer the initial question of what is particular about humanities data modeling, because that is a problem the other sciences do not have. Try discussing this with a physicist: say, "hey, hang on, what you're reading on that instrument is actually just a reading, not the radioactive compound itself."
[Maximilian Schich] I fully agree. I had a discussion once over multiple bottles of wine where I said everything is a document and the other guy said everything is data, and I think we were completely agreeing, honestly.
There's a famous talk by Max Planck where he says you have to believe in one real world in order to debate about it, but everything else is a measurement, and our measuring instruments, including our eyes (which are analog), are wrong most of the time.
So the question is: what's the bias? Basically, if I have to give you a shorthand description of what I'm doing: besides being an art historian, I'm a database pathologist, so I'm only interested in the bias.
And that also brings me back to this kind of notion of data, so that data is my document. Right now I’m dealing with databases, and I’m studying them, but I’m not doing it in a different way than I would study analog data, which is also data I think.
The problem is, if you ask what the most pressing issue is that I face as a person: I'm at the end of four years of post-doc, an awesome time in the physics lab, generously funded by a German research foundation. I'm an art historian, and most of the digital humanities positions are advertised in social sciences and humanities departments that don't hire art historians, because they belong to the wrong colleges, both in Europe and the US.
I'm an art historian who has done too much classical archaeology, which is another problem within my own field. And then the other thing is that the people who actually analyze data, who are not building tools or doing the digitization efforts, now, post-post-doc, have a very hard time finding funding sources.
And, that said, database pathology is personally threatening to people who are firm believers in their particular type of data model. They say, "yeah, it's right, you have to do it this and that way," and you say, "yeah, but if you measure that, there's this problem…"
That's a critique, and people don't really like to live with that. But I totally agree. (16:29) That's actually, I think, the future of digital humanities: to study what the errors in our beliefs are.
[Wendell Piez] I would agree with that, and I also think, like what Laurent [Romary] was saying (I almost put this on a slide): I was at a conference a few years ago when HTML was very, very hot, and the big discussion was "how are we going to bind data types to HTML?" Lou Burnard came out of the session sort of fuming, and he said, "They don't understand. They think that text is a type of data. Data is a type of text."
And, of course, that gets to what I was talking about with the nature of text itself. Text isn’t just text. Text does have this materiality, this context, and culture, and it leads back to phenomena as you were saying.
And I think it's really important that one of the things the humanities can contribute to this conversation is something that the scientists in fact know very well but which the computer scientists and the information people tend to forget, namely that (17:37) there is this sort of arbitrary, contingent relationship between our models and the reality that we pretend to describe with them, and that that very relationship needs to be part of what we're concerned with.
And so, for example, there's a classic book in data modeling called Data and Reality, whose whole point is that most of the problems you get in data modeling come from the fact that you're making all kinds of concessions just to build a system that does useful things at all, and those concessions are what will break you. You need to be aware of them, not so that you avoid making them, but so that you know the ways in which you're setting traps that you yourself are going to fall into (Kent, William. Data and Reality: A Timeless Perspective on Perceiving and Managing Information in Our Imprecise World. 3rd ed. Technics Publications, 2012).
[Michael Sperberg-McQueen] I’m nervous about contradicting Jan [Christoph Meister], but I think you’ve got the wrong end of the stick, or you’re possibly hanging out with the wrong physicists.
[Jan Christoph Meister] I agree. Sorry, I should have said engineer rather than physicist.
[Michael Sperberg-McQueen]…Because the fact is, most of the people interested in some sort of scholarly domain, natural scientific, social scientific, or humanistic, are studying phenomena of one kind or another, and many of those phenomena are analog. In the natural sciences, people have been conscious for as long as there have been computing devices that they are dealing with numbers, and numbers are abstract objects, while what we're dealing with are physical objects. Not only that, but the numbers physicists are interested in are numbers on the continuum, and we're dealing with discrete digital devices.
So, while it's quite true that one of the central problems of data modeling in the humanities is coming to terms with the reduction step in our representations of the phenomena we're interested in, we're not the first people to be there. There is a long history of numerical analysts worrying about how to represent numbers in machines.
We often, as far as I can tell by looking at textbooks, tell our students that numbers are simple, that machines are good at numbers. This is bullshit. There is nothing natural about the representation of either integers or real numbers (floating-point numbers) in current machines, or in any machines in the history of the world. Not to mention imaginaries.
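Sperberg-McQueen's point is easy to verify: machine representations of both reals and integers behave unnaturally. A small demonstration of standard IEEE-754 behavior, shown here in Python but not specific to any one language:

```python
# Decimal 0.1 has no exact binary representation, so arithmetic drifts.
print(0.1 + 0.2 == 0.3)        # False
print(f"{0.1 + 0.2:.20f}")     # 0.30000000000000004441

# Arbitrary-precision integers survive in Python, but not once forced
# through a 64-bit float: both neighbors round to the same value.
big = 2**53 + 1
print(float(big) == float(big - 1))  # True: both round to 2**53
```

Neither result is a bug; both follow directly from representing continuous or unbounded quantities in a fixed number of binary digits.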
And on the other hand, I have to say [that] it’s not machines that brought digitization into human culture, but the invention of writing–because what is the property of a writing system? It is the representation, typically of linguistic information, of language, with signs from a finite set of signs. For that matter, what is vocal communication but our reduction of an infinite variety of analog signals into discrete phonemes?
If you do not understand the phonemes, you do not understand what I'm slurring. But if you do understand the phonemes, then my slurring is separable. It's not a non-existent phenomenon, it's not intrinsically uninteresting, but it is distinguishable. (21:48) So, a lot of the concerns people have raised about digitization over the last few decades have always struck me as odd, because in theory we ought to be raising exactly the same concerns over any transcription and any edition.
[Andrew Ashton] I think that's a good point, and one of the themes I'm hearing is that we're still talking about the surrogate model of data. I wonder if one of the areas that digital humanities has not explored enough is networks, and specifically relationships as data. I know that there's an entire field of network analysis and people are studying that, but (22:42) I think we're reaching a point where the distinction between data as emerging out of a surrogate and data as relationships between surrogates and between corpora is becoming less and less distinct. Those are data that, as far as I can tell, may never have had a starting point; they've always been phenomena. And I think that goes back to the point you were discussing: all of the things we hold up as data deriving from a source or a surrogate can, at some point, be traced back only to a phenomenon. That gives us a wedge into a broader discussion about looking at data not as something exclusive to objects we can affiliate ourselves with, but as a more expansive idea about the relationships between those objects and where the assertions about those relationships come from, and about finding ways to serialize that. And there are ways to serialize that.
[Stephen Ramsay] I really think that reductionism is a canard. I think it is absolutely meaningless as long as human discourse is based on generalization and selection and sampling. We get this all the time: I give a talk, I say "here's an analysis of a thousand English novels," and I have all these English professors jumping up saying, "but that's just a selection from the total!" And I say, "right; not like your syllabus, for example."
But people have this sense that somehow they're not being reductive, and of course they are. To say that everything is reductive is like saying that everything is green. And as soon as we say everything is green, we should ignore it, we should stop talking about it, because it is no longer a meaningful thing to say.
The question before us is: Are our selections, distortions, samples, so forth– are they useful? Do they advance human discourse, do they help the discussion? That’s the only thing to talk about. It’s not a question of how do we avoid constraint, how do we avoid selection. We cannot avoid these things, not if we want to have actual, human, finite conversations.
[Maximilian Schich] That's a very interesting point, I think, and that's exactly why we have to measure before we model. There is this old way of composing a speech: you collect the material, you find the order, then you construct the story, and you tell the story. For thirty years, we have collected the material, and then we have done nothing.
And it's very interesting: this symposium is called Knowledge Organization and Data Modeling, which is somehow the opposite of knowledge discovery and data mining, and I think the latter is the precondition for better data models, right? A classic example is with networks: we find that most of the data, actually any data we look at, any relation in the humanities (say, books connected to monuments), has on at least one side something like a very heterogeneous distribution, something like a power law. It's very different from a Gaussian, which means that there are no averages and there are no types. And then you can discuss, "yeah, the physicists might be wrong to call it a power law because it's not straight, it's a curve." That doesn't change the fact that you have to actually measure the curve and discuss how it goes. And that means that both computer science, data mining for patterns on a smaller scale, and physics, looking for universal laws, are tremendously important for data models in the humanities, and we have to simply do it ourselves or work with people who can do it.
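The claim that heavy-tailed data has no useful "average" or "type" can be checked with a quick simulation: in a Pareto-like sample, a tiny fraction of items carries a large share of the total mass, while in a Gaussian sample it does not. A sketch using only the Python standard library (the shape and scale parameters are illustrative, not measured humanities data):

```python
import random

random.seed(42)
n = 100_000

# Heavy-tailed sample: Pareto with shape ~1.16 approximates an "80/20" tail.
heavy = sorted((random.paretovariate(1.16) for _ in range(n)), reverse=True)
heavy_top_share = sum(heavy[: n // 100]) / sum(heavy)

# Gaussian sample for comparison (mean 100, sd 15).
gauss = sorted((random.gauss(100, 15) for _ in range(n)), reverse=True)
gauss_top_share = sum(gauss[: n // 100]) / sum(gauss)

# Share of total mass held by the top 1% of items in each sample.
print(f"top 1% share, heavy-tailed: {heavy_top_share:.2f}")  # roughly half
print(f"top 1% share, Gaussian:     {gauss_top_share:.2f}")  # a few percent
```

In the heavy-tailed case the sample mean is dominated by a handful of extreme values, which is why "the average book" or "the average monument" is not a meaningful summary of such data.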
[Douglas Knox] I want to support things Andy [Ashton] and Kari [Kraus] were saying. I have a feeling we're talking about text as data without talking enough about what we're trying to model. Kari mentioned events, Andy mentioned networks of relationships, but the kinds of things people in the humanities are interested in, when we aren't thinking about our tools too specifically, are people, places, social movements, aesthetic styles, form. And we want to think about those things natively, so how do we model that? Sometimes text is the evidence for that, but we're not really trying to model surrogates of text and then call it done; we're trying to get to something else. A database of events derived from a text, for example, is a different sort of thing; we're doing different kinds of humanities. So I'm asking: how do we get to that?
[Maximilian Schich] So basically what you're saying is that we should take care that… we're modeling people and events and stuff like that in text, right? But the only thing we're really interested in is the people and events and stuff like that, right?
I think that bridges really well to what Andy [Ashton] said. I think mostly we are interested in the relations of people and locations and things like that. And that's a very interesting point. If you look at how digital humanities was funded in the 1990s, the Getty Research Institute would pay you a hundred thousand dollars if you entered ten thousand records in the database, and the records were nodes and not links. So if some archaeologist said, okay, these three drawings all show the same sculpture, not three sculptures, you would actually get less money, because that's only four records if you say it's the one sculpture. So I think that's the reason we're interested in the relations, not the nodes.
[Wendell Piez] Granularity is money, in other words.
[Jan Christoph Meister] [Speaking to Michael Sperberg-McQueen] My initial question was about what we're doing, and your intervention was really called for, because what I said was very simplistic; we know that the natural sciences have discussed this problem of the inadequacy of discrete representation. And indeed you're right: any symbolic representation of something out there in the world is reductionist.
[To Maximilian Schich] But then your argument is also right. That argument is a showstopper, because everything is reductionist; our sensory apparatus is reductionist to begin with. And if you look at it on a micro-level, you will find out that, hey, it's actually digital: it takes in discrete bits of information over a defined period of time, we interpolate between those, and we forget it, which is fun. So we believe we live in a non-discrete universe, but it's a nice fantasy to have.
But wherein lies the specificity of humanistic data modeling? I'd like to mention a few things. One, I don't think we have a parallel to what in the sciences is called "applied x, y, z." You can have applied mathematics; you can have applied physics, called engineering; or chemistry, or what have you. So there is a sort of consensus that you can forget about these fundamental limitations and get on with the job of living in the real world. You can build bridges, and you don't have to really worry about the reductionist nature of your calculating machine, because in the end you're standing on a bridge, not on a topic of conversation.
Whereas, in our areas, I think we’re hell-bent on investigating this particular question. That’s actually our major concern: finding out where the limitations of knowledge lie. So what is a side issue in the other disciplines has over time become the core issue in the humanistic debate.
The second point where I think we have a substantive difference is that all humanistic data is indexical, and all humanistic phenomena, what we regard as phenomena, are indexical: they always point back to us as the perceiving subject. It has this particular component.
I don't really think that, Heisenberg apart (and of course that again is a principled discussion), a physicist looking at an instrument reading will at that point in time contemplate how the reading relates to his own identity and so forth. I can't get away from that when I read Shakespeare, so I think that is something specific about humanistic data.
And the third point, which is really interlinked with the second, is historicity. The data that we deal with always comes with a timestamp and a shelf life, and we know that. That's how our disciplines have, over time, always dealt with what we're dealing with. We know that ideas change, that boundary systems change; we touched on that early on when we talked about schemas, which also change over time. I think this is a very specific element of humanistic data modeling that we should perhaps pay more attention to: our data is data that is valid over a defined period of time; it carries this historical index with it. To my mind, we haven't really come to grips with that issue yet.
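Meister's "timestamp and shelf life" can be modeled explicitly: each scholarly assertion carries the period during which it was held, so superseded claims remain queryable instead of being overwritten. A minimal sketch in Python, with hypothetical attributions and dates:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TimedAssertion:
    statement: str
    valid_from: int           # year the claim entered the scholarly record
    valid_to: Optional[int]   # None = still current

# A toy attribution history for a painting (entirely invented).
history = [
    TimedAssertion("Attributed to Rembrandt", 1900, 1968),
    TimedAssertion("Attributed to a Rembrandt pupil", 1968, None),
]

def as_of(history, year):
    """Return the assertions the discipline held in a given year."""
    return [a.statement for a in history
            if a.valid_from <= year and (a.valid_to is None or year < a.valid_to)]

earlier = as_of(history, 1950)   # the 1950 scholarly consensus
current = as_of(history, 2000)   # the consensus after reattribution
```

This is essentially the "valid time" half of bitemporal modeling from database practice: the historical index travels with the data instead of being lost at each revision.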
[Thomas Stäcker] I just want to ask about the process of creating data within this area. I think there are data that are in a way equivalent to data in the natural sciences and engineering, in that you collect data and use it. When you have a project today with, say, historians or art historians, they collect data, and they use the data very similarly to people using data for the weather forecast, or geologists: they use the data to draw conclusions from it. It works in much the same way. But you can use a text as data, and you can use a text as a product: a product of the humanist. He wrote the text, which is a result, but by the same token, it's his data for the next book, so to speak.
So you have to distinguish between what you call the data-phenomenon distinction, which is a heuristic distinction, and data as a collection of usable material for something else. And this is the very problem I mentioned: when you look at the tools created in the last couple of years, you find things like annotation tools becoming very important; I refer to Peter Boot's very interesting book, which he called Mesotext. I think annotations are a very distinctive kind of data for humanists: collecting annotations, for instance, in order to use them to create something else.
[Julia Flanders] I think this also bears on the question that comes up of whether data is scholarship. In other words, your point is that data can function as a kind of output, when we take the model seriously, and also as an input, in ways that may alter its nature. I think that's a good point.
[Fotis Jannidis] I'm taking the opposite stance now: there is no real distinction on this level between what we are doing and what people in the natural sciences are doing, because even if you are touched by a Shakespeare poem, you wouldn't really write that into your essay, or tell it to your students, or whatever; it's not on the same level as the scholarly research we are doing.
And I think it's probably clearer to say that we use the data to explore the world behind the data, which is what we are really interested in, as you pointed out: the concepts, the ideas, and so on. (36:44) The relationship between what we are really interested in and the data we have is different from the relationship between data and phenomena in the natural sciences, but our handling of the data and our basic understanding of the data in itself are not different. It's how we construct the relationship between the data and the things we're really interested in. And understanding that is another aspect of data modeling: how do we model our data in a way that allows us insights into what is happening behind it? That is the question, and there, I think, your argument is maybe different, because you have potential agency and everything.
[Andrew Ashton] I think it would be useful for this group over the next few days to consider functional domains for humanities data. I'm a librarian, not strictly a humanist in the sense that many of you are, and one of the things I personally struggle with on a regular basis is making data play well with other data. That's a functional element of humanities data that, frankly, a lot of humanities data does not do well; it is not functional in that sense, by and large. It may be very functional for the purpose of enhancing a scholar's insight about a given object or phenomenon, but as functional data within an ecosystem (and I'll go back to the networking idea: more and more a truly networked ecosystem, where semantics and phenomena are the objects that are interacting), (38:32) I think that humanities data, at least a lot of what I've seen thus far, has a long way to go. And I think that derives from the fact that it's created as a function of the scholar's interaction with the object of study, as opposed to scientific data, which is created for one or possibly several functional purposes at a different stratum from the scholar's interaction with the actual object. So I think it would be useful for this group to consider functional domains for data, if only to clarify what the scope of each discussion might be.
[Kari Kraus] Can I just say something in response to that? I want to tie it in with what Chris was saying about some disciplines having an applied perspective–in the natural sciences, or physics, or whatever.
I think that one thing that will be a challenge for humanists is thinking about how we model data through experimental design. If we’re thinking about our data having a functional, instrumental, or applied purpose, then we’re thinking like public humanists–and increasingly we hear that term being thrown around, this shift toward the public humanities: it’s not simply about getting insight into our own data that is then revelatory for us and a handful of readers, but rather about creating data models that can be useful in some way to a larger community.
In the sciences, what they do is design experiments to try to understand how the potential user community models or understands the data–they run focus groups, design survey instruments, conduct interviews–and then, based on the user community’s understanding, their own data model, they design tools and services on top of that. We’re simply not trained to do that in the humanities. We don’t know how to do those kinds of experimental designs.
When we talk about the digital humanities and its curricula, we always point to things like programming or technological skills, but things like statistical understanding and experimental design will become increasingly important as we make that shift toward the public humanities–as we take on that applied, functional, or instrumental perspective.
[Maximilian Schich] I totally agree; I think that’s the key point. Basically, what we don’t do in the humanities is simulate–we don’t model in a numerical way, not just in terms of a data model. If you want to understand the data, you need to actually model the processes which lead to the construction of the data in order to understand what’s going on. That model may be wrong–that’s what scientists do, right: they come up with a better model. But they’re not doing the same thing we do.
What we do is interpret data, which is not a tool for forecasting and prediction, and that’s a big problem we have, actually. And yet I think there is nevertheless a glue: as he [Ashton] said, data more and more comes in the same kinds of formats. We find there is an ecology of complex networks which we have to understand. We have similar problems with our data. If a natural scientist produces a visualization, he has to interpret that visualization the same way an art historian would interpret an artwork or a painting, because there are questions of how you construct the visualization. With literature-curated data, for instance, some cancer might appear very, very frequent–your dot might be very, very large–simply because a lot of research is going on about that cancer, while hardly anybody ever has it. And at the same time, cancers which are super frequent but attract no research appear super small. That’s a very, very frequent phenomenon.
So, that’s a hard sell to a natural scientist. And it’s equally a hard sell to say, okay, we did a Master’s or a Ph.D. in the humanities and we didn’t learn statistics. That’s wrong. It’s really wrong, because once we have a lot of data, we actually have to do statistical tests [to determine] whether certain arguments we derive from the data are true or false.
[Elke Teich] Just to add another perspective to the question of what’s specific to the humanities: we could also say, okay, our objects are x, y, z–and ask whether there are other disciplines that also analyze these objects, these texts, images, and so on. And I think there are. When you look at computer science, one part of it over the last ten years or so has developed text analytics, right? You analyze large collections of text, and you’re not just interested in finding patterns in the signal; you’re interested in what kind of knowledge is expressed in the text–what facts are expressed, what opinions, and so on. So if there’s that field, text analytics, is that something we’re also interested in? Or is that something we would never ask about a text collection?
[Elena Pierazzo] I think the thing we should remember is that our objects are, in this instance, great conveyors of aesthetic experience. Most of them have an aesthetic value that is offered to the general public for private enjoyment, okay? Or for educational purposes–but for most of them there is this aesthetic level, the private pleasure.
That is a very fundamental point of difference about our objects, because we datify the objects; we don’t datify that–the aesthetic experience.
We do deal with other aspects of the objects–their structure–because over the past few centuries we have invented a positivism, a way of analyzing these objects in a structured, objective way, and that is what we are trying to model on. We work along those same lines: the structure, the function of the characters, what they do, how they move, who says what, the language of it–because we are trying to datify the object. But we cannot datify all of it. So far as I know, nobody has been able to datify the enjoyment–the pleasure that reading a poem, or seeing a beautiful painting, sculpture, or building, is able to give. So the point is that what we are trying to do is, in a sense, absurd: we are trying to capture what the object makes you feel, what its value is. That value is cultural, if you wish, but also personal, and it is something we are not able to capture–or at least have not captured, in a sense.
[Fotis Jannidis] There has been a huge movement over the last 50 years concentrating on the emotional aspects of literary texts, so that’s not new, I think. There is a lot of cognitive study going on at the moment that does exactly this, and probably you could read it and encode it, because they are trying to find codes in the text–or relations between cultural codes and the things or expressions within the text. These things are being done at the moment.
[Elena Pierazzo] Yes, but not by the humanities people.
[Fotis Jannidis] Yes, yes! By the humanities people, at least in Germany.
[Maximilian Schich] I think it is a major mistake to say ‘the humanities people’ or ‘the natural scientists.’ I work very closely with both, and I have met people in both camps who are, basically, almost racist towards anybody who bridges disciplines–I think we cannot generalize. But it’s true, I think: there’s Twittermood; there’s Fleshmap by Fernanda Viégas, for example–projects that actually measure stuff in a quantitative way, and what they tell us about ourselves is very, very surprising. It’s the same thing, right? You use computer science methods and you find something out about yourself, which is awesome, I think. And there’s more and more such interest across the board, because the people who do physics see that they can apply their methods–the very same methods they use to study Bose-Einstein condensation in some material–to society, and find something out, because obviously they are interested in society just as the hermeneutic humanities person is interested in the very same thing. I think that’s interesting, because sometimes these people go further than we do.
[Julia Flanders] I think I’m hearing a point you’re making here about identity politics. I get the sense that there is an identity politics operating within the humanities that–and I’m going to deliberately generalize here–sets an official voice for what constitutes humanistic inquiry, properly speaking; and that sense of the value of that boundary as a defensive layer has, I think, a long history.
But what strikes me is Andy’s point about whether, in effect, data models in the humanities are intended to put data into conversation–to create a more publicly useful data-driven discourse–or whether the purpose of the data model is to allow an individual to represent accurately his or her own insight about things. I think that’s in a sense at the heart of it. When the humanities, in the old-fashioned sense, receive word from the ‘digital humanities’ that there is some imperative to move our frame of operation into a more linked space, a more commensurable space, a space where we can actually use our models to study what we’re doing, I can imagine an identity-politics response from the humanities saying not only that that is ‘not what we do,’ as a matter of identity, but also that it will operate to the detriment of our ability to think individual thoughts, to do the kind of scholarly research we’ve been accustomed to. And in the TEI, which is the data-modeling universe I’m most familiar with, I would say that is the central problem: whether the model is in the service of private expressiveness or more of an interchange.
I think it’s really interesting to see coming up here a question we may not be able to answer, though we can certainly talk about it: to what extent is there such an imperative, and what could be the value to the humanities of either moving in that direction or not? How could we measure that value? How could we build it into our sense of the place of the humanities in public discourse, the place of the humanities in culture?
[Andrew Ashton] I just want to respond briefly and say that the comments I was making about data being linked are not solely for the greater good of other disciplines and the general public; I think that is a side effect. Rather, I wonder whether there is something essential about humanistic research that would keep the individual scholar from benefiting from having that data linked in ways that look not inward at the object of study, but outward at its relationships.
[Wendell Piez] I want to pick up on that, too, because I think that what Julia [Flanders] just said is very important. But at the same time, as I see it, a lot of the stresses we face are expressions of a natural anxiety about a change that is happening around us–a change that is, in many ways, simply something ‘there to be dealt with,’ but something we’re not much a part of as individuals.
Our question is not “is this going to happen?” but “how is it going to happen?” How are we–am I–going to happen? And that’s the most old-fashioned, humanistic question of all, isn’t it? Within that context, I think it’s really not an either/or consideration, right?
The reason I’m in the digital humanities is because I’m in the humanities. And the digital humanities happens to be the form that I do it in because I also live in a digital world. So, I have no problem with the ‘old-fashioned’ scholar who wants to continue to do the old-fashioned thing. I think that’s great, and more power to him.
So to me, these questions of how we address this, how we make it public with syntactical value and so forth–those are just the same questions that the humanities have been facing all along. And with respect to that, let’s remember that the sciences came out of the humanities; we’ve been asking these questions since the Renaissance. So my feeling about it is “come on in, the water’s fine.” I understand why people feel the stresses, but I don’t feel they’re stresses that are anything particularly new. By asking the questions we’re asking, we’re actually moving ourselves forward, if only to discover that we’re not the only ones who have them.
[Elisabeth Burr] I was under the impression that that was a very reductionist picture of the humanities, because, I mean, there are other languages, for example–I’m doing corpus linguistics, and it’s not just all about emotions and beauty and things like that. And so, I do digital humanities, and I think linguistics belongs to the digital humanities–or at least, I want it to be there and not somewhere else.
I do digital humanities because it allows me to ask really different questions–to have all these relationships, to have a lot of data, and to get a more holistic knowledge or impression of what is actually there–I mean, between maths and society and history and I don’t know what else–all these sorts of things which we could not do before.
[Stephen Ramsay] I think it’s worth noting, as a matter of historical consciousness, that what we’re calling the old-fashioned humanities is historically the very, very new humanities. I mean, this idea that interpretation is central–that it is in fact the defining activity of humanistic inquiry–is the recent thing.
You know, data modeling, as far as I can tell, is what the humanities looked like, more or less, between the 16th and 19th centuries. There was a time in so-called humanistic scholarship when publishing a book of Greek cognates was a perfectly acceptable, even excellent, act of scholarship–in fact, what you should do! That was ‘what the humanities is,’ you know.
And actually, if there are identity crises and worries–questions about ‘is data modeling scholarship?’, where perhaps the answer is no, and that could be bad–I think a lot of that is the present, contemporary humanities’ fear of its own past, because data modeling looks for all the world like the thing that the humanities definitively destroyed forever after the Second World War; I mean, there was a real reaction to anything that…. The list of things that are not really the sort of thing upon which we can build a tenure case are exactly the things the humanities did for hundreds of years. So we’re up against massive historical forces when we start to talk about ‘is data modeling scholarship?’ as a matter of politics and as a matter of identity. It’s the present humanities that are the newfangled ones.
[Jim Kuhn] One thing we haven’t touched on yet, which relates to Andy [Ashton’s] question about the functional perspective on data modeling, is sustainability. One of the questions–not just the fear of the past but also the fear of the future, relating to the hard sciences–is that right now it’s easier to redo the genome of a mosquito than to recover the data about that genome that was produced a year or two ago. We’re in the position of having to get old-fashioned in order to push things forward, because the way we engage in this sort of discourse is by slapping flat images of the user interface into a scholarly article and talking about them; and that may be where we are stuck at the moment.
But I see a future, maybe 50 years from now, when we’re looking back at a landscape littered with failed projects, and we actually want to lift them up, use them, and try to recover what’s going on with their data models–much of which is full of implicit assumptions that were never unpacked, that are invisible to us, with software dependencies that are gone. And as a librarian, I share the concern of trying to get the data to play nicely together, not just now but off into a future in which people will be coming to libraries and asking, “give me that.”
[Maximilian Schich] That’s a very, very interesting point, I think, and it reflects what Tim Berners-Lee says about the initial stages of the World Wide Web–the ‘data hoarders’ would close their web sites, or only give the link out to other people who knew them, and they would not link out to other pages because they were jealous about their page. And those were obviously the sites that were forgotten, right? Because the linking is what keeps you going. It’s the same thing we know from antiquity: how many Greek texts do we only know because some other Greek author said ‘there is this text by the other guy’? But that also produces a kind of imperative: as the people who produce data models, as the people who produce data (and if I analyze data, I obviously produce data, right?), we have to actually publish them in the right way. The infrastructure that is missing right now is simply political infrastructure, because technically it’s totally possible right now: any kind of text data can be published in some XML format, in RDF; a data model can be stored somewhere in a form that can still be read in twenty years, and so on. And I’m pretty sure that if there’s enough of this stuff, there will probably even be historians of data who do nothing else but study this kind of thing.
But right now we’re still in a phase where people are reluctant to give out their data. To give you an idea: I’m currently working with person data, and the proprietary data sets I get are two orders of magnitude larger than what’s on Wikipedia. Anybody is free to put any kind of person on Wikipedia–but people don’t. And that’s a very important message, I think.
[Laurent Romary] Yes, I think we should have no identity problem, because we are not the first ones to go through exactly this. We spoke about natural scientists and physicists: thirty years ago, an astronomer would go to an observatory, look at the thing, and make observations. Then they benefited from photography and digitization, and now they look at various ranges of wavelengths and they have observables–like we have observables now. And those observables change form; there’s this notion of surrogates being just another kind of observable, and people, instead of going to the archives, would now rather have these observables in sight [on screen]. According to the latest figure I’ve seen, nearly 80% of astronomers never go to an observatory. They don’t have to–they just open their computers, compare wavelengths, and make observations of a certain place, because they know they’ve got those databases. That’s probably what we’re doing now in the humanities as well, in the long run. So it’s not an issue. And data modeling is a way–and I think the astronomers are doing data modeling as well–of saying, ‘look, I’ve got these observables and I want to reconstruct some kind of phenomenon.’ For me, the phenomena are on the other side; it’s not the observables proper that we’re after. So: what are the mechanisms we use, what are the tools we have, to correlate wavelengths, or words that occur across texts or dictionaries, or what have you?
So it’s really a matter of having a methodology. Forget about identity! We need to know what we’re doing now in order to achieve this–to be not the observers of what’s going on in the digital humanities, but the actors changing the methodologies, defining new observables.
[Elli Mylonas] I just want to respond to Steve [Ramsay] a little, because you referred to how data modeling was what the humanities did. One of the really interesting things is that when we take that modeled data and try to use our tools on it, or squeeze it into our models or modeling instruments now, we realize that it was not consistent. It does not actually map beautifully onto the structures that either scholars then, or at least we now, apply to it.
Anyone who has marked up a classical dictionary, for example… or the Oxford English Dictionary… Are they structured? No, they’re not. They’re not structured at all, actually. I’m not trying to say that people didn’t do modeling back then, but what we are calling modeling–what was then the technology of scholarship–doesn’t map as simply, and maybe that goes back to Laurent [Romary’s] statement about looking for the tools that do that. The flip side is that we may feel these models were not consistent because our tools expect something that is not appropriate for that kind of data or that kind of modeling. So I don’t know what the answer is, but from the trenches of trying to work with XML and classical texts, one starts to feel not that one side is right and the other wrong, but that the difference, the inconsistency, is where we might want to look for some meaning in what we do when we model humanities data.
[Wendell Piez] I think that’s something important to keep in mind, right? We’ve all faced that issue: we see this commonality between the activity of the 18th-century philologist and ourselves, and then we discover that–oh, wait a second–the mapping is problematic. But saying that the mapping is problematic is not to say that it does not map, right?
This goes back to what we were talking about earlier–about markup as a heuristic activity as well as something where you’ve got to come up with some sort of product. We’ve talked about this for years in the context of the digital humanities: there is something about our modeling in which the consciousness we bring to the activity is partly about how interesting it is when the models fail, and how necessary it is to pay attention–not just to improve the model, yes, there’s that, but also because the failure is telling us something about the phenomenon and the observation. That really speaks to a lot of the issues we’re putting together here, because it’s where the problem folds in on itself: we have texts for modeling text–text is the thing we’re using to model text with. So we get back to these issues where there’s no escaping the box we’re inside. We get a larger and larger box, but it’s been that way for a while.
[Desmond Schmidt] Can I just take that up–sorry to interrupt. I’ve been listening to the last few comments, and I’d like to say that if we’re going to put subjective judgements about texts–what we regard as our interpretation or model–into the text, mingling it with the text, then we’re always going to have this problem when trying to reuse data, to ‘download the wavelengths,’ or whatever Laurent was talking about. We need to be like the astronomers and say: we’ll pick this text here, we’ll pick this interpretation (or indeed, this model of a text), we’ll merge the two together, and we’ll get some kind of output. That gives us lots of different possibilities.
But while we’re still mixing the interpretation in and fixing it in there, we’re stuck with this problem of interchange–that is to say, interoperation only after conversion, with possible damage to the text. What we actually want is literally interoperable texts that we can just load into software, use, put back, and then get someone else’s; and we haven’t got that at the moment.
[Maximilian Schich] May I add to that? I think this is also an epitome of how digital humanities projects are often structured. There’s always this component of ‘let’s build a tool which does the analysis on top of it,’ right?
So some people digitize something, they bring data together, and one work package is to build a tool that will analyze that kind of thing and produce nice little network pictures. But the point is: that’s not how it works, because we don’t know the structure of the networks. We don’t know whether the community-finding algorithm is actually bad–of course, we find out two weeks later when another paper comes out. What we need, I think, is a model, a kind of system, where people can collect data, apply different models to the data, actually analyze the data, and basically do the same thing the scientific process has done with papers, right? People publish a result, you take all of the results ever made, you write another paper, and you put it back where it belongs. And that’s something which is not done in any kind of digital humanities project. Especially the legacy ones, which run for, say, 10 or 20 years, often have the database built around the particular thing they’re doing–say, antique reception–and nothing else.
But there may be other scholars who come up with different ideas, other algorithms, a slightly different angle, another data set. And I think that’s the kind of thing we need. Just as in astronomy: people agree on the raw data, but what goes on after that is very different for most people.
[Fotis Jannidis] Let me ask you [Schich]: what is the raw data we are using? Is it just strings? Or do we have a cultural construction of what our raw data is? The digital libraries are really trying to reconstruct this cultural construction, in a way, so that we all have the same basic ideas of what’s important, right?
We are talking about two different data models at the same time here: one is the data model digital libraries offer–they have a very strong understanding of what people need. On the other hand, we have this open-ended process where people say, ‘I have an idea which could be interesting for my research,’ and throw something at the text. And obviously we choose different things, we show different results, and we have different techniques and different demands of the environment.
I think this goes back to–or it’s very similar to–the discussion of top-down classification, librarians with a predefined thesaurus, versus bottom-up tagging with hashtags, for example. And the interesting thing is that if you analyze them quantitatively, they are two different things. There is a lot of quality in what librarians do with these predefined classification systems.
But there is also a gap: they miss certain things, because those classifications are not in their system, or they are not within the cognitive limits of what they think to look for.
And with the hashtags, the problem is that there is so much noise that you need a lot of data to filter out some structure. And that noise is something the humanities doesn’t talk about–at all, right? Noise is really interesting, because as single scholars we’re always operating below the noise threshold.
[Kari Kraus] Well, in the humanities, too, we interpret noise as signal.
[Maximilian Schich] Yes.
[Douglas Knox] The discussions of earlier models–dictionaries and other pre-digital models–make me wonder whether there isn’t room for perfect models. Modeling can be inventive and creative; we talk about certain sonnet forms as a model that you could implicitly validate a poem against and say, no, that doesn’t meet the rules. These products don’t really exist in nature. Critical editions don’t really exist in nature. And I don’t know that that distinguishes us–more technically, in math, matrices don’t exist in nature either. So we can make this stuff up.
[Susan Schreibman] We’ve been talking about modeling, either explicitly or implicitly, and the text–but what about non-textual data? You can see how, for the art historian, yes, it makes sense to model non-textual data. But what about, for example, literary scholars? Or modeling a historical record not with text, but building other kinds of non-textual models, to do what Elisabeth [Burr] was saying–ask new questions about the objects that we study?
[Maximilian Schich] A very simple answer from my side: if you stack enough series one on top of the other, you don’t see the difference anymore–the discreteness of digital data becomes something continuous. And the other thing is, if you look at cognitive science, at how syntax is built up, it’s strongly non-verbal; there are a lot of motor processes going on, which are totally not digital, right, at least in effect? And yes, I think you’re right–that’s a discussion we have to have, because almost all the data we work with in the digital humanities is translated into textual data. You can see that in the funding landscape, and you can see it in the conferences. This year’s Modern Language Association conference had 20%–26% digital humanities sessions. The historians’ conference had, I think, between 16% and 20%. And at the College Art Association, out of 100 sessions there was one session which practically was digital humanities and one session which talked about it.
This is amazing if you take into account that a lot of the tools scientists work with, in terms of digital data, come out of the arts–out of art colleges. For example, Processing, the graphic visualization language, was made by artists and designers. So I think that’s a very legitimate question, and my personal point of view is that the people who deal with non-textual data should make a stronger case that they too are part of the digital arts and humanities–and that they should also have a part of the funding of the National Endowment for the Humanities, for example, which is not much money, but…
[Stephen Ramsay] I want to re-interpret Susan’s question and then answer it; I don’t even know if this is… it feels relevant to me. First of all, I hope we’re not here to talk only about XML. I hope we’re also here to talk about GIS, about relational databases, about image data models and image-processing systems, and all of that–because one thing I notice about the XML ecosystem is that it doesn’t look like any of these other things. Those other things have far more in common with each other than with the XML ecosystem. And the XML ecosystem does have a privileged place in the humanities; it is a focus of our attention. But XML as a technology–as a platform–doesn’t look much like relational database systems, it doesn’t look much like GIS systems, it doesn’t look like the R environment. And yet R, GIS, and SQL all, as a practical matter, share enormous similarities just in the way you interact with data. For example, in SQL the line between the language used for data modeling and the language used for processing almost vanishes at a certain point. That’s actually a pretty hard line in XML–and yes, XSLT is homoiconic with respect to XML, but forget that for a moment; XSLT is a different kind of thing from XML tagging. In GIS systems that hard line isn’t there. It isn’t there in SQL. That seems to me a really big question. We’d better have a very good reason–an excellent reason–for why that’s so, because it’s striking to me that the way you deal with data modeling in these other domains, including image processing, is very, very different.
[Douglas Knox] Are you looking for a rationale or a historical reason?
[Ramsay] Either. I’m more looking for a rationale, but I’ll take either. I think I know the historical reason, but I’d like to hear a rationale.
[Michael Sperberg-McQueen] Can you expound on that? Because I’m not there at all. I would have said that there is a line in SQL between the data definition language and the data manipulation language–but maybe that’s because I hang around with a bunch of SQL geeks for whom that distinction is an important one within the SQL language. But I would also have said, to the extent that there are similarities and it feels like one language… maybe it’s the way my head has been bent…
[Ramsay] Well, first of all, the data definition language and the data manipulation language in SQL are literally the same language. But also, as a practical matter, in all these cases you open up a shell–I mean, conceivably–you open up a shell and can work in either; you go back and forth between processing and data definition like this (weaving hands back and forth), and that is not true in XML. (I don’t think.)
[Wendell Piez] Well, it can be true.
[Ramsay] Well, like how? I mean, it could be, but it’s not. I mean, this seems to be the important thing, and this is a—we’re going down to earth here with a very practical issue about how people work, about workflows. Now, our workflow is very different.
[Piez] Well, this was what I was speaking to with my two workflow graphs. Because, as I see it, the ossification, the fossilization of a particular processing model of XML privileges schema design up front to such an extent that the schema becomes the be-all and end-all, and it actually impedes and prohibits much of the activity of data modeling that we claim to be interested in as humanists. And that’s actually not even necessarily a feature of XML, to say nothing of the other possibilities of other technologies. And in fact, it’s possible in XML in a very different way, but part of what Desmond’s getting to is that because of the nature of the architecture of XML, it then forces you, you know, at a later point into a monolithic hierarchy that then prevents the kind of interactive relationship with the text and with interpretations over the text that we all know we want.
[Ramsay] You might be making a stronger statement than me because I’m not sure that the model we have precludes the workflow. You seem to be saying it’s in the nature of XML, and that’s…if that’s true, that’s a thunderous revelation!
[Piez] [Laughter, talking over each other]…When I say that’s the case with XML, what I’m saying about it is that’s the case with XML as an historical artifact that has evolved in a certain particular direction, not that it’s necessarily the case with XML-based technologies, which I know people are using in different ways. And it just happens to be the way it’s usually done, very likely, and for the most part, for very legitimate, proper reasons, because it has to do with scalability of the publishing system—as opposed to the kind of system that Chris is interested in, which is much more dynamic, interactive, malleable, fluid, and you don’t want a publishing system to be like that.
[Ramsay] Well, I’m not sure—I mean, I don’t know, having never had that system, I don’t know! I mean I’m serious …
[Piez] Well, put it this way—I want one, but …I know many people who want to control things a little bit more.
[Ramsay] But I’m always suspicious of that. Of ‘what people want’ …
[Maximilian Schich] I think that we’re at a very important point, and I think we could channel this discussion a little bit further. If you just abstract away from whether we use relational databases, SQL, or XML, there is this one kind of tradition where people create a data model, they have some, you know, a card, where you have a field where you say, ‘spouse, children,’ whatever, stuff like that, and that basically defines the relation between two people, right? And if you go to the other end of the spectrum, you could make it super abstract: on one hand you have the nodes, that’s the people, and on the other hand, the links with the link types. That’s two tables, and you don’t need more. And then, basically, the people who enter the data could actually come up with link types as they go, which is, if you look at historical documents, often very very good, because there are a lot of link types the data modeler beforehand couldn’t even think of, right, because he didn’t know the text.
So, on the other hand, there are certain things you want to have beforehand. You want to have the spouse of every person who applies for a visa, for example, so there are these two uses. It’s useful in both cases. But the important point is that if you do this kind of thing, if you allow for data modeling on the go, you actually get a benefit, because every user, even without thinking, or without being supposed to think about the data model, can expand the system, and then in the end you can measure how relevant it is. There will be a couple of link types that will be consistently very important for the system, and there will be a long tail—at least, for the datasets that I know, there will be a long tail of link types which are noise. Because people entered “Ehegatte” (German for “spouse”) even though it’s an English database. Some errors you can correct, some you can correct automatically, but some you can’t, and maybe the tail is something interesting. It will be one third of your data, and this one third may be interesting to some scholar interested in how the noise is distributed. So, I think we should do both. We should not have the conversation of ‘should we model beforehand or should we model afterwards.’ We should actually do both throughout, and say ok, you know, “spouse” is what I want to know all the time, but you can answer whatever you want, and then you find out that most personal relations are just ‘personal circle.’ Nothing specific, right?
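[A minimal sketch, with invented names and data, of the two-table model Schich describes: one table of nodes (people) and one of typed links. Because link types are free strings, whoever enters the data can coin new types on the go; afterwards, counting each type’s frequency separates the consistently important types from the long tail of rare or noisy ones.]

```python
import sqlite3
from collections import Counter

con = sqlite3.connect(":memory:")
cur = con.cursor()
# The whole model: nodes on one hand, typed links on the other.
cur.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE link (src INTEGER, dst INTEGER, type TEXT)")

cur.executemany("INSERT INTO node VALUES (?, ?)",
                [(1, "Anna"), (2, "Berta"), (3, "Carl"), (4, "Dora")])
# Link types were never fixed in advance; "Ehegatte" (German for
# "spouse") is the kind of stray variant Schich mentions slipping
# into an English-language database.
cur.executemany("INSERT INTO link VALUES (?, ?, ?)",
                [(1, 2, "spouse"), (3, 4, "spouse"),
                 (2, 3, "teacher of"), (1, 4, "Ehegatte")])

# Measure the relevance of each link type after the fact.
counts = Counter(t for (t,) in cur.execute("SELECT type FROM link"))
print(counts.most_common())  # frequent types first, the long tail last
```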
[Piez] Well, I agree with that, and I also think that that’s essentially what Desmond is talking about when he talks about a system that is able to support the activity of interpretation, the activity of expression, and at the same time, not lock the data into that in such a way that the data becomes useless for other purposes, but serve it all out.
[Gregor Middell] I just wanted to add to Steve’s observation about the odd fact that, when we look at XML, we don’t look much at the static aspect of modeling alongside data processing. What I want to add is that in computer science this is actually normal, because you have to take both aspects into account. Any modeling language treats the two views as complementary: you cannot read the static aspect of a system without taking the dynamic aspect into account. …
[Julia Flanders] Is it possible for you to repeat what you said, or stand up and repeat it?
[Maximilian Schich] Should I repeat what he said? It’s that in computer science it’s built into how you do modeling that you have some kind of dynamic process going on, right? And I think it’s very interesting to look at the difference here. There’s a difference between humanities and natural sciences. The physicists have the PACS classification system, and basically, if they submit a paper, they have to classify their own papers. And then, it’s a research result every year to calculate the structure of PACS, the hierarchical structure of PACS. While, in the humanities, in most cases, say, for classical archaeology, the Deutsches Archäologisches Institut will define a tree, and they will give it to the libraries, and they have to classify that certain way, and so, no archaeologist can ever contribute to that. And that went on for 50 years. And then they throw out classifications because they don’t have enough librarians to actually do the classifications, which means archaeologists would sit at home, go through thousands of books in one classification and say “okay, I need that, don’t need that.” If they could classify themselves, it would be a very different story. And so here, the data modeling in physics is way better because it’s along those kinds of lines.
[Julia Flanders] Thank you all extraordinarily much. This has been fascinating. I think we’ve all earned our lunch. [Applause.]
One of the main problems with data modeling in the humanities is that it requires a necessary reduction of analog material, normally experienced as a contextualized phenomenon, to decontextualized digital data. Kari Kraus argues that the humanist urge to fight such reductionism represents one problem with using data modeling in this particular context. Wendell Piez notes that a successful data modeling system requires the creator to be aware of the reductions and concessions necessary to construct the system of analysis. Stephen Ramsay argues that the discourse should focus on the effectiveness of data modeling as a tool rather than on its unique constraints and shortcomings.