Julia Flanders, “Modeling Scholarship”

Theoretical Perspectives II (March 15):



[Julia Flanders] Our discussion has so far inventoried a number of kinds of digital objects, considered as targets of modeling practice, including simple objects, which we model through surrogates, or informational relations that we can model through links and RDF, and also complex objects that are in fact surrogate plus metadata, or surrogate plus commentary, or surrogate plus some other encrustation of stuff.

I would like to focus here on another kind of modeling, which I’m going to call “Representations of Intellectual Systems.” Thinking back to Allen Renear’s talk yesterday, I think what I mean by “intellectual” may be extending what he means by “intentional.” In other words, systems in which the system itself may be an object of scrutiny while in use. Or in which, as Kari Kraus put it, and I’m paraphrasing here, the noise we make while trying to communicate signal is itself of interest to us. The TEI is an interesting example of such a system, and it’s particularly interesting to me because it offers a formal way of modeling types of information that are pervasively present in scholarship, and may be given a great deal of careful attention that results in detailed formal description, but have not typically been formalized as data in scholarly practice. In other words, they’ve been formalized for use by humans but not for computational processing. What I’d like to do in this presentation is take a look at the TEI as an intellectual modeling system and consider how it represents these kinds of complex vectors of information, and how this information might be used more effectively in digital scholarship. So let’s step back and think “What are we modeling when we encode a TEI document?” This is just sort of entering it into evidence, because I think most of us are already familiar with this system.

So first of all, through our use of markup on its own, without regard for a schema (just sticking codes within the text or the data that we’re considering), we might be modeling a document or a data source. For example, selecting pieces of it for representation, describing those pieces using a formal language that is internally consistent and that allows us to associate a semantics with the markers that we use. This is the inner circle here in the slides: the document itself. That’s one thing we might be modeling when we encode a TEI document. And in the case of the TEI, the information we’re modeling here in the XML data may include not only the document or the data source itself but also a more or less self-conscious representation of our transcriptional, editorial, and interpretative activities with respect to that document, undertaken as part of the creation of the digital object. It may also include relational information that connects this document to others or to other data sources. So that’s all still in the associated information.

By using a markup system that has reference to a schema (by using the TEI, by using a specific markup language), we’re also modeling the type or genre of documents. We have to acknowledge that both of those terms deserve some scrutiny on their own, which there isn’t time for me to give here. But in other words, we’re locating this document, this set of information, within a system that identifies some documents as being the kind of documents we are modeling here, and by implication says that there are others that fall outside of the category.

Another way to put this is to say that we’re modeling this document as an instance of a genre, whether or not, in a larger sense, it really is a member of that genre. In other words, we’re sort of assimilating it or appropriating it to that genre and claiming ownership of it in that respect. So broadly speaking, we might term this domain as the document ecology—the set of characteristics that constitute the commonalities among this set of documents or this set of statements.

In this way, we’re also modeling our own intentions with respect to that document, both on its own and as a member of the collection, whether implied or actual. By making a claim that this document belongs in this class of documents, we also say that we will treat it as such. We will process it as such, we will read it as such, we will interpret it as such. These statements also affect the information that will be made visible or accessible to other consumers of the data. The kinds of expressiveness, in other words, that the document can retain within the communicative system that we’re setting up for it. And as we generate successive versions of such models, of such genres or of such appropriations, we’re also making claims with respect to that document projected through time. And at the same time, finally, these kinds of statements, always implicitly and sometimes explicitly, have meaning in relation to other projects’ statements about and treatment of their own documents. The use of the TEI itself expresses an intention that the encoding of the document be intelligible within a larger system defined by the TEI. The specifics of that use (what elements are used and not used, what attributes and values are used, and so forth) also situate the encoding we’re doing within a spectrum of use. A very broad spectrum. Is our encoding, comparatively speaking, a very detailed encoding or a very impoverished encoding? Does it have specific disciplinary affiliations? We might call this sphere of information, this last outer rank, the social ecology of our documents. In the TEI this social ecology is expressed and accomplished through the ODD, the One-Document-Does-it-all system.


The ways in which markup models documents and information about documents are, of course, familiar. For the sake of time, I’m going to set them aside, bracket them off, and I’d like to take a closer look at the schema and the ODD.


First of all, thinking about schemas, as Wendell Piez and Alan Liu have shown in detail, historically schemas as constraint systems have typically arisen in process, as a result of the need to regulate the manufacture of pieces of a work process independently and formally, rather than by seeing if the work process itself results in a functional outcome. For example, instead of waiting to assemble the lawnmower to find out whether it will work, we test each individual piece against a gauge and then we can discover the flaws and inefficiencies in the work process. Schemas help us regulate process. Hence, schemas also model a diachronic information space. Any given schema models a stage in a process, even if that process has only one stage. And the entire process can be modeled as a series of schemas.

It follows from this that the schema considered from this perspective also models the expertise or intention being exercised at that stage in the process. For example, in a workflow for a journal article that begins with a certain light kind of authorial encoding, followed by a layer of editorial encoding, the latter process might involve a different schema with elements for consistent keywording of topics, for example, or process metadata. The schema that regulates the final publication might need to enforce the presence of certain kinds of required publication metadata or renditional information. Having made this suggestion, I think it’s interesting to note that in the TEI, for historical reasons, schemas appear in an odd sort of way to possess a kind of timelessness and agentlessness. I should say “Appear to the uninitiated,” perhaps, or “Appears to a naive look” to possess a kind of timelessness or agentlessness. Because the TEI is framed discursively, it’s interposed into our digital scholarly universe as if it represented a set of convictions about documents, philosophical convictions, rather than as a set of functional or processing requirements.

For many individual scholarly users of the TEI, as distinct from large-scale projects focused on production, schemas are in fact not conceptualized as part of a work system, perhaps because the academic setting of that use tends to obfuscate or set aside that dimension of the work in favor of a more timeless view of the document, in which the model of the document represents above all a set of intellectual or methodological convictions. Time in the TEI is perceived in a way as changes in those convictions: developments in our research trajectory, refining our ideas about documents, embracing more and more of the intellectual territory of text encoding. This happens both at the individual level (I as the encoder am improving my understanding of my texts) and at the level of the TEI itself, which offers a kind of steady narrative of improvement. The idea that we’re refining and developing new features. There’s even a nice technological progressivism there, going from P3 to P4 to P5 to P6, we assume. That’s a quick view of the ecology of the schema, let’s say. Let’s now turn to the ODD system and the ODD customization file.


The ODD system and the ODD customization file are a very distinctive, and I think in some ways entirely unique, way of defining a markup language. Any given ODD customization file taken on its own models a single set of choices about document constraint, aimed at expressing a set of decisions about the modeling of individual documents or sets of documents, and aims at creating convergence in the modeling of a single set of documents. So, a single set of constraints that has a particular modeling aim. Any given customization file also models the delta, the vector of difference, between our local situation and the TEI central. Multiple ODD files taken together model divergences between multiple data sets. As we increase the number of customizations we’re looking at, we’re also getting an increasingly complex view of the kinds of divergences that these customizations represent, and we can also look at the nature of these divergences. These might be data sets from different projects, or they might be multiple stages in the development of a single data set, or stages in a workflow, or stages in the development of a project’s thinking about how to model the data.
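The idea of a customization as a "delta", and of several customizations taken together as a record of divergence, can be made concrete with a small sketch. The following Python is only an illustration under stated assumptions: the two ODD fragments and the project names are invented for the example, and the `delta` helper is hypothetical, not part of any TEI tool. It relies only on real ODD constructs (`schemaSpec`, `moduleRef` with `key` and `except`, `elementSpec` with `ident` and `mode`) and on Python's standard library.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# Two invented customization fragments (assumptions for illustration only;
# they are not taken from any real project).
ODD_A = """
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="projectA" start="TEI">
  <moduleRef key="tei"/>
  <moduleRef key="core" except="said q"/>
  <moduleRef key="textstructure"/>
  <elementSpec ident="name" module="core" mode="change"/>
</schemaSpec>
"""

ODD_B = """
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="projectB" start="TEI">
  <moduleRef key="tei"/>
  <moduleRef key="core" except="q"/>
  <moduleRef key="textstructure"/>
  <moduleRef key="transcr"/>
  <elementSpec ident="name" module="core" mode="delete"/>
</schemaSpec>
"""


def delta(odd_xml):
    """Hypothetical helper: reduce a customization to a set of decisions.

    Each decision is a tuple (kind, name, action), e.g.
    ("element", "q", "delete").
    """
    root = ET.fromstring(odd_xml.strip())
    decisions = set()
    for ref in root.iter(TEI_NS + "moduleRef"):
        decisions.add(("module", ref.get("key"), "include"))
        # Elements listed in @except are left out of the included module.
        for ident in (ref.get("except") or "").split():
            decisions.add(("element", ident, "delete"))
    for spec in root.iter(TEI_NS + "elementSpec"):
        # In ODD, an elementSpec without @mode is treated as an addition.
        decisions.add(("element", spec.get("ident"), spec.get("mode", "add")))
    return decisions


delta_a = delta(ODD_A)
delta_b = delta(ODD_B)

shared = delta_a & delta_b   # decisions both projects made
only_a = delta_a - delta_b   # where project A diverges from B
only_b = delta_b - delta_a   # where project B diverges from A

for label, group in (("shared", shared), ("only A", only_a), ("only B", only_b)):
    print(label, sorted(group))
```

Set operations are used here because they make the "vector of difference" directly visible. A real comparison would also need to look inside changed element specifications and at the accompanying prose, which this sketch ignores.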

Whatever the nature of the sequence of customizations, by comparing them we can also get an understanding of how these customizations differ from one another, as well as how they differ from the TEI in its unmodified state. In a way, the ODD also reinvigorates our ability to understand the schema not as a set of timeless convictions that express my true beliefs about the text, but rather as a set of functional or contingent constraints that operate within a specific ecology, within a workflow, within a developmental narrative or within a disciplinary community in which debate is taking place. In the modeling of divergences here, we can also see another important but underexplored relationship being modeled, namely that of debate and dissent. So for example, debate within the TEI community about what features are fundamental to our understanding of text. For example, the diversity of aims and methods that produces the complexity of TEI as a whole in the first place. Also, we can model changes in the terms of that debate over time, which register as changes to the TEI schema overall. We can also model dissent on the part of any specific individual or project from any specific modeling decision the TEI has agreed upon. If there’s some particular area of the TEI schema as a whole which is particularly controversial, we can identify points of dissent from that particular component. For example, on whether a specific element should have a particular content model, or should be permitted to go in a specific place.

This debate and dissent is modeled with considerable explicitness in the ODD customization, since the customization file records what elements and attributes are included and excluded, what changes to classes have been made, what kinds of controlled vocabularies are there, what new and renamed elements have been created, and so forth. So we have here a scene of considerable texture and complexity, considered either as a cross-sectional snapshot or as a temporal sequence. As a cross-section, what we’re being given here to work with, what we can imagine through this modeling exercise is, for example, debate. At any particular moment in a history, we can model uncertainty, we can model places where we are hypothesizing or speculating in a contingent way. We can model alternatives and choices. We can model intentions. If we consider it as a sequence, in other words if we introduce time, we can model work process, we can model developmental processes, we can model the history of debates and the history of actions. In a sense, I think what we’re seeing here is the emergence of a set of tools for modeling something like historiography, or whatever the equivalent might be in literary studies or any other discipline. In other words, a self-conscious representation of how theory and practice change over time. The uncharitable way of characterizing this would be as “The extreme refinement of navel-gazing.” Looking back over the conversations we’ve been having today, I think one of the questions I’m raising has got to be the value of this kind of historiography. Crudely put: Is there a market for it? Is there an intellectual market for it? Or is it really a matter for us of just getting back to the data? I’d be interested to hear what Wendell [Piez] thinks about that. But I think that the emergence of the ability to do historiography through models is at the very least of historical interest in the development of the field.

So what can we learn from the TEI’s example here? The TEI’s potential to serve as such a tool set, a set for, among other things, modeling the history of our own ideas, has arisen first of all by virtue of its situation at the center of a very complex modeling problem, namely, humanities textual data. Secondly, it has arisen because, by its nature, the TEI is designed to handle not only the modeling of that data but also the markers of transcriptional and editorial and interpretative self-awareness. The TEI is designed to handle the non-transparency of the modeling process itself. That’s part of what’s being modeled. Again, I would try to align that with Allen [Renear]’s sense of intention as being a distinctive part of humanities data modeling. I think that this kind of self-awareness is cognate with that intentionality in the sense that you meant that word. Thirdly, the TEI’s potential has arisen through its responding to scholarly pressure, pressure from scholarly users, to provide ever more nuanced ways to capture these contours of scholarly responsibility. In a way I think this is reflecting Paul [Caton]’s sense that you can never do only just a little data modeling and Allen [Renear]’s description of the “push to take other people’s models seriously.” I think that, in the same sense, a kind of intellectual curiosity has driven the TEI community to crave more and more nuanced ways of accounting for the non-transparency of our modeling methods, or of performing those modeling methods in a more and more responsible or transparent way. And to think about how the digital medium itself can serve as a vector for scholarly ideas and scholarly work. I think the proposed new genetic module for manuscripts in the TEI is a good example of this. I’d like to just close with two questions that I think I still need to have answered, that are things we could maybe discuss as a group.


First of all, is there an advantage to modeling such an intricately connected field of information within a single representational system, or are there parts of this information that would be better factored out and handled separately? Understanding that already there’s been some of that factoring just in the system of the ODD and the schema, the way that ecology works in the TEI. Is the TEI basically like a gigantic lintball here, or is it a naturally coherent system of information? Then ultimately, I’d also like to consider whether this kind of complex, layered intellectual modeling might also suggest methods that we can use in other digital humanities contexts, or whether it really has as its domain the special problems of text markup as we now do it. Thank you very much.


[Michael Sperberg-McQueen] Just a historical side note, in some ways the ODD system is, as far as I know, unique or unusual but it’s not unprecedented. It does have historical precedent in systems maintenance practice. A great deal of it is based silently on lessons I learned from the systems administrator of the first place I worked. Large parts of it are modeled very explicitly, as they had in mind, on the methods used by Don Knuth to separate global changes to TeX and its related systems from changes he made. So the idea of “defining a complex system by defining a set of changes on a base system which you do not touch,” as beautiful as it is, isn’t one for which the TEI dare take credit. But I guess we get credit for having the good taste to follow Knuth’s example.

[Flanders] I certainly didn’t want to give the TEI credit for that idea, but instead wanted to treat it as a kind of interesting case study, maybe.

[Syd Bauman] But hammering at that case study a little bit, you’re expressing the idea that we, Paul [Caton] and I (for example) can talk about our different modeling of similar or the same texts by talking about our ODD files? Am I channeling that correctly?

[Flanders] Well, “talking about” suggests a purely human and probably inefficient process, sort of like talking about your subst elements instead of just saying something like “Here’s where Shelley deleted something.” But I would say processing our ODD files jointly, in other words, having some interesting computational comparison.

[Syd Bauman] I was wondering if there’s any advantage to thinking about that computation on the ODD file as opposed to thinking about the comparison on that schema that one generates from the ODD file. Thinking at first blush that all that stuff in the big white circle, that’s the TEI, that neither one of us has changed, will fall out because it’s the same. What is it the TEI has gained us here by creating the unidirectional lot?


[Flanders] I’m imagining that there’s a greater level of access to the human intellect-side of things. I’m going to try to come up with a good example here, but imagine that from version to version the TEI changes the behavior of a certain element so that I am no longer served by it. And my new schema, and my new ODD customization, will take that change into account by reversing it so that my new schema is identical to my old schema. And yet the ODD is very different because the ODD now expresses disagreement with the decisions TEI made between those two versions. That may be an impoverished example, but it gives a sense of what the ODD is giving us access to that the schema doesn’t necessarily give us access to.
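The scenario in this answer (the generated schema stays the same across TEI versions while the customization changes) can be sketched in a few lines of Python. Everything here is a deliberately simplified assumption: each "TEI release" is a plain dictionary mapping an element name to a one-line description of its behavior, a customization is a dictionary of overrides, and `compile_schema` is a hypothetical stand-in for generating a schema from an ODD. None of the names or values describe real TEI releases.

```python
# Hypothetical stand-ins for two TEI releases: element name -> simplified
# description of its content model. The values are invented for illustration.
TEI_V1 = {
    "note": "text and phrase-level elements",
    "p": "text and phrase-level elements",
}
TEI_V2 = {
    "note": "text only",  # the (invented) change that no longer serves us
    "p": "text and phrase-level elements",
}

# Customizations expressed as overrides on the base release.
ODD_FOR_V1 = {}                                          # no disagreement
ODD_FOR_V2 = {"note": "text and phrase-level elements"}  # reverses the change


def compile_schema(base, odd):
    """Hypothetical helper: apply a customization's overrides to a base."""
    schema = dict(base)
    schema.update(odd)
    return schema


old_schema = compile_schema(TEI_V1, ODD_FOR_V1)
new_schema = compile_schema(TEI_V2, ODD_FOR_V2)

print(old_schema == new_schema)  # True: the schemas are identical
print(ODD_FOR_V1 == ODD_FOR_V2)  # False: the customizations are not
```

Comparing the two compiled schemas shows no change at all, while comparing the two customizations surfaces the one point of disagreement with the newer release, which is the kind of information the answer suggests only the ODD gives access to.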

[Wendell Piez] To that specific question, I think the answer to Syd [Bauman] is pretty much along the lines that ODD was designed with the intention, with the premise, that it would be more directly useful to users than the schema that could be generated out of it. In other words, that layering (and you know this very well, right?) offers facility to the system as a whole, for documentation or maintenance. That’s just the design premise of the ODD. We can question whether it succeeds in that, and have a reasonable discussion about whether ODD actually addresses these requirements, but that’s a separate conversation.

[Syd Bauman] I wasn’t doubting the utility of ODD in itself for assisting maintenance or for differing from TEI. But I’m wondering, when we want to compare the differences in our systems, is it better to compare our ODD files or better just to compare our schemas?

[Wendell Piez] Is that a practical question about what sort of information we would get from doing one versus the other? Are those exercises not completely isomorphic? And to the extent that they’re not… [responds “no”]… I know they’re not, and I would suggest that in fact to the extent that the ODD documents as well as parameterizes choices, the ODD is going to be the place to go, isn’t it?

[Syd Bauman] I’m not sure. If I’m comparing them, if I’m comparing the ODD, I also have to compare the prose, right? And that’s…

[Wendell Piez] You think that’s bad?

[Syd Bauman] Trying to do that by a machine might not be such a clever thing to do.

[Wendell Piez] Oh, did we assume the machine was going to be unaided?

[Syd Bauman] Julia [Flanders] was saying that a machine helped her, yeah.

[Flanders] I was imagining that for the kind of research that people would do on people’s customizations, certainly I’m imagining beautiful visualizations and things like that — to show sectors of the TEI communicating.

[Wendell Piez] But at the same time, you’re going to be able to click and there will be prose.

[Flanders] Sure, and in fact, one could imagine going one better and providing controlled vocabularies for documenting your decisions. So, did you open this cell because you hate it? Because you think it’s stupid?

[Wendell Piez] Exactly, rationale.

[Flanders] Right, exactly. The prose accompanies the rationale and it would be nice if the rationale…

[Wendell Piez] Yeah, and whether, while designing and building that system. . . how you would operationalize that as a sort of phrase that would be ODDs or would it be a phrase for both of them together? This is a development question, right?

[Syd Bauman] I think that’s true. I think it’s pragmatic almost as much as it is intellectual.

[Wendell Piez] Moving onward to a more general question, Julia [Flanders] did ask us to also think about this not just in the TEI context, and I think that’s really important to do. This is not just about the TEI programs. In many ways, due to the complexity of the TEI and the nature of the goals the TEI seeks to address, it’s led over time to the formalization of these processes, and that doesn’t mean that these processes are special or unique to the TEI at all. With respect to that, I would go back to the two workflows that I shared yesterday, which are directly an outgrowth of that same understanding of the world of schema within historical learning-based systems versus its applications in the humanities because, of course, when I point to a more complex workflow and I say “This is what we want to be able to do,” I don’t really mean to be saying something like “This isn’t what we’re already doing.” Because I think that it’s actually really important to understand that all along we have not simply looked at the schema as being some sort of receivable that we simply plug into our schema, but rather the schema itself is a site of development and scrutiny.

In that context, the most important thing to keep in mind about the difference between these two workflows is that the second, more complex workflow has internal loops in it, which, going along with this idea of the process as being the product, means you can no longer distinguish so much between the product and process being the single moment in the system. There’s actually something else more complicated going on. In particular, in terms of the way in which systems’ management and maintenance works, that internal looping is what allows the more complex system to be responsive to pressures from outside the environment, right? So that when we discover we want to do something new, the system is already built to support that destabilization. Rather than having to go in and reengineer everything, the system is already built for support. It’s like we’re maintaining the airplane on the runway. We’re not having to go back to the design shop and start the blueprints from the start. I think that that kind of schematic view of it can be generalized, because with all humanities-oriented projects that I’ve seen, you necessarily have to take that sort of self-conscious view because we’re discovering what we’re doing while doing it. And also because we consider that the goal of our process is in the discovery to process, which is what distinguishes us from running a publishing system that publishes “x” number of issues of the journal every year and puts out 5000 copies of it. It expects to maintain itself largely without change over time. I mean, obviously it’s not a complete dichotomy but there’s a big difference in goals there. Of course, all humanities projects have to have that kind of introspective self-critical quality: we document what we do, and we publish about it, and we go to conferences and give papers, and ask people to criticize our methods and give us ideas, and that’s just part of what we do. And I don’t really see any fundamental difference between the way in which TEI projects are doing that. It’s just that I think the TEI projects have much more infrastructure and support.

[Flanders] Right, and I think that was what struck me about it: why in the TEI do we have this incredibly highly formalized system for documenting something which, in the scholarly world generally, and in digital humanities projects generally, is of great interest and is documented in all sorts of ways, but just not formalized? In other words, has the TEI evolved this whole system purely for practical purposes and I’m just romanticizing it as a system that does these lovely things, or is this something that’s lacking in other places that we ought to be formalizing?

[Wendell Piez] I think that at least part of the answer to that is that the TEI people want to formalize. They like formalizing.


Maybe that’s the answer. They just like formalizing.
