Theoretical Perspectives I (March 14):
Allen Renear, “Taking Modeling Seriously” (video)
[Allen Renear] So I just noticed I have some leftover animations in the slide set. I’ve never actually put it in slide mode before. So you may see me moving through them very fast. Something quick so I don’t forget: in fact, I’m tempted to just respond to the questions and continue to interrogate Paul [Caton] but I don’t think I’ll remember to say this. The issue of blank nodes in RDF is sort of interesting. They aren’t necessarily as unconnected with plausible ontological entities as one might think. That if you think of them as concrete events, then they correspond to an event that has been used to accommodate adverbial modification and predication: Davidsonian events. So this does not necessarily apply to all blank nodes. In particular I think some EDM blank nodes don’t have natural ontological entities. They suggest entia suspecta actually. So anyway I think there are two kinds of blank nodes.
Okay, let’s see: in Paul’s [Caton] vein of candid honesty and openness about the setting for his talk, I think most of us often when we look forward to a talk in six or eight months we imagine we might have that manifesto prepared. We might actually be able to lay out the entire view. At least I think that sometimes so I come up and I submit a title and an abstract for that manifesto. [The] talk rolls around and I don’t have that manifesto. I have a grand title and a very promising abstract but modest results. Fortunately this time I was so late that I smartly chose a title that would do for something less grand. Even though “taking modelling seriously” could look like it’s going to be a serious presentation of an approach to modelling, it’s really as much about how we look at anybody’s conceptual modelling as it is how we do modelling. And I’ll admit that I am a fairly, if I say this sort of ironically or self-deprecatingly, maybe I’ll say that I am a serious modeller. I believe that I am interested in modelling that attempts to provide an account, a theory, not just a guide to developing a system. When I get a chance to go about that I look kind of literal-minded and naïve, I suspect. I feel like I’ve been talked about in the last few minutes of the discussion here. But I’m also just trying to have fun, to tell you the truth, and this is how I have fun. Like Paul [Caton], you’re not going to change me. I can’t be deprogrammed. So keep your, to yourself. Thomas Szasz has some interesting things to say about people who don’t like the fact that other people aren’t like them and they want to change them to be like them. As I said in my abstract this is not everyone’s taste. Fortunately what’s not everyone’s taste I’m not going to be doing so much of just now if I can help it. So serious modelling versus taking modelling seriously, I think that was what I was just talking about.
I’m going to take up four, I think I’ll get through four, specific little discussions: one on inheritance in FRBR; one on identity and change in the PLANETS data model; one on oratio obliqua in the Europeana Data Model; and one on types and roles in FRBR. Then I’ll try to say something about what is distinctive in humanities data modelling.
And I know some of you have seen these before but I’ll try and make them hang together in a more interesting way. Most of you are probably familiar with FRBR and I’ll assume you all are but I’m not going to assume you all are. I’m going to quickly rehearse the Functional Requirements for Bibliographic Records. It’s a conceptual model. They use that phrase and that always perks my ears up. Specifically it’s an entity-relationship model of works, texts, editions, authors, subjects, etc. And it is created primarily to help design systems for managing bibliographic records. That is, the developers of FRBR did not see themselves as performing first-order science and delivering a theory of bibliographic things. That doesn’t stop me from looking at it that way and that’s what I mean by taking modelling seriously: taking other people’s modelling seriously because it’s fun. It’s very influential as I’m sure some of you know; bibliographic databases are being FRBRized, software systems are also being FRBRized, the new edition of the Anglo-American Cataloguing Rules, RDA, describes FRBR as its foundation and it’s increasingly being applied in the sciences now as an approach to understanding and managing scientific data. Most of you are familiar with the group-one entity types: work, expression, manifestation and item. Work is a distinct intellectual or artistic creation. Expression, the intellectual or artistic realization of a work in the form of alphanumeric, musical or choreographic notation, etc. Manifestation, the physical embodiment of an expression of a work. And item, a single exemplar of a manifestation. So the nice colloquial terms, if you take a book-like view of these things, are work, text, edition and copy. Notice that nice interlocking cascade of definitions: an expression realizes a work; a manifestation embodies an expression; an item exemplifies a manifestation. There’s the picture. That’s the ER diagram.
One to many relationship between work and expression. Manifestation, Item. Many to many between expression and manifestation. Do you know why? Anthologies.
So in FRBR each entity type has a distinctive set of attributes. Works for instance have attributes like subject or genre. Expressions have attributes like language. Manifestations have attributes like typeface. Items have attributes like condition or location. These are disjoint sets of attributes. Disjoint sets of attributes. So a work may have a subject but it does not have a language, typeface, or condition. That probably seems intuitive. An expression may have a language – I think it will have a language – but it does not have a subject. The work it realizes has a subject; the expression doesn’t have a subject. And of course expressions don’t have typeface or condition. A manifestation, an edition, might have a typeface but it does not have a subject or a language. The expression it embodies has a language and the work which is realized by the expression it embodies has a subject. And similarly for item: items can have conditions and locations but they don’t have subject, language, or typeface attributes. Work, expression, manifestation: these seem to be abstract things. We don’t actually find them located in space and time. They don’t seem to enter into causal relationships. Items however do seem to exist in space and time as material things. So ontologically you might say that FRBR is an eliminativist approach to the book. Our ordinary book might plausibly be said to be about painting in Florence and Siena, in French, typeset in Neo-Bauhaus, and mustard-stained. A book might have those characteristics but in fact there is no entity in FRBR that has those characteristics.
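The disjoint-attribute design described above can be sketched in code. This is only an illustrative sketch, not part of the talk and not an official FRBR schema: the Python class layout, the `describe_book` helper, and the choice of one attribute per entity type are assumptions made for the example, using the subject, language, typeface and condition mentioned in the talk.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Work:
    subject: str  # attribute of works only


@dataclass
class Expression:
    language: str  # attribute of expressions only
    realizes: Work  # a relationship, not an inherited attribute


@dataclass
class Manifestation:
    typeface: str  # attribute of manifestations only
    embodies: List[Expression]  # many-to-many in FRBR (think of anthologies)


@dataclass
class Item:
    condition: str  # attribute of items only
    exemplifies: Manifestation


def describe_book(copy: Item) -> dict:
    """Assemble the everyday 'book' description by following relationships."""
    edition = copy.exemplifies
    text = edition.embodies[0]  # simplification: take the first embodied expression
    return {
        "subject": text.realizes.subject,
        "language": text.language,
        "typeface": edition.typeface,
        "condition": copy.condition,
    }


work = Work(subject="painting in Florence and Siena")
expression = Expression(language="French", realizes=work)
manifestation = Manifestation(typeface="Neo-Bauhaus", embodies=[expression])
item = Item(condition="mustard-stained", exemplifies=manifestation)

book = describe_book(item)
print(book)
# No single entity carries all four characteristics.
print(hasattr(item, "subject"), hasattr(work, "condition"))
```

The everyday description of the book only appears when the relationships are traversed; none of the four entities has all of the characteristics itself.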
So I’m fond of FRBR, I’ve talked about it on a number of occasions before. I’m fond of it because it’s an opportunity for me to think in what feels to me interesting ways about interesting things. And I find that I’m less and less inclined to defend any particular point of view and I really am more and more inclined to keep myself intellectually engaged. And regardless of exactly what ontology one might want to accept, or if there’s an alternative to accepting ontologies, I think it’s interesting to take a close look at somebody’s effort to identify the entities and relationships that are important to them as practitioners in a particular domain because that’s what we’re looking at here. The FRBR committee was a committee of primarily cataloguers. So an interesting thing about FRBR is it’s often said to exhibit inheritance. So for instance (I’m not going to read all of these, just a couple): the FRBR model implies that information at the highest level is inherited by bibliographic entities at lower levels of the hierarchy. Characteristics of the work belong to all expressions, all manifestations and all items. Attributes are inherited by all lower levels of the hierarchy and so on. These are all well known, really smart people in the bibliographic community. The last one, Barbara Tillett, is actually the lead in the development of FRBR. There’s something weird about this, though. Where’s the inheritance? The attributes in the documentation are explicitly disjoint sets. In fact the whole idea behind this kind of strategy is to segregate attributes according to entity and not apply them democratically across all the entities even though that’s how we often talk. There’s certainly nothing about this diagram that suggests inheritance, is there?
There are rectangles, it’s true, and there are arrows, and it seems uncharitable to suggest that these really smart people looked at this diagram and couldn’t recognize it as an ER diagram but instead saw it as some sort of taxonomic class diagram, which it isn’t. It is uncharitable, and I wouldn’t put it that way, but it’s still kind of curious. There’s nothing in the FRBR documentation that suggests inheritance, and there’s nothing in this diagram that suggests inheritance. General inheritance of attributes is inconsistent with the entire strategy that’s being deployed here. There’s no inheritance in FRBR despite what the authors of FRBR say. They actually don’t really, I can’t resist putting it this way, they don’t understand the model that they created. I think that’s actually a counterexample to other things I believe, that this is even possible. But it’s as if they don’t understand the model they created. So when I say inheritance I mean what might be called classical subsumptive inheritance based upon the set-subset relationship. There are other kinds of inheritance that might be plausibly applied here: divisible inheritance, propagation, but there’s no classical inheritance. And it’s easy to see why: FRBR itself describes works as abstract and items as concrete. If there is general subsumptive inheritance down the so-called FRBR hierarchy then FRBR items are both abstract and concrete. But that’s ridiculous, right?
So here’s an alternative way of looking at bibliographic entities. This is another conceptual model. This is from the third chapter, a chapter called “Bibliographic Entities”, of a great book by Elaine Svenonius called The Intellectual Foundations of Information Organization. So we start with individual, concrete documents: concrete material things. Not abstract things. Some of those documents we might say are narratively similar. Let’s call W1 a set, let W1 stand for a set of narratively similar documents. Some of those narratively similar documents in W1 might be textually similar, so we can partition W1 into sets of textually similar documents. W1-E1 is one set of textually-similar documents, W1-E2 is another set of textually-similar documents, W1-E3 another set. I just prefixed the names of these textually similar sets with W1 to show that they’re subsets of W1. So the idea here is that narratively-similar documents are physical documents that carry the same story, have the same story, have the same work. But we want to avoid having to give an account of works; we want to take a more material, nominalist approach to our world. So instead we stick with individual concrete physical documents and notice that some of them are similar in a particular way and that lets us form sets of similar documents. So textually similar documents can be similarly grouped into groups according to whether or not they are in some sense, you have to use a little bit of charity here to make this work, materially similar. So in this way you can kind of accomplish the same sort of modelling objectives, at least it looks like you may be able to accomplish the same sort of modelling objectives, that you could accomplish with FRBR. We have the notion of work, only it’s not a work; rather it’s a set of physical documents that tell the same story. Within a set of similar documents that tell the same story we have various translations or textual variants and those are grouped into sets.
Within one of those sets we have another partition according to material similarity. Here you have inheritance because this is a class hierarchy and any property that applies to everything in a work set will apply to all of its subsets and their subsets and their subsets. So something like this is going on.
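The set-based alternative just described can be sketched as successive partitions of concrete documents. This is a hedged illustration rather than Svenonius’s own formalization: the sample documents, the `story`, `text` and `material` keys, and the helper names are all hypothetical stand-ins for narrative, textual and material similarity.

```python
from collections import defaultdict

# Hypothetical concrete documents; there are no abstract entities here.
documents = [
    {"id": "d1", "story": "Hamlet", "text": "English", "material": "paperback A"},
    {"id": "d2", "story": "Hamlet", "text": "English", "material": "paperback A"},
    {"id": "d3", "story": "Hamlet", "text": "English", "material": "hardcover B"},
    {"id": "d4", "story": "Hamlet", "text": "German", "material": "paperback C"},
    {"id": "d5", "story": "Emma", "text": "English", "material": "paperback D"},
]


def partition(docs, key):
    """Group documents that are similar with respect to `key`."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[key]].append(doc)
    return dict(groups)


def ids(docs):
    return frozenset(doc["id"] for doc in docs)


def holds_for_all(docs, predicate):
    return all(predicate(doc) for doc in docs)


w1 = partition(documents, "story")["Hamlet"]        # narratively similar
w1_e1 = partition(w1, "text")["English"]            # textually similar subset
w1_e1_m1 = partition(w1_e1, "material")["paperback A"]  # materially similar subset


def tells_hamlet(doc):
    return doc["story"] == "Hamlet"


# Classical subsumptive inheritance: what holds of every member of W1
# holds of every member of each of its subsets.
print(ids(w1_e1_m1) <= ids(w1_e1) <= ids(w1))
print(holds_for_all(w1, tells_hamlet), holds_for_all(w1_e1_m1, tells_hamlet))
```

Because each level is a genuine subset of the level above it, inheritance here comes for free from set membership, which is exactly what the ER-style FRBR model does not provide.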
So this is an example, I’m tempted to say of me having fun. I’m taking a deflationary approach to this to try to avoid Stephen [Ramsay]’s arrows. This is me having fun by looking seriously at other people’s modelling. And seriously and appreciatively, not just because I wouldn’t have anything to do if somebody didn’t do some models and make little infelicitous errors, but because in fact I find these models stimulating and illuminating. I find that they improve my understanding although often in ways I can’t quite articulate. So there’s one example of taking modelling, ordinary conceptual modelling, seriously, at least having fun or at least me having fun, that’s a personal thing, I suppose. Everybody has their own taste in these things. And perhaps also getting some better understanding of what our options are for looking at the world.
Here’s another example. Whenever I see a box and an arrow I sit right up and look closely. This is from a very widely known, influential conceptual model for preservation, PLANETS. And here’s a piece of the conceptual model; this is a UML diagram where the class of digital files is represented with a UML arrow indicating that it is a subclass of byte streams. So in this account a digital file is a byte stream. In the list of attributes for digital files, we see last modification date. And I know this is sometimes a little irritating meditation of mine but I look at this and say, “Really?” A file is a byte stream and it has a modification date. So a particular file could have been modified – we all talk that way and maybe that needs to be represented – but if a file is a byte stream and a byte stream is just a particular sequence of bytes, how can it be modified? What’s the “it” that gets modified? I can understand turning my attention from one sequence of bytes to another but what is it that once was one sequence of bytes and now is another sequence of bytes? If we represent that in knowledge representation languages like RDF and OWL in order to bring this into the world of linked data and the semantic web, we will have paradoxes. They are inevitable. So if there’s a possible, practical reason for facing some of these amusing little puzzles and paradoxes, I’d say that’s it. As time goes by we try more and more to be as formal as we possibly can and do as much lights-out inferencing and processing as possible. We do that through logic-based languages like RDF. We’re going to have puzzles and they’re going to be traditional puzzles, familiar puzzles. And they’re ones that we can avoid with human intervention but they’re harder to avoid if no humans are around to help. Europeana Data Model, we’re going to hear more about that, right? I love it. I love all of these models.
I don’t, I know every time I talk about problems with models people go, “You don’t like such and such.” I love them all. And I didn’t mean that like it sounds. I’m looking forward to working with Europeana Data Model people on a number of projects.
But there are a number of interesting – to me – little puzzles in the EDM. I’ll mention just one. So we often make assertions about things in the world. We say that a particular thing has a particular value for a particular attribute. Sometimes we want to say that somebody else made an assertion about a thing. We want to say that Syd [Bauman], for instance, claimed such and such, something that Syd [Bauman] is fond of claiming. But we want to hold it at arm’s length. We don’t want to assert it ourselves. We just report that that’s what Syd [Bauman] said. Obviously in humanities data modelling things like this are very important because we are looking at historical data. For instance, what did people say, what did they believe, what did they doubt? These are typically grammatically complex sentences with a main verb and a subordinate clause. They don’t break up into compounds of simple truth-functional connectives. So how do we do that? And one problem with RDF is that if we simply add somebody’s assertion as triples to our collection, we end up asserting it. I don’t want to, I’m not backing Syd up on his crazy ideas. I just want to report that he has them. I want to report what Syd claims about the Mona Lisa. I want to add that assertion, the assertion that Syd claims that the Mona Lisa is the best painting ever, whatever. I want to add that secondary assertion to my collection of triples. Because I do believe that; I have discovered something about what Syd believes but I haven’t discovered much about the Mona Lisa. So this is a very common need in humanities data management: to represent that a particular assertion was made without accidentally committing yourself to the assertion itself. It’s not easy to do in RDF. It’s not easy to do, period. We’ve been puzzling over this for a hundred years at least.
But one approach to the problem which certainly in a sense works is to define a proxy for the thing you’re talking about and make the assertion vis-à-vis that proxy, that thing. And so now I can collect all the things that Syd has said about the Mona Lisa and I can have Syd saying it about a proxy for the Mona Lisa and I represent that proxy as a proxy for the Mona Lisa. But I’m not backing Syd up on his claims about the Mona Lisa. And the proxy node that you can see in the middle [of the slide] here is an example of this. You notice that it’s connected to the Mona Lisa by the ORE proxyFor relationship. And the claim that a particular person, namely Leonardo [da Vinci], created the Mona Lisa, as made, in my story, by Syd, is indirectly represented here by drawing the DC terms creator arrow from the proxy to Leonardo. This works in a sense just fine. Any Perl programmer can manage data that’s organized this way. It’s very simple. It’s very, as far as I know, it’s computationally manageable as well. But it’s weird because it makes it look like Leonardo created a proxy. And DC terms creator is used in a straightforward way elsewhere in the Linked Open Data world and elsewhere in the bibliographic world in general. It’s probably used in a straightforward way in EDM data. So it’s not like there’s a special sense of DC terms creator. It’s rather that there’s a particular usage of it, which is somewhat of an idiom. I think that’s pretty interesting. I’ll also say, but I won’t talk about, the fact that the medium, see the medium arrow there (gestures at slide), for the Mona Lisa is SKOS concept wood. I’m not sure that that works either. My understanding of SKOS would have the Mona Lisa, if this is true, made out of the concept of wood and not wood.
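The contrast between merging a reported claim directly and routing it through a proxy can be sketched with triples held as plain Python tuples. This is an assumption-laden illustration, not EDM itself: `ore:proxyFor` and `dcterms:creator` are the properties named in the talk, while the `ex:` identifiers and the `ex:reportedBy` property are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# What we ourselves are prepared to assert (hypothetical example data).
our_graph: Set[Triple] = {("ex:MonaLisa", "rdf:type", "ex:Painting")}

# The claim we only want to report, not endorse.
syd_claim: Triple = ("ex:MonaLisa", "dcterms:creator", "ex:Leonardo")

# Naive merge: the reported claim becomes one of our own assertions.
naive_graph = our_graph | {syd_claim}

# Proxy pattern: the claim is made about a stand-in for the painting.
proxy_graph = our_graph | {
    ("ex:SydProxy", "ore:proxyFor", "ex:MonaLisa"),
    ("ex:SydProxy", "ex:reportedBy", "ex:Syd"),  # hypothetical property
    ("ex:SydProxy", "dcterms:creator", "ex:Leonardo"),
}


def reported_claims(graph: Set[Triple], thing: str) -> Set[Triple]:
    """Rewrite claims made about proxies of `thing` as claims about `thing`."""
    proxies = {s for (s, p, o) in graph if p == "ore:proxyFor" and o == thing}
    bookkeeping = {"ore:proxyFor", "ex:reportedBy"}
    return {
        (thing, p, o)
        for (s, p, o) in graph
        if s in proxies and p not in bookkeeping
    }


print(syd_claim in naive_graph)   # the naive merge asserts the claim
print(syd_claim in proxy_graph)   # the proxy graph does not
print(reported_claims(proxy_graph, "ex:MonaLisa"))
```

The data is easy to manage, as the talk says, but read literally the proxy graph contains the triple `("ex:SydProxy", "dcterms:creator", "ex:Leonardo")`, which is the idiom that makes it look as if Leonardo created a proxy.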
So my last example, I think. And I will say, so Paul [Caton], I don’t know if anyone has ever said that there’s no text in FRBRoo but I really struggled with this and I told Martin (?) and others that I thought, “We have a concept of text. You don’t.” There’s something wrong with this. And I found it very hard personally; my biggest problem with FRBRoo is that text is really gone from it. You just adapt to a different picture of the world. And it’s a picture where apparently we can say all the things we want to say. I’m not sure, but that’s the idea, but text isn’t there. So refactoring FRBR: Paul [Caton] mentioned the difference between types and roles as outlined by [Nicola] Guarino and [Chris] Welty in their famous 2001 paper [“Support for Ontological Analysis of Taxonomic Relationships”]. But even before Guarino and Welty were cleansing ontologies with that distinction, relational database theorists and conceptual modellers were puzzling over when you should use a relationship arrow or triangle and when you should use an entity type rectangle. If we look at FRBR right here – and I don’t actually have a clock, so if I run late you need to tell me – we might wonder whether or not some of these rectangles are really nominalised roles that would be better represented by relationships rather than kinds of entities. So for instance I’m going to gloss the notion of type and role since Paul [Caton] didn’t. He assumed we all knew. So the distinction, and it is one that grows, that’s connected to the traditional data modelling distinction of entity type and relationship, but it’s much more ontology oriented.
So from Guarino and Welty’s perspective there are things in the world like persons and those persons in particular circumstances enter into roles, particular social circumstances, contingent social circumstances, enter into roles, and the types of things that there are can be identified, according to Guarino and Welty, by noticing that, to use the example of a person, a person cannot cease to be a person without ceasing to be, nor could a person have been anything other than a person. So a person is not something else in some other possible world. Students on the other hand can cease to be students very easily and they frequently do cease to be students, and persons who are students in this possible world might in other possible worlds have never been students. So using that test and some other similar ones we can distinguish kinds of things from the roles that the kinds of things enter into. And if we restrict our rectangles to things that are kinds of things then expression in particular, we’ll start there and end there probably, looks like it might be more of a role than a kind of thing. That is, meaningful text is meaningful text in virtue of the contingent social context in which those symbols exist. So we might say that an expression is a symbol structure in a particular social context in virtue of which it has the meaning that it has. So if we take that approach we can replace not only expression, it’s a longer argument, but manifestation as well with the notion of symbol structure as a kind of thing and argue that symbol structures do what they do, in the case of expressions realize works, only in virtue of social context, convention, intentionality, human epistemic agency. It’s not in nature, it’s in us and it takes a village. If so, we now have symbol structures here and the roles remain the same. A symbol structure is a text if it realizes a story.
A symbol structure is an edition if it embodies a symbol structure that realizes a story.
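The two definitions above, text and edition as roles of symbol structures, can be sketched by keeping only the kinds of things as classes and deriving the roles from relationships. This is an illustrative sketch, not the Illinois refactoring itself: the class names, the sample instances, and the set-of-pairs encoding of the `realizes` and `embodies` relationships are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SymbolStructure:
    """A kind of thing: it stays a symbol structure whatever roles it plays."""
    name: str


@dataclass(frozen=True)
class Story:
    title: str


# Relationships are contingent facts, held outside the entities themselves.
realizes = set()  # pairs (SymbolStructure, Story)
embodies = set()  # pairs (SymbolStructure, SymbolStructure)


def is_text(s: SymbolStructure) -> bool:
    """A symbol structure is a text if it realizes a story."""
    return any(a == s for (a, _story) in realizes)


def is_edition(s: SymbolStructure) -> bool:
    """A symbol structure is an edition if it embodies one that realizes a story."""
    return any(a == s and is_text(b) for (a, b) in embodies)


words = SymbolStructure("a sequence of words")
pages = SymbolStructure("a typeset arrangement of those words")
tale = Story("a hypothetical story")

before = (is_text(words), is_edition(pages))

realizes.add((words, tale))
embodies.add((pages, words))
during = (is_text(words), is_edition(pages))

# Remove the social fact: the roles lapse, the symbol structures remain.
realizes.discard((words, tale))
after = (is_text(words), is_edition(pages))

print(before, during, after)
```

The roles come and go with the relationships, while the `SymbolStructure` instances are the same objects throughout, which is the rigidity test applied to this case.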
Well, generally in ER diagrams we don’t use two rectangles for the same set, so we collapse these rectangles into one symbol structure. This is the refactored FRBR that we’ve come up with at University of Illinois at GSLIS that we’re using in an NSF project to characterize how scientific datasets do what they do because it turns out that data in the scientific sense is just as elusive a thing as it is for us and in general we’ve found that most of the nouns that we’re used to using end up not having anything to refer to by the time we’re done with our analysis. So fewer and fewer things, more and more roles. And I can talk more about this some other time.
So, this is the last slide. I thought I’d try my hand answering the question, “What is distinctive about humanities data modelling?” And I think it’s intentionality. It’s the fact that we’re involved– people. When we’re modelling cultural data, humanities data, we’re modelling entities and relationships that exist only in virtue of intentionality laden social activity. The things we’re modelling I’m tempted to say wouldn’t exist if we weren’t in the picture. They exist only because of social action. Social, intentional, cultural action. The method – I like using the word hermeneutic whenever I can, which is rarely – I’m going to say our method is hermeneutic analysis carried out by agents capable of creating the sorts of things they’re studying. The reason we can do this work is because we’re the kind of thing that can create these relationships. That’s what makes doing this work possible for us. We know about these relationships from the inside. And the challenge is how do we adapt modelling methods, most of which are based upon standard first-order logic. I know many of them here are based on grammars, but conceptual modelling in general is based on first-order logic and first-order logic is notoriously impotent at representing intentionality and that’s our problem. And that’s it.
[Gregor Middell] I have a question regarding the byte stream, the […] PLANETS object modification. You said—
[Allen Renear] I left that up too long.
[Gregor Middell] I have a question regarding the intentionality part. For me as a programmer, I see such object-oriented models, it’s completely logical to think of a digital file as byte streams, because I can handle it as a byte stream. For me, as programmer, I cannot see […unclear…] it would make sense to inherit that kind of [information?] straight from the byte stream. Whereas obviously it would make more sense to modify… It could just be that the dynamic nature of the […unclear…]
[Allen Renear] Yeah, so let me say more generally that I think that—
[Audience] Can you repeat the question before you answer?
[Allen Renear] Right. So the question is about the example from the PLANETS Preservation data model where the UML showed files as a subset of byte stream. And I said there’s something problematic about this because a byte stream cannot lose or gain a byte. And yet these files have modification date attributes. You’ve heard this story before, most of you, I think. In fact, I think that we do not have a robust account of identity and change in the digital world. And what the problem is, maybe I won’t say, it could be that we have idioms and they need to be made more precise. It could be with [Ludwig] Wittgenstein our idioms are just fine and we need to give up trying to make them more precise. Or we need to do something in a more ad hoc, opportunistic way. But in general with documents being defined as graphs, files defined as byte streams, all of these familiar tables described as relations (how do you add a record to a table?)— for me, the reduction of familiar digital objects to their discrete mathematical constructs is problematic because it doesn’t seem to allow us to give an account of identity and change. And I would say that in this particular case it’s not too hard to formulate a paradox where you say about a file that a particular file once was identical with this byte stream and now it is identical with another byte stream. And that kind of claim is very difficult to manage in logic and ontology because it seems like byte streams have their bytes essentially. That’s why we call them byte streams and this is a PLANETS preservation model; Fixity is a key feature of this model and fixity is the bytes in a particular order.
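The puzzle about identity and change can be made concrete with Python’s immutable `bytes` type. This is a sketch under stated assumptions, not the PLANETS model: the `File` class, its field names, the sample content and the dates are hypothetical, and SHA-256 is simply one common choice for a fixity checksum.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


def fixity(stream: bytes) -> str:
    """A checksum over the bytes in order, as used in a fixity check."""
    return hashlib.sha256(stream).hexdigest()


original = b"hello"
edited = original + b"!"  # builds a new byte sequence; `original` is untouched

try:
    original[0] = 72  # a bytes object rejects in-place change
except TypeError:
    modification_possible = False
else:
    modification_possible = True


@dataclass
class File:
    """A persisting thing that merely has a byte stream at a given time."""
    name: str
    content: bytes
    last_modified: datetime

    def write(self, new_content: bytes, when: datetime) -> None:
        self.content = new_content
        self.last_modified = when


f = File("notes.txt", original, datetime(2000, 1, 1, tzinfo=timezone.utc))
same_file = f
f.write(edited, datetime(2000, 1, 2, tzinfo=timezone.utc))

print(modification_possible)                  # the byte stream cannot change
print(same_file is f)                         # the file is still the same object
print(fixity(original) == fixity(f.content))  # but its bytes are different ones
```

A last-modified date makes sense on the `File` object, which has a byte stream at each moment, but not on a byte stream itself; saying the file is a byte stream is what produces the paradox.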
[Paul Caton] In that refactored FRBR why do you use stories, that set of concepts, which are apparently what the original FRBR people meant by work? At least that’s what it says in the FRBR documentation.
[Allen Renear] Can you repeat the question?
[Paul Caton] Allen had changed what was “work” as the top level FRBR entity—in his refactoring as nominalized roles he changed “work” to “story,” and I was just interested whether that was the term he really wanted there.
[Allen Renear] No the term I really wanted there was “stories”, plural, so I made that slide today. So that should have been stories, plural. Namely—it’s that rectangle represents the set of all stories just like in the original FRBR diagram the rectangle represented the set of all works. I know we talk about the work entity but that is an unfortunate recent terminology that’s been applied to conceptual modelling. The original term was “entity type” and that rectangle is supposed to be a set. So my example refactored the top rectangle stories. The set of all stories. I mean stories the colloquial term. It’s intended to be a colloquial term, “stories”.
[Wendell Piez] Allen, thank you for that talk; it was fabulous. It was compelling.
[Allen Renear] Fabulous?
[Wendell Piez] It was fabulous. It was provocative and also very well presented. I want to get back to Syd’s [Bauman] crazy ideas about the Mona Lisa because I thought that your presentation of that dilemma was actually really interesting because I think it actually encompasses a miniature, something we’ve talked about all day, which I think is really, really important comes to your last point which is that the naïve response to this would be, “Well why are we inventing a proxy representing the Mona Lisa when our assertion is essentially about Syd [Bauman]?” Right? And if you follow that down it’s like, well, okay, so we represent the Mona Lisa with a proxy so that we can get out of this problem of merging Syd’s assertions and ours. And then we follow those down so that we introduce the proxy for Leonardo because Syd is also saying something about Leonardo. And then by the time we are done we have a whole map of the universe that represents Syd’s world, Syd’s own map, which corresponds to our map in some particulars, but in other very important particulars it veers off in all kinds of bizarre directions, right? And so we actually have a problem where we’re trying to represent as many possible roles at once at that point which are only articulated together at the points where they happen to correspond or where we can make assertions about the relationships like, “The Syd that believes”, in this other map happens to be an entity in this map, right? And which is I think comes directly to your final point which is about that what we’re modelling here isn’t just items in the real world. It’s crazy ideas that people have about items that may or may not be in the real world. 
And that sort of explodes into this web, into this epistemological minefield where we can’t establish with any given authority at any given moment which of the possible worlds we are looking at is the world that we are actually believing in because we turn the corner and there’s a different way in a different direction. And you know that’s what humanities data presents us with, just by its very nature.
[Allen Renear] Um, sounds, yup. (Laughs) I’ll just say that I don’t think we have a good robust way of representing indirect speech or the propositional correlate of indirect speech, period. It’s not a problem, not a recent problem, not a problem with modelling, it’s that we don’t understand from a logical point of view how to represent propositional attitudes in a way that makes computationally available the things that feel like they’re computationally available.
[Maximilian Schich] Actually the […] has very different ideas about that so they say it’s impossible. You basically have a triple on a triple. Basically you say Mona Lisa is attributed to Leonardo and the second triple which says, “Martin Kemp says” Mona Lisa’s attributes.
[Allen Renear] Right—
[Maximilian Schich] You can do that for – so if you think about all of the assertions in your database which are about some object, for anyone you could have an infinite amount of triples basically asserting the assertion. So that means—
[Maximilian Schich] No it’s funny… If you look at twitter, you have certain realities, right? There is this kind of thing where people say, “This is like that, this is like that, this is like that, today the weather is fine.” So looking on the order of number of people in the world assertions are highly diverse.
[Allen Renear] Right, so—
[Maximilian Schich] And some people agree because they’re in the same location. But that’s something we actually, I think this kind of proxy thing is somehow saying we are the guys who know that the Mona Lisa is by Leonardo which only is true for the Mona Lisa and probably for ninety percent of all the objects in your [geography?] Europeana is actually not sure if it is by […]. And then there is this one opinion. In fact what they say themselves is an opinion, too, which is a commonsense […]
[Allen Renear] So I think I know what Stefan [Gradmann] is going to say-
[Stefan Gradmann] I first would like to thank you, because this was fun. But coming back to this Mona Lisa issue, it’s extremely complex and we’ve talked about it for ages now. First of all, Mona Lisa or La Gioconda is a very nice example of ideological use of language because the difference between the two is enormous in Italian and we use La Gioconda […]. Now the reality we have to face in Europeana is that we don’t make statements, we give statements from people that know, critics who know and others who tend to know. And we have to organize concurrent statements that do not agree on a given thing. And actually the mechanism that we’ve been waiting for, for years now, which is being discussed at W3C, is that of named graphs, which then would enable you to model indirect speech a bit more delicately by extending the triple model with a fourth element that would allow for provenance and versioning, these kinds of things. Now the trouble with […] they’re not standardized. And the ORE proxy construct is something we’d like to get rid of; we borrowed that from the ORE model and it is a somewhat [abusive?] way of using that construct. But it was the only way we could tell the world “here is a set of propositions about a given object that we have received,” and that we’re […] approaching as incomplete we make part of one aggregation. That aggregation was created by Europeana. Now in theory you’re quoting, this is the aggregation, and that makes the picture incomplete. The one thing we at Europeana do is create the aggregation we are the creator of. And then each aggregation has several proxies…[it integrates them, as] statements about an object we are not through with yet, we are just interpreting. But I totally agree with you that this is suboptimal and I’m wondering still whether the main task will be really solved. Here’s my question for you.
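The “fourth element” described here can be sketched as follows, assuming a plain-Python stand-in for named graphs. The `Quad` type, the graph names, the `assertedBy` predicate, and the second provider with its attribution are all hypothetical and invented for illustration; they are not Europeana data or the EDM vocabulary.

```python
from collections import defaultdict
from typing import NamedTuple


class Quad(NamedTuple):
    """A triple plus a fourth element naming the graph it belongs to."""
    subject: str
    predicate: str
    obj: str
    graph: str


# Two concurrent, disagreeing attributions, each in its own named graph,
# plus statements about the graphs themselves (hypothetical data).
quads = [
    Quad("MonaLisa", "attributedTo", "Leonardo", "graph:kemp"),
    Quad("MonaLisa", "attributedTo", "Workshop of Leonardo", "graph:providerB"),
    Quad("graph:kemp", "assertedBy", "Martin Kemp", "graph:meta"),
    Quad("graph:providerB", "assertedBy", "Provider B", "graph:meta"),
]


def attributions_by_source(quads):
    """Group attribution values by whoever asserted the graph they sit in."""
    source_of = {q.subject: q.obj for q in quads if q.predicate == "assertedBy"}
    result = defaultdict(list)
    for q in quads:
        if q.predicate == "attributedTo":
            result[source_of.get(q.graph, "unknown")].append(q.obj)
    return dict(result)
```

Calling `attributions_by_source(quads)` keeps the two attributions apart by source instead of merging them into one claim. This handles the bookkeeping of provenance only; it says nothing about what the quoted statements mean.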
[Allen Renear] Yeah, so to be fair, at this point there’s a nice little paragraph in – I don’t know which document – that in effect says “I know what you’re going to say. What I just said. But really, we thought about the alternatives at the time and this was the best thing.” But I ignore paragraphs like that because they interfere with my enjoyment of this modelling. [laughter] But more broadly the W3C has, to my mind, a kind of amusing history of trying to deal with oratio obliqua. If you look back at the beginning days of reification, for instance, there’s especially the email list where they discuss realizing what the consequences would be for RDF semantics of reification, which was designed to do this kind of thing. The problem is not a technical one and this is what actually irritates me a bit about even the phrase “named graphs.” Named graphs is sort of an encoding of a representation and the representation has a semantics. There are many different ways we can encode that representation. There’s nothing special about graphs. But the fact is we don’t know, I don’t think we know, how to do the semantics for what it is that we’re representing. So it’s not a question of having a data structure that we can manage, that we can work with to program against. Although maybe this is a deep issue because I guess what I’m saying is for me, it’s not an issue of having a data structure. It’s an issue of understanding what the semantics is of the language the data structure encodes. And the W3C and RDF community has been working, struggling on this for a while. Key words would be Superman, Lois Lane, by the way. Just like key words for adverbial modification is “Sebastian strolled slowly [in Bologna?]”.
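The Superman and Lois Lane key words point at the familiar substitution puzzle for belief contexts, which can be sketched in a few lines. This is an illustration under stated assumptions, not anything from the W3C discussions: the `same_as` table, the `canFly` predicate, and the `naive_substitute` function are hypothetical names, and the function deliberately implements the naive rule in order to show what goes wrong.

```python
# Identity statement: the two names refer to one individual.
same_as = {"Superman": "Clark Kent"}

# A belief is recorded as (believer, (subject, predicate)).
beliefs = {("Lois Lane", ("Superman", "canFly"))}


def naive_substitute(beliefs, same_as):
    """Replace co-referring names inside belief contexts (the naive rule)."""
    out = set(beliefs)
    for believer, (subj, pred) in beliefs:
        if subj in same_as:
            out.add((believer, (same_as[subj], pred)))
    return out


derived = naive_substitute(beliefs, same_as)
```

After this runs, `derived` also contains the claim that Lois Lane believes Clark Kent can fly, which does not follow from the original belief. The data structure holds the statements without difficulty; what is missing is a semantics that says when substitution inside “says” or “believes” is allowed.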
[Jan Christoph Meister] Allen, I wonder whether, what, if you could go back to the previous slide, because that was your final suggestion, answer of intentionality as a distinctive criterion which we cannot really capture as long as we apply first-order logic. So it is, basically if I understand you correctly, your suggestion, rather to think in terms of, let’s say, constructivist versus an essentialist approach towards phenomenon. We create the phenomenon because we construct the model of it, and it actually doesn’t exist per se outside the model. Now if I understood that correctly, is intentionality not actually too strong as a distinctive criterion? Because the test case which exemplified it to me, the best way where things begin to fall apart, is this issue of identity over time. This question of “when is it still it?” Why does this problem occur? Because we are taking time into consideration all of a sudden, which we haven’t done so far. So probably it’s a question of identity rather than intentionality because intentionality is a laden concept for many. It’s very hard to ascribe intentionality to somebody who acts non-consciously. Does that agent have intentionality? But any of us here has a sense or wants to have a sense of identity, so my question is whether identity is something that has to do with this change over time conundrum that is the key criterion rather than just intentionality.
[Allen Renear] Okay, this slide is the kind of slide that one ends on and then everybody goes away. I meant to move this one up here right away. But I’m using intentionality really just to indicate, in a gross way, human intellectual, cultural activity and it includes subconscious activity: the way we understand language, the way we use language, the way we interact with each other, the conventions, the practices, just sort of gesturing towards that. Now there are ways to make aspects of intentionality precise. In modal logic you can say intentionality consists of these features. If the prefix has these features it’s an intentional prefix. But I’m using it to just gesture towards those things. I think of it as the distinguishing thing in humanities information and data modelling. And the identity issues, I think they’re hard, I don’t know which ones are tied up with intentionality and which ones would be there anyway. But I do think, I mean it may sound trivial to even say this, that what’s distinctive about humanities data modelling is humans, or rather agents, that are capable of thought and action in the ways that humans are capable of thought, action and feeling because that’s how we come to have culture, language, and action, in a human sense of action.
[Fotis Jannidis] It is misleading you say that, you talk about intentionality but hidden below this you have social activity and things like […] which we are interested in, in intentional activity as the outcome of cultural processes which are normally not based on intentional activity but have something indescribable: an invisible-hand model. You have the third force—in Germany what happens is that many people want something, nobody really chooses and therefore we solve, which is something like language, cultural practices, and so on. And that’s very interesting in modelling intentionality.
[Allen Renear] Well maybe this is the problem. So I’m using intentionality not in the sense of intention, not in the sense of doing something intentionally. I’m using it in the sense that goes back through [Edmund] Husserl and [Franz] Brentano to medieval philosophy. It has to do with the mind but not necessarily with intentional action. It doesn’t mean conscious, it doesn’t mean occurrent, thoughtful; it doesn’t mean those. It’s not like in law where you’d wonder whether or not it was intentional or whether it was instinctive. So it comes from the medieval terminology for a taxonomy of mental things: the things involved in thought.
[Thomas Stäcker] As a librarian, it just comes back to FRBR. Up to this day I was quite happy with FRBR [laughter], now it seems spoiled for me in terms of the fun. I think it was very elucidating as somehow unmasking the data modelling of librarians. On the other hand it made perfect sense to me. I think this notion of inheritance is difficult. You’re right. On the other hand for instance a new […] would have made perfect sense of that when the ghost from the work cumulates into the hypothesis and creates an item of something. So there is an idea behind that that makes sense, but not in terms of data modelling, so I think a mixing of levels has taken place here. So it was in a way puzzling for me to see that the model works. It works perfectly well and it meets a lot of problems that I’ve seen, like ascribed works for instance, and puts them in relation with some manifestations, which means imprints, to put it that way. It was puzzling in that it works and it’s wrong, to some extent. So what does this mean for data modelling?
[Allen Renear] Yeah, that’s a good question. The side of that that I would be most interested in sort of defending, because other people can defend the other side, is it needs to be refined. I mean, there are a couple of different attitudes towards “It’s wrong but it works.” One attitude is: nothing is ever right. It either works or it doesn’t, so forget about it—half of your concern is misplaced. Either it works or it doesn’t. Forget about it being right or wrong. Another approach is, well, a lot of it’s right and some of it’s not quite right. And what I’ve done here and elsewhere is more or less along that latter line. We certainly know that we find it here.
[Fotis Jannidis] Thanks a lot for this lively discussion which I have to cut off at this point. We have twenty minutes and then we start again. Thanks.