Kari Kraus, “Preserving Virtual Worlds”

Case Studies–Critical Editions (March 14)


[Kari Kraus] Okay so I can’t claim credit for this great slide. My colleague and collaborator Jerry McDonough created it. But this is a case study. I’m going to be talking about the Preserving Virtual Worlds Project. PVW1, as we call it, was funded from 2008 to 2010 by the Library of Congress. And I’ll touch just a little bit on our sister project, the follow-up project, PVW2, which is currently underway.


Jerry McDonough is our fearless leader at the University of Illinois, Urbana-Champaign but this is a multi-institutional project. Our project partners are University of Maryland, Rochester Institute of Technology, and Stanford University, in addition to Illinois.


For this project we adopted a case set approach. It was a very exploratory project. We were interested in scoping the problem of what it even means to say you’re preserving these complex digital artifacts, complex software. So the games in our case set, there were eight in all, spanned the years from 1962 to 2003 or thereabouts. We included a range of genres, a lot of interactive fiction. So we had in our case set Colossal Cave Adventure, which is the first documented text adventure game. We had Spacewar!, which dates from 1962 and was originally played on a PDP-1 machine. As far as I know, the only functional PDP-1 machine is at the Computer History Museum over in either Palo Alto or San Francisco. I don’t know of any others.


The case set also included the first-person shooter game Doom. Other works of interactive fiction included Mystery House, which was the first work of interactive fiction to include graphics and you can see how incredibly crude [it is]. It’s actually, Mystery House is that house that you see in the upper, I guess it’s your left-hand corner of the screen and it was filled with almost stick-like figures. They were vector graphics. I believe that Mystery House was coded in Fortran originally. And then Mindwheel, another work of interactive fiction that was authored by Robert Pinsky, former poet laureate of the United States.


So I’m giving you some background on the projects first, an overview of what we did, and then I’m going to touch on issues of authenticity and trust in the Open Archival Information System. Our project was OAIS-compliant, as they say, so I’ll talk a little bit more about what that means. And, as I mentioned in the round table earlier, I adopt a very broad and expansive notion of data so again artifacts, phenomena was the term used during the discussion, events, actions, ideas, as well as things like numbers. So you can all let me know at the end if what I’m talking about is actually data modeling or if it’s something else. And also for the archivists in the room if I get anything wrong about the OAIS system, please just correct me.


So, I mentioned that PVW was an exploratory project. Some of the problems that we encountered, the challenges while trying to preserve the games in our case set—by preserve, we actually ingested the bits into institutional repositories at Stanford University and University of Illinois Urbana-Champaign and created information packages. Those are what live now in those repositories. But challenges we encountered were things like platform obsolescence and you actually see the PDP-1 machine that I just referenced a minute ago there. The technologies have simply entered the stage of oblivion. Things like software dependencies. So, when you run a game like Mindwheel on your computer, it’s hard to determine what the boundaries of the object are that you’re preserving because there’s so many software dependencies. So the computer program that is Mindwheel will depend on the code libraries supplied by the operating system that are shared across different programs. So, that presents an enormous obstacle. Things like intellectual property law. This was huge in the case of, for example, Second Life. Second Life was in our case set—the only 3D virtual world in our case set, which dates from 2003.


Oh, there it goes. Can you guys see the Twitter thing going up? So Second Life—you might know if you’ve ever been in Second Life that residents of the world can claim intellectual property on the artifacts that they create. So we weren’t trying to save all of Second Life. That would have been a fool’s errand, but instead we were trying to grab about three or four different islands in Second Life. So Democracy Island, Stanford University library’s presence in Second Life, the International Spaceflight Museum in Second Life and a few others. What we tried to do to address the IP issues was first take an inventory of all the different objects on a given island and then identify the in-world owners of those objects and essentially give them a virtual, you know, gift of deed or owner’s agreement form to see if they would be willing to sign it to allow us to grab their content. This failed miserably. The best-case scenario was one of the islands—I think we got a ten percent response rate and on one or two of them we got no responses at all.


And there were enormous technical problems on top of that. It’s very hard; it’s not that hard to grab textures and basic geometry information. There’s actually a program called CopyBot which we used, which was created by Second Life “griefers” who were stealing people’s intellectual property, [06:56] so we rode on the coattails of the pirates and used their program for our preservation project, which is actually very typical. In fact, one kind of sub-theme here is that everything we do in the realm of digital preservation of video games is really parasitic, I think, on what the gamers do. They do extraordinary things. They build emulators—so, very often in the library and information sciences world we depend on the emulators created by gamers. They create weird hardware to do things like grab content off of obsolete storage media, magnetic media. So, there’s one piece called a KryoFlux, a piece of hardware. It’s basically a floppy disk controller that allows you to circumvent the original platform or system that’s now very hard to come by and usually not in functioning order. You now don’t have to go track down an Apple II or a Commodore 64 or whatever it is. You can simply use this KryoFlux that’s hooked up to a floppy drive, which you can still find, and it doesn’t have to be a floppy drive from the original system. And then it’s connected with a USB cable to a modern PC. The software on the hardware device isn’t stymied by DRM, it can basically read any format, and so you grab the bits off the magnetic media and then it’s on your modern PC.


So they do things like that, they create metadata, and they are very cavalier about IP issues. And one of the interesting things is that they make headway in getting IP rights where we often fail. So, for example, there’s a lot of gamers who create what’s called “machinima,” which is movies made using game engines. This is considered derivative work and therefore technically you have to get the permission of the copyright holders. They started doing it anyway, not asking permission from the game developers. The game developers saw that it boosted their brand value and they eventually created content usage rules that now give the pirates the right to actually create this work. So the practice of unlawful machinima led directly to the practice of lawful machinima.


We experimented with different forms of preservation, including migration, emulation, and what we sometimes call re-enactment. It goes by other names like recreation and reimplementation. So I won’t say too much about this other than maybe I’ll just say [a couple words] about emulation. I think we tend to think of emulation as the equivalent of a kind of facsimile of the original object. And one of the things we did with our game set is we tested different emulators. So we would run our legacy software on different emulators and compare them and they actually vary considerably, emulator to emulator, and in the quality of their rendering. So some emulators might render color better or sound better or what have you. So it’s actually a vast degree of difference across different emulators. Re-enactment or reimplementation—in our case set we looked at Mystery House, which I mentioned a minute ago, which was released into the public domain in 1987, but the source code was never released. So in 2004 Nick Montfort, Emily Short, and some others re-coded and reimplemented Mystery House in the Inform programming language and it’s available as the Mystery House Taken Over Project online now.


I’m just going to mention briefly PVW2. I’m not going to concentrate very much on it. This is our follow-on project. Our case set, I think there’s a slightly smaller set of games, I think six, maybe seven. They’re mostly educational games, but Doom makes a reappearance in this case set. You can’t really make a case that Doom is an educational game, but we’ve got Oregon Trail in there, [Where in the World is] Carmen Sandiego[?], The Typing of the Dead—which was a game based on a previous arcade game, a light gun shooter, a shooter on rails, where you would kill zombies. Typing of the Dead, the player characters now walk around with portable Dreamcast machines strapped on their backs and keyboards, and instead of shooting zombies they type them to death. So this is seen as a typing game. So, the zombies would get words over their heads and you’d have to type them as fast and as accurately as you could to kill or neutralize the zombies.


The first project was really much more practical, as I said, at the end of the day we were trying to ingest bits into repositories. This is in some ways a more research-oriented project. [12:03] Our premise is that preservation tends to be lossy, you tend to lose information over time, but if you can identify the most significant characteristics of the objects in your custody then you can try to adopt preservation strategies to ensure that those features or characteristics remain intact even if you lose others. So in Typing of the Dead, for example, you’ve really got to preserve the fact that it requires a QWERTY keyboard. It doesn’t make any sense without that. On the other hand, maybe something like color depth or color tonality is a less important feature, so that kind of thing.


The big challenge with this project—there’s a number of big challenges with this project—but one that has emerged, I think, that looms rather large is that the archival literature on significant properties, and there is a fairly robust literature on this, usually defines significant properties in a way that suggests they’re surface features. It’s often described as the look and the feel of the object that you’re preserving, so features or attributes that you can visually inspect. And in the case of games you’re also interested in the underlying, you know, data structures and structural properties and that’s very hard to get at because we don’t always have the source code. And we, in fact, usually don’t have the source code. Although for our case set for some of the games we did have the source code.


And besides that, it’s not only about having the source code. It’s about understanding the relationship between the underlying game engine and the expression of the work at the visible level. So, it’s something we’ve been thinking about a lot. We’re interested, for example, in things like programming debugging tools that programmers use to let them understand the relationship between the underlying code and the behavior of the program. So when you see something buggy happening in the software program, how do you trace it back to what’s happening in the code? And, I’m not a programmer but they have these kinds of tools.


Also I think you could look at something like what the genomic community is doing. So they have a sort of similar problem where they have the genomic layer of data, the sequences, the genetic sequence, that’s the genotypic layer and then there’s the phenotypic layer which is the expression of those genetic sequences at the visual level, the behavioral level. And so they’re trying to map between those two levels and they’ve developed all kinds of tools for doing that. And then there’s a strong tradition in HCI that is about trying to make the underlying behavior of systems visible to an end user. I think we could learn a lot about this and I think this ties it back into some of the discussion earlier today where Wendell [Piez] was talking about what features of the underlying data model can we shield from the end user because it’s simply not necessary for them to know. They might not be interested in knowing it. But the converse is that often it can help the end user to actually know something about the underlying data model or the game engine or whatever it is. And so, again, in the HCI field there’s some really interesting work that tries to do that. So making the behavior of a home network system understandable to the end user. I just saw a fascinating talk about this at the iSchool at the University of Maryland, where the researcher was trying to help users see what applications consume the most bandwidth, for instance. Was it YouTube or Facebook applications—and then they could also see what members in their household consumed the most bandwidth and other properties of the underlying system. [16:08] So I think, I would point to this particular issue—making the unseen seen—as maybe one important aspect of data modeling.


Okay, so now I’m going to switch over to authenticity and trust. So basically, I suggested earlier that gamers play an incredibly important role in digital preservation. And then you’ve also got professional archivists and researchers like myself who are trying to save games. And so I’m interested in kind of comparing and contrasting the two models of authenticity that exist in these two communities. They’re very different. And I think, because the gamers play such a crucial role, there’s no reason why both camps can’t be pursuing their separate projects and adopting different models of authenticity. But I also think, kind of building on some of Jeremy John’s recommendations around personal digital archiving, that we need to think about supporting a post-custodial model of game preservation where these player archivists take primary custody. We don’t actually acquire all of these games ourselves, but we try to provide services that let them do what they do better. How can we help this community? How can we partner with them? And so understanding how their models of authenticity differ from ours can be useful in that regard. So I have up here the definition of authenticity that comes from the Society of American Archivists [SAA] [A] Glossary of Archival and Records Terminology. So they define it as “the quality of being genuine, not a counterfeit, and free from tampering, and is typically inferred from internal and external evidence, including its physical characteristics, structure, content, and context.” And I kind of bolded the “free from tampering” there, because I think that the OAIS model really builds on that facet of authenticity.


Gamers by contrast, they’re more sort of tolerant of variability so I’ll get to that in just a minute. In fact here’s a couple of quotations. So the first quotation comes from Jon Ippolito, who is an artist and also works on preserving net art—variable media art as he calls it. And he has this to say that gets at some of these questions, “new media art can survive only by multiplying and mutating…fixity is death.” And Allen [Renear] had mentioned fixity around the planets data model in his talk. This is a very, very different model of authenticity that Jon Ippolito is identifying.


There’s also a fascinating talk on digital preservation and evolutionary theory at the 2010 Digital Humanities Conference at King’s College London, where the authors were looking at the application of evolutionary theory to digital preservation. And among other things they noted that, this is Peter Doorn and Dirk Roorda, “The Ecology of Longevity: The Relevance of Evolutionary Theory for Digital Preservation,” “keeping digital objects fixed and rigid is difficult,” they say, yes, “migration as a preservation strategy, adapting data to the environment, which is what migration is, is better from a biological perspective.” “The traditional method of preserving first and then reusing content is illogical and even perverse from an evolutionary perspective. Evolution gets rid of unused functions. Better strategy is reuse and then preserve” and “copies should be free to evolve; make copies in evolvable forms.”


So I see what they’re advocating as consistent with a lot of what the gamers do. Now the gamers are not a homogenous community and there are some remarkable differences of opinion. For example, the folks behind the Software Preservation Society, they’re all gamers and they’re actually the ones who created the KryoFlux, which I mentioned earlier. They have what I’d almost call a kind of fundamentalist attitude towards digital preservation. And they’ve actually created tools that let them read flux transitions on magnetic media at very, very, very fine resolutions and so they are all about extreme fidelity to the bit stream. So that’s a sort of corner of the community that thinks very differently.


(I’m going to skip this stuff.) I mentioned the Open Archival Information System and their model of authenticity. So for those of you who aren’t familiar with it…First of all it’s going to be hard to condense the OAIS in one or two slides and whenever I see presentations on it, it’s always incredibly tedious, but this is my understanding of the OAIS model. It is a framework, a widely accepted framework, developed by the Consultative Committee for Space Data Systems (CCSDS) back in 2002, I think was when it was finally published, but it’s meant to be kind of content agnostic. So the fact that it originated with this community doesn’t mean it isn’t used by other communities. On the contrary, it’s very, very widely adopted in the archival world. So the framework provides a sort of shared terminology and shared set of concepts for thinking about things associated with digital preservation. It basically spells out the different high level functions and services that a digital archiving system is responsible for and it characterizes some of the attributes of the information objects that are the focus of preservation in the system. So this is, you always see this in the slides, it’s not a blueprint for system design. It doesn’t tell you anything directly about implementation of the system. It’s, again, a very abstract model.


This is a diagram you often see of the OAIS that spells out the different stakeholders. On one side you see the producers who are the creators of the digital information that our repository is acquiring and ingesting. You’ve got the consumer, which is called, generally, a Designated User Community in OAIS terms. So before you undertake the project of ingesting your bits and figuring out what it is you’re going to ingest or collect, you do an assessment of the designated user community. You decide who that is, and you assess their knowledge base, as they say, and you try to provide information or save information that is consistent with your understanding of their knowledge base. So, generally the broader you go with your user community, if you say the general public is your designated user community, you’re going to have to supply more information as part of your OAIS information package than you would otherwise, because you have to assume, you know, you’re really kind of preserving to the lowest common denominator in that case. And I’m going to be really focusing in on that middle square in the diagram, the archival storage. Here we see just another version of that same diagram, but here we’ve added the information packages that move their way through the OAIS, the digital archiving system. They have lovely terminology for this. There are three variants or flavors of the information package—the Submission Information Package [SIP], the Archival Information Package [AIP] and the Dissemination Information Package [DIP]. The A.I.P., or I think they call it “AIP,” is that right? Is what’s preserved and maintained as part of the archival storage function of the OAIS. So that’s the focus of the preservation efforts is that AIP, the archival information package.


And here you see a model of what’s in that package, so the structure of the information. It includes content information, the actual objects that are the focus of the preservation, such as a particular game, and also what’s called Representation Information. Representation information is what you need to maintain to ensure that your bit stream is intelligible and renderable over the long term, otherwise it’s just a meaningless string of zeros and ones. So how do you decode that bit stream? You’ve got to preserve the information to do that, that’s your representation information. In the case of, say, Mystery House, which ran on the Apple II system, you would want, for example, a copy of the Apple II DOS manual as part of your representation information. Now we get into the infinite regress because then if the DOS manual is in PDF format then you also need to include, as part of your representation information, the specifications for the PDF format. And then if the specifications for the PDF format reference other documents, then they need to be part of your representation network, too. So this very quickly becomes this very bloated network. It’s all about relationships, as Andy [Ashton] was saying earlier, and you’re mapping those relationships within the OAIS model. And I think, my understanding is that actually theoretically or supposedly your representation network ultimately has to end in an analog piece of information. It’s actually got to end with something physical in the real world. I don’t know if this is apocryphal, I think I’ve heard Jerry [McDonough] say this.
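
The regress described here can be pictured as a walk over a dependency graph. What follows is only a minimal sketch under stated assumptions: the item names (`mystery_house.dsk`, `apple_ii_dos_manual.pdf`, and so on) and the function `required_representation_info` are hypothetical illustrations, not part of OAIS or of the PVW repositories.

```python
from collections import deque

# Hypothetical representation network: each item maps to the documents
# needed to interpret it. All names are invented for illustration.
REPRESENTATION_NETWORK = {
    "mystery_house.dsk": ["apple_ii_dos_manual.pdf"],
    "apple_ii_dos_manual.pdf": ["pdf_specification.pdf"],
    "pdf_specification.pdf": ["printed_pdf_reference"],
    "printed_pdf_reference": [],  # analog endpoint: nothing further to decode
}


def required_representation_info(network, start):
    """Return every item reachable from `start`, in breadth-first order."""
    found = []
    visited = {start}
    queue = deque(network.get(start, []))
    while queue:
        item = queue.popleft()
        if item in visited:
            continue  # guards against circular references between documents
        visited.add(item)
        found.append(item)
        queue.extend(network.get(item, []))
    return found


print(required_representation_info(REPRESENTATION_NETWORK, "mystery_house.dsk"))
```

The walk stops only when it reaches items with no further dependencies, which is one way to read the remark that the network has to end in something analog.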


Another thing I want to point out here is the preservation description information in this A.I.P., this AIP. So this includes provenance and Fixity Information, so now we’re getting at authenticity. The fixity information might include something like a checksum value, which is kind of like a digital fingerprint, and you’d run a checksum program against your digital object, obtain your number, and then at some later date you’d run it, you’d run the program again, and then you’d compare the two fingerprints. And assuming there’s no difference between them, you can assume that there’s been no changes in your bit stream, that nothing’s been tampered with, or there’s no bit rot or whatever. So that’s fixity information—making sure that your bits are stable over time. But then there’s provenance information, which is documentation about the life of the artifact, the history of ownership, and the changes or transformations it’s undergone. So it also records that information.
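
The checksum comparison described here can be sketched in a few lines. This is a minimal illustration, assuming SHA-256 as the checksum (the talk does not name an algorithm); the function names `sha256_of_file` and `fixity_intact` are hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b""
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_intact(path, recorded_checksum):
    """Return True if the file's current digest matches the recorded one."""
    return sha256_of_file(path) == recorded_checksum
```

In use, the digest would be recorded at ingest and `fixity_intact` run again at some later date; a mismatch signals that the bit stream has changed, whether from tampering or bit rot, but does not say which.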


Does the OAIS model tolerate alteration of the preservation object? Yes, it does, but the OAIS model necessitates that you’re thinking of two things simultaneously: preservation and access. You want to be able to provide access to that designated user community. So you’re constantly balancing the tensions between preservation and access. It might be necessary, for example, to migrate files to a more contemporary format that is compatible with your software and hardware systems. Delivering it to the end user in a different format. So delivering it in a JPEG format instead of a TIFF format. All of this is documented as part of the provenance information. So even though it allows for some change in this manner, you can say that in the OAIS model preservation happens in spite of, not because of, these necessary changes that have to take place. And they’re kept very minimal and you try and do the least possible so you’re always preserving the integrity of those underlying bits.
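
One way to picture a migration that is documented as provenance is sketched below. The classes `ProvenanceEvent` and `PreservedObject`, the `migrate` function, and the identifier scheme are all assumptions made for illustration; OAIS itself does not prescribe any such implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProvenanceEvent:
    when: str            # date of the event, as a plain string
    action: str          # e.g. "migration"
    source_format: str
    target_format: str


@dataclass
class PreservedObject:
    identifier: str
    file_format: str
    derived_from: Optional[str] = None
    provenance: List[ProvenanceEvent] = field(default_factory=list)


def migrate(original, target_format, when):
    """Return a new derivative object; the original is left untouched."""
    event = ProvenanceEvent(when, "migration", original.file_format, target_format)
    return PreservedObject(
        identifier=f"{original.identifier}.{target_format.lower()}",
        file_format=target_format,
        derived_from=original.identifier,
        # the derivative carries the earlier history plus the new event
        provenance=original.provenance + [event],
    )


master = PreservedObject("scan-001", "TIFF")
access_copy = migrate(master, "JPEG", "2012-03-14")
print(access_copy.derived_from, access_copy.provenance[-1].action)
```

The design choice in this sketch is that migration never alters the preserved master: the change lives only in the derivative and in its provenance record.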


Okay, so we’ve got these two contrasting models. As I said, I think one thing we can do is—it’s fine for the two models to coexist, I’m not judging one model over the other, and in fact the model adopted by a lot of gamers is I think very consistent with what we saw in the late 90s and early 2000s in the field of textual criticism where it was all about the social texts and we valued the accretions and additions created by editors and so forth. In some ways you’re kind of seeing a similar type of model among at least a lot of the gaming community, but not all of them. But, given that they play such a prominent role in digital preservation, can we think about services that we might provide that support that model? This is a quote from Jeremy John at the British Library, I mentioned him earlier and he has this great white paper on personal digital archives, and he has postulated that “future researchers will be able to create phylogenetic networks or trees from extant personal digital archives, and to determine the likely composition of ancestral personal archives and the ancestral state of the personal digital objects themselves.” So thinking about tools that we might provide to these communities to kind of map and visualize the interrelationships between the different versions of the games that emerge. So if we’ve got these version streams around objects then this is something we could do. You already see the gamer community doing similar things. They create, for example, trees showing different versions of [Colossal Cave] Adventure, which is that game I mentioned earlier; it was created in 1975 by Will Crowther, revised, the source code was released, it was revised by Don Woods a couple years later and the user community, the player community has continued to adapt and change it over time. They provide these kinds of trees. This tree is actually based on several of those in the player community, but was created anew by Jerry McDonough and Matt Kirschenbaum and published in Digital Humanities Quarterly a year or two ago.
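
A version tree of this kind can be sketched as a simple parent map. The structure below is a minimal illustration only: apart from the Crowther original and the Woods revision mentioned in the talk, the version names are hypothetical placeholders, and `lineage` and `common_ancestor` are invented helper names, not tools used by the project or the player community.

```python
# Each version points to the version it was derived from (None for the root).
# The "hypothetical_*" entries are placeholders, not real releases.
PARENT_OF = {
    "crowther_original": None,
    "woods_revision": "crowther_original",
    "hypothetical_port_a": "woods_revision",
    "hypothetical_port_b": "woods_revision",
    "hypothetical_port_a_mod": "hypothetical_port_a",
}


def lineage(parent_of, version):
    """Return the chain of versions from the root down to `version`."""
    path = []
    while version is not None:
        path.append(version)
        version = parent_of[version]
    return list(reversed(path))


def common_ancestor(parent_of, first, second):
    """Return the most recent version that both inputs descend from."""
    ancestors_of_first = set(lineage(parent_of, first))
    for candidate in reversed(lineage(parent_of, second)):
        if candidate in ancestors_of_first:
            return candidate
    return None


print(lineage(PARENT_OF, "hypothetical_port_a_mod"))
print(common_ancestor(PARENT_OF, "hypothetical_port_a_mod", "hypothetical_port_b"))
```

A true phylogenetic reconstruction would have to infer these parent links from the surviving copies; here they are simply assumed to be known.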


But to close I want to propose one other potential approach to this. And this is still very unformed but it’s something I originally floated in our 2010 white paper that came out of this project (and you can actually just Google for “Preserving Virtual Worlds Final Report”). This was inspired by my colleague in the iSchool, Jennifer Golbeck. She studies trust relationships in web-based social networks. So, she actually designs algorithms to do this and there’s tremendous interest in the social network analysis world around trying to understand trust in online communities. And developing trust models, ways to detect trust, measure it, and so forth. I’m actually borrowing from Jen’s dissertation and proposing this idea. She designed some algorithms to measure and detect trust and model it in some communities.


I’m thinking again about surrogates. Surrogates are proxies for what it is we really want to try to get at. So can we get at authenticity in a different kind of way. Digital preservation services calculating trust in fan-run game repositories. Because game archives in the wild cannot usually be authenticated according to standard integrity checks, an alternative method for evaluating the authenticity of their holdings might involve the application of trust-based information. Jennifer Golbeck, for example, has demonstrated how the trust relationships expressed in web-based social networks can be calculated and used to develop end user services such as film recommendations and email filtering. Applying Golbeck’s insights, archivists could leverage the trust values in online game communities as the basis for judgements about the authority or utility or authenticity of relevant user-run repositories such as abandonware sites, Home of the Underdogs, and game catalogues, like MobyGames. [33:08] Under this scenario, authenticity is a function of community trust in the content being provided and in the other individuals who are a part of the community. One consequence of this approach is that “authenticity” and “mutability” need not be considered mutually exclusive terms. On the contrary, fan-run game repositories that make provisions for transformational use of game assets, such as altering the appearance of avatars or inventory items, might in many instances increase trust ratings. So, very counterintuitive, it’s a different way of getting at the issue. And I’ll just leave it there.
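
To make the idea of a calculated trust value concrete, here is a deliberately simplified sketch. It is not Golbeck's algorithm: it is a one-hop, trust-weighted average, and every name in it (`inferred_trust`, the people, the repository, the 0 to 1 and 1 to 10 scales) is an assumption chosen for illustration.

```python
def inferred_trust(trust, ratings, source, target):
    """Estimate how much `source` might trust `target`.

    `trust[source][neighbor]` is how much `source` trusts each neighbor
    (assumed scale 0 to 1); `ratings[neighbor][target]` is that neighbor's
    rating of the target (assumed scale 1 to 10). Returns the
    trust-weighted average rating, or None if no trusted neighbor has
    rated the target.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for neighbor, weight in trust.get(source, {}).items():
        rating = ratings.get(neighbor, {}).get(target)
        if rating is None or weight <= 0:
            continue  # skip neighbors with no opinion or no trust
        weighted_sum += weight * rating
        total_weight += weight
    if total_weight == 0:
        return None
    return weighted_sum / total_weight


# Hypothetical community data.
trust = {"archivist": {"alice": 0.9, "bob": 0.3, "carol": 0.6}}
ratings = {"alice": {"fan_repository": 8}, "bob": {"fan_repository": 2}}

print(inferred_trust(trust, ratings, "archivist", "fan_repository"))
```

In this toy data the highly trusted member's rating dominates the result, which is the basic intuition behind using community trust as a surrogate for authenticity.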

[Wendell Piez] Thanks for that. That was really interesting and I think it does bear on some really provocative things, some of the things we’ve talked about so far. One of the things that comes to mind that I think is interesting is the way in which this goes back to what Allen and Paul [Caton] were talking about earlier this afternoon, because, fundamentally what I see emerging out of the conversation about Preservation of Virtual Worlds is that the virtual world that you want to preserve isn’t the “it” that preservationists may be always focused on when they’re preserving artifacts. And so, for example, this final idea that mutability and authenticity are not necessarily opposed to each other, this sort of comes back to that notion of what that actually is.

[Kari Kraus] Right, right, yeah.

[Wendell Piez] And with respect to that also, it also bears on the question you asked in opening as to whether we’re really talking about data modeling at all. And to that I would say, well, you’re, in a sense not talking about data modeling in the sense that we’ve talked about it so far today because you’re talking about a model being a protocol instead of rules. It’s a definition, it’s a framework.

[Kari Kraus] Yeah, yeah.

[Wendell Piez] But, nevertheless, you’re faced up against all the problems because you have the problem of dealing with the stack in which we have both explicit and implicit data models in a more concrete form all built in.

[Kari Kraus] Right.

[Wendell Piez] And we haven’t really gotten in so far to some of the other metaphysical problems, questions about what’s the difference/relationship between a serialization format and the model that it’s in, that it expresses, or, for that matter, not even serialization format but an implementation of a programmer and operator versus the model it expresses.

[Kari Kraus] Yeah.

[Wendell Piez] Right? Those are all buried in all of these issues about how is it we actually encapsulate this thing.

[Kari Kraus] Right, right. So the data model issues are sort of more subterranean.

[Wendell Piez] They’re subterranean but they’re at the heart of it because if we can’t actually know what those things are, then how do we expect this thing to have any preservation at all and even look at much less to write about.

[Kari Kraus] Right, right, yeah. I was also thinking about Allen’s talk too in terms of the issue of identity conditions. And I think you could sort of think of different communities having more rigid or looser identity conditions in terms of how they approach the similarity or the sameness between two or more objects. It seems like many members of the game community adopt relatively loose identity conditions, or they operate—to think about FRBR terms—they operate at the FRBR work entity type level. One additional point of contrast is that in OAIS, as it’s now implemented usually or customarily, there’s really no such thing as a duplicate of an object in the OAIS model. So if you create a duplicate of a file or if you migrate a file, let’s just stick with the duplicate, you’ve got a duplicate that still has the same bit stream values as the original. In OAIS implementation those are considered two distinct objects. The second dupe is not considered a variant of the original, but rather its own unique, distinct object, although you do preserve the relationship between them, which is a derivation relationship. But they’re treated, there is no such thing as two objects being the same, I don’t think, in the OAIS model.

[Maximilian Schich] Right. There’s a very interesting book by Salvatore Settis which is called The Future of the Classic. I don’t know if there is an English translation. There is an Italian and a German. And that very much follows or basically brings another example for your point about the difference of preservation. So he says, basically, Europeans always, or the West of the world always focuses on depicting ruins to signify old age. While Chinese and Asian people, in general, tend to signify old age with say, a really old tree, and ruins were actually something pulled from the West. And if you look at preservation, a Japanese temple is still considered old even if there’s no piece of wood which is older than fifty years because it’s done in the same way as 500 years ago.

[Kari Kraus] Oh, right. I’ve actually heard of that model of preservation where, well it’s the parable of Theseus’ ship, right? Isn’t that what it is? Where Theseus has his ship, is it Theseus?

[Maximilian Schich] Yes, Theseus.

[Kari Kraus] Yeah, and over time he has to keep replacing planks as they rot and so at the end of the day, is the ship with completely replaced planks, is it the same ship that he set sail with? Because structurally they’re the same but –

[Wendell Piez] They maintain Theseus’ ship in Athens, I understand. Every twenty years they rebuild it next to itself with completely new materials. This has been going on since the year 600 or so. And so every 20 years there’s just a different building.

[Maximilian Schich] And I think the point is we don’t need to. So basically what we have in mind when we think about preservation models is a relic, or a kind of Catholic church kind of thing.

[Kari Kraus] Right, right.

[Maximilian Schich] Which is to rest something. Which is something that is not used anymore. You just look at it. But if you look at other things like the Dome of Cologne, the church in Cologne, which is restored all the time because if you’re ready on one end, you have to start at the other end because of rot spots in building. And I think there may be examples all across the world where we can actually trace that and basically foot that somewhere, which maybe is a better argument. Because as it looks right now, my hunch right now would be that there’s a better user base for gamers preserving games in Asia, Korea, Japan and China than here. Is that true, or…?

[Kari Kraus] Well that’s actually a good question. I actually don’t know. I mean it’s an open question.

[Laurent Romary] I was thinking about something I would ask later and I can do it now. From what I heard that concerning also the OAIS model and the previous joke with the discussion about proxies is that when discussing data modeling, we need somehow to discuss a metadata model that we need to interpret what’s going on in the evolution of information. We’re constantly in this situation, and I call that “surrogate” in some of my prose, a kind of generic view that every digital object we create at some point is a surrogate of something. The duplication is one operation, one possible operation.

[Kari Kraus] Right.

[Laurent Romary] Or creating a document compiling annotations about […]. Creating a kind of surrogate also from the object. And by really identifying what is specific to such a surrogate, authorship, time stamping, relation to possible sources in a kind of recursive way, is necessary so that we can see those elements of preservation as also taking into account those evolutionary phenomena and we probably see this the community as a whole.

[Kari Kraus] Yeah, I think the temporal dimension is really, really interesting.

[Julia Flanders] I have a question that has come in through the Twitter feed from Toma Tasovac who is talking about an analogous problem preserving performance, and he’s asking about whether it’s possible to model immediacy and presence, things that are inherently and virtuously ephemeral. I’m adding those words on his behalf because he was dealing with 140 characters.

[Kari Kraus] So, preserving the attributes of ephemerality essentially?

[Julia Flanders] Yeah, I’m guessing that that’s implied in the modeling immediacy and presence.

[Kari Kraus] Yeah, yeah, okay. Because I thought of, well, I thought of Agrippa [Project], which I know a number of you in the room know that story. This poem that was designed to erase itself by William Gibson. So if you ran it in your browser once and scrolled down your screen and it encrypted itself and it was gone forever. So the creators, including Gibson, actually had a good time imagining what librarians would do when they actually had to create a filename for one of these objects. But, okay so presence. So how do you capture and preserve presence? I immediately think of someone again like Jon Ippolito, who also works with this kind of art. He’s created this new media art questionnaire, which he gives to new media artists to fill out before their artwork is exhibited or acquired by a museum. And it basically has the artist go through and indicate which attributes or features are absolutely necessary to preserve over time, and which ones he or she is willing to sacrifice. I imagine that that questionnaire probably captures some of those things, but I guess I’d have to think about a concrete example in terms of some…how something like presence would play out. Yeah?

[Lisa Swanstrom] To add to that, the Agrippa Project is a great example. I was lucky enough to be on that project. And the way that it was managed to be archived was kind of through a cheat. We’re filming it, like a picture. We have it on a site. And I was thinking about that in terms of the immediacy of the experience of gameplay itself. And so how might that be archived and is there a possible cheat? And a possible cheat might be machinima or something along those lines. That kind of thing.

[Kari Kraus] Okay, yeah. Generally as part of it, OAIS also preserves context information, which is different from presentation information, which is much more technical. The context information is the other information you’d have to include in order to help a user at some time in the future make sense of and understand the significance of this particular piece of work. In the case of the text adventure game, Colossal Cave Adventure, we included, for example, Dennis Jerz’s article on the game. He actually went and explored the cave system in Kentucky that the game tried to model and he kind of mapped similarities and differences, and he traced the whole provenance and history of the game and so forth. So that article became part of it. I think you could indirectly get at presence and things like that through, again, surrogate documentation. Well, there are also things like, my colleague Henry Lowood talks about this quite a bit, things like game demos or certain genres of games like Doom, where you’re not video recording gameplay or a game session, but rather there is a feature of the game engine that will capture gameplay through a set of instructions. It’s basically documenting or notating the inputs of the player and then you need that same version of the game engine to play back that demo. But again, it’s not video recording, it’s actually documenting or notating the input of the player as a set of instructions.

[Daniel Pitti] The word for it is ‘transaction.’ Capturing the transactions.

[Kari Kraus] Okay, that’s what it is. Okay, yeah so something like that.

[Syd Bauman] And it’s been done for hundreds of years with, say, chess.
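
The input-recording approach Kraus describes for demos, which Pitti calls capturing transactions and Bauman compares to chess, can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any actual game engine stores demos: the `Demo` record, the `ENGINE_VERSION` string, the movement table, and the `record`/`replay` functions are all hypothetical. The point it tries to show is that the recording holds only the player's inputs, so playback needs a deterministic engine of the same version.

```python
from dataclasses import dataclass
from typing import List, Tuple

ENGINE_VERSION = "1.0"  # hypothetical version tag for this toy engine

# Toy rule set: each key press moves the player one step on a grid.
MOVES = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}

Position = Tuple[int, int]


@dataclass
class Demo:
    """A recording of inputs only; no frames or video are stored."""
    engine_version: str
    inputs: List[str]


def play(inputs: List[str], start: Position = (0, 0)) -> List[Position]:
    """Run the deterministic toy engine and return every position visited."""
    trace = [start]
    for key in inputs:
        dx, dy = MOVES[key]
        x, y = trace[-1]
        trace.append((x + dx, y + dy))
    return trace


def record(inputs: List[str]) -> Demo:
    """Notate a session as the list of inputs plus the engine version."""
    return Demo(engine_version=ENGINE_VERSION, inputs=list(inputs))


def replay(demo: Demo) -> List[Position]:
    """Re-run the inputs; refuse if the engine version does not match."""
    if demo.engine_version != ENGINE_VERSION:
        raise ValueError("demo was recorded with a different engine version")
    return play(demo.inputs)


session = ["N", "N", "E", "S"]
demo = record(session)
print(replay(demo) == play(session))  # the replay reproduces the session
```

The recording is tiny compared with video, but it is only meaningful alongside the engine that interprets it, which is why the engine itself becomes part of what has to be preserved.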

[Fotis Jannidis] Actually they’re using Fraps and similar tools to capture what is on the screen and there’s a huge, new sort of text that’s called “Let’s Play,” where you have people sitting in front of the screen and they comment themselves while they’re playing so they’re not just recording the game but they’re recording what they’re doing then, at that moment, how they’re feeling, and so on. This could be an answer.

[Kari Kraus] Yeah, playthroughs and walkthroughs.

[Lisa Swanstrom] It’s extremely exciting for narrative studies when all of this kind of layer of all that has happened in this environment is just [calm?]. Very exciting.

[Kari Kraus] Right, right.

[Maximilian Schich] There’s an interesting analogy to art history because art historians for a very long time like to think of very famous books, for example, The Cathedral by […]. He basically studied them from photographs and that’s what’s going to happen with the history of video games if we go down that road, because if nobody had Competition Pro 5000 Choice, they’d take hours to play a certain game, but he does not know how it feels to play the game.

[Kari Kraus] Right. That’s right. It makes me think too of, there was a practice in the Renaissance where Renaissance artists like Titian would try to reconstitute lost paintings of antiquity based on surviving verbal descriptions of them.

[Maximilian Schich] Very good.

[Jim Kuhn] So I have a question about your experience with the OAIS model. It’s extraordinarily complex. And I wonder if this is an example that is constructed in a negative kind of way because this is a conceptual model that is so complex that implementation of it, in practice, is almost impossible. And I don’t think that is exactly an exaggeration. A Portico report from 2001 [sic 2011] looked at trying to implement OAIS models at Cornell for digital preservation, and discovered that a model that they called Zip and Hold, where you just zip up a bunch of files and get them onto spinning disks, was actually far more reliable. And one of the questions I had about OAIS, that I think is relevant to the discussion here, is when is a conceptual model an impediment to getting required work done. Because in the cultural heritage community and, in particular, the OAIS model is so intimidating that people are letting their portable hard drives and CDs and DVDs pile up because they don’t have a solution that will perfectly implement the OAIS model. We’re losing cultural heritage because the preservation data modeling is so extreme.

[Kari Kraus] No, I agree. I actually wish Jerry McDonough was here because he was really the lead on that part of the project, but we had a very small case set so that’s why it was feasible. But again, I would point in contrast to the gaming community where they are very, I’m reaching for a word, it’s not quite opportunistic, but they’ll do things like, you know, in the early 90s when bandwidth was really awful they would rip out certain behaviours of the game like the sound or some of the graphics and just upload what they were able to. Or in the case of Agrippa again, the player community—or not the player community in this case, the community interested in the work of William Gibson, science fiction and so forth or interested in new media art—they simply circulated on the net, a three hundred line poem, a transcript of the poem, in plain ASCII text. That circulated for years, a poor fragment, a poor representation of what was originally this book artifact with a three and a half inch floppy sitting in the middle of it and then you plugged the floppy in and so forth. But because they floated the 300 line plain ASCII text for years, a decade, eventually researchers were aware of it and knew of it and had encountered it so they at that point had the resources to go in and do something more sophisticated with it. So it’s almost like however weak that signal we can send down a conductor of history it can be amplified by a later age.

[Maximilian Schich] There’s another example that goes in the same direction that if you think about how much time and money, how an institution has spent all that building image databases and different fields and then look at ARTstor, for example, has 2 million objects. Facebook has 25 billion photographs. It’s the largest image database on Earth. And if you look at people running through museums, if photography is forbidden, they’d never have taken that […] picture and looked it up, and showed it to their friends. And I think that’s resources, which are untapped outside of a few scientists who actually have access to the data and publish in a paper so it can be in the data. So that’s a very, very interesting point that you make that basically all our models and all our setups are actually not encouraging documentation. They’re actually inhibiting documentation.

[Kari Kraus] Yeah. It’s that old adage about the perfect being the enemy of the good or whatever, yeah.

[Daniel Pitti] Two points in regard to the OAIS. You might take a look at something like Archivematica, which is an open source OAIS development. Have you seen it? And someone doing it from an archival point of view. And then the other would be just call attention to the Electronic Record Archive at the National Archives [and Records Administration] in the United States, which the initial development of that was in the hands of Lockheed Martin and it was based on OAIS. And it ran into a wide variety of things that caused problems, not the least of which is probably the fact that you have a large contractor, government contractors are used to being paid large sums of money and not producing much, unfortunately. But the, what was I going to say about this? But the requirements within the context of the National Archives and government archives, and this is true around the world, is that you have things that are legally mandated in terms of the archiving of records and maintaining them over time. And so that adds an extra layer in here that has to be met. And of course in a lot of cases what’s going to have to happen is you’re going to have to go back and revisit the legislation that set it up to begin with because, for example, with the national government of the United States you had transferrals and schedules that allowed a created agency to keep their records for fifteen years and in a digital environment, if that’s electronic records, fifteen years after the fact can make them a little difficult…

[Kari Kraus] Yeah, absolutely.

[Daniel Pitti] Yeah, so it gets really, really complicated so on one hand, yes it’s complicated, but on the other hand it’s a complicated problem.

[Kari Kraus] So the two examples that you gave me, those were just flat-out unsuccessful efforts to…

[Daniel Pitti] No, no, I do not want to be interpreted saying that the National Archives is a failure, as a member of the advisory committee.

[Kari Kraus] Okay. They were just problems and challenges.

[Daniel Pitti] Let’s just say it’s met a certain set of requirements, but it didn’t realize everything that they want. And so part of it is over time it will have to be further developed. And officially, on the record, that’s my position.

[Kari Kraus] Thank you.
