Direct download link: Interview with Rebecca Fraimow.
In this week’s episode, Christie chats with Rebecca Fraimow about her article from Issue 4 (order now) called “Thinking ‘Long Term’ with LTO”, and the many challenges of archiving digital media for long-term preservation.
- National Digital Stewardship Residency
- American Archive of Public Broadcasting
- Linear Tape-Open
- Digital Public Library of America
Check out DevOpsDays Portland, coming up on August 9-10. Use discount code recompiler2016 for 20% off! We’re also raffling off a free ticket to one lucky Recompiler reader/listener. Enter here by August 1st.
Our latest issue is in the shop now! Issue 4: Legacy systems focuses on how systems and code change over time, and how we maintain them. In this issue we have tips on how to maintain legacy code bases effectively, a history of the GNOME project and how it relates to urban change, a case study on working with legacy data, and other interesting and informative articles. Order now!
Support the Podcast
We love hearing from you! Feedback, comments, questions…
We’d love to hear from you, so get in touch!
This is a raw transcript. Please send corrections to firstname.lastname@example.org
Christie: Hello, and welcome to the Recompiler, a feminist hacker podcast where we talk about technology in a fun and playful way. I’m your host, Christie Koehler. This is episode 7, and I chat with Rebecca Fraimow about her article from issue 4 called “Thinking ‘Long Term’ with LTO,” and the many challenges of archiving digital media for long-term preservation.
Christie: Hi, good morning, or I guess if you’re in Boston, it’s afternoon, isn’t it.
Rebecca: Just barely afternoon, officially.
Christie: Great! Let’s get started. Rebecca, tell the audience a bit about yourself.
Rebecca: So, I’m a moving image … specifically a moving image and digital archivist. I graduated from a program that focused specifically on moving image archiving. But since most videos these days are shot on a digital camera, that had a pretty heavy component of digital archives focus, but one of the first …
Christie: So moving images meaning movies, basically?
Rebecca: Yeah. Basically anything … I mean, there’s a couple different ways you could say it. The sort of fancier way, I guess, is people call it time-based media, which is anything that is not a fixed image or document, but something that relies on the passage of time to experience. So, a video, audio as well, something like a podcast … Anything like that where there’s a 4th dimensional component.
Christie: So, you might even say running through a slide carousel and telling a story? Could … Well, I guess if that were recorded, that would just be a video. Never mind.
Rebecca: Basically, yeah. But it’s pretty much that same idea. You know, it doesn’t have to be … When people talk about it, I think the thing that people tend to think of first is [inaudible 00:02:06] because that’s what you hear most about in restoring old Hollywood movies and nitrate film, which is very cool and definitely a component of it. But, it encompasses a lot more than that and a lot more really cool stuff than sort of the first thing that people tend to think of which is sitting over [inaudible 00:02:24] film reels and cleaning, painstakingly, little pieces of celluloid.
Christie: Okay, so back to the digital aspect of … What did you call it again? Moving picture archiving?
Rebecca: Moving image archiving, or audiovisual archiving I guess is more all-encompassing because that includes the audio as well. I’ve mostly worked with film and video and digital video, so I tend to forget about the audio part, which is a major component.
Christie: Okay, so your studies and professional background led you to this position at WGBH?
Rebecca: Yeah. The sort of intermediate stuff there is when I wrote … The project that I was working on that I wrote the article about is something called National Digital Stewardship Residency. That’s not audio/visual focused. It’s a program that was started by the Institute of Museum and Library Services to get more graduates of Library Science and Information Management programs to focus specifically on the process of digital preservation and get hands-on training for that.
One of the challenges right now is that a lot of traditional library and archives methodologies are still being taught in library programs, and for the newer processes used for managing digital materials, programs are still sort of figuring out how it’s most effective to teach those and how to get people who are really trained in the different ways you need to think when you’re thinking about …
Christie: The Skype connection between Rebecca and I got disconnected here, so there’s a little bit of an interruption. I’ll use the opportunity to say, “Hey, if you’re into this interview, go subscribe to the magazine!”
All right, back to the interview.
So, you were in the middle of talking about how the science and practice of archiving digital content … That the academic institutions are still kind of figuring out the best way to do that, and the best way to teach that.
Rebecca: Yes. So, that sums it up pretty concisely. It’s an evolving field, and the ways in which it’s evolving are challenges for everyone. So, the idea behind the residency program which I took part of, is that people who have just come out of … Gotten a Master’s Degree in some field related to Library Science would go and … To spend …
At the time that I was in it, the program that I was part of was a 9 month program. I spent 9 months embedded in an institution working on a project for that institution and participating in training and education focused around digital stewardship. They would learn from being in a professional environment about the challenges that you’re likely to face with doing digital stewardship. The host institution would have the benefit of participating in this Digital Stewardship community with the teachers, and the organizers of the program, and the other institutions, and the residents who are coming in straight out of Grad school with this fresh perspective.
So that’s the National Digital Stewardship Residency Program. There have been a couple iterations thus far. I was … At the time that I was in the program in Boston, all of the other residents were working at educational institutions; MIT, Harvard, other universities in the area. I was the only one who was working at a public television station because of my background in moving-image archiving.
Rebecca: I guess I’ll do a plug for the one we’re running now, which I’m actually … Now that I’m out of that program and continuing my work at WGBH, we’re actually running a program now that focuses on sending residents out to public television stations around the country which have much less of a grounding in issues of preservation. So it’s a little bit more of getting the training out to those places and instituting ways to help them work with their digital content, that’s obviously being created at a very, very, very great rate.
Because their mandate is to keep creating more content, they don’t have a mandate to preserve the same way a cultural institution does. It’s a little bit more challenging for them to dedicate the resources to working to make sure that stuff stays accessible and available down the line.
That’s the project that we’re working on now is getting recent graduates with Masters programs out to those stations all around the country so that they can work on preserving that stuff. That’s my current work and interest. That grew out of the program that I was in when I was in Boston.
Christie: Yeah, I know, at least I know here in Portland that a lot of the local coverage and capturing of local culture and issues is done by the public radio stations.
Rebecca: Yeah, we’re actually … One of our residents is going to be going out to Portland near you guys to work with KBOO community radio.
Rebecca: So, that’s …
Christie: I’m a big fan of KBOO. That’s awesome.
Rebecca: Yeah, they’re fantastic. We’re really excited to be working with them.
Rebecca: So you guys are going to get a [inaudible 00:07:43] resident up your way in about a month.
Christie: Awesome. So, you … I think we’re now caught up to the work you did as a resident at WGBH with the linear tape.
Rebecca: Yes. I started as a resident at WGBH. Originally, I had a couple of different projects that I was supposed to work on, and the main focus had to do with the American Archive of Public Broadcasting, which is an initiative … It’s all part of again, that same challenge of getting public media stations to … In this case, it was more about the backlog because there’s 40 years of content that in most cases is just sitting around on people’s shelves on various forms of video tape which do not have a very long life span. But, it’s expensive to digitize them, and once you’ve digitized them, it’s expensive to take care of those digital files that you have.
The American Archive … Sorry, you were going to ask a question?
Christie: Just when you say video tape, do you mean something like VHS? The original thing that the …
Rebecca: Mm-hmm (affirmative). Like VHS, but in most cases, VHS is actually a pretty recent format in the grand scheme of things. It’s not a very good format. For broadcast purposes, in most cases what you’re talking about is something like Betacam tapes, U-matic tapes, all sorts of different forms of video tape that were around before or concurrent with VHS, and are generally a little bit higher quality. Most people aren’t as familiar with them because they were never really used for consumer use. They were pretty much only circulating in the broadcast environment.
Christie: Okay, but something like magnetic tape in some sort of plastic case …
Christie: … On shelves and shelves and shelves?
Rebecca: Yep, shelves, and shelves, and shelves of sad, deteriorating, magnetic tapes. There’s a scare number that people throw around, which is that by the year, I think it’s 2025, is the year that people tend to throw around, that we’re not going to really have the opportunity to digitize a lot of the magnetic tape anymore, because so much of it will be deteriorated.
Christie: That’s 8-8 1/2 years away. That’s coming up very quickly.
Rebecca: It’s coming up very fast. Obviously, not all magnetic tapes are just going to crumble into a pile of dust at that point, but it’s very, very true that the longer it goes on, the harder it is to get a good digital copy of those magnetic tapes. The longer they sit, the less life span they have.
Christie: That’s just because material breaks down in the natural process of aging, right? The same way that paper will yellow and get brittle?
Rebecca: Yes, pretty much, except it happens at a much faster rate even than paper for magnetic media. It’s a lot harder to … Well, the problem is also, it’s not just that the materials deteriorate, it’s that the ability to transfer those, the decks that play them back, and the technology and equipment that they have is again, becoming obsolete. The less consumer and wide-spread the format was, the harder it is to get ahold of the equipment that’s necessary to play it back and convert it to a digital format.
With something like VHS … There are still a lot of VHS players floating around. If necessary, someone could digitize a VHS tape at home, and it’s relatively non-complex to get all the equipment that’s necessary to do that. But, if you’re talking about something like a Betacam tape or a Hi8 tape, or an open-reel 1 inch tape, those … Not only are those decks not being manufactured anymore, the ones that we have are falling apart. There aren’t very many people at all left who know how to repair them.
It’s extremely expensive to get one of those decks, to keep it in good repair order, and there’s really no knowing just how long it’s even going to be possible to continue to maintain that equipment.
Christie: Okay, wow. I’m starting to get a sense of the scale of the issue.
Rebecca: Yep, it sort of comes in from all sides. The American Archive of Public Broadcasting is a partnership between WGBH and the Library of Congress to start addressing that problem by working with the stations and helping to … Well, in this initial project that we were working on, we funded a lot of digitization. We got a lot of the stations to look through their materials and identify which they felt were the most important to prioritize for preservation.
That’s also another challenge, because when you’re looking at a tape that’s from 30 years ago, and maybe you don’t have a good record of it, and maybe it says “puppies” on it, and you have no idea about what that means. Whether you’re talking about the test tape that someone did of their pet dog frolicking in a field or a human interest story about a dog that saved a child from a well. You just don’t know.
We asked the stations to do their best, to look through this stuff and figure out what they thought might be important and at risk. Then we funded the digitization of about 40,000 hours of material, which is a lot of stuff. Then all of that is now available. A big chunk of it is available online, the stuff that we determined we could make available because it was in the public domain or it was fair use or close enough. The rest of it you can see either at our station or the Library of Congress.
In addition to all of that, all of the material that was digitized and included in the American Archive of Public Broadcasting, we also, looking ahead, we’re starting to look at the challenges of born digital materials as well. We asked a lot of the stations to send as much of their born digital content … By born digital, I mean stuff that’s filmed on a digital camera that was never on a magnetic tape. It’s sort of in a file format from the beginning of its life, all the way through.
Rebecca: Which poses different challenges than stuff that comes … That’s digitized off of magnetic tape. Managing that stuff, because so much more of it is created all the time … If you deal with a digital camera, you’re just creating 100’s and 100’s, and 100’s of different small files, as opposed to just everything on one tape.
There’s a big volume, and it’s complicated, and it comes in weird proprietary digital formats, and handling it is a challenge. That’s sort of the next challenge that we’re looking forward to now that the stations aren’t really shooting on magnetic tape anymore.
The other component of the American Archive of Public Broadcasting Project was collecting born-digital materials from these stations, and we as WGBH leading the charge, we basically pulled all of the digital files that we could find that were sitting around on our servers and sent them off to be included. We sent them off to our partners because we wanted it all to be done in a standardized process and the data about them to be generated in a standardized way so that they could be included in this overall archive.
This is all background, so I’m sorry this has taken so long to lead up to what I was actually working on. Originally, I was supposed to do a bunch of different projects related to this, and one of them was to take the files that we’d pulled off of our server and sent off to our partners to be included in the American Archive, and basically re-ingest them back into our new content management systems at WGBH. Because over the intervening time that this had happened, our content management system, the expense of maintaining the license, had become no longer tenable for WGBH. So we were moving to a new system, and we needed to transfer everything over.
Christie: Okay, so you’re taking care of two things at once. You’re with the organization that happens to be included in this American Archive, you’re then going to use the output of that to repopulate your content management database?
Rebecca: Exactly. That was supposed to be a couple months’ worth of the project of the 9 months that I was working on.
Christie: I love all this “supposed to be.” We’re building, we’re building. I can tell.
Rebecca: Oh, it’s always supposed to be.
Christie: I know we’re just around the corner from the big, hairy, yak.
Rebecca: Yeah, pretty much. So here comes the big, hairy, yak. Around … Over the summer before I started at WGBH, as the people who were working on this were pulling stuff out of the original content management system where it was stored on LTO-4 tapes, which are again, a magnetic format, but different from the kind of magnetic formats that you use to … Like a VHS. These are magnetic data storage formats, and they’re used pretty commonly to store digital files that you don’t expect to need to access very frequently.
Although they’re kind of a pain to load up and access, they’re relatively inexpensive. Much less expensive than putting them on servers and they’re relatively stable. I know I just gave a whole long spiel about how magnetic tape deteriorates, and it does, but LTO tapes are stable for 10-15 years which is longer than you’re going to get out of your average hard drive probably.
Christie: Yeah, I think your article says that the average … Some large proportion of hard drives fail within 5 years, is that … Am I getting it right?
Rebecca: Yeah. It’s not all of them, but a significant chunk. It’s enough that it’s a cause of concern if you’re talking about saving your data for a long period of time, which is ideally what you want to be doing.
LTO tapes are … Even though magnetic tape deteriorates, it’s not engraving something or putting it on film where it’s going to last for a very, very long period of time. But it’s stable, it’s predictable. If you have your data on an LTO tape, you have a pretty good sense of how long you can leave it to sit on a shelf before you need to migrate forward. You have a pretty good sense of when the technology is going to move forward and when you can put everything on a schedule of when you’re going to migrate it on.
Rebecca: It’s a pretty popular choice if you’re working on a budget and you want to keep large amounts of data somewhere that you can access them, that they’re in your hands not 3rd party hands. But you don’t expect that you’re going to need to be using those files every day if you were putting them on an active server and constantly working with them and editing with them and so forth.
Christie: It’s sort of the microfiche.
Rebecca: Yeah, basically. It’s a lot like the microfiche of the data storage world. It’s not an ideal solution, no one loves it, much like microfiche, which everyone hated at the time, but it’s better than not having a solution and it’s better than a lot of the alternatives.
We had all of our stuff on LTO-4 tapes that were accessed through a complex data management system. As they were being pulled out of storage, we started to notice that we were having a lot of failures of these files as we pulled them out of storage. So a lot of them, for whatever reason, were just not making it all the way off the …
The file, as it came off the LTO tape, once it arrived on the other end on our local systems to be put on hard drives and be mailed off to Crawford, the files were just not complete. Either they were failing … At first there was … All that we knew or all that was known was that the files were failing. They were not the files that they were expected to be. They weren’t reading as moving-image file formats. There wasn’t really much that could be done with them. There wasn’t really a clear sense of why they were failing.
Was it that the tapes had gone bad and that there had been deterioration much faster than we expected? Was there a problem with the system? No one really … Because of the volume of stuff that was coming through, and the size of the project, no one really had time to sit down and try and sort this out and figure out what was going on.
Christie: Also, my understanding too is that you’ve got … There’s a lot of … It’s sort of a black box. You would issue a command to the management system and … Are the tapes being pulled automatically with robotic automation, or are people actually going and getting them?
Rebecca: Nope, that’s exactly it. They were being pulled through a tape robot, and yeah, it was pretty much was a black box. The system in place had been instituted by our IT department probably about 10 years before this project took place.
A lot of the people who had originally set up the system were no longer working in IT. We as the archives, although we do a lot of work with IT, it’s not the same department. We don’t have access to a lot of the inner workings of what’s going on there. It was doubly complicated by the fact that the system was a 10-year-old system that we were trying to move away from and we didn’t have a great understanding of what was going on in the inner workings. We weren’t going and pulling tapes ourselves and saying, “Okay, I know this file is on this tape, and I’m going to grab it off.” We were instituting … We were sending a command to a tape robot that was then pulling it, and pulling the file, and sending it back out over the network to us. There was some …
Christie: So you had some … There were so many steps where things could be going wrong?
Rebecca: Exactly. That was the challenge. When I started my project, in addition to working with the files that were coming back that had successfully transferred and were coming back from our partners with the American Archive Project, I was tasked with trying to figure out … Was doing a little bit more investigation into trying to figure out what was going wrong and whether the files really were still on the LTO tape; whether there had been tape failure, the magnetic tapes were deteriorating, or any of the components of the system were deteriorating, or whether there was a connection problem and what might be causing these failures.
One of the biggest challenges, going back to that issue of legacy equipment and deck drivers and things that are not forward compatible, is that in the interim … After these challenges and after seeing the difficulty of getting these files out of the automated system, the archives have decided that going forward, they were going to have a much more hands on method of handling the data.
Now … Starting from when I began this project up till now, we moved up to LTO-6 decks, which is a slightly more advanced format of LTO tape. We have our own decks hooked up to computers in the archive and we just do everything pretty much by hand. We have scripts to run a lot of the writing of data and the verifying of data, but it’s much less reliant on that black box process.
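The interview does not say what those scripts look like or which tools they use. As a minimal sketch of the "verifying of data" step, assuming a manifest of SHA-256 checksums recorded when the files were written (the function names, the manifest shape, and the choice of SHA-256 are all assumptions for illustration, not WGBH’s actual workflow), a check might look something like this:

```python
import hashlib
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash a file in chunks so large video files never sit fully in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_against_manifest(manifest, root):
    """Compare files under root with a manifest of expected checksums.

    manifest is a hypothetical dict of relative path -> expected hex digest.
    Returns a list of (relative_path, problem) tuples; an empty list means
    every file was present and matched.
    """
    problems = []
    for relative_path, expected in manifest.items():
        target = Path(root) / relative_path
        if not target.is_file():
            problems.append((relative_path, "missing"))
        elif file_checksum(target) != expected:
            problems.append((relative_path, "checksum mismatch"))
    return problems
```

Run after each write and each restore, a check like this can tell an incomplete or altered file apart from a good one without anyone having to open the video.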
When I walked in, I thought, “All right, well we have these LTO decks.” I knew that LTO tapes are supposed to be … LTO decks are supposed to be backwards compatible 2 generations. In theory, an LTO-6 deck should be able to read an LTO-4 tape. They’re designed that way so that you can migrate forward.
I thought, “All right, I’m just going to pop these LTO-4 tapes in. I’ll take them out of the robot and pop them in and see if I can get at the data.” Of course, it was nowhere near as simple as that, because as with anything in the digital world, you’re not just talking about physical technology, but you’re talking about an inter-relationship of software and technology in different programs and different operating systems. An LTO-6 deck can read an LTO-4 tape if you have it mounted on a Linux operating system. But if you have it mounted on a different operating system, which at the time we did, the operating programs that the LTO tape deck needs in order to understand basic commands given to it to chug through data are not built into the system.
Figuring that out took some time. Figuring out how to input the right kind of commands to the LTO tape took some time because the LTO-6 tapes that I was used to working with have a much more sophisticated interface method. You can pretty much … Once they’re loaded up into an LTO-6 deck, you can treat them sort of like a hard drive.
Christie: Yeah, you mentioned that LTO-6 has the … Was it the LTFS? A specific file system for it?
Rebecca: Mm-hmm (affirmative).
Christie: Whereas the LTO-4 tapes were all block-level written.
Rebecca: Exactly. They’re all block level, and it’s just a series of tar packages written along the length of the tape. In order to say, “Give me everything on this tape,” you kind of have to write a script that says, “All right, take this block, pick it up, spit it out, move the tape forward a little, take this block, pick it up, spit it out until you reach the end and there’s no more blocks to go through.”
It takes a long time for the … You can set it up. I eventually figured out how to write the script that would make the tape do that, but it takes a while to let the tape run all the way through its length and to check every piece of data for what might be on it and then spit it back out. It’s a little bit different … The system that it was part of was designed to say, “All right, this is the location on which this tar file is written to this tape, and here’s how you get it out.”
But if you’re not operating within the confines of that system, and you just want to say, “I need to find everything on this tape,” then you’re using sort of the wrong tools for the job. But, if you don’t know what the right tools are, then you … You have your hammer and you’re trying to make something look like a nail.
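The script itself is not shown in the interview. The loop Rebecca describes (take one tar package, read it out, move on to the next until nothing is left) could be sketched roughly as below. This is an illustration, not the script used at WGBH: `open_next_segment` is a hypothetical callable standing in for whatever positions the drive at the next archive, and the only real library used is Python’s standard `tarfile` module in streaming mode.

```python
import io
import tarfile


def walk_tape_segments(open_next_segment):
    """Yield (index, member_names, error) for each tar archive in turn.

    open_next_segment is a hypothetical callable that returns a binary
    file-like object positioned at the next archive, or None when there
    are no more. How it talks to the tape drive is deliberately left out.
    """
    index = 0
    while True:
        segment = open_next_segment()
        if segment is None:
            break
        names, error = [], None
        try:
            # "r|*" reads the archive as a forward-only stream, which suits
            # a medium that cannot seek cheaply.
            with tarfile.open(fileobj=segment, mode="r|*") as archive:
                for member in archive:
                    names.append(member.name)
        except (tarfile.TarError, EOFError, OSError) as exc:
            # Record the failure and keep going, so one unreadable archive
            # does not end the whole pass over the tape.
            error = str(exc)
        yield index, names, error
        index += 1
```

Because a failure is recorded per archive, a pass like this ends with an inventory of what is readable and a list of which archives raised errors, even when some of them are corrupt.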
Christie: How long would it take to go through a whole tape? Is it something that, you would start a script and then go home for the night and come back? Or …
Rebecca: A lot of times, yes. I’m trying to remember how many hours it took. It was at least a 4-hour process.
Rebecca: I think it might have been more once you started the script and had to go all the way through. [crosstalk 00:26:32]
Christie: Was it pretty reliable once you got that going, or would you sometimes have errors in the middle of that?
Rebecca: It was mostly reliable. What I found was I was able to get most of the data off of the tapes. Most of the data that I’d got off the tapes and identified was pretty much okay. It was on the tapes. But, I did find that as I would go through, I would sometimes get error messages saying, “This tar file can’t be read,” or “This tar file is corrupt.”
Now what I don’t know and I still don’t know because we haven’t managed to get our hands on an LTO-4 deck yet, but that would really be the next step in the investigation, is whether that’s a problem caused by the interaction of the tape and the deck, or whether it’s really a problem on the tape.
The reason that that’s a potential concern is that a lot of magnetic tape formats, both audio and digital, they can be finicky with the way their data is read by the machine. Sometimes, if the alignment of the tape and the magnetic head that is reading the tape are not quite lined up, you can get blips or errors in the data.
Christie: Oh my goodness. Of course I’m flashing back to the cassette/audio cassette in the car, when it’d go “YEEEEEEEAHHHHHH” and then you’d have to wind it with a pen. I’m sure it’s not quite like that, but that’s the image that came to mind.
Rebecca: That’s the image that came to my mind too, honestly. Most of my experience, as I’ve said, has been working with video and audio tape. So dealing with data tapes is a little bit different … New and different for me. I’m not … I’m trying not to carry over too many of my assumptions from dealing with a very different kind of technology even though it’s based in the same kind of carrier, but I do think that there’s … With any technology, if it’s coming into interactions that weren’t planned for or weren’t expected, you’re going to have blips.
Really the best way to test any kind of technology, especially when you’re talking about magnetic media, is to make sure that it has … It’s dealing with everything in a way that it’s very comfortable with and very familiar with. That means having the exact right kind of deck that was designed for that kind of equipment, you’re going to get the best performance even if it is interoperable with other kinds of equipment.
Christie: Even if you’ve got an LTO-6 deck around, if you can find an LTO-4 deck and get it tuned up, it’d be better to use that to start off with?
Rebecca: Yes, exactly.
Christie: I’m curious … You mentioned in the article, and you mentioned here that you couldn’t … You were using Macs in the office, but that you really needed a Linux machine to make the most out of accessing these tapes. Were you already familiar with Linux when that came up?
Rebecca: Yes, I was. I was pretty lucky actually, in that I was both familiar with Linux and at least familiar with LTO-6 decks and setting them up and hooking them up to a Linux machine from the last job I’d had working for an organization called the Dance Heritage Coalition.
For that job, I was pretty much just hired to digitize analog tapes. VHS and U-matic tape would come in, and I would run them, and we would create digital copies and then preserve them. But, halfway through, we’d been working with another organization at that time to … They were storing our data on LTO, the digitized files. Halfway through, the partnership changed, so it became our responsibility to handle the LTO tapes and the digital files.
What that meant, was that I came into work one morning, and there was a box on my floor which had an LTO deck in it. My boss sent me an email saying, “Can you hook this up? We’re going to be using this from now on.”
I was like, “Uh.”
Christie: You’re like, “Yeah, no big deal. I’ll get on that.”
Rebecca: No big deal, it’s fine! Fortunately we had a great technical advisor on that project. His name is Dave Rice. He’s pretty well-known in the audio/visual archiving community. He was able to help me get it set up. It was a little bit of a hack just to get it working because we had the LTO deck come in, and we realized we had the wrong kind of connector. There was no way to get it hooked up to the machine. Then we went and looked on Amazon desperately for interim connectors and found something that was, I believe, intended to be used for Bitcoin mining. I’m like, “Oh, this seems like it’ll work.” We plugged it in, and wonders, it actually worked.
It was a lot of DIY in that particular experience. But that definitely taught me … It allowed me to get a little more comfortable with DIY-ing it once I got to WGBH. I was like, “Oh, so the LTO deck won’t talk to the Mac? All right, what will it talk to? It’ll talk to a Linux? All right, let’s install a Linux on one of our old computers. Let’s get this hooked up and see what we can do with it.”
Whereas if I hadn’t had that experience of basically having an LTO deck just kind of land on my lap, and being expected to know what to do with it, that I probably wouldn’t have been near as comfortable just kind of playing around and seeing what happened.
Christie: I love that. I love hearing stories of when you’re able to build towards something. You probably didn’t know it at the time when that LTO deck just showed up and your boss was like, “Set this up.” You’d be able to carry it … That would be a really useful experience a little bit later on.
Rebecca: Exactly. I had no idea. I was terrified. I was like, “Why? Why is this being asked of me?!” But now, I’m really glad it happened because I really think that it … Not even just for this specific project, but just in a general mindset of being willing to just kind of play around with technology. It was a huge deal for me.
Christie: Yeah, I really love that moment when people figure out that you can actually play around a lot with hardware without actually messing something up. It’s actually pretty forgiving and robust, for the most part.
Rebecca: Definitely. You look online, and there’s … You start to see that people have done all these different things and haven’t gotten electrocuted, and haven’t died. It becomes … The more you play around with it and screw up a little and realize that nothing catastrophic happens, the more comfortable you are doing it, for sure.
Christie: So, I want to make sure we’ve got the whole … We kind of wrap up the story with WGBH. You’ve got the LTO-6 decks hooked up to the Linux machine. You’re successfully pulling files off at a much better rate. Didn’t the error rate go down to something like 5%?
Rebecca: Yeah, of the files that we found. There’s a lot of factors that went in here, so there really isn’t an easy wrap-up solution to the problem like “Oh, we found all the files and it’s fine.” We’re still re-pulling and we’re going to look at the rates that we have after that. But basically we found that, of the files that were there, the error rate was something like 5%. But we also found that in some cases, the files weren’t where they had been reported to be. So, we’d asked the system, “Where are these files? What tape are they on?”
Then the system would say, “All right, there’s X files on X folder on X tape.” I’m vastly over-simplifying it, but that’s sort of the general gist.
Then we’d look on the actual tapes and realize that those files were not actually on those tapes. They’d been misreported by the system and they were probably on an entirely different tape. Although the … It’s not like those files just weren’t pulled out, the system was getting those files and mimicking successfully. So it knew what it was doing, it just wasn’t reporting it correctly. The reason that that’s potentially an issue is because when we were building the script to pull the files off, we were loading up and basically saying … We were looking at the reports and saying, “All right, X files are on X tape, so we’re going to pull all these files at once so the tape robot doesn’t have to be popping in … popping tapes in and out every other second.” By second I mean minute or maybe half hour because LTO tapes don’t do anything at the rate of seconds.
So if we were giving the wrong instructions to the machine based on our understanding of what files were on what tapes, if in fact those files were all on different tapes, then that increases the possibility for rate of error. You’re doing a lot more moving tapes around, pulling … Loading tapes in, unloading them, and that’s … This is just sort of a working hypothesis for what might have caused some of those errors. I don’t really have data to back this up, but I do know that the more you put tapes in and out, and the more you play around with them, the more likely you are to have problems.
Christie: You’re kind of adding more entropy to the system that way.
Rebecca: Exactly. More entropy to the system is a really good way to put it.
So unintentionally by the fact that we were using reporting that wasn’t accurate, we were adding more entropy to the system. I do think that probably caused some of our challenges.
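The batching idea Rebecca describes can be sketched in a few lines of Python. This is only an illustration under assumptions, not the script WGBH used: the catalog dictionaries, file names, tape labels, and function names below are all hypothetical. It groups files by the tape the catalog reports them on, then counts how many tape loads the resulting pull order would need given where the files really are.

```python
from collections import defaultdict


def group_by_tape(catalog):
    """Group file names by the tape the catalog says they live on.

    `catalog` is a hypothetical mapping of file name -> tape label.
    """
    batches = defaultdict(list)
    for name, tape in catalog.items():
        batches[tape].append(name)
    return {tape: sorted(names) for tape, names in batches.items()}


def pull_order(catalog):
    """Flatten the per-tape batches into one pull list, tape by tape."""
    batches = group_by_tape(catalog)
    return [name for tape in sorted(batches) for name in batches[tape]]


def count_tape_loads(order, actual_location):
    """Count how often the robot must load a different tape for `order`."""
    loads = 0
    current = None
    for name in order:
        tape = actual_location[name]
        if tape != current:
            loads += 1
            current = tape
    return loads


if __name__ == "__main__":
    # Hypothetical data: the report and reality disagree about b.mov and c.mov.
    reported = {"a.mov": "T1", "b.mov": "T1", "c.mov": "T2", "d.mov": "T2"}
    actual = {"a.mov": "T1", "b.mov": "T2", "c.mov": "T1", "d.mov": "T2"}

    print(count_tape_loads(pull_order(reported), actual))  # 4 loads
    print(count_tape_loads(pull_order(actual), actual))    # 2 loads
```

With an accurate report, each tape is loaded once. With the misreported locations, the same four files cost twice as many loads, which is the extra handling Rebecca suspects contributed to the errors.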
Christie: It sounds like … Are you still there?
Rebecca: Yes, I’m still here.
Christie: Okay, scary! It got real quiet so I thought you dropped out again. It sounds like one of the key things that was missing in that system is some sort of periodic check of the archive, for integrity, and to make sure things were where they were supposed to be, and that they could be restored.
Rebecca: Yeah. There were … I don’t know what was … Because again, the archive and IT are two different departments. For a while it was considered that the IT department was responsible for that data.
Christie: Ah! But they’re not the subject matter experts in archiving.
Rebecca: Exactly. Also, they’re extremely busy managing many, many, many other crises and fires that go up all over WGBH.
Part of this whole process was reclaiming archival control over that data. Saying, “All right, this is kind of our job to check up on, to run regular checks, and make sure that we have all the content there. So we’re going to take this over.” That’s why we have our own LTO machines now, and we’re keeping a much closer eye on things, and we are running those regular checks and making sure we have all the information, making sure that we have all the data we need to preserve those files accurately.
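One common way to run the kind of regular check described here is a checksum comparison, often called a fixity check. The sketch below is a minimal example under assumptions, not WGBH's actual tooling: the function names and the manifest layout (relative path mapped to SHA-256 digest) are hypothetical. It records a digest for every file once, then later reports which files are intact, missing, or changed.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large media files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Record a digest for every file under `root` (relative path -> SHA-256)."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_fixity(root, manifest):
    """Compare the files under `root` against a previously saved manifest."""
    root = Path(root)
    report = {"ok": [], "missing": [], "changed": []}
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file():
            report["missing"].append(rel)
        elif sha256_of(target) != expected:
            report["changed"].append(rel)
        else:
            report["ok"].append(rel)
    return report
```

In practice the manifest would be saved alongside the collection (for example as JSON) when files are first written, and `check_fixity` would be run on a schedule against each copy.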
Christie: Do you think some of that will be helped by the fact that there is a more advanced file system on the LTO-6 and it’s more user-friendly?
Rebecca: Definitely. It’s a much smaller learning curve now for archivists who aren’t necessarily trained in all the various forms of talking to a computer. It’s a lot easier if there’s a graphical user interface available so that you can go in and intuit what needs to be done. Obviously, you still need a ton of documentation, and that’s something else that we’ve really made an effort to implement over the past couple of years, is just documenting everything we do and making sure that there’s information about all the systems we use, and how we did things and why.
So that if, in 10 years, when none of the rest of us are working on this project anymore, someone comes in and is like, “I need to get this data! Crap, how do I understand it? How do I use these legacy systems?” The information is all there to be found. It does get easier. It’s definitely … I think that the systems we’re using now are much easier to handle and much less scary than when LTO-4 … LTO was sort of more in its infancy and required a lot more understanding of basic computer processes to work with.
Christie: How has this whole experience changed how you think about archiving your own personal digital content?
Rebecca: Oh gosh, don’t ask me about my own personal digital content. What do they say? The shoemaker’s kids go barefoot? I’m terrible about backing up my own stuff. As much as I know what needs to be done in the professional world, for the stuff I’m actually paid to be responsible for.
Christie: Yeah, because it’s such a human thing to be concerned with the here and now, right?
Rebecca: Exactly. You think, “Oh well, I know that I need to be putting everything … Backing everything up into places and making sure that it’s … Every day it’s going and getting checked.” Of course I don’t actually do that.
The basic principles of any kind of archiving are, if you really want to save something, make sure that it’s in a couple different places. Make sure you know where it is. Make sure you know what it is. That’s sort of the fundamentals for actually being able to check up on it. Make sure you know what the lifespan is of the thing that you’re putting it on. If you put it on a hard drive, and then you put the hard drive in a closet, and then 5 years from now, you come back and you’re like, “Great! All my stuff is on this hard drive!” Odds are that maybe it is, and maybe it isn’t.
Christie: You have a way to read the hard drive, right?
Rebecca: Exactly. You need to make sure that your current computer can access the hard drive. Maybe it was a Windows formatted drive, and now you have a Mac machine. Maybe the drive has failed. Maybe you put … You have all your files in WordStar and now you do not have WordStar on your computer. Whatever it is, there’s a whole bunch of factors to keep in mind. The best way to actually make sure that you’re not forgetting about any of those is just to check pretty frequently and be like, “Oh, I still have WordStar, but no one really uses it anymore, so maybe I should figure out a way to get these into a different format.”
Christie: Yeah, so you can sort of flag those things that are becoming obsolete as they’re becoming obsolete and you still have a chance to move them … Migrate them to something else.
Rebecca: Exactly. It’s kind of a constant vigilance mindset which is hard to do because if you’re not actively working on something and working with something, it can be a bit ‘out of sight, out of mind.’
Even just putting a calendar reminder once a year and saying, “Time to check all the stuff that I care about for personal files,” it’s not a bad notion. Just to make it part of the regular ‘spring cleaning’ or whatever it is. Check on your digital content. Make sure that it’s still accessible and you know where it is, and you know what it is, and you still have a good feeling that it’s going to be safe where it is for another year, or it might be time to migrate it forward.
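For anyone who wants to script part of that yearly check, here is a small sketch, offered as one possible approach rather than an established tool. It counts the files in a folder by extension and flags any that appear on a watchlist you maintain yourself. The function names and the watchlist contents are hypothetical, and an extension is only a rough hint about a file's real format.

```python
from collections import Counter
from pathlib import Path


def inventory_by_extension(root):
    """Count files under `root` by lowercased extension."""
    counts = Counter()
    for p in Path(root).rglob("*"):
        if p.is_file():
            counts[p.suffix.lower() or "(none)"] += 1
    return counts


def flag_at_risk(counts, watchlist):
    """Return only the extensions that appear on your own watchlist.

    `watchlist` is a set of extensions you have decided are getting hard
    to open; its contents are a personal judgment call, not a standard.
    """
    return {ext: n for ext, n in counts.items() if ext in watchlist}
```

Running `flag_at_risk(inventory_by_extension("/path/to/archive"), {".ws"})` once a year, with whatever extensions you are worried about, gives a short list of files to think about migrating while you can still open them.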
Christie: I think that is a good place to see if you’re ready to take the Recompiler questionnaire.
Christie: We have … I actually have 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 questions. You can answer as short or as long as you want. We ask all the guests this.
What is your favorite tech?
Rebecca: My favorite tech? Can I go back to magnetic tape?
Rebecca: Because I’m very, very … Before I started my Grad program, I really wasn’t familiar with any kinds of magnetic tech, other than VHS which I used as a kid. But, I kind of fell a little bit in love with U-matic tapes which are the pre-VHS. They’re bigger, they’re clunkier, and when I first was digitizing them, I was like, “Oh, these are old tapes. They’re not going to look good.” Then I put them in, and they look beautiful! So much better than VHS.
Because they were cheap and inexpensive for the time, a lot of people like artists, and small public television stations, and activists were using them to store their … To record stuff in the 70’s and 80’s. So the content that’s on them is all this really neat independent media, and history of community activism, and all this stuff. That’s the one part of my job that I used to have that I really miss, is digitizing all the U-matic content and seeing what was on there. I would go with U-matic tape.
Christie: What is your least favorite tech?
Rebecca: Let’s see. What really gives me headaches? You know what? Printers! Printers. I can’t figure … When a printer is broken, as much as … It’s fun playing around with decks and stuff, and with content, and with older and newer formats of technology, but something as basic as a printer, I just end up banging my head against.
Christie: Still in 2016 … Yeah, I managed to …
Rebecca: Still in 2016.
Christie: I managed the other day to make our printer spit out pages and pages of gobbledygook. Like I did when I was a teenager. I was just like, “How is this still a thing?”
What is one thing about tech that energizes you?
Rebecca: I love working in a field that is constantly changing. I love that there’s always more to learn and that there’s … Things are getting easier, well not necessarily easier. Things are changing all the time, and it’s exhilarating to hear about what’s going on and to see the changes happening and to think back and realize that 2 years ago, some of the stuff that you’re doing would not have been possible.
Christie: Yeah. What is one thing about tech that drains you of energy?
Rebecca: I mean, I guess it’s sort of the same thing. It can be hard to keep up. When you feel like there’s 20 or 30 new things at any given time that you need to be aware of and need to be keeping abreast of and feel like you need to be fluent in. There’s absolutely no way to be fluent in 30 different new technologies at any one time. Picking and choosing where you want your … Where you have to put your attention and what you should be paying attention to can be difficult when you feel like you want to be learning all the things. Sometimes that’s dictated by the work that you end up doing or the places that you end up in your career. You have the choice about what to focus on, and that can be overwhelming.
Christie: Yeah. What is one of your every day carry items?
Rebecca: Tech-wise or just in general?
Christie: Either way. Whichever one you want to answer.
Rebecca: I try to always have just a flash drive with me because the number of times that I needed to quickly move something around is uncountable, and the number of times that I’ve lost that USB flash drive and, “Now I have no way to move this thing from point A to point B,” is also uncountable.
Christie: They’re kind of small and easy to lose.
Rebecca: Yes. Just another reason why you don’t want to use them as a primary back-up medium.
Christie: Good point. What is your favorite song or sound to work to?
Rebecca: It depends. If I’m doing stuff that’s pretty rote, that’s not engaging too much of my brain, where I’m moving files around or running through a process I know pretty well, then I like having music with cool lyrics and songs that tell a story. If I’m doing something like writing code or writing an article, for example, I actually go back to classical, and I’ll put on something that doesn’t have words. Therefore it can engage a different part of my brain.
Christie: Awesome. What is your least favorite song or sound to work to?
Rebecca: I find it really difficult to work when other people are having conversations near me. That’s something you have to live with, but it’s definitely a huge distraction. The sound of someone else on the telephone, actually, is what I find the most distracting of all. If two people are having a conversation it’s like, “All right, they’re having a conversation. I kind of know what’s going on.” But if someone’s on the telephone, I don’t want to be trying to fill in the gaps, but I can’t help it. That’s an area of my brain that could be engaged in doing other things.
Christie: Yeah, that’s interesting. Our brains really like to pattern-match.
Rebecca: They really do.
Christie: What is one project you’ve never worked on that you would like to work on sometime?
Rebecca: I’ve never actually been involved in designing a digital asset management system. That’s something for how to capture all the information that you need to store digital files in the long term. That’s something that, I’ve worked with a lot of them and I’ve worked with testing them and inputting data and outputting data, but I think it would be really cool to have the opportunity to design one from scratch. It would be very, very challenging, obviously. But to think about all those different components, that’s a project that I would like to do someday.
Christie: What is one project that you would not like to work on under any circumstances?
Rebecca: Oh gosh. That’s a hard one to answer! This is kind of going back to my days as a … When I was doing more work with analog tech. When I was in my Grad program, I did some internships and some projects that were focused entirely on film. It was fun. Handling film is fun, but doing it day in and day out and having a job that focused entirely on handling old film … For me it didn’t engage me enough.
I think I would not want to go back to working on a project that was entirely working with film. I like working with analog, and I like working with digital, and I like the different components of those. While film is very cool and there’s a glamour to it, I think a lot of people enter my program, and I think I did too, going, “I’m going to come out and I’m going to work with old film prints, and I’m going to restore these beautiful movies.” In the long run, I think it’s the newer stuff that’s cooler.
Christie: Awesome. Last one, what is one thing in tech you would like to accomplish by the end of your lifetime?
Rebecca: By the end of my lifetime? You know what, if I can, at the end of my lifetime go back and see any of the stuff that I worked on early in my career in 60 … Well, 60 years is probably just being generous … 40 years from now, if I can go and look at any of the files that I’m working with now or the files that I worked with at the Dance Heritage Coalition and be like, “Hey they’re still here and they’re still around! We did our job!” That would be pretty cool.
Christie: That is a superb answer for an archivist. Thank you for that. That was a great questionnaire. Are there any last things that you want to tell our audience about? Something you think they should follow? Any of these projects? Any other things you want to plug?
Rebecca: Definitely check out the American Archive of Public Broadcasting if you’re at all interested in U.S. history of the past 40 years. It’s a really cool way to experience it. That’s my plug.
I can’t think of anything else. There’s a lot of interesting stuff happening in archives right now. There’s a lot of really cool … All kinds of projects. The DPLA, the Digital Public Library of America, is a really neat one. It’s a really cool way to access a lot of the content that’s being digitized and put up all around the country.
Keep an eye on it. I’m starting to see more and more articles written about digital archiving and the way that it’s adapting to the digital technology. The last thing I would say is, a lot of those articles do have a little bit of a scare … Apocalyptic angle to them. Like, “Ah, we’re going to lose everything on the web. We’re going to lose everything that’s created digitally.”
I guess I want to say that it’s hard, and we’re struggling, but people are working on it. I do think that … I’m pretty optimistic that 40 years from now we’ll be able to see a lot of the stuff that I’m working on now, so don’t panic.
Christie: We’ll have to do a follow up interview in 40 years to see how things are doing.
Rebecca: Yeah, put it on your calendar.
Christie: All right, thanks Rebecca.
Rebecca: Thanks, have a great afternoon!
Christie: Christie here again for community announcements for episode 7. Check out DevOpsDays Portland, coming up on August 9th and 10th. Use discount code RECOMPILER2016 for 20% off, and we’re raffling off a free ticket to one lucky Recompiler reader or listener. Find the link in the show notes and enter by August 1st.
The latest issue of the Recompiler is in the shop now at shop.recompilermag.com. Issue 4: Legacy Systems focuses on how systems and code change over time and how we maintain them. In this issue, we have tips on how to maintain legacy code bases effectively, a history of the GNOME project and how it relates to urban change, a case study on working with legacy data, and other interesting and informative articles including Rebecca’s. Order today: shop.recompilermag.com.
That’s a wrap. I’m your host, Christie Koehler. You can find me online on Twitter, where I’m ChristiewithanI3k. You can find this and all previous episodes of the Recompiler podcast on the web at recompilermag.com/podcast. There, you’ll find links to subscribe using your favorite podcast app.
You can also find us directly on iTunes. We love your feedback! Tweet to recompilermag, or send us an email, podcast at recompilermag.com. If you like what you hear on the podcast, the best way to support us is to purchase a paid subscription to the magazine and/or contribute a few bucks on a recurring basis.
You can find links to do either of those things on the website and in the show notes from this episode. It’s been great chatting with you all this week. Talk to you again soon.