by Rebecca Fraimow
LTO stands for ‘Linear Tape-Open.’ It’s a reasonably intuitive name for a non-proprietary data storage technology (that’s the open part) that allows a user to store hundreds of gigabytes of information on a cartridge of half-inch magnetic tape. To be honest, though, I can never remember that without looking it up—I always end up mentally subbing out ‘Linear Tape’ for ‘Long Term.’
I’m a digital archivist, and we’re supposed to think in the long term. If you talk to most archivists who work with physical materials, they’re pretty confident that the ‘long term’ will stretch into the centuries. We’ve got a decent handle on the process of keeping paper and photographs in good shape; even film will sit happily for a century in the right kind of storage, and maybe much longer. (It’s hard to predict, given that we just started manufacturing film a little over a century ago.) For digital content, it’s not so easy. Digital storage technologies become obsolete at an astoundingly rapid rate—just try finding a computer store that sells a working Zip drive reader. Not only that, digital storage containers are often fragile. How many of us have had the experience of breathing on a hard drive wrong and finding it won’t start up the next day?
Over the past twenty years, digital archivists have come to accept that we can’t just put digital items in storage and expect them to be fine when we come back to them twenty years later. Digital archiving is a hands-on profession. Picking a form of digital storage that seems like it will be relatively stable and accessible for at least a few years is a good first step, but only a first step. Step two (and 2.1, and 2.2 and 2.3, and 2.4…) is checking everything that lives on it frequently to make sure that your bits don’t rot while you’re not paying attention. Step three is planning ahead so that in five years’ time you can migrate it all forward to whatever newer, better, cheaper storage has just come on the market. Lather, rinse, repeat.
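The "checking everything frequently" step is usually done with checksums, often called fixity checks. What follows is a minimal sketch, not a description of any particular archive's tooling: the function names, the choice of SHA-256, and the in-memory manifest are all assumptions made for illustration. It records a checksum for each file once, then later reports any file that has gone missing or changed.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large video files never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Record a checksum for every file under root, keyed by relative path."""
    root = Path(root)
    return {
        str(path.relative_to(root)): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def check_fixity(root, manifest):
    """Return (relative path, problem) pairs for missing or altered files."""
    root = Path(root)
    problems = []
    for relative, expected in manifest.items():
        target = root / relative
        if not target.is_file():
            problems.append((relative, "missing"))
        elif sha256_of(target) != expected:
            problems.append((relative, "changed"))
    return problems
```

In practice the manifest would be saved somewhere other than the storage being checked, and `check_fixity` would be run on a schedule and again after every migration.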
For that reason, LTO is a favorite among archival institutions. After the original purchase cost for the equipment and setup, LTO provides relatively inexpensive storage on a relatively stable medium. Magnetic tape doesn’t have an incredibly long lifespan, but in proper storage it will usually sit comfortably for at least ten or twenty years, while over twenty percent of spinning disk drives fail in the first five. A new generation of LTO comes out approximately every 2-3 years, and every LTO deck is backward compatible by two generations. That means that LTO-4 tapes can be read by LTO-5 and LTO-6 decks, allowing archives to schedule regular forward migrations and continue updating their content storage in a standardized fashion – at least, that’s the theory.
In practice, of course, it’s not usually that easy. I work at the public media station WGBH, which for almost a decade has been using a SAM-QFS storage management system with an LTO-4 tape library on the back end. The LTO-4 decks in this system–state of the art, in 2006–are never accessed by humans; the storage management system relays instructions to a tape robot, which retrieves the tape in question, identifies the desired file, and copies it over a network into local storage.
Nobody noticed any difficulties with this system until 2014, when the WGBH Media Library and Archives undertook a massive project to retrieve every single digital video in storage for inclusion in the American Archive of Public Broadcasting (AAPB), a collaboration between WGBH and the Library of Congress to preserve the history of public media. LTO-4 is now on the verge of obsolescence, and the archive was planning to move forward to LTO-6 technology that would be under direct archival control, rather than wrapped up in a proprietary system. The idea was to use the storage management system to pull all the data involved–300 TB of files, some up to 100 GB in size–and, after transferring them to local drives for inclusion in the AAPB, copy the files back onto the LTO-6 tapes in use by the archives.
Unfortunately, we soon realized that some batches of files were showing massive failure rates, in one case over 50%. Even after repeated efforts, the files weren’t making it out of the system successfully. The most frightening part was that, because of the complexity of the interconnected layers of software involved in the storage management system, we weren’t exactly sure why and how the files were failing, and whether the LTO-4 tapes themselves might be at fault–which would mean the information that we had been tasked to preserve was permanently lost.
This was about the time when I started working at WGBH and took my turn at poking at the problem. I’m not a storage management expert by any means, but I’ve worked a lot with LTO-6 tape. My primary concern, at this point, was making sure that the data still existed on the LTO-4 tapes. I thought that if I could cut through the complexity of the storage management system and access a sample set of a few LTO-4 tapes directly, I might be able to answer some of our questions about the data. I knew that LTO decks are always backwards compatible two generations, so I figured that popping an LTO-4 tape into our LTO-6 deck and seeing what it contained should be simple.
Obviously, I’d made a few key errors here. I was used to LTO-6 tapes, which are formatted using a specification called LTFS (Linear Tape File System) that allows them to be indexed and viewed like a hard drive or flash disk. LTO-4 tapes predate the development of LTFS, so instead the data on those tapes is formatted in a ‘block’ structure, with chunks of data written as archive files along the magnetic tape. There’s no way to view all this data at once; in order to retrieve it, you have to instruct your computer to read block-by-block through an entire tape, using the built-in mt (magnetic tape) function and copying out data as you go. You also need to know the size of the data blocks used in the initial formatting of the tape, and whether they were written as standard tar files, or using a proprietary format—and since many of the people involved in initially purchasing and managing the storage management system at WGBH had moved on to new jobs in the intervening decade, a lot of that information wasn’t readily available. Just to add an additional wrinkle, the modern Mac computers that we were using to access our LTO-6 decks with LTFS didn’t have the built-in mt function that allows for direct control of a pre-LTFS LTO tape deck.
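To make the idea of sequential, block-by-block reading concrete, here is a small sketch using Python's standard `tarfile` module in stream mode. It is an illustration under assumptions, not the procedure used at WGBH: the function name and the 16 KB default block size are hypothetical, and the source is any file-like object rather than a real tape drive. The point is that the reader can only move forward through the data, and needs to be told how big a chunk to request each time.

```python
import tarfile


def list_tar_stream(stream, block_size=16 * 1024):
    """Walk a tar stream front to back and return (name, size) per member.

    mode="r|" tells tarfile to treat the source as a forward-only stream
    of uncompressed tar blocks, which is how data comes off a tape.
    block_size is an assumed value; against a real tape it would have to
    match the block size used when the tape was written.
    """
    members = []
    with tarfile.open(fileobj=stream, mode="r|", bufsize=block_size) as archive:
        for member in archive:
            members.append((member.name, member.size))
    return members
```

Because nothing here seeks backward, the same function works on a pipe or any other source that only supports reading forward.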
Still, while a lot of my assumptions about the ease of accessing this legacy technology were wrong, I got lucky in some key respects. After hooking up one of our LTO-6 decks to a Linux machine and playing around a little, I found out that the back-end of our storage management system had been written using standard mt and tar functions, rather than anything proprietary, and a pretty common block structure. Most importantly, I managed to turn up a SAM-FS troubleshooting guide from 2006 with instructions on how to recover data from damaged systems.
Using the instructions I found, I was able to do a massive data dump from the LTO tapes and start sorting through the resulting tar archive files. I was able to show that many of the files that we’d initially tagged as ‘failed’ could still be recovered safely from the LTO tapes when they were accessed directly. The problems that we’d been seeing were with the retrieval system, not the tapes themselves. Some of the tar archive files–about 5%–did appear to be corrupted, but I’d learned enough from my investigations into LTO-4 tapes by this point not to draw any final conclusions from that until I’d had a chance to look into the tapes with a native LTO-4 deck and make sure the problems weren’t due to a poor interaction between older tapes and newer technology. (I still haven’t managed to get my hands on a freestanding LTO-4 deck, so investigation here is pending.)
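Sorting a pile of dumped tar files into "readable" and "suspect" can be roughed out in a few lines. This is a sketch under assumptions rather than the method actually used: `check_tar` is a hypothetical helper, and it assumes the dumps are plain uncompressed tar files on local disk. It reads every member from start to finish and reports the first error it hits.

```python
import tarfile


def check_tar(path):
    """Read every member of a tar file end to end; return (ok, detail)."""
    count = 0
    try:
        with tarfile.open(path, mode="r:") as archive:
            for member in archive:
                if member.isfile():
                    source = archive.extractfile(member)
                    # Drain the member so truncated data is actually noticed.
                    while source.read(1024 * 1024):
                        pass
                count += 1
    except (tarfile.TarError, EOFError, OSError) as error:
        return False, f"failed after reading {count} member(s): {error}"
    return True, f"members readable: {count}"
```

A passing result here is weaker evidence than a checksum match. The `tarfile` module can stop quietly at a damaged header partway through an archive, so a file that passes may still be missing members, and comparing member counts or checksums against a known-good record is the stronger test.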
So what’s the takeaway from all this, other than ‘legacy systems can be a massive pain’? Well, like I said at the beginning: preserving digital stuff for the long term isn’t impossible, but it does mean that you have to cultivate a habit of constant vigilance. Information is only as ‘safe’ as its storage technology is accessible, and even with a technology as relatively predictable as LTO, where you have a sense of when and how things are going to become obsolete, you can still run into nasty surprises. At WGBH, we’re now taking a much more hands-on approach with our LTO-6 data than we did with our SAM-QFS system; we’re documenting everything we do, and making sure that we (and whoever may be doing our job after us) are ready for the next migration forward. The only thing we can know for sure is that it’s not going to be long until our shiny new LTO-6 decks are just as obsolete as the decks that came before them.
Rebecca Fraimow is an archivist who works at WGBH and lives in Boston. She spends a lot of her time banging her head against problems of digital and audiovisual preservation, with occasional breaks to read, attempt useful crafts, and write fiction.