This 10th anniversary celebration of BagIt is a guest post by Liz Madden, Digital Media Project Coordinator in the Office of the Chief Information Officer’s Platform Services Division – and contributor to BagIt’s development, adoption, and use at the Library of Congress.

The BagIt File Packaging Format hit two milestones in 2018: it celebrated its tenth anniversary, and in October it became an IETF 1.0 specification.

The child of a National Digital Information Infrastructure & Preservation Program (NDIIPP)-era collaboration between LC and the California Digital Library, BagIt derives its simplicity and practicality from the years of lessons learned from digital content transfer and management in the earliest era of the modern digital library.

Named for the concept of “bag it and tag it”, BagIt provides a directory structure and specifies a set of files for transferring and storing content, with a clear delineation between the digital content itself (stored in a subdirectory called “data”) and the metadata describing it, including a list of filenames and checksum values (the manifest).

It also allows for optional basic descriptive elements that are stored within the bag (in a file called bag-info.txt) to provide recipients or custodians of the content with enough information to identify the provenance, contact information, and context for the file delivery or storage package.

A common LC BagIt bag pairs these tag files (bagit.txt, the checksum manifest, and bag-info.txt) with the payload itself under the data directory.

BagIt helped the Library bridge the gap between the old world, where the only digital content that we managed was created through digitization of a small number of select physical collections, and the current era: in 2018, for the first time, we received more eJournals through Copyright deposit than we received physical journals.

The early adoption of BagIt at the Library of Congress was critical to our success at expanding digital ingest activities to accommodate the increase in size and number of digital content deliveries that have occurred in the first two decades of this new millennium.

A comparison of the decade before BagIt and the decade since illustrates how far we’ve come here in the digital library world of the Library of Congress, and how one little specification gave us a much-needed standardized framework for building ingest and transfer processes that were gigabyte-scale in the beginning and are petabyte-scale now.

I came to LC in 1997 as part of the National Digital Library Program to help with American Memory, the Library’s flagship digital library on the worldwide web (how people commonly referred to it in those days). People still regularly included both the http and the www when writing out web addresses. And when we digitized materials from the collections, we received the scanned images back on CDs with somewhere in the neighborhood of 700 megabytes of capacity. The file naming convention was 8.3 (eight-character filename, three-character extension). We optimized the image sizes for users with dialup connections. Digital collections that exceeded 50 GB in total seemed huge to us, and we FTP’ed all the incoming data through our Windows desktop machines onto our one web server.

We weren’t overly concerned about loss because we selected and collated all content prior to digitization, managed the delivery of the digital content, and then did virtually 100% quality assurance on what was returned. When that became impractical for the increasingly larger collections because of the number of CDs required for delivery, we set up CD towers that we had to load manually. During this time we talked amongst ourselves about automated ways to ensure that we had gotten all data off the CDs intact, but the momentum of managing the throughput and putting the content online by our year-2000 milestone preempted our earliest efforts to develop a process to verify the pre- and post-transfer data.
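As an illustration of the structure described in this post, here is a minimal sketch that lays out a BagIt-style bag and then re-verifies its payload checksums, the kind of pre-/post-transfer fixity check the manifest enables. The file and field names follow the BagIt specification (RFC 8493), but the organization name and payload files are made-up placeholders, and this is not the Library’s actual tooling; in practice one would reach for an existing implementation such as the Library of Congress’s own bagit-python library.

```python
import hashlib
import tempfile
from pathlib import Path


def make_bag(payload: dict[str, bytes], bag_dir: Path) -> None:
    """Lay out a minimal BagIt bag: content under data/, tag files at the top."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True)
    manifest_lines = []
    total_bytes = 0
    for name, content in payload.items():
        (data_dir / name).write_bytes(content)
        total_bytes += len(content)
        digest = hashlib.sha256(content).hexdigest()
        manifest_lines.append(f"{digest}  data/{name}")
    # Required bag declaration (bagit.txt).
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n")
    # Payload manifest: one "checksum  path" line per file.
    (bag_dir / "manifest-sha256.txt").write_text("\n".join(manifest_lines) + "\n")
    # Optional provenance/context metadata; these values are placeholders.
    (bag_dir / "bag-info.txt").write_text(
        "Source-Organization: Example Organization\n"
        f"Payload-Oxum: {total_bytes}.{len(payload)}\n")


def verify_bag(bag_dir: Path) -> bool:
    """Fixity check: recompute each file's checksum and compare to the manifest."""
    for line in (bag_dir / "manifest-sha256.txt").read_text().splitlines():
        expected, relpath = line.split(None, 1)
        actual = hashlib.sha256((bag_dir / relpath).read_bytes()).hexdigest()
        if actual != expected:
            return False
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bag = Path(tmp) / "example-bag"
        make_bag({"page-001.tif": b"scan bytes here"}, bag)
        print(verify_bag(bag))  # True: the payload matches its manifest
```

Because the bag carries its own manifest, the same verification can be run before a transfer, after it, and at any later point in storage, which is exactly the pre- and post-transfer check that was out of reach in the CD era.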