File Format Analysis and Preservation Planning Born Digital Collecti
Listed on 2026-01-12
IT/Tech
Digital Media / Production, Technical Writer
Contract Opportunity - File Format Analysis and Preservation Planning for Born Digital Collections at Library of Congress
25 April 2022
Digital Collections Management and Services requires contract support to analyze the technical characteristics of complex, heterogeneous eBook and eJournal content accessible in Stacks, the Library of Congress (LOC) onsite access platform, to inform preservation planning. This content is published, born-digital material acquired from a wide range of publishers through the Cataloging in Publication (CIP) Program and through Copyright Deposit through the U.S. Copyright Office.
The contractor shall analyze the technical characteristics of complex, heterogeneous eBook and eJournal content accessible in the Library's onsite access platform, Stacks, to inform preservation planning. Using specialized tools such as Apache Tika, the contractor will characterize the structure and composition of over 50,000 ePub files, 100,000 PDF files, and a small number of XML/ONIX for Books, JATS, and HTML files.
Many of these sets of files contain embedded content, such as audio, video, and other interactive features, that is not fully transparent. This research will inform action plans for access and preservation.
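As a rough illustration of what surfacing embedded content in an ePub might involve, the sketch below reads the package manifest inside an ePub (a ZIP container) and tallies the media types it declares. It uses only the Python standard library rather than Apache Tika, and the function names (`epub_media_types`, `has_embedded_av`) and the sample archive are hypothetical, built here purely for demonstration. It assumes a well-formed container with a single package document and is not drawn from the Library's actual workflow.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# XML namespaces used by the EPUB container file and the package document.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def epub_media_types(epub_file):
    """Count the media types declared in an ePub's package manifest."""
    with zipfile.ZipFile(epub_file) as zf:
        # META-INF/container.xml points at the package (.opf) document.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        package = ET.fromstring(zf.read(rootfile.get("full-path")))
    items = package.findall("opf:manifest/opf:item", OPF_NS)
    return Counter(item.get("media-type") for item in items)


def has_embedded_av(media_types):
    """True if any declared media type is audio or video."""
    return any(mt and mt.startswith(("audio/", "video/")) for mt in media_types)


# Hypothetical sample data: a tiny in-memory archive, not a real publication.
CONTAINER_XML = (
    '<container version="1.0" '
    'xmlns="urn:oasis:names:tc:opendocument:xmlns:container">'
    '<rootfiles><rootfile full-path="OEBPS/content.opf" '
    'media-type="application/oebps-package+xml"/></rootfiles></container>'
)
OPF_XML = (
    '<package xmlns="http://www.idpf.org/2007/opf" version="3.0"><manifest>'
    '<item id="c1" href="ch1.xhtml" media-type="application/xhtml+xml"/>'
    '<item id="c2" href="ch2.xhtml" media-type="application/xhtml+xml"/>'
    '<item id="a1" href="clip.mp3" media-type="audio/mpeg"/>'
    '</manifest></package>'
)

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("META-INF/container.xml", CONTAINER_XML)
    zf.writestr("OEBPS/content.opf", OPF_XML)
buf.seek(0)

counts = epub_media_types(buf)
print(counts)
print(has_embedded_av(counts))
```

A manifest tally like this only reflects what the publisher declared; content embedded without a manifest entry, or media inside PDF files, would need deeper inspection with a tool such as Apache Tika.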
The deliverables from the project shall include: