If you have a print collection of newspapers, first ensure that you have copyright permission to reproduce the content for online display. Many third-party vendors can scan the paper copies, and scans made from paper are often better than those made from microfilm or microfiche. Scanning in-house is also an option if you have a large-bed scanner or a camera with a tripod, and sufficient staff time to dedicate to the task.
Swift ProSys is an experienced provider of newspaper digitization services. Our staff has many years of experience scanning newspapers from both paper and microfilm. If you have a collection of newspapers that you would like to make accessible and searchable, Swift ProSys’s newspaper digitization services are for you.
If you have a collection of newspapers on microfilm or microfiche, first ensure that you have copyright permission to reproduce the content for online display. Many third-party vendors will scan hundreds or thousands of images directly from microformats. Scanning microformats in-house is also an option if you have a microfilm reader/scanner that can produce high-quality digital files, a desktop license for OCR software, and sufficient staff time to dedicate to the task.
“The METS and ALTO have now been utilized for a number of years. Libraries, universities, newspaper publishers, and newspaper aggregators are familiar with these standards.
METS is a standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library, using XML. Though METS is excellent at describing the structure of a digital object, it is missing the ability to describe the content and layout of each piece of the digital object. So an extension to METS, called ALTO (Analyzed Layout and Text Object), is required for this purpose. The combination of METS and ALTO was originally developed by the METAe project, and later was adopted by the Library of Congress for its large-scale National Digital Newspaper Program (NDNP). Since then, METS/ALTO has been used in many newspaper digitization projects—both large and small—as well as a number of projects digitizing books and journals.
A typical METS/ALTO object encodes the complete logical and physical structure of a document (i.e., chapters, sections, articles, pages, etc., and their associated metadata), as well as the full-text content of each section of the document, and even the physical coordinates of every word in the document.
”
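To make the word-level coordinates described above more concrete, here is a minimal Python sketch that reads an ALTO page and lists each word with its position. The sample XML is invented for illustration and is not taken from any real newspaper. The function name `extract_words` and the dictionary keys are hypothetical. The element and attribute names (`String`, `CONTENT`, `HPOS`, `VPOS`, `WIDTH`, `HEIGHT`) follow the ALTO schema, and the sketch ignores the XML namespace so that it is not tied to a single ALTO version.

```python
import xml.etree.ElementTree as ET

# Invented sample page for illustration only; a real ALTO file is far larger.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout>
    <Page ID="P1" WIDTH="2400" HEIGHT="3600">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="Daily" HPOS="100" VPOS="200" WIDTH="180" HEIGHT="60"/>
            <SP/>
            <String CONTENT="News" HPOS="300" VPOS="200" WIDTH="160" HEIGHT="60"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_words(alto_xml):
    """Return one dict per ALTO String element, in document order.

    Hypothetical helper: the key names below are chosen for this sketch.
    """
    root = ET.fromstring(alto_xml)
    words = []
    for element in root.iter():
        # Only String elements carry recognized words; SP elements are spaces.
        if local_name(element.tag) != "String":
            continue
        words.append({
            "text": element.get("CONTENT", ""),
            "hpos": float(element.get("HPOS", 0)),
            "vpos": float(element.get("VPOS", 0)),
            "width": float(element.get("WIDTH", 0)),
            "height": float(element.get("HEIGHT", 0)),
        })
    return words


if __name__ == "__main__":
    for word in extract_words(SAMPLE_ALTO):
        print(word)
```

A search interface could use these boxes to highlight hits on the page image. The coordinate values are expressed in whatever measurement unit the ALTO file declares, so they may need to be converted before being drawn over a scan.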