Content Digitization, OCR, Data Capture, eBook Services in INDIA

Case Study


Why METS/ALTO Conversion?

“METS and ALTO have been in use for a number of years, and libraries, universities, newspaper publishers, and newspaper aggregators are familiar with both standards.

METS (Metadata Encoding and Transmission Standard) is an XML standard for encoding descriptive, administrative, and structural metadata about objects in a digital library. Although METS is excellent at describing the structure of a digital object, it cannot describe the content and layout of each piece of that object. ALTO (Analyzed Layout and Text Object), a schema used as an extension to METS, fills this gap. The combination of METS and ALTO was originally developed by the METAe project and was later adopted by the Library of Congress for its large-scale National Digital Newspaper Program (NDNP). Since then, METS/ALTO has been used in many newspaper digitization projects, both large and small, as well as in a number of projects digitizing books and journals.
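To make the structural side of METS concrete, here is a minimal Python sketch that reads the logical structure map of a METS file and prints it as an indented outline. The sample XML is hand-written for illustration: the newspaper title, issue date, and article labels are hypothetical, and real METS files carry many more sections (file inventories, descriptive metadata, and links to ALTO files) that are omitted here.

```python
import xml.etree.ElementTree as ET

# Official METS namespace URI.
METS_NS = {"mets": "http://www.loc.gov/METS/"}

# Hypothetical, hand-written METS fragment (illustration only).
METS_SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:structMap TYPE="LOGICAL">
    <mets:div TYPE="Newspaper" LABEL="Example Gazette">
      <mets:div TYPE="Issue" LABEL="1 January 1900">
        <mets:div TYPE="Article" LABEL="Harbour Opens"/>
        <mets:div TYPE="Article" LABEL="Weather Report"/>
      </mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>
"""


def outline(mets_xml):
    """Return the logical structure of a METS document as indented lines."""
    root = ET.fromstring(mets_xml)
    struct_map = root.find("mets:structMap[@TYPE='LOGICAL']", METS_NS)
    lines = []

    def walk(div, depth):
        # Each <mets:div> is one node of the document hierarchy.
        lines.append("  " * depth + f"{div.get('TYPE')}: {div.get('LABEL')}")
        for child in div.findall("mets:div", METS_NS):
            walk(child, depth + 1)

    if struct_map is not None:
        for top in struct_map.findall("mets:div", METS_NS):
            walk(top, 0)
    return lines


if __name__ == "__main__":
    print("\n".join(outline(METS_SAMPLE)))
```

The nested `div` elements carry the hierarchy only; the text and layout of each page would live in separate ALTO files that the METS file points to.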

A typical METS/ALTO object encodes the complete logical and physical structure of a document (chapters, sections, articles, pages, and so on, together with their associated metadata), the full-text content of each section, and the physical coordinates of every word in the document.
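As an illustration of the word-level coordinates mentioned above, the following Python sketch pulls each word and its bounding box out of an ALTO page. It assumes ALTO version 3 (other versions use a different namespace URI), and the sample page, its dimensions, and its two words are made up for the example rather than taken from a real project.

```python
import xml.etree.ElementTree as ET

# Namespace for ALTO version 3; an assumption, since other versions differ.
ALTO_NS = {"alto": "http://www.loc.gov/standards/alto/ns-v3#"}

# Hypothetical, hand-written ALTO fragment (illustration only).
ALTO_SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout>
    <Page ID="P1" WIDTH="2400" HEIGHT="3600">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine ID="TL1">
            <String CONTENT="Daily" HPOS="100" VPOS="200" WIDTH="180" HEIGHT="40"/>
            <SP/>
            <String CONTENT="News" HPOS="300" VPOS="200" WIDTH="160" HEIGHT="40"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""


def extract_words(alto_xml):
    """Return one dict per word: its text plus its bounding box on the page."""
    root = ET.fromstring(alto_xml)
    words = []
    # Each <String> element holds one recognized word and its position.
    for s in root.iterfind(".//alto:String", ALTO_NS):
        words.append({
            "text": s.get("CONTENT"),
            "x": float(s.get("HPOS")),       # left edge
            "y": float(s.get("VPOS")),       # top edge
            "width": float(s.get("WIDTH")),
            "height": float(s.get("HEIGHT")),
        })
    return words


if __name__ == "__main__":
    for word in extract_words(ALTO_SAMPLE):
        print(word)
```

Coordinates like these are what allow a viewer to highlight a search hit directly on the scanned page image.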