Content Digitization, OCR, Data Capture, eBook Services in INDIA

Case Study

Digital Library for a Political Party, INDIA


The political party has a large collection of books, newspaper articles, and audio and video material accumulated over the past 37 years. It also produces new digital content every day from a variety of sources, none of which is maintained centrally.

Party leaders must make quick decisions every day, whether preparing for debates, countering rival politicians, or posting on social media.

New members need to learn the party's vision and history so that they can align themselves with it.

Intended Outcomes:

To have a centralised repository for uploading, maintaining, searching, and retrieving digital content at any time and from anywhere, including outside the party office.

To communicate the party leader's messages to members at every level so that they can be followed without misunderstanding.

To take notes and prepare before assembly sessions or TV debates.

To source content for the party's social media.

To counter the opposition party on day-to-day issues with evidence from the past and present.

The Process: 

As a first step, the Digital Library (DL) has been implemented to upload, maintain, search, and retrieve current day-to-day material and the previous six months of material: about 125K pages and thousands of audio and video files.

The DL has been customized to meet the party's political requirements.

The DL is now in use: party staff upload daily content (OCR text with metadata), and party leaders search and retrieve that content, bookmark favourites, and make notes.
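The case study does not describe how the DL indexes or searches its content. As a minimal sketch, assuming each uploaded item is an OCR text string plus a small metadata dictionary (the `Document` and `DigitalLibraryIndex` names and the `source` field are hypothetical), keyword search with metadata filtering can be built on an in-memory inverted index:

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    text: str                                     # OCR output for a page or article
    metadata: dict = field(default_factory=dict)  # hypothetical fields, e.g. "source"


class DigitalLibraryIndex:
    """Toy in-memory inverted index; an illustration, not the DL's actual design."""

    def __init__(self) -> None:
        self.docs: dict[str, Document] = {}
        # Maps each token to the set of document IDs that contain it.
        self.postings: defaultdict[str, set[str]] = defaultdict(set)

    @staticmethod
    def tokenize(text: str) -> list[str]:
        # Lowercase and extract runs of word characters.
        return re.findall(r"\w+", text.lower())

    def upload(self, doc: Document) -> None:
        self.docs[doc.doc_id] = doc
        for token in self.tokenize(doc.text):
            self.postings[token].add(doc.doc_id)

    def search(self, query: str, **filters: str) -> list[Document]:
        # AND semantics: a document must contain every query term.
        terms = self.tokenize(query)
        if not terms:
            return []
        ids = set.intersection(*(self.postings.get(t, set()) for t in terms))
        hits = [self.docs[i] for i in sorted(ids)]
        # Keep only documents whose metadata matches every filter exactly.
        return [d for d in hits if all(d.metadata.get(k) == v for k, v in filters.items())]


if __name__ == "__main__":
    index = DigitalLibraryIndex()
    index.upload(Document("np-001", "Budget debate in the assembly", {"source": "newspaper"}))
    index.upload(Document("tv-002", "Assembly debate transcript", {"source": "video"}))
    print([d.doc_id for d in index.search("assembly debate")])        # ['np-001', 'tv-002']
    print([d.doc_id for d in index.search("debate", source="video")])  # ['tv-002']
```

The `\w+` tokenizer is only adequate for simple English text; OCR output in Indian scripts contains combining vowel signs that `\w` does not match, so a production system would need a script-aware tokenizer and a persistent search engine rather than an in-memory dictionary.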

Digitization of the archives (older material), comprising 2.5 million newspaper articles and 10K books, is in progress at a pace set by the monthly budget.

In the second stage, content from government websites, magazines, newspapers, and other online media will be crawled into the DL, making it a central repository that stays up to date on current events.

In the third stage, ML/AI will be used to make efficient use of the digital content: generating automatic synopses for press notes, debates, etc., and implementing photo and video search so that this content can be used on social media.


Approx. 2.5 million newspaper articles

Approx. 10,000 books (2.5 million pages)

Several thousand video and audio files