ABSTRACT
In the current era of big data, textual information is growing rapidly and is available in many different languages.
Because of time constraints, we are often unable to consume all of the information available, and in a fast-paced
world it is difficult to read every text in full. This creates the need for text summarization: condensing content so
that it is easier to absorb while preserving its substance and meaning. Many text summarization approaches have been
proposed over the years for English and other European languages, but surprisingly few methods exist for the regional
languages of India. This paper presents a study of extractive text summarization methods for several Indian and
international languages, including Hindi, Kannada, Telugu, Marathi, German, and French. We propose a system that uses
Optical Character Recognition (OCR) to extract text from an uploaded image. The main purpose of the OCR stage is to
create editable documents from existing documents or image files; it also performs sentence detection to preserve the
document's structure. The paper further presents a method for automatic sentence extraction using the TextRank
algorithm. This approach assigns scores to sentences by weighting features such as term frequency, word occurrences,
noun weight, and phrases. The results of this work show that our approach achieves higher accuracy than existing
methods while maintaining coherence. The system also provides text-to-speech output and translation from one language
to another.
Keywords: Natural Language Processing, Optical Character Recognition, Summarization, TextRank algorithm,
Text-to-speech.
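To illustrate how TextRank-style sentence extraction works, here is a minimal sketch in Python using only the standard library. It is not the system described in this paper, and it rests on several simplifying assumptions. The sentence graph uses cosine similarity between raw term-frequency vectors, a common substitute for the word-overlap measure of the original TextRank formulation. Tokenization is whitespace-based, and sentence splitting is a naive punctuation rule that also treats the Devanagari danda (।) as a sentence end. The noun-weight and phrase features mentioned above are not modeled. The function names (`textrank_sentences`, `cosine`, and so on) and their parameters are hypothetical.

```python
import math
import re
import string
from collections import Counter

# Punctuation stripped from token edges; adding the Devanagari danda (U+0964)
# is an assumption for Hindi/Marathi input.
PUNCT = string.punctuation + "\u0964"


def split_sentences(text):
    """Naive splitter: break after '.', '!', '?' or the danda when followed by whitespace."""
    parts = re.split(r"(?<=[.!?\u0964])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Whitespace tokenization with edge punctuation removed (simplifying assumption)."""
    tokens = (tok.strip(PUNCT) for tok in sentence.lower().split())
    return [t for t in tokens if t]


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def textrank_sentences(text, top_n=2, damping=0.85, max_iter=100, tol=1e-6):
    """Return the top_n sentences ranked by weighted PageRank, in original order."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    vectors = [Counter(tokenize(s)) for s in sentences]
    n = len(sentences)

    # Undirected sentence graph; edge weight = term-frequency cosine similarity.
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            weights[i][j] = weights[j][i] = cosine(vectors[i], vectors[j])
    out_sum = [sum(row) for row in weights]

    # Weighted PageRank iteration:
    # WS(i) = (1 - d) + d * sum_j (w_ji / sum_k w_jk) * WS(j)
    scores = [1.0] * n
    for _ in range(max_iter):
        new_scores = [
            (1 - damping)
            + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
        delta = max(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break

    # Keep the highest-scoring sentences, then restore document order.
    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(best)]


if __name__ == "__main__":
    doc = (
        "Text summarization shortens long text. "
        "Extractive summarization selects important text sentences. "
        "The weather is sunny today. "
        "Summarization of text helps readers."
    )
    # Keeps the three summarization sentences; the unrelated weather sentence
    # shares no terms with the others, so it gets the minimum score (1 - d).
    print(textrank_sentences(doc, top_n=3))
```

`top_n` controls the summary length. A damping factor of 0.85 is the value conventionally used for PageRank and TextRank. Because the selected sentences are returned in document order rather than score order, the summary reads in the same sequence as the source.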