Separation of Tamil and Devanagari Script Words in Printed Bilingual Document Images

R. Rathinapriya ¹; S. Abirami ²; B. Manjula ³

doi:10.36039/ciitaas/1/1/2009/107050.15-21

Affiliations
1 Department of Computer Science, Anna University, Chennai, India
2 Department of Computer Science, College of Engineering, Anna University, Chennai, India
3 Computer Science and Engineering Department, College of Engineering, Anna University, Guindy, India

Abstract

Identifying the scripts in a bi-script document is an important step in the design of an OCR system for successful analysis and recognition. Most optical character recognition (OCR) systems can recognize at most a few scripts, so large archives of document images that contain different scripts need some way to categorize these documents automatically before the proper OCR is applied to them. Much work has already been reported in this area; in the Indian context, however, although some results have been reported, the task is still in its infancy. This paper presents research on the identification of Tamil and Devanagari scripts at the word level, irrespective of font face and size. The proposed technique uses a document vectorization method that generates vectors from nine zones segmented over the characters, based on their shape, density and transition features. The script is then determined by a rule-based classifier containing a set of classification rules derived from these vectors. Results from experiments, simulations, and human vision show that the proposed technique identifies scripts with minimal pre-processing and high accuracy, and it can also be extended to other scripts. Since the system can act as a plug-in, it can be embedded in an OCR system prior to the recognition stage.
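The abstract only outlines the pipeline (nine zones, density and transition features, rule-based classification), so the following is a hypothetical sketch, not the authors' implementation. It assumes a binarized word image given as a list of rows (1 = ink, 0 = background), splits it into a 3×3 grid of nine zones, computes each zone's ink density and its count of horizontal 0→1 transitions, and applies a single made-up rule on the top-row zones. The rule is motivated by the horizontal headline that joins Devanagari characters; the names `zone_vector`, `classify` and the threshold `headline_threshold` are assumptions for illustration. Shape features and the paper's actual rule set are not modelled.

```python
from typing import List

Image = List[List[int]]  # binarized word image: 1 = ink, 0 = background (assumed input format)


def split_into_zones(img: Image, rows: int = 3, cols: int = 3) -> List[Image]:
    """Split a non-empty binary image into a rows x cols grid (nine zones by default), row-major order."""
    h, w = len(img), len(img[0])
    zones = []
    for r in range(rows):
        r0, r1 = r * h // rows, (r + 1) * h // rows
        for c in range(cols):
            c0, c1 = c * w // cols, (c + 1) * w // cols
            zones.append([row[c0:c1] for row in img[r0:r1]])
    return zones


def density(zone: Image) -> float:
    """Fraction of ink pixels in the zone (0.0 for an empty zone)."""
    total = sum(len(row) for row in zone)
    return sum(sum(row) for row in zone) / total if total else 0.0


def transitions(zone: Image) -> int:
    """Count background-to-ink (0 -> 1) changes, scanning each row left to right."""
    count = 0
    for row in zone:
        prev = 0
        for px in row:
            if px == 1 and prev == 0:
                count += 1
            prev = px
    return count


def zone_vector(img: Image) -> List[float]:
    """Feature vector: (density, transitions) for each of the nine zones, concatenated."""
    vec: List[float] = []
    for z in split_into_zones(img):
        vec.extend([density(z), float(transitions(z))])
    return vec


def classify(vec: List[float], headline_threshold: float = 0.5) -> str:
    """Hypothetical single rule, not from the paper: ink-dense top zones suggest a Devanagari headline."""
    # Densities of the top-row zones 0, 1, 2 sit at even indices 0, 2, 4 of the vector.
    top_density = (vec[0] + vec[2] + vec[4]) / 3
    return "Devanagari" if top_density >= headline_threshold else "Tamil"
```

For a binarized word image `word_image`, `classify(zone_vector(word_image))` returns `"Devanagari"` or `"Tamil"`. In a real system the threshold and the rules themselves would have to be derived from vectors of labelled training words, as the abstract describes, rather than fixed by hand.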

Keywords

Bi-Lingual Document, Script Identification, Rule Based Classification, Optical Character Recognition (OCR).

Journal: Automation and Autonomous Systems