
Script Identification from Camera Captured Indian Document Images with CNN Model





Compared with typical scanners, handheld cameras offer convenient, flexible, portable, and noncontact image capture, which enables many new applications and breathes new life into existing ones. However, camera-captured documents may suffer from distortions caused by a nonplanar document shape and perspective projection, which cause current optical character recognition (OCR) technologies to fail. This paper presents a new CNN model for script identification from camera-captured Indian multilingual document images. To evaluate the performance of the proposed model, nine regional languages, one national language (Hindi), and one international language (English, written in Roman script) are considered. Hindi and English are taken as the common languages alongside the regional languages for the study. The proposed method is applied to bi-script, tri-script, and multi-script combinations. The average recognition accuracy is 92.92% for three script combinations, 91.33% for bi-script, and 87.33% for tri-script. The novelty of this paper is a unified approach for identifying the script in bi-script, tri-script, and multi-script camera-captured document images. The proposed model is compared with the pretrained AlexNet CNN model, and it achieved the highest recognition accuracy.

Keywords

OCR, Deep Neural Network, AlexNet, CNN, Script Identification.
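
The abstract does not describe the layers of the proposed CNN, so the following is only a minimal sketch of the general idea of CNN-based script classification, not the authors' architecture. It assumes a grayscale image patch, a single convolution stage followed by ReLU, global average pooling, and a softmax classifier. The label set `SCRIPTS`, the helper names, and the random (untrained) weights are all hypothetical and serve only to show the data flow.

```python
import math
import random

# Hypothetical tri-script label set: one regional script plus the two common
# ones (Devanagari for Hindi, Roman for English). Names are placeholders.
SCRIPTS = ("regional", "devanagari", "roman")


def conv2d(image, kernel):
    """Valid 2-D cross-correlation of a grayscale image with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(fmap):
    """Element-wise rectified linear unit."""
    return [[max(0.0, v) for v in row] for row in fmap]


def global_avg_pool(fmap):
    """Reduce a feature map to a single scalar by averaging."""
    total = sum(sum(row) for row in fmap)
    return total / (len(fmap) * len(fmap[0]))


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def init_params(num_filters=4, ksize=3, num_classes=len(SCRIPTS), seed=0):
    """Random, untrained weights; a real model would learn these."""
    rng = random.Random(seed)
    kernels = [
        [[rng.uniform(-1, 1) for _ in range(ksize)] for _ in range(ksize)]
        for _ in range(num_filters)
    ]
    dense = [
        [rng.uniform(-1, 1) for _ in range(num_filters)]
        for _ in range(num_classes)
    ]
    return kernels, dense


def predict(image, params):
    """Forward pass: conv -> ReLU -> global average pool -> dense -> softmax."""
    kernels, dense = params
    features = [global_avg_pool(relu(conv2d(image, k))) for k in kernels]
    logits = [sum(w * f for w, f in zip(row, features)) for row in dense]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return SCRIPTS[best], probs


# Usage: classify a synthetic 16x16 patch (stand-in for a document image crop).
rng = random.Random(1)
patch = [[rng.random() for _ in range(16)] for _ in range(16)]
label, probs = predict(patch, init_params())
print(label, [round(p, 3) for p in probs])
```

Because the weights are random, the predicted label carries no meaning here. A bi-script or multi-script setting would, under the same assumptions, only change the length of `SCRIPTS` and hence the size of the final dense layer.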



Authors

Satishkumar Mallappa
Department of Mathematical and Computational Sciences, Sri Sathya Sai University for Human Excellence Kalaburagi Campus, India
B. V. Dhandra
Department of Computer Science, Garden City University, India
Gururaj Mukarambi
Department of Computer Science, Central University of Karnataka, India

