
Preprocessing Framework for Document Image Analysis


 

Abstract

Preprocessing is the first step in document image analysis algorithms, and well-organized preprocessing can lead to better analysis results. This paper proposes a framework for preprocessing document images prior to analysis. The framework consists of four steps: conversion of the color image to grayscale, enhancement of the grayscale image, binarization of the grayscale image, and finally removal of clutter noise. Horizontal and vertical projections are used to detect possible locations of clutter noise. Foreground pixels at those locations are then replaced by background-colored pixels based on their run length. The framework provided better results for the test images.

Keywords

Analysis, Clutter noise, Noise Removal, Preprocessing.
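
The four steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the enhancement method, the thresholding rule, or the projection and run-length parameters, so the luminance weights, the linear contrast stretch, the Otsu threshold, and the hypothetical `density` and `max_run` parameters below are all assumptions chosen for illustration.

```python
def to_gray(rgb):
    """Step 1: color to grayscale (luminance weights are an assumption)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb]

def stretch(gray):
    """Step 2: enhancement, here a simple linear contrast stretch (assumption)."""
    lo, hi = min(map(min, gray)), max(map(max, gray))
    if hi == lo:
        return [row[:] for row in gray]
    return [[(p - lo) * 255 // (hi - lo) for p in row] for row in gray]

def otsu_threshold(gray):
    """Pick the gray level that maximizes between-class variance."""
    hist = [0] * 256
    for row in gray:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var, w0, sum0 = 0, -1.0, 0, 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        var = w0 * w1 * (sum0 / w0 - (sum_all - sum0) / w1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def binarize(gray):
    """Step 3: binarization; 1 marks foreground (dark) pixels, 0 background."""
    t = otsu_threshold(gray)
    return [[1 if p <= t else 0 for p in row] for row in gray]

def remove_long_runs(line, max_run):
    """Replace foreground runs longer than max_run with background."""
    out, start = line[:], None
    for i, v in enumerate(line + [0]):  # trailing 0 closes a final run
        if v and start is None:
            start = i
        elif not v and start is not None:
            if i - start > max_run:
                out[start:i] = [0] * (i - start)
            start = None
    return out

def remove_clutter(binary, density=0.5, max_run=5):
    """Step 4: projections flag dense rows/columns; long runs there are erased."""
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    for y in range(h):  # horizontal projection: foreground count per row
        if sum(binary[y]) > density * w:
            out[y] = remove_long_runs(out[y], max_run)
    for x in range(w):  # vertical projection: foreground count per column
        if sum(binary[y][x] for y in range(h)) > density * h:
            cleaned = remove_long_runs([out[y][x] for y in range(h)], max_run)
            for y in range(h):
                out[y][x] = cleaned[y]
    return out

def preprocess(rgb, density=0.5, max_run=5):
    """Run the four steps in order on a list-of-rows RGB image."""
    return remove_clutter(binarize(stretch(to_gray(rgb))), density, max_run)
```

In this sketch a row or column counts as a clutter candidate only when its projection exceeds the assumed `density` fraction, so short text strokes in sparse rows are left untouched while long dark bands, such as a scanned page border, are replaced by background.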



Authors

Umesh D. Dixit
Department of Electronics & Communication Engineering, B.L.D.E.A’s V.P. Dr. P.G.Halakatti College of Engineering & Technology, Vijayapur, Karnataka – 586103, India
M. S. Shirdhonkar
Department of Computer Science & Engineering, B.L.D.E.A’s V.P. Dr. P.G.Halakatti College of Engineering & Technology, Vijayapur, Karnataka – 586103, India

