
Classification and Evaluation of Document Image Retrieval and Indexing Approach


Affiliations
1 Department of Computer Engineering, Alzahra University, Tehran, Iran
2 Department of Electrical and Computer, Qazvin Islamic Azad University (QIAU), Qazvin, Iran
 

Document images are documents that normally originate on paper and are then scanned electronically. These documents have rich internal structure and may be available only in image form. In addition, they may have been produced by a combination of printing technologies (or by handwriting) and may include diagrams, tables, graphics, and other non-textual components. Large collections of such complex documents are commonly found in legal investigations. Many approaches have been proposed for indexing and retrieving document images. In this paper, we propose a framework for classifying document image retrieval approaches and then evaluate these approaches against important measures.

Keywords

Information Retrieval, Indexing, Document Image, Machine-print, Handwriting



Authors

Mohammadreza Keyvanpour
Department of Computer Engineering, Alzahra University, Tehran, Iran
Reza Tavoli
Department of Electrical and Computer, Qazvin Islamic Azad University (QIAU), Qazvin, Iran

