
Front Index Extraction from Research Documents Using Meta-Content Framework



Abstract

Text mining continues to open new areas of research, and front index extraction is one such area. The front index of a book is a tabular arrangement of topics and subtopics with their page numbers. Ongoing research on front index extraction from e-books relies on various techniques, such as image processing. The present scheme instead extracts the front index from research documents using a string matching algorithm. The paper also describes the working of the Meta-Content Framework for E-books (MCFE), which uses the front index extraction process and treats the extracted front index as meta-information. The framework takes an e-book in PDF form, converts it to text, and extracts the front index from that text. The framework is developed in Java using the iText library.

Keywords

Text Mining, Front Index, e-book, Meta-information, PDF, Java, iText
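
The string matching step described in the abstract can be illustrated with a minimal sketch. It assumes the PDF has already been converted to text, one string per page (for example with iText), and that section headings in a research document begin with a number such as "1.", "2" or "2.1". The class name `FrontIndexSketch`, the `Entry` type and the heading pattern are hypothetical and chosen for illustration; they are not taken from the MCFE implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FrontIndexSketch {

    // Assumed heading shape: "2 Title", "2. Title" or "2.1 Title".
    // Real documents may need a different or broader pattern.
    private static final Pattern HEADING =
            Pattern.compile("^(\\d{1,2}(?:\\.\\d{1,2})*)\\.?\\s+(\\p{Lu}.*)$");

    /** One row of the front index: section number, title and 1-based page. */
    public static final class Entry {
        public final String number;
        public final String title;
        public final int page;

        Entry(String number, String title, int page) {
            this.number = number;
            this.title = title;
            this.page = page;
        }

        @Override
        public String toString() {
            return number + " " + title + " ... " + page;
        }
    }

    /** Scans page texts line by line and collects lines that look like headings. */
    public static List<Entry> extract(List<String> pages) {
        List<Entry> index = new ArrayList<>();
        for (int p = 0; p < pages.size(); p++) {
            for (String raw : pages.get(p).split("\\R")) {
                String line = raw.trim();
                Matcher m = HEADING.matcher(line);
                // Heuristic: a numbered line ending in a full stop is treated
                // as an ordinary sentence, not as a heading.
                if (m.matches() && !line.endsWith(".")) {
                    index.add(new Entry(m.group(1), m.group(2).trim(), p + 1));
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // Stand-in for text extracted from a two-page PDF.
        List<String> pages = List.of(
                "1. Introduction\nText mining is a growing field.\n1.1 Motivation\nSome text.",
                "2 Proposed Framework\nThe framework reads a PDF.");
        for (Entry e : extract(pages)) {
            System.out.println(e);
        }
    }
}
```

Keeping the page number next to each matched heading gives the tabular topic, subtopic and page layout that the abstract describes as a front index. A pattern-based match of this kind is only a heuristic and can miss unnumbered headings or pick up numbered list items.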



Authors

Tripti Sharma
Department of Computer Science, Chhatrapati Shivaji Institute of Technology, Durg, C.G. 491 001, India
Sarang Pitale
Department of Information Technology, Bhilai Institute of Technology, Durg, C.G. 491 001, India


References