Front Index Extraction from Research Documents Using Meta-Content Framework
Text mining continues to open new areas of research, and front index extraction is one of them. The front index of a book is a tabular arrangement of topics and subtopics with their page numbers. Much ongoing research focuses on front index extraction from e-books using techniques such as image processing. The present scheme instead extracts the front index from research documents using a string matching algorithm. The paper also describes the Meta-Content Framework for E-books (MCFE), which runs this extraction process and uses the extracted front index as meta-information. The framework takes an e-book in PDF form, converts it to text, and extracts the front index from that text. The framework is implemented in Java using the iText library.
Keywords
Text Mining, Front Index, E-book, Meta-information, PDF, Java, iText
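The extraction step summarized in the abstract can be illustrated with a minimal Java sketch. It assumes the PDF has already been converted to plain text lines (for example with iText, which is not shown here) and that each front index row has the shape "optional section number, title, dot leaders, page number". The abstract does not specify the string matching algorithm, so a regular expression stands in for it; the class `FrontIndexSketch`, its `Entry` type, and the `extract` method are hypothetical names introduced for illustration and are not part of MCFE.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; not the MCFE implementation.
final class FrontIndexSketch {

    // One front index row: section number (empty if absent), title, page.
    static final class Entry {
        final String number;
        final String title;
        final int page;

        Entry(String number, String title, int page) {
            this.number = number;
            this.title = title;
            this.page = page;
        }

        @Override
        public String toString() {
            return (number.isEmpty() ? "" : number + " ") + title + " -> " + page;
        }
    }

    // Assumed row shape: optional "2.1"-style number followed by whitespace,
    // a title, at least two leader dots (optionally spaced), a page number.
    private static final Pattern ROW = Pattern.compile(
        "\\s*(?:((?:\\d+\\.)*\\d+)\\.?\\s+)?(.+?)\\s*(?:\\.\\s*){2,}(\\d{1,5})\\s*");

    // Keeps only the lines that match the assumed row shape.
    static List<Entry> extract(List<String> lines) {
        List<Entry> entries = new ArrayList<>();
        for (String line : lines) {
            Matcher m = ROW.matcher(line);
            if (m.matches()) {
                String number = m.group(1) == null ? "" : m.group(1);
                entries.add(new Entry(number, m.group(2), Integer.parseInt(m.group(3))));
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // Sample lines standing in for text extracted from a PDF.
        List<String> lines = Arrays.asList(
            "Contents",
            "1. Introduction ........ 3",
            "2.1 Related Work . . . . 7",
            "This sentence is body text, not an index row.",
            "References .............. 12");
        for (Entry e : extract(lines)) {
            System.out.println(e);
        }
    }
}
```

Requiring dot leaders keeps ordinary sentences that happen to end in a number from being mistaken for index rows. The cost is that rows aligned only with spaces or tabs are missed, so a real extractor would likely need additional row patterns.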