Front Index Extraction from Research Documents Using Meta-Content Framework

Tripti Sharma; Sarang Pitale

Affiliations
1 Department of Computer Science, Chhatrapati Shivaji Institute of Technology, Durg, C.G. 491 001, India
2 Department of Information Technology, Bhilai Institute of Technology, Durg, C.G. 491 001, India

Abstract

Text mining continues to open new areas of research, and front index extraction is one of them. The front index of a book is a tabular arrangement of topics and subtopics with their page numbers. Much ongoing research focuses on extracting the front index from e-books using techniques such as image processing. The present scheme instead extracts the front index from research documents using a string matching algorithm. The paper also describes the working of a framework called the Meta-Content Framework for E-books (MCFE), which runs the front index extraction process and uses the extracted front index as meta-information. The framework takes an e-book in PDF form and extracts the front index by converting the PDF to text. The framework is developed using Java and the iText library.
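
The abstract does not say which string matching algorithm is used, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes the PDF has already been converted to one text string per page (the step the framework delegates to iText), and it assumes that section headings look like "1. Introduction" or "2.1 Text Extraction". The class name `FrontIndexSketch`, the `Entry` type, and the heading pattern are hypothetical and are not taken from the paper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FrontIndexSketch {

    /** One front index row: section number, heading text, and page number. */
    public static final class Entry {
        public final String number;
        public final String title;
        public final int page;

        Entry(String number, String title, int page) {
            this.number = number;
            this.title = title;
            this.page = page;
        }

        /** Nesting depth: "2" is depth 0, "2.1" is depth 1, and so on. */
        public int depth() {
            return number.split("\\.").length - 1;
        }
    }

    // Assumed heading shape (not from the paper): a dotted section number,
    // an optional trailing dot, then a capitalised title with no final period.
    private static final Pattern HEADING =
            Pattern.compile("^(\\d+(?:\\.\\d+)*)\\.?\\s+([A-Z][A-Za-z0-9 ,:\\-]{1,80})$");

    /**
     * Scans the text of each page line by line and records every line that
     * matches the heading pattern, together with its 1-based page number.
     */
    public static List<Entry> extract(List<String> pageTexts) {
        List<Entry> entries = new ArrayList<>();
        for (int i = 0; i < pageTexts.size(); i++) {
            for (String line : pageTexts.get(i).split("\\R")) {
                Matcher m = HEADING.matcher(line.trim());
                if (m.matches()) {
                    entries.add(new Entry(m.group(1), m.group(2), i + 1));
                }
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // Stand-in for text that a PDF library would have extracted per page.
        List<String> pages = Arrays.asList(
                "1. Introduction\nText mining is a broad field.",
                "2. Method\n2.1 Text Extraction\nThe PDF is converted to text.");

        // Print the index as an indented table of number, title, and page.
        for (Entry e : extract(pages)) {
            StringBuilder row = new StringBuilder();
            for (int d = 0; d < e.depth(); d++) {
                row.append("  ");
            }
            row.append(e.number).append(' ').append(e.title)
               .append(" ... ").append(e.page);
            System.out.println(row);
        }
    }
}
```

A pattern-based matcher like this is cheap and needs no layout analysis, but it depends entirely on the numbering convention of the document; unnumbered headings would need a different rule.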

Keywords

Text Mining, Front Index, e-book, Meta-information, PDF, Java, iText


Indian Journal of Education and Information Management
