The PDF file you selected should load here if your Web browser has a PDF reader plug-in installed (for example, a recent version of Adobe Acrobat Reader).

If you would like more information about how to print, save, and work with PDFs, Highwire Press provides a helpful Frequently Asked Questions about PDFs.

Alternatively, you can download the PDF file directly to your computer, from where it can be opened using a PDF reader. To download the PDF, click the Download link above.

Fullscreen Fullscreen Off


Objectives: Internet is the repository of information, which contains enormous information about the past, present which can be used to predict future. To know the unknown users are inclined towards searching the internet rather than referencing the library because of ease of availability. This requirement initiates the need to find the content of a web page with in shortest period of time irrespective of the form the page is. So information and content extraction need to be at a basic generic level and easier to implement without depending on any major software. Methods: The study aims on extraction of information from the available data after the data is digitized. The digitized data is converted to pixel- maps which are universal. The pixel map will not face the issues of the form and the format of the web page content. Statistical method is incorporated to extract the attributes of the images so that issues of language hence text-script and format do not pose problems, the extracted features are presented to the Back Propagation algorithm. Findings: The accuracy is presented and how the content extraction within certain bounds could be possible Tested using unstructured word sets chosen from web pages. The method is demonstrated for mono lingual, multi-lingual and transliterated documents so that the applicability is universal. Applications/Improvement: The method is generic, uses pixel-maps of the data which is software and language independent.

Keywords

Back Propagation, Content Extraction, Information, Statistical, Deterministic.
User