Authors
Affiliations
1 Division of Instrumentation and Control Engineering, Netaji Subhas Institute of Technology, IN
2 Division of Computer Engineering, Netaji Subhas Institute of Technology, IN
Source
ICTACT Journal on Soft Computing, Vol 5, No 1 (2014), Pagination: 829-835
Abstract
The World Wide Web offers a wealth of textual information sources that can be exploited for many applications. Given the rapidly evolving nature of online data, however, there is a real risk of information overload unless techniques for meaningfully segregating these sources continue to be developed and refined. In particular, there is a dearth of content-oriented, intelligent techniques that can learn from past search experiences and also adapt to a user's specific requirements during her current search. In this paper, we tackle the core issue of prioritizing textual information sources according to how relevant their content is to the central theme that a user is currently exploring. We propose a new Source Prioritization Algorithm that adopts an iterative learning approach to assess the proclivity of the given information sources towards a set of user-defined seed words and prioritizes them accordingly. The final priorities obtained for one search request serve as the initial priorities for the next. This serves a dual purpose. First, the system learns incrementally from the cumulative search experiences of several users and readjusts the source priorities to reflect the acquired knowledge. Second, the refreshed source priorities are used to direct a user's current search towards more relevant sources while also adapting to the new set of keywords acquired from that user. Experimental results show that the proposed algorithm progressively improves the system's ability to discern between different sources, even in the presence of several random sources. Furthermore, it scales well enough to identify the augmented information source when a new, enriched source is generated by combining existing ones.
Keywords
Textual Information Source Prioritization, Search Engines, Domain Specificity, Term-Source Matrix, Text Information Density.
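The abstract describes the Source Prioritization Algorithm only at a high level and does not state its scoring or update rules. The Python sketch below is therefore a hypothetical illustration, not the authors' method. It assumes a term-source matrix of raw seed-word counts, measures a source's relevance as the fraction of its tokens that are seed words, and carries priorities from one search request to the next through a simple weighted blend controlled by an assumed parameter `alpha`. The names `term_source_matrix` and `prioritize` are invented for this example.

```python
from collections import Counter


def term_source_matrix(sources, seed_words):
    """Hypothetical term-source matrix: seed-word counts per source (one row per source)."""
    seeds = [w.lower() for w in seed_words]
    matrix = {}
    for name, text in sources.items():
        counts = Counter(text.lower().split())
        matrix[name] = [counts[w] for w in seeds]
    return matrix


def prioritize(sources, seed_words, prior=None, alpha=0.5):
    """One search request: blend prior priorities with seed-word relevance.

    Assumptions (not from the paper): relevance = share of a source's tokens
    that are seed words; update = alpha * prior + (1 - alpha) * relevance,
    with 0 <= alpha < 1 so the result can always be normalized.
    """
    n = len(sources)
    matrix = term_source_matrix(sources, seed_words)

    # Relevance of each source to the current seed words.
    relevance = {}
    for name, text in sources.items():
        n_tokens = len(text.split())
        relevance[name] = sum(matrix[name]) / n_tokens if n_tokens else 0.0
    total = sum(relevance.values())
    if total > 0:
        relevance = {k: v / total for k, v in relevance.items()}
    else:
        relevance = {k: 1.0 / n for k in sources}

    # Priorities from the previous request act as initial priorities.
    if prior is None:
        prior = {k: 1.0 / n for k in sources}
    blended = {k: alpha * prior.get(k, 0.0) + (1 - alpha) * relevance[k] for k in sources}
    norm = sum(blended.values())
    return {k: v / norm for k, v in blended.items()}


# Usage: two successive requests; the first request's output seeds the second.
sources = {
    "A": "neural network training neural layers",
    "B": "stock market prices and market news",
    "C": "random words here",
}
p1 = prioritize(sources, ["neural", "network"])
p2 = prioritize(sources, ["market"], prior=p1)
print(p1)
print(p2)
```

In this toy run, the second request ranks source B highest for its new seed word, while source A keeps a higher priority than the unrelated source C because of what was learned in the first request. This mirrors, in a very simplified form, the carry-over of priorities that the abstract describes.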