
An Intelligent Search Engine for Extracting Documents Relevant to Poorly Defined Criteria


Affiliations
1 Computer Engineering Department, Cairo University, Giza-12613, Egypt
2 IT Section, Ministry of Finance, Cairo, Egypt
     



Information retrieval (IR) deals with the representation, storage, and organization of, and access to, information items. Users' queries to search engines are often poorly formulated and hence do not express exactly what the user is searching for. Such poorly defined criteria result in the retrieval of documents that do not exactly meet user expectations. Many attempts have been made to refine document retrieval through interaction with the user. Most of these attempts provide the user with functionality for editing queries and marking documents. Many users find this functionality too complicated and hence hardly use it.
In this paper we present an intelligent search engine that targets such poorly defined queries and interactively helps users fine-tune their search. The user merely specifies which of the initially retrieved documents are most relevant to the request. The system then uses this relevance feedback to automatically update the search criteria initially submitted by the user, and the search results are updated to improve the selection of documents retrieved. The system adopts RBIR (Ranked Boolean IR), a modified Boolean model that estimates document relevance using keyword weights to rank search results. Its accuracy is comparable with that of the vector space model, while its processing overhead remains low.
Results show that a remarkable improvement in precision is already achieved at the first iteration after relevance feedback, especially for very poorly defined criteria and at low recall. As the recall rate increases, the improvement in precision drops; however, an improvement remains even at a recall rate of 100%. In general, the average performance of RBIR with relevance feedback is always better than that of both the vector space model and RBIR without feedback. The average improvement ranges between 12% and 60% relative to vector space and 32% and 25% relative to RBIR at low recall rates. As queries become less definitive, the enhancement becomes more pronounced.
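The feedback loop described above can be illustrated with a small sketch. It is not the authors' RBIR algorithm, whose weighting and update rules are not given in this abstract. The scoring rule (sum of the weights of matched keywords), the update rule, and the names `rank`, `apply_feedback`, `boost`, and `max_new_terms` are all assumptions made for illustration, and the documents are made-up toy data.

```python
from collections import Counter

def rank(docs, weights):
    """Keep documents matching at least one query keyword (Boolean OR)
    and order them by the summed weight of the keywords they contain.
    Assumed scoring rule, for illustration only."""
    scored = []
    for doc_id, text in docs.items():
        terms = set(text.lower().split())
        score = sum(w for kw, w in weights.items() if kw in terms)
        if score > 0:
            scored.append((doc_id, score))
    # Highest score first; ties broken by document id for determinism.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))

def apply_feedback(docs, weights, relevant_ids, boost=0.5, max_new_terms=2):
    """Return updated keyword weights after the user marks relevant documents.
    Hypothetical update rule: a term gains `boost` times the fraction of
    marked documents that contain it; a few new terms may join the query."""
    counts = Counter()
    for doc_id in relevant_ids:
        counts.update(set(docs[doc_id].lower().split()))
    n = len(relevant_ids)
    updated = dict(weights)
    for kw in weights:
        updated[kw] += boost * counts[kw] / n
    # Candidate expansion terms, most frequent first, then alphabetical.
    candidates = sorted(
        ((t, c) for t, c in counts.items() if t not in weights),
        key=lambda pair: (-pair[1], pair[0]),
    )
    for term, count in candidates[:max_new_terms]:
        updated[term] = boost * count / n
    return updated

# Toy collection (made up): an ambiguous one-word query, "jaguar".
docs = {
    "d1": "jaguar car engine speed",
    "d2": "jaguar cat jungle habitat",
    "d3": "jungle cat habitat prey",
    "d4": "car engine repair",
}
query = {"jaguar": 1.0}
initial = rank(docs, query)                    # d1 and d2 tie
refined_query = apply_feedback(docs, query, ["d2"])  # user marks d2 relevant
refined = rank(docs, refined_query)            # d2 first, d3 now retrieved
```

In this toy run the user only marks one document; the query itself is never edited by hand, which is the interaction style the abstract describes.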
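To make the recall-precision measure used in these results concrete, the following minimal sketch computes precision at a given recall level and the relative improvement between two rankings. The helper names and the example rankings are hypothetical and the numbers are not taken from the paper; only the standard definitions of precision and recall are assumed.

```python
def precision_at_recall(ranking, relevant, recall_level):
    """Precision at the shallowest rank where recall reaches recall_level.
    Returns 0.0 if the ranking never reaches that recall."""
    if not relevant:
        raise ValueError("the set of relevant documents must not be empty")
    hits = 0
    for position, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            if hits / len(relevant) >= recall_level:
                return hits / position
    return 0.0

def relative_improvement(baseline, improved):
    """Fractional gain of `improved` over a non-zero `baseline`."""
    return (improved - baseline) / baseline

# Made-up rankings before and after one round of relevance feedback.
relevant = {"d2", "d3"}
before = ["d1", "d2", "d4", "d3"]
after = ["d2", "d1", "d3", "d4"]

for level in (0.5, 1.0):
    p_before = precision_at_recall(before, relevant, level)
    p_after = precision_at_recall(after, relevant, level)
    gain = relative_improvement(p_before, p_after)
    print(f"recall {level:.0%}: {p_before:.2f} -> {p_after:.2f} ({gain:+.0%})")
```

In this toy case the gain is larger at 50% recall than at 100% recall but is still positive at 100%, which is the same qualitative pattern the abstract reports.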

Keywords

Boolean IR Model, IR Evaluation, Relevance Feedback, Recall-Precision Measure, Vector Space Model.

Authors

Magda B. Fayek
Computer Engineering Department, Cairo University, Giza-12613, Egypt
Hatem M. El-Boghdadi
Computer Engineering Department, Cairo University, Giza-12613, Egypt
Mohamed A. Gawad
IT Section, Ministry of Finance, Cairo, Egypt
