
Improved Feature Set Extraction From Documents Using Modified Bag Of Words


Affiliations
1 Department of Computer and Information Science, Annamalai University, India
     



Abstract

The literature describes several methods for collecting and extracting features, which are also used to reduce dimensionality. Traditional methods are designed, largely by intuition, to remove redundant and outdated information so that new test cases can be defined more effectively. However, the number of specific words in the Bag of Words (BoW) model must be set manually, which costs time and effort and limits portability. In addition, the number of codebook vectors in BoW grows with the number of cancer types, which reduces the efficiency and accuracy of detection. The BoW model is therefore not well suited to multi-operative failure diagnosis. In this paper, we propose an improved BoW that selects the number of special terms required to extract cancer diagnostic features from different documents. Its overall recognition and accuracy rates are higher than those of existing extraction models. The improved BoW method has been verified to be highly effective under operating conditions and to meet real-time requirements.

Keywords

Bag of Words, Cancer Document Retrieval, Codebook, Dimensionality Reduction.
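
For readers unfamiliar with the baseline, the following is a minimal sketch of a plain Bag of Words representation in which the vocabulary (codebook) size is fixed by hand, the limitation the abstract describes. It is not the authors' improved method. The function names, the `vocab_size` parameter, the ranking by document frequency, and the sample sentences are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(documents, vocab_size):
    """Keep the vocab_size terms that occur in the most documents.

    vocab_size is chosen manually here (an assumption of this sketch),
    which mirrors the manual step the abstract criticizes.
    """
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))
    # Rank by descending document frequency, then alphabetically
    # so that ties are broken deterministically.
    ranked = sorted(doc_freq.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:vocab_size]]


def bow_vector(document, vocabulary):
    """Count how often each vocabulary term appears in the document."""
    counts = Counter(tokenize(document))
    return [counts[term] for term in vocabulary]


# Hypothetical sample documents, invented for this example.
documents = [
    "Tumor cells spread to the lung",
    "Lung tumor detected in the scan",
    "Skin lesion biopsy shows melanoma cells",
]

vocabulary = build_vocabulary(documents, vocab_size=4)
vectors = [bow_vector(doc, vocabulary) for doc in documents]
```

Each document becomes a vector whose length equals `vocab_size`, so a larger vocabulary raises dimensionality directly. Choosing that size without manual tuning is the problem the abstract says the improved BoW addresses.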

Authors

R. Sathish Babu
Department of Computer and Information Science, Annamalai University, India
R. Nagarajan
Department of Computer and Information Science, Annamalai University, India
