
A New Algorithm For Document Classification Based On Weighting Features and Files


Affiliations
1 Sama Community College, Islamic Azad University, Kazeroon Branch, Kazeroon, Iran, Islamic Republic of
 

As the volume of information grows, so does the need for powerful tools that turn data into useful knowledge. Text classification is one of the key ways of organizing and managing data. This article presents an algorithm for classifying documents. Its capabilities include quality control of the resulting classification based on feedback from the F evaluation measure, weighting features according to the classes, assigning each file a weight in every class, and transferring each file to the class in which it has the greatest weight. As the classes improve, the procedure also removes redundant words effectively. We evaluate the algorithm in three steps: first, we study the influence of different initial random classifications; then we investigate the influence of different weighting methods (TFCRF, TFRF, TFIDF, and the proposed weighting method) on the output of the proposed classification algorithm; finally, we compare the proposed algorithm with other algorithms. The results show that all of these elements together improve the quality and accuracy of the classification.
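The loop described in the abstract (weight features per class, weight each file in every class, move the file to its heaviest class, and check quality with the F measure) can be illustrated with a minimal sketch. This is not the article's implementation: the abstract does not define the proposed weighting, so a TF-IDF-style class weight is substituted here, and the function names `class_term_weights`, `reassign`, and `macro_f1` are hypothetical.

```python
import math
from collections import Counter


def class_term_weights(docs, labels, num_classes):
    """Weight each term per class (assumed TF-IDF-style stand-in, not the paper's method)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # number of documents containing each term
    weights = [Counter() for _ in range(num_classes)]
    for doc, label in zip(docs, labels):
        weights[label].update(doc)  # raw term counts inside each class
    for class_weights in weights:
        total = sum(class_weights.values()) or 1
        for term in class_weights:
            idf = math.log(1 + n_docs / doc_freq[term])
            class_weights[term] = class_weights[term] / total * idf
    return weights


def reassign(docs, labels, num_classes, max_rounds=20):
    """Repeatedly move every file to the class in which it has the largest weight."""
    labels = list(labels)
    for _ in range(max_rounds):
        weights = class_term_weights(docs, labels, num_classes)
        new_labels = []
        for doc in docs:
            # Weight of this file in each class: sum of its terms' class weights.
            scores = [sum(w[term] for term in doc) for w in weights]
            new_labels.append(max(range(num_classes), key=scores.__getitem__))
        if new_labels == labels:  # no file changed class, so stop
            break
        labels = new_labels
    return labels


def macro_f1(true_labels, predicted, num_classes):
    """Macro-averaged F1, used here as the quality-feedback measure."""
    scores = []
    for c in range(num_classes):
        pairs = list(zip(true_labels, predicted))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / num_classes


# Toy corpus (invented for illustration); the third file starts in the wrong class.
docs = [["goal", "team"]] * 3 + [["stock", "price"]] * 3
final = reassign(docs, [0, 0, 1, 1, 1, 1], num_classes=2)
print(final, macro_f1([0, 0, 0, 1, 1, 1], final, num_classes=2))
```

On this toy corpus the mislabelled file moves to the first class after one round, and the labelling then stays stable. How the article itself uses the F measure as feedback, and how it removes redundant words, is not specified in the abstract and is not modelled here.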

Keywords

Document Classification, Weighting Features, Document Retrieval.

Authors

Mahbubeh Ziaee
Sama Community College, Islamic Azad University, Kazeroon Branch, Kazeroon, Iran, Islamic Republic of
