
Federated Document Summarization Using Probabilistic Approach for Kannada Language


 

The number of documents and the amount of information available online have grown to the point of overload. Over the last decade, information has been doubling in size, giving rise to the concept of big data; at the same time, it is being stored in an unstructured manner. People tend to collect huge amounts of information on many issues and areas, whether or not it is useful at that moment, and when the needed information has to be retrieved from what was collected, the relevant document can be summarized. Summaries of large documents help readers find the right information. In this work, we present a method to produce extractive summaries of documents in the Kannada language, limited to the number of sentences specified by the user. This paper proposes a federated approach to summarization that combines the Text Rank algorithm with a Naïve Bayesian approach. Text Rank uses keyword extraction to rank the sentences by Jaccard's similarity score. The sentences with higher ranks are expected to be part of the summary. Since Text Rank is unsupervised, the proposed work uses the Naïve Bayesian approach to incorporate supervised learning aspects. Training sets are prepared for certain categories of Kannada documents and then used to train the system.


Keywords

big data, information, summary, federated, Text Rank, Naïve Bayesian, similarity, supervised, unsupervised
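
To make the ranking step concrete, the following is a minimal sketch of Text Rank over sentences with Jaccard similarity as the edge weight. It is an illustration under stated assumptions, not the authors' implementation: the function names (`tokens`, `jaccard`, `text_rank`, `summarize`), the damping factor of 0.85, the fixed iteration count, and the whitespace tokenizer are all hypothetical choices, and the Naïve Bayesian component is not shown because the abstract does not describe how the two parts are combined.

```python
PUNCTUATION = ".,;:!?\"'()[]"


def tokens(sentence):
    """Split on whitespace and strip surrounding punctuation.

    Whitespace splitting is an assumption made here so that Kannada
    words, which contain combining vowel signs, are kept whole.
    """
    words = (w.strip(PUNCTUATION).lower() for w in sentence.split())
    return {w for w in words if w}


def jaccard(a, b):
    """Jaccard similarity of two token sets: |a & b| / |a | b|."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def text_rank(sentences, damping=0.85, iterations=50):
    """Score sentences by iterating the weighted Text Rank update."""
    token_sets = [tokens(s) for s in sentences]
    n = len(sentences)
    # Symmetric similarity matrix with no self-loops.
    weights = [
        [jaccard(token_sets[i], token_sets[j]) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    out_sums = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to
            # the weight of edge (j, i) among all of j's edges.
            rank = sum(
                weights[j][i] / out_sums[j] * scores[j]
                for j in range(n)
                if out_sums[j] > 0
            )
            updated.append((1 - damping) + damping * rank)
        scores = updated
    return scores


def summarize(sentences, k):
    """Return the k highest-ranked sentences in document order."""
    scores = text_rank(sentences)
    top = sorted(range(len(sentences)),
                 key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(sentences, k)` with a list of sentence strings and the user's sentence limit `k` returns the selected sentences in their original order, which keeps an extractive summary readable.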