
Classification using Latent Dirichlet Allocation with Naive Bayes Classifier to detect Cyber Bullying in Twitter


Affiliations
1 Bharathiyar University, Coimbatore - 641046, Tamil Nadu, India
2 Panimalar Engineering College, Chennai - 600123, Tamil Nadu, India
 

Objectives: Social networks pose a growing risk to minors, especially those who use them regularly, and this exposure can lead to cyber bullying. The unstructured text that makes up much of this enormous volume of information cannot be processed directly by computers, so specific preprocessing methods and algorithms are needed to extract useful patterns. Methods/Analysis: Text classification is one of the important research issues in the field of text mining. A Twitter corpus is used as the training and test data to build a sentiment classifier. The positive or negative sentiment of a new tweet is then used to detect cyber bullying messages on Twitter, using LDA with a Naive Bayes classifier. Findings: The results show that the model performs better, with precision, recall and F-measure of nearly 70%. Naive Bayes is the most appropriate algorithm when compared with J48 and KNN, and its CPU processing time is lower than that of the other two classification algorithms. Improvements: The performance of the system can be improved by adding extra features and by using a larger amount of data.

Keywords

Cyber Bullying, LDA, Naive Bayes, Text Mining, Twitter.
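
The classification and evaluation steps named in the abstract can be illustrated with a minimal sketch. It assumes a plain bag-of-words multinomial Naive Bayes with add-one smoothing and omits the LDA topic step, so it is not the authors' pipeline. The function names and the four toy messages are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and split on whitespace; real tweets need richer preprocessing.
    return text.lower().split()


def train_naive_bayes(docs, labels):
    """Collect the counts needed by a multinomial Naive Bayes model."""
    class_counts = Counter(labels)        # documents per class
    word_counts = defaultdict(Counter)    # token counts per class
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log posterior for the text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore tokens never seen in training
            # Add-one (Laplace) smoothed log likelihood of the token.
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F-measure for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data, invented for this sketch (not from the paper).
train_docs = [
    "you are so stupid and ugly",
    "nobody likes you loser",
    "have a great day friend",
    "congrats on the great game",
]
train_labels = ["bullying", "bullying", "normal", "normal"]
model = train_naive_bayes(train_docs, train_labels)
```

Calling `predict(model, "you are a loser")` scores the message against both classes, and `precision_recall_f1` can then be applied to the predictions for a held-out set to obtain the three measures reported in the findings.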

Authors

K. Nalini
Bharathiyar University, Coimbatore - 641046, Tamil Nadu, India
L. Jaba Sheela
Panimalar Engineering College, Chennai - 600123, Tamil Nadu, India




DOI: https://doi.org/10.17485/ijst/2016/v9i28/132694