Dynamic Induction Model for Student’s Behavior Analysis


Affiliations
1 School of Computer Engineering and Technology, MITWPU, Pune, Maharashtra, India
     

Abstract

The volume of data is growing rapidly with the use of social sites such as Twitter and Facebook. About 80% of college students spend most of their time on social media, where they share their views, feelings, and emotions. This large body of data can give institutes feedback about individual students and about the services the institutes provide. Such feedback can help institutes mentor students appropriately or take corrective action that improves the quality of their services. Applying machine learning algorithms to this data can extend what institutes know about their students. Decision tree algorithms provide a visual representation of the data, which is useful for social media data analysis. However, traditional algorithms such as C4.5 and CART are limited by memory size because they hold the entire training set in memory while building a model, so they are not suitable for large volumes of data. These algorithms perform well on small datasets, but their results deteriorate as the data grows. In this paper, we use the Hoeffding tree for large volumes of data and show experimentally that it outperforms the other machine learning algorithms evaluated. Other algorithms, such as SVM and Naïve Bayes, likewise work well on small datasets, but their performance degrades as the data size increases. To increase accuracy, we use different classifiers at the leaf level and analyze different split criteria. The dataset was collected from Twitter. The phases of social media data mining are also explained in detail.
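The abstract names the main ingredients of the approach (a Hoeffding tree for large data, different classifiers at the leaf level, and different split criteria) without showing how they fit together. The Python sketch below is a minimal, hypothetical illustration of a simplified VFDT-style Hoeffding tree, not the author's implementation. It assumes each tweet has already been reduced to a tuple of categorical attribute values (feature extraction is not shown). Each leaf keeps only class counts per attribute value and, every `grace_period` instances, compares the two best split candidates using the Hoeffding bound ε = √(R² ln(1/δ) / (2n)), where R is the range of the split metric and n is the number of instances seen at the leaf. The names `HoeffdingTree`, `criterion`, `leaf_model`, `delta`, `tau`, and `grace_period`, their default values, and the use of majority-class versus naive Bayes prediction as the two leaf classifiers are all assumptions made for illustration.

```python
# Hypothetical sketch: a simplified VFDT-style Hoeffding tree, not the paper's code.
import math
from collections import Counter, defaultdict


def entropy(counts):
    """Shannon entropy (in bits) of a class-count distribution."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values() if c) if total else 0.0


def gini(counts):
    """Gini impurity of a class-count distribution."""
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values()) if total else 0.0


def hoeffding_bound(value_range, delta, n):
    """epsilon such that, with probability 1 - delta, the true mean is within epsilon of the observed mean."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


class Node:
    def __init__(self):
        self.class_counts = Counter()
        # stats[attribute index][attribute value] -> Counter of class labels (counts only, no instances)
        self.stats = defaultdict(lambda: defaultdict(Counter))
        self.split_attr = None  # None while the node is a leaf
        self.children = {}      # attribute value -> child Node
        self.pending = 0        # instances seen since the last split attempt


class HoeffdingTree:
    """Learns one instance at a time; attributes are assumed to be categorical."""
    def __init__(self, n_attrs, criterion="info_gain", leaf_model="majority",
                 delta=1e-7, tau=0.05, grace_period=200):
        self.root = Node()
        self.n_attrs = n_attrs
        self.criterion = criterion    # hypothetical option names: "info_gain" or "gini"
        self.leaf_model = leaf_model  # hypothetical option names: "majority" or "naive_bayes"
        self.delta, self.tau, self.grace = delta, tau, grace_period

    def _impurity(self, counts):
        return entropy(counts) if self.criterion == "info_gain" else gini(counts)

    def _merit(self, node, attr):
        # Impurity reduction of a multiway split on attr, computed from the leaf's counts.
        n = sum(node.class_counts.values())
        after = sum(sum(cc.values()) / n * self._impurity(cc) for cc in node.stats[attr].values())
        return self._impurity(node.class_counts) - after

    def _try_split(self, node):
        if len(node.class_counts) < 2:
            return  # a pure leaf cannot gain from splitting
        ranked = sorted(((self._merit(node, a), a) for a in range(self.n_attrs)),
                        key=lambda t: t[0], reverse=True)
        best, attr = ranked[0]
        second = ranked[1][0] if len(ranked) > 1 else 0.0
        # R: log2(#classes) for information gain; 1 is an upper bound for Gini impurity.
        value_range = math.log2(len(node.class_counts)) if self.criterion == "info_gain" else 1.0
        eps = hoeffding_bound(value_range, self.delta, sum(node.class_counts.values()))
        if best > 0 and (best - second > eps or eps < self.tau):
            node.split_attr = attr  # the node keeps its counts as a fallback for unseen values

    def learn(self, x, y):
        """x is a tuple of categorical attribute values, y its class label."""
        node = self.root
        while node.split_attr is not None:
            node = node.children.setdefault(x[node.split_attr], Node())
        node.class_counts[y] += 1
        for a, v in enumerate(x):
            node.stats[a][v][y] += 1
        node.pending += 1
        if node.pending >= self.grace:
            node.pending = 0
            self._try_split(node)

    def predict(self, x):
        node = self.root
        while node.split_attr is not None:
            child = node.children.get(x[node.split_attr])
            if child is None or not child.class_counts:
                break  # unseen value or empty leaf: predict from this node's counts
            node = child
        if self.leaf_model == "majority":
            return node.class_counts.most_common(1)[0][0]
        # Naive Bayes at the leaf, reusing the leaf's counts with Laplace smoothing.
        n = sum(node.class_counts.values())

        def log_score(c):
            nc = node.class_counts[c]
            score = math.log(nc / n)
            for a, v in enumerate(x):
                values = node.stats.get(a, {})
                score += math.log((values.get(v, Counter())[c] + 1) / (nc + len(values) + 1))
            return score

        return max(node.class_counts, key=log_score)
```

Because each leaf stores counts rather than the instances themselves, memory grows with the number of leaves and distinct attribute values, not with the length of the stream. That is the property that lets a Hoeffding tree handle data volumes where C4.5 or CART, which keep the whole training set in memory, run into trouble. Switching `criterion` between `"info_gain"` and `"gini"`, or `leaf_model` between `"majority"` and `"naive_bayes"`, gives a simplified version of the kind of comparison the abstract describes.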

Keywords

Decision Trees, Hoeffding Trees, Social Media Data.
Authors

Sharmishta Desai
School of Computer Engineering and Technology, MITWPU, Pune, Maharashtra, India
