Wanas, Nayer
- Clustering Posts in Online Discussion Forum Threads
Authors
Dina Said (1), Nayer Wanas (2)
Affiliations
1 Department of Computer Science, University of Calgary, Calgary, CA
2 Informatics Department, Electronics Research Institute, Giza, EG
Source
AIRCC's International Journal of Computer Science and Information Technology, Vol 3, No 2 (2011), Pagination: 1-14
Abstract
Online discussion forums are considered a challenging repository for data mining tasks. Forums usually contain hundreds of threads, which in turn consist of hundreds or even thousands of posts. Clustering posts can be used to discover outlier and off-topic posts, and can provide better visualization and exploration of online threads. In this paper, we propose Leader-based Post Clustering (LPC), a modification of the Leader algorithm for clustering posts in discussion board threads. We also suggest using asymmetric pairwise distances to measure the dissimilarity between posts. We further investigate the effect of the indirect distance between posts and how to calibrate it against the direct distance. To evaluate the proposed methods, we conduct experiments using artificial and real threads extracted from the Slashdot and Ciao discussion forums. Experimental results demonstrate the effectiveness of the LPC algorithm when using a linear combination of direct and indirect distances, together with an averaging approach to evaluate a representative indirect distance. Furthermore, the results show the potential of the LPC algorithm for detecting off-topic or outlier posts compared with two state-of-the-art off-topic post detection methods.
Keywords
Distance Metrics, Clustering, Outlier Detection, Off-Topic Detection, Online Forums Mining.
- Utilizing Diagnosing Problems in a Probabilistic Domain to Build Student Models
Authors
Affiliations
1 Informatics Dept., Electronics Research Institute, Tahrir St., Giza, EG
2 Dept. of Computer Engineering, Cairo University, Giza, EG
Source
AIRCC's International Journal of Computer Science and Information Technology, Vol 2, No 4 (2010), Pagination: 88-97
Abstract
In this paper, we aim to estimate the differential student knowledge model in a probabilistic domain within an intelligent tutoring system. The student's answers to questions requiring diagnosing skills are used to estimate the actual student model. The model is updated and verified based on the match between the student's answers and the model's answers. Two different updating approaches are suggested: i) coarse and ii) refined model updating. Moreover, the effect of the order in which questions are presented to the student is investigated. Results suggest that the refined model, although it takes more computational resources, provides a slightly better approximation of the student model. In addition, the accuracy of the algorithm is highly insensitive to the order in which the questions are presented, more so when using the refined model updating approach.
Keywords
Bayesian Networks, Abduction, Intelligent Tutoring System, Student Modelling.
- Detection and Handling of Different Types of Concept Drift in News Recommendation Systems
Authors
Affiliations
1 Informatics Department, Electronics Research Institute, Giza, EG
2 Computer Engineering Department, Cairo University, Giza, EG
Source
AIRCC's International Journal of Computer Science and Information Technology, Vol 11, No 1 (2019), Pagination: 87-106
Abstract
To address the increasing volume of data streams that online users interact with, there is a growing number of tools and models for summarizing and extracting information. These tools use prediction models to personalize and extract useful information. However, data streams are highly prone to concept drift, a phenomenon in which the data distribution changes over time. To maintain their performance level, models should adapt to the presence of drift. In this work, we present the Incremental Knowledge Concept Drift (IKCD) algorithm, an adaptive unsupervised learning algorithm for recommendation systems on news data streams. Data modelling in IKCD uses k-means clustering to determine the occurrence of a drift while avoiding dependence on the availability of data labels. Once a drift is detected, new retraining data is composed from the old and new concepts. IKCD is tested using synthetic and real benchmark datasets from various domains, covering different drift types and different rates of change. Experimental results show improved performance with respect to (a) reducing model sensitivity to noise, (b) reducing model rebuilding frequency by up to 50% in the case of re-occurring drift, and (c) increasing model accuracy by about 10% with respect to the accuracy of the confidence distribution batch detection algorithm.
Keywords
Concept Drift, Change Detection, Recommendation Systems.
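The IKCD abstract states only that k-means clustering is used to detect drift without relying on labels, and that retraining data is then composed from the old and new concepts; it does not give the detection rule. As a rough illustration of that general idea (not the authors' IKCD algorithm), the Python sketch below clusters a reference window with plain k-means, flags drift when a new batch lies much farther from the reference centroids than the reference data itself does, and builds a retraining set from a sample of old data plus all new data. The function names, the distance-ratio test, and the parameters `k`, `ratio`, and `old_fraction` are hypothetical choices made for this sketch.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iters: int = 50, seed: int = 0) -> List[Point]:
    """Plain Lloyd's k-means with random initial centroids; returns k centroids."""
    rng = random.Random(seed)
    centroids: List[Point] = [tuple(p) for p in rng.sample(list(points), k)]
    for _ in range(iters):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        updated: List[Point] = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:  # assignments stopped changing
            break
        centroids = updated
    return centroids


def mean_nearest_distance(points: Sequence[Point], centroids: Sequence[Point]) -> float:
    """Average distance from each point to its closest centroid."""
    return sum(min(math.dist(p, c) for c in centroids) for p in points) / len(points)


def detect_drift(reference: Sequence[Point], batch: Sequence[Point],
                 k: int = 2, ratio: float = 2.0, seed: int = 0) -> bool:
    """Hypothetical label-free test: flag drift when the new batch sits, on average,
    more than `ratio` times farther from the reference centroids than the
    reference window itself does."""
    centroids = kmeans(reference, k, seed=seed)
    baseline = mean_nearest_distance(reference, centroids)
    return mean_nearest_distance(batch, centroids) > ratio * baseline


def compose_retraining_data(old: Sequence[Point], new: Sequence[Point],
                            old_fraction: float = 0.3, seed: int = 0) -> List[Point]:
    """Hypothetical mixing rule: a random share of old-concept data plus all new data."""
    rng = random.Random(seed)
    kept_old = rng.sample(list(old), int(len(old) * old_fraction))
    return kept_old + list(new)


if __name__ == "__main__":
    reference = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
    similar = [(0.05, 0.1), (5.05, 5.0)]
    shifted = [(10.0, -3.0), (11.0, -2.5)]
    print(detect_drift(reference, similar))  # False: batch fits the existing clusters
    print(detect_drift(reference, shifted))  # True: batch lies far from both clusters
    if detect_drift(reference, shifted):
        retrain = compose_retraining_data(reference, shifted, old_fraction=0.5)
        print(len(retrain))  # 5 = 3 sampled old points + 2 new points
```

A distance-ratio test is only one possible unsupervised criterion; on real news streams the threshold, the number of clusters, and the share of old data kept for retraining would all need tuning.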