
Fast Expectation Maximization Clustering Algorithm


     



Expectation Maximization (EM) is an efficient and widely employed mixture-model-based data clustering algorithm. The EM algorithm iterates two computationally demanding steps: the expectation step and the maximization step. Many researchers have reported that, compared with contemporary clustering algorithms, EM gives exceptionally good clustering results but demands substantial computational effort. In the present paper, methods are proposed to reduce the time required to compute the probability density function (pdf), which involves evaluating a quadratic term whose computational complexity is O(d²), where d is the number of dimensions. Winograd's approach is employed to reduce the computational effort required for the expectation step. Experiments are carried out on popular data mining data sets from the UCI ML repository and on simulated data sets. The algorithm is observed to achieve a speed-up of about 5 to 6 compared with standard implementations such as the Cluster package from Purdue University.
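The abstract only names the technique, so as a rough illustration: the classical Winograd inner-product identity trades roughly half the multiplications for additions by precomputing a per-vector pair-product term, which can then be reused across the many dot products (e.g. rows of the inverse covariance against each data point) evaluated in the E-step. The sketch below shows the identity itself; the function names are illustrative, not taken from the paper.

```python
def pair_products(v):
    """Winograd's per-vector term: sum of products of consecutive pairs.
    For a vector reused across many dot products (e.g. a row of the
    inverse covariance matrix), this is computed once and amortized."""
    return sum(v[2 * j] * v[2 * j + 1] for j in range(len(v) // 2))

def winograd_dot(x, y, eta=None):
    """Inner product via Winograd's identity:
        x.y = sum_j (x[2j] + y[2j+1]) * (x[2j+1] + y[2j]) - xi - eta
    where xi and eta are the pair-product terms of x and y.
    Needs only about n/2 multiplications per call once eta is
    precomputed. Assumes even length n (pad with a zero otherwise)."""
    assert len(x) == len(y) and len(x) % 2 == 0
    xi = pair_products(x)
    if eta is None:
        eta = pair_products(y)
    s = sum((x[2 * j] + y[2 * j + 1]) * (x[2 * j + 1] + y[2 * j])
            for j in range(len(x) // 2))
    return s - xi - eta

x = [1.0, 2.0, 3.0, 4.0]
y = [5.0, 6.0, 7.0, 8.0]
eta = pair_products(y)          # precompute once for the reused vector
print(winograd_dot(x, y, eta))  # -> 70.0, same as the ordinary dot product
```

The saving only materializes when one operand is fixed so its pair-product term is precomputed once, which matches the E-step setting where the same component parameters are applied to every data point.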

Keywords

Expectation Maximization, Clustering, Speed-up, Gaussian Mixture Model, Winograd’s method


References

  • Cédric Archambeau, John Aldo Lee, and Michel Verleysen. On Convergence Problems of the EM Algorithm for Finite Gaussian Mixtures. In ESANN, pages 99-106, 2003.
  • Jeff A. Bilmes. A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models. Technical report, 1997.
  • P. S. Bradley, Usama Fayyad, and Cory Reina. Scaling EM (Expectation-Maximization) Clustering to Large Databases. Technical report, 1999.
  • A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1-38, 1977.
  • A. K. Jain, M. N. Murty, and P. J. Flynn. Data Clustering: A Review. ACM Computing Surveys, 31(3), 1999.
  • H. D. Jin, M. L. Wong, and K. S. Leung. Scalable Model-Based Clustering for Large Databases Based on Data Summarization. IEEE Trans. Pattern Analysis and Machine Intelligence, 27(11):1710-1719, 2005.
  • F.-X. Jollois and M. Nadif. Speed-up for the Expectation-Maximization Algorithm for Clustering Categorical Data. 2007.
  • Wojtek Kowalczyk and Nikos Vlassis. Newscast EM. In NIPS 17, pages 713-720. MIT Press, 2005.
  • Radford Neal and Geoffrey E. Hinton. A View of the EM Algorithm that Justifies Incremental, Sparse, and Other Variants. In Learning in Graphical Models, pages 355-368. Kluwer Academic Publishers, 1998.
  • C. K. Reddy, H. D. Chiang, and B. Rajaratnam. TRUST-TECH-Based Expectation Maximization for Learning Finite Mixture Models. IEEE Trans. Pattern Analysis and Machine Intelligence, 30(7):1146-1157, 2008.
  • S. P. Smith and C. Y. Lin. Efficient Implementation of the New Restricted Maximum Likelihood Algorithm. 1989.
  • Ruslan Salakhutdinov, Sam Roweis, and Zoubin Ghahramani. Optimization with EM and Expectation-Conjugate-Gradient. pages 672-679, 2003.
  • Bo Thiesson, Christopher Meek, and David Heckerman. Accelerating EM for Large Databases. Machine Learning, 2001.
  • N. B. Venkateswarlu, S. Balaji, and R. D. Boyle. Some Further Results of Three Stage ML Classification Applied to Remotely Sensed Images. Technical report, 1993.
  • Jakob J. Verbeek, Jan R. J. Nunnink, and Nikos Vlassis. Accelerated EM-Based Clustering of Large Data Sets. Data Mining and Knowledge Discovery, 13, 2006.
  • Jason Wolfe, Aria Haghighi, and Dan Klein. Fully Distributed EM for Very Large Datasets. In Proceedings of the Twenty-Fifth International Conference on Machine Learning (ICML 2008), Helsinki, Finland, pages 1184-1191. ACM, 2008.
  • Lei Xu and Michael I. Jordan. On Convergence Properties of the EM Algorithm for Gaussian Mixtures. Neural Computation, 8:129-151, 1995.
  • R. Xu and D. Wunsch. Survey of Clustering Algorithms. IEEE Transactions on Neural Networks, 16(3):645-678, 2005.
  • Ian H. Witten, Eibe Frank, and Mark A. Hall. Data Mining, Third Edition. Morgan Kaufmann.
  • UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/datasets.html
  • Purdue University Cluster Software: https://engineering.purdue.edu/~bouman/software/cluster/
  • GMM/GMR v2.0: http://www.calinon.ch/sourcecodes.php
  • D. Coppersmith and S. Winograd. Matrix Multiplication via Arithmetic Progressions. Journal of Symbolic Computation, 9:251-280, 1990.
  • Pang-Ning Tan, Michael Steinbach, and Vipin Kumar. Introduction to Data Mining. Pearson Education.
