
Fast Expectation Maximization Clustering Algorithm


     



Expectation Maximization (EM) is an efficient and widely employed mixture-model-based data clustering algorithm. The EM algorithm iterates two computationally demanding steps: the expectation step and the maximization step. Many researchers have reported that, compared with contemporary clustering algorithms, EM gives exceptionally good clustering results but demands substantial computational effort. In the present paper, methods are proposed to reduce the time required to compute the probability density function (pdf), which involves evaluating a quadratic term whose computational complexity is O(d²), where d is the number of dimensions. Winograd's approach is employed to reduce the computational effort required for the expectation step. Experiments are carried out on popular data mining data sets from the UCI ML repository and on simulated data sets. The algorithm is observed to achieve a speed-up of about 5 to 6 compared with standard implementations such as the Cluster package from Purdue University.
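The abstract only names the technique, so as a rough illustration: the classical Winograd inner-product identity trades roughly half the multiplications for additions by precomputing a per-vector pair-product term, which can then be reused across the many dot products (e.g. rows of the inverse covariance against each data point) evaluated in the E-step. The sketch below shows the identity itself; the function names are illustrative, not taken from the paper.

```python
def pair_products(v):
    """Winograd's per-vector term: sum of products of consecutive pairs.
    For a vector reused across many dot products (e.g. a row of the
    inverse covariance matrix), this is computed once and amortized."""
    return sum(v[2 * j] * v[2 * j + 1] for j in range(len(v) // 2))

def winograd_dot(x, y, eta=None):
    """Inner product via Winograd's identity:
        x.y = sum_j (x[2j] + y[2j+1]) * (x[2j+1] + y[2j]) - xi - eta
    where xi and eta are the pair-product terms of x and y.
    Needs only about n/2 multiplications per call once eta is
    precomputed. Assumes even length n (pad with a zero otherwise)."""
    assert len(x) == len(y) and len(x) % 2 == 0
    xi = pair_products(x)
    if eta is None:
        eta = pair_products(y)
    s = sum((x[2 * j] + y[2 * j + 1]) * (x[2 * j + 1] + y[2 * j])
            for j in range(len(x) // 2))
    return s - xi - eta

x = [1.0, 2.0, 3.0, 4.0]
y = [5.0, 6.0, 7.0, 8.0]
eta = pair_products(y)          # precompute once for the reused vector
print(winograd_dot(x, y, eta))  # -> 70.0, same as the ordinary dot product
```

The saving only materializes when one operand is fixed so its pair-product term is precomputed once, which matches the E-step setting where the same component parameters are applied to every data point.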

Keywords

Expectation Maximization, Clustering, Speed-up, Gaussian Mixture Model, Winograd’s method


References

  • Cédric Archambeau, John Aldo Lee, and Michel Verleysen. On Convergence Problems of the EM Algorithm for Finite Gaussian Mixtures. In ESANN, pages 99-106, 2003.
  • Jeff A. Bilmes. A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models. Technical report, 1997.
  • P. S. Bradley, Usama Fayyad, and Cory Reina. Scaling EM (Expectation-Maximization) Clustering to Large Databases. Technical report, 1999.
  • A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1-38, 1977.
  • A. K. Jain, M. N. Murty, and P. J. Flynn. Data Clustering: A Review. ACM Computing Surveys, 31(3), 1999.
  • H. D. Jin, M. L. Wong, and K. S. Leung. Scalable Model-Based Clustering for Large Databases Based on Data Summarization. IEEE Trans. Pattern Analysis and Machine Intelligence, 27(11):1710-1719, 2005.
  • F.-X. Jollois and M. Nadif. Speed-up for the Expectation-Maximization Algorithm for Clustering Categorical Data. 2007.
  • Wojtek Kowalczyk and Nikos Vlassis. Newscast EM. In NIPS 17, pages 713-720. MIT Press, 2005.
  • Radford Neal and Geoffrey E. Hinton. A View of the EM Algorithm that Justifies Incremental, Sparse, and Other Variants. In Learning in Graphical Models, pages 355-368. Kluwer Academic Publishers, 1998.
  • C. K. Reddy, H. D. Chiang, and B. Rajaratnam. TRUST-TECH-Based Expectation Maximization for Learning Finite Mixture Models. IEEE Trans. Pattern Analysis and Machine Intelligence, 30(7):1146-1157, 2008.
  • S. P. Smith and C. Y. Lin. Efficient Implementation of the New Restricted Maximum Likelihood Algorithm. 1989.
  • Ruslan Salakhutdinov, Sam Roweis, and Zoubin Ghahramani. Optimization with EM and Expectation-Conjugate-Gradient. pages 672-679, 2003.
  • Bo Thiesson, Christopher Meek, and David Heckerman. Accelerating EM for Large Databases. Machine Learning, 2001.
  • N. B. Venkateswarlu, S. Balaji, and R. D. Boyle. Some Further Results of Three Stage ML Classification Applied to Remotely Sensed Images. Technical report, 1993.
  • Jakob J. Verbeek, Jan R. J. Nunnink, and Nikos Vlassis. Accelerated EM-Based Clustering of Large Data Sets. Data Mining and Knowledge Discovery, 13, 2006.
  • Jason Wolfe, Aria Haghighi, and Dan Klein. Fully Distributed EM for Very Large Datasets. In Proceedings of the Twenty-Fifth International Conference on Machine Learning (ICML 2008), Helsinki, Finland, pages 1184-1191. ACM, 2008.
  • Lei Xu and Michael I. Jordan. On Convergence Properties of the EM Algorithm for Gaussian Mixtures. Neural Computation, 8:129-151, 1995.
  • R. Xu and D. Wunsch. Survey of Clustering Algorithms. IEEE Transactions on Neural Networks, 16(3):645-678, 2005.
  • Ian H. Witten, Eibe Frank, and Mark A. Hall. Data Mining, Third Edition. Morgan Kaufmann.
  • UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/datasets.html
  • Purdue University Cluster Software: https://engineering.purdue.edu/~bouman/software/cluster/
  • GMM/GMR v2.0: http://www.calinon.ch/sourcecodes.php
  • D. Coppersmith and S. Winograd. Matrix Multiplication via Arithmetic Progressions. Journal of Symbolic Computation, 9:251-280, 1990.
  • Pang-Ning Tan, Michael Steinbach, and Vipin Kumar. Introduction to Data Mining. Pearson Education.
