Article (Scientific journals)
Accelerated EM-based clustering of large data sets
Verbeek, Jakob J.; Nunnink, Jan R. J.; Vlassis, Nikos
2006. In Data Mining & Knowledge Discovery, 13 (3), p. 291-307
Peer reviewed
 

Details



Keywords :
Gaussian mixtures; EM algorithm; free energy; kd-trees; large data sets
Abstract :
[en] Motivated by the poor performance (linear complexity) of the EM algorithm in clustering large data sets, and inspired by the successful accelerated versions of related algorithms like k-means, we derive an accelerated variant of the EM algorithm for Gaussian mixtures that: (1) offers speedups that are at least linear in the number of data points, (2) ensures convergence by strictly increasing a lower bound on the data log-likelihood in each learning step, and (3) allows ample freedom in the design of other accelerated variants. We also derive a similar accelerated algorithm for greedy mixture learning, where very satisfactory results are obtained. The core idea is to define a lower bound on the data log-likelihood based on a grouping of data points. The bound is maximized by computing in turn (i) optimal assignments of groups of data points to the mixture components, and (ii) optimal re-estimation of the model parameters based on average sufficient statistics computed over groups of data points. The proposed method naturally generalizes to mixtures of other members of the exponential family. Experimental results show the potential of the proposed method over other state-of-the-art acceleration techniques.
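The two alternating steps described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration reconstructed from the abstract's wording, not the authors' implementation: all function names are hypothetical, the partition of the data is assumed to be given as a fixed array of group labels (rather than built from a kd-tree, as the keywords suggest the paper does), and every point in a group is assumed to share one responsibility vector. Under these assumptions the bound depends on the data only through each group's size, mean, and average outer product.

```python
import numpy as np

def group_stats(X, labels):
    """Cache per-group size, mean <x>_A and average outer product <x x^T>_A."""
    sizes, means, outers = [], [], []
    for g in np.unique(labels):
        Xg = X[labels == g]
        sizes.append(len(Xg))
        means.append(Xg.mean(axis=0))
        outers.append(Xg.T @ Xg / len(Xg))
    return np.array(sizes, dtype=float), np.stack(means), np.stack(outers)

def avg_log_gauss(mean, outer, mu, cov):
    """Average Gaussian log-density over each group, from cached statistics only."""
    d = mu.shape[0]
    prec = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    # <(x-mu)^T P (x-mu)>_A = Tr(P <xx^T>_A) - 2 mu^T P <x>_A + mu^T P mu
    quad = (np.einsum('ij,aji->a', prec, outer)
            - 2.0 * mean @ prec @ mu + mu @ prec @ mu)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + quad)

def avg_log_all(mean, outer, mus, covs):
    """Stack the group-averaged log-densities for all components: (groups, comps)."""
    return np.stack([avg_log_gauss(mean, outer, m, c)
                     for m, c in zip(mus, covs)], axis=1)

def e_step(mean, outer, pi, mus, covs):
    """Step (i): one shared responsibility vector q_A per group."""
    logq = np.log(pi) + avg_log_all(mean, outer, mus, covs)
    logq -= logq.max(axis=1, keepdims=True)  # numerical stability
    q = np.exp(logq)
    return q / q.sum(axis=1, keepdims=True)

def m_step(n, mean, outer, q):
    """Step (ii): re-estimate parameters from size-weighted group statistics."""
    w = n[:, None] * q                       # (groups, comps)
    Ns = w.sum(axis=0)
    pi = Ns / n.sum()
    mus = (w.T @ mean) / Ns[:, None]
    second = np.einsum('as,aij->sij', w, outer) / Ns[:, None, None]
    covs = second - np.einsum('si,sj->sij', mus, mus)
    return pi, mus, covs

def bound(n, mean, outer, q, pi, mus, covs):
    """Lower bound on the data log-likelihood under the grouping constraint."""
    avg = avg_log_all(mean, outer, mus, covs)
    return float(np.sum(n[:, None] * q * (np.log(pi) + avg - np.log(q + 1e-300))))
```

After `group_stats` has been computed once, each iteration of `e_step` followed by `m_step` costs time proportional to the number of groups rather than the number of data points, and in this sketch neither step can decrease the value returned by `bound`.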
Disciplines :
Computer science
Identifiers :
UNILU:UL-ARTICLE-2011-716
Author, co-author :
Verbeek, Jakob J.
Nunnink, Jan R. J.
Vlassis, Nikos; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB)
Language :
English
Title :
Accelerated EM-based clustering of large data sets
Publication date :
2006
Journal title :
Data Mining & Knowledge Discovery
ISSN :
1384-5810
Volume :
13
Issue :
3
Pages :
291-307
Peer reviewed :
Peer reviewed
Available on ORBilu :
since 17 November 2013

Statistics

Number of views :
41 (0 by Unilu)
Number of downloads :
259 (0 by Unilu)
Scopus citations® :
39
Scopus citations® without self-citations :
37
OpenCitations :
33
WoS citations :
32
