Accelerated EM-based clustering of large data sets

Verbeek, Jakob J.; Nunnink, Jan R. J.; Vlassis, Nikos

doi:10.1007/s10618-005-0033-3


Article (Scientific journals)

2006 • In Data Mining & Knowledge Discovery, 13 (3), p. 291-307

Peer reviewed

Permalink
https://hdl.handle.net/10993/11037

DOI
10.1007/s10618-005-0033-3


Files

Full Text

download.pdf

Author preprint (249.07 kB)


Details

Keywords :

Gaussian mixtures; EM algorithm; free energy; kd-trees; large data sets

Abstract :

Motivated by the poor performance (linear complexity) of the EM algorithm in clustering large data sets, and inspired by the successful accelerated versions of related algorithms like k-means, we derive an accelerated variant of the EM algorithm for Gaussian mixtures that: (1) offers speedups that are at least linear in the number of data points, (2) ensures convergence by strictly increasing a lower bound on the data log-likelihood in each learning step, and (3) allows ample freedom in the design of other accelerated variants. We also derive a similar accelerated algorithm for greedy mixture learning, where very satisfactory results are obtained. The core idea is to define a lower bound on the data log-likelihood based on a grouping of data points. The bound is maximized by computing in turn (i) optimal assignments of groups of data points to the mixture components, and (ii) optimal re-estimation of the model parameters based on average sufficient statistics computed over groups of data points. The proposed method naturally generalizes to mixtures of other members of the exponential family. Experimental results show the potential of the proposed method over other state-of-the-art acceleration techniques.
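The two alternating steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a fixed partition of the data given as a plain label array (here a coarse grid, whereas the keywords indicate the paper works with kd-trees), it assumes that all points in a group share one responsibility vector, and every function and variable name is hypothetical. Under those assumptions, each group only needs its size, its mean and its average outer product, so one EM iteration costs time proportional to the number of groups rather than the number of points.

```python
import numpy as np

def group_statistics(X, labels):
    """Per-group size, mean <x>_A and average outer product <x x^T>_A."""
    groups = np.unique(labels)
    members = [X[labels == g] for g in groups]
    sizes = np.array([len(m) for m in members], dtype=float)
    means = np.stack([m.mean(axis=0) for m in members])
    second = np.stack([m.T @ m / len(m) for m in members])
    return sizes, means, second

def grouped_e_step(sizes, means, second, weights, mus, covs):
    """Shared responsibilities q_A(s), proportional to
    weights[s] * exp(<log N(x; mus[s], covs[s])>_A), plus the bound value."""
    n_groups, d = means.shape
    k = len(weights)
    log_q = np.empty((n_groups, k))
    for s in range(k):
        prec = np.linalg.inv(covs[s])
        _, logdet = np.linalg.slogdet(covs[s])
        # <(x - mu)^T P (x - mu)>_A = tr(P <xx^T>_A) - 2 mu^T P <x>_A + mu^T P mu
        quad = (np.einsum("ij,aji->a", prec, second)
                - 2.0 * means @ prec @ mus[s]
                + mus[s] @ prec @ mus[s])
        log_q[:, s] = np.log(weights[s]) - 0.5 * (d * np.log(2 * np.pi) + logdet + quad)
    log_norm = np.logaddexp.reduce(log_q, axis=1, keepdims=True)
    q = np.exp(log_q - log_norm)
    # With optimal q, the lower bound equals the size-weighted sum of log normalizers.
    bound = float(sizes @ log_norm[:, 0])
    return q, bound

def grouped_m_step(sizes, means, second, q):
    """Re-estimate parameters from size-weighted group sufficient statistics."""
    w = sizes[:, None] * q                      # shape (n_groups, k)
    nk = w.sum(axis=0)
    weights = nk / sizes.sum()
    mus = (w.T @ means) / nk[:, None]
    covs = np.einsum("as,aij->sij", w, second) / nk[:, None, None]
    covs -= np.einsum("si,sj->sij", mus, mus)
    return weights, mus, covs

# Toy data: two well-separated 2-D Gaussian blobs (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-3.0, 1.0, size=(500, 2)),
               rng.normal(3.0, 1.0, size=(500, 2))])

# Assumed grouping: one group per occupied cell of a 0.5-wide grid.
cells = np.floor(X / 0.5).astype(int)
labels = cells[:, 0] * 1000 + cells[:, 1]
sizes, means, second = group_statistics(X, labels)

weights = np.array([0.5, 0.5])
mus = np.array([[-1.0, 0.0], [1.0, 0.0]])
covs = np.stack([np.eye(2), np.eye(2)])
bounds = []
for _ in range(20):
    q, bound = grouped_e_step(sizes, means, second, weights, mus, covs)
    bounds.append(bound)
    weights, mus, covs = grouped_m_step(sizes, means, second, q)
```

If every point is placed in its own group, the group statistics reduce to the points themselves and the sketch coincides with standard EM for a Gaussian mixture; coarser groups trade tightness of the bound for speed.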

Disciplines :

Computer science

Identifiers :

UNILU:UL-ARTICLE-2011-716

Author, co-author :

Verbeek, Jakob J.

Nunnink, Jan R. J.

Vlassis, Nikos ; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB)

Language :

English

Publication date :

2006

Journal title :

Data Mining & Knowledge Discovery

ISSN :

1384-5810

Volume :

13

Issue :

3

Pages :

291-307

Peer reviewed :

Peer reviewed

Available on ORBilu :

since 17 November 2013
