Dirichlet Process (DP) mixture models are promising candidates for clustering applications where the number of clusters is unknown a priori. Due to computational considerations these models are unfortunately unsuitable for large-scale data-mining applications. We propose a class of deterministic accelerated DP mixture models that can routinely handle millions of data-cases. The speedup is achieved by incorporating kd-trees into a variational Bayesian algorithm for DP mixtures in the stick-breaking representation, similar to that of Blei and Jordan (2005). Our algorithm differs in the use of kd-trees and in the way we handle truncation: we only assume that the variational distributions are fixed at their priors after a certain level. Experiments show that speedups relative to the standard variational algorithm can be significant.
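Illustrative note (not part of the original record): the stick-breaking representation mentioned in the abstract builds mixture weights by repeatedly breaking off a Beta(1, alpha)-distributed fraction of the remaining stick. The following is a minimal sketch of that prior construction only, not the authors' accelerated algorithm; the function name `stick_breaking_weights`, the parameters `alpha` and `num_sticks`, and the fixed seed are assumptions made for illustration.

```python
import random


def stick_breaking_weights(alpha, num_sticks, seed=0):
    """Draw the first `num_sticks` stick-breaking weights of a DP prior.

    Hypothetical helper for illustration: v_i ~ Beta(1, alpha) and
    pi_i = v_i * prod_{j < i} (1 - v_j).
    Returns the weights and the stick mass left for all later components.
    """
    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # length of the stick not yet assigned
    for _ in range(num_sticks):
        v = rng.betavariate(1.0, alpha)  # fraction broken off this step
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights, remaining


weights, leftover = stick_breaking_weights(alpha=1.0, num_sticks=10)
```

The `leftover` value is the total prior mass of all components beyond the first `num_sticks`. It is only loosely related to the truncation the abstract describes, which concerns the variational distributions rather than samples from the prior.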
Disciplines :
Computer science
Identifiers :
UNILU:UL-ARTICLE-2011-711
Author, co-author :
Kurihara, Kenichi
Welling, Max
VLASSIS, Nikos ; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB)
Language :
English
Title :
Accelerated Variational Dirichlet Process Mixtures
Publication date :
2007
Event name :
Advances in Neural Information Processing Systems 19
Event date :
2007
Main work title :
Advances in Neural Information Processing Systems 19