References of "Verbeek, Jakob J"
Accelerated EM-based clustering of large data sets
Verbeek, Jakob J.; Nunnink, Jan R. J.; Vlassis, Nikos

in Data Mining & Knowledge Discovery (2006), 13(3), 291-307

Motivated by the poor performance (linear complexity) of the EM algorithm in clustering large data sets, and inspired by the successful accelerated versions of related algorithms like k-means, we derive an accelerated variant of the EM algorithm for Gaussian mixtures that: (1) offers speedups that are at least linear in the number of data points, (2) ensures convergence by strictly increasing a lower bound on the data log-likelihood in each learning step, and (3) allows ample freedom in the design of other accelerated variants. We also derive a similar accelerated algorithm for greedy mixture learning, where very satisfactory results are obtained. The core idea is to define a lower bound on the data log-likelihood based on a grouping of data points. The bound is maximized by computing in turn (i) optimal assignments of groups of data points to the mixture components, and (ii) optimal re-estimation of the model parameters based on average sufficient statistics computed over groups of data points. The proposed method naturally generalizes to mixtures of other members of the exponential family. Experimental results show the potential of the proposed method over other state-of-the-art acceleration techniques.
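The grouping idea can be illustrated with a short NumPy sketch. This is not the paper's exact algorithm: it assumes a fixed partition into contiguous chunks, spherical (isotropic) covariances, and soft group-to-component responsibilities; the function name `chunked_em` and all parameter choices are illustrative.

```python
import numpy as np

def chunked_em(X, k, n_groups=20, iters=30, seed=0):
    """EM for an isotropic Gaussian mixture where the E-step is done once
    per *group* of points, using cached group sufficient statistics,
    instead of once per point."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    groups = np.array_split(X, n_groups)
    cnt = np.array([len(g) for g in groups], float)       # group sizes |A|
    gm = np.array([g.mean(0) for g in groups])            # group means
    # within-group spread: E_A[||x||^2] - ||mean_A||^2
    spread = np.array([(g ** 2).sum(1).mean() for g in groups]) - (gm ** 2).sum(1)
    mu = X[rng.choice(n, k, replace=False)]               # initial means
    var = np.full(k, X.var())                             # per-coordinate variances
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: E_A[||x - mu_j||^2] = ||mean_A - mu_j||^2 + spread_A
        sq = ((gm[:, None, :] - mu[None, :, :]) ** 2).sum(-1) + spread[:, None]
        logp = (np.log(pi) - 0.5 * d * np.log(2 * np.pi * var))[None, :] \
               - 0.5 * sq / var[None, :]
        r = np.exp(logp - logp.max(1, keepdims=True))
        r /= r.sum(1, keepdims=True)                      # group responsibilities
        # M-step from average group sufficient statistics
        w = r * cnt[:, None]                              # effective point counts
        Nj = w.sum(0)
        pi = Nj / n
        mu = (w.T @ gm) / Nj[:, None]
        sq = ((gm[:, None, :] - mu[None, :, :]) ** 2).sum(-1) + spread[:, None]
        var = (w * sq).sum(0) / (d * Nj)
    return pi, mu, var
```

Because each iteration touches only the `n_groups` cached statistics rather than all `n` points, the per-iteration cost is independent of the data set size once the statistics are built.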

Gaussian fields for semi-supervised regression and correspondence learning
Verbeek, Jakob J.; Vlassis, Nikos

in Pattern Recognition (2006), 39(10), 1864-1875

Gaussian fields (GF) have recently received considerable attention for dimension reduction and semi-supervised classification. In this paper we show how the GF framework can be used for semi-supervised regression on high-dimensional data. We propose an active learning strategy based on entropy minimization and a maximum likelihood model selection method. Furthermore, we show how a recent generalization of the LLE algorithm for correspondence learning can be cast into the GF framework, which obviates the need to choose a representation dimensionality.
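The Gaussian-field machinery underlying such methods can be sketched as the harmonic solution on a similarity graph: unlabeled predictions minimize the field energy given the labeled values. This is only a minimal illustration of the framework, not the paper's method; it uses a fully connected RBF graph with a hand-picked bandwidth `sigma`, and omits the active-learning and model-selection components the abstract describes.

```python
import numpy as np

def gaussian_field_regression(X, y_lab, lab_idx, sigma=1.0):
    """Predict real values at unlabeled points as the harmonic
    (minimum-energy) solution of a Gaussian field on an RBF graph."""
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))      # pairwise similarities
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(1)) - W               # combinatorial graph Laplacian
    unl = np.setdiff1d(np.arange(n), lab_idx)
    # harmonic condition on unlabeled nodes: L_uu f_u = W_ul y_l
    f_u = np.linalg.solve(L[np.ix_(unl, unl)],
                          W[np.ix_(unl, lab_idx)] @ y_lab)
    f = np.empty(n)
    f[lab_idx] = y_lab
    f[unl] = f_u
    return f
```

Each unlabeled prediction is a weighted average of its neighbors' values, so predictions always lie between the smallest and largest labeled value (the maximum principle).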

The global k-means clustering algorithm
Likas, Aristidis; Vlassis, Nikos; Verbeek, Jakob J.

in Pattern Recognition (2003), 36(2), 451-461

We present the global k-means algorithm, an incremental approach to clustering that dynamically adds one cluster center at a time through a deterministic global search procedure consisting of N executions of the k-means algorithm from suitable initial positions, where N is the size of the data set. We also propose modifications of the method to reduce the computational load without significantly affecting solution quality. The proposed clustering methods are tested on well-known data sets and they compare favorably to the k-means algorithm with random restarts.
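The incremental search can be written out compactly in NumPy. The sketch below follows the abstract's description (to add the m-th center, run k-means once from each of the N data points as the candidate initial position, keeping the previous m-1 centers fixed as starting values), but omits the computational-load reductions the paper also proposes; the helper names are illustrative.

```python
import numpy as np

def kmeans(X, centers, iters=100):
    """Standard k-means (Lloyd's algorithm) from given initial centers;
    returns the final centers and the clustering error."""
    centers = centers.astype(float).copy()
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        new = np.array([X[labels == j].mean(0) if (labels == j).any()
                        else centers[j] for j in range(len(centers))])
        if np.allclose(new, centers):
            break
        centers = new
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return centers, d.min(1).sum()

def global_kmeans(X, k):
    """Add one center at a time; for each new center, try every data
    point as its initial position and keep the best resulting solution."""
    centers = X.mean(0, keepdims=True)        # optimal 1-means solution
    for _ in range(2, k + 1):
        best = None
        for x in X:                           # N deterministic restarts
            cand, err = kmeans(X, np.vstack([centers, x]))
            if best is None or err < best[1]:
                best = (cand, err)
        centers = best[0]
    return centers
```

The search is deterministic: no random restarts are needed, at the cost of N k-means runs per added center.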

Coordinating Principal Component Analyzers
Verbeek, Jakob J.; Vlassis, Nikos; Kröse, Ben J. A.

in Proc. Int. Conf. on Artificial Neural Networks, Madrid, Spain (2002)

Mixtures of Principal Component Analyzers can be used to model high-dimensional data that lie on or near a low-dimensional manifold. By linearly mapping the PCA subspaces to one global low-dimensional space, we obtain a 'global' low-dimensional coordinate system for the data. As shown by Roweis et al., ensuring consistent global low-dimensional coordinates for the data can be expressed as a penalized likelihood optimization problem. We show that a restricted form of the Mixtures of Probabilistic PCA model allows for a more efficient algorithm. Experimental results are provided to illustrate the viability of the method.
