A Non-MCMC Procedure for Fitting Dirichlet Process Mixture Models
Date
2012-07-18
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
ORCID
Type
Degree Level
Masters
Abstract
Cluster analysis is concerned with partitioning cases into clusters such that the cases in a cluster are similar in terms of a set of variables. The K-Means Clustering Algorithm is a popular clustering method. It finds k clusters by choosing k data points as initial cluster centroids. Each data point is then assigned to the cluster with center that is closest to that point. In K-Means, the number of clusters has to be supplied in advance, which may be difficult in practice. A new method, the X-Means Clustering Algorithm, was proposed to solve this problem, which starts with an initial partition, then recursively runs a local K-Means in each cluster to split it until a lower Bayesian Information Criterion (BIC) value is reached compared with the previous larger cluster. However, this method would introduce a more severe local mode problem, that is, the previous inappropriate partition of cases cannot be corrected in the following local splitting. In this work, we develop a new algorithm that is based on Bayesian Dirichlet process mixture models, called the Non-MCMC DPM clustering algorithm. In the new clustering algorithm, we run the EM algorithm with all the cases to find a tentative partition, and then decide whether to accept the new partion with a criterion called DPC. We have tested our new clustering algorithm based on several simulated data sets, and found that it performs better than X-Means. We have also applied the algorithm to a real micorarray sample data set for predicting the class label (cancer or normal) based on the clustering results found by our new algorithm, and found that the prediction performance is comparable to state-of-the-art methods.
Description
Keywords
Cluster Analysis, Non-MCMC
Citation
Degree
Master of Science (M.Sc.)
Department
Mathematics and Statistics
Program
Mathematics