Show simple item record

dc.contributor.advisorLonghai, Lien_US
dc.creatorLi, Zhengrongen_US
dc.date.accessioned2013-01-03T22:31:43Z
dc.date.available2013-01-03T22:31:43Z
dc.date.created2012-06en_US
dc.date.issued2012-07-18en_US
dc.date.submittedJune 2012en_US
dc.identifier.urihttp://hdl.handle.net/10388/ETD-2012-06-492en_US
dc.description.abstractCluster analysis is concerned with partitioning cases into clusters such that the cases in a cluster are similar in terms of a set of variables. The K-Means Clustering Algorithm is a popular clustering method. It finds k clusters by choosing k data points as initial cluster centroids. Each data point is then assigned to the cluster with center that is closest to that point. In K-Means, the number of clusters has to be supplied in advance, which may be difficult in practice. A new method, the X-Means Clustering Algorithm, was proposed to solve this problem, which starts with an initial partition, then recursively runs a local K-Means in each cluster to split it until a lower Bayesian Information Criterion (BIC) value is reached compared with the previous larger cluster. However, this method would introduce a more severe local mode problem, that is, the previous inappropriate partition of cases cannot be corrected in the following local splitting. In this work, we develop a new algorithm that is based on Bayesian Dirichlet process mixture models, called the Non-MCMC DPM clustering algorithm. In the new clustering algorithm, we run the EM algorithm with all the cases to find a tentative partition, and then decide whether to accept the new partion with a criterion called DPC. We have tested our new clustering algorithm based on several simulated data sets, and found that it performs better than X-Means. We have also applied the algorithm to a real micorarray sample data set for predicting the class label (cancer or normal) based on the clustering results found by our new algorithm, and found that the prediction performance is comparable to state-of-the-art methods.en_US
dc.language.isoengen_US
dc.subjectCluster Analysis, Non-MCMCen_US
dc.titleA Non-MCMC Procedure for Fitting Dirichlet Process Mixture Modelsen_US
thesis.degree.departmentMathematics and Statisticsen_US
thesis.degree.disciplineMathematicsen_US
thesis.degree.grantorUniversity of Saskatchewanen_US
thesis.degree.levelMastersen_US
thesis.degree.nameMaster of Science (M.Sc.)en_US
dc.type.materialtexten_US
dc.type.genreThesisen_US
dc.contributor.committeeMemberSoteros, Chrisen_US
dc.contributor.committeeMemberLaverty, William H.en_US
dc.contributor.committeeMemberMcQuillan, Ianen_US


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record