Abstract
We propose a new model for cluster analysis in a Bayesian nonparametric framework. Our model combines two ingredients: species sampling mixture models of Gaussian distributions on the one hand, and a deterministic clustering procedure (DBSCAN) on the other. Two observations from the underlying species sampling mixture model share the same cluster if the distance between the densities corresponding to their latent parameters is smaller than a threshold; this yields a random partition that is coarser than the one induced by the species sampling mixture. Since the resulting partition depends on the value of the threshold, we suggest a strategy for fixing it. In addition, we discuss implementation and applications of the model, and we compare it with more standard clustering algorithms. Supplementary materials for the article are available online.
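The thresholding step described in the abstract can be illustrated with a minimal sketch. It rests on three assumptions that the abstract does not state. First, each observation carries a univariate Gaussian latent parameter `(mu, sigma)`. Second, the distance between densities is the Hellinger distance. Third, DBSCAN runs with every point treated as a core point (MinPts = 1), so clusters become the connected components of the graph that links two observations whenever their distance is below the threshold `eps`. The function names `hellinger_gauss` and `density_clusters` are hypothetical. This is not the authors' implementation.

```python
import math
from typing import List, Tuple

Param = Tuple[float, float]  # (mu, sigma) of a univariate Gaussian; an assumption

def hellinger_gauss(p: Param, q: Param) -> float:
    """Closed-form Hellinger distance between N(mu1, s1^2) and N(mu2, s2^2)."""
    (m1, s1), (m2, s2) = p, q
    v = s1 * s1 + s2 * s2
    # Bhattacharyya coefficient of two univariate Gaussians
    bc = math.sqrt(2.0 * s1 * s2 / v) * math.exp(-((m1 - m2) ** 2) / (4.0 * v))
    return math.sqrt(max(0.0, 1.0 - bc))

def density_clusters(params: List[Param], eps: float) -> List[int]:
    """Link observations whose densities are closer than eps and return
    connected-component labels (DBSCAN with MinPts = 1, assumed)."""
    n = len(params)
    parent = list(range(n))

    def find(i: int) -> int:
        # union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if hellinger_gauss(params[i], params[j]) < eps:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    # relabel roots as 0, 1, 2, ... in order of first appearance
    seen: dict = {}
    labels: List[int] = []
    for i in range(n):
        r = find(i)
        if r not in seen:
            seen[r] = len(seen)
        labels.append(seen[r])
    return labels

# Five observations with four distinct latent parameters (made-up values)
params = [(0.0, 1.0), (0.0, 1.0), (0.3, 1.0), (5.0, 1.0), (5.2, 1.1)]
print(density_clusters(params, eps=0.2))
```

In this toy input the mixture itself induces four groups, one per distinct latent parameter. The thresholded partition merges nearby densities into two clusters, `[0, 0, 0, 1, 1]`, which illustrates why the resulting partition is coarser. Because any positive `eps` places observations with identical parameters together, the coarsening holds for every threshold. Only the number of merges depends on `eps`.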
| Original language | English |
|---|---|
| Pages (from-to) | 1126-1142 |
| Number of pages | 17 |
| Journal | Journal of Computational and Graphical Statistics |
| Volume | 23 |
| Issue | 4 |
| DOI | |
| Publication status | Published - 2014 |
All Science Journal Classification (ASJC) codes
- Statistics and Probability
- Discrete Mathematics and Combinatorics
- Statistics, Probability and Uncertainty
Keywords
- Bayesian nonparametrics
- DBSCAN algorithm
- Dirichlet process