Hierarchical Normalized Completely Random Measures to Cluster Grouped Data

  • Raffaele Argiento
  • A. Cremaschi*
  • M. Vannucci
  • *Corresponding author
  • University of Oslo
  • Department of Bioengineering

Research output: Contribution to journal (Article)

Abstract

In this article, we propose a Bayesian nonparametric model for clustering grouped data. We adopt a hierarchical approach: at the highest level, each group of data is modeled according to a mixture, where the mixing distributions are conditionally independent normalized completely random measures (NormCRMs) centered on the same base measure, which is itself a NormCRM. The discreteness of the shared base measure implies that the processes at the data level share the same atoms. This desirable feature allows observations from different groups to be clustered together. We obtain a representation of the hierarchical clustering model by marginalizing with respect to the infinite-dimensional NormCRMs. We investigate the properties of the clustering structure induced by the proposed model and provide theoretical results concerning the distribution of the number of clusters, within and between groups. Furthermore, we offer an interpretation in terms of a generalized Chinese restaurant franchise process, which allows for posterior inference under both conjugate and nonconjugate models. We develop algorithms for fully Bayesian inference and assess their performance by means of a simulation study and a real-data illustration. Supplementary materials for this article are available online.
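
As a reading aid, the hierarchy described in the abstract can be sketched as follows. The notation is assumed here for illustration and is not taken from the article: d groups with n_j observations Y_{ji} each, a mixture kernel f, latent parameters \theta_{ji}, Lévy intensities \rho and \rho_0, and a diffuse base measure P_0.

```latex
% Illustrative notation only (assumed, not the article's own); requires amsmath.
\begin{align*}
Y_{ji} \mid \theta_{ji} &\overset{\text{ind}}{\sim} f(\cdot \mid \theta_{ji}), & i &= 1,\dots,n_j,\; j = 1,\dots,d,\\
\theta_{ji} \mid P_j &\overset{\text{iid}}{\sim} P_j, \\
P_1,\dots,P_d \mid P &\overset{\text{iid}}{\sim} \mathrm{NormCRM}(\rho, P),\\
P &\sim \mathrm{NormCRM}(\rho_0, P_0).
\end{align*}
```

Because P is almost surely discrete, the group-level measures P_1, ..., P_d place their mass on the same atoms, so latent parameters from different groups can coincide with positive probability; this is what lets clusters be shared across groups. When both levels are normalized gamma processes, that is, Dirichlet processes, the sketch reduces to the hierarchical Dirichlet process.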
Original language: English
Pages (from-to): 1-26
Number of pages: 26
Journal: Journal of the American Statistical Association
Issue number: NA
Publication status: Published - 2019

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Keywords

  • Bayesian nonparametrics
  • Clustering
  • Hierarchical models
  • Mixture models
