Model-Based Clustering of Categorical Data Based on the Hamming Distance

  • R. Argiento
  • E. Filippi-Mazzola
  • Lucia Paci*
  • *Corresponding author for this work

Research output: Contribution to journal, Article, peer-reviewed

Abstract

A model-based approach is developed for clustering categorical data with no natural ordering. The proposed method exploits the Hamming distance to define a family of probability mass functions to model the data. The elements of this family are then considered as kernels of a finite mixture model with an unknown number of components. Conjugate Bayesian inference has been derived for the parameters of the Hamming distribution model. The mixture is framed in a Bayesian nonparametric setting, and a transdimensional blocked Gibbs sampler is developed to provide full Bayesian inference on the number of clusters, their structure, and the group-specific parameters, facilitating the computation with respect to customary reversible jump algorithms. The proposed model encompasses a parsimonious latent class model as a special case when the number of components is fixed. Model performances are assessed via a simulation study and reference datasets, showing improvements in clustering recovery over existing approaches. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.
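
As an illustration of how a Hamming-distance kernel can define a probability mass function over unordered categorical vectors, the following minimal sketch may help. It assumes the mass decays exponentially with the Hamming distance from a center vector, governed by a single scale parameter; the names `hamming_distance`, `hamming_pmf`, `center`, `sigma`, and `n_categories` are hypothetical, and this form is an assumption for illustration, not necessarily the parameterization used in the article.

```python
import math
from itertools import product


def hamming_distance(x, c):
    """Number of positions at which two equal-length sequences differ."""
    if len(x) != len(c):
        raise ValueError("x and c must have the same length")
    return sum(a != b for a, b in zip(x, c))


def hamming_pmf(x, center, sigma, n_categories):
    """Assumed kernel: p(x) proportional to exp(-d_H(x, center) / sigma).

    n_categories[j] is the number of levels of attribute j. Because the
    kernel factorizes over attributes, the normalizing constant is the
    product over j of 1 + (m_j - 1) * exp(-1 / sigma).
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    w = math.exp(-1.0 / sigma)  # weight paid for each mismatching attribute
    d = hamming_distance(x, center)
    norm = math.prod(1.0 + (m - 1) * w for m in n_categories)
    return w ** d / norm


# Toy setting (illustrative values): three attributes with 2, 3 and 4 levels.
n_categories = (2, 3, 4)
center = (0, 1, 2)
sigma = 0.5

# Enumerate the whole sample space to check that the masses sum to one.
support = list(product(*(range(m) for m in n_categories)))
total_mass = sum(hamming_pmf(x, center, sigma, n_categories) for x in support)
```

Under this assumed form, the center is the mode of the distribution, and a smaller `sigma` concentrates more mass on vectors close to it. In a mixture, each component would carry its own center and scale.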
Original language: English
Pages (from-to): 1178-1188
Number of pages: 11
Journal: Journal of the American Statistical Association
Volume: 120
Issue number: 550
DOI
Publication status: Published - 2024

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Keywords

  • Bayesian clustering
  • Conditional algorithm
  • Dirichlet process
  • Finite mixture models
  • Markov chain Monte Carlo
