Clustering categorical data via Hamming distance

Lucia Paci, Raffaele Argiento, Edoardo Filippi-Mazzola

Research output: Contribution to book, Conference contribution

Abstract

Clustering methods have typically found their application when dealing with continuous data. However, in many modern applications data consist of multiple categorical variables with no natural ordering. In the heuristic framework the problem of clustering these data is tackled by introducing suitable distances. In this work, we develop a model-based approach for clustering categorical data with nominal scale. Our approach is based on a mixture of distributions defined via the Hamming distance between categorical vectors. Maximum likelihood inference is delivered through an expectation-maximization algorithm. A simulation study is carried out to illustrate the proposed approach.
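
The Hamming distance mentioned in the abstract counts the positions at which two categorical vectors differ. The following is a minimal Python sketch of that distance only, not the authors' mixture model or their expectation-maximization algorithm. The names `hamming_distance` and `nearest_center` are hypothetical, and the hard nearest-center assignment is an illustrative heuristic assumed here for demonstration.

```python
from typing import Hashable, Sequence


def hamming_distance(x: Sequence[Hashable], y: Sequence[Hashable]) -> int:
    """Count the positions at which two categorical vectors differ."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(a != b for a, b in zip(x, y))


def nearest_center(x: Sequence[Hashable],
                   centers: Sequence[Sequence[Hashable]]) -> int:
    """Return the index of the center closest to x in Hamming distance.

    Ties are resolved in favour of the first center. This is a heuristic
    hard assignment, assumed for illustration only.
    """
    distances = [hamming_distance(x, c) for c in centers]
    return distances.index(min(distances))


if __name__ == "__main__":
    # Nominal categories have no ordering; only equality matters.
    observation = ["red", "small", "round"]
    centers = [["red", "large", "round"], ["blue", "large", "square"]]
    print(hamming_distance(observation, centers[0]))
    print(nearest_center(observation, centers))
```

Because the distance relies only on equality between categories, it does not require the variables to have any natural ordering, which matches the nominal setting described above.
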
Original language: English
Host publication title: Book of short papers SIS 2021
Pages: 752-757
Number of pages: 6
Publication status: Published - 2021
Event: SIS 2021 - Pisa
Duration: 21 Jun 2021 - 25 Jun 2021

Conference

Conference: SIS 2021
City: Pisa
Period: 21/6/21 - 25/6/21

Keywords

  • Expectation-Maximization algorithm
  • nominal data
  • mixture modeling
  • Hamming distribution
