Clustering categorical data via Hamming distance

Edoardo Filippi-Mazzola, Raffaele Argiento, Lucia Paci

Research output: Contribution to book, Conference contribution


Clustering methods have typically found their application when dealing with continuous data. However, in many modern applications data consist of multiple categorical variables with no natural ordering. In the heuristic framework the problem of clustering these data is tackled by introducing suitable distances. In this work, we develop a model-based approach for clustering categorical data with nominal scale. Our approach is based on a mixture of distributions defined via the Hamming distance between categorical vectors. Maximum likelihood inference is delivered through an expectation-maximization algorithm. A simulation study is carried out to illustrate the proposed approach.
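
The abstract does not state the exact form of the component distribution, so the following is only a minimal sketch under stated assumptions: each mixture component is taken to have a center vector and a single scale `sigma`, with probability proportional to `exp(-d_H(x, center) / sigma)`, where `d_H` is the Hamming distance. The function names, the shared scale per component, and the `n_levels` argument are illustrative choices, not the authors' parameterization. The sketch shows the distance, the normalized log-probability, and the E-step responsibilities of an expectation-maximization iteration.

```python
import math


def hamming(x, c):
    """Number of positions at which two equal-length categorical vectors differ."""
    if len(x) != len(c):
        raise ValueError("vectors must have the same length")
    return sum(a != b for a, b in zip(x, c))


def hamming_log_pmf(x, center, sigma, n_levels):
    """Log-probability of x under the assumed kernel exp(-d_H(x, center) / sigma).

    n_levels[j] is the number of categories of variable j (an assumption of
    this sketch). Because the kernel factorizes over variables, so does the
    normalizing constant: each variable contributes
    1 + (m_j - 1) * exp(-1 / sigma).
    """
    log_norm = sum(
        math.log(1.0 + (m - 1) * math.exp(-1.0 / sigma)) for m in n_levels
    )
    return -hamming(x, center) / sigma - log_norm


def responsibilities(x, weights, centers, sigmas, n_levels):
    """E-step for one observation: posterior probability of each component."""
    log_terms = [
        math.log(w) + hamming_log_pmf(x, c, s, n_levels)
        for w, c, s in zip(weights, centers, sigmas)
    ]
    top = max(log_terms)  # subtract the maximum for numerical stability
    unnorm = [math.exp(t - top) for t in log_terms]
    total = sum(unnorm)
    return [u / total for u in unnorm]
```

The M-step (updating weights, centers, and scales from these responsibilities) is omitted here because its form depends on the model details given in the paper rather than in this record.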
Original language: English
Title of host publication: Book of short papers SIS 2021
Number of pages: 6
Publication status: Published - 2021
Event: SIS 2021 - Pisa
Duration: 21 Jun 2021 to 25 Jun 2021


Conference: SIS 2021


  • Expectation-Maximization algorithm
  • Hamming distribution
  • mixture modeling
  • nominal data

