Variational inference for semiparametric Bayesian novelty detection in large datasets

Luca Benedetti, Eric Boniardi, Leonardo Chiani, Jacopo Ghirri, Marta Mastropietro, Andrea Cappozzo, Francesco Denti*

*Corresponding author for this work

Research output: Contribution to journal, article, peer-reviewed

Abstract

After being trained on a fully-labeled training set, where the observations are grouped into a certain number of known classes, novelty detection methods aim to classify the instances of an unlabeled test set while allowing for the presence of previously unseen classes. These models are valuable in many areas, ranging from social network and food adulteration analyses to biology, where an evolving population may be present. In this paper, we focus on a two-stage Bayesian semiparametric novelty detector, also known as Brand, recently introduced in the literature. Leveraging on a model-based mixture representation, Brand allows clustering the test observations into known training terms or a single novelty term. Furthermore, the novelty term is modeled with a Dirichlet Process mixture model to flexibly capture any departure from the known patterns. Brand was originally estimated using MCMC schemes, which are prohibitively costly when applied to high-dimensional data. To scale up Brand applicability to large datasets, we propose to resort to a variational Bayes approach, providing an efficient algorithm for posterior approximation. We demonstrate a significant gain in efficiency and excellent classification performance with thorough simulation studies. Finally, to showcase its applicability, we perform a novelty detection analysis using the openly available Statlog dataset, a large collection of satellite imaging spectra, to search for novel soil types.
Original language: English
Pages (from-to): 1-23
Number of pages: 23
Journal: Advances in Data Analysis and Classification
Issue number: 18
DOI
Publication status: Published - 2023

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Applied Mathematics

Keywords

  • Bayesian modeling
  • Dirichlet process
  • Large datasets
  • Nested mixtures
  • Novelty detection
  • Variational inference
