Optimal subset selection without outliers

Laura Deldossi, E. Pesce, C. Tommasi

Research output: Chapter in book › Conference contribution

Abstract

With the advent of ‘Big Data’, massive data sets are becoming increasingly prevalent. Several subdata selection methods have been proposed in recent years, both to reduce the computational burden and to improve cost effectiveness and learning of the phenomenon. Some of these proposals (Drovandi et al., 2017; Wang et al., 2019; Deldossi and Tommasi, 2021; among others) are inspired by Optimal Experimental Design (OED). However, unlike the OED context - where researchers typically have complete control over the predictors - in subsampling methods both the predictors and the responses are passively observed. Thus, if outliers are present in the ‘Big Data’, they are likely to be included in the sample selected by the D-criterion, since D-optimal design points lie on the boundary of the design space. In regression analysis, outliers - and, more generally, influential points - can have a large impact on the estimates; identifying and excluding them in advance, especially in large datasets, is generally not an easy task. In this study, we propose an exchange procedure to select a compromise-optimal subset which is informative for the inferential goal and avoids outliers and ‘bad’ influential points.
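To fix ideas, the D-criterion-based subsampling that the abstract contrasts against can be sketched as a simple random-exchange search: starting from a random subset, repeatedly swap a selected point for a candidate whenever the swap increases det(XsᵀXs). This is a minimal illustrative sketch, not the authors' compromise procedure; the function name `d_optimal_exchange` and all parameters are hypothetical, and the outlier-avoiding penalty of the paper is deliberately omitted.

```python
import numpy as np

def d_optimal_exchange(X, n, n_iter=500, seed=0):
    """Plain D-criterion exchange (hypothetical sketch): pick n rows of X
    that (locally) maximise log det(Xs' Xs) by random single-point swaps.
    Note: without a robustness penalty this tends to pick boundary points,
    which is exactly why outliers get selected, as the abstract argues."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    idx = rng.choice(N, size=n, replace=False)

    def logdet(ids):
        # log-determinant of the information matrix of the subsample
        sign, val = np.linalg.slogdet(X[ids].T @ X[ids])
        return val if sign > 0 else -np.inf

    best = logdet(idx)
    for _ in range(n_iter):
        i = rng.integers(n)   # position in the subset to drop
        j = rng.integers(N)   # candidate row to bring in
        if j in idx:
            continue
        trial = idx.copy()
        trial[i] = j
        val = logdet(trial)
        if val > best:        # accept only improving swaps
            idx, best = trial, val
    return idx
```

The compromise criterion proposed in the abstract would modify the acceptance step so that swaps introducing outliers or ‘bad’ influential points are rejected even when they increase the determinant.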
Original language: English
Host publication title: Programme and Abstracts, 22nd Annual ENBIS Conference, Trondheim, 26-30 June 2022
Pages: 34-35
Number of pages: 2
Volume: 2022
Publication status: Published - 2022
Event: ENBIS Conference - Trondheim
Duration: 26 Jun 2022 - 30 Jun 2022


Keywords

  • active learning
  • data thinning
  • subsampling
