Abstract
With the advent of ‘Big Data’, massive data sets are becoming increasingly prevalent. Several subdata selection methods have been proposed in recent years, both to reduce the computational burden and to improve cost effectiveness and learning of the phenomenon. Some of these proposals (Drovandi et al., 2017; Wang et al., 2019; Deldossi and Tommasi, 2021, among others) are inspired by Optimal Experimental Design (OED). However, unlike the OED context, where researchers typically have complete control over the predictors, in subsampling methods both the predictors and the responses are passively observed. Thus, if outliers are present in the ‘Big Data’, they are likely to be included in the sample selected by applying the D-criterion, since D-optimal design points lie on the boundary of the design space.
In regression analysis, outliers, and more generally influential points, can have a large impact on the estimates; identifying and excluding them in advance, especially in large datasets, is generally not an easy task. In this study, we propose an exchange procedure to select a compromise-optimal subset which is informative for the inferential goal while avoiding outliers and ‘bad’ influential points.
Original language | English |
---|---|
Host publication title | Programme and Abstracts, 22nd Annual ENBIS Conference, Trondheim, 26-30 June 2022 |
Pages | 34-35 |
Number of pages | 2 |
Volume | 2022 |
Publication status | Published - 2022 |
Event | ENBIS Conference - TRONDHEIM; Duration: 26 Jun 2022 → 30 Jun 2022 |
Conference
Conference | ENBIS Conference |
---|---|
City | TRONDHEIM |
Period | 26/6/22 → 30/6/22 |
Keywords
- active learning
- data thinning
- subsampling