Accounting for outliers in optimal subsampling methods

  • Laura Deldossi*
  • Elena Pesce
  • Chiara Tommasi

*Corresponding author for this work

Research output: Contribution to journal › Article › peer review

Abstract

Nowadays, in many different fields, massive data are available and for several reasons, it might be convenient to analyze just a subset of the data. The application of the D-optimality criterion can be helpful to optimally select a subsample of observations. However, it is well known that D-optimal support points lie on the boundary of the design space and if they go hand in hand with extreme response values, they can have a severe influence on the estimated linear model (leverage points with high influence). To overcome this problem, firstly, we propose a non-informative “exchange” procedure that enables us to select a “nearly” D-optimal subset of observations without high leverage values. Then, we provide an informative version of this exchange procedure, where besides high leverage points also the outliers in the responses (which are not necessarily associated with high leverage points) are avoided. This is possible because, unlike other design situations, in subsampling from big datasets the response values may be available. Finally, both the non-informative and informative selection procedures are adapted to I-optimality, with the goal of getting accurate predictions.
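The non-informative exchange idea described in the abstract can be illustrated with a minimal sketch: greedily swap observations in and out of a candidate subsample to increase the D-criterion det(XsᵀXs), rejecting any swap that would introduce a point with high leverage. The function name, the random swap scheme, and the `leverage_cap` threshold below are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def d_optimal_subsample(X, k, leverage_cap=0.5, n_iter=200, seed=0):
    """Toy exchange procedure: pick k rows of X that nearly maximise
    log det(Xs' Xs), rejecting exchanges that create a high-leverage point.
    Illustrative sketch only; details differ from the paper's procedure."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.choice(n, size=k, replace=False)

    def logdet(ids):
        sign, ld = np.linalg.slogdet(X[ids].T @ X[ids])
        return ld if sign > 0 else -np.inf

    best = logdet(idx)
    for _ in range(n_iter):
        out = rng.integers(k)       # position in the subsample to drop
        cand = rng.integers(n)      # candidate row from the full data
        if cand in idx:
            continue
        trial = idx.copy()
        trial[out] = cand
        # leverage of the exchanged point within the trial subsample
        Xs = X[trial]
        H = Xs @ np.linalg.pinv(Xs.T @ Xs) @ Xs.T
        if H[out, out] > leverage_cap:
            continue                # skip exchanges yielding high leverage
        val = logdet(trial)
        if val > best:              # accept only improving exchanges
            idx, best = trial, val
    return np.sort(idx)
```

The informative variant the abstract mentions would additionally screen candidate rows by their response values (e.g. by a residual-based outlier check), which is possible in big-data subsampling because the responses are already observed.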
Original language: English
Pages (from-to): 1119-1135
Journal: Statistical Papers
Volume: 64
DOI
Publication status: Published - 2023

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Keywords

  • D-optimality
  • I-optimality
  • Active learning
  • Subsampling
