Abstract
Big Data are generally huge quantities of digital information accrued
automatically and/or merged from several sources and rarely result from
properly planned population surveys. A Big Dataset is herein conceived as
a collection of information concerning a finite population. Since the
analysis of an entire Big Dataset can require enormous computational effort,
we suggest selecting a sample of observations and using this sampling
information to achieve the inferential goal. Instead of the design-based
survey sampling approach (which relates to the estimation of summary
finite population measures, such as means, totals, proportions) we
consider the model-based sampling approach, which involves inference about
parameters of a super-population model. This model is assumed to have
generated the finite population values, i.e. the Big Dataset. Given a
super-population model we can apply the theory of optimal design to
draw a sample from the Big Dataset which contains the majority of
information about the unknown parameters of interest. In addition, since a
Big Dataset might provide poor information despite its size, from the
definition of efficiency of a design we suggest a device to measure the quality
of the Big Data.
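The abstract does not specify the super-population model, the optimality criterion or the selection algorithm, so the following is only a minimal sketch under stated assumptions: a simple linear regression model, the D-optimality criterion, and a greedy sequential selection rule. The function names `greedy_d_optimal` and `d_efficiency`, the `ridge` starting value and the simulated data are all hypothetical and serve only to illustrate the general idea of drawing an informative subsample and comparing designs through their D-efficiency.

```python
import numpy as np

def greedy_d_optimal(X, n, ridge=1e-6):
    """Greedily pick n distinct rows of X that approximately maximise det(X_s' X_s).

    Assumption: linear model with design matrix X (N x p); this greedy rule is
    one illustrative heuristic, not the procedure of the original work.
    """
    N, p = X.shape
    M_inv = np.eye(p) / ridge            # inverse of the starting matrix ridge * I
    available = np.ones(N, dtype=bool)
    chosen = []
    for _ in range(n):
        # Variance function d(x) = x' M^{-1} x. By the matrix determinant lemma,
        # adding row x multiplies det(M) by 1 + d(x), so we take the largest d.
        d = np.einsum("ij,jk,ik->i", X, M_inv, X)
        d[~available] = -np.inf
        i = int(np.argmax(d))
        chosen.append(i)
        available[i] = False
        # Sherman-Morrison update of M^{-1} after adding x x' to M.
        v = M_inv @ X[i]
        M_inv -= np.outer(v, v) / (1.0 + X[i] @ v)
    return np.array(chosen)

def d_efficiency(X_a, X_b):
    """D-efficiency of design a relative to design b, using per-observation
    information matrices: (det M_a / det M_b) ** (1 / p)."""
    p = X_a.shape[1]
    _, logdet_a = np.linalg.slogdet(X_a.T @ X_a / len(X_a))
    _, logdet_b = np.linalg.slogdet(X_b.T @ X_b / len(X_b))
    return float(np.exp((logdet_a - logdet_b) / p))

# Simulated stand-in for a Big Dataset: intercept plus one covariate.
rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
X = np.column_stack([np.ones_like(x), x])

idx = greedy_d_optimal(X, 100)                        # informative subsample
rand_idx = rng.choice(len(X), size=100, replace=False)  # simple random subsample
eff = d_efficiency(X[rand_idx], X[idx])               # random vs. greedy design
```

Under these assumptions, `eff` falls below 1 when the random subsample carries less information per observation than the selected one. One possible reading of the quality measure mentioned above is to compute the same ratio for the whole dataset against a reference design, for example `d_efficiency(X, X[idx])`; this is an interpretation, not a definition taken from the original work.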
Original language | English |
---|---|
Title of host publication | Programme and Abstracts, 19th Annual ENBIS Conference, Budapest, 2-4 september 2019 |
Pages | 37 |
Number of pages | 1 |
Volume | 2019 |
Publication status | Published - 2019 |
Event | 19th Annual ENBIS Conference - Budapest (Hungary) Duration: 2 Sept 2019 → 4 Sept 2019 |
Conference
Conference | 19th Annual ENBIS Conference |
---|---|
City | Budapest (Hungary) |
Period | 2/9/19 → 4/9/19 |
Keywords
- Business and Industrial Statistics
- European Network