TY - JOUR
T1 - Estimation of spatial econometric linear models with large datasets: How big can spatial Big Data be?
AU - Arbia, Giuseppe
AU - Ghiringhelli, Chiara
AU - Mira, A.
PY - 2019
Y1 - 2019
N2 - Spatial econometrics is currently experiencing the Big Data revolution both in terms of the volume
of data and the velocity with which they are accumulated. Regional data, traditionally employed in
spatial econometric modeling, can be very large, with information increasingly available at very fine
resolution levels such as census tracts, local markets, town blocks, regular grids or other small
partitions of the territory. When dealing with spatial microeconometric models referring to the
granular observations of individual economic agents, the number of available observations can be much
higher. This paper reports the results of a systematic simulation study on the limits of current
methodologies when estimating spatial models with large datasets. In our study we simulate a Spatial
Lag Model (SLM), estimate it using Maximum Likelihood (ML), Two Stage Least Squares (2SLS) and a
Bayesian estimator (B), and test their performance for different sample sizes and different levels
of sparsity of the weight matrices. We consider three performance indicators, namely computing
time, storage required and accuracy of the estimators. The results show that, using standard computer
capabilities, the analysis becomes prohibitive and unreliable when the sample size is greater than 70,000,
even for low levels of sparsity. This result suggests that new approaches should be introduced to analyze
the big datasets that are quickly becoming the new standard in spatial econometrics.
KW - big spatial data
KW - computational issues
KW - spatial econometric models
KW - maximum likelihood
KW - Bayesian estimator
KW - dense matrix
KW - spatial two stage estimator
UR - http://hdl.handle.net/10807/132731
DO - 10.1016/j.regsciurbeco.2019.01.006
M3 - Article
SN - 0166-0462
VL - 2019
JO - Regional Science and Urban Economics
JF - Regional Science and Urban Economics
ER -