TY - JOUR
T1 - Bias Correction in Clustered Underreported Data
AU - De Oliveira, Guilherme Lopes
AU - Argiento, Raffaele
AU - Loschi, Rosangela Helena
AU - Assunção, Renato Martins
AU - Ruggeri, Fabrizio
AU - Branco, Márcia D’Elia
PY - 2020
Y1 - 2020
N2 - Data quality from poor and socially deprived regions has given rise
to many statistical challenges. One of them is the underreporting of vital events
leading to biased estimates for the associated risks. To deal with underreported
count data, models based on compound Poisson distributions have been commonly
assumed. To be identifiable, such models usually require extra and strong information
about the probability of reporting the event in all areas of interest, which
is not always available. We introduce a novel approach for the compound Poisson
model assuming that the areas are clustered according to their data quality. We
leverage these clusters to create a hierarchical structure in which the reporting
probabilities decrease as we move from the best group to the worst ones. We obtain
constraints for model identifiability and prove that only prior information about
the reporting probability in areas experiencing the best data quality is required.
Several approaches to model the uncertainty about the reporting probabilities are
presented, including reference priors. Different features regarding the proposed
methodology are studied through simulation. We apply our model to map the
early neonatal mortality risks in Minas Gerais, a Brazilian state that presents
heterogeneous characteristics and a relevant socio-economical inequality.
AB - Data quality from poor and socially deprived regions has given rise
to many statistical challenges. One of them is the underreporting of vital events
leading to biased estimates for the associated risks. To deal with underreported
count data, models based on compound Poisson distributions have been commonly
assumed. To be identifiable, such models usually require extra and strong information
about the probability of reporting the event in all areas of interest, which
is not always available. We introduce a novel approach for the compound Poisson
model assuming that the areas are clustered according to their data quality. We
leverage these clusters to create a hierarchical structure in which the reporting
probabilities decrease as we move from the best group to the worst ones. We obtain
constraints for model identifiability and prove that only prior information about
the reporting probability in areas experiencing the best data quality is required.
Several approaches to model the uncertainty about the reporting probabilities are
presented, including reference priors. Different features regarding the proposed
methodology are studied through simulation. We apply our model to map the
early neonatal mortality risks in Minas Gerais, a Brazilian state that presents
heterogeneous characteristics and a relevant socio-economical inequality.
KW - compound Poisson model, generalized Beta distribution, Jeffreys prior, model identifiability, neonatal mortality, underreporting
UR - http://hdl.handle.net/10807/163433
UR - https://projecteuclid.org/euclid.ba/1600999224
U2 - 10.1214/20-BA1244
DO - 10.1214/20-BA1244
M3 - Article
SN - 1936-0975
SP - 1
EP - 32
JO - Bayesian Analysis
JF - Bayesian Analysis
ER -