TY - UNPB
T1 - Optimizing Tax Administration Policies with Machine Learning
AU - Battiston, P.
AU - Gamba, Simona
AU - Santoro, Alessandro
PY - 2020
Y1 - 2020
N2 - Tax authorities around the world are increasingly employing data mining and machine learning algorithms to predict individual behaviours. Although the traditional literature on optimal tax administration provides useful tools for ex-post evaluation of policies, it disregards the problem of which taxpayers to target. This study identifies and characterises a loss function that assigns a social cost to any prediction-based policy. We define this measure as the difference between the social welfare of a given policy and that of an ideal policy unaffected by prediction errors. We show how this loss function shares a relationship with the receiver operating characteristic curve, a standard statistical tool used to evaluate prediction performance. Subsequently, we apply our measure to predict inaccurate tax returns issued by self-employed individuals and sole proprietorships in Italy. In our application, a random forest model provides the best prediction: we show how it can be interpreted using measures of variable importance developed in the machine learning literature.
KW - policy prediction problems
KW - tax behaviour
KW - big data
KW - machine learning
UR - http://hdl.handle.net/10807/153429
M3 - Working paper
BT - Optimizing Tax Administration Policies with Machine Learning
ER -