TY - JOUR
T1 - Learning vs earning trade-off with missing or censored observations: The two-armed Bayesian nonparametric beta-Stacy bandit problem
AU - Peluso, Stefano
AU - Mira, Antonietta
AU - Muliere, Pietro
PY - 2017
Y1 - 2017
N2 - Existing Bayesian nonparametric methodologies for bandit problems focus on exact observations, leaving a gap in those bandit applications where censored observations are crucial. We address this gap by extending a Bayesian nonparametric two-armed bandit problem to right-censored data, where each arm is generated from a beta-Stacy process as defined by Walker and Muliere (1997). We first show some properties of the expected advantage of choosing one arm over the other, namely the monotonicity in the arm response and, limited to the case of continuous state space, the continuity in the right-censored arm response. We partially characterize optimal strategies by proving the existence of stay-with-a-winner and stay-with-a-winner/switch-on-a-loser break-even points, under non-restrictive conditions that include the special cases of the simple homogeneous process and the Dirichlet process. Numerical estimations and simulations for a variety of discrete and continuous state space settings are presented to illustrate the performance and flexibility of our framework.
KW - Bandit Problem
KW - Bayesian Nonparametrics
KW - Beta-Stacy Process
UR - https://publicatt.unicatt.it/handle/10807/105662
UR - https://www.scopus.com/inward/citedby.uri?partnerID=HzOxMe3b&scp=85030866307&origin=inward
UR - https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85030866307&origin=inward
U2 - 10.1214/17-EJS1342
DO - 10.1214/17-EJS1342
M3 - Article
SN - 1935-7524
VL - 11
SP - 3368
EP - 3406
JO - Electronic Journal of Statistics
JF - Electronic Journal of Statistics
ER -