TY - JOUR
T1 - Discovering causal structures in Bayesian Gaussian directed acyclic graph models
AU - Castelletti, Federico
AU - Consonni, Guido
PY - 2020
Y1 - 2020
N2 - Causal directed acyclic graphs (DAGs) are naturally tailored to represent biological signalling pathways. However, a causal DAG is only identifiable up to Markov equivalence if only observational data are available. Interventional data, based on exogenous perturbations of the system, can greatly improve identifiability. Since the gain of an intervention crucially depends on the intervened variables, a natural issue is devising efficient strategies for optimal causal discovery. We present a Bayesian active learning procedure for Gaussian DAGs which requires no subjective specification on the side of the user, explicitly takes into account the uncertainty on the space of equivalence classes (through the posterior distribution) and sequentially proposes the choice of the optimal intervention variable. In simulation experiments our method, besides surpassing designs based on a random choice of intervention nodes, shows decisive improvements over currently available algorithms and is competitive with the best alternative benchmarks. An important reason behind this strong performance is that, unlike non-Bayesian algorithms, our utility function naturally incorporates graph estimation uncertainty through the posterior edge inclusion probability. We also reanalyse the Sachs data on protein signalling pathways from an active learning perspective and show that DAG identification can be achieved by using only a subset of the available intervention samples.
KW - Active learning
KW - Causal directed acyclic graph
KW - Essential graph
KW - Intervention
KW - Markov equivalence
KW - Objective Bayes method
UR - http://hdl.handle.net/10807/146678
UR - https://rss.onlinelibrary.wiley.com/doi/epdf/10.1111/rssa.12550
U2 - 10.1111/rssa.12550
DO - 10.1111/rssa.12550
M3 - Article
SN - 1467-985X
VL - 183
SP - 1727
EP - 1745
JO - Journal of the Royal Statistical Society. Series A, Statistics in Society
JF - Journal of the Royal Statistical Society. Series A, Statistics in Society
ER -