TY - JOUR
T1 - Bayesian inference of graph-based dependencies from mixed-type data
AU - Galimberti, Chiara
AU - Peluso, Stefano
AU - Castelletti, Federico
PY - 2024
Y1 - 2024
N2 - Mixed data comprise measurements of different types, with both categorical and continuous variables, and can be found in various areas, such as in life science or industrial processes. Inferring conditional independencies from the data is crucial to understand how these variables relate to each other. To this end, graphical models provide an effective framework, which adopts a graph-based representation of the joint distribution to encode such dependence relations. This framework has been extensively studied in the Gaussian and categorical settings separately; on the other hand, the literature addressing this problem in presence of mixed data is still narrow. We propose a Bayesian model for the analysis of mixed data based on the notion of Conditional Gaussian (CG) distribution. Our method is based on a canonical parameterization of the CG distribution, which allows for posterior inference of parameters indexing the (marginal) distributions of continuous and categorical variables, as well as expressing the interactions between the two types of variables. We derive the limiting Gaussian distributions, centered on the correct unknown value and with vanishing variance, for the Bayesian estimators of the canonical parameters expressing continuous, discrete and mixed interactions. In addition, we implement the proposed method for structure learning purposes, namely to infer the underlying graph of conditional independencies. When compared to alternative frequentist methods, our approach shows favorable results both in a simulation setting and in real-data applications, besides allowing for a coherent uncertainty quantification around parameter estimates.
AB - Mixed data comprise measurements of different types, with both categorical and continuous variables, and can be found in various areas, such as in life science or industrial processes. Inferring conditional independencies from the data is crucial to understand how these variables relate to each other. To this end, graphical models provide an effective framework, which adopts a graph-based representation of the joint distribution to encode such dependence relations. This framework has been extensively studied in the Gaussian and categorical settings separately; on the other hand, the literature addressing this problem in presence of mixed data is still narrow. We propose a Bayesian model for the analysis of mixed data based on the notion of Conditional Gaussian (CG) distribution. Our method is based on a canonical parameterization of the CG distribution, which allows for posterior inference of parameters indexing the (marginal) distributions of continuous and categorical variables, as well as expressing the interactions between the two types of variables. We derive the limiting Gaussian distributions, centered on the correct unknown value and with vanishing variance, for the Bayesian estimators of the canonical parameters expressing continuous, discrete and mixed interactions. In addition, we implement the proposed method for structure learning purposes, namely to infer the underlying graph of conditional independencies. When compared to alternative frequentist methods, our approach shows favorable results both in a simulation setting and in real-data applications, besides allowing for a coherent uncertainty quantification around parameter estimates.
KW - Bayesian inference
KW - Conditional Gaussian distribution
KW - Undirected graph
KW - Structure learning
KW - Mixed variables
KW - Bayesian inference
KW - Conditional Gaussian distribution
KW - Undirected graph
KW - Structure learning
KW - Mixed variables
UR - http://hdl.handle.net/10807/291997
U2 - 10.1016/j.jmva.2024.105323
DO - 10.1016/j.jmva.2024.105323
M3 - Article
SN - 0047-259X
VL - 203
SP - N/A-N/A
JO - Journal of Multivariate Analysis
JF - Journal of Multivariate Analysis
ER -