TY - JOUR
T1 - A general framework for penalized mixed-effects multitask learning with applications on DNA methylation surrogate biomarkers creation
AU - Cappozzo, Andrea
AU - Ieva, Francesca
AU - Fiorito, Giovanni
PY - 2023
Y1 - 2023
N2 - Recent evidence highlights the usefulness of DNA methylation (DNAm) biomarkers as surrogates for exposure to risk factors for noncommunicable diseases in epidemiological studies and randomized trials. DNAm variability has been demonstrated to be tightly related to lifestyle behavior and exposure to environmental risk factors, ultimately providing an unbiased proxy of an individual state of health. At present, the creation of DNAm surrogates relies on univariate penalized regression models, with elastic-net regularizer being the gold standard when accomplishing the task. Nonetheless, more advanced modeling procedures are required in the presence of multivariate outcomes with a structured dependence pattern among the study samples. In this work we propose a general framework for mixed-effects multitask learning in presence of high-dimensional predictors to develop a multivariate DNAm biomarker from a multicenter study. A penalized estimation scheme, based on an expectation-maximization algorithm, is devised in which any penalty criteria for fixed-effects models can be conveniently incorporated in the fitting process. We apply the proposed methodology to create novel DNAm surrogate biomarkers for multiple correlated risk factors for cardiovascular diseases and comorbidities. We show that the proposed approach, modeling multiple outcomes together, outperforms state-of-the-art alternatives both in predictive power and biomolecular interpretation of the results.
AB - Recent evidence highlights the usefulness of DNA methylation (DNAm) biomarkers as surrogates for exposure to risk factors for noncommunicable diseases in epidemiological studies and randomized trials. DNAm variability has been demonstrated to be tightly related to lifestyle behavior and exposure to environmental risk factors, ultimately providing an unbiased proxy of an individual state of health. At present, the creation of DNAm surrogates relies on univariate penalized regression models, with elastic-net regularizer being the gold standard when accomplishing the task. Nonetheless, more advanced modeling procedures are required in the presence of multivariate outcomes with a structured dependence pattern among the study samples. In this work we propose a general framework for mixed-effects multitask learning in presence of high-dimensional predictors to develop a multivariate DNAm biomarker from a multicenter study. A penalized estimation scheme, based on an expectation-maximization algorithm, is devised in which any penalty criteria for fixed-effects models can be conveniently incorporated in the fitting process. We apply the proposed methodology to create novel DNAm surrogate biomarkers for multiple correlated risk factors for cardiovascular diseases and comorbidities. We show that the proposed approach, modeling multiple outcomes together, outperforms state-of-the-art alternatives both in predictive power and biomolecular interpretation of the results.
KW - Mixed-effects models
KW - multitask learning
KW - personalized medicine
KW - penalized estimation
KW - multivariate regression
KW - EM algorithm
UR - http://hdl.handle.net/10807/303277
UR - https://projecteuclid.org/journals/annals-of-applied-statistics/volume-17/issue-4/a-general-framework-for-penalized-mixed-effects-multitask-learning-with/10.1214/23-aoas1760.short
U2 - 10.1214/23-AOAS1760
DO - 10.1214/23-AOAS1760
M3 - Article
SN - 1932-6157
VL - 17
SP - 3257
EP - 3282
JO - The Annals of Applied Statistics
JF - The Annals of Applied Statistics
ER -