TY - JOUR
T1 - An integrative Bayesian Dirichlet-multinomial regression model for the analysis of taxonomic abundances in microbiome data
AU - Argiento, Raffaele
AU - Wadsworth, W. Duncan
AU - Guindani, Michele
AU - Galloway-Pena, Jessica
AU - Shelburne, Samuel A.
AU - Vannucci, Marina
PY - 2017
Y1 - 2017
N2 - Background: The Human Microbiome has been variously associated with the immune-regulatory mechanisms
involved in the prevention or development of many non-infectious human diseases such as autoimmunity, allergy
and cancer. Integrative approaches which aim at associating the composition of the human microbiome with other
available information, such as clinical covariates and environmental predictors, are paramount to developing a more
complete understanding of the role of the microbiome in disease development.
Results: In this manuscript, we propose a Bayesian Dirichlet-Multinomial regression model which uses spike-and-slab
priors for the selection of significant associations between a set of available covariates and taxa from a microbiome
abundance table. The approach allows straightforward incorporation of the covariates through a log-linear regression
parametrization of the parameters of the Dirichlet-Multinomial likelihood. Inference is conducted through a Markov
Chain Monte Carlo algorithm, and selection of the significant covariates is based upon the assessment of posterior
probabilities of inclusion and the thresholding of the Bayesian false discovery rate. We design a simulation study to
evaluate the performance of the proposed method, and then apply our model to a publicly available dataset
obtained from the Human Microbiome Project which associates taxa abundances with KEGG orthology pathways. The
method is implemented in specifically developed R code, which has been made publicly available.
Conclusions: Our method compares favorably in simulations to several recently proposed approaches for similarly
structured data, in terms of increased accuracy and reduced false positive as well as false negative rates. In the
application to the data from the Human Microbiome Project, a close evaluation of the biological significance of our
findings confirms existing associations in the literature.
KW - Bayesian hierarchical model
KW - Data integration
KW - Dirichlet-multinomial
KW - Microbiome data
KW - Variable selection
UR - http://hdl.handle.net/10807/148066
UR - https://www.scopus.com/inward/record.uri?eid=2-s2.0-85012100277&doi=10.1186/s12859-017-1516-0&partnerid=40&md5=180dc990f26a434225afe14009dd7f00
U2 - 10.1186/s12859-017-1516-0
DO - 10.1186/s12859-017-1516-0
M3 - Article
VL - 18
SP - 1
EP - 12
JO - BMC Bioinformatics
JF - BMC Bioinformatics
SN - 1471-2105
ER -