TY - JOUR
T1 - A machine-learning parsimonious multivariable predictive model of mortality risk in patients with Covid-19
AU - Murri, Rita
AU - Lenkowicz, Jacopo
AU - Masciocchi, Carlotta
AU - Iacomini, Chiara
AU - Fantoni, Massimo
AU - Damiani, Andrea
AU - Marchetti, Antonio
AU - Sergi, Paolo Domenico Angelo
AU - Arcuri, Giovanni
AU - Cesario, Alfredo
AU - Patarnello, Stefano
AU - Antonelli, Massimo
AU - Bellantone, Rocco Domenico Alfonso
AU - Bernabei, Roberto
AU - Boccia, Stefania
AU - Calabresi, Paolo
AU - Cambieri, Andrea
AU - Cauda, Roberto
AU - Colosimo, Cesare
AU - Crea, Filippo
AU - De Maria Marchiano, Ruggero
AU - De Stefano, Valerio
AU - Franceschi, Francesco
AU - Gasbarrini, Antonio
AU - Parolini, Ornella
AU - Richeldi, Luca
AU - Sanguinetti, Maurizio
AU - Urbani, Andrea
AU - Zega, Maurizio
AU - Scambia, Giovanni
AU - Valentini, Vincenzo
AU - Armuzzi, Alessandro
AU - Barba, Marta
AU - Baroni, Silvia
AU - Bellesi, Silvia
AU - Bentivoglio, Anna Rita
AU - Biasucci, Luigi Marzio
AU - Biscetti, Federico
AU - Candelli, Marcello
AU - Capalbo, Gennaro
AU - Cattani Franchi, Paola
AU - Chiusolo, Patrizia
AU - Cingolani, Antonella
AU - Corbo, Giuseppe Maria
AU - Covino, Marcello
AU - Cozzolino, Angela Maria
AU - D'Alfonso, Maria Elena
AU - De Angelis, Giulia
AU - De Pascale, Gennaro
AU - Frisullo, Giovanni
AU - Gabrielli, Maurizio
AU - Gambassi, Giovanni
AU - Garcovich, Matteo
AU - Gremese, Elisa
AU - Grieco, Domenico Luca
AU - Iaconelli, Amerigo
AU - Iorio, Raffaele
AU - Landi, Francesco
AU - Larici, Anna Rita
AU - Liuzzo, Giovanna
AU - Maviglia, Riccardo
AU - Miele, Luca
AU - Montalto, Massimo
AU - Natale, Luigi
AU - Nicolotti, Nicola
AU - Ojetti, Veronica
AU - Pompili, Maurizio
AU - Posteraro, Brunella
AU - Rapaccini, Gian Ludovico
AU - Rinaldi, Riccardo
AU - Rossi, Elena
AU - Santoliquido, Angelo
AU - Sica, Simona
AU - Tamburrini, Enrica
AU - Teofili, Luciana
AU - Testa, Antonia Carla
AU - Tosoni, Alberto
AU - Trani, Carlo
AU - Varone, Francesco
AU - Zileri Dal Verme, Lorenzo
PY - 2021
Y1 - 2021
N2 - The COVID-19 pandemic is placing severe strain on the healthcare system. Several prognostic models have been validated, but few are used in daily practice. This study aimed to validate a machine-learning risk prediction model that uses easy-to-obtain parameters to help identify patients with COVID-19 who are at higher risk of death. The training cohort included all patients admitted to Fondazione Policlinico Gemelli with COVID-19 from March 5, 2020, to November 5, 2020. The model was then tested on all patients admitted to the same hospital with COVID-19 from November 6, 2020, to February 5, 2021. The primary outcome was in-hospital case-fatality risk. Out-of-sample performance was estimated on the training set in terms of area under the receiver operating characteristic curve (AUROC) and classification matrix statistics, by averaging the results of fivefold cross-validation repeated three times, and these estimates were compared with the results obtained on the test set. An explanation analysis of the model based on SHapley Additive exPlanations (SHAP) is also presented. To assess the subsequent time evolution, the change in PaO2/FiO2 (P/F) at 48 h after the baseline measurement was plotted against its baseline value. Among the 921 patients in the training cohort, 120 died (13%). The variables selected for the model were age, platelet count, SpO2, blood urea nitrogen (BUN), hemoglobin, C-reactive protein, neutrophil count, and sodium. Fivefold cross-validation repeated three times gave an AUROC of 0.87 and, at the threshold given by the Youden index, a sensitivity of 0.840, a specificity of 0.774, and a negative predictive value of 0.971. The model was then tested on a new population (n = 1463) in which the case-fatality rate was 22.6%. On this test set, the model showed an AUROC of 0.818, a sensitivity of 0.813, a specificity of 0.650, and a negative predictive value of 0.922. By quartile of the predicted risk score, the case-fatality rate was 1.6% in the first quartile (low-risk score group), 17.8% in the second and third quartiles (high-risk score group), and 53.5% in the fourth quartile (very high-risk score group). The three risk score groups showed good discrimination for the P/F value at admission and, for the low-risk class, a positive correlation was found with P/F at 48 h after admission (adjusted R-squared = 0.48). We developed a predictive model of death for people with SARS-CoV-2 infection by including only easy-to-obtain variables (abnormal blood count, BUN, C-reactive protein, sodium, and lower SpO2). It demonstrated good accuracy and high discriminative power. The simplicity of the model makes the risk prediction applicable to patients in the Emergency Department or during hospitalization. Although it is reasonable to assume that the model is also applicable to non-hospitalized persons, only appropriate studies can assess its accuracy for persons at home.
KW - Aged
KW - Aged, 80 and over
KW - Blood Cell Count
KW - Blood Chemical Analysis
KW - COVID-19
KW - Cohort Studies
KW - Female
KW - Hospital Mortality
KW - Humans
KW - Machine Learning
KW - Male
KW - Middle Aged
KW - Models, Statistical
KW - Multivariate Analysis
KW - Oxygen
KW - Pandemics
KW - ROC Curve
KW - Risk Factors
KW - Rome
KW - SARS-CoV-2
UR - http://hdl.handle.net/10807/196954
U2 - 10.1038/s41598-021-99905-6
DO - 10.1038/s41598-021-99905-6
M3 - Article
SN - 2045-2322
VL - 11
SP - 21136
JO - Scientific Reports
JF - Scientific Reports
ER -