Predictive Abilities of Machine Learning Techniques May Be Limited by Dataset Characteristics: Insights From the UNOS Database
- PMID: 30738152
- DOI: 10.1016/j.cardfail.2019.01.018
Abstract
Background: Traditional statistical approaches to prediction of outcomes have drawbacks when applied to large clinical databases. It is hypothesized that machine learning methodologies might overcome these limitations by considering higher-dimensional and nonlinear relationships among patient variables.
Methods and results: The United Network for Organ Sharing (UNOS) database was queried from 1987 to 2014 for adult patients undergoing cardiac transplantation. The dataset was divided into 3 time periods corresponding to major allocation adjustments and based on geographic regions. For our outcome of 1-year survival, we used the standard statistical methods logistic regression, ridge regression, and regressions with LASSO (least absolute shrinkage and selection operator) and compared them with the machine learning methodologies neural networks, naïve-Bayes, tree-augmented naïve-Bayes, support vector machines, random forest, and stochastic gradient boosting. Receiver operating characteristic curves and C-statistics were calculated for each model. C-statistics were used for comparison of discriminatory capacity across models in the validation sample. After identifying 56,477 patients, the major univariate predictors of 1-year survival after heart transplantation were consistent with earlier reports and included age, renal function, body mass index, liver function tests, and hemodynamics. Advanced analytic models demonstrated similarly modest discrimination capabilities compared with traditional models (C-statistic ≤0.66, all). The neural network model demonstrated the highest C-statistic (0.66) but this was only slightly superior to the simple logistic regression, ridge regression, and regression with LASSO models (C-statistic = 0.65, all). Discrimination did not vary significantly across the 3 historically important time periods.
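For readers unfamiliar with the C-statistic used to compare models here, the following is a minimal sketch of how it can be computed for a binary outcome such as 1-year survival. It is not the authors' code: the function name `c_statistic` and the toy outcome and score lists are hypothetical, chosen only to illustrate the definition (the fraction of survivor/non-survivor pairs in which the survivor received the higher predicted probability, with ties counted as one half).

```python
from itertools import product


def c_statistic(outcomes, scores):
    """Concordance (C) statistic for a binary outcome.

    outcomes: iterable of 0/1 labels (1 = event of interest, e.g. survived).
    scores:   iterable of model-predicted probabilities or risk scores.
    Returns the share of (positive, negative) pairs ranked correctly,
    counting tied scores as half a concordant pair.
    """
    pos = [s for y, s in zip(outcomes, scores) if y == 1]
    neg = [s for y, s in zip(outcomes, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case in each outcome class")

    concordant = 0.0
    for p, n in product(pos, neg):
        if p > n:
            concordant += 1.0
        elif p == n:
            concordant += 0.5
    return concordant / (len(pos) * len(neg))


# Hypothetical toy data: six patients, two illustrative models.
survived = [1, 1, 1, 0, 0, 1]
model_a = [0.9, 0.8, 0.4, 0.5, 0.2, 0.7]   # ranks most pairs correctly
model_b = [0.6, 0.6, 0.6, 0.6, 0.6, 0.6]   # uninformative constant score

print(c_statistic(survived, model_a))  # 0.875
print(c_statistic(survived, model_b))  # 0.5
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is why the reported values of 0.65 to 0.66 are described as modest discrimination. The pairwise loop is quadratic in the number of patients; for a cohort the size of the one in this study, a rank-based implementation from an established statistics library would be the practical choice.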
Conclusions: The use of advanced analytic algorithms did not improve prediction of 1-year survival from heart transplant compared with more traditional prediction models. The prognostic abilities of machine learning techniques may be limited by quality of the clinical dataset.
Keywords: Advanced analytics; heart transplantation; prediction algorithms.
Copyright © 2019 Elsevier Inc. All rights reserved.
Comment in
- The Promise of Machine Learning: When Will It Be Delivered? O Akbilgic et al. J Card Fail 25 (6), 484-485. PMID 30978508. Most of the performance-focused critics on machine learning are because the bar is set unfairly too high for machine learning. In most cases, machine learning methods pro …
Similar articles
- Prediction of 30-Day All-Cause Readmissions in Patients Hospitalized for Heart Failure: Comparison of Machine Learning and Other Statistical Approaches. JD Frizzell et al. JAMA Cardiol 2 (2), 204-209. PMID 27784047. Use of a number of ML algorithms did not improve prediction of 30-day heart failure readmissions compared with more traditional prediction models. Although there will lik …
- A Machine Learning-Based Approach to Prognostic Analysis of Thoracic Transplantations. D Delen et al. Artif Intell Med 49 (1), 33-42. PMID 20153956. This study demonstrated that the integrated machine learning method to select the predictor variables is more effective in developing the Cox survival models than the tra …
- Can Machine Learning Algorithms Accurately Predict Discharge to Nonhome Facility and Early Unplanned Readmissions Following Spinal Fusion? Analysis of a National Surgical Registry. A Goyal et al. J Neurosurg Spine 1-11. PMID 31174185. In an analysis of patients undergoing spinal fusion, multiple machine learning algorithms were found to reliably predict nonhome discharge with modest performance noted f …
- Development and Validation of 15-month Mortality Prediction Models: A Retrospective Observational Comparison of Machine-Learning Techniques in a National Sample of Medicare Recipients. GD Berg et al. BMJ Open 9 (7), e022935. PMID 31315852. Improved means for identifying individuals in the last 15 months of life is needed to improve the patient experience of care and reducing the per capita cost of healthcar …
- An Extensive Experimental Survey of Regression Methods. M Fernández-Delgado et al. Neural Netw 111, 11-34. PMID 30654138. Review. Regression is a very relevant problem in machine learning, with many different available approaches. The current work presents a comparison of a large collection composed …