Patient specific predictions in the intensive care unit using a Bayesian ensemble
Johnson AEW., Dunkley N., Mayaud L., Tsanas A., Kramer AA., Clifford GD.
This paper describes an intensive care unit mortality prediction model, built on a novel Bayesian ensemble learning algorithm, for the PhysioNet/Computing in Cardiology Challenge 2012. Methods: Data pre-processing was performed automatically, based upon domain knowledge, to remove artefacts and erroneous recordings, e.g. physiologically invalid entries and unit conversion errors. A range of diverse features was extracted from the original time series signals, including standard statistical descriptors such as the minimum, maximum, median, first, last, and the number of values. A new Bayesian ensemble scheme comprising 500 weak learners was then developed to classify the data samples. Each weak learner was a decision tree of depth two, which randomly assigned an intercept and gradient to a randomly selected single feature. The parameters of the ensemble learner were determined using a custom Markov chain Monte Carlo sampler. Results: The model was trained using 4000 observations from the training set, and was evaluated by the organisers of the competition on two new datasets with 4000 observations each (set b and set c). The outcomes for these datasets were unavailable to the competitors. The competition comprised two events, each judged by its own score. Score 1 was the minimum of the positive predictive value and sensitivity for binary model predictions, and the model achieved 0.5310 and 0.5353 on the unseen datasets. Score 2, a range-normalized Hosmer-Lemeshow C statistic, evaluated to 26.44 and 29.86 on the same datasets. The model was re-developed using the updated datasets from phase 2 after the competition, and achieved a score 1 of 0.5374 and a score 2 of 18.20 on set c. Conclusion: The proposed prediction model performs favourably on both the provided and hidden datasets (set a and set b), and has the potential to be used effectively for patient-specific predictions. © 2012 CCAL.
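As a rough illustration of two steps named in the abstract, the following minimal Python sketch computes the listed statistical descriptors for a single time series and evaluates score 1 as the minimum of sensitivity and positive predictive value. The function names `summarise_series` and `score1`, the 0/1 label encoding, and the choice to return 0.0 when a denominator is zero are assumptions made for this example; they are not taken from the authors' implementation or from the challenge's official scoring code.

```python
from statistics import median


def summarise_series(values):
    """Return the descriptors listed in the abstract for one time series.

    `values` is assumed to be a list of measurements in chronological order.
    """
    if not values:
        return None  # assumption: no descriptors when nothing was recorded
    return {
        "min": min(values),
        "max": max(values),
        "median": median(values),
        "first": values[0],
        "last": values[-1],
        "count": len(values),
    }


def score1(y_true, y_pred):
    """Minimum of sensitivity and positive predictive value.

    Labels are assumed to be 0 (survived) or 1 (died in hospital).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    # Assumption: an undefined ratio (zero denominator) is scored as 0.0.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    return min(sensitivity, ppv)


if __name__ == "__main__":
    # Hypothetical heart-rate readings for one patient.
    print(summarise_series([80, 95, 70, 88]))
    # Hypothetical outcomes and binary predictions for six patients.
    print(score1([1, 1, 1, 1, 0, 0], [1, 1, 0, 0, 1, 0]))
```

Taking the minimum of the two quantities means a model cannot score well by predicting death for very few or very many patients: both missed deaths and false alarms lower the score.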