Development and validation of a machine learning-based prediction model for near-term in-hospital mortality among patients with COVID-19
  1. Prathamesh Parchure1,
  2. Himanshu Joshi1,2,
  3. Kavita Dharmarajan1,3,4,
  4. Robert Freeman1,5,
  5. David L Reich5,6,
  6. Madhu Mazumdar1,2,
  7. Prem Timsina1 and
  8. Arash Kia1
  1. 1 Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  2. 2 Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  3. 3 Department of Geriatrics and Palliative Care, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  4. 4 Department of Radiation Oncology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  5. 5 Hospital Administration, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  6. 6 Department of Anesthesiology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  1. Correspondence to Dr Madhu Mazumdar; madhu.mazumdar{at}


Objectives To develop and validate a model for prediction of near-term in-hospital mortality among patients with COVID-19 by application of a machine learning (ML) algorithm on time-series inpatient data from electronic health records.

Methods The cohort comprised 567 patients with COVID-19 admitted to a large acute care healthcare system between 10 February 2020 and 7 April 2020, who were observed until either death or discharge. A random forest (RF) model was developed on a randomly drawn 70% of the cohort (training set) and its performance was evaluated on the remaining 30% (test set). The outcome variable was in-hospital mortality within 20–84 hours from the time of prediction. Input features included patients’ vital signs, laboratory data and ECG results.

Results Patients had a median age of 60.2 years (IQR 26.2 years); 54.1% were men. In-hospital mortality rate was 17.0% and overall median time to death was 6.5 days (range 1.3–23.0 days). In the test set, the RF classifier yielded a sensitivity of 87.8% (95% CI: 78.2% to 94.3%), specificity of 60.6% (95% CI: 55.2% to 65.8%), accuracy of 65.5% (95% CI: 60.7% to 70.0%), area under the receiver operating characteristic curve of 85.5% (95% CI: 80.8% to 90.2%) and area under the precision recall curve of 64.4% (95% CI: 53.5% to 75.3%).

Conclusions Our ML-based approach can be used to analyse electronic health record data and reliably predict near-term mortality. Using such a model in hospitals could help improve care by better aligning clinical decisions with prognosis in critically ill patients with COVID-19.

  • end of life care
  • hospital care
  • prognosis
  • supportive care
  • terminal care

Data availability statement

Data are available upon reasonable request. Raw data were generated at the Mount Sinai Health System. Derived data supporting the findings of this study are available from the corresponding author (MM) on request.

This article is made freely available for personal use in accordance with BMJ’s website terms and conditions for the duration of the covid-19 pandemic or until otherwise determined by BMJ. You may use, download and print the article for any lawful, non-commercial purpose (including text and data mining) provided that all copyright notices and trade marks are retained.


Introduction

The surge in hospitalisations and intensive care unit (ICU) admissions due to patients with severe COVID-191 has shown the need for effective prognostication, so that clinicians working with limited resources can formulate appropriate goals of care based on patients’ varying risk of deterioration. Timely and targeted delivery of palliative care services is an important component of COVID-19 management. With timely prognostic information, providers could help manage symptoms of severe infection and foster shared decision making with patients and families well before clinical deterioration and death. However, the appropriate timing of initiation of goals-of-care and/or palliative care consultations is difficult to gauge given the largely unpredictable disease trajectory of COVID-19 and how suddenly patients’ conditions can deteriorate.2 Age and pre-existing high-risk conditions3 can be baseline predictors of mortality. However, a number of patients with COVID-19 without underlying high-risk conditions have needed hospitalisation, required ICU care or died (26%, 23% and 5%, respectively).4 These data indicate that using baseline risk factors to assess mortality risk may have limited clinical utility in the context of COVID-19.

Manual assessments are performed to evaluate patients’ overall clinical condition, assess the need for interventions and identify those at higher risk of poor prognosis. As an aid to manual assessment, score-based approaches5–7 have been proposed to improve the process of patient prognostication. However, the validity of these approaches remains to be established for COVID-19 hospitalisations. Moreover, the repeated elicitation of scores during hospitalisation can be laborious. Supervised machine learning (ML) provides an opportunity to frequently assess a large number of relevant variables, their temporal changes and the known as well as unknown interactions among variables with respect to the prognostic outcome.

We aimed to develop a novel supervised ML-based prediction tool to help clinical teams identify inpatients with COVID-19 at higher risk of near-term in-hospital mortality, and to assist the palliative care clinicians in determining when to hold emotionally charged conversations regarding prognosis and care for these patients. We used inpatient time-series data from the institution’s electronic health record (EHR) system and applied a random forest (RF) approach. Here, we describe the development, validation and interpretation of this model.

Materials and methods

The study was approved by the institutional research board, which waived the need for informed consent.

Data source

We compiled retrospective cohort data from the Mount Sinai Health System COVID-19 registry, which included admission–discharge–transfer events, administrative data, time-series data of clinical assessments, and laboratory and ECG results. Our study complies with the transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) reporting guideline for development and validation of prediction models.

Study population

The cohort included adults ≥18 years old with a COVID-19 diagnosis who were admitted to the Mount Sinai Hospital between 10 February 2020 and 7 April 2020. We defined a COVID-19 diagnosis based on either of the following criteria: (a) positive detection of SARS-CoV-2 by reverse transcriptase-PCR assay or (b) clinical diagnosis of COVID-19 made by an infectious disease specialist.

Selection of variables

Data compiled for this study included patient demographics (eg, age, sex); relevant hospital administrative variables (eg, admission type, source of admission); data from nursing flowsheets (eg, vital signs, respiratory pattern); relevant laboratory results and ECG-derived variables (eg, P wave axis, PR interval or QRS duration). We sought to train a model with the best discriminatory ability based on all clinically relevant variables. Variables used in this study were identified from previously published COVID-19 and critical care evidence.

Sampling strategy

For observational variables, we used the three most recent recorded assessments from time-series data that were available when each feature vector was created. For each patient, we generated daily feature vectors starting from admission date until the date of discharge or death.
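The sampling strategy described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the `_v1` to `_v3` suffixes (with `_v1` taken as the newest value), the padding of short series with `None` and the sample values are all assumptions made for the example.

```python
from datetime import datetime

def last_three(observations, as_of):
    """Return the three most recent values recorded at or before `as_of`,
    newest first, padded with None when fewer than three exist (assumption)."""
    prior = sorted(
        (obs for obs in observations if obs[0] <= as_of),
        key=lambda obs: obs[0],
        reverse=True,
    )
    values = [value for _, value in prior[:3]]
    return values + [None] * (3 - len(values))

def build_feature_vector(series_by_variable, as_of):
    """Flatten each variable's three most recent values into named features."""
    vector = {}
    for name, observations in series_by_variable.items():
        for rank, value in enumerate(last_three(observations, as_of), start=1):
            vector[f"{name}_v{rank}"] = value  # hypothetical naming scheme
    return vector

# Synthetic (timestamp, value) series for one hypothetical patient.
heart_rate = [
    (datetime(2020, 3, 20, 8), 88),
    (datetime(2020, 3, 20, 14), 95),
    (datetime(2020, 3, 20, 20), 102),
    (datetime(2020, 3, 21, 2), 110),
    (datetime(2020, 3, 21, 9), 120),  # recorded after the vector is built
]
d_dimer = [(datetime(2020, 3, 20, 9), 1.4)]

vector = build_feature_vector(
    {"heart_rate": heart_rate, "d_dimer": d_dimer},
    as_of=datetime(2020, 3, 21, 6),
)
print(vector)
```

Only observations available at the time the vector is created are used, so the value recorded at 09:00 on the second day is ignored for a vector built at 06:00.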

Patterns of missingness and imputation

The missingness (ie, all three observations within a feature vector were missing) of laboratory variables ranged between 53.5% and 89.0%. Missingness is largely an indicator that a particular test was not considered necessary and/or relevant by clinical judgement. For each numerical variable, we used median values of non-missing data across the cohort for imputing missing values. For discrete variables, missing values were retained.
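As a rough sketch of the imputation rule above, numerical variables can be filled with the median of the non-missing values while discrete variables keep their missing entries. The column names and values below are hypothetical, and representing a missing entry as `None` is an assumption of the example.

```python
from statistics import median

def impute_with_median(rows, numeric_columns):
    """Replace missing numeric entries with the median of the observed values.
    Columns not listed in `numeric_columns` are left untouched."""
    medians = {}
    for column in numeric_columns:
        observed = [row[column] for row in rows if row[column] is not None]
        if observed:  # a fully missing column cannot be imputed
            medians[column] = median(observed)
    imputed = []
    for row in rows:
        filled = dict(row)
        for column, value in medians.items():
            if filled[column] is None:
                filled[column] = value
        imputed.append(filled)
    return imputed

# Synthetic rows: one numeric laboratory value and one discrete flowsheet value.
rows = [
    {"crp": 10.0, "respiratory_pattern": "laboured"},
    {"crp": None, "respiratory_pattern": None},
    {"crp": 30.0, "respiratory_pattern": "regular"},
    {"crp": 80.0, "respiratory_pattern": "regular"},
]
imputed = impute_with_median(rows, numeric_columns=["crp"])
print(imputed[1])
```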

Primary outcome

The primary outcome of interest was in-hospital death within 20–84 hours from the time of prediction. This outcome was translated into a label for training the model using discharge disposition and time of feature vector creation. For each patient, the interval between the time of discharge and the time of feature vector creation was calculated for every daily feature vector. If the discharge disposition was ‘Expired (ie, dead)’ and the interval was between 20 and 84 hours, we labelled the feature vector as positive. If the discharge disposition was ‘Not Expired’ and the interval was between 20 and 84 hours, we labelled the feature vector as negative. We excluded the remaining feature vectors from our cohort.

The labelling strategy is illustrated in figure 1.

Figure 1

Feature vector labelling strategy. (A) positive labels; (B) negative labels. V1–V3: values of observations used for creating the feature vector.
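The labelling rule can be written compactly as below. This is a sketch under stated assumptions: the function and constant names are hypothetical, and treating both ends of the 20–84 hour window as inclusive is a guess, since the text does not say how boundary cases were handled.

```python
from datetime import datetime, timedelta

WINDOW_START_H = 20  # lower bound of the outcome window, in hours
WINDOW_END_H = 84    # upper bound: 72-hour horizon plus 12 hours

def label_feature_vector(vector_time, discharge_time, expired):
    """Return 1 (positive), 0 (negative) or None (excluded from the cohort)."""
    hours = (discharge_time - vector_time).total_seconds() / 3600
    if WINDOW_START_H <= hours <= WINDOW_END_H:  # inclusive bounds are an assumption
        return 1 if expired else 0
    return None

created = datetime(2020, 3, 21, 6)
died_in_window = label_feature_vector(created, created + timedelta(hours=48), expired=True)
alive_in_window = label_feature_vector(created, created + timedelta(hours=48), expired=False)
too_soon = label_feature_vector(created, created + timedelta(hours=10), expired=True)
too_late = label_feature_vector(created, created + timedelta(hours=100), expired=False)
print(died_in_window, alive_in_window, too_soon, too_late)
```

Vectors created less than 20 hours or more than 84 hours before discharge or death receive no label and are dropped, which matches the exclusion step described in the text.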

Justification of time window of outcomes

For pragmatic reasons, we defined near-term outcomes as those occurring between 20 and 84 hours from the prediction time. A lead time of approximately a day (20 hours) would allow providers to perform a manual assessment, try interventions to prevent further deterioration and carry out goals-of-care and/or palliative care consultations to develop an individualised plan of care. The 3-day horizon (72 hours) was extended by 12 hours, to 84 hours, for the operational reason of accommodating a complete hospital day until evening.

Model training and development

Cohort data were randomly split into training (~70%) and test (~30%) sets. Because the distribution of positive and negative feature vectors was imbalanced (17.9% vs 82.1%), we performed random under-sampling on the majority class in the training set until both classes became balanced. An RF algorithm was chosen for training the model with 10-fold cross-validation, using the open source Apache Spark project ML library. The resultant fitted model was then applied to the test set.
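The split and the random under-sampling step can be illustrated with the sketch below. It uses only the Python standard library rather than Apache Spark, works on synthetic labels, and assumes that the split is made over feature vectors; the function names and the seed are hypothetical.

```python
import random

def train_test_split(items, train_fraction, rng):
    """Shuffle a copy of `items` and cut it into training and test parts."""
    shuffled = items[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def undersample_majority(examples, rng):
    """Randomly drop majority-class examples until both classes are equal in size."""
    positives = [e for e in examples if e["label"] == 1]
    negatives = [e for e in examples if e["label"] == 0]
    minority, majority = sorted((positives, negatives), key=len)
    kept = rng.sample(majority, len(minority))
    balanced = minority + kept
    rng.shuffle(balanced)
    return balanced

# Synthetic data: 1000 feature vectors, 17.9% of them positive.
examples = [{"id": i, "label": 1 if i < 179 else 0} for i in range(1000)]
rng = random.Random(0)  # fixed seed so the sketch is repeatable

train, test = train_test_split(examples, 0.7, rng)
balanced_train = undersample_majority(train, rng)

n_pos = sum(e["label"] == 1 for e in balanced_train)
n_neg = sum(e["label"] == 0 for e in balanced_train)
print(len(train), len(test), n_pos, n_neg)
```

Under-sampling is applied to the training set only; the test set keeps its original class distribution so that performance estimates reflect the imbalance seen in practice.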

Importance of variables

Features included in the final RF model were ranked using the Gini importance criteria. In this study, each continuous variable had three features, representing the three most recent observations. We calculated the overall importance of each variable by the aggregated sum of Gini importance values of all its underlying features.
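The aggregation described above amounts to summing feature-level importances per variable. The sketch below uses made-up importance values and hypothetical feature names; in practice the per-feature values would come from the fitted RF model.

```python
from collections import defaultdict

def aggregate_importance(feature_importance, feature_to_variable):
    """Sum Gini importance over all features of each variable and rank the totals."""
    totals = defaultdict(float)
    for feature, importance in feature_importance.items():
        totals[feature_to_variable[feature]] += importance
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Illustrative values only: three observations per continuous variable, one for age.
feature_importance = {
    "bun_v1": 0.12, "bun_v2": 0.08, "bun_v3": 0.05,
    "age": 0.15,
    "crp_v1": 0.04, "crp_v2": 0.00, "crp_v3": 0.01,
}
feature_to_variable = {
    "bun_v1": "blood urea nitrogen",
    "bun_v2": "blood urea nitrogen",
    "bun_v3": "blood urea nitrogen",
    "age": "age",
    "crp_v1": "C reactive protein",
    "crp_v2": "C reactive protein",
    "crp_v3": "C reactive protein",
}

ranking = aggregate_importance(feature_importance, feature_to_variable)
for variable, total in ranking:
    print(f"{variable}: {total:.2f}")
```

A variable measured three times can therefore outrank a single-valued variable even when none of its individual features has the highest importance.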

Statistical analysis

A default threshold of 0.5 was applied to model-derived class probabilities to assign positive and negative predictions. Model performance statistics included sensitivity, specificity, accuracy, and positive and negative predictive values. We also plotted the receiver operating characteristic (ROC) and precision recall (PR) curves and calculated the area under the ROC curve (AUROC) and the area under the PR curve (AUPRC). A 95% CI was obtained for all performance statistics. Performance metrics were computed in the R environment using custom scripts and the R packages PRROC (V.1.3.1), pROC (V.1.16.9000) and epiR (V.1.0.4).
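To make the performance statistics concrete, the following sketch computes them from a small synthetic set of labels and probabilities. It is written in Python for illustration and is not the R code used in the study; treating a probability equal to the threshold as positive is an assumption, and the AUROC is obtained by simple pairwise comparison of positive and negative scores.

```python
def confusion_counts(y_true, probabilities, threshold=0.5):
    """Count true/false positives and negatives at the given threshold."""
    tp = fp = tn = fn = 0
    for truth, p in zip(y_true, probabilities):
        predicted = 1 if p >= threshold else 0  # ">=" is an assumption
        if predicted == 1 and truth == 1:
            tp += 1
        elif predicted == 1 and truth == 0:
            fp += 1
        elif predicted == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def summarise(tp, fp, tn, fn):
    """Derive the threshold-dependent performance statistics."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

def auroc(y_true, probabilities):
    """Probability that a random positive scores above a random negative (ties count 0.5)."""
    positives = [p for t, p in zip(y_true, probabilities) if t == 1]
    negatives = [p for t, p in zip(y_true, probabilities) if t == 0]
    wins = sum(
        1.0 if pos > neg else 0.5 if pos == neg else 0.0
        for pos in positives
        for neg in negatives
    )
    return wins / (len(positives) * len(negatives))

# Synthetic labels and model probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
probabilities = [0.9, 0.7, 0.4, 0.8, 0.6, 0.3, 0.2, 0.1, 0.45, 0.05]

counts = confusion_counts(y_true, probabilities)
stats = summarise(*counts)
auc = auroc(y_true, probabilities)
print(counts, stats, auc)
```

Sensitivity, specificity and the predictive values depend on the chosen threshold, whereas the AUROC summarises ranking quality across all thresholds.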

Clinical utility, need assessment, temporal validation

We applied our model to 910 individuals hospitalised with COVID-19 diagnosis at our institution during the study period and identified those who died in the hospital. RF model predictions were obtained prospectively for each day a patient stayed in the hospital. Positive predictions with high risk of near-term mortality were identified using a threshold of 0.5. Among hospitalisations that ended in death, we identified a subset of patients whom our model predicted as being at high risk of near-term mortality.

We queried the EHR system for documentation of patients with COVID-19 seen by a palliative care service provider, and then categorised this subset (patients at high model-predicted mortality risk) into two subgroups: (a) patients who received a goals-of-care and/or palliative care consultation before death, either in the emergency department before admission or during their inpatient stay and (b) patients who did not receive either type of consultation before death. From group (b), we identified a subset who could have benefited from such a consultation, based on a review of case notes performed by a clinician. These patients, who were predicted by our tool as having a high risk of mortality, are considered as having an unmet need for goals-of-care and/or palliative care consultation.
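The categorisation above can be sketched as a simple tally. Everything in this example is hypothetical: the record fields, the synthetic patients and, in particular, the rule that a patient counts as predicted when any daily probability reaches the 0.5 threshold, which the text does not state explicitly.

```python
def flagged_high_risk(daily_probabilities, threshold=0.5):
    """Assumed rule: a patient is flagged if any daily prediction reaches the threshold."""
    return any(p >= threshold for p in daily_probabilities)

def summarise_unmet_need(deaths, threshold=0.5):
    """Tally in-hospital deaths by model flag, consultation status and clinician review."""
    flagged = [d for d in deaths if flagged_high_risk(d["daily_probabilities"], threshold)]
    without_consult = [d for d in flagged if d["consult"] is None]
    unmet = [d for d in without_consult if d["review_eligible"]]
    return {
        "deaths": len(deaths),
        "flagged": len(flagged),
        "flagged_without_consult": len(without_consult),
        "potential_unmet_need": len(unmet),
    }

# Synthetic records for five hypothetical patients who died in hospital.
deaths = [
    {"id": "A", "daily_probabilities": [0.2, 0.6], "consult": "inpatient", "review_eligible": None},
    {"id": "B", "daily_probabilities": [0.7], "consult": "emergency department", "review_eligible": None},
    {"id": "C", "daily_probabilities": [0.55, 0.8], "consult": None, "review_eligible": True},
    {"id": "D", "daily_probabilities": [0.9], "consult": None, "review_eligible": False},
    {"id": "E", "daily_probabilities": [0.1, 0.3], "consult": None, "review_eligible": True},
]

summary = summarise_unmet_need(deaths)
print(summary)
```

Patient E is not counted as unmet need here because the model never flagged them, even though the clinician review judged a consultation appropriate.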


Results

The dataset for this cohort consisted of 1360 feature vectors, representing inpatient data from 567 unique individuals hospitalised at our institution with COVID-19. After the data were split into training and test sets and the training set was under-sampled, the two sets contained 338 and 414 feature vectors, respectively.

Cohort characteristics

Patient characteristics at baseline are shown in table 1. In both the overall study cohort and the test set, more patients were men and the highest proportion of patients were in the 45–65-year age group. In-hospital mortality rates in the overall cohort and the test set were similar, as was the median time to death (6.5 days; range 1.3–23.0). No significant differences were observed between the overall cohort and the test set in the distributions of race, ethnicity, relevant comorbidities, smoking habits, patients’ residential origin and proportion of patients who received ICU care.

Table 1

Characteristics of patients admitted in the study cohort and test set

Predictors and their importance

Data included in this study were basic demographic and hospital variables, structured clinical assessments including vital signs,7 complete blood count, serum biochemical tests,8 coagulation profile,9 parameters for respiratory function and other complications,10 markers of inflammation11 12 and electrocardiography parameters.13 Fifty-five variables (comprising 175 features) were included in the final version of the trained RF model (online supplemental eTable 1). The model hyperparameters used for training the best cross-validated model are listed in online supplemental eTable 2.

Implicit feature selection by the RF algorithm, reflected by non-zero Gini importance values, identified 49 variables (corresponding to 99 features) that contributed to the final model (online supplemental eFigure 1).

Predictive performance of the model

Predictive performance of the model is shown in table 2. ROC and PR curves of the RF classifier model are shown in figure 2.

Figure 2

Receiver operating characteristics and precision recall curves of the RF classifier model in our test set. AUPRC, area under the precision recall curve; AUROC, area under the receiver operating characteristic curve; RF, random forest.

Table 2

Predictive performance of near-term mortality by random forest classifier

Assessment of use and unmet need of goals-of-care/palliative care consultation

Our model predicted 95.2% of in-hospital deaths; among these, 11.6% of patients did not receive a goals-of-care or palliative care consultation. Among those who did, 65.8% received the consultation during their inpatient stay and 15.1% received it in the emergency department. A clinician’s manual review of case notes determined that 53% of those who did not receive a goals-of-care and/or palliative care consultation were appropriate candidates for it, representing 6.2% of all in-hospital deaths in the cohort (figure 3).

Figure 3

Assessment of use and potentially unmet need for goals-of-care or palliative care (GOC/PC) consultation among patients with COVID-19 who died in the hospital (6.2% of all deaths in the cohort), shown as red slice in centre.


Discussion

We developed an RF-based model for predicting near-term in-hospital mortality of adult patients with COVID-19 by using time-series inpatient data. The model provided adequate discrimination (AUROC 85.5%) without the need for manual preprocessing of data. In contrast to using static variables, our model translates the variability in patients’ conditions into mortality risk predictions.

Modelling of near-term mortality with dynamic risk quantification should incorporate variables that capture the progression of COVID-19 along the common pathways underlying mortality, such as respiratory or multi-organ failure, septic shock and cardiogenic shock from acute myocardial injury and myocarditis.14 Unlike other studies15–18 of predictive models for mortality among patients with COVID-19, ours is the first model to demonstrate the feasibility of using multiple clinical variables as time-series data from the EHR. The narrow prediction time window captured by this model is key in scenarios of acute illness with a high risk of in-hospital mortality, as seen with COVID-19. Limitations of other predictive models of ICU and in-hospital mortality7 19–24 include accounting only for patient characteristics on admission or having access to a limited set of variables from inpatient data. Our model addresses these limitations and accounts for the context of COVID-19 management during the pandemic.

The prediction probability generated by the model can be calibrated according to the expected mortality rate. While prognostic uncertainty in COVID-19 can lead to missed early opportunities for goals-of-care or palliative care consultations, the estimated near-term risk of mortality can be used to create alternative prognostic risk scenarios, so that these consultations can be conducted with increased confidence. During the COVID-19 crisis, when critical care teams may include redeployed personnel who are not specialised in critical care, prognostication by an ML-based model can assist clinicians. Goals-of-care consultations can be performed by any care team member and aim to ensure that the care team is aware of patients’ values and preferences and that these preferences can be incorporated into hospital care, given the current clinical condition. They involve discussion of prognosis and goals, noting any patient concerns25 and shared decisions on the use of life-supporting interventions.26 27 Palliative care consultations are performed by palliative care specialists and aim to deliver comprehensive, interdisciplinary care with a focus on improving the quality of life for patients in imminent or established critical condition. These consultations can include goals-of-care discussions. In addition to helping patients, they provide much needed support to patients’ family members.28

The model variables of interest

Apart from conventional static prognostic variables, such as age,1 emerging among the top-ranked variables, variables reflecting renal function status (eg, blood urea nitrogen and serum creatinine) appeared important, emphasising the role of pre-existing chronic kidney disease, abnormal renal function and acute kidney injury in COVID-1929 30 and their association with mortality.29 Among the arterial and venous blood gas analysis variables, the model ranked anion gap and PaCO2 higher than markers of hypoxemia (PaO2 and PvO2), supporting the significance of respiratory acidosis31 and hypercapnia in those who died from COVID-19 despite improvement in oxygenation.32 33 Key markers of inflammation (C reactive protein and complement C4) and markers of coagulability (D-dimer, platelet count and activated partial thromboplastin time) were predictive in our model, which is consistent with the suggestion that severe COVID-19 is a coagulopathy accompanied by a severe immune-inflammatory state.34 The predictive value of ECG variables, such as T wave axis and PR interval, indicates the importance of cardiac complications in COVID-19 in relation to patient outcome.35 36 Contrary to expectation, N-terminal pro-B-type natriuretic peptide (NT-proBNP), a predictor of mortality in COVID-19,37 ranked low in our model. Among the haematologic variables, lymphocyte count appeared to be predictive; it has known significance in COVID-19 prognostication11 and COVID-19-associated sepsis.38

We chose an RF algorithm because of its ability to handle complex multi-modal clinical data and elucidate high-order interactions among input variables without compromising predictive accuracy.39 Another strength is that the data elements used in our model are variables commonly used in the clinical management of COVID-19.

Limitations of the model

A small subset of patients was involved in both training and test sets, introducing the potential for contamination between the two sets. However, in general, the daily feature vectors originating from the time-series data differ, even for the same patient encounter. The model requires validation in external and prospective settings, given the variability in various aspects such as patient demographics, care resources and protocols for disease management. The treatment and intervention variables can have potential implications on the disease trajectory and mortality. However, the current version of the model does not include treatment/intervention variables and therefore, generalisability can be limited given the variability in treatment and intervention guidelines by institutions. The model was trained on COVID-19 hospitalisations regardless of patients’ current level of care; in actual practice, the frequency of obtaining clinical assessments may vary by the level of care. Due to our small sample size, we could not train separate models for each specific level of care or age group. Such customisation, however, could elucidate more predictors of near-term mortality. The estimated unmet need for goals-of-care or palliative care consultations is based on a small cohort. Future work is necessary to evaluate the impact of the model’s implementation on the coverage of palliative care and other aspects of clinical workflow.

Practice implications

A tool that accurately predicts mortality risk and is incorporated in the EHR can complement the manual clinical review and help the care providers in various ways. These include expanding the coverage of appropriately timed goals-of-care or palliative care consultations, and escalation or de-escalation of care for facilitating efficient resource utilisation.

Our tool predicted 95% of all COVID-19 hospitalisations associated with in-hospital deaths in the prospective cohort. We also observed that more than half of hospitalised patients who were clinically eligible for goals-of-care or palliative care consultations did not receive them. Prognostication augmented with our model-derived predictions could potentially address this unmet need. At our institution, palliative care services are embedded within various departments and were often proactively provided to the high-risk patients with COVID-19. The extent of unmet need of these consultations could be considerably greater at centres with reduced availability of palliative care services, further highlighting the importance of ML-based mortality prediction.

Our model’s high negative predictive value (95.8%, 95% CI: 92.2% to 98.1%) in assessing near-term mortality risk could be helpful for guiding providers when hospital or staffing resources are strained. Among other potential implications, our model could facilitate timely escalation of care, which can help to reduce the rate of invasive mechanical ventilation.40 This could be especially important in situations where such resources are limited.

In a fast-paced COVID-19 crisis, the increased need for trained staff and critical care resources requires careful planning. The acute need for critical care resources can be forecast by combining the model’s near-term prognostication with patients’ critical care preferences. This can help hospitals not only plan for an imminent surge in demand for care but also make decisions on new admissions, transfers, logistics and staffing.


Conclusion

The ability of our proposed model to predict near-term mortality in patients with COVID-19 demonstrates its potential to be adapted to different clinical uses, including palliative care services.

Ethics statements

Patient consent for publication



  • Twitter @RFreeman_RN

  • PT and AK contributed equally.

  • Contributors AK and PT conceived and supervised the study. PP and PT contributed to data curation. PP, PT, AK and HJ contributed to formal analysis. HJ and KD contributed to formal validation of data and results. HJ and MM contributed to drafting of the original manuscript. All authors contributed to critical revision of the manuscript for important intellectual content.

  • Funding This study was funded by the National Institute on Aging (P30AG028741) and the Division of Cancer Prevention, National Cancer Institute (P30CA196521).

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.