Article Text

The Surprise Question and clinician-predicted prognosis: systematic review and meta-analysis
  1. Ankit Gupta1,
  2. Ruth Burgess2,
  3. Michael Drozd3,
  4. John Gierula3,
  5. Klaus Witte3 and
  6. Sam Straw3
  1. 1Leeds Institute of Medical Education, University of Leeds, Leeds, UK
  2. 2Leeds Teaching Hospitals NHS Trust, Leeds, UK
  3. 3Leeds Institute of Cardiovascular and Metabolic Medicine, University of Leeds, Leeds, UK
  1. Correspondence to Dr Sam Straw, Leeds Institute of Cardiovascular and Metabolic Medicine, University of Leeds, Leeds, UK; s.straw{at}leeds.ac.uk

Abstract

Background The Surprise Question, ‘Would you be surprised if this person died within the next year?’, is a simple tool that can be used by clinicians to identify people within the last year of life. This review aimed to determine the accuracy of this assessment across different healthcare settings, specialties, follow-up periods and respondents.

Methods Searches were conducted of Medline, Embase, AMED, PubMed and the Cochrane Central Register of Controlled Trials, from inception until 01 January 2024. Studies were included if they reported original data on the ability of the Surprise Question to predict survival. For each study (including subgroups), sensitivity, specificity, positive and negative predictive values and accuracy were determined.

Results Our dataset comprised 56 distinct cohorts, including 68 829 patients. In a pooled analysis, the sensitivity of the Surprise Question was 0.69 ((0.64 to 0.74) I2=97.2%), specificity 0.69 ((0.63 to 0.74) I2=99.7%), positive predictive value 0.40 ((0.35 to 0.45) I2=99.4%), negative predictive value 0.89 ((0.87 to 0.91) I2=99.7%) and accuracy 0.71 ((0.68 to 0.75) I2=99.3%). The prompt performed best in populations with high event rates, shorter timeframes and when posed to more experienced respondents.

Conclusions The Surprise Question demonstrated modest accuracy with considerable heterogeneity across the population to which it was applied and to whom it was posed. Prospective studies should test whether the prompt can facilitate timely access to palliative care services, as originally envisioned.

PROSPERO registration number CRD32022298236.

  • Palliative Care
  • Prognosis

Data availability statement

No data are available. Not applicable.


This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See: https://creativecommons.org/licenses/by/4.0/.


WHAT IS ALREADY KNOWN ON THIS TOPIC

  • The Surprise Question is a simple tool that could help identify people within the last year of life.

  • Current evidence suggests that the Surprise Question has reasonable accuracy for identifying patients at higher risk of mortality, potentially aiding clinicians to initiate timely discussions about palliative and end-of-life care.

WHAT THIS STUDY ADDS

  • The Surprise Question has modest accuracy for identifying those nearing the end of life with some inconsistency across settings, specialities and follow-up times.

  • The prompt performs best when used in populations with high event rates, when posed over shorter timeframes, and when used in inpatient settings.

HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY

  • Our meta-analysis helps further refine the role of the Surprise Question as a prognostic tool in acute and chronic illnesses.

  • Future research should address whether integrating the Surprise Question into routine clinical care improves access to palliative care services, facilitates advance care planning and is acceptable to the healthcare team.

Introduction

The Surprise Question, ‘Would you be surprised if this person were to die within the next year?’, is a simple prompt originally developed to help healthcare professionals identify patients nearing the end of life who might require additional support and access to palliative care services.1 Anticipated prognosis is a major driver of these decisions, such that the ability of the Surprise Question to identify those within the last year of life has been assessed across a diverse range of healthcare settings. Although clinician-predicted prognosis is simple and convenient, it may lack accuracy due to a tendency to overestimate survival.2 The Surprise Question aims to address this tendency by posing a reflective question as to whether death is possible, rather than likely.3

The Surprise Question is a core component of the Gold Standards Framework tool in the United Kingdom which is recommended for use across primary and secondary healthcare settings to identify those nearing the end of life.4 The use of the Surprise Question to identify those in the last year of life is also endorsed in position statements from both the American Heart Association5 and Japanese Cardiology Society/Heart Failure Society.6 Despite its widespread use, the prognostic accuracy of the Surprise Question is uncertain and may depend on the setting in which it is applied, the disease studied, timeframe chosen, event (death) rate in the population and to whom the question is posed.7

Previous meta-analyses8 9 have not included studies using shorter timeframes, or have not considered the accuracy of the Surprise Question when utilised in different healthcare settings.10 We first aimed to provide an updated systematic review and meta-analysis of the accuracy of the Surprise Question. Second, we aimed to assess the accuracy of the Surprise Question across populations with different event rates, healthcare settings, specialties, timeframes and when posed to different healthcare professionals.

Methods

This systematic review and meta-analysis, which aimed to assess the accuracy of the Surprise Question, was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.11

Search strategy

The study protocol was registered with PROSPERO (online supplemental file 1). We searched for articles indexed in Medline, Embase, the Allied and Complementary Medicine Database (AMED), PubMed, the Cochrane Database of Systematic Reviews, and the Cochrane Central Register of Controlled Trials from inception until 01 January 2024, including articles being processed at that time. The full search strategy is available in table 1. Briefly, we searched the literature using variations of the search terms “Surprise Question” and “mortality” or “Gold Standards Framework” and “mortality”. Additionally, the references of all included articles and review articles were assessed manually to identify any additional relevant publications. We limited our search to studies on human subjects, including both adult and paediatric populations, and to articles published in English, or for which an English translation was available. No other filters were applied.

Supplemental material

Table 1

Search strategy

Study selection

AG performed the search and SS adjudicated the search strategy before and during the time it was applied to the respective databases. Studies identified from database searches were screened independently by AG and SS. The first selection criterion was that the title or abstract included either ‘Surprise Question’ or ‘Gold Standards Framework’, with any study not meeting this criterion excluded. We placed no restrictions on study design, although at full review we required studies to report mortality data divided by whether patients received a ‘surprised’ or a ‘not surprised’ response from a healthcare professional and, therefore, all studies were prospective and observational in nature. We excluded studies where it was not possible to determine the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and accuracy of the Surprise Question for the population studied. Where these data were unavailable, but the article appeared potentially relevant, applications for raw data were made to corresponding authors. No restrictions were placed on the setting, disease studied, timeframe evaluated or healthcare professional providing the response. Any discrepancies were resolved by a meeting between AG and SS. The option for unresolved discrepancies to be adjudicated by a third reviewer (KW) was never required.

Quality assessment of studies

Each study was assessed independently by AG and SS, who met and discussed the study designs. As studies were observational in nature, each rater independently completed the Newcastle-Ottawa Scale. Any discrepancies could be adjudicated by a third reviewer (KW), although this was never required. The Newcastle-Ottawa Scale rates observational studies on three domains: selection, comparability between the exposed and unexposed groups, and exposure/outcome assessment. The scale assigns a maximum of four stars for selection, two for comparability and three for exposure/outcome. In line with the Agency for Healthcare Research and Quality standards, the quality of the studies was categorised as good, fair or poor. Good-quality articles received three or four stars in the selection domain, one or two stars in the comparability domain and two or three stars in the exposure/outcome domain. Fair-quality articles received two stars in the selection domain, one or two stars in the comparability domain and two or three stars in the exposure/outcome domain. Finally, poor-quality articles received zero or one star in the selection domain, or zero stars in the comparability domain, or zero or one star in the exposure/outcome domain.
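The quality thresholds described above amount to a simple decision rule, which can be sketched as follows (an illustrative encoding; the function name and interface are ours, not part of the study protocol):

```python
def nos_quality(selection: int, comparability: int, outcome: int) -> str:
    """Classify a Newcastle-Ottawa Scale rating as good, fair or poor,
    per the Agency for Healthcare Research and Quality thresholds.

    selection: 0-4 stars; comparability: 0-2 stars; outcome: 0-3 stars.
    """
    if selection >= 3 and comparability >= 1 and outcome >= 2:
        return "good"
    if selection == 2 and comparability >= 1 and outcome >= 2:
        return "fair"
    # Anything else fails at least one domain threshold
    return "poor"

print(nos_quality(4, 2, 3))  # good
print(nos_quality(2, 1, 2))  # fair
print(nos_quality(1, 2, 3))  # poor
```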

Data extraction

Data from included studies were extracted independently by AG and SS, who recorded study design, setting (primary care, outpatient, emergency department or inpatient), medical or surgical specialty, number of patients, timeframe assessed and type of healthcare professionals providing responses. Where data were reported from separate participants, responses were pooled into an overall estimate, with responses from other healthcare professionals then assessed separately in subgroup analyses. Where data were reported from separate time points from the same cohort, these were analysed separately.

Two-by-two tables were compiled for each study and relevant subgroups to determine the predictive value of the Surprise Question. The sensitivity was the proportion of patients who died having received a ‘not surprised’ response, whereas the specificity was the proportion of patients who survived having received a ‘surprised’ response. The positive predictive value (PPV) was the proportion of patients who received a ‘not surprised’ response and subsequently died, and the NPV was the proportion of patients who received a ‘surprised’ response and subsequently survived. Accuracy was the proportion of patients whose outcome was correctly predicted by the Surprise Question. These are presented alongside 95% CIs for the overall comparisons of each study and subgroup analyses, with heterogeneity estimated by the I2 statistic. Event (death) rates were calculated for each study by dividing the total number of deaths by the total cohort size, expressed as a percentage.
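The two-by-two calculations above can be expressed as a short sketch (the function name and the example cell counts are hypothetical, chosen only to illustrate the definitions):

```python
def sq_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Predictive value of the Surprise Question from a two-by-two table.

    tp: 'not surprised' and died;     fp: 'not surprised' and survived;
    fn: 'surprised' and died;         tn: 'surprised' and survived.
    """
    n = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),  # of those who died, flagged 'not surprised'
        "specificity": tn / (tn + fp),  # of survivors, flagged 'surprised'
        "ppv": tp / (tp + fp),          # 'not surprised' responses who died
        "npv": tn / (tn + fn),          # 'surprised' responses who survived
        "accuracy": (tp + tn) / n,      # all correct predictions
        "event_rate": (tp + fn) / n,    # overall proportion who died
    }

m = sq_metrics(tp=30, fp=45, fn=13, tn=112)
print({k: round(v, 2) for k, v in m.items()})
```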

Data analysis

We synthesised estimates of the accuracy of the Surprise Question using a random effects meta-analysis model in STATA V.16 (StataCorp LLC, College Station, Texas), with between-study variance (τ2) estimated by restricted maximum likelihood. Overall comparisons were calculated and then, where appropriate, studies were divided by event rate, setting, specialty, timeframe of follow-up and healthcare professional. Where the Surprise Question was reported separately for different healthcare professional groups, we pooled responses to calculate an overall estimate (if this was not provided in the manuscript), with individual group responses recorded separately. Where studies reported responses for timeframes other than 1 year, these estimates were not included in the overall comparisons and were reported separately.
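As a rough illustration of random-effects pooling of study proportions (the authors used REML in Stata; this sketch substitutes the simpler DerSimonian-Laird estimator of τ2 and a normal-approximation CI, so its numbers are indicative only and the function is ours):

```python
import math

def pool_random_effects(props, ns):
    """Pool per-study proportions (e.g. sensitivities) under a random
    effects model; returns (pooled estimate, 95% CI, I-squared)."""
    # Within-study variance of a proportion p with sample size n: p(1-p)/n
    vs = [p * (1 - p) / n for p, n in zip(props, ns)]
    w = [1 / v for v in vs]
    fixed = sum(wi * p for wi, p in zip(w, props)) / sum(w)
    # Cochran's Q and the DerSimonian-Laird between-study variance tau^2
    q = sum(wi * (p - fixed) ** 2 for wi, p in zip(w, props))
    df = len(props) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights incorporate tau^2
    w_re = [1 / (v + tau2) for v in vs]
    pooled = sum(wi * p for wi, p in zip(w_re, props)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0  # heterogeneity statistic
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), i2

# Hypothetical sensitivities from three studies of different sizes
pooled, ci, i2 = pool_random_effects([0.65, 0.72, 0.80], [120, 300, 90])
print(round(pooled, 2), tuple(round(x, 2) for x in ci), round(i2, 2))
```

Note that studies reporting proportions of exactly 0 or 1 would need a continuity correction before this variance formula applies.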

Results

The electronic database search identified 4062 records, with 2575 articles remaining after the removal of duplicates. Of these, 2494 were excluded after screening of titles and abstracts, usually because they were not relevant or did not report original data (figure 1). Of the 81 retrieved articles, 26 were excluded after full-text review because data were not available to calculate the accuracy of the Surprise Question, even after request to the corresponding author. Two articles were identified from the references of included studies. A total of 57 studies met the full inclusion criteria; however, two reported data from the same population using different timeframes. Our final dataset, therefore, consisted of 56 distinct cohorts, including a total of 68 829 unique patients.

Figure 1

PRISMA flow diagram of article screening process. PRISMA, Preferred Reporting Items for Systematic Review and Meta-analysis.

Study characteristics

The characteristics of the individual studies are displayed in table 2, all of which were prospective, observational cohort studies. Most studies reported data from adult patients (although the age and sex were often not reported). One study was conducted in a paediatric population. The majority of studies reported data from Europe or the USA, but the dataset included studies from all global regions. Forty-two studies chose a timeframe of 1 year to assess the prognostic accuracy of the Surprise Question, with other studies reporting the accuracy between 1 day and 3.3 years. Twenty-three (41.1%) studies were conducted in outpatient settings, 16 (28.6%) in hospitalised patients, 7 (12.5%) in primary care, 5 (8.9%) in the emergency department and 5 (8.9%) in community settings.

Table 2

Characteristics of studies included in systematic review

In the pooled comparison across all 56 studies, the prognostic accuracy of the Surprise Question was modest, with high heterogeneity between studies. The accuracy of individual studies is shown in table 3. The sensitivity of a ‘not surprised’ response was 0.69 ((0.64 to 0.74) I2=97.2%), the specificity was 0.69 ((0.63 to 0.74) I2=99.7%), the PPV was 0.40 ((0.35 to 0.45) I2=99.4%), the NPV was 0.89 ((0.87 to 0.91) I2=99.7%), and the accuracy was 0.71 ((0.68 to 0.75) I2=99.3%).

Table 3

Accuracy of individual studies

Accuracy according to event rates

The overall mortality rates ranged from 2.6% to 81.5%. We divided studies into three groups to explore how the accuracy of the Surprise Question varied by mortality rate. Nineteen studies reported a mortality rate below 14%. In these studies, respondents correctly identified 62% of patients who died (sensitivity 0.62 (0.53 to 0.72) I2=96.3%) and 74% of those who survived (specificity 0.74 (0.64 to 0.83) I2=99.9%). Eighteen studies reported a mortality rate of 14%–23%. In these studies, respondents successfully predicted death in 61% of cases (sensitivity 0.61 (0.55 to 0.66) I2=84.3%) and survival in 77% of cases (specificity 0.77 (0.72 to 0.81) I2=97.0%). More than 23% of patients died in the remaining 19 studies. In these studies, respondents performed best when identifying patients who were likely to die (sensitivity 0.83 (0.79 to 0.87) I2=92.3%), but correctly identified those who survived in only 56% of cases (specificity 0.56 (0.46 to 0.65) I2=98.4%).

Where respondents predicted death, the proportion of patients who actually died was greatest in those studies with an event rate >23% (PPV 0.58 (0.49 to 0.66) I2=99.4%) and lowest in those with an event rate <14% (PPV 0.22 (0.18 to 0.27) I2=98.0%). Conversely, when respondents predicted survival, the proportion of patients who survived was greatest in studies with lower event rates (NPV 0.96 (0.95 to 0.97) I2=98.3%) and lowest in those studies with higher event rates (NPV 0.80 (0.75 to 0.86) I2=96.9%).

Accuracy according to setting

Respondents were most reliably able to identify those patients at risk of dying within the follow-up period in community settings (sensitivity 0.83 (0.71 to 0.95) I2=97.4%), whereas in studies performed in primary care settings, the sensitivity was lowest (0.63 (0.43 to 0.83) I2=97.0%) (figure 2). Conversely, respondents in studies set in primary care were most successful in identifying those who would survive (specificity 0.77 (0.63 to 0.91) I2=99.7%), whereas respondents in community care settings, including nursing homes and hospices, were least able to identify those who survived (specificity 0.52 (0.35 to 0.68) I2=98.5%) (figure 3). When respondents predicted death, this was correct most often in intensive care units (PPV 0.57 (0.46 to 0.68) I2=95.2%) and incorrect most often in the outpatient setting (PPV 0.34 (0.28 to 0.40) I2=97.1%). When respondents predicted survival, this was correct most often for primary care patients (NPV 0.93 (0.88 to 0.98) I2=99.4%) and least often for hospital inpatients (NPV 0.83 (0.75 to 0.91) I2=99.2%).

Figure 2

Sensitivity of the Surprise Question by setting. REML, restricted maximum-likelihood.

Figure 3

Specificity of the Surprise Question by setting. REML, restricted maximum-likelihood.

Accuracy according to specialty

We observed significant heterogeneity in the performance of the Surprise Question according to specialty. Respondents were most successful at predicting death in the paediatric cohort (sensitivity 0.88 (0.75 to 1.02)) and least successful in respiratory cohorts (sensitivity 0.56 (0.36 to 0.77) I2=72.9%) (figure 4). The Surprise Question performed best in acute medical patients when identifying those who were not at risk of dying (specificity 0.92 (0.89 to 0.95)) and worst in oncology patients (specificity 0.57 (0.41 to 0.73) I2=99.1%) (figure 5). The proportion of patients who died when the respondent predicted death ranged from 30% (PPV 0.30 (0.19 to 0.41) I2=97.6%) in cardiology patients to 68% (PPV 0.68 (0.60 to 0.76)) in acute medical patients. The proportion of patients who survived when the respondent predicted survival was generally consistent across specialties but was lowest in oncology patients at 76% (NPV 0.76 (0.60 to 0.91) I2=99.5%). The accuracy was greatest in acute medical patients (0.86 (0.82 to 0.89)) and lowest in cardiology and general medical patients (0.63 (0.54 to 0.71) I2=91.9% and 0.69 (0.55 to 0.83) I2=97.4%, respectively).

Figure 4

Sensitivity of the Surprise Question by specialty. REML, restricted maximum-likelihood.

Figure 5

Specificity of the Surprise Question by specialty. REML, restricted maximum-likelihood.

Accuracy according to follow-up period

Thirteen studies assessed the performance of the Surprise Question over time periods shorter than 1 year.12–24 In these studies, the prompt successfully predicted death in 69% of cases (sensitivity 0.69 (0.56 to 0.82) I2=99.3%) and survival in 65% (specificity 0.65 (0.49 to 0.81) I2=99.9%). The proportion of patients who died when the respondent predicted death was greatest in this subgroup (PPV 0.44 (0.29 to 0.59) I2=99.9%), compared with at 1 year (PPV 0.38 (0.32 to 0.44) I2=99.0%) and over 1 year (PPV 0.36 (0.30 to 0.41) I2=93.4%).

Forty-two studies assessed the prognostic accuracy of the Surprise Question at 1 year.7 14 17 20 25–62 In a pooled comparison, the proportion of correct predictions among all cases was 71% (accuracy 0.71 (0.67 to 0.75) I2=99.2%). The sensitivity of a ‘not surprised’ response was 0.68 ((0.63 to 0.74) I2=95.0%) and the specificity was 0.69 ((0.63 to 0.75) I2=99.7%).

Five studies used a timeframe of greater than 1 year.25 50 63–66 Respondents successfully identified those at risk of dying and those surviving in 71% (sensitivity 0.71 (0.60 to 0.82) I2=93.4%) and 61% (specificity 0.61 (0.43 to 0.78) I2=99.1%), respectively (figures 6 and 7). This was relatively comparable to the other subgroups.

Figure 6

Sensitivity of the Surprise Question by timeframe. REML, restricted maximum-likelihood.

Figure 7

Specificity of the Surprise Question by timeframe. REML, restricted maximum-likelihood.

Accuracy according to respondent

Four studies reported responses to the Surprise Question from different healthcare professionals.7 37 42 63 Studies in which physicians and nurses together provided responses to the Surprise Question achieved the highest proportion of patients successfully identified as at risk of dying (sensitivity 0.71 (0.61 to 0.81) I2=98.7%). Trainee physicians performed worst in identifying those who did not die within the follow-up period (specificity 0.57 (0.49 to 0.64) I2=33.4%). Of those predicted to die within the follow-up period by nurses, 39% did die (PPV 0.39 (0.33 to 0.45) I2=94.3%), compared with 52% when trainee physicians predicted death (PPV 0.52 (0.49 to 0.56) I2=0.0%). Conversely, 83% of patients survived when trainee physicians predicted survival (NPV 0.83 (0.79 to 0.87) I2=0.0%), compared with 93% for advanced practice providers (NPV 0.93 (0.88 to 0.97)). The pooled accuracy was similar between physicians (0.71 (0.67 to 0.75) I2=99.1%) and nurses (0.70 (0.60 to 0.81) I2=99.0%) (figure 8).

Figure 8

Accuracy of the Surprise Question by respondent. REML, restricted maximum-likelihood.

Risk of bias of included studies

Full details of the risk of bias assessment are displayed in table 4. Overall, 41 studies were rated ‘good’ quality, with the remaining 15 studies being rated as ‘poor’ quality. The most common reasons for bias were failure to control for age and/or sex (n=14, 25.0%), failure to control for any other additional factors (n=13, 23.2%) or the method for ascertainment of outcomes not being documented (n=10, 17.9%). However, given the aim of the current study, study quality is unlikely to have had a significant impact on our analysis.

Table 4

Newcastle-Ottawa scale for included studies

Discussion

In this meta-analysis, the accuracy of the Surprise Question was assessed across a diverse range of studies including a total of 68 829 unique patients. In the overall pooled comparison, the accuracy of the Surprise Question was modest, in keeping with prior meta-analyses.8 9 We found its performance varied considerably according to the event rate of the population in which the prompt was applied, the healthcare setting, specialty, follow-up period chosen, and to whom the Surprise Question was posed.

We found that in studies where a greater proportion of the cohort died, clinicians were more reliably able to recognise this, in keeping with previous findings.67 One possible explanation is that where death is common, healthcare providers may become more realistic regarding patient prognosis, or more cognisant of the known predictors of poor outcomes for these patient groups. Prior meta-analyses have shown the Surprise Question to be more accurate in the setting of oncology compared with other disease groups.8–10 In our study, the ability of the Surprise Question to successfully identify those patients who were dying in oncology settings was excellent (PPV 0.85 (0.79 to 0.91) I2=95.7%), and higher than most other disease groups with the exception of the paediatric cohort (PPV 0.88 (0.75 to 1.02)), for which it was similar. It may be the case that patients diagnosed with malignancies exhibit a more predictable and consistent disease progression compared with those with other chronic conditions such as heart failure or chronic respiratory disease, where disease trajectories often display greater variability and unpredictability.68

When studies were divided by timeframe, the rates of identifying patients who were dying were similar (sensitivity <1 year=0.69 (0.56 to 0.82) I2=99.3%; 1 year=0.68 (0.63 to 0.74) I2=95.0%; >1 year=0.71 (0.60 to 0.82) I2=93.4%). A prior meta-analysis found no differences in the accuracy of the Surprise Question when study timeframes shorter than 1 year were included, although in a limited sample of studies.9 The ability of the prompt to identify those who were not at risk of death was lower for timeframes above 1 year (specificity 0.61 (0.43 to 0.78) I2=99.1%) compared with 1 year and <1 year (specificity 0.69 (0.63 to 0.75) I2=99.7% and 0.65 (0.49 to 0.81) I2=99.9%, respectively). The reduced specificity for timeframes exceeding 1 year implies potential inaccuracy in identifying patients unlikely to die over longer periods. This may raise concerns about overestimating the need for end-of-life care, potentially leading to unnecessary interventions for patients not in immediate need. Similar challenges have been observed in other prognostication models, highlighting the importance of cautious interpretation and further refinement in predicting longer term outcomes.69 Patients’ health conditions and anticipated prognoses may change over time, leading to uncertainties in predicting their need for end-of-life care. Additionally, healthcare providers may find it more challenging to accurately assess and predict patients’ needs for end-of-life care further into the future, as this involves a greater degree of uncertainty and more comprehensive assessments.

We found that trainee physicians performed worse than qualified physicians when identifying those who are unlikely to die, in line with other data suggesting that more experienced assessors are more accurate.70–72 A study of paediatricians’ survival predictions for premature newborn babies investigated whether physicians’ self-rated attitude of being an optimist or a pessimist affected prediction accuracy. It found that physicians who rated themselves as optimistic produced survival estimates that were accurate and comparable to true survival rates, while pessimists’ estimates consistently underestimated true survival rates.73 A further study of neonatologists in Italy concurred.74 This discrepancy may stem from the tendency of junior physicians to harbour more pessimistic attitudes, potentially affecting their predictive accuracy when compared with their senior counterparts, who tend to be less pessimistic and more precise in their assessments.

The Surprise Question is a core component of the Gold Standards Framework tool in the United Kingdom, which is recommended for use across primary and secondary healthcare settings to identify those nearing the end-of-life.4 Additionally, the Surprise Question has recently been endorsed in position statements by both the American Heart Association5 and Japanese Cardiology Society/Heart Failure Society.6 Recently, the Centre to Advance Palliative Care convened a consensus panel, which recommended that a ‘not surprised’ response to the Surprise Question should trigger assessment for unmet palliative care needs.67

While use of the Surprise Question is becoming more widespread and it is widely endorsed, there are important practical considerations. One limitation lies in its reliance on the subjective judgement of healthcare practitioners, whose prognostic assessments may vary based on individual experiences and perceptions.75 One way of addressing this is by attempting to reach consensus. One study examined the performance of the Surprise Question when used by a multidisciplinary team. When consensus was restricted to either 100% or 75%–100% agreement among the multidisciplinary team, using a consensus opinion resulted in a slightly lower overall accuracy, but did not significantly affect the prognostication results.26 A further study analysed the agreement of responses to the Surprise Question between different healthcare professionals for patients with heart failure, finding the greatest agreement between cardiologists and heart failure nurse specialists, perhaps reflecting the greater expertise and experience of these healthcare professionals compared with non-specialists.7 A further consideration is that the Surprise Question tends to result in an over-classification of patients as ‘not surprised’. The Surprise Question could, therefore, be a valuable prognostic tool to identify those unlikely to die, and a prompt to consider advance care planning and referral to specialist palliative care services in populations where a nocebo effect from palliative care interventions is not considered likely.

A high false-positive rate may not necessarily be viewed as detrimental to patient care, as it may encourage clinicians to consider early integration of palliative care into the pathway for those in whom death is possible; however, it may have implications for service delivery. A holistic patient assessment is integral to the decision to refer to palliative care services, as opposed to a prognostic estimate alone, which is only one consideration. The possibility of a nocebo effect may be a concern to some; however, a palliative approach is unlikely to be detrimental to patient outcomes where it is implemented alongside usual care and is complementary to it.68 Therefore, while the Surprise Question may be useful for identifying patients who may benefit from early integration of palliative care, it should not be used as the sole determinant of treatment decisions.

Strengths and limitations

Our data have several strengths over previous meta-analyses investigating the accuracy of the Surprise Question. Foremost, we include additional studies due to utilising a broad search strategy, including articles published or in press by 1 January 2024 as well as making requests to corresponding authors for unpublished data. Second, our analysis offers insights across a spectrum of healthcare settings, populations, follow-up intervals, respondents and event rates. Furthermore, each stage of the review process was conducted independently by two reviewers and the study protocol was registered prospectively.

Some limitations should be noted. First, in 12 studies, the respondents to the Surprise Question were ‘physicians and nurses’14 20 26 35 38 44 47 49 50 52 64 65 and data were not available to separately calculate the accuracy of each healthcare professional. However, it should be noted that there is evidence to suggest that multiprofessional predictions on prognosis are more accurate than single-professional estimates.26 76 Second, 26 studies were excluded after full-text review due to unavailability of raw data following requests to corresponding authors, which may result in sample bias.

Conclusions

Our meta-analysis helps define the potential role of the Surprise Question as a prognostic tool in acute and chronic illness. We found that the overall accuracy of Surprise Question was modest, and that it performs best in populations where death is common, when posed over a shorter follow-up period, and to more experienced respondents. Despite its limitations, it may be the case that when considering supportive care, prognostication is less important, as those patients identified by the Surprise Question who do not subsequently die may still benefit from an early integration of a palliative approach into their care. Future studies should address whether integrating the Surprise Question into routine clinical care improves access to palliative care services, facilitates advance care planning and is acceptable to the healthcare team.

Supplemental material


Ethics statements

Patient consent for publication

Ethics approval

Not applicable.

Acknowledgments

The authors acknowledge the support of the National Institute of Health Research Leeds Cardiovascular Clinical Research Facility.

References

Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

Footnotes

  • X @ankkgup, @DrMikeDrozd, @JohnGierula, @KlausKWitte, @DrSamStraw

  • Contributors AG and SS collected the data. AG and SS analysed the data. AG produced the first draft of the manuscript. All other authors provided critical revision of the manuscript. SS is the guarantor of the work and takes overall responsibility for its content.

  • Funding The study was supported by a British Heart Foundation Clinical Research Training Fellowship awarded to Dr Sam Straw (FS/CRTF/20/24071).

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer-reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.