The Surprise Question and clinician-predicted prognosis: systematic review and meta-analysis

Background The Surprise Question, ‘Would you be surprised if this person died within the next year?’, is a simple tool that clinicians can use to identify people in the last year of life. This review aimed to determine the accuracy of this assessment across different healthcare settings, specialties, follow-up periods and respondents.

Methods Searches were conducted of Medline, Embase, AMED, PubMed and the Cochrane Central Register of Controlled Trials, from inception until 01 January 2024. Studies were included if they reported original data on the ability of the Surprise Question to predict survival. For each study (including subgroups), sensitivity, specificity, positive and negative predictive values and accuracy were determined.

Results Our dataset comprised 56 distinct cohorts, including 68 829 patients. In a pooled analysis, the sensitivity of the Surprise Question was 0.69 (0.64 to 0.74; I2=97.2%), specificity 0.69 (0.63 to 0.74; I2=99.7%), positive predictive value 0.40 (0.35 to 0.45; I2=99.4%), negative predictive value 0.89 (0.87 to 0.91; I2=99.7%) and accuracy 0.71 (0.68 to 0.75; I2=99.3%). The prompt performed best in populations with high event rates, over shorter timeframes and when posed to more experienced respondents.

Conclusions The Surprise Question demonstrated modest accuracy, with considerable heterogeneity according to the population to which it was applied and the person to whom it was posed. Prospective studies should test whether the prompt can facilitate timely access to palliative care services, as originally envisioned. PROSPERO registration number CRD32022298236.


INTRODUCTION
The Surprise Question, 'Would you be surprised if this person were to die within the next year?', is a simple prompt, originally developed to help healthcare professionals identify patients nearing the end of life who might require additional support and access to palliative care services. 1 Anticipated prognosis is a major driver of these decisions, and the ability of the Surprise Question to identify those in the last year of life has therefore been assessed across a diverse range of healthcare settings. Although clinician-predicted prognosis is simple and convenient, it may lack accuracy owing to a tendency to overestimate survival. 2 The Surprise Question aims to address this tendency by posing a reflective question as to whether death is possible, rather than likely. 3 The Surprise Question is a core component of the Gold Standards Framework tool in the United Kingdom, which is recommended for use across primary and secondary healthcare settings to identify those nearing the end of life. 4 The use of the Surprise Question to identify those in the last year of life is also endorsed in position statements from both the American Heart Association 5 and Japanese Cardiology Society/Heart Failure Society. 6 Despite its widespread use, the prognostic accuracy of the Surprise Question is uncertain and may depend on the setting in which it is applied, the disease studied, the timeframe chosen, the event (death) rate in the population and the person to whom the question is posed. 7 Previous meta-analyses 8 9 have either not included studies using shorter timeframes or not considered the accuracy of the Surprise Question when used in different healthcare settings. 10 Our first aim was to provide an updated systematic review and meta-analysis of the accuracy of the Surprise Question. Our second aim was to assess its accuracy across populations with different event rates, healthcare settings, specialties and timeframes, and when posed to different healthcare professionals.

WHAT IS ALREADY KNOWN ON THIS TOPIC
⇒ The Surprise Question is a simple tool that could help identify people in the last year of life.
⇒ Current evidence suggests that the Surprise Question has reasonable accuracy for identifying patients at higher risk of mortality, potentially helping clinicians to initiate timely discussions about palliative and end-of-life care.

WHAT THIS STUDY ADDS
⇒ The Surprise Question has modest accuracy for identifying those nearing the end of life, with some inconsistency across settings, specialties and follow-up times.
⇒ The prompt performs best when used in populations with high event rates, when posed over shorter timeframes and when used in inpatient settings.

HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY
⇒ Our meta-analysis helps further refine the role of the Surprise Question as a prognostic tool in acute and chronic illnesses.
⇒ Future research should address whether integrating the Surprise Question into routine clinical care improves access to palliative care services, facilitates advance care planning and is acceptable to the healthcare team.

Systematic review

METHODS
Our study assessed the accuracy of the Surprise Question and was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. 11

Search strategy
The study protocol was registered with PROSPERO (online supplemental file 1). We searched for articles indexed in Medline, Embase, Allied and Complementary Medicine Database (AMED), PubMed, Cochrane Database of Systematic Reviews and the Cochrane Central Register of Controlled Trials from inception until 01 January 2024, including articles being processed at that time. The full search strategy is available in table 1. Briefly, we searched the literature using variations of the search terms "Surprise Question" and "mortality", or "Gold Standards Framework" and "mortality". Additionally, the reference lists of all included articles and review articles were screened manually to identify any additional relevant publications. We limited our search to studies in humans, including both adult and paediatric populations, and to articles published in English or for which an English translation was available. No other filters were applied.

Study selection
AG performed the search, and SS reviewed the search strategy before and during its application to each database. Studies identified from database searches were screened independently by AG and SS. The first selection criterion was that the title or abstract included either 'Surprise Question' or 'Gold Standards Framework'; any study not meeting this criterion was excluded. We placed no restrictions on study design, although at full-text review we required studies to report mortality data divided by whether patients received a 'surprised' or a 'not surprised' response from a healthcare professional; consequently, all included studies were prospective and observational in nature. We excluded studies for which it was not possible to determine the sensitivity, specificity, positive predictive value, negative predictive value (NPV) and accuracy of the Surprise Question for the population studied. Where these data were unavailable but the article appeared potentially relevant, requests for raw data were made to the corresponding authors. No restrictions were placed on the setting, disease studied, timeframe evaluated or healthcare professional providing the response. Any discrepancies were resolved at a meeting between AG and SS; adjudication of unresolved discrepancies by a third reviewer (KW) was available but never required.

Quality assessment of studies
Each study was assessed independently by AG and SS, who then met to discuss the study designs. As the studies were observational, each rater independently completed the Newcastle-Ottawa Scale. Discrepancies could be adjudicated by a third reviewer (KW), although this was never required. The Newcastle-Ottawa Scale rates observational studies in three domains: selection, comparability between the exposed and unexposed groups, and exposure/outcome assessment. The scale assigns a maximum of four stars for selection, two for comparability and three for exposure/outcome. In line with Agency for Healthcare Research and Quality standards, study quality was categorised as good, fair or poor. Good-quality articles received three or four stars in the selection domain, one or two stars in the comparability domain and two or three stars in the exposure/outcome domain. Fair-quality articles received two stars in the selection domain, one or two stars in the comparability domain and two or three stars in the exposure/outcome domain. Finally, poor-quality articles received zero or one star in the selection domain, or zero stars in the comparability domain, or zero or one star in the exposure/outcome domain.
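The good/fair/poor thresholds above partition every possible combination of Newcastle-Ottawa stars, so they can be written as a small decision function. The sketch below is illustrative only: the function name `nos_quality` and its arguments are hypothetical, and it simply encodes the rules as stated in the text rather than reproducing any tool the reviewers used.

```python
def nos_quality(selection: int, comparability: int, outcome: int) -> str:
    """Map Newcastle-Ottawa star counts to a good/fair/poor band.

    Hypothetical helper encoding the thresholds described in the text.
    """
    # Maximum stars per domain: selection 4, comparability 2, exposure/outcome 3.
    if not (0 <= selection <= 4 and 0 <= comparability <= 2 and 0 <= outcome <= 3):
        raise ValueError("star count outside the Newcastle-Ottawa range")
    # Poor: 0-1 selection stars, OR no comparability star, OR 0-1 outcome stars.
    if selection <= 1 or comparability == 0 or outcome <= 1:
        return "poor"
    # Remaining studies have >=1 comparability star and >=2 outcome stars.
    if selection >= 3:
        return "good"
    return "fair"  # exactly two selection stars


print(nos_quality(4, 2, 3), nos_quality(2, 1, 2), nos_quality(4, 0, 3))
```

Checking the poor criteria first means the good and fair branches only need to test the selection domain, because the comparability and outcome requirements for those two bands are identical.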

Data extraction
Data from included studies were extracted independently by AG and SS, who recorded study design, setting (primary care, outpatient, emergency department or inpatient), medical or surgical specialty, number of patients, timeframe assessed and type of healthcare professional providing responses. Where data were reported separately for different respondents, responses were pooled into an overall estimate, and responses from each healthcare professional group were then assessed separately in subgroup analyses. Where data were reported at separate time points from the same cohort, these were analysed separately. Two-by-two tables were compiled for each study and relevant subgroup to determine the predictive value of the Surprise Question. Sensitivity was the proportion of patients who died who had received a 'not surprised' response, whereas specificity was the proportion of patients who survived who had received a 'surprised' response. The positive predictive value (PPV) was the proportion of patients who received a 'not surprised' response and subsequently died, and the NPV was the proportion of patients who received a 'surprised' response and subsequently survived. Accuracy was the proportion of all patients whose outcome was correctly predicted by the Surprise Question. These measures are presented with 95% CIs for the overall comparison of each study and for subgroup analyses, with heterogeneity estimated by the I2 statistic. Event (death) rates were calculated for each study by dividing the total number of deaths by the total cohort size, expressed as a percentage.
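The five accuracy measures and the event rate all come from the same two-by-two table. The sketch below is a minimal illustration with made-up counts; the class and field names are hypothetical. Sensitivity is computed among patients who died and specificity among those who survived, matching the definitions given in the Results (for example, specificity as 'the ability of the prompt to successfully identify those who were not dying').

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """One study's (or subgroup's) Surprise Question responses against outcome."""
    ns_died: int       # 'not surprised' and died      (true positive)
    ns_survived: int   # 'not surprised' and survived  (false positive)
    s_died: int        # 'surprised' and died          (false negative)
    s_survived: int    # 'surprised' and survived      (true negative)


def summarise(t: TwoByTwo) -> dict:
    died = t.ns_died + t.s_died
    survived = t.ns_survived + t.s_survived
    total = died + survived
    return {
        # Among patients who died, share who had a 'not surprised' response.
        "sensitivity": t.ns_died / died,
        # Among patients who survived, share who had a 'surprised' response.
        "specificity": t.s_survived / survived,
        # Among 'not surprised' responses, share of patients who died.
        "ppv": t.ns_died / (t.ns_died + t.ns_survived),
        # Among 'surprised' responses, share of patients who survived.
        "npv": t.s_survived / (t.s_died + t.s_survived),
        # Share of all patients whose outcome matched the response.
        "accuracy": (t.ns_died + t.s_survived) / total,
        # Deaths divided by cohort size, as a percentage.
        "event_rate_pct": 100.0 * died / total,
    }


# Hypothetical counts for illustration only.
example = summarise(TwoByTwo(ns_died=60, ns_survived=90, s_died=20, s_survived=330))
print(example)
```

With these invented counts, the sketch gives sensitivity 0.75, specificity about 0.79, PPV 0.40, NPV about 0.94, accuracy 0.78 and an event rate of 16%.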

Data analysis
We synthesised estimates of the accuracy of the Surprise Question using a random-effects meta-analysis model, fitted with the meta-analysis function in STATA V.16 (StataCorp LLC, College Station, Texas). The between-study variance (τ2) was estimated using restricted maximum likelihood (REML). Overall comparisons were calculated and then, where appropriate, studies were divided by event rate, setting, specialty, timeframe of follow-up and healthcare professional. Where the Surprise Question was reported separately for different healthcare professional groups, we pooled responses to calculate an overall estimate (if this was not provided in the manuscript), with individual group responses recorded separately. Where studies reported responses over timeframes other than 1 year, these estimates were not included in the overall comparisons and were reported separately.
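The pooling step can be sketched as follows. This is a minimal illustration, not the authors' Stata code. It assumes that each study contributes an untransformed proportion (for example, a sensitivity) with a binomial Wald variance p(1 - p)/n, and it estimates τ2 by REML using a simple fixed-point iteration of the REML estimating equation. Some intervals in this report extend beyond 1 (for example, 0.75 to 1.02), which suggests, but does not confirm, that proportions were analysed untransformed. Stata's implementation may differ in its optimiser, convergence criteria and exact I2 definition. All function and parameter names here are hypothetical.

```python
import math
from typing import Sequence


def wald_variance(p: float, n: int) -> float:
    """Binomial (Wald) variance of a proportion p estimated from n patients."""
    return p * (1.0 - p) / n


def reml_random_effects(estimates: Sequence[float], variances: Sequence[float],
                        tol: float = 1e-10, max_iter: int = 10_000) -> dict:
    """Random-effects pooling with tau^2 estimated by REML (fixed-point iteration)."""
    k = len(estimates)
    if k < 2 or len(variances) != k:
        raise ValueError("need at least two studies with matching variances")
    if any(v <= 0 for v in variances):
        # e.g. a study with sensitivity exactly 0 or 1 has zero Wald variance
        raise ValueError("within-study variances must be positive")

    tau2 = 0.0
    for _ in range(max_iter):
        w = [1.0 / (v + tau2) for v in variances]
        sw = sum(w)
        mu = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
        sw2 = sum(wi * wi for wi in w)
        # REML estimating equation rearranged as tau2 = f(tau2), truncated at zero.
        updated = sum(wi * wi * ((yi - mu) ** 2 - vi)
                      for wi, yi, vi in zip(w, estimates, variances)) / sw2 + 1.0 / sw
        updated = max(0.0, updated)
        converged = abs(updated - tau2) < tol
        tau2 = updated
        if converged:
            break

    # Final random-effects weights, pooled estimate and Wald 95% CI.
    w = [1.0 / (v + tau2) for v in variances]
    sw = sum(w)
    mu = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    se = math.sqrt(1.0 / sw)

    # I^2 as tau^2 / (tau^2 + s^2), where s^2 is the Higgins-Thompson "typical"
    # within-study variance (an assumed definition; others exist).
    w0 = [1.0 / v for v in variances]
    s2 = (k - 1) * sum(w0) / (sum(w0) ** 2 - sum(x * x for x in w0))
    i2 = 100.0 * tau2 / (tau2 + s2)
    return {"pooled": mu, "ci95": (mu - 1.96 * se, mu + 1.96 * se),
            "tau2": tau2, "i2_pct": i2}


# Hypothetical sensitivities from three studies, with deaths per study as n.
sens = [0.60, 0.70, 0.80]
deaths = [50, 80, 40]
print(reml_random_effects(sens, [wald_variance(p, n) for p, n in zip(sens, deaths)]))
```

The sketch rejects studies that report a sensitivity or specificity of exactly 0 or 1, because these have zero Wald variance. Handling such studies would require a continuity correction or a logit transformation.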

RESULTS
The search of four electronic databases identified 4062 records, with 2575 articles remaining after the removal of duplicates. Of these, 2494 were excluded after screening of titles and abstracts, usually because they were not relevant or did not report original data (figure 1). Of the 81 retrieved articles, 26 were excluded after full-text review because data were not available to calculate the accuracy of the Surprise Question, even after a request to the corresponding author. Two further articles were identified from the reference lists of included studies. A total of 57 studies met the full inclusion criteria; however, two reported data from the same population using different timeframes. Our final dataset, therefore, consisted of 56 distinct cohorts, including a total of 68 829 unique patients.
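Simple arithmetic reconciles the screening counts above. The sketch only re-derives figures stated in the text, and the variable names are illustrative. The number of duplicates removed (1487) is implied by subtraction and is not reported directly.

```python
# Counts reported in the text (variable names are illustrative).
records_identified = 4062
after_deduplication = 2575
excluded_title_abstract = 2494
excluded_full_text = 26        # accuracy not calculable, even after contacting authors
from_reference_lists = 2
same_population_pairs = 1      # two studies reported one cohort at different timeframes

duplicates_removed = records_identified - after_deduplication
full_texts_retrieved = after_deduplication - excluded_title_abstract
studies_included = full_texts_retrieved - excluded_full_text + from_reference_lists
distinct_cohorts = studies_included - same_population_pairs

print(duplicates_removed, full_texts_retrieved, studies_included, distinct_cohorts)
```

Running it prints `1487 81 57 56`, which agrees with the 81 retrieved articles, 57 included studies and 56 distinct cohorts reported above.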

Study characteristics
The characteristics of the individual studies are displayed in table 2; all were prospective, observational cohort studies. Most studies reported data from adult patients (although age and sex were often not reported), and one study was conducted in a paediatric population. The majority of studies reported data from Europe or the USA, but the dataset included studies from all global regions. Forty-two studies used a timeframe of 1 year to assess the prognostic accuracy of the Surprise Question, with other studies reporting accuracy over timeframes between 1 day and 3.3 years. Twenty-three (41.1%) studies were conducted in outpatient settings, 16 (28.6%) in hospitalised patients, 7 (12.5%) in primary care, 5 (8.9%) in the emergency department and 5 (8.9%) in community settings.

Accuracy according to event rates
The overall mortality rates ranged from 2.6% to 81.5%. We divided studies into three groups to explore how the accuracy of the Surprise Question varied by mortality rate. In the 19 studies in which <14% of patients died, respondents identified 62% of those who died (sensitivity 0.62 (0.53 to 0.72) I2=96.3%) and 74% of those who survived (specificity 0.74 (0.64 to 0.83) I2=99.9%). In the 18 studies with a mortality rate of 14%-23%, respondents identified 61% of those who died (sensitivity 0.61 (0.55 to 0.66) I2=84.3%) and 77% of those who survived (specificity 0.77 (0.72 to 0.81) I2=97.0%). In the remaining 19 studies, in which more than 23% of patients died, sensitivity was highest, with respondents identifying 83% of those who died (sensitivity 0.83 (0.79 to 0.87) I2=92.3%); however, they identified only 56% of those who survived (specificity 0.56 (0.46 to 0.65) I2=98.4%).
Where respondents predicted death, the proportion of patients who actually died was greatest in studies with an event rate >23% (PPV 0.58 (0.49 to 0.66) I2=99.4%) and lowest in those with an event rate <14% (PPV 0.22 (0.18 to 0.27) I2=98.0%). Conversely, when respondents predicted survival, the proportion of patients who survived was greatest in studies with lower event rates (NPV 0.96 (0.95 to
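Much of the pattern in PPV and NPV across event-rate groups follows from Bayes' theorem alone. The sketch below holds sensitivity and specificity fixed at the pooled values from the abstract (both 0.69) and varies only the event rate. It is a qualitative illustration under that simplifying assumption, because in the included studies sensitivity and specificity also shifted with event rate. The function name is hypothetical.

```python
def predictive_values(sensitivity: float, specificity: float,
                      event_rate: float) -> tuple[float, float]:
    """PPV and NPV implied by Bayes' theorem for a given event (death) rate."""
    if not 0.0 < event_rate < 1.0:
        raise ValueError("event_rate must lie strictly between 0 and 1")
    # Expected shares of the whole cohort falling in each cell of the 2x2 table.
    true_pos = sensitivity * event_rate                   # 'not surprised', died
    false_pos = (1.0 - specificity) * (1.0 - event_rate)  # 'not surprised', survived
    false_neg = (1.0 - sensitivity) * event_rate          # 'surprised', died
    true_neg = specificity * (1.0 - event_rate)           # 'surprised', survived
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Pooled sensitivity and specificity from the abstract, held fixed.
for rate in (0.05, 0.15, 0.40):
    ppv, npv = predictive_values(0.69, 0.69, rate)
    print(f"event rate {rate:.0%}: PPV {ppv:.2f}, NPV {npv:.2f}")
```

As the event rate rises from 5% to 40%, PPV increases from about 0.10 to 0.60 and NPV falls from about 0.98 to 0.77. This is the same direction as the grouped estimates reported above.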

Accuracy according to setting
Respondents were most reliably able to identify patients who died within the follow-up period in community settings (sensitivity 0.83 (0.71 to 0.95), I2=97.4%), whereas sensitivity was lowest in studies performed in primary care (0.63 (0.43 to 0.83), I2=97.0%) (figure 2). Conversely, respondents in primary care studies were most successful at identifying those who would survive (specificity 0.77 (0.63 to 0.91), I2=99.7%), whereas respondents in community settings, including nursing homes and hospices, were least able to do so (specificity 0.52 (0.35 to 0.68), I2=98.5%) (figure 3). A prediction of death was most often correct in intensive care units (PPV 0.57 (0.46 to 0.68), I2=95.2%) and least often correct in the outpatient setting (PPV 0.34 (0.28 to 0.40), I2=97.1%). A prediction of survival was most often correct in primary care patients (NPV 0.93 (0.88 to 0.98), I2=99.4%) and least often correct in hospital inpatients (NPV 0.83 (0.75 to 0.91), I2=99.2%).

Accuracy according to specialty
We observed considerable heterogeneity in the performance of the Surprise Question according to specialty. Sensitivity was highest in the paediatric cohort (sensitivity 0.88 (0.75 to 1.02)) and lowest in respiratory cohorts (sensitivity 0.56 (0.36 to 0.77) I2=72.9%) (figure 4). Specificity was highest in acute medical patients (specificity 0.92 (0.89 to 0.95)) and lowest in oncology patients (0.57 (0.41 to 0.73) I2=99.1%) (figure 5). The proportion of patients who died when the respondent predicted death ranged from 30% in cardiology patients (PPV 0.30 (0.19 to 0.41) I2=97.6%) to 68% in acute medical patients (PPV 0.68 (0.60 to 0.76)). The proportion of patients who survived when the respondent predicted survival was generally consistent across specialties but was lowest, at 76%, in oncology patients (NPV 0.76 (0.60 to 0.91) I2=99.5%). Accuracy was greatest in acute medical patients (0.86 (0.82 to 0.89)) and lowest in cardiology and general medical patients (0.63 (0.54 to 0.71) I2=91.9% and 0.69 (0.55 to 0.83) I2=97.4%, respectively).

Accuracy according to follow-up period
… 13–24 In these studies, the prompt identified 69% of patients who died (sensitivity 0.69 (0.56 to 0.82) I2=99.3%) and 65% of those who survived (specificity 0.65 (0.49 to 0.81) I2=99.9%).

Sensitivity (the ability of the prompt to successfully identify those patients who were dying); specificity (the ability of the prompt to successfully identify those who were not dying); positive predictive value (the proportion of patients who died when the respondent predicted death); negative predictive value (the proportion of the patients who survived when the respondent predicted survival); accuracy (the proportion of correct predictions among all cases).

Accuracy according to respondent
Four studies reported responses to the Surprise Question from different healthcare professionals. 7 37 42 63 Studies where physicians and nurses together provided

Risk of bias of included studies
Full details of the risk of bias assessment are displayed in table 4. Overall, 41 studies were rated 'good' quality, and the remaining 15 were rated 'poor' quality. The most common sources of bias were failure to control for age and/or sex (n=14, 25.0%), failure to control for any additional factors (n=13, 23.2%) and failure to document the method of outcome ascertainment (n=10, 17.9%). However, given the aim of the current study, study quality is unlikely to have had a significant impact on our analysis.

DISCUSSION
In this meta-analysis, the accuracy of the Surprise Question was assessed across a diverse range of studies including a total of 68 829 unique patients. In the overall pooled comparison, the accuracy of the Surprise Question was modest, in keeping with prior meta-analyses. 8 9 We found that its performance varied considerably according to the event rate of the population in which the prompt was applied, the healthcare setting, the specialty, the follow-up period chosen and the person to whom the Surprise Question was posed.
We found that in studies where a greater proportion of the cohort died, clinicians were more reliably able to recognise this, in keeping with previous findings. 67 One possible explanation is that where death is common, healthcare providers may become more realistic regarding patient prognosis, or more cognisant of the known predictors of poor outcomes for these patient groups. 10 In our study, the ability of the Surprise Question to identify patients who were dying in oncology settings was excellent (sensitivity 0.85 (0.79 to 0.91) I2=95.7%). It was higher than in most other disease groups, with the exception of the paediatric cohort (sensitivity 0.88 (0.75 to 1.02)), for which it was similar. Patients diagnosed with malignancies may exhibit a more predictable and consistent disease progression than those with other chronic conditions such as heart failure or chronic respiratory disease, where disease trajectories often display greater variability and unpredictability. 68

When studies were divided by timeframe, sensitivity was similar (<1 year: 0.69 (0.56 to 0.82) I2=99.3%; 1 year: 0.68 (0.63 to 0.74) I2=95.0%; >1 year: 0.71 (0.60 to 0.82) I2=93.4%). A prior meta-analysis, albeit in a limited sample of studies, found no differences in the accuracy of the Surprise Question when study timeframes shorter than 1 year were included. 9 The ability of the prompt to identify those who were not at risk of death was lower for timeframes above 1 year (specificity 0.61 (0.43 to 0.78) I2=99.1%) than for 1 year and <1 year (specificity 0.69 (0.63 to 0.75) I2=99.7% and 0.65 (0.49 to 0.81) I2=99.9%, respectively). The reduced specificity for timeframes exceeding 1 year suggests that the prompt may be less accurate in identifying patients unlikely to die over longer periods. This may raise concerns about overestimating the need for end-of-life care, potentially leading to unnecessary interventions for patients not in immediate need. Similar challenges have been observed in other prognostication models, highlighting the importance of cautious interpretation and further refinement in predicting longer term outcomes. 69 Patients' health conditions and anticipated prognoses may change over time, creating uncertainty in predicting their need for end-of-life care. Additionally, healthcare providers may find it more challenging to assess patients' needs for end-of-life care further into the future, as this involves a greater degree of uncertainty and more comprehensive assessment.

Table 4 Continued
… 71 72 A study of paediatricians' survival predictions for premature newborns investigated whether physicians' self-rated optimism or pessimism affected prediction accuracy. Physicians who rated themselves as optimistic produced survival estimates that were accurate and comparable to true survival rates, whereas the estimates of pessimists consistently underestimated true survival rates. 73 A further study of neonatologists in Italy concurred. 74 This discrepancy may stem from a tendency for junior physicians to harbour more pessimistic attitudes, potentially reducing their predictive accuracy compared with senior colleagues, who tend to be less pessimistic and more precise in their assessments.
The Surprise Question is a core component of the Gold Standards Framework tool in the United Kingdom, which is recommended for use across primary and secondary healthcare settings to identify those nearing the end of life. 4 Additionally, the Surprise Question has recently been endorsed in position statements by both the American Heart Association 5 and Japanese Cardiology Society/Heart Failure Society. 6 Recently, the Center to Advance Palliative Care convened a consensus panel, which recommended that a 'not surprised' response to the Surprise Question should trigger assessment for unmet palliative care needs. 67 Although the Surprise Question is becoming more widespread and is widely endorsed, there are important practical considerations. One limitation is its reliance on the subjective judgement of healthcare practitioners, whose prognostic assessments may vary with individual experiences and perceptions. 75 One way of addressing this is to seek consensus. One study examined the performance of the Surprise Question when used by a multidisciplinary team. Restricting the response to a consensus of either 100% or 75%-100% agreement among the team resulted in slightly lower overall accuracy but did not significantly affect the prognostication results. 26 A further study analysed agreement in responses to the Surprise Question between different healthcare professionals for patients with heart failure. The greatest agreement was between cardiologists and heart failure nurse specialists, perhaps reflecting the greater expertise and experience of these professionals compared with non-specialists. 7

A further consideration is that the Surprise Question tends to over-classify patients as 'not surprised'. The Surprise Question could, therefore, be a valuable prognostic tool for identifying those unlikely to die. It could also serve as a prompt to consider advance care planning and referral to specialist palliative care services in populations where a nocebo effect from palliative care interventions is not considered likely.
A high false-positive rate need not be viewed as detrimental to patient care, as it may encourage clinicians to consider early integration of palliative care into the care pathway for patients in whom death is possible; however, it may have implications for service delivery. A holistic patient assessment is integral to the decision to refer to palliative care services; a prognostic estimate alone is only one consideration. The possibility of a nocebo effect may concern some clinicians; however, a palliative approach is unlikely to be detrimental to patient outcomes where it is implemented alongside, and is complementary to, usual care. 68 Therefore, although the Surprise Question may be useful for identifying patients who may benefit from early integration of palliative care, it should not be used as the sole determinant of treatment decisions.
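As a rough illustration of the service-delivery implication, the pooled PPV reported in the abstract (0.40) can be restated as the number of patients who receive a 'not surprised' response and survive the timeframe for each such patient who dies. This calculation uses a single pooled value and ignores the marked between-study heterogeneity (I2=99.4%), so it should be read only as an order of magnitude.

```latex
\frac{\text{survivors among 'not surprised'}}{\text{deaths among 'not surprised'}}
  = \frac{1 - \mathrm{PPV}}{\mathrm{PPV}}
  = \frac{1 - 0.40}{0.40}
  = 1.5
```

In other words, among patients flagged as 'not surprised', roughly three survive for every two who die.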

Strengths and limitations
Our data have several strengths over previous meta-analyses investigating the accuracy of the Surprise Question. First, we included additional studies by using a broad search strategy, covering articles published or in press by 1 January 2024, and by requesting unpublished data from corresponding authors. Second, our analysis offers insights across a spectrum of healthcare settings, populations, follow-up intervals, respondents and event rates. Furthermore, each stage of the review process was conducted independently by two reviewers, and the study protocol was registered prospectively. Some limitations should be noted. First, in 12 studies, the respondents to the Surprise Question were 'physicians and nurses' 14 20 26 35 38 44 47 49 50 52 64 65 and data were not available to calculate the accuracy of each healthcare professional separately. However, there is evidence to suggest that multiprofessional predictions of prognosis are more accurate than single-professional estimates. 26 76 Second, 26 studies were excluded after full-text review due to

Figure 1 PRISMA flow diagram of the article screening process. PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses.

Figure 2 Sensitivity of the Surprise Question by setting. REML, restricted maximum likelihood.

Figure 3 Specificity of the Surprise Question by setting. REML, restricted maximum likelihood.

Figure 4 Sensitivity of the Surprise Question by specialty. REML, restricted maximum likelihood.

Figure 5 Specificity of the Surprise Question by specialty. REML, restricted maximum likelihood.

Figure 6 Sensitivity of the Surprise Question by timeframe. REML, restricted maximum likelihood.

Figure 7 Specificity of the Surprise Question by timeframe. REML, restricted maximum likelihood.

Figure 8 Accuracy of the Surprise Question by respondent. REML, restricted maximum likelihood.

Table 1
Search strategy
… OR ((GSF OR gold standards framework) AND (dying OR death OR mortality OR survival OR die OR outcome OR outcomes OR palliative OR end of life)))    21
CCRCT    (((Surprise OR surprize OR surprising OR surprised) AND (question or questions) AND (dying OR death OR mortality OR survival OR die OR outcome OR outcomes OR palliative OR end of life)) OR ((GSF OR gold standards framework) AND (dying OR death OR mortality OR survival OR die OR outcome OR outcomes OR palliative OR end of life)))    207

Table 2
Characteristics of studies included in systematic review

Table 2 Continued

Table 3 Continued
