Background The Prognosis in Palliative care Scale (PiPS) predicts survival in advanced cancer patients more accurately than a doctor's or a nurse's estimate. PiPS scores are derived using observer ratings of symptom severity and performance status. The purpose of this study was to determine whether patient-rated data would provide better prognostic estimates than clinician observer ratings.
Patients and methods 1018 subjects with advanced cancer no longer undergoing tumour-directed therapy were recruited to a multi-centre study. Prognostic models were developed using observer ratings, patient ratings or a composite method that used patient ratings when available or else used observer ratings. The performance of the prognostic models was compared by determining the agreement between the models' predictions and the survival of study participants.
Results All three approaches to model development resulted in prognostic scores that were able to differentiate between patients with a survival of ‘days’, ‘weeks’ or ‘months+’. However, the observer-rated models were significantly (p<0.05) more accurate than the patient-rated models.
Conclusions A prognostic model derived using observer-rated data was more accurate at predicting survival than a similar model derived using patient self-report measures. This is clinically important because patient-rated data can be burdensome and difficult to obtain in patients with terminal illnesses.
Terminally ill patients, their families and healthcare providers want sufficient information to allow them to ‘prepare’ adequately for the end of life.1 This includes wanting to know when death is likely to occur and how the patient's physical condition can be expected to deteriorate. Unfortunately, clinicians are poor at predicting survival in patients with advanced cancer.2 Prognostic questions arise on a day-to-day basis in clinical practice and any method that is superior to (or independent of) clinician estimates would be highly valued.
The Prognosis in Palliative care Scale (PiPS) was developed as a result of a large prospective multi-centre study involving over 1000 patients with advanced cancer.3 PiPS can be used in both competent and incompetent patients and is able to predict survival in terms of ‘days’, ‘weeks’ or ‘months+’. Alternative forms of the scale are available for use in patients in whom blood test results are available (PiPS-B) and in patients in whom such results are unavailable (PiPS-A). Both scales are at least as accurate as a multi-professional estimate of survival and PiPS-B is significantly better than an individual doctor's or nurse's prediction. Both scales use clinician proxy ratings about the presence or absence of key symptoms, level of physical functioning and overall health of the patient.
However, there are limitations to using proxy ratings for subjective measures and as part of the PiPS study we wanted to determine whether or not clinician ratings could be substituted for patient reported outcomes without adversely affecting the accuracy of the scales. On a practical level, this is clearly important because many palliative care patients are incompetent and a prognostic scale that was only reliable when completed by a competent patient would have limited clinical utility. We therefore undertook a planned subanalysis of the PiPS data set to determine whether patient reports or clinician estimates were superior in terms of prognostic accuracy.
Patients and methods
A detailed report of the methods used in the PiPS study has been published elsewhere.3 Briefly, consecutive referrals to participating palliative care units were screened for inclusion. Eligible patients had locally advanced or metastatic cancer and were no longer undergoing disease-modifying treatment. Both competent and incompetent patients were recruited. Competence to participate in the research was assessed by the attending clinician using Department of Health guidance.4 Competent patients gave written informed consent and the assent of carers was obtained for incompetent patients. A core data set was collected on all participants and an extended data set was collected from competent patients.
Core data set
The researcher collected data on the presence or absence of the following symptoms: pain, breathlessness at rest, loss of appetite, dry mouth, difficulty swallowing and tiredness. Performance status was assessed using the Eastern Cooperative Oncology Group (ECOG)5 score. ECOG scores vary between 0 and 4 (0 = normal functional abilities, 4 = confined to a bed or chair and requires all care). Global health status was recorded using a study-specific 7-point scale (1 = extremely poor health, 7 = normal health).
Other information collected by the research team was obtained directly from patients' notes (eg, demographic and disease-related variables) or from direct observation (eg, pulse rate or the presence of peripheral oedema).
Extended data set
Competent patients were asked to provide self-reports on all of the proxy domains assessed by clinicians. Thus, paired data (patient and clinician) were available for key symptoms, performance status and global health score.
Nurse, doctor and multi-professional estimates of survival were also obtained along with patient-reported estimates of survival. These results are presented elsewhere. Participants were also ‘flagged’ with the National Health Service Information Centre, so that the research team was informed when the patient died.
In order to create a prognostic score that could estimate survival in terms of ‘days’, ‘weeks’ or ‘months+’, it was necessary to develop two separate prognostic models: one to predict survival up to 2 weeks (14 days) and another to predict survival up to 2 months (56 days). These two models could then be combined to predict whether a patient would survive for ‘days’ (ie, less than 2 weeks), ‘weeks’ (between 2 weeks and 2 months) or ‘months+’ (ie, 2 months or greater).
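The three-category scheme described above can be sketched in a few lines of Python. The thresholds (14 and 56 days) come from the text; the function names, the use of predicted survival probabilities and the 0.5 cut-off in `combine_predictions` are illustrative assumptions, since the exact rule used to combine the two PiPS models is not stated here.

```python
DAYS_THRESHOLD = 14    # 'days' means survival of less than 14 days
MONTHS_THRESHOLD = 56  # 'months+' means survival of 56 days or more


def survival_category(days_survived: int) -> str:
    """Map an observed survival time (in days) to the study's three categories."""
    if days_survived < DAYS_THRESHOLD:
        return "days"
    if days_survived < MONTHS_THRESHOLD:
        return "weeks"
    return "months+"


def combine_predictions(p_survive_14: float, p_survive_56: float,
                        cutoff: float = 0.5) -> str:
    """Hypothetical combination rule (an assumption, not the published one).

    p_survive_14 and p_survive_56 stand for the probabilities, from the
    14-day and 56-day models, that the patient survives beyond each threshold.
    """
    if p_survive_14 < cutoff:
        return "days"      # unlikely to survive 14 days
    if p_survive_56 < cutoff:
        return "weeks"     # likely to pass 14 days but not 56 days
    return "months+"       # likely to survive 56 days or more
```

For example, `survival_category(20)` falls in the ‘weeks’ band, and a patient judged likely to pass 14 days but unlikely to reach 56 days would be assigned ‘weeks’ under the assumed rule.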
The original PiPS observer-rated models (PiPS/OR) were developed using backward stepwise logistic regression. In order to correct for over-optimism during model development, the models were further evaluated using a bootstrap technique.6,7 The PiPS/OR-14 models were used to predict survival up to 14 days both in patients in whom blood results were available (PiPS/OR-B14) and in those in whom blood results were not available (PiPS/OR-A14). Separate models were also created to predict survival up to 56 days (PiPS/OR-B56 and PiPS/OR-A56, respectively). The performance of each of these models was assessed by calculating the area under the receiver operating characteristic curve (AUC). Finally, the models were combined to provide an estimated prognosis in one of three distinct categories: ‘days’ (<14 days), ‘weeks’ (14–55 days) and ‘months+’ (>55 days). The performance of these combined models was assessed by calculating the absolute agreement between the model prediction and actual survival and by use of the linear-weighted κ statistic, which gives an average measure of chance-corrected agreement for the two thresholds.8 The relative prognostic value of the models was compared using the AUC statistic, the linear-weighted κ and the level of absolute agreement between model predictions and actual survival.
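As an illustration of the two agreement measures named above, the following minimal sketch computes absolute agreement and a weighted κ for three ordered categories. It uses the standard textbook definition of weighted κ (disagreement weights |i − j|/(k − 1) raised to a power); it is not the study's own analysis code, and the function and variable names are hypothetical.

```python
CATEGORIES = ["days", "weeks", "months+"]  # ordered from shortest to longest


def absolute_agreement(predicted, actual):
    """Proportion of cases in which prediction and outcome match exactly."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(predicted)


def weighted_kappa(predicted, actual, categories=CATEGORIES, power=1):
    """Weighted kappa; power=1 gives linear weights, power=2 quadratic.

    Raises ZeroDivisionError if every rating falls in a single category,
    because chance-expected disagreement is then zero.
    """
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(predicted)

    # Observed joint proportions (rows: predicted, columns: actual).
    observed = [[0.0] * k for _ in range(k)]
    for p, a in zip(predicted, actual):
        observed[index[p]][index[a]] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    disagreement_obs = 0.0
    disagreement_exp = 0.0
    for i in range(k):
        for j in range(k):
            weight = (abs(i - j) / (k - 1)) ** power
            disagreement_obs += weight * observed[i][j]
            disagreement_exp += weight * row[i] * col[j]
    return 1.0 - disagreement_obs / disagreement_exp


# Toy data for illustration only (not study data).
predicted = ["days", "weeks", "months+", "months+"]
actual = ["days", "weeks", "weeks", "months+"]
agreement = absolute_agreement(predicted, actual)   # 0.75
kappa_linear = weighted_kappa(predicted, actual)    # 5/7, about 0.71
```

With linear weights, a prediction that is one category out (for example ‘weeks’ instead of ‘days’) is penalised half as heavily as one that is two categories out.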
Two further groups of models were also created. The PiPS patient-rated models (PiPS/PR) were developed using the self-report data from competent patients. The PiPS composite-scoring models (PiPS/CM) were created using a substitution rule: when patient-generated data were available, these were used; when patient-rated data were missing, observer-rated data were used instead. As with PiPS/OR, both 14-day and 56-day survival models were created and a combined categorical model was produced (‘days’, ‘weeks’ and ‘months+’).
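The substitution rule for the composite models amounts to a simple fallback, sketched below. The item names and the dictionary representation are hypothetical, and `None` is assumed to stand for a missing rating.

```python
from typing import Dict, Optional


def composite_rating(patient: Optional[int], observer: Optional[int]) -> Optional[int]:
    """Use the patient's rating when present, otherwise the observer's."""
    return patient if patient is not None else observer


def composite_record(patient: Dict[str, Optional[int]],
                     observer: Dict[str, Optional[int]]) -> Dict[str, Optional[int]]:
    """Apply the substitution rule item by item across one participant's record."""
    return {item: composite_rating(patient.get(item), observer[item])
            for item in observer}


# Illustrative values only: the patient completed the pain item but not ECOG.
patient_ratings = {"pain": 1, "ecog": None}
observer_ratings = {"pain": 0, "ecog": 3}
combined = composite_record(patient_ratings, observer_ratings)
# combined == {"pain": 1, "ecog": 3}
```

Note that a patient rating of 0 (symptom absent) is kept rather than replaced, because the check is for a missing value and not for a falsy one.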
Quadratic-weighted κ was used to evaluate the concordance between patient-rated and observer-rated symptom scores. Although weighted κ scores are the preferred method of comparing the concordance between predicted survival and actual survival, they do not lend themselves easily to an intuitive interpretation of meaning. Landis and Koch9 have proposed the following guide for interpreting the κ coefficient: <0 = poor, 0.00–0.20 = slight, 0.21–0.40 = fair, 0.41–0.60 = moderate, 0.61–0.80 = substantial and 0.81–1.00 = almost perfect. However, these cut-off points are by nature rather arbitrary and do not take into account factors such as the weighting applied (linear or quadratic), the relative prevalence of the categories or the number of categories being compared.
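For readers who want to apply the Landis and Koch guide programmatically, a minimal sketch follows. The function name is hypothetical, and the labels carry the same caveat as the guide itself: the bands are arbitrary and ignore the weighting scheme and category prevalence.

```python
def landis_koch_label(kappa: float) -> str:
    """Return the Landis and Koch descriptor for a kappa coefficient."""
    if kappa > 1.0:
        raise ValueError("kappa cannot exceed 1")
    if kappa < 0.0:
        return "poor"
    # Upper bound of each band, in ascending order.
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise AssertionError("unreachable: kappa is within [0, 1]")
```

Under this guide, a κ of 0.27 would be labelled ‘fair’ and a κ of 0.68 ‘substantial’.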
Results
Paired data (matched patient and observer ratings) were available for 708 subjects (mean age = 71.7 years, proportion of men 399/708, 56%; median survival 50 days). The levels of agreement between patients' and clinicians' ratings are shown in table 1. Concordance was the lowest for ‘dry mouth’ (quadratic-weighted κ=0.27) and the highest for ECOG performance status (quadratic-weighted κ=0.68).
In total, 12 different prognostic models were created: four models derived using observer ratings only (PiPS/OR-A14, PiPS/OR-A56, PiPS/OR-B14 and PiPS/OR-B56), four models using patient-rated data only (PiPS/PR) and four models using composite data (PiPS/CM). The variables included in each of the models are shown in table 2. The PiPS/OR models had greater AUCs than either the PiPS/PR or PiPS/CM models (table 2), although it should be noted that these models were developed in slightly different populations (due to the differing pattern of missing data).
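For orientation, the AUC used to compare the models can be read as the probability that a randomly chosen patient who had the outcome receives a higher model score than a randomly chosen patient who did not. The sketch below computes it directly from that definition; the function name and the toy data are hypothetical, and this pairwise approach is one standard way of obtaining the statistic rather than the study's own code.

```python
def auc(scores, outcomes):
    """Area under the ROC curve via pairwise comparison.

    scores: model outputs (higher means the outcome is judged more likely).
    outcomes: 1 if the outcome occurred, 0 otherwise.
    Ties between a positive and a negative case count as half a win.
    """
    positives = [s for s, y in zip(scores, outcomes) if y == 1]
    negatives = [s for s, y in zip(scores, outcomes) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one case in each outcome group")

    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data for illustration only (not study data).
example_auc = auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0])  # 0.75
```

An AUC of 0.5 corresponds to discrimination no better than chance and 1.0 to perfect discrimination.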
The models for 14-day and 56-day survival were then combined to produce an overall prognostic prediction in terms of ‘days’ (<14 days), ‘weeks’ (14–55 days) or ‘months+’ (>55 days) for each of the three types of data (observer-rated, patient-rated or composite). The accuracy of these models was evaluated by comparing the agreement between the model predictions and actual survival (table 3). Both of the observer-rated models provided statistically superior agreement with actual survival compared with the patient-rated models (p<0.001). The models developed using a composite rating method were not significantly better than predictions derived using observer ratings alone.
Discussion
Statement of principal findings
We have found that observer ratings about the presence or absence of key symptoms, performance status and global health status are at least as good as patient-reported or composite scores in terms of prognostic ability. This is an important finding. Observer ratings are much easier to obtain and do not rely on the competence or otherwise of patients. Moreover, even in competent patients, obtaining patient-reported data imposes an extra burden on an already vulnerable population and does not improve prognostic ability.
Strengths and weaknesses
This was a large study that was specifically designed to answer the question as to whether patient reported outcomes provided better prognostic information than observer ratings or a mixture of the two. As far as we are aware, no previous study has systematically addressed this issue. The strengths of our study include the large sample size, the prospective design, the inclusion of both competent and incompetent patients and the use of bootstrap statistical methods to minimise over-optimism during model development. However, there were a number of limitations. No specific instructions were given to clinicians about how to obtain the proxy ratings. Clinicians were simply asked to indicate whether they believed the patient to be experiencing the presence or absence of particular symptoms. Some clinicians may have obtained these data by directly asking the patient themselves, from direct observation of patient behaviour, by discussion with other members of the healthcare team or by information obtained from carers. This unstructured approach was adopted because it reflected the reality of the clinical situation and one of the key features of the design of the PiPS study was an attempt to maintain the ecological validity of the research process. Nonetheless, this lack of standardisation means that we cannot give specific advice about the best way to obtain proxy ratings for prognostic decisions and it is possible that other clinicians may not perform as well at providing ratings as the clinicians who participated in this study.
For similar reasons, our desire to keep all assessments to a minimum meant that key terms were not precisely defined. Thus, for instance, clinicians were asked to judge whether or not a patient was experiencing pain (yes/no), but no instructions were given about what level of pain should be considered significant, how long or how frequently the patient had been experiencing pain, or whether it was adequately controlled with analgesia. Given the relatively large number of symptoms that were assessed during this study, it was considered that the use of more structured symptom assessment tools (such as the Brief Pain Inventory)10 would have imposed too great a burden on both patients and clinicians and (even if it worked in a research setting) would be likely to lead to an unwieldy prognostic instrument.
It should also be remembered that none of the prognostic models described in this paper has yet undergone external validation in an independent data set.11 As such, the assessment of the performance of each of the models can only be considered to be provisional.
Relationship to other studies
A number of studies have previously investigated the reliability or otherwise of using proxy ratings by clinicians or carers when patient self-report data are unavailable.12–14 The Palliative Care Outcome Scale is a widely used audit tool for patients with advanced disease.15 It consists of 10 items covering pain, other physical symptoms, psychological distress, social concerns and quality of life. There is both a patient-rated and a staff-rated component to the scale. During the validation of the Palliative Care Outcome Scale, 145 subjects (32% of the study population) were able to complete the patient-rated component of the scale. Patient-rated scores were found to correlate acceptably well with staff-rated measures (linear-weighted κ values were >0.3 for 8/10 items).
Martin and colleagues16 have recently reported on the prognostic significance of nutritional variables and patient-reported performance status in patients with advanced cancer. They studied 1767 palliative cancer patients who completed the Patient Generated Subjective Global Assessment (PG-SGA). The PG-SGA is a nutritional screening tool including self-report data on weight change, dietary intake and gastrointestinal symptoms.17 It also includes a self-report version of the ECOG performance status. Clinicians completed the Palliative Performance Scale (PPS) for each of the study participants. The PPS is an observer-rated performance status measure which has been specifically developed for use in patients with advanced disease.18 The authors reported that the most parsimonious predictive model included only two variables, diagnosis and performance status. Patient-reported and clinician-rated performance status measures were equally good at predicting survival in this population, leading Martin and coworkers16 to recommend that patient-reported performance status may be of considerable practical utility as a prognostic tool. In contrast, we found that patient-reported outcomes (including patient-rated ECOG scores) provided less accurate prognostic information than clinician ratings. Moreover, we found that many patients (24% of study participants) were unable to complete patient rating scales because of cognitive impairment.
The difference between our findings and Martin's may at least partly be explained by differences in the disease burden of the participants. The median survival of patients in Martin's study was 3.2 months (90 days), compared with 34 days for the 1018 subjects in our own study3 and the proportion of patients with ECOG scores between 0 and 2 was 52% in Martin's study compared with our figure of 29%.
Quality of life measures are known to have prognostic significance across a variety of different cancers.19 In terms of straightforward quality of life assessment, the patient's perspective is usually taken to represent the ‘gold standard’. One might therefore have expected that observer ratings of symptoms, performance status and overall health would be a poor substitute for patient-generated data. In this large, multi-centre, prospective study we found that observer ratings are acceptable and may in fact be superior to patient-reported data for the purposes of prognostication. This is important because it is not always possible or practical to obtain patient-reported data from patients with advanced cancer, whether because of debility, fatigue or cognitive impairment.
However, it is not entirely clear why observer ratings should be so effective at providing prognostic information. Is it possible that patients are not the best judges of their own performance status? Theoretically, performance status (as measured by ECOG) could be ‘objectively’ assessed using real-time activity monitoring. Actimeters are portable accelerometers20 which can record the proportion of the day that subjects are active, and have been used extensively in studies of fatigue, sleep disturbance and circadian rhythms.21–23 Patients with ‘paradoxical insomnia’24 under-report the amount of sleep that they experience. Might real-time activity monitoring also reveal that some patients are similarly inaccurate in estimating their level of day-time activity? Inclusion of actimeters in future prognostic studies may provide a valuable insight into whether patients' or clinicians' perspectives more closely mirror reality.
Although activity monitoring may provide an objective measure of performance status, no such ‘external’ measure exists with which to assess the veracity of patients' reports of symptoms or global health status. It must be assumed therefore that patients' reports have the legitimacy of representing the way that the patients themselves truly feel about their situation. However, just because patients' reports may be the most important source of information about the subjective experience of being ill, it does not necessarily follow that their own assessments will carry more prognostic significance. It is conceivable that clinicians' assessments are more valuable precisely because they are made through the filter of a healthcare professional's clinical experience. Thus, some patients may report that they are not experiencing pain when in fact they have merely grown accustomed to a low level of background discomfort which they now consider normal. This phenomenon of resetting the internal calibration for symptom appreciation has been termed a ‘response shift’.25 Patients' self-reports may also be affected by numerous factors that are difficult to quantify, such as cultural background or personal and family history.26 In these circumstances, a clinician's assessment that the patient is in fact in pain may be more relevant than the patient's own perspective. The restricted nature of the data collected in this study makes it impossible to explore these issues further. However, future researchers might want to consider undertaking nested qualitative interviews with both patients and clinicians to explore how they arrived at their overall assessments of symptom prevalence and global health status.
SB is funded by Macmillan Cancer Support and the NIHR CLAHRC (Collaborations for Leadership in Applied Health Research and Care) for Cambridgeshire and Peterborough. The authors would like to thank the following colleagues for their help with this study: Rehana Bakawala, Professor Mike Bennett, Teresa Beynon, Dr Cath Blinman, Dr Patricia Brayden, Helen Brunskill, Dr Kate Crossland, Dr Alison Cubbitt, Rachel Glascott, Anita Griggs, Anne Harbison, Debra Hart, Dr Philip Lomax, Dr Caroline Lucas, Dr Wendy Makin, Dr Oliver Minton, Dr Paul Perkins, Marek Plaskota, Dr Dai Roberts, Katie Richies, Dr Susan Salt, Ileana Samanidis, Dr Margaret Saunders, Dr Jennifer Todd, Dr Catherine Waight, Dr Nicola Wilderspin, Dr Gail Wiley and Julie Young. The authors would also like to thank Professor John Ellershaw for chairing the steering committee and Robert Godsill for providing a service user perspective. The authors thank Rosie Head for administrative support and data management. The authors would also like to thank the following hospices and palliative care units for their participation in the study: Arthur Rank House (Cambridge), Worcestershire Royal Hospital, St John's Hospice (Lancaster), Gloucestershire Hospitals NHS Foundation Trust, The Pasque Hospice (Luton), Guy's and St Thomas' NHS Foundation Trust (London), Princess Alice Hospice (Esher), Bolton Hospice, St Catherine's Hospice (Crawley), St George's Hospital NHS Trust (London), Surrey and Sussex Healthcare NHS Trust, St Ann's Hospice (Manchester), Christie Hospital NHS Foundation Trust (Manchester), Nightingale Macmillan Unit (Derby), Trinity Hospice (London) and Trinity Hospice (Blackpool).
Funding CRUK grant number C11075/A6126.
Competing interests None.
Patient consent Obtained.
Provenance and peer review Not commissioned; externally peer reviewed.