Objectives We report the use of difference in differences (DiD) methodology to evaluate a complex, system-wide healthcare intervention. We use the worked example of evaluating the Marie Curie Delivering Choice Programme (DCP) for advanced illness in a large urban healthcare economy.
Methods DiD was selected because a randomised controlled trial was not feasible. The method compares changes before and after the intervention at an intervention site with changes over the same period at a matched control site. This allows analysts to estimate the effect of the intervention in the absence of a local control group. Any policy, seasonal or other confounding effects over the test period are assumed to have occurred in a balanced way at both sites. Data were obtained from primary care trusts. Outcomes were place of death, inpatient admissions, length of stay and costs.
Results Small changes were identified between pre- and post-DCP outputs in the intervention site. The proportion of home deaths and median cost increased slightly, while the number of admissions per patient and the average length of stay per admission decreased slightly. None of these changes was statistically significant.
Conclusions Effect estimates were limited by the small numbers of patients accessing new services and by selection bias in the sample population and comparator site. In evaluating the effect of a complex healthcare intervention, the choice of analysis method and output measures is crucial. Alternatives to randomised controlled trials may be required for evaluating large-scale complex interventions, and the DiD approach is suitable, subject to careful selection of measured outputs and of the control population.
In evaluating the effects of complex healthcare interventions such as large system service reorganisation, the gold standard randomised controlled trial is not always feasible. We report on the use of difference in differences (DiD) analysis to assess the effects of the Marie Curie Delivering Choice Programme (DCP), a service redesign initiative which aimed to enable people to achieve their preferences for end-of-life care.1,2 The challenge for researchers is how to examine rigorously the changes in outputs that arise following such an intervention. Simple estimates of resource use do not improve understanding of change over time or of its causes, whether complexity of illness, individual need or preference, or the choices made available through new models of care. Moreover, in the UK a number of recent government-led initiatives influence ongoing change in end-of-life care, such as the Department of Health National Cancer Plan,3 national service framework strategies for non-cancer conditions,4,5 the Department of Health End of Life Care Strategy6 and strategic changes in end-of-life care investment patterns. Against this complex and changing background, simple before and after analysis of resource use by an end-of-life population is not informative. In addition, the numbers of patients participating in and affected by local service redesign initiatives are often small and may be subject to a range of selection biases.
We chose to explore the use of DiD analysis. This method investigates the effects of an intervention over time in two groups: one subject directly to the intervention and a control group at a suitably matched comparator site.7 This allows researchers to control for the possibility that both groups have changed over time for reasons unrelated to the intervention. In effect, the analysis isolates the impact of the intervention by removing the influence of other known or unknown factors that affect both groups in the same way during the study period. Such changes might include alterations to national policy or demographic shifts, such as an ageing population, that would affect both the intervention and comparator populations. Without a control group, any change seen in the intervention group might be ascribed to the intervention itself without accounting for the possible effects of other factors.
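The arithmetic behind this comparison can be illustrated with four group means: the change at the control site stands in for the change the intervention site would have seen without the intervention. The sketch below is a hypothetical illustration using invented numbers, not data from this study, and the function name `did_estimate` is ours.

```python
def did_estimate(treat_pre: float, treat_post: float,
                 ctrl_pre: float, ctrl_post: float) -> float:
    """Difference in differences from four group means.

    Assumes 'parallel trends': without the intervention, both sites
    would have changed by the same amount over the study period.
    """
    change_treat = treat_post - treat_pre  # change at the intervention site
    change_ctrl = ctrl_post - ctrl_pre     # background change (policy, demography, season)
    return change_treat - change_ctrl      # change attributed to the intervention


# Invented mean admissions per patient in the last 8 weeks of life
effect = did_estimate(treat_pre=1.50, treat_post=1.40,
                      ctrl_pre=1.60, ctrl_post=1.50)
print(round(effect, 2))  # 0.0: the fall at the intervention site is matched at the control site
```

A fall at the intervention site that is mirrored at the control site therefore yields no DiD effect, however large the within-site change.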
Our primary aim was to assess whether DiD is a suitable and robust approach for identifying significant changes in measured outputs following the introduction of a system-wide complex intervention, in this case the DCP. We compared four measures of output at intervention and control sites: (1) place of death; and over the last 8 weeks of life, (2) number of hospital admissions, (3) length of each admission and (4) the total cost per patient of providing inpatient care. These measures of outcome were selected according to the UK policy aims of reducing unnecessary inpatient admissions at the end of life and supporting people to be cared for and die in the place of their choice.
This service redesign initiative1,2 aims to improve the care of people who are dying from all advanced illnesses and to increase the opportunities for them to be cared for and die in the place of their choice. It is a complex intervention in which a number of components in local health systems are adapted to deliver change. Details of two local service redesigns are published elsewhere.2 Examples include the introduction of discharge liaison services, facilitated transfer from hospital to home using specialised ambulance provision, and rapid response teams to deal with emergencies arising in the home. The intervention is developed following theoretical and practical pilot work to assess the current state of service delivery in that locality, followed by consultation with local stakeholders to devise and refine a menu of new services. These processes conform to the guidance proposed by the Medical Research Council (MRC) on the development of complex healthcare interventions8 and use action research to achieve new models of care. The MRC guidance also suggests ways to assess the effects of an intervention in practice. During the DCP, extensive process data are collected. Knowledge of effects on patient flows and pathways is supplemented by qualitative data collected from patients and carers to increase understanding of their experience of the new service models. DiD may provide a rigorous quantitative method to assess change in measurable outputs over time.
Ethical approval for this analysis was obtained from Oxfordshire Research Ethics Committee B for the release of anonymised hospital activity data for the participating primary care trust (PCT) and from a comparator PCT within a similar site in another city matched on age, ethnicity and socioeconomic factors. In this paper, the intervention and comparator site remain anonymised.
All individuals living within the PCT in which the intervention was delivered were able to access the intervention. There was no a priori selection of patients, randomly or otherwise, as the service redesign was applied across the entire local health economy. Data on all individuals who died between April 2006 and March 2008 were obtained from PCTs at the test site and comparator site, and included the corresponding hospital admission data. The control site was chosen to match as closely as possible the test site to minimise the risk of any bias arising in the results owing to differences between populations. Factors that influenced selection of the control site were the population size and characteristics, and the Trust's structural design. Both sites provided anonymous data on admissions and deaths, including place of death.
We considered hospital use for the last 8 weeks of life. This time period was chosen to represent the period during which patients were most likely to require the end-of-life care services provided through the DCP. Data on the last 8 weeks of life were extracted from the data set and costs assigned to patients on the basis of the Healthcare Resource Group (HRG) codes.
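Costing by HRG can be sketched as a lookup from each admission's HRG code to a tariff price, summed per patient. The HRG codes and prices below are invented placeholders, and charging a single tariff per admission is a simplifying assumption; real national tariffs may include further adjustments (for example, for unusually long stays) that are omitted here.

```python
from collections import defaultdict

# Hypothetical tariff table: placeholder HRG codes and invented prices (GBP).
# In practice these would come from the national tariff for a single base year.
TARIFF = {"HRG-A": 1200.0, "HRG-B": 2500.0, "HRG-C": 800.0}


def cost_per_patient(admissions):
    """Sum tariff costs per patient.

    admissions: iterable of (patient_id, hrg_code) pairs, one per admission
    in the last 8 weeks of life. An unknown HRG code raises KeyError,
    so gaps in costing are not silently filled.
    """
    totals = defaultdict(float)
    for patient_id, hrg in admissions:
        totals[patient_id] += TARIFF[hrg]
    return dict(totals)


costs = cost_per_patient([("p1", "HRG-A"), ("p1", "HRG-C"), ("p2", "HRG-B")])
print(costs)  # {'p1': 2000.0, 'p2': 2500.0}
```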
Difference in differences
The DiD approach was selected because it allows comparisons over time between non-randomised populations7,9 by comparing the treatment group before and after an intervention with a control group from a suitably matched comparator site that did not receive the intervention.10 DiD analysis is a quasi-experimental method used to measure the effects of a treatment or intervention over time. Analysis was undertaken for four outputs: (1) place of death, (2) number of admissions, (3) length of stay and (4) costs. Data were obtained from the intervention and control PCTs for all patients who died in the year before introduction of the intervention (April 2006–March 2007) and in the year afterwards (April 2007–March 2008).
The analysis is a fixed-effect multiple regression model. We tested for a DiD effect between the intervention and control PCTs for patients aged over 18 years registered with a general practitioner (GP) within the PCT between April 2006 and March 2008, using inpatient attendance data supplied by the relevant PCTs and national reference cost data. Patients without full records were excluded from the analysis. Variables (1–3) were calculated for hospital admissions over the last 8 weeks of life. For periods of care that began before but overlapped the 8-week window, only the activity falling within the window was counted. Analyses were adjusted for age, sex and admission method (elective or non-elective).
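The handling of periods of care that overlap the start of the 8-week window can be sketched as clipping each spell to the window. This is a minimal illustration under our own assumptions (a 56-day window ending on the date of death, bed days counted as the difference between dates); the function names are hypothetical and the study's exact counting rules are not reported.

```python
from datetime import date, timedelta

WINDOW = timedelta(weeks=8)  # last 8 weeks of life = 56 days


def window_start(death: date) -> date:
    """First day of the 8-week window before death."""
    return death - WINDOW


def overlaps_window(admit: date, discharge: date, death: date) -> bool:
    """True if the spell has any day inside the 8 weeks before death."""
    return admit <= death and discharge >= window_start(death)


def bed_days_in_window(admit: date, discharge: date, death: date) -> int:
    """Bed days of a spell counted only inside the window.

    Spells starting before the window are clipped to its start;
    spells ending before it contribute nothing.
    """
    start = max(admit, window_start(death))
    end = min(discharge, death)
    return max((end - start).days, 0)


death = date(2007, 6, 30)
spells = [
    (date(2007, 3, 1), date(2007, 3, 10)),   # ends before the window
    (date(2007, 4, 20), date(2007, 5, 15)),  # starts before, overlaps the window
    (date(2007, 6, 20), date(2007, 6, 30)),  # entirely inside the window
]
in_window = [s for s in spells if overlaps_window(*s, death)]
print(len(in_window), sum(bed_days_in_window(*s, death) for s in in_window))  # 2 20
```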
We adjusted for time trends, using the month of each observation as the unit of time. This required the model to account for correlation between months. For all outputs, fixed effects models with random intercepts were fitted, controlling for autoregression but not heteroskedasticity. This model type can be used with time-dependent observations and enables any change due to the intervention to be isolated from changes that occur over time due to other trends in the data (eg, an increase in hospital admissions during colder winter months). A paired Z test was used to test for statistical significance.
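The exact form of the Z test is not reported. One common form, shown here as an assumption rather than as the authors' procedure, divides an estimated DiD effect by its standard error and converts the result to a two-sided p value under the standard normal distribution. The inputs in the usage line are invented.

```python
import math


def z_test(estimate: float, std_error: float) -> tuple[float, float]:
    """Two-sided Z test of H0: effect = 0.

    Returns (z, p), where p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    """
    if std_error <= 0:
        raise ValueError("standard error must be positive")
    z = estimate / std_error
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


z, p = z_test(estimate=0.05, std_error=0.04)  # hypothetical DiD estimate and SE
print(f"z = {z:.2f}, p = {p:.3f}")  # z = 1.25, p = 0.211
```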
Baseline differences between the populations are not normally tested using DiD methods. The DiD approach seeks to measure the effect of a change while controlling for many unknown factors (unobserved heterogeneity). The comparator is therefore picked on the a priori expectation that it shares similar unknown factors with the intervention site. We measure the effect of the DCP in the intervention site. The non-intervention site then acts as a proxy controlling for the unobserved heterogeneity.
Analysis at the level of the PCT, HRG, age and sex groups, separately by preintervention and postintervention time period and month allowed for the long-term growth in costs and attendance numbers, but assessed the institutional intervention at the level of the institution at which it occurred. The observational unit was the averaged output variable per month per age category (18–108 in categories 5 years wide) per PCT. For the analysis of bed days and admissions, HRG was treated as a fixed effect, essentially fitting a separate model to each HRG within each age and sex group and PCT. This is a type of case-mix adjustment, and allowed for the possibility of different intensities of resource use in different HRGs. Cost data were summed across HRGs and analysed at the level of age group, sex and admission method (elective/non-elective) within PCTs. Costs for the entire study were estimated using National Health Service (NHS) national tariffs for 2010–2011.11 No discounting was required as all costs were estimated using the same base year.
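The construction of the observational unit can be sketched as grouping individual records into cells and averaging the output within each cell. This is an illustration only: the exact band boundaries (here 18-22, 23-27 and so on), the record layout and the field names are our assumptions.

```python
from collections import defaultdict
from statistics import mean


def age_band(age: int) -> str:
    """Assign a 5-year age band starting at 18 (boundaries assumed)."""
    if age < 18:
        raise ValueError("analysis restricted to adults aged 18 and over")
    lower = 18 + 5 * ((age - 18) // 5)
    return f"{lower}-{lower + 4}"


def cell_means(records):
    """Average an output within each (PCT, month, sex, age band) cell.

    records: iterable of dicts with keys 'pct', 'month', 'sex', 'age', 'value'.
    """
    cells = defaultdict(list)
    for r in records:
        key = (r["pct"], r["month"], r["sex"], age_band(r["age"]))
        cells[key].append(r["value"])
    return {key: mean(values) for key, values in cells.items()}


records = [
    {"pct": "intervention", "month": "2007-04", "sex": "F", "age": 80, "value": 2},
    {"pct": "intervention", "month": "2007-04", "sex": "F", "age": 82, "value": 4},
    {"pct": "control", "month": "2007-04", "sex": "M", "age": 67, "value": 1},
]
print(cell_means(records))
```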
Serial dependence was fitted as a simple autoregressive model, since there were insufficient time points to test seasonal effects. A separate seasonal (winter) term was included in the model to adjust for excess attendances in winter. The starting model was fitted with all possible interactions. Non-significant terms were removed using backwards stepwise removal, but the key terms of interest (intervention group, study period and the interaction of these two terms) were retained regardless of significance. The interaction of study period and intervention group constitutes the DiD effect, adjusted in this case for the growth over time of attendances within the PCT and for any differences between significant age and sex groups, and assessed separately within each HRG.
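The core of this specification, with the group-by-period interaction as the DiD effect plus a winter term, can be sketched with ordinary least squares in NumPy. This is a deliberately simplified sketch on simulated data: it omits the autoregressive error structure, random intercepts, HRG fixed effects, age and sex adjustment and stepwise term removal described above, and the function name `fit_did` is hypothetical.

```python
import numpy as np


def fit_did(y, group, post, winter):
    """OLS fit of y ~ 1 + group + post + group*post + winter.

    Returns coefficients in that order; index 3 (group*post) is the
    DiD effect. No autoregressive errors or other adjustments.
    """
    group = np.asarray(group, dtype=float)
    post = np.asarray(post, dtype=float)
    winter = np.asarray(winter, dtype=float)
    X = np.column_stack([np.ones_like(group), group, post, group * post, winter])
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef


# Simulated monthly series, April 2006 (month 0) to March 2008 (month 23)
months = np.arange(24)
cal_month = (months + 3) % 12 + 1              # 4 = April, 12 = December, 1 = January
winter_m = np.isin(cal_month, [12, 1, 2]).astype(float)
post_m = (months >= 12).astype(float)          # intervention period from April 2007

group = np.repeat([0.0, 1.0], 24)              # 0 = control site, 1 = intervention site
post = np.tile(post_m, 2)
winter = np.tile(winter_m, 2)
rng = np.random.default_rng(0)
y = (10 + 1.0 * group - 0.5 * post + 0.3 * group * post
     + 2.0 * winter + rng.normal(0, 0.1, size=48))

coef = fit_did(y, group, post, winter)
print(f"DiD estimate: {coef[3]:.2f}")          # close to the simulated effect of 0.3
```

Because the winter dummy appears in both periods and at both sites, it absorbs seasonal excess without being confounded with the interaction term.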
Descriptive statistics of intervention and control sites for the four output measures are shown in table 1. Primary DiD grouped analysis results are presented in figures 1–4. We present the difference between the control site and the intervention site at base year, change in both sites over the study period and the difference between the two sites at the end of the study period. There were small changes following the intervention period in the intervention site. The proportion of home deaths and median cost increased slightly, while the number of admissions per patient and the average length of stay per admission decreased slightly. None of the changes that occurred during the period of analysis were found to be statistically significant when the intervention site was compared with the control site.
Figures 1–4 show unadjusted results for the grouped analysis. Data in the figures are presented as changes over time, with the intervention and control sites plotted on the same graph. The time point where the intervention was introduced is marked by the vertical line.
Place of death
Due to restrictions in the data provided by the intervention group PCT, place of death is classified as either a home death or not a home death (figure 1). An aim of the DCP was to increase the numbers of people who were able to die in the place of their choice, working under the assumption that more people would prefer to die at home than currently do so.12 There is wide variation in the number of patients dying at home in any given month over the analysis time period and there is no clear trend in place of death over time. It cannot be concluded that the DCP increased the number of home deaths in the test site compared with the control.
Average number of admissions
There was a small but statistically significant reduction in the average number of admissions per patient at the test site over the study period (figure 2). However, when set against the control site, this difference disappears. It therefore cannot be concluded that the DCP reduced the number of admissions per patient in the last 8 weeks of life.
Length of stay
A small but significant decrease in length of stay was observed in the intervention site over the period of the evaluation (figure 3). However, no statistically significant reduction in length of stay was observed relative to the control site. As with home deaths and number of admissions, wide monthly variations are clearly present. It cannot be concluded that the DCP led to a reduction in length of stay.
Costs
The DiD shows no statistically significant change between the sites over the study period (figure 4). Mean cost per admission differed widely between the two sites. The reasons for this are not clear, but a stable difference between sites cancels out in the DiD estimate and so does not affect whether costs changed as a result of the DCP. As with the other measures of output, it cannot be concluded that the DCP led to any change in the mean cost of care per admission.
The results we present from the DiD analysis do not suggest that there was any statistically significant difference between the test site and the comparator site for any of the measured outputs. That is to say, any changes observed in the measured outputs during the study period cannot necessarily be attributed to the DCP. This is not to say that the DCP has no value or that it has not improved the care of the dying in the test site. For example, an evaluation of qualitative evidence collected for two elements of the service redesign showed that quality of care and patient choice at the end of life did improve.13
However, DiD methodology has proved useful in previous studies7,9 and was an appropriate method for our evaluation. Designing an evaluation around such a methodology presents difficulties, and choices made during the study design phase have a significant impact on the reliability and accuracy of results. This paper highlights some of these difficulties. Alternative analytical approaches to DiD may be possible. Change point detection methods were considered, as they are suitable for examining whether and when a change has occurred within a data set, and are often used with time series data.14 However, they were unsuitable in this context because we anticipated large seasonal effects, which could mask broader changes given that data were available for only a single year before and after the intervention. We are not aware of any other analytical approaches suitable for our data and study aims.
The choice of output measures is an important starting point for any evaluation. In this instance, our high-level quantitative outputs act merely as proxy measures for the true objective of the DCP: to provide high-quality care for the dying. These outputs should be considered alongside the qualitative evidence collected to understand the full impact of the programme. Three of the outputs chosen for this analysis, namely place of death, number of admissions to hospital and length of stay in hospital, may closely reflect quality of care, while the fourth, cost per hospital admission, does not. While these measures are not a true reflection of patient outcomes, they do provide important information about healthcare processes. Although many alternative outcome measures could be considered, the three process outcomes measured are often accepted as proxies for clinical outcome in palliative care research where patient-reported outcomes are not available.1,2 It is possible to assume that reducing how often individuals experience the negative aspects of these processes will improve their health outcomes. In the absence of more appropriate measures, these three outputs are adequate, though the strength of the relationship between changes in each process measure and patient outcomes is unknown.
Place of death
While it may appear desirable to increase the numbers who die at home, based on the belief that more people wish to die at home than do so at present,15 place of death does not necessarily correlate with quality of death. People may die at home for a variety of reasons. This analysis found that approximately 19% of patients in the intervention site died at home; this compares with survey evidence that as many as 64% of the population express a preference for a home death.15 By examining place of death alone, important information on the quality of the death may be missed or, worse, assumed.
A reduction in hospital admissions and length of stay may indicate a better-quality death, but not necessarily better-quality care. We have assumed that a reduction in these measures is desirable. However, it would need to be shown that patients who spent less time in hospital achieved better health outcomes and that patients who would be most appropriately cared for in hospital were not denied appropriate care. It is outwith the remit of our analysis to explore these issues; however, we recommend that any future evaluation of the DCP or similar programmes should consider such factors when selecting outcome measures at the study design stage.
Although it is important to understand the cost implications, inpatient costs are not a good indicator of effect for the DCP, as any change in mean or median costs cannot be interpreted in isolation as good or bad. An increase in mean costs may suggest that the programme is effective in moving easier-to-manage cases from hospital to home, leaving more complicated, and therefore more expensive, patients in hospital. We do not know whether or not those patients left in hospital are there appropriately. Possibly those patients who are easier to care for are dying at home simply because it is easier to provide a home death for them. It might also be that costs are merely shifted from one service provider to another. If fewer people die in hospital, then more are likely to die in other settings and must be supported by appropriate resources, often provided by social services or informally by family and carers. If these costs are not captured in an evaluation, it is difficult to assess whether or not the intervention represents a good use of scarce resources. We must conclude that our evaluation of the DCP does not provide the necessary evidence to estimate whether it has provided value for money for the stakeholders involved.
Selecting outcome measures when studying complex system-wide interventions is difficult. By their nature, such interventions are likely to have an impact on a wide range of factors. Our analytical approach relied on existing data provided by healthcare commissioning organisations, so we were limited in the types of outcomes or outputs we could measure. Prospective study designs that collect data directly from patients would be advantageous, as patient-reported outcomes and a wide range of clinical data regarding primary, secondary, emergency and social care service usage and experience could be collected. The downside is that fewer patients can typically be recruited, given the costs of research, and collecting data directly from people very near the end of life is challenging. A trade-off must be made between study power (through a large cohort) and the detail on outcomes that can be collected.
Choice of control site
The choice of comparator group in DiD analysis is crucial. The principle is that when similar changes are observed in the control and intervention groups, the possibility that they share a common cause, such as a change in government policy or in the sociodemographic make-up of the population, cannot be ruled out. Equally, DiD cannot conclusively establish that the intervention caused an observed change, because correlation is not causation: where changes at the two sites run in the same direction, they may result from unrelated factors. What can be said is that, compared with the control site, the DCP at the intervention site does not appear to have had any impact on any of the outputs measured. This interpretation is corroborated by the descriptive statistics, which do not show any significant change between the preintervention and postintervention outputs.
A key complication was the poor quality of the data provided by the PCTs. At the intervention site, records of where a patient died were supplied separately from records of patient hospital admissions. These data sets were therefore merged using a unique identifier, the NHS number. However, not all patient records could be matched in this way, and the unmatched records were dropped from the analysis. The total number of records dropped was small (<1%) relative to the number of patient deaths that occurred during the years under analysis and could not have had a significant impact on the results: even if all dropped records were assumed to be home deaths, they would not be sufficient to produce a significant difference in the results.
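The merge and the worst-case check described above can be sketched as follows. The record layout (a mapping from NHS number to place of death, plus the set of NHS numbers present in the admissions extract) and all identifiers are hypothetical, and the worst case assumes every unmatched record was a home death. In the actual analysis such a bound would then be carried through the DiD comparison; this sketch only shows the bound on the home-death proportion.

```python
def merge_report(place_of_death: dict, admission_ids: set):
    """Inner-join death records to the admissions extract on NHS number.

    Returns (n_dropped, observed home-death share among matched records,
    worst-case share if every dropped record had been a home death).
    """
    matched = {nhs: place for nhs, place in place_of_death.items()
               if nhs in admission_ids}
    n_dropped = len(place_of_death) - len(matched)
    home = sum(1 for place in matched.values() if place == "home")
    observed = home / len(matched)
    worst_case = (home + n_dropped) / (len(matched) + n_dropped)
    return n_dropped, observed, worst_case


# Hypothetical identifiers, not real NHS numbers
deaths = {"A": "home", "B": "hospital", "C": "hospital", "D": "home", "E": "hospice"}
print(merge_report(deaths, admission_ids={"A", "B", "C", "D"}))  # (1, 0.5, 0.6)
```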
This worked example showed that DiD analysis can be useful in increasing our understanding of the effects of multicomponent, complex interventions when randomised trials are not suitable or feasible but is limited by the quality of data available for analysis. Intervention and comparator populations must be selected carefully to increase confidence that results are an effect of the interventions and do not arise from latent differences in the populations being studied. Careful attention must be paid to selection of meaningful outcome (output) measures. Researchers may wish to consider DiD as a methodologically robust analysis tool when designing evaluations of service redesign programmes. Additional learning may be gained from supplementing quantitative outcomes with qualitative data on patient and carer experiences of care in test and comparator sites.
- Received 14 May 2012.
- Revision received 28 May 2013.
- Accepted 24 July 2013.
Contributors RD, RA, NA and LJ contributed to the conception and design of the study. RD and RA acquired the data. JR and EK conducted the data analysis and prepared the manuscript. All authors contributed to the drafting and editing of the article and approved the final version of the manuscript.
Funding The Delivering Choice Programme was funded by Marie Curie Cancer Care.
Competing interests JR, RD and LJ were employed by the Marie Curie Palliative Care Research Unit during the study period. The unit receives core grant funding via University College London from Marie Curie Cancer Care (Grant Number MCCC-FCO-11-U). NA is an employee of Marie Curie Cancer Care. No other authors have any interests to declare.
Ethics approval Oxfordshire Research Ethics Committee B.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement Ethical approval for the use of the data was obtained only in relation to this study. We do not have permission to share the data more widely.