What should we be doing with composite measures? (And do we know one when we see one?)

Sept. 18, 2017 / By Richard Fuller, MS

In June the GAO released a report examining the Medicare Hospital Value-Based Purchasing (HVBP) payment initiative. This is not their first look at HVBP: a previous report from October 2015 concluded that the HVBP had led to “No Apparent Change in Quality-of-Care Trends.” In the more recent report, the GAO bemoaned the existing scoring methodology, which resulted in lower quality hospitals, as measured by the quality “dimensions” in the HVBP (a.k.a. everything but Medicare Spending per Beneficiary (MSPB)), receiving HVBP bonuses.

On the one hand, this statement makes complete sense. If the intent is to reward hospitals for quality, how can those with low quality receive bonuses? On the other hand, creating a performance model that includes an efficiency metric means that efficiency might outweigh other dimensions unless the model is specifically structured so that it does not “disproportionately affect” the final score (in which case, what do we mean by “disproportionately”?).

These problems and many others occur when measuring performance through composite scores, a topic whose complexity is comprehensively discussed by Shwartz et al. (references 1 and 2). For those not acquainted with the HVBP, it can be summarized as a composite performance measure that uses other composite measures as building blocks to measure performance within four dimensions: safety, patient experience, outcomes and efficiency. Underneath the composite measures are multiple component measures (about 37 in total) that are blended to form a single score, which in turn is a blended measure of attainment (how well you are doing relative to benchmarks) and improvement (how well you are doing relative to yourself in a previous period).
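To make the attainment and improvement distinction concrete, here is a minimal sketch of how a single component measure might be scored. It is an illustration under stated assumptions, not the actual CMS methodology: the linear point scales, the 10 and 9 point maximums, the choice to take the higher of the two scores, and every rate below are hypothetical.

```python
def attainment_points(rate, threshold, benchmark, max_points=10.0):
    """Points for performance relative to external benchmarks.

    Hypothetical linear scale: nothing below the threshold, full
    points at or above the benchmark. Higher rates are better.
    """
    if rate < threshold:
        return 0.0
    if rate >= benchmark:
        return max_points
    return max_points * (rate - threshold) / (benchmark - threshold)


def improvement_points(rate, baseline, benchmark, max_points=9.0):
    """Points for performance relative to the hospital's own prior period."""
    if rate <= baseline:
        return 0.0
    if rate >= benchmark:
        return max_points
    return max_points * (rate - baseline) / (benchmark - baseline)


def measure_points(rate, baseline, threshold, benchmark):
    """Assumed blending rule: credit the better of the two scores."""
    return max(
        attainment_points(rate, threshold, benchmark),
        improvement_points(rate, baseline, benchmark),
    )


# Hypothetical survival rates for one hospital and one measure.
THRESHOLD = 0.85   # minimum rate needed to earn attainment points
BENCHMARK = 0.87   # rate that earns full points
BASELINE = 0.80    # the hospital's own rate in the prior period
CURRENT = 0.84     # improved, but still below the threshold

attained = attainment_points(CURRENT, THRESHOLD, BENCHMARK)
improved = improvement_points(CURRENT, BASELINE, BENCHMARK)
total = measure_points(CURRENT, BASELINE, THRESHOLD, BENCHMARK)

print(f"attainment={attained:.2f} improvement={improved:.2f} total={total:.2f}")
```

Under these assumed numbers the hospital earns no attainment points yet still receives a positive score for the measure, purely because it improved relative to its own earlier performance.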

Composite measures, particularly ones such as the HVBP that cast their net widely to cover many aspects of what reasonably constitutes quality of care, are inevitably dependent upon the subjective weighting of “dimensions” when providing a single substantive measure of total performance. This can lead to differences in value judgments of what is truly important, such as the concerns brought forward in the GAO report about the disproportionate effect of efficiency. But the explosion in the number of measures used to describe and reward provider performance has driven payers and regulators to use composite measures with greater frequency so as to incentivize the widest engagement (or at least the appearance of widest engagement) in efforts addressing all aspects of care delivery deserving attention. The expanded scope of measurement makes the subsequent conflicting judgment of relative “values” more pronounced.
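A small sketch shows how much the choice of weights can matter. The two hospitals, their dimension scores and both weighting schemes are invented for illustration and are not taken from HVBP data or from the GAO report.

```python
def total_performance_score(dimension_scores, weights):
    """Weighted sum of dimension scores; weights must sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[d] * dimension_scores[d] for d in weights)


def rank_hospitals(hospitals, weights):
    """Hospital names ordered from highest to lowest total score."""
    return sorted(
        hospitals,
        key=lambda name: total_performance_score(hospitals[name], weights),
        reverse=True,
    )


# Hypothetical dimension scores on a 0-100 scale.
hospitals = {
    # Weak on quality dimensions, strong on efficiency.
    "A": {"safety": 40, "patient_experience": 45, "outcomes": 35, "efficiency": 90},
    # Strong on quality dimensions, weak on efficiency.
    "B": {"safety": 65, "patient_experience": 60, "outcomes": 70, "efficiency": 10},
}

# Two hypothetical weighting schemes.
equal_weights = {
    "safety": 0.25, "patient_experience": 0.25, "outcomes": 0.25, "efficiency": 0.25,
}
quality_heavy_weights = {
    "safety": 0.30, "patient_experience": 0.30, "outcomes": 0.30, "efficiency": 0.10,
}

for label, weights in [("equal", equal_weights), ("quality-heavy", quality_heavy_weights)]:
    scores = {
        name: round(total_performance_score(dims, weights), 2)
        for name, dims in hospitals.items()
    }
    print(label, scores, "ranking:", rank_hospitals(hospitals, weights))
```

With equal weights, hospital A edges out hospital B on the strength of its efficiency score alone. Shifting weight toward the quality dimensions reverses the ranking. Neither scheme is objectively correct, which is the sense in which the weighting is a value judgment.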

Even when we are nominally facing individual measures (measures that report on a single aspect of quality), it’s worth noting that what constitutes a composite vs. individual measure is somewhat subjective. For example, the Hospital Acquired Condition Reduction Program (HACRP) contains a “composite” measure (PSI-90, Domain 1) and five individual measures (Domain 2) blended into a single score used to measure performance. These measures report on a single aspect of care, offer guidance on how well programs dealing with infection and patient safety are operating and will likely be addressed by the same team. So while this may be viewed as a composite measure, it is also rational to consider it an individual measure since its contributions are well aligned. While it is also fair to say that a readmission measure reports on a single aspect of care, it is not immediately apparent that the same team will address root causes, nor that those root causes are always aligned. For example, surgical conditions might be reviewed by a differently focused team than medical admissions, factors related to social determinants originating outside the hospital may result in the inclusion of social work teams, and mental health program coordination may cut across all admissions. Moreover, while it may provide good guidance for consumers that infection control of all types reflects global hospital performance (within a measure like the HACRP), it is less likely that a low readmission rate for elective surgery is a good guide to performance in managing patients with pneumonia. As noted once more by Shwartz and colleagues, comparing a hospital-wide measure of readmission with individual measures of readmission focusing on individual conditions (reference 3) can lead to different guidance as to provider performance.

Which brings us to the measure’s intended use. While it may be reasonable to blend multiple hospital care measures to stimulate hospital quality improvement efforts, doing so can also provide misleading information to consumers. For example, returning to the HVBP, for the period 2013 to 2015, of 2,011 hospitals with an available measure for 30-day AMI mortality, 270 failed to reach the minimum threshold survival rate. Of these, 83 were given positive points for performance (due to annual improvement relative to themselves). A smart consumer (or hospital board) could be forgiven for thinking that a positive score for performance would at least equate to achieving a minimum survival rate. And that is where the GAO report misses the point.

The observation that the single efficiency dimension disproportionately affects the total performance score of the HVBP relative to the quality dimensions takes the position that the relative weighting of each is inappropriate, but it does not address what the relative weighting should be, nor indeed whether the other weightings across “clinical quality” are correct. How should we weight “patient experience” (e.g. were nurses polite) vs. outcomes (e.g. mortality)? In fact, the use of a mortality measure within HVBP appears heavily diluted when compared to the impact and relative cost of “excess” mortality assigned by other government agencies and within the legal system[1].

So what this brings us to are a series of observations:

  1. In keeping with the guidance of Shwartz and Restuccia, composite measures can be useful but they need to be well aligned. By this it is meant that they need to measure a coherent aspect of care, target a coherent group that will change things (“a hospital” is too vague) and be clear in the information provided to interested parties such as consumers (if performance is improving but still bad then the distinction should be clear).
  2. Too many individual measures will act as a de facto composite measure and will lead those being measured or penalized to prioritize a handful at most. Thus, the effect of adding multiple measures is to create a composite measure whether or not that is recognized, and the relative rewards and penalties accruing from those measures serve as their weighting.
  3. We have to be careful when weighting measures (e.g. patient experience and mortality) and their goals (improvement vs. attainment) to first consider the intended use.

These principles should follow us as we expand how we measure quality from things that are easier to quantify (e.g. the cost of a complication or readmission) to those that are less tangible and harder to value: the ability to maintain family or employment through mental health crises, the return to function after surgery and the ability to engender patient confidence within the health system. They should also guide how we integrate those things that we value most, such as patient mortality.

Richard Fuller, MS, is an economist with 3M Clinical and Economic Research.


References

  1. Shwartz M, Rosen AK, Burgess JF. Can Composite Measures Provide a Different Perspective on Provider Performance Than Individual Measures? Med Care. July 2015:1. doi:10.1097/MLR.0000000000000407.
  2. Shwartz M, Restuccia JD, Rosen AK. Composite Measures of Health Care Provider Performance: A Description of Approaches. Milbank Q. 2015;93(4):788-825. doi:10.1111/1468-0009.12165.
  3. Rosen AK, Chen Q, Shwartz M, et al. Does Use of a Hospital-wide Readmission Measure Versus Condition-specific Readmission Measures Make a Difference for Hospital Profiling and Payment Penalties? Med Care. 2016;54(2):155-161. doi:10.1097/MLR.0000000000000455.

[1] For example, the DOT provides an estimate of the value of a statistical life at $9.6 million.