Podcast Episode Transcript: Physician payment and performance measurement: Is it fair?

With L. Gordon Moore, MD

Gordon Moore: Welcome to the Inside Angle Podcast. This is Gordon Moore, your host. And today, I’m speaking with Robert Berenson. He is a physician who joined the Urban Institute as a fellow in 2003. In that position, he conducts research and provides policy analysis primarily on healthcare delivery issues, particularly related to Medicare payment policy, pricing power in commercial insurance markets, and new forms of health delivery based on reinvigorated primary care practices. He says about himself that he’s able to provide a sometimes-listened-to contrarian voice in current policy discussions on how to reform our current inefficient healthcare delivery system. Welcome, Dr. Berenson.

Dr. Robert Berenson: It’s a pleasure to be here.

Gordon: So, the reason I thought it would be interesting to have a conversation with you is that today, I was reading a Health Affairs blog that you coauthored a month or so ago talking about the proposed rule from CMS to eliminate E/M coding and to flatten it out. And when I read that rule, the proposed rule and then the rule that came out, it made me feel a little bit nervous. And I thought you had something interesting to say about that. So what do you think about that rule, where do you think it might take us?

Robert: Well, I didn’t like the proposed rule, and it looks like with the final rule, they’ve just bought themselves some time, but they haven’t backed off the concept of combining codes into a single payment, one for established patients, one for new patients. And it will cause a lot of problems. So I can enumerate those if you’d like me to.

Gordon: Yeah, I would.

Robert: Well, let me first back up and say, the reason they’re doing this, CMS claims, is that it’s the only way they can eliminate what’s called the documentation guidelines, which are the rules that physicians are supposed to follow to determine how to code. There are five levels of codes for new patients and for established patients with significant fee differences. And so there’s a natural tendency of physicians to want to upcode to claim more money than perhaps they deserve. And so these documentation guidelines were established over 20 years ago as a protection against upcoding, but in fact, they have facilitated upcoding.

They have compromised the clinical record where there’s a lot of information that doesn’t serve any clinical purpose, but it’s there to justify a code, and they actually have compromised the potential of electronic health records to provide important decision support to clinicians rather than just documentation. So, there’s broad agreement that we must get rid of the documentation guidelines. CMS, however, has decided that the way to do away with the guidelines is to no longer have code levels. So they want to move from a good purpose, doing away with the guidelines, to a very bad policy. The metaphor that I would use is throwing the baby out with the bathwater. We’ve got a problem that should be solved, but they’ve compounded the problem. In fact, they haven’t even gotten rid of the documentation guidelines in the short term.

So the problem is that if you’re paying the same thing, whether the visit is five minutes long or forty-five minutes long, you are going to shortchange patients. Physicians will have no choice, in many cases, but to change their practice styles to have shorter, more numerous visits. And for the Medicare population, that is not a good thing. It means there’s insufficient time being spent on their problems, and it means they have the inconvenience and the cost of having to make many more visits than they otherwise would have had to make. And in addition to being bad for beneficiary patients, it rewards the specialties that have short visits; they’ll get paid more than they deserve under this average payment model.

In fact, in the last week, I’ve gone to see an orthopedist and I’ve gone to see a dermatologist, and they have both billed me for much higher levels than the five to eight minutes that they’ve spent with me. And who will it penalize? Geriatricians, primary care physicians, internal medicine sub-specialists like rheumatologists and endocrinologists. So we will exacerbate the current inequities in the fee schedule we already have.

Gordon: So as I think about that flattening out, the final rule that came out still allowed for a higher level visit for complex patients. Does that not mitigate or solve the problem?

Robert: If they did it right, it might help it. One is the amount of extra money for that complex patient is pretty negligible, it’s a small amount. But more importantly, CMS’s determination of which specialties would be eligible for that complex patient was based on current patterns of billing which reflects upcoding. So in their notice of proposed rulemaking, they identified ENT physicians and OB-GYNs and a few other specialties who probably do not see complex patients but they bill as if they see complex patients. Because of that, I’ve been recommending now for years and have been thoroughly ignored, that it’s time to actually get some empirical data, which could be done relatively easily about how much time it actually does take different specialties and different situations to see patients.

And if we actually had empirical time data, we would have a basis for coming up with time-based coding, which I think is the only objective way to compare what happens during an office visit. Just as an example, some specialties commonly do extensive histories and physicals, such as a neurologist who may spend an hour doing a very detailed neurologic exam, whereas other specialties may commonly not emphasize histories and physicals, but may be dealing with patients with 10 or more chronic conditions on 12 or 14 medications and trying to manage a coherent strategy for medication management. All of that and many other variations take place during office visits.

And in that Health Affairs blog you were referring to, my coauthor and I recommend that time is really the common denominator or the common metric that all visits have in common. And we need to get empirical time data and create codes that reflect the different time that physicians spend during office visits.

Gordon: So how do you go about getting the empirical data? You’re talking about time motion studies, that kind of thing?

Robert: Yeah. In fact, I was involved with a feasibility study performed by the Urban Institute and RTI as a subcontractor. These are two Washington think tanks. And we determined that it is feasible for clinically trained nurses or clinic managers to actually observe what’s going on during an office visit, during an interpretation of an MRI by a radiologist, the time it takes to, let’s say, do a shave biopsy, which I had the other day. It took about five minutes. The assumption in the fee schedule is that it—I actually haven’t looked at that one, but I’m sure that dermatologists say it takes 20 minutes.

The most egregious example that I have found is the time that the fee schedule assumes it takes to freeze a wart. As any GP or, in fact, many patients know, it takes about 15 seconds for the doctor to get the liquid nitrogen canister, walk back to the patient, and take the two seconds to spray the liquid nitrogen on the lesion. The fee schedule assumes that takes 23 minutes. So I’m suggesting that the clinically trained people could just sit in the room, observe how long it takes with a stopwatch and we would have some empirical time data.

Thankfully for procedures, anything that requires a clean room, an operating room or a procedure room like a colonoscopy, timestamps on electronic health records are quite reliable for the time, what’s called the intraservice time, the time the procedure actually takes. And then the observation with the stopwatch would just have to capture some level of pre and post service time. It is doable. And I guess the point I would emphasize is that we now spend $90 billion a year in the Medicare physician fee schedule. And we spend virtually nothing in research and development to update and improve the fee schedule.

Whereas, we’re spending $10 billion over 10 years in this Center for Medicare and Medicaid Innovation to try to test new models, we’ve allowed our established models, which are the predominant payment models that all physicians participate in in Medicare, we’ve allowed those to languish because we’re not going to spend—I don’t have the price tag for you, but it would be a trivial amount to do the kind of observation or time motion study that I’m suggesting. So instead of doing that, we rely on estimates of how long it takes to do things that were developed 30 years ago or that are the result of self-interested specialty societies inflating their time so that they get paid more.

Gordon: One of the things that is striking is that the U.S. healthcare delivery system spends so much more per capita than any other developed country, and yet when you look at broad population outcomes, we’re not sitting anywhere near the top. We’re actually pretty far down on that list. And so there’s room for improvement and there’s been a huge amount of interest and focus on quality improvement. And that quality improvement then calls on clinicians and electronic medical records to capture certain information sets and to bring them out and demonstrate quality. What do you think about that approach and how that’s working?

Robert: First, I’d want to just make this point. The reason the U.S. system costs so much more than other countries is largely at this point because of the high prices that are charged in our dysfunctional health care system. I mean, we now have data that finds commercial insurers pay nearly 200 percent of the Medicare rate to hospitals, and there are not-for-profit hospitals sitting around with billions of dollars in reserves. And so that is the reason, and we’re only now beginning to get serious about dealing with high prices, whether it’s for prescription drugs, which has gotten a lot of attention, but also for a day in the hospital, for an MRI. We lead the world by a long shot in those prices.

But in terms of getting our money’s worth on quality, I’m something of a contrarian because I think it is very difficult at the individual clinician level to actually measure performance. I think the government’s first responsibility is to protect the public from substandard or even fraudulent behavior. Just as an example, a few years ago, CMS did their first release of the Part B spending data. Basically, physician payments are the major component of Part B spending in Medicare. And that data showed that 5,000 physicians were billing Medicare only level five visits. That’s back to payment again, but that’s fraud, basically, and yet somehow that’s being tolerated.

In the quality area, let me give you an example. Almost two decades ago now, but not that long ago, in Redding, California, there was a surgeon who was operating on healthy hearts and calling them diseased and doing coronary artery bypass procedures. It became a scandal. 60 Minutes did an exposé of it. I went back to the Dartmouth Atlas, the most recent Dartmouth Atlas before that scandal broke, and it turned out that Redding, California was sitting there, three standard deviations above the national average, as having the highest rate of coronary bypass surgery. And yet nobody was looking, nobody is still looking at outlier quality.

I think it is much more difficult to have precise ratings of quality for acceptable, fair, good, excellent performance and that we should leave that mostly to organizations like accountable care organizations to adopt methods, quality improvement methods to improve quality. I don’t think we are in a position to actually rank physicians who are providing acceptable medical care. And in some cases, we should just stop trying. We’re much better off at measuring population health. And as the health system moves to greater aggregation of physicians for better or for worse, they’re getting employed by hospitals or their practices are being bought by hospitals. We’re in a much better position to measure at the aggregate organizational level than we can at the individual physician level.

I think it’s been a great deal of effort for very little gain and has demoralized the physicians and created a real financial burden. There was a study about two years ago that documented, just in physician offices, that they were spending an estimated $15.4 billion a year just to provide the quality measures. If you consider that hospitals are doing it, nursing homes are doing it, a whole range of payers, public and private payers are doing it, and then there’s all these intermediaries who collect the data, I’m guessing we’re spending $50 billion, $70 billion a year on something that has yet to prove useful. I think we should stop doing that.

Gordon: So I’m hearing a couple of things. One is it sounds like you can look at data sets to say, I see these outliers and those are alarming and we should be using these data more intentionally to find those outliers and ask the question of what’s going on. And in that, you’re talking about claims data sets and administrative data sets, is that correct?

Robert: Yeah. I mean, the term I use is that the lines are much brighter between unacceptable and acceptable than between different gradations of acceptable. So yes, I think we can use existing data sets. Now, the claims data by themselves are not definitive, but they would be targets of then more in-depth review, including medical record review, to see if for some reason there’s an explanation, that some doctor is only treating the sickest patients in the country and therefore, their mortality data is that much worse. Usually, those excuses don’t hold up to scrutiny, but certainly the claims data, which is really what we have, can be used to identify unacceptable behavior, outlier behavior that deserves a closer look and sanctioning. We don’t sanction very much. We give a lot of people threats but give them passes.

Gordon: When you say the claims data and we don’t have much beyond that, what about the promise of the electronic medical records and the ability to extract information to understand quality?

Robert: Well, I think it’s got promise, as I started with. Right now, there’s a lot of fake data in the electronic health records because, for better or for worse, physicians put information that may not in fact be accurate in the clinical record to justify their inflated coding. Cut and paste has been facilitated by electronic health records. But in many cases, it just propagates or promotes inaccurate information that may have been accurate three years earlier but just gets brought up currently. You also need to have fairly sophisticated free-text capability, free-text analysis ability, because templates with checklists are often very unreliable.

So I have a question about the validity even of information in electronic health records. It’s certainly what we should be trying to do is mine the electronic health records. But I’m somewhat skeptical that we’re ready to do that in a systematic way at this point without assuring some accuracy of the information that’s in that electronic health record. I’m also quite skeptical of that self-reported data that physicians may provide with some of the codes that are more clinically nuanced. I’m also skeptical that that’s necessarily accurate.

Gordon: Recalling the past, you had addressed the issue of the tension between a broad swath of data to represent a clinician’s capacity and effectiveness versus the cost of gathering that data. So then the tension tends to push to reduce the number of data points that we’re demanding from clinicians just for the point that you were making before about how much it costs, but then we end up with a small handful of data points. And I wonder how well those represent, for instance, an internist’s work.

Robert: Well, when I first saw the measures that were used for PQRS, the Physician Quality Reporting System and thought back to my days as a primary care internist, I frankly was insulted that they thought they could judge my quality, my overall global quality based on a few metrics largely of preventive services, which I’m not saying are not important, but it reflects such a small microcosm of the whole range of activities that an internist provides, that to take a couple of these in most cases without any risk adjustment to give me my scorecard, I just felt was inappropriate.

And so you do have this tension between having enough measures to say something that’s representative of a physician’s practice without producing a lot of burden as you suggest, and then having such a small number that what’s the point. But I would make a different point, which is that out of claims data or registry data, some of the core performance that you want to know of many specialties are unobtainable. So for example, what do we want radiologists to be able to do? I would say above and beyond anything else, we want them to make correct interpretations of images, right?

That’s not available on a claim form. No radiologist says, I’ve interpreted the lumbosacral spine incorrectly or even correctly. In most cases, misdiagnosis as an example, is never detected. And certainly, it doesn’t show up on a claim form, but that’s what we want radiologists to do or pathologists. And yet we can’t measure that, so we measure something that’s in the realm of radiology, but it’s not central to what they do and will give very misleading scores or rankings to those radiologists. And clearly, those physicians who are in large organizations where the organization is able to produce that data will rank higher than those who are in small organizations.

But the point I’m making here is, we can’t measure the real important stuff in many cases. And yet because we’re so committed to measuring, we’ll pick anything just so we have six measures in the Medicare MIPS program. It makes no sense to me that we’re doing that.

Gordon: So it brings me back then to the initial part of the conversation where you’re saying, okay, the E/M coding rules are heavily burdensome. They’re driving a lot of dysfunctional behavior, driving a lot of dysfunctional EMR use. Let’s go to a time-based system, let’s get some empiric data so we can understand how to do that justly. Obviously, there’s the risk of misuse or abuse of any system. So I wonder if you could do a fraud, sort of have a fraud misuse oversight of a time-based system by using diagnosis coding. That seems logical because we’re coming up with a diagnosis typically during an interaction, but then you’re pointing out the issue of diagnostic accuracy. And now, I don’t know how to solve the problem.

Robert: Well, I want to go to that assertion of CMS that we don’t know how to audit time. It would be really easy if we had an all-payer database. Then when a physician is billing for eight hours of work in a four-hour period, because you have all the claims, you would have evidence that there’s a problem. Because Medicare is the dominant payer but not the sole payer, it is a lot more difficult to make inferences, but there have been studies. Now, these are studies which have documented, for example, that colonoscopists bill for 12 hours in a 7-hour business day. That’s just an example.
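The all-payer check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the claim records (clinician ID, service date, billed minutes), the function name, and the 12-hour daily cap are all hypothetical and do not come from CMS or from the studies mentioned.

```python
from collections import defaultdict

def flag_implausible_days(claims, max_hours_per_day=12.0):
    """Return {(clinician_id, service_date): billed_hours} for days over the cap.

    `claims` is an iterable of (clinician_id, service_date, billed_minutes)
    tuples. The cap is an arbitrary illustrative threshold, not a CMS rule.
    """
    totals = defaultdict(float)
    for clinician_id, service_date, billed_minutes in claims:
        # Add up every billed minute for the same clinician on the same day.
        totals[(clinician_id, service_date)] += billed_minutes

    limit_minutes = max_hours_per_day * 60
    return {
        key: minutes / 60
        for key, minutes in totals.items()
        if minutes > limit_minutes
    }

# Hypothetical claims: clinician "A" bills 800 minutes on one day.
example_claims = [
    ("A", "2019-01-07", 300),
    ("A", "2019-01-07", 500),
    ("B", "2019-01-07", 240),
]
flagged = flag_implausible_days(example_claims)
```

A flagged day is only a target for closer review, such as checking appointment books. With claims from a single payer the totals understate the real workload, which is why the all-payer view makes the inference easier.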

You can, I believe, look at billing patterns and at least have a sense that somebody may be abusing the time-based coding, and then you can go to appointment books. It may be you’re right that the number of diagnoses may give you a clue or the nature of the diagnosis. But I would make the point and reemphasize the point I made earlier, a neurologist might have a single diagnosis and yet need to spend an hour to do a very complex history and physical. So that would have to be very sophisticated. I do think that there are audit mechanisms that can work and I would also point out that CMS in many of their new codes, telehealth codes and others are using time.

They are explicitly using time. For example, there’s a new code, which I don’t think makes a lot of sense, but it has been approved in the fee schedule, for a phone call with a patient to determine that they don’t need to come in for a visit. The time interval is five to ten minutes. Now, I don’t think physicians should be using that code because the transaction cost of submitting a claim is probably costlier than the compensation that they’re going to get from it. But the point I’m making here is that CMS seems to believe they can monitor time-based coding. And again, I think it would be easier to show a variation. If it then turns out, for example, that dermatologists are typically doing visits of 10 to 15 minutes for the large majority of their visits, but that some dermatologists show up averaging 20 or 25 minutes, you would have a reason to go look at what’s going on there.
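The peer comparison in the dermatology example could look something like the following Python sketch. It is a rough illustration, not an established audit method: the function name, the input dictionary of average billed minutes per clinician, and the 1.5x-of-median threshold are all assumptions made for the example.

```python
from statistics import median

def flag_long_visit_outliers(avg_minutes_by_clinician, ratio_threshold=1.5):
    """Return clinician IDs whose average billed visit time is far above peers.

    `avg_minutes_by_clinician` maps a clinician ID to the average minutes
    billed per visit within one specialty. A clinician is flagged when that
    average exceeds `ratio_threshold` times the specialty median. The
    threshold is an arbitrary illustrative choice.
    """
    peer_median = median(avg_minutes_by_clinician.values())
    cutoff = ratio_threshold * peer_median
    return sorted(
        clinician
        for clinician, minutes in avg_minutes_by_clinician.items()
        if minutes > cutoff
    )

# Hypothetical dermatologists: most average 11 to 14 minutes, one averages 24.
dermatology = {"d1": 11, "d2": 12, "d3": 14, "d4": 13, "d5": 24}
outliers = flag_long_visit_outliers(dermatology)
```

The median is used here rather than the mean so that one extreme biller does not pull the baseline up. As with any screen of this kind, a flag is a reason to look more closely, not proof of abuse.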

I think there are techniques to deal with time-based coding that are not available when you use amorphous terms like a moderate decision-making versus complex decision-making, that’s much too subjective. Time, at least, has an objective basis. And I’d make one other point here. A lot of people who oppose time-based coding say it penalizes efficient and expert clinicians who don’t need to take 20 minutes because they can do it in 3, they’re so good. I’m a little skeptical of that especially when surveys document that patients want more time with their physicians, even the specialists.

But the way I think one could deal with that is to pay more for the first five or ten minutes of a visit than for the last five or ten minutes. You would reward efficiency in that way. Doctors could see more patients. You wouldn’t want to overdo that disproportionate payment, but I think one could do it that way so that you’re not just rewarding sitting around and schmoozing with your patients.
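One way to picture the front-loaded payment idea is the small Python sketch below. It assumes a simple two-tier per-minute rate; the dollar rates and the ten-minute boundary are invented for illustration and are not proposed fee schedule values.

```python
def visit_payment(minutes, early_rate=6.0, late_rate=3.0, early_minutes=10):
    """Payment for a visit where the first minutes are paid at a higher rate.

    The first `early_minutes` are paid at `early_rate` dollars per minute and
    any remaining time at the lower `late_rate`. All values are hypothetical.
    """
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    early_part = min(minutes, early_minutes)       # minutes paid at the higher rate
    late_part = max(0, minutes - early_minutes)    # minutes paid at the lower rate
    return early_rate * early_part + late_rate * late_part

# Under these assumed rates, a 5-minute visit pays 30.0 and a 25-minute visit pays 105.0.
short_visit = visit_payment(5)
long_visit = visit_payment(25)
```

With these made-up rates, five 5-minute visits earn more than one 25-minute visit, which rewards efficiency, while the longer visit is still paid more in total. How steep the difference between the two rates should be is the "don't overdo it" question raised above.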

Gordon: One issue that you raised during our discussion was having to do with physicians and other clinicians aggregating into larger organizations and that there appears to be possibly a disproportionate representation in literature speaking in DC and other places, of large organizations because they can afford that time, while small independent practices may not be able to afford that time and if that difference is real and if that difference is demonstrating a true difference in quality. So I just wanted to test that with you and see what you thought.

Robert: I think you’re bringing up a very important point there. The people who come to Washington, are able to testify, and are influential in how policy gets made typically are proponents of getting larger. However, researchers have documented pretty convincingly that in fact, small practices, solos but also the three-to-five-person practice, have much better performance than large groups on things like unnecessary hospital admissions, ambulatory care-sensitive hospital admissions, unnecessary emergency room visits, et cetera. So there’s now a recognition that perhaps all of this bigness is creating just bureaucracy and not necessarily more patient-centered care, which small practices seem to know how to do. So that would be point number one.

On the point that policy gets made and the small practitioners are ignored, that I think has been true. But somehow, they’ve gotten their voice in the last couple of years. So, under the MACRA legislation, which created a formal pay-for-performance program and incentives to participate in alternative payment models, small practices came and said, this is highly burdensome to us. It will be very misleading. And somehow, over 500,000 physicians or clinicians have been exempted, at least in the first years of that pay-for-performance program, on the basis that they didn’t have enough Medicare revenue to justify the intrusion. But I think that was simply a way to suggest that maybe this hadn’t been very well thought out, when the large organizations get the extra reward simply because they’re in a position to report easily, whereas the small practices may be doing a great job, but they can’t report it so they get penalized.

So I think starting about two and a half years ago, the policy world began to understand that maybe they’ve gone down the wrong road with all of this public reporting and pay-for-performance. It’s what I hope anyway.

Gordon: You know, what’s really interesting is you describe the ability to report, which we conflate with demonstrating quality, and yet when you describe the small practices, you mentioned a number of what I think of as big quality outcomes in terms of hospitalization rate, emergency room utilization rate, which you’re saying the small practices do better. And it brings me back to Barbara Starfield’s work and wondering about the intangibles that seem to have such a big impact on outcomes: the first point of access and lowering barriers and providing immediacy, having a person-focused relationship over time, providing comprehensive services and coordinating care across the continuum.

And I think about those attributes in her studies that were the underpinning of high-performing health systems, and if smaller independent practices because of the lower level of bureaucracy are closer to that primary care truth, if you will.

Robert: I think that’s exactly right. It’s now been about 10 years, but in the initial development of the patient-centered medical home, I was struck by how bureaucratic it was looking. You had to have lots of systems in place and you certainly had to have an interoperable electronic health record. We were asked at the Urban Institute to try to cost the incremental cost of becoming a medical home. And so to just get a little familiarity with the medical home and what practices were doing, I went to the Adirondacks in New York, where they were actively doing medical home work and also interoperable electronic health records, and specifically asked to visit a zero medical home, somebody who would just abjectly fail all of the criteria for medical homes, and then one who was a very good medical home.

And so I visited two practices. And without any question, the solo doctor who was 55 and didn’t want to, at that point, invest in an electronic health record because she thought she’d be retiring in 5 or 8 years, clearly was the doctor I would want for myself. I heard her tell a patient who was having health problems that if she needed to go to the emergency room to pick hospital A because at hospital A, the doctor would be able to come in and see her and manage her care. Whereas at hospital B, she would just be sent to a hospitalist, and her physician wouldn’t be able to have anything to do with her. That was a relationship. That wasn’t a lot of systems in place, that was the core relationship.

The other doctor, who had been in solo practice and had just joined a group of 18 doctors, struck me as a doctor who wanted to do cholesterol management. He didn’t really want to see sick patients. And he scored very highly on his medical home scores. So again, policy hasn’t gotten down to the real stuff, which is about relationships, which is about caring for patients in crises, a whole bunch of things. What we’re able to measure and monitor is not the core of what high quality health care is and what patients want in my view, and yet we keep doing it.

Gordon: Dr. Berenson, I think that sums it up. And I want to thank you for your time today.

Robert: I’m happy to have done it. I’m happy to be one of the contrarians in Washington who think that we need to hear more from small practices and from primary care physicians. Let me make a final point, not just respond to large organizations. Having said all of that, I am concerned actually that whether we like it or not, younger physicians want to be employed, they don’t want to go off on their own and that we will have physicians working for hospitals. And I fear that that’s going to happen, so we’re going to have to deal with that reality. And somehow, policy has to recognize the benefits of size and scope, but still try to reward in its payment and other policies, smallness and relationships.

And I don’t think that’s impossible. There are some organizations where, while they are large, well-organized, highly system-based organizations, there’s a primacy on the doctor-patient relationship. Unfortunately, many organizations don’t do that. So with that, thank you very much. I’ve enjoyed it.

Gordon: Thank you. For Inside Angle, this is Gordon Moore. You can find more podcast episodes at insideangle.3m.com.
