Podcast Episode Transcript: Can AI help solve the challenge of physician burnout?

With L. Gordon Moore, MD, and V. “Juggy” Jagannathan, PhD

Gordon Moore: Welcome to 3M’s Inside Angle podcast. I’m your host, Dr. Gordon Moore, and with me today is Dr. Juggy Jagannathan. He is an artificial intelligence evangelist, among many other things. We’ve spoken with him before, and I always find the conversations fascinating because of his depth of understanding as well as his forward thinking. He has very kindly agreed to continue the conversation. Today, we’d like to talk about value-based care, quality measurements, and artificial intelligence, and how those things come together. Welcome.

Juggy Jagannathan: Thank you, Gordon. It’s a pleasure to talk to you, and I’m looking forward to this conversation.

Gordon: Excellent. You were very kind to send me a framework for how this conversation could unfold. If you don’t mind, I’d like you to set up the context of what a quality measure is and then talk about value-based care. Then we can talk about how artificial intelligence folds into that, if that sounds okay.

Juggy: Sounds good to me. So, this whole area of quality measures (and there are literally thousands of them) came into being when CMS decided to pursue the Triple Aim: improve the care of individuals and improve the health of the population while at the same time minimizing the cost of care. And the whole rubric around how you achieve this Triple Aim was encapsulated in the notion of the ACO and the Medicare Shared Savings Program, a few years ago after Obamacare came out.

And the genesis for it goes a long way back, to the capitated payments of the HMOs of the 1990s. But there is one fundamental problem with capitated payments, where you basically say, “I’m going to pay a fixed amount of money for taking care of a population of patients.” Without some checks and balances, that arrangement can lead to services being denied to populations in need. So to counter that, they developed this whole notion of quality measures, which say, “Quality care is being provided. The patients are being taken care of in reasonable ways, while at the same time we are managing the cost of it.”

Having said that, people came up with all kinds of quality measures. There were structural measures, like: Do you have an EHR? Do you have ways to analyze the data? There were process measures, which basically ask: Are you doing the right thing for the patient at this time? A patient was admitted to the hospital with acute myocardial infarction: did you administer aspirin within the first 24 hours? Did you do a foot exam for a patient who walked into an outpatient clinic and who you know is diabetic? These are all process measures, which are based on evidence collected for treating particular types of patient conditions, etcetera.

Of course, the gold standard is outcome measures. Did whatever you did achieve the desired outcome? Was the patient cured? Were the symptoms resolved? These are usually harder to get. Of course, CMS tracks the ultimate outcome measure of all, which is: Did the patient die? That’s the mortality measure. And another outcome measure that is very popular for hospital admissions is readmission: Was the patient readmitted within 30 days?

So all of these things are ways of categorizing the care that is being provided to the patient. And they all come to play a significant role as CMS moves toward the ACO model, the Accountable Care Organization model, wherein the amount of money disbursed to insurance agencies for taking care of a population of patients is based on a risk-adjusted model in which the conditions of the patients are aggregated over an entire population. They come up with a bucket of money, which says, “You need to take care of these patients with this kind of money. And by the way, at the same time, you need to satisfy the quality measures.”

So in a nutshell, quality measures and value-based care are all bundled up into one big unit, where value-based care is driven by making sure that you don’t over-utilize system resources while at the same time ensuring the quality of care, meaning you are doing the right thing at the right time for the patient. That is the collective role of this whole model. Did that make sense?

Gordon: Yeah. That makes sense. So the incentive under a fee-for-service model is that the more I do, the more I get paid. The incentive under a pure capitation model is that the less I do, the more money I retain. And so the quality measure set is intended as a balance against the unintended consequence of denied care.

Juggy: Perfect. You said it better than I did.

Gordon: So then, in the context of measuring quality, as you mentioned, I’ve always been interested in the different levels of quality and in trying to understand what we’re really after. I’ve been attracted to the set of measures around outcomes and what really matters: Has someone died unexpectedly? Have we hospitalized somebody when, through better intervention, we might have prevented that hospitalization or emergency room visit? Those, to me, seem relatively more important than measuring: Did we administer a flu shot on time? Not that the flu shot is wrong; it’s a very good thing. But it’s relatively less important than: Did we, through omission of care, cause somebody to be hospitalized who might not have been?

Juggy: That’s interesting. So, outcome measures are the real gold standard, right? I mean, that’s what we are striving for: a healthy population. The healthier the population, the lower the utilization of facilities like hospitals. But outcome measures are harder to capture. And I have an interesting anecdote on this front.

I was taking part in a technical expert panel that was designing some quality measures for hospitals. This was way back in 2011, right after Obamacare, when the EHR was going to be the solution for everything related to quality. All these proponents of EHRs were basically saying, “Oh, we can collect this data; we can collect that data. We’ll force the physician to enter all this useful information, and magically we’ll have all the different quality measures.” I was the sole voice arguing that you are better off collecting this information using natural-language understanding from clinical documents, which contain a rich variety of data for monitoring the conditions of the patients. That is where the information actually resides.

So, long story short, that argument didn’t go well at the time. Perhaps it might now. We can actually monitor outcomes a little better by looking not just at clinical documentation done by providers, but also at information coming from the patients themselves: patient-reported outcomes on their mobile phones. So you have a better chance of monitoring outcomes now than you ever did before. But the sad fact is we are not collecting that data; we are not monitoring enough outcome measures. Process measures may make it easier to figure out whether you followed the process or not, but in this day and age, they fall way short of what we should really be doing.

Gordon: This is the classic problem I see in measurement right now around quality, where we use what we have and not what we want or need in terms of measures.

Juggy: Exactly. I’m reminded of George Mallory’s answer when he was asked why he wanted to climb Mount Everest: “Because it’s there.” We need to do stuff because it needs to be done, and not avoid it because it’s hard. Maybe the mindset will come around with the advances in AI and various other techniques. Maybe people will start focusing more on outcome measures and less on these nitty-gritty process measures.

Gordon: Yeah. I see two things that have come out of that missed opportunity. You raised natural-language understanding as a way of extracting information from EMRs. The first is the obvious dissatisfaction, dismay, and irritation on the part of our physician and nursing colleagues, who talk about how difficult it is to use electronic medical records and do the documentation because of all sorts of rules, and also because of the need to enter structured data, since structured data has become the be-all for measuring process and quality. Yet in many cases there may not be a strong link between the process and the outcomes that matter. And we can certainly see that it adds a lot of burden in work.

And so, I’ve always been attracted to the idea that a physician could just say what’s happening during documentation, have that captured in the medical record, and then have back-end medical-record tools make meaning of it and extract the information. We could not only solve the painful documentation issue, but also begin to connect to more data elements.

I had a conversation earlier with Bob Berenson, who has talked and written articles about measure sets being dumbed down to not just suboptimal but kind of nonsensical subsets of what a physician does in practice: because it’s so burdensome to capture these things, we measure fewer and fewer. And then we posit that those few handfuls of metrics reflect what that physician does in practice. A typical internist or family physician caring for adult patients may address hundreds of conditions and hundreds of variables per person in a typical day, and we’re measuring five in a quality data set, like: Did you get the A1c done on time?

If we were able to extract this information automatically, I think that would solve it. But I think we still have a hill to climb in terms of convincing others, and I wonder how we do that.

Juggy: I fully agree with you. I have looked at so many different quality measures, and I just scratch my head. Why in the world are we even doing these things? I mean, I’m no physician, so you’re in a better position to talk about it than I am. But I feel that collecting structured data in the EHR to satisfy some bureaucrat’s notion of quality is not the right approach to measuring quality. The intention is good. You want to measure the quality of care being provided. You want to measure whether the proper, evidence-based care was provided.

But in this day and age, when we are talking about individualizing care, with the All of Us Research Program collecting data on a million individuals to try to individualize care, it’s unclear what role quality measures should play as currently defined, in the hundreds or even thousands of measures coming out, and how these generalized measures relate to particular populations.

So in my view, it’s all about: How do you keep a population healthy? And there are a variety of ways in which we can move towards that goal of keeping the population healthy.

Gordon: And this is where I think it would be interesting to hear you posit how artificial intelligence could help guide that way or facilitate that transition.

Juggy: Let me mention the lower-hanging fruit for AI first. We talked about quality measures and how it’d be nice to use AI to automatically extract, from the patient-physician encounter information, whatever needs to be captured to figure out if the right thing was done for the patient. That’s one type of AI, natural-language understanding and extraction, which can happen today and is happening today to various degrees. The process is probably very similar to a clinical documentation improvement program, which looks at documentation to see whether the condition the patient is suffering from is captured correctly.

In a similar way, we can do the extraction to help in reporting the quality measures. That said, the real bang for the buck for AI is in population health. Your goal is to keep the population healthy. The healthier the population, the happier the population, the lower the health care utilization, and the lower the overall cost to the economy. That is the ACO model as well. So how do you do that?

To keep the population healthy, you need a way to segment it. There are various techniques you can use for segmenting the population. You can use HCCs (Hierarchical Condition Categories), you can use Clinical Risk Groups; these are mechanisms or tools that allow you to segment the population into different risk pools: “This population segment needs more care. This population segment needs relatively less care.” And you can establish protocols to monitor those different risk pools in a systematic way. This is one way of managing the population.

But there are also other ways of managing the population. For instance, there are all kinds of health registries. There are truly hundreds of them: a registry for diabetics, a registry for newborns, a registry for almost every condition you can name. That is also a way of segmenting the population based on specific conditions, and you can reach out to patients through those registries to address issues related to their condition.

And a third way I’ve seen it happen is through websites like PatientsLikeMe. They track literally hundreds of conditions. Obesity, for example, is a condition. There are thousands and thousands of individuals, maybe tens of thousands, registered who suffer from obesity, and they try to figure out how to manage stress, fatigue, pain, etcetera, and the site gives them information about it. This is managing the population from within the population itself: the patient takes a role and basically says, “How am I going to take care of myself?” and acts on it proactively.

So where does AI figure in all of these things? To me, it’s getting actionable intelligence. What is actionable intelligence? You can get it from a variety of sources. You may have segmented the population into different risk pools, but there has to be a trigger that results in some action that improves the condition of the patient in some fashion.

So what should this trigger be? This trigger can come from a number of sources. It can come from the wearables the patient is wearing, maybe. The Apple Watch has all kinds of monitors, including a heart monitor. Maybe you notice an arrhythmia, and the patient doesn’t know what it really means. But you are collecting the data, it goes into a monitoring station somewhere, it raises an alarm, and people basically say, “Hey, you better take this medication at this point.” Because you handled the situation proactively, you avoid a major condition that would have required expensive treatment.

The trigger can come from a number of sources, and that’s why I think actionable intelligence development is one major area. All of the predictive analytic solutions that we are seeing can also help provide actionable intelligence. Google used 46 billion data points to predict which patients are likely to die or likely to be readmitted to a hospital. And they didn’t use any groupers; they used plain data. That’s another way you can use data analytics and deep learning: to predict what is likely to happen to a particular patient, because you have access to a lot of data. Once you have a lot of variable data streaming in, you can use it to pinpoint more exactly when something is likely to happen, and prevent it from happening.

And you are likely to save money doing these kinds of preventive actions.

Gordon: This is the thing that has always made me nervous about wearables and the data streams coming out of devices measuring heart rate, probably the most obvious one being the Apple Watch. A patient would come to me and say, “Hey, with an Apple Watch I can now stream you my heart rate data.” And I’m thinking: I can barely keep up with the lab results coming back, and the last thing I want is a live stream of your data.

Juggy: Right. Obviously, none of these things would work without the proper infrastructure and tools.

Gordon: And that’s where the AI comes in. It’s to say, “Of this stream of data, how do we make meaning of this?” But what makes me, again, a little nervous is that if we use a machine-learning approach, we see a strong relationship between data elements and assume that it must have meaning. And so we say, “This relates to that.” But that’s not a randomized controlled trial. So I wonder about making clinical meaning of data sets. How do we bridge that?

Juggy: There is no substitute for randomized clinical trials. This is something that keeps coming up again and again. Just because you have some data and some analysis done on some data sets doesn’t really mean the results are actionable in the real world. There is no shortcut: you actually have to do these studies, double-blind studies and so on, to really know whether these kinds of solutions work. Maybe I’m a little old-fashioned on this front.

Certain things are probably obvious and can be used right away. And on this heart rate thing, I actually remember reading a paper just a week ago about a big study Apple had sponsored using these heart rate monitors. You have to prove the efficacy of these solutions in the real world using these kinds of clinical studies before they can be rolled out for population-monitoring types of activities.

Gordon: And that’s why we need AI. I think there’s that regulatory hurdle: you can have these devices, you can have these monitors, but when we’re going to use them as interventions in real people, we need some kind of scientific evidence to back that up.

So, for instance, if we have an Apple Watch that’s streaming heart rate data, that raises an opportunity to test it in the real world and ask, “Because of these data, do we observe that people truly end up fainting? Or going to the hospital? Or needing some sort of intervention for arrhythmia?” If they do, now we have a terrific tool, and it can move upstream from “go to the doctor and talk about it” all the way to “it’s actually happening live now,” and the AI is able to pull that out and raise an alert in the moment.

Juggy: Correct. And all of those things are possible. So in that sense, again, there is a lot of promise in AI, and possibly hype as well. But it needs to be tempered with real solutions that make sense.

Gordon: Another part of the promise of AI that I find fascinating is the idea that it can move well beyond the typical data set I would use as a physician in practice, into things I sort of know about but have a tough time attending to. For instance, the relationship between asthma and air quality or temperature or other factors. It would be neat if it could do that sort of thing.

Juggy: Absolutely. If you know the weather forecast says tomorrow is going to be a high-pollen day, it would behoove the person coordinating care for a particular population segment to send an alert notification to all their asthma patients. Or if it’s 100-degree weather, you can ask your care workers to proactively reach out to elderly patients who live without air conditioning: “Hey, move to a shelter. We don’t want you to have heat stroke.”

So those are the kinds of things we need to think about holistically. There is basically a sea of data out there, and we need to figure out how to use it in some systematic way so that you can get individualized, personalized care for the entire population, and maybe keep them healthy and happy.

Gordon: And that gets me back to a comment you made earlier about patient-reported outcomes, which to me should be the gold standard. It’s terrific that we did not readmit somebody after a joint replacement. It’s terrific that we avoided complications like surgical site infection and didn’t have an unnecessary hospitalization down the pike. But the real goal for a person having a joint replacement is typically: “I’d like to be able to move around. I want to be able to go up and down stairs. I want to be able to go visit friends or get to work without pain.”

And the only way we know those things is literally by asking people, and by giving individuals the ability to reply in a way that gives their replies structure and value and lets us understand: “Did we have the desired impact?” To me, that is the ultimate in personalized intervention, and it’s possible only if we’re able to combine very large data sets. That is beyond human capacity, so it relies on an AI-type solution that could look at the relationship between people’s responses about their joint replacement and what we did up front to do a good job there.

Juggy: Absolutely. I completely agree. The patient-reported outcome really needs to be the gold standard. And I think it’s also part of the Triple Aim: better care for individuals, a healthier population, and minimized cost of care. So that’s a good goal all the way around.

Gordon: And I think it’s possible now, not in the distant future. It’s technologically possible if we think about using a natural-language understanding approach, so that a person can respond to a set of questions about “How are you doing?” and that data is then structured and fed through systems to demonstrate the quality of intervention in a way that’s really real. To me, that’s the real promise, and I think we’re getting very close. We also then need to line up policy and talk about why we are measuring things that are relatively less important. Why aren’t we focusing on things that really matter to people?

Juggy: Right. I think that’s the big question. Whatever we measure, we should measure because it’s useful, not because it’s easy. And we should also make the process of measuring less painful for caregivers.

Gordon: Absolutely. Yeah—easy only in the sense of “I have a line of sight.” But certainly painful for the end user in terms of all the structured data entry.

Juggy: Correct.

Gordon: Yes. Well, Dr. Jagannathan, thank you so much for your time today.

Juggy: Thank you. Enjoyed it.

Gordon: Yeah. And I look forward to future conversations.