Optimizing CAC outcomes: What are you training for?

Nov. 6, 2017 / By Jason Mark

Our team’s last post introduced a metric we call “Coder Variability.” As a quick recap, it is an attempt to measure how much variation there is among coders at a facility. Jessica’s post gives an excellent explanation of the metric, so please refer back to it if you haven’t read it yet. In response to the post, a reader submitted the following question:

“Do you think it’s more appropriate for an institution/practice to implement coding instruction, or should that come from the separate insurance carriers or CMS in general because of the variation in regulations?”

As I thought through the answer to this question, it felt more appropriate to turn it into a quick blog post of its own. 

The question was asked in the context of computer-assisted coding (CAC), since CAC leverages aspects of machine learning, but it is really a much broader question that applies to many machine learning and artificial intelligence applications. Every week we see new advances and capabilities for machine learning algorithms. In each case, however, the machine was “taught” by optimizing for a particular outcome (e.g., make as few mistakes as possible when categorizing an image as a dog or a cat). When it comes to CAC, what outcomes are we trying to optimize? Answering this question can be difficult when evaluating Coder Variability and Recall, since organizations have different goals and priorities.
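
To make “optimizing for a particular outcome” concrete, here is a minimal sketch in Python, on synthetic toy data rather than real images or a real CAC model, of a classifier being taught by minimizing one explicit objective. Whatever objective we write down is the thing the machine learns to get right.

```python
import numpy as np

# Toy "cat vs. dog" data: two made-up features per image (purely illustrative).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)  # 0 = cat, 1 = dog

# Logistic regression trained by gradient descent on cross-entropy,
# a smooth stand-in for "make as few labeling mistakes as possible."
w, b = np.zeros(2), 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w + b)))   # predicted probability of "dog"
    w -= 0.1 * (X.T @ (p - y)) / len(y)  # gradient step on the weights
    b -= 0.1 * np.mean(p - y)            # gradient step on the bias

p = 1 / (1 + np.exp(-(X @ w + b)))
print(f"training accuracy: {np.mean((p > 0.5) == y):.2f}")
```

Change the objective (for example, penalize missed “dog” labels more heavily) and the same procedure learns a different model. The choice of outcome is baked in before any learning happens.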

In an ideal world, all coders would receive the exact same training from the exact same source, which would reduce one source of variation in their coding. That suggests a more centralized training approach across the industry, from CMS, AHIMA, or some equivalent body, would be preferred. If the goal is to reduce variation and improve coding consistency, then a single source of “truth” is the way to go. In the real world, however, a plethora of issues can introduce variation. These arise regardless of where and how training is delivered, simply because coding is still a human-driven process. Consider the following scenarios:

  • One organization might be extremely focused on a quality measure that depends on coding being as complete and accurate as possible. Another organization might be trying to do as much as they can with fewer resources and therefore focus less on completeness and more on throughput. Different circumstances force organizations to find a balance between “as complete as possible” and “as fast as possible.” 
  • Within an organization, the same trade-off plays out differently from coder to coder; how consistently coders agree with one another is known as “inter-rater reliability.” For example, an experienced coder might code more quickly than an inexperienced one, either because she knows to ignore certain codes that, in her experience, “don’t matter much,” or because she assigns more codes from memory. (The sketch after this list shows how this kind of difference surfaces in a simple recall comparison.)
  • The same issue can exist even for a single coder on different days, referred to as “intra-rater reliability.” A coder might be in a hurry to leave early or have extra meetings on a given day, causing him or her to rush through coding at a faster pace than usual.

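As a rough illustration of how the completeness-versus-speed trade-off above shows up in the numbers, the sketch below compares two hypothetical coders against a reference “complete” code set for the same three charts. The charts, code sets, and coder behaviors are invented for illustration; only the recall calculation itself is general.

```python
# Hypothetical charts and code sets, invented for illustration.
# Coder A prioritizes completeness; Coder B moves faster and drops codes
# that, in their experience, "don't matter much."
reference = {
    "chart1": {"E11.9", "I10", "N18.3"},
    "chart2": {"J44.1", "Z99.11"},
    "chart3": {"I50.9", "E78.5", "I10"},
}
coder_a = {
    "chart1": {"E11.9", "I10", "N18.3"},
    "chart2": {"J44.1", "Z99.11"},
    "chart3": {"I50.9", "E78.5"},
}
coder_b = {
    "chart1": {"E11.9", "I10"},
    "chart2": {"J44.1"},
    "chart3": {"I50.9"},
}

def recall(coded, ref):
    """Fraction of the reference codes the coder actually captured."""
    hits = sum(len(coded[chart] & ref[chart]) for chart in ref)
    total = sum(len(ref[chart]) for chart in ref)
    return hits / total

print(f"Coder A recall: {recall(coder_a, reference):.2f}")  # 0.88
print(f"Coder B recall: {recall(coder_b, reference):.2f}")  # 0.50
```

Neither coder is “wrong” in the abstract; which one looks better depends entirely on whether the organization is optimizing for completeness or for throughput.
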
Variation can pose an interesting challenge for builders of machine learning-driven applications. For which of the “outcomes” listed above should the system be optimized? Many approaches require training against an answer that is known to be correct, such as “Is this an image of a dog or a cat?” So, should a computer-assisted coding system learn more from Coder A or Coder B? Should it more closely model Coder A on Monday or Coder A on Friday? These variations can be accounted for and reduced to some degree (one illustrative approach is sketched below), but once that is done, the users of the system also need to move their behavior in the same direction to create meaningful change.
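
One illustrative way to reduce label variation before training, assuming several coders have coded the same chart (an assumption for this sketch, not a description of how any particular CAC system is actually trained), is simple majority voting over the assigned codes:

```python
from collections import Counter

def consensus_codes(code_sets, min_votes):
    """Keep only the codes assigned by at least `min_votes` of the coders."""
    votes = Counter(code for codes in code_sets for code in codes)
    return {code for code, n in votes.items() if n >= min_votes}

# Three hypothetical labelings of the same chart.
chart_labels = [
    {"E11.9", "I10", "N18.3"},    # Coder A on Monday
    {"E11.9", "I10"},             # Coder A on a rushed Friday
    {"E11.9", "N18.3", "Z79.4"},  # Coder B
]
print(consensus_codes(chart_labels, min_votes=2))
# {'E11.9', 'I10', 'N18.3'}
```

Consensus labels smooth out inter- and intra-rater noise, but they also quietly decide whose coding style the system will reinforce, which is exactly the trade-off each organization has to own.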

Perhaps we’ve gone far afield of the original question, but as is the case with many of our blog posts, we like leaving our readers with more questions to think about.

Jason Mark is manager, Research & Applied Data Science Lab with 3M Health Information Systems.