AI Talk: Explainable AI and HateCheck

June 11, 2021 / By V. “Juggy” Jagannathan, PhD


Explainable AI?

I came across an article in AI in Healthcare about a study published in Nature Machine Intelligence. Many studies claim great results using black-box neural networks to read radiology images. How good are these black-box neural networks in practice? That is the question researchers from the University of Washington studied in the context of machine learning models created to detect COVID-19 in chest X-rays.

The researchers replicated models for detecting COVID-19 from the original datasets and systematically attempted to dissect why a model predicted a certain outcome. They drew upon techniques from explainable AI for this effort. One approach used saliency maps, which highlight the portions of an X-ray image that contributed most to the outcome. Another used a Generative Adversarial Network (GAN) to transform an X-ray image that leads to a positive prediction into one that leads to a negative prediction, so that the regions that change show what the model is relying on. They also tested the models on test sets drawn from a different institution than the one that supplied the training data.
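To make the saliency idea concrete, here is a minimal sketch in plain Python. It uses occlusion sensitivity, one simple way to build a saliency map, and not necessarily the technique the researchers used. The toy 4x4 "X-ray", the shortcut_model function and the patch size are all hypothetical, chosen only to show how a map can expose a model that looks at the wrong part of an image.

```python
from typing import Callable, List

Image = List[List[float]]


def occlusion_saliency(
    image: Image,
    score: Callable[[Image], float],
    patch: int = 2,
    fill: float = 0.0,
) -> Image:
    """Return a map where each pixel holds the score drop caused by masking its patch."""
    base = score(image)
    height, width = len(image), len(image[0])
    saliency = [[0.0] * width for _ in range(height)]
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Copy the image and blank out one patch.
            occluded = [row[:] for row in image]
            rows = range(top, min(top + patch, height))
            cols = range(left, min(left + patch, width))
            for r in rows:
                for c in cols:
                    occluded[r][c] = fill
            # A large drop means the model depended on this patch.
            drop = base - score(occluded)
            for r in rows:
                for c in cols:
                    saliency[r][c] = drop
    return saliency


def shortcut_model(image: Image) -> float:
    """Hypothetical toy classifier that only reads the top-left corner,
    standing in for a model that keys on a text marker instead of the lungs."""
    return image[0][0] + image[0][1] + image[1][0] + image[1][1]


# Assumption: a uniform 4x4 grid stands in for a real chest X-ray.
toy_xray = [[1.0] * 4 for _ in range(4)]
saliency_map = occlusion_saliency(toy_xray, shortcut_model)

for row in saliency_map:
    print(row)
```

In this toy case only the top-left patch gets a non-zero value, which is the kind of pattern that would flag a model for relying on a corner marker and not on the anatomy.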

All of these techniques are basically trying to figure out whether the model is coming to the right conclusion based on the true pathology of the case. That is, is it working as intended? To no one’s surprise, they found that the models relied on all kinds of spurious signals to come up with their decisions. Models were basing decisions on peripheral areas like shoulders and clavicles, or on whether or not there were text markers in the images. These are unrelated to the clinical etiology and should have no bearing on the final medical decision. This is known as “shortcut learning”: the model picks up shortcuts from quirks in the training data.

This study is a cautionary tale on the creation of such models. It highlights the need for a thorough audit of any such system to ensure it is making the right call for the right reasons. Decisions need to be explainable if they are to be trusted.

HateCheck

In a recent issue of MIT Technology Review, I saw an interesting headline: “AI still sucks at moderating hate speech.” This, of course, is not new information. Online social media platforms are currently scrambling to hire thousands of content moderators (Facebook alone employs more than 15,000). So, what is the focus of this article? It describes a new tool called HateCheck, created by researchers from the University of Oxford, The Alan Turing Institute, Utrecht University and the University of Sheffield. This is not an AI program to identify hate speech. It is a tool that points out exactly how an AI program used to check for hate speech is failing! That is indeed progress; if you cannot identify how a hate speech detection algorithm is failing, you cannot fix it.

So, how did the researchers manage to accomplish this task? First, here is their definition of hate speech: “Language that is used to express hatred towards a targeted group or is intended to be derogatory, to humiliate, or to insult the members of the group.” The groups they addressed with this tool are women, trans people, gay people, Black people, disabled people, Muslims and immigrants.

They painstakingly documented the different ways speech can express hate, and the ways in which it doesn’t but can still be confused with hate. By talking to an array of non-profits that deal with hate speech on their content platforms, the researchers cataloged 29 functional categories of speech: 18 covering hateful content and 11 covering non-hateful content. For instance, profanity can be used to express hate, but can also be a form of expression unrelated to hate. The researchers compiled a test suite of 3,728 curated sentences that can be used to test any program designed to flag hateful content. Each sentence is assigned to one of the 29 categories the researchers devised. So, when a particular content moderation algorithm is run against this test set, you can pinpoint its deficiencies category by category. They showed that this is indeed the case by testing a range of academic and commercial solutions for detecting hate speech. The authors point out that their test data set has to be continually improved, though. It is publicly available and sure to be a valuable resource for developers who are crafting the next generation of content moderation platforms. Perhaps AI will “suck” less in the future when it comes to detecting hate speech.
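The category-by-category testing described above can be sketched in a few lines of Python. This is only an illustration of the idea, not HateCheck's actual code or data: the SuiteCase type, the three category names, the placeholder sentences (with [GROUP] standing in for a targeted group) and the keyword_classifier are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, NamedTuple


class SuiteCase(NamedTuple):
    functionality: str  # which functional category the sentence probes
    text: str
    is_hateful: bool    # gold label


def accuracy_by_functionality(
    cases: List[SuiteCase],
    classify: Callable[[str], bool],
) -> Dict[str, float]:
    """Score a classifier separately on each functional category."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for case in cases:
        total[case.functionality] += 1
        if classify(case.text) == case.is_hateful:
            correct[case.functionality] += 1
    return {name: correct[name] / total[name] for name in total}


def keyword_classifier(text: str) -> bool:
    """Hypothetical naive model: flags any text containing the word 'hate'."""
    return "hate" in text.lower()


# Assumption: made-up placeholder cases, not sentences from the real suite.
cases = [
    SuiteCase("explicit_hate", "I hate [GROUP].", True),
    SuiteCase("negated_hate", "I do not hate [GROUP].", False),
    SuiteCase("non_hateful_profanity", "This damn printer is broken.", False),
]

results = accuracy_by_functionality(cases, keyword_classifier)
for name, accuracy in results.items():
    print(f"{name}: {accuracy:.0%}")
```

A single overall accuracy number would hide the problem here; the per-category breakdown shows that this naive model fails specifically on negated statements, which is the kind of targeted diagnosis a functional test suite is meant to provide.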

I am always looking for feedback, and if you would like me to cover a story, please let me know! Leave me a comment below or ask a question on my blogger profile page.

V. “Juggy” Jagannathan, PhD, is Director of Research for 3M M*Modal and is an AI Evangelist with four decades of experience in AI and Computer Science research.
