AI Talk: Health inequities and federated learning

June 4, 2021 / By V. “Juggy” Jagannathan, PhD

Addressing health inequities

One of the glaring impacts of the pandemic is that it has shined a bright spotlight on the health inequities plaguing this nation. Now there is a tangible movement toward addressing these inequities. The first step is documenting them. A collection of health care organizations and tech firms has set up a Health Equity Tracker. Satcher Health Leadership Institute, Gilead Sciences, Google, CDC Foundation, AARP and the Annie E. Casey Foundation are currently backing this venture.

The tracker presents a visualization of COVID-19 data stratified by race, ethnicity, sex, age, comorbidities, social and political determinants of health and poverty markers across states and counties. I am familiar with social determinants of health (they have been in the news quite a bit lately), but political determinants? It turns out there was a book written about this topic recently (which I have yet to read) by Daniel Dawes. Essentially, the argument here is that government policy and action (or inaction) perpetuate inequities. And there is certainly logic to it. If you don’t act to address inequities, how do you fix them? The goal of the Health Equity Tracker is to collect data from underserved communities and provide policy templates on how to address inequities. The program will continue to track data after policies have been adopted to determine whether the actions taken are actually helping to address the inequities.

Addressing bias in AI systems has also been in the news. The tracker project above can help compile the right kind of datasets to train AI algorithms. Beth Israel Deaconess Medical Center has recently announced a seven-step framework that integrates health equity factors and racial justice elements into developing AI systems. These include ensuring diverse data sets, establishing metrics that can address systemic racism and discrimination and tracking outcomes of disadvantaged populations.

The Biden administration has also announced a series of steps aimed at addressing this issue. It appears that the climate is right to address health inequities and social injustice. Let’s hope these efforts succeed.

Federated learning

Privacy is a major concern when using health care datasets to train machine learning models. In the U.S., the major legislation protecting patient privacy is HIPAA. Europe has much stronger privacy protections laid out in the General Data Protection Regulation (GDPR). Researchers have tried several approaches to create machine learning models while simultaneously protecting privacy. One standard approach is to de-identify the data. However, this process is not perfect and tends to leak some private information. Another approach is to create synthetic datasets that reproduce the statistical properties of the underlying data. Yet another approach is “differential privacy,” which adds calibrated random noise to the data, or to results computed from it, so that no individual record can be singled out. There is also an approach called “federated learning (FL).” Here is a blog that gives an overview of these approaches and more. The approach discussed below relates to FL.
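To make the differential privacy idea a little more concrete, here is a minimal sketch of one common form of it, the Laplace mechanism applied to a simple counting query. The function names, the toy patient records and the choice of epsilon are all assumptions made for illustration. They do not come from any of the projects mentioned in this post, and real deployments involve far more care than this.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon, rng):
    # Hypothetical helper: count matching records, then add noise.
    # Adding or removing one person changes a count by at most 1
    # (sensitivity 1), so the noise scale is 1 / epsilon.
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Toy data (made up): one record per patient, ages 0 through 99.
records = [{"age": age} for age in range(100)]
rng = random.Random(0)

# A smaller epsilon means more noise and stronger privacy.
noisy = private_count(records, lambda r: r["age"] >= 65, epsilon=0.5, rng=rng)
print(f"noisy count of patients aged 65+: {noisy:.1f}")
```

Any single answer is off by a random amount, but the noise averages out over many queries or many records, which is why aggregate statistics stay useful while individual records are obscured.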

Researchers from Germany, the United Kingdom and France have created an open-source software framework called PriMIA (Privacy-preserving Medical Image Analysis). What does it do? It allows machine learning models to be trained using data from multiple institutions without the institutions actually sharing or pooling the data in one place. The technique used here is FL. It is not a new approach, but using an open-source framework to achieve this is new. So, how does this process work? Individual institutions train the model with their own data. The locally trained models are then sent to a centralized server using encrypted protocols. The server combines them into a pooled model, which is shared back with each institution. The training continues in rounds until a desired endpoint is achieved. Finally, the pooled model is available for actual use. The authors validate their approach using a large dataset for classifying pediatric chest radiographs as normal or containing viral or bacterial pneumonia. A deep convolutional neural network is utilized for classifying these images. This approach shows promise for creating models that have access to large diverse datasets while protecting patient privacy.
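The train locally, pool centrally, share back loop described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not PriMIA's actual code or API: it fits a simple straight line instead of a deep convolutional network, uses made-up data for three hypothetical institutions, pools by plain weighted averaging and leaves out the encryption step entirely. All function names are invented for the example.

```python
import random

def local_train(weights, data, lr=0.1, epochs=5):
    # One institution refines the shared model (w, b) on its own data
    # using a few passes of gradient descent on squared error.
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * ((w * x + b) - y) * x for x, y in data) / n
        grad_b = sum(2 * ((w * x + b) - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)

def federated_average(updates):
    # The server pools the local models, weighting each by how many
    # samples the institution trained on. Raw data never reaches it.
    total = sum(n for _, n in updates)
    w = sum(model[0] * n for model, n in updates) / total
    b = sum(model[1] * n for model, n in updates) / total
    return (w, b)

def run_federated(sites, rounds=50):
    global_model = (0.0, 0.0)
    for _ in range(rounds):
        # Each site starts from the current pooled model.
        updates = [(local_train(global_model, data), len(data)) for data in sites]
        # The pooled model is what gets shared back for the next round.
        global_model = federated_average(updates)
    return global_model

def make_site(n, rng):
    # Made-up local dataset following y = 2x + 1 plus a little noise.
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0, 0.05)) for x in xs]

rng = random.Random(0)
sites = [make_site(n, rng) for n in (40, 80, 120)]
w, b = run_federated(sites)
print(f"pooled model: y = {w:.2f}x + {b:.2f}")
```

The only things that cross institutional boundaries here are the model parameters. In a real system those updates would also be protected in transit and during pooling, which is the part this sketch skips.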

I am always looking for feedback, and if you would like me to cover a story, please let me know! Leave me a comment below or ask a question on my blogger profile page.

V. “Juggy” Jagannathan, PhD, is Director of Research for 3M M*Modal and is an AI Evangelist with four decades of experience in AI and Computer Science research.

Listen to Juggy Jagannathan discuss AI on the ACDIS podcast.