AI Talk: Algorithms and genome sequencing

February 25, 2022 / By V. “Juggy” Jagannathan, PhD

Properties of a good AI algorithm

A few weeks ago, I saw a peer-reviewed journal article titled “Ideal algorithms in health care,” and I was curious to see what the authors from the University of Florida had to say on the subject. In the article, they identified six characteristics of an ideal algorithm: explainable, dynamic, precise, autonomous, fair and reproducible. Dynamic refers to an algorithm’s ability to react to real-time data. The facets the authors identified are reasonable and self-evident.

However, I want to deviate slightly from their definition of an ideal algorithm. I will argue that there is another, more general aspect that is necessary for all algorithms: trustworthiness. Explainable, bias-free, precise algorithms can garner trust, but trustworthiness can also be achieved through other means. A physician trusts a particular medication to be effective for a particular condition because the medication has gone through clinical trials and won FDA approval, not because the physician fully understands the mechanism by which it treats the condition. Likewise, it is possible to trust algorithms when they have been rigorously evaluated and shown to be valid for the purpose for which they were created.

The algorithms the authors refer to are those that either diagnose specific conditions from multimodal data or predict some future state based on such data. But AI methodology also comes into play in a range of other health care applications. A voice-based virtual assistant that interacts with patients and clinicians also uses a variety of algorithms and techniques to understand and engage. If the assistant gets the interactions right, it will continue to be used; if not, it will be discontinued. Algorithms also come into play when assistants automatically summarize doctor-patient conversations, and such generated summaries need a completely different approach to evaluation.

But the authors clearly highlight the need for a more systematic way to evaluate algorithmic goodness for patients, clinicians and investigators, and that is a good start.

Current state and promise of genome sequencing

A few weeks ago, I came across a podcast from The Economist focused on genome sequencing. We all know that genome sequencing played a significant role in characterizing the coronavirus and developing COVID-19 vaccines. So, I was curious to learn what exactly is happening in this sphere now and which health care applications can benefit from it.

The podcast is a good snapshot of what is happening with genetic technology, though it is a bit UK-centric. For one, when the first whole human genome was sequenced more than two decades ago, the cost was more than a billion dollars. Now the cost is around $1,000 and falling. The pandemic has been a big accelerant for genetic technology. Virus sequencing has been a major focus in the UK, and the same technology is now branching out to detect rare diseases. UK Biobank, funded by the National Health Service (NHS), now holds the sequenced genomes of over 200,000 people.

A comparable effort in the U.S. is the “All of Us” Research Program, started in 2015 with the goal of enrolling 1 million people to study their health data, including DNA. The program had enrolled about 200,000 individuals as of this blog’s publication. With adequate privacy controls, the data is being made available to researchers studying the links between personal data and disease. All of Us is one of the major U.S. undertakings investigating the promise of personalized medicine.

Exploration in gene technology is following two major arcs. One is identifying which gene mutations cause which diseases; the other focuses on therapeutic intervention using genes, with CRISPR technology playing a major role.

Sequencing the genomes of a large portion of the population is critical to identifying links between disease and genetic markers. UK Biobank and the U.S. All of Us program are focused on this effort. Estonia, a country of just 1.3 million, has embraced a fully digital economy, from blockchain to genome sequencing, though Estonia’s approach is not something we can adopt here in the U.S. Nordic countries with large social safety nets are using similar approaches to collect genetic information. Here in the U.S., the commercial sector has been quite successful in collecting a large body of genetic data: 23andMe and Ancestry have collected DNA data from more than 26 million users. This massive accumulation of new data has helped identify genetic causes of rare diseases, such as mitochondrial diseases. Researchers have also discovered, for instance, that genetic data can be used to identify a specific allergic reaction related to a cancer drug.

Gene therapy is currently taking off like a rocket: the area attracted $23 billion in new investment last year alone. Though the field is not new, there is a lot of excitement surrounding it. We will explore this area in more detail in a future blog.

I am always looking for feedback, and if you would like me to cover a story, please let me know! Leave me a comment below or ask a question on my blogger profile page.

V. “Juggy” Jagannathan, PhD, is Director of Research for 3M M*Modal and is an AI Evangelist with four decades of experience in AI and computer science research.