AI talk: Yet another foundation model and eye of the storm

April 24, 2023 / By V. “Juggy” Jagannathan, PhD

In this week’s blog I focus on a newly announced foundation model from Meta and the swirling controversies and news surrounding large language models (LLMs) and the generative artificial intelligence (AI) saga.

Yet another foundation model 

I saw a blog post on Medium by Gurami Keretchashvilli about a revolutionary new image segmentation model that can segment anything. I was intrigued, so I checked out what it was about. It turns out Meta AI has indeed released a new model and a dataset. So, what is novel about this model? Prompt-based image segmentation! You say, “cat with black ears,” and the segment of the image containing the right cat is highlighted (masked). Meta has released the Segment Anything Model (SAM) along with the dataset and code used to train it. Their stated goal: foster research in foundation models for computer vision.

They have a demo of the segmentation model – you can upload any image and have it identify the objects in it. The general approach is to compute image embeddings (vector representations of the image). The model then extracts objects in the image and masks them based on prompts, which can be a specific point in the image, a bounding box, a textual description, etc. Text prompts are represented using OpenAI’s CLIP model, which connects text to images. The dataset used to learn segmentation – 1 billion masks – released along with the model and code, has an interesting story behind it: part of it was manually curated, part was generated using the model itself and then validated, and the rest was generated fully automatically.
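
To make the prompting workflow concrete, here is a minimal sketch using Meta’s open-source segment-anything package. The checkpoint filename, image path and prompt coordinates are placeholders for illustration; note that the released package supports point and box prompts, while text prompting (built on CLIP) was demonstrated in the paper rather than shipped in the code.

```python
# Minimal sketch of prompt-based segmentation with Meta's open-source
# segment-anything package (pip install segment-anything). The checkpoint
# file, image path and prompt coordinates below are placeholders.
import numpy as np
from PIL import Image
from segment_anything import sam_model_registry, SamPredictor

# Load a pretrained SAM checkpoint (ViT-H is the largest released variant).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# The expensive image embedding is computed once; every prompt reuses it.
image = np.array(Image.open("cats.jpg").convert("RGB"))
predictor.set_image(image)

# Prompt 1: a single foreground point at (x, y); label 1 marks foreground.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    multimask_output=True,  # an ambiguous prompt yields three candidate masks
)
print(f"Best point-prompted mask score: {scores.max():.3f}")

# Prompt 2: a bounding box (x0, y0, x1, y1) around the object of interest.
masks, scores, _ = predictor.predict(
    box=np.array([425, 600, 700, 875]),
    multimask_output=False,
)
```

Because the heavyweight embedding step is prompt-independent, an interactive demo like Meta’s can respond to each new click or box with only the lightweight mask decoder.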

How is this type of model useful? It can be incorporated into any vision application that requires identifying objects in the field of view. Of course, Meta is invested in the metaverse – and there are plenty of augmented and virtual reality (AR/VR) applications that can benefit from recognizing objects in the field of view. Surveillance is another huge area, as is the model’s potential in self-driving cars. I can also imagine this tech being used in smart shopping carts to automatically segment objects as a prelude to recognizing them (a sketch of that fully automatic mode follows).
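
For use cases like those above, here is a companion sketch of the fully automatic mode, assuming the same model setup as the previous snippet; this is also the mode Meta used to produce the automatically generated portion of its mask dataset. Again, filenames are placeholders.

```python
# Sketch of fully automatic ("segment everything") mode, assuming the same
# placeholder checkpoint as the previous snippet.
import numpy as np
from PIL import Image
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

# generate() samples a grid of point prompts across the whole image and
# returns a list of records, one per detected object.
image = np.array(Image.open("cart_camera_frame.jpg").convert("RGB"))
masks = mask_generator.generate(image)

# Each record carries the binary mask plus metadata useful to a downstream
# recognizer: pixel area, bounding box and the model's own quality estimate.
for m in masks:
    print(m["area"], m["bbox"], m["predicted_iou"])
```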

Eye of the storm 

ChatGPT, Bard and a slew of other developments have been at the forefront of the news cycle for the past few months. One of the latest of these news stories comes from Scott Pelley of “60 Minutes,” who interviewed Google executives about the rapid evolution of AI technology. Sundar Pichai, CEO of Google, confirmed what has been known for a while: yes, it is groundbreaking technology, and no, as a society we are not prepared to deal with its ramifications. But the silver lining, from his perspective, is that more people are worried about the technology’s impact now. It is still early days for this technology, and that is a good thing.

There is a growing realization that generative AI is going to impact every industry and every job in some way. That is a bit unsettling, mostly because little is understood about how or why these models perform the way they do. Sundar Pichai acknowledges this in the interview: “These algorithms are showing emergent properties, to be creative, to reason, to plan, and so on.”

Emergent in this context essentially means a capability appears out of the blue: the model was not explicitly coded to achieve those objectives. Here is a nice recent blog on this topic, and I also wrote a blog on the subject a few months back. One of the emergent properties discussed in the “60 Minutes” interview is the ability to understand Bengali, a language native to Bangladesh and India. With just a few prompts, Bard was suddenly proficient at generating Bengali translations. Google is now systematically applying this capability to all the low-resource languages out there, and there are thousands of them.

The current lack of understanding surrounding these black-box models with emergent properties has given rise to a fair degree of alarm. This led to an open letter a few weeks ago from the Future of Life Institute. Basically, the request from the signatories of this letter (currently more than 26,000) is to pause the building of models more powerful than GPT-4 for six months, examine the ramifications of deploying such models and establish proper regulatory oversight. The signatories include many leading authorities in AI research and development. Top among them is Professor Yoshua Bengio, who received the Turing Award a few years ago for his work on deep neural networks.

Professor Bengio has been fairly vocal about his concerns regarding ChatGPT-like systems. He underscores his reasoning in his recent blog. Fundamentally, his concern is to ensure these systems don’t cause harm by creating inappropriate or hallucinated content. He also has a research angle to his complaint, which comes out in a recent Eye on AI podcast episode. His argument is that current generative models, based purely on scaling up one aspect of human cognition – in this case, the creation of language – ignore other aspects. These systems do not have any real-world knowledge (common sense) or a systematic way to reason logically with data. Whatever such capabilities exist are mushed together in one model. Our human brains are not like that: one region is devoted to language understanding and production, while other regions handle reasoning, mathematics, common-sense knowledge and so on. Professor Bengio is focused on creating separate neural models to address these deficiencies in LLMs. He calls them inference machines, and you can read all about them here. The research is still in its infancy.

Perhaps the Future of Life Institute letter had its intended effect, even if there is not likely to be any pause in the frenetic pace at which AI technology is evolving. Senate Majority Leader Chuck Schumer has called for drafting legislation with the stated purpose: “to create a framework that outlines a new regulatory regime that would prevent potentially catastrophic damage to our country while simultaneously making sure the U.S. advances and leads in this transformative technology.” Let’s hope they are successful.

There is no doubt we are at an inflection point, and as the optimistic Pichai opines: “But I think if I take a 10-year outlook, it is so clear to me, we will have some form of very capable intelligence that can do amazing things. And we need to adapt as a society for it.” He is right, in my view.

I am always looking for feedback, and if you would like me to cover a story, please let me know! Leave me a comment below or ask a question on my blogger profile page.

“Juggy” Jagannathan, PhD, is an AI evangelist with four decades of experience in AI and computer science research.