Inside Angle
From 3M Health Information Systems
AI talk: Emergent behaviors of large language models
Large language models are all the rage now in artificial intelligence (AI). The latest incarnation, ChatGPT, is exhibiting remarkable abilities in responding to instructions – from writing short, cogent essays to composing poems in the style of Shakespeare to opining about almost anything in the world. The focus of this blog is on what these models can do and when those abilities show up.
First, a language model is simply a machine learning model trained on a corpus of text. It learns to predict the next word, given the left context, and it is trained on lots and lots of text that can be found on the web. One widely used dataset is C4 (Colossal Clean Crawled Corpus), a massive collection of text scraped from the web. A large language model is simply that: large. Generative Pre-trained Transformer version 3 (GPT-3) from OpenAI has 175 billion parameters. Language Model for Dialog Applications (LaMDA) from Google has 137 billion parameters. Pathways Language Model (PaLM), also from Google, has 540 billion parameters. All were released in the past few years.
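To make the next-word objective concrete, here is a minimal sketch of next-word prediction, assuming the Hugging Face transformers library and the small GPT-2 model (chosen purely for illustration; the giant models above work the same way in principle, just with vastly more parameters):

```python
# A minimal sketch of next-word prediction with a small pretrained
# language model. Assumes the Hugging Face "transformers" library and
# GPT-2, chosen here purely for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# The "left context" the model conditions on.
context = "The patient was given a prescription for"
inputs = tokenizer(context, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

# The model's probability distribution over the vocabulary for the
# *next* token; print the five most likely continuations.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id):>15s}  {prob:.3f}")
```

That single objective – predict the next token, over and over, on web-scale text – is all the training these base models get.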
Now back to the question of why models like ChatGPT are able to do a variety of tasks and do them very well. Therein lies the mystery: Researchers have no idea! AI researchers are trying, step by step, to discover what these models can do. Notice I use the word discover, not invent! Yes, they are trying to figure out what is possible to do with these models.
Well, for starters, this behavior has been observed only in really large language models – the ones with hundreds of billions of parameters. Smaller versions of GPT, LaMDA and PaLM do not exhibit the same abilities the larger ones do. This is why the behavior is called emergent: it appears only at scale.
What are some examples of this emergent behavior? Here are a few.
- Zero- or few-shot prompting. Give the model zero or more examples, then ask a query (see the sketch after this list). For example, giving the model the input: Review: This movie sucks. Sentiment: negative. Review: I love this movie. Sentiment? The model outputs: positive.
- Chain-of-thought prompting. Here the example inputs include the step-by-step reasoning for, say, a simple math problem. Given a new problem of similar difficulty, the model produces the answer along with its own detailed step-by-step reasoning.
- Code generation. Given a natural language description of what needs to be done, the model outputs code in, say, Python, to accomplish the task.
- Answering general knowledge questions. Ask anything and the model’s answer is quite lucid, even though it may not be correct.
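To make these prompt formats concrete, here is a minimal sketch, assuming the openai Python package (the pre-1.0 completion interface current as of this writing), an API key, and a completion model such as text-davinci-003. The prompts themselves are the point; any sufficiently large completion model could stand in:

```python
# Illustrative prompts for the emergent behaviors above. Assumes the
# openai Python package (pre-1.0 completion interface) and an API key;
# any sufficiently large completion model could stand in.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

# Few-shot prompting: one labeled example, then a query.
FEW_SHOT = (
    "Review: This movie sucks. Sentiment: negative\n"
    "Review: I love this movie. Sentiment:"
)

# Chain-of-thought prompting: the worked example includes its reasoning.
CHAIN_OF_THOUGHT = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. "
    "How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls each is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n\n"
    "Q: The cafeteria had 23 apples. They used 20 and bought 6 more. "
    "How many apples do they have now?\n"
    "A:"
)

# Code generation: a plain natural language description of the task.
CODE_GEN = "Write a Python function that checks whether a string is a palindrome."

def complete(prompt: str) -> str:
    """Send a prompt to the model and return its completion."""
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=256,
        temperature=0,
    )
    return response["choices"][0]["text"].strip()

print(complete(FEW_SHOT))          # expected: positive
print(complete(CHAIN_OF_THOUGHT))  # expected: reasoning ending in "The answer is 9."
print(complete(CODE_GEN))          # expected: a short Python function
```

Run the same prompts against a much smaller model and the few-shot and chain-of-thought behaviors largely disappear. That, in a nutshell, is the emergence story.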
ChatGPT is certainly pushing the envelope on what is possible. Try it out for yourself (if you have not already done so) while it is still free – millions have already. Researchers are now trying to figure out why these models exhibit these behaviors. In a blog post, Yao Fu, a University of Edinburgh researcher, postulates several theories as to what contributes to the models' emergent behavior. Training on lots of code, he believes, is a significant contributor. He also conjectures that ChatGPT's use of reinforcement learning from human feedback (RLHF) is an important differentiator. Reinforcement learning (RL) has been the pillar by which DeepMind's AlphaGo program managed to reign supreme in the world of Go. RL's role in improving natural language processing is relatively recent, and it is now increasingly used to improve LLM performance.
These models are only going to get better, but there is also a slew of problems to overcome. ChatGPT, interestingly, also has a little bit of self-awareness: It knows what it does not know! So, if you ask about something that happened after 2021, the cutoff date for its training data, it politely informs you that it is unable to answer the question. Google has gotten around this problem by allowing its model to query its own search engine.
If students start turning in essays generated by these models, it will be hard for professors to deal with. There is also the looming problem of generating disinformation or toxic content. So far, these models have only been dealing with text, but multi-modality is surely around the corner. Of course, we already have models such as DALL·E generating images from text.
The emergent behavior of these models is starting to look like the beginnings of artificial general intelligence (AGI). And that is scary and exciting at the same time.
I am always looking for feedback and if you would like me to cover a story, please let me know! Leave me a comment below or ask a question on my blogger profile page.
“Juggy” Jagannathan, PhD, is an AI evangelist with four decades of experience in AI and computer science research.