Inside Angle
From 3M Health Information Systems
AI Talk: GPT-3 mega language model
I saw a spate of articles recently about a new mega language model from OpenAI. This blog post attempts to summarize the new development.
Generative Pretrained Transformer 3 (GPT-3)
What in the world is GPT-3? It’s the latest version of a pretrained transformer model. Version 2 was released last year, and this year OpenAI released the third. The transformer is a deep learning architecture for language modeling popularized by Google a few years ago. A language model essentially predicts the next word given the left context, sort of like the word suggestions your phone offers while you are texting. To get a visual (technical) view of what GPT-3 does, check out this blog post.

This model is HUGE. A common way researchers characterize the size of a model is by its number of learnable parameters. GPT-3 has 175 billion parameters; the earlier version had 1.5 billion. To provide a contrast, the human brain has about 85 billion neurons. Though the two are not really equivalent, the comparison gives an indication of the scale of these new models. So, the question is, what have they achieved with this new model? Instead of releasing the model itself, OpenAI has released an API to use it, with the intent of commercializing its applications.
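To make the “predict the next word” idea concrete, here is a minimal sketch using the Hugging Face transformers library. GPT-3 itself is only reachable through OpenAI’s API, so the freely downloadable GPT-2 stands in here, and the prompt text is just an illustration:

```python
# Minimal sketch of next-word prediction with a pretrained transformer.
# GPT-2 stands in for GPT-3, which is only available through OpenAI's API.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

context = "The patient was prescribed a course of"  # illustrative prompt
inputs = tokenizer(context, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, seq_len, vocab_size)

# The scores at the last position rank every vocabulary token as a
# candidate for the next word; show the top five.
top5 = torch.topk(logits[0, -1], k=5).indices
print([tokenizer.decode(t) for t in top5])
```

The scores at the final position rank every word in the vocabulary as a candidate continuation; sampling from them over and over is how these models generate whole passages.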
The main purpose of a pretrained language model is its use in transfer learning. What is transfer learning? It is the process of adapting a pretrained model to new applications easily. “Easily” means the new problem can be learned with far fewer training examples than would be needed if it were tackled from scratch. GPT-3 takes this approach to a level approaching human-style learning. We don’t need to see thousands of dogs to know something is a dog; we can generalize quite easily from a few examples. This is exactly the intent of the designers of GPT-3: show it a few examples of a new problem, then apply it. The results are indeed impressive, and you can take a look at them in this technical report. So, for example, if you want to translate English to French, you provide a few translations (in their lexicon, “few-shot” learning) and ask it to translate new words. Give it examples of story contexts with closing sentences, and then ask it to complete new stories. You can test it on common-sense reasoning and more. The list seems endless. They report results using the nerd’s benchmark for assessing natural language understanding capability: SuperGLUE.
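Here is a rough sketch of what such a few-shot translation prompt looks like, modeled on examples in the technical report. The API call follows the Completion interface OpenAI has described, but the engine name and parameter values are assumptions on my part, since access is still waitlisted:

```python
# Sketch of a few-shot prompt: a few worked translations, then a new case.
# Engine name and parameters below are assumptions; access requires the
# OpenAI waiting list and an API key.
import openai  # pip install openai

openai.api_key = "YOUR_API_KEY"  # placeholder

few_shot_prompt = (
    "Translate English to French:\n"
    "sea otter => loutre de mer\n"
    "plush giraffe => girafe en peluche\n"
    "cheese =>"
)

response = openai.Completion.create(
    engine="davinci",       # assumed engine name
    prompt=few_shot_prompt,
    max_tokens=10,
    temperature=0,
    stop="\n",              # stop at the end of the translated line
)
print(response.choices[0].text.strip())
```

The key point is that nothing is retrained: the worked examples simply ride along in the prompt, and the model infers the task from them.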
With the limited release of the API for GPT-3 (you need to get on a waiting list to experiment with it), all we can go on for now are rave reviews from people who have tried it. This MIT Technology Review article refers to a whole series of tweets and other posts on the results people are seeing with this technology. One of the most interesting and impressive is an entire article about GPT-3 written by, who else, GPT-3! One user, Kevin Lacker, decided to investigate how close this technology is to passing the Turing Test. The goal of the Turing Test is to see whether the language model can fool a human into thinking there is a real person at the other end of a conversation. So, Kevin set up contextual examples to seed his question/answer session – pairs like “Who was president of the U.S. in 1955?” with the correct answer – and then asked questions like “How many eyes does a giraffe have?” GPT-3 aced the common-sense questions, even when context carried over from question to question. Kevin then decided to intentionally trick the system with nonsensical questions: How many eyes does my foot have? Two. How many eyes does a blade of grass have? One. Who was president of the U.S. in 1700? William Penn. Clearly, you can trick it with nonsensical questions. A few people also tried to get GPT-3 to write simple web code by describing in natural language what needs to be displayed. Getting computers to write code from natural language statements is a whole other big area of research.
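For a sense of the shape of that seeding, here is a sketch of the kind of prompt involved. The wording paraphrases Kevin’s write-up, and the pairs are purely illustrative; the string would be sent to the API the same way as the translation example above:

```python
# Shape of the Turing-test session: seed question/answer pairs establish
# the format, and the model is asked to complete the next answer.
qa_prompt = (
    "Q: Who was president of the United States in 1955?\n"
    "A: Dwight D. Eisenhower was president of the United States in 1955.\n"
    "\n"
    "Q: How many eyes does a giraffe have?\n"
    "A:"
)
print(qa_prompt)
```

With correct-answer pairs in place, the model simply continues the pattern, which is why the session reads like a conversation with memory.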
So, where does this all leave us? To be sure, it is an impressive step forward. It can give a whole new meaning to the term fake news! And, being trained on messy internet data, it has internalized all the biases that exist there. The OpenAI team probably doesn’t know yet what real problems can be solved with this technology, and has decided to find out through an API approach. There is also the worry that it might be used for nefarious purposes, which they hope to control through careful vetting of access and monetization of the solution. One thing is clear: this is not Artificial General Intelligence (AGI), the holy grail of AI researchers, but it is one more step in that direction.
I am always looking for feedback, and if you would like me to cover a story, please let me know. “See something, say something!” Leave me a comment below or ask a question on my blogger profile page.
V. “Juggy” Jagannathan, PhD, is Director of Research for 3M M*Modal and is an AI Evangelist with four decades of experience in AI and Computer Science research.