AI talk: What do LLMs know? And AI safety

Nov. 27, 2023 / By V. “Juggy” Jagannathan, PhD

In this week’s blog, I explore the fascinating question: What does a large language model (LLM) really know? Intriguing studies from MIT researchers contain a partial answer. I also delve into the recent executive order from the Biden administration regarding the safety of artificial intelligence (AI).

What do LLMs know?

I read two papers that explore this topic, both posted to arXiv last month. The first is titled “Language Models Represent Space and Time” and the second “The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets.” The two were posted a week apart! I was intrigued, and then found out there are scores of papers focusing on what LLMs manage to encode in their billions of parameters.

The fascination with what these models represent did not start with LLMs. It has been an ongoing investigation into the black-box nature of neural models. The field of research that explores what models represent grew out of a desire to build interpretable, explainable models. When a model makes a prediction, or a response, can we determine what triggered a particular response or the rationale for the response?

How does one go about figuring out what an LLM knows? Turns out, there is a more or less standard approach that has been around for a few years. That technique is called probing. Here is some seminal work from Stanford University researcher John Hewitt on this technique. The core idea behind probing is to take the model’s internal representation of an input and use it to train a small classifier that predicts some linguistic property. So, one could train a probe on an LLM’s representations to predict, say, the part of speech of each word in a sentence. If such a probe is easy to train and accurate, then one can conclude that the model’s representations encode information about nouns, verbs, adjectives, etc.

But there are billions of parameters in a given model. Which of the model’s internal values do you use to train the probe? Researchers have tried lots of different combinations here. Since the transformer architecture organizes its parameters in layers, one could take the representation from the top layer, or from any other layer, and see where the probe performs best. Of course, designing proper probes is not easy; it is a complex endeavor.
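To make the layer-by-layer idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the “activations” are synthetic random vectors standing in for what a real model would produce, the layer names and signal strengths are made up, and the probe is a simple least-squares linear classifier rather than the method of any particular paper. It trains one probe per layer and reports which layer the probe reads the property from most accurately.

```python
import numpy as np


def fake_layer_activations(labels, dim, signal, rng):
    """Synthetic stand-in for one layer's hidden states (not real model output).

    The binary property is written along one random direction with the given
    strength; every other dimension is pure noise.
    """
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    noise = rng.normal(size=(labels.size, dim))
    return noise + signal * np.outer(2 * labels - 1, direction)


def fit_linear_probe(X, y, l2=1e-2):
    """Fit a ridge-regularised least-squares probe with a bias term."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    A = Xb.T @ Xb + l2 * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)


def probe_accuracy(w, X, y):
    """Share of examples where the thresholded probe output matches the label."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    preds = (Xb @ w > 0.5).astype(int)
    return float((preds == y).mean())


rng = np.random.default_rng(0)
n, dim = 600, 32
labels = rng.integers(0, 2, size=n)  # hypothetical property, e.g. noun vs. verb

# Hypothetical layers: how strongly each one encodes the property (assumed).
signal_by_layer = {"layer_0": 0.0, "layer_1": 1.0, "layer_2": 3.0}

train, test = slice(0, 400), slice(400, n)
accuracy_by_layer = {}
for name, signal in signal_by_layer.items():
    X = fake_layer_activations(labels, dim, signal, rng)
    w = fit_linear_probe(X[train], labels[train])
    accuracy_by_layer[name] = probe_accuracy(w, X[test], labels[test])

best_layer = max(accuracy_by_layer, key=accuracy_by_layer.get)
print(accuracy_by_layer, "best:", best_layer)
```

With a real model, the synthetic arrays would be replaced by hidden states captured at each layer for the same set of inputs; the rest of the loop stays the same.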

Now back to the original papers that caught my attention. What the MIT researchers have determined recently is that LLMs encode space and time. They used three spatial datasets that map places to countries and regions around the world, and another three temporal datasets that map historical figures, artworks and news headlines to their respective time frames. Using probes (discussed above), they determined that they could identify specific neurons that encode spatial and temporal content. Their probes correctly identified countries and time frames. And they did this using a model much smaller than GPT-4 (which is proprietary and hence cannot be probed): the 70 billion parameter Llama 2. The second paper mentioned above uses a true/false dataset and probes a smaller, 13 billion parameter Llama model for its capacity to correctly identify true and false statements.
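For a continuous target such as a year, a probe can be a regression instead of a classifier. The sketch below is an illustration under stated assumptions, not a reproduction of either paper: the activations are synthetic, the “time direction” is planted by hand, and the helper names are hypothetical. It fits a ridge regression probe to predict a year from the activations, then compares its held-out R² against a baseline probe trained on shuffled years, which should find nothing.

```python
import numpy as np


def fit_ridge_probe(X, y, l2=1.0):
    """Ridge regression probe; returns weights and an intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + l2 * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    b = y_mean - x_mean @ w
    return w, b


def r_squared(y_true, y_pred):
    """Fraction of variance in y_true explained by y_pred."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


rng = np.random.default_rng(1)
n, dim = 500, 64

# Hypothetical targets: a year associated with each input (assumed range).
years = rng.uniform(1500, 2000, size=n)

# Synthetic activations: noise plus a planted direction that tracks the year.
time_direction = rng.normal(size=dim)
time_direction /= np.linalg.norm(time_direction)
scaled = (years - 1750.0) / 250.0  # rescale years to roughly [-1, 1]
activations = rng.normal(size=(n, dim)) + 4.0 * np.outer(scaled, time_direction)

train, test = slice(0, 350), slice(350, n)

# Probe trained on the real pairing of activations and years.
w, b = fit_ridge_probe(activations[train], years[train])
r2_probe = r_squared(years[test], activations[test] @ w + b)

# Baseline: same probe, but trained on shuffled years, so there is no real signal.
shuffled_years = rng.permutation(years[train])
w_c, b_c = fit_ridge_probe(activations[train], shuffled_years)
r2_control = r_squared(years[test], activations[test] @ w_c + b_c)

print(f"probe R^2: {r2_probe:.2f}, shuffled baseline R^2: {r2_control:.2f}")
```

The gap between the two scores is the interesting number: a probe that does well only when the targets are real suggests the information is present in the representation rather than memorized by the probe itself.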

We are a long way from understanding how and why LLMs perform the way they do, but that is not for lack of trying. Probing LLMs to determine what they represent is eerily similar to researchers probing brain cells to determine which region controls what function! Just recently, a series of studies mapped more than 3,000 cells in the brain to what they control. We still don’t understand how our brain functions. Will we understand LLMs first or brains first? Or neither? Only time will tell.

Executive order on AI safety 

On Oct. 30, the White House released an executive order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence. This 100-page tome covers a lot of ground, but there is a fact sheet available if you don’t have the time to read all 100 pages. There is a lot of commentary about the executive order, and I found this one by the Stanford Institute for Human-Centered AI (HAI) quite useful.

So, what is the main thrust of the executive order? How best to develop and deploy generative AI systems safely and securely, and what guardrails to put on the process. Stanford HAI calls this executive order a massive step forward, one that aims to put the U.S. in the driver’s seat of the generative AI wave.

Here are some aspects of the order: 

  • Foundation models above a certain size that are dual-use (meaning they have both military and civilian applications) must undergo structured testing by “red teams” that systematically attempt to find flaws and vulnerabilities in the model. 
  • A focus on protecting privacy and advancing equity and civil rights. 
  • Attracting talent to the U.S. (and to government) by providing immigration incentives to top technical personnel.  
  • Promoting global AI collaboration. 
  • A series of deadlines on various implementation milestones – here is an example:  
    “… within 180 days of the date of this order, the Secretary of Defense and the Secretary of Homeland Security shall, … complete an operational pilot project to identify, develop, test, evaluate, and deploy AI capabilities, … to aid in the discovery and remediation of vulnerabilities in critical United States Government software, systems, and networks.” 

This latest executive order follows a series of smaller executive orders this year, but this one is fairly comprehensive in its scope. The National Institute of Standards and Technology (NIST), one of the organizations tasked with a range of follow-on tasks, is seeking collaborators for a new consortium, dubbed the U.S. AI Safety Institute. Amidst the frenzy of activity surrounding generative AI, my friend and colleague has an interesting takeaway on why it is so hard to put guardrails on AI. Look at some of the images he manages to generate – presidents playing with blocks!


The two papers on what LLMs know and represent were sent to me by my longtime friend and classmate, Prem Devanbu. And thanks to my friend and research colleague, Nate Brake, who shared with us his thoughts and experiments related to the executive order.

I am always looking for feedback, and if you would like me to cover a story, please let me know! Leave me a comment below or ask a question on my blogger profile page.

“Juggy” Jagannathan, PhD, is an AI evangelist with four decades of experience in AI and computer science research.