AI: Into the 'mind' of an LLM AI. RTZ #366
...new Anthropic research on how token-driven 'matrix math' does its 'magic'
Since the earliest days of AI: Reset to Zero here, which passed 365 daily posts just yesterday, I’ve highlighted that even the best AI scientists building these ever-scaling LLM models barely understand why and how they really do their thing. In a post last July 16 titled “AI: Can’t see how it works…and all that we don’t know”, I noted:
“It may be worth hearing directly on this from a deep practitioner of the LLM AI arts, Sam Bowman, an AI academic at NYU who has written important AI papers, on why these results are not like traditional software, and why even the coders do not know how all this AI code works:
“If we open up ChatGPT or a system like it and look inside, you just see millions of numbers flipping around a few hundred times a second,” says AI scientist Sam Bowman. “And we just have no idea what any of it means.”
“Bowman is a professor at NYU, where he runs an AI research lab, and he’s a researcher at Anthropic, an AI research company. He’s spent years building systems like ChatGPT, assessing what they can do, and studying how they work.”
“He explains that ChatGPT runs on something called an artificial neural network, which is a type of AI modeled on the human brain. Instead of having a bunch of rules explicitly coded in like a traditional computer program, this kind of AI learns to detect and predict patterns over time.
“But Bowman says that because systems like this essentially teach themselves, it’s difficult to explain precisely how they work or what they’ll do. Which can lead to unpredictable and even risky scenarios as these programs become more ubiquitous.”
Anthropic, the purveyor of Claude 3, one of the largest foundation LLM AI models around, released its latest AI paper, which takes further steps toward understanding how the neural ‘AI matrix math’ in these LLM systems may be doing its thing. In a post titled “Mapping the Mind of a Large Language Model”, they explain:
“Today we report a significant advance in understanding the inner workings of AI models. We have identified how millions of concepts are represented inside Claude Sonnet, one of our deployed large language models. This is the first ever detailed look inside a modern, production-grade large language model. This interpretability discovery could, in future, help us make AI models safer.
“We mostly treat AI models as a black box: something goes in and a response comes out, and it's not clear why the model gave that particular response instead of another. This makes it hard to trust that these models are safe: if we don't know how they work, how do we know they won't give harmful, biased, untruthful, or otherwise dangerous responses? How can we trust that they’ll be safe and reliable?”
“Opening the black box doesn't necessarily help: the internal state of the model—what the model is "thinking" before writing its response—consists of a long list of numbers ("neuron activations") without a clear meaning. From interacting with a model like Claude, it's clear that it’s able to understand and wield a wide range of concepts—but we can't discern them from looking directly at neurons. It turns out that each concept is represented across many neurons, and each neuron is involved in representing many concepts.”
“Previously, we made some progress matching patterns of neuron activations, called features, to human-interpretable concepts. We used a technique called "dictionary learning", borrowed from classical machine learning, which isolates patterns of neuron activations that recur across many different contexts. In turn, any internal state of the model can be represented in terms of a few active features instead of many active neurons. Just as every English word in a dictionary is made by combining letters, and every sentence is made by combining words, every feature in an AI model is made by combining neurons, and every internal state is made by combining features.”
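To make that “features instead of neurons” analogy concrete, here is a minimal, self-contained sketch of the dictionary-learning idea. It is an illustrative stand-in I put together, not Anthropic’s code (the company describes using sparse autoencoders trained at vastly larger scale); every size and name below is made up. It fabricates toy “neuron activations” as sparse mixtures of more underlying features than there are neurons, then uses scikit-learn’s generic dictionary learner to re-express each internal state as a few active features:

```python
# Toy sketch of dictionary learning over "neuron activations" (illustrative
# stand-in, not Anthropic's implementation; all sizes and names are made up).
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
n_neurons, n_true_features, n_samples = 32, 64, 500

# Superposition: more underlying "features" (concept directions) than neurons,
# so each concept is spread across many neurons, and each neuron helps
# represent many concepts.
true_features = rng.normal(size=(n_true_features, n_neurons))
true_features /= np.linalg.norm(true_features, axis=1, keepdims=True)

# Each internal state activates only a handful of features at once.
codes = np.zeros((n_samples, n_true_features))
for i in range(n_samples):
    active = rng.choice(n_true_features, size=3, replace=False)
    codes[i, active] = rng.uniform(0.5, 2.0, size=3)
activations = codes @ true_features   # what we observe: dense, opaque neuron activations

# Dictionary learning: represent each state as a few active "features"
# instead of many active neurons.
dl = DictionaryLearning(n_components=n_true_features, alpha=0.2,
                        transform_algorithm="lasso_lars", random_state=0)
sparse_codes = dl.fit_transform(activations)

print("active neurons per state  :", (np.abs(activations) > 1e-6).sum(axis=1).mean())
print("active features per state :", (np.abs(sparse_codes) > 1e-2).sum(axis=1).mean())
```

The recovered dictionary atoms play the role of the “features” in the quote above: the raw neuron activations are dense and meaningless to a human, while the feature-level description of each state is short, which is what makes it a candidate for human interpretation.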
Ars Technica goes on to simplify this technical explanation in “Here’s what’s really going on inside an LLM’s neural network”:
“With most computer programs—even complex ones—you can meticulously trace through the code and memory usage to figure out why that program generates any specific behavior or output. That's generally not true in the field of generative AI, where the non-interpretable neural networks underlying these models make it hard for even experts to figure out precisely why they often confabulate information, for instance.”
“Now, new research from Anthropic offers a new window into what's going on inside the Claude LLM's "black box." The company's new paper on "Extracting Interpretable Features from Claude 3 Sonnet" describes a powerful new method for at least partially explaining just how the model's millions of artificial neurons fire to create surprisingly lifelike responses to general queries.”
Both pieces are worth reading in full, especially the Ars Technica interpretation.
But I note all this here because AI scientists are slowly but surely developing methodologies to improve the ‘interpretability’ of these models. And that’s a good thing in these early days of the AI Tech Wave, as it’s poised to go mainstream for billions of users worldwide over the next couple of years, drinking in never-ending flows of extractive AI data (Box no. 4 below).
But we’re still at the ‘beginning of the beginning’ in AI technologies. As Meta’s AI chief Yann LeCun just emphasized to the FT, current AI methods ‘are flawed’, as he and his colleagues work out ‘world modelling’ on the path to superintelligence:
“Meta’s artificial intelligence chief said the large language models that power generative AI products such as ChatGPT would never achieve the ability to reason and plan like humans, as he focused instead on a radical alternative approach to create “superintelligence” in machines.”
“Yann LeCun, chief AI scientist at the social media giant that owns Facebook and Instagram, said LLMs had “very limited understanding of logic . . . do not understand the physical world, do not have persistent memory, cannot reason in any reasonable definition of the term and cannot plan . . . hierarchically”.
“In an interview with the Financial Times, he argued against relying on advancing LLMs in the quest to make human-level intelligence, as these models can only answer prompts accurately if they have been fed the right training data and are, therefore, “intrinsically unsafe”.
“Instead, he is working to develop an entirely new generation of AI systems that he hopes will power machines with human-level intelligence, although he said this vision could take 10 years to achieve. Meta has been pouring billions of dollars into developing its own LLMs as generative AI has exploded, aiming to catch up with rival tech groups, including Microsoft-backed OpenAI and Alphabet’s Google.”
“LeCun runs a team of about 500 staff at Meta’s Fundamental AI Research (Fair) lab. They are working towards creating AI that can develop common sense and learn how the world works in similar ways to humans, in an approach known as “world modelling”.
So a long way to go indeed in this AI Tech Wave journey. But it’s all worth diving a bit deeper to see how it’s just a crazy amount of statistical ‘floating point math’ over vast numbers of tokens, plus some code, doing some amazing ‘AI things’ at scale.
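For readers who want to see what that ‘floating point math on tokens’ literally looks like, here is a deliberately tiny, made-up sketch (random numbers standing in for trained weights, nothing close to a real model’s scale or architecture): token IDs index an embedding matrix, matrix multiplies transform the resulting vectors, and a final multiply plus softmax turns them into next-token probabilities.

```python
# A toy sketch (not any real model) of "floating point math on tokens":
# embeddings in, matrix multiplies in the middle, probabilities out.
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model = 1000, 64

embed = rng.normal(size=(vocab_size, d_model)) * 0.02    # token embedding matrix
W = rng.normal(size=(d_model, d_model)) * 0.02           # stand-in for attention/MLP layers
unembed = rng.normal(size=(d_model, vocab_size)) * 0.02  # projection back to the vocabulary

tokens = np.array([17, 42, 256, 7])      # a made-up prompt as token IDs
x = embed[tokens]                        # (4, 64) floating-point vectors
x = np.maximum(x @ W, 0)                 # one "layer" of matrix math (real models stack dozens)

logits = x[-1] @ unembed                 # a score for every possible next token
probs = np.exp(logits - logits.max())
probs /= probs.sum()                     # softmax: a probability over the vocabulary
print("most likely next token id:", int(probs.argmax()))
```

Everything an LLM does at inference time is, at bottom, variations on these few lines repeated billions of times over, which is why the interpretability work above matters.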
As I explained in the post on Microsoft’s upcoming ‘AI PCs’ earlier this week (with Apple and others soon to follow), in a few weeks we’re going to have PCs and laptops that do AI calculations measured in TOPS, or trillions of operations per second. And soon it’ll be our smartphones. A lot of AI matrix math indeed.
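As a hedged back-of-the-envelope illustration of what those TOPS figures could mean, here is some simple arithmetic. The 7-billion-parameter on-device model, the roughly 40 TOPS NPU, and the ‘about 2 operations per parameter per generated token’ rule of thumb are all assumptions chosen for the sake of the arithmetic, not benchmarks:

```python
# Back-of-the-envelope arithmetic (illustrative assumptions, not benchmarks).
params = 7e9                       # assume a 7-billion-parameter on-device model
ops_per_token = 2 * params         # ~2 operations per parameter per generated token
npu_ops_per_second = 40e12         # i.e. a ~40 TOPS NPU

tokens_per_second = npu_ops_per_second / ops_per_token
print(f"ops per generated token : {ops_per_token:.2e}")
print(f"theoretical tokens/sec  : {tokens_per_second:.0f}")
# In practice memory bandwidth, numeric precision, and software overhead cut
# this down substantially; the point is simply the scale of the matrix math.
```

Real-world throughput is far lower than the theoretical number, but the exercise shows why ‘trillions of operations per second’ is the right unit for this kind of workload.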
Even though our chips and machines can crunch these incomprehensible quantities of numbers in the cloud and on our devices, there’s a lot we don’t know yet. But there’s more that we’re starting to understand, even as we still have a ton of additional new AI technologies to develop to make these systems do the far better reasoning and ‘smart agent’ driven ‘agentic’ work we all aspire for them to do.
All while we figure out whether it gets us to AGI (artificial general intelligence), whenever that may be. Stay tuned.
(NOTE: The discussions here are for information purposes only, and not meant as investment advice at any time. Thanks for joining us here)