AI: Putting AI Errors in Context. RTZ #368
...about those ‘AI Gotcha’ moments, by Google and others
The Bigger Picture, Sunday, May 26, 2024
‘To Err is Human, to Forgive Divine’, we like to say. We may need to extend these words to AI as well. That’s the Bigger Picture I’d like to explore this Sunday.
These past two weeks saw big AI leaps forward by OpenAI, Google, Microsoft, Anthropic, Nvidia, and others. All impressive this early in the AI Tech Wave, only a year and a half after OpenAI’s ‘ChatGPT moment’ in November 2022. But these past few days also saw the media and observers pounce on Google for some obviously erroneous results in its Gemini AI Search product, just now being rolled out to users in the US.
It’s ironic that while media, regulators, and society have been fearing and dreading what happens if and when AI reaches AGI (artificial general intelligence) and surpasses human capabilities, we also pounce on these early LLM AI systems when they make human-like mistakes. We don’t give them the room to make mistakes and literally learn from them, much as we were taught to do with humans in kindergarten. Instead we treat these episodes as ‘Gotchas’ and hand-wring that AI technologies may not be ready for prime-time. We know that’s the case already, and the companies deploying AI constantly remind us of it every step of the way. Let me unpack the Bigger Picture.
But first, a bit of background on what transpired. As Axios summarizes it in “Google’s AI summaries cause headaches and spawn memes”:
“The blowback to Google's AI Overviews is growing now that they are showing up for all U.S. users — and sometimes getting things glaringly wrong.”
“Why it matters: The search giant's addition of AI-generated summaries to the top of search results could fundamentally reshape what's available on the internet and who profits from it.”
“Many users habitually use Google searches to check facts, particularly now that AI chatbots deliver sometimes suspect answers. Now these search results are subject to the same mistakes and "hallucinations" that plague the chatbots.”
“Driving the news: Google last week announced that it was making AI-generated summaries the default experience in the U.S. for many search queries.”
“People quickly began highlighting scores of glaring errors, from dangerously incorrect advice on what to do if bitten by a rattlesnake to a summary that had Google's AI suggesting it has kids and serving up their favorite recipe.”
“Google seemed to have particular trouble with data about U.S. presidents, getting wrong how many presidents there have been — and, more troublingly, repeating the baseless conspiracy theory that President Obama is Muslim.”
“In another well-publicized example, Google's summary suggested using glue to keep cheese on pizza — a comical notion seemingly taken from a Reddit post.”
“Adding to the problems, AI has a hard time distinguishing fact from satire, leading Google's summaries to state The Onion content as fact, as highlighted in this thread by the site's new CEO.”
“Context: Google has spent 25 years defending its reputation for informational integrity, devoting enormous resources to fine-tuning the accuracy of its search results.”
“Its push into AI-generated results puts that reputational reserve at risk.”
“What they're saying: "Look, this isn't about 'gotchas'," wrote former Google AI ethics researcher Margaret Mitchell.”
"This is about pointing out clearly foreseeable harms before--eg--a child dies from this mess. This isn't about Google, it's about the foreseeable effect of AI on society."
Google of course has responded to these over-publicized ‘Gotcha’ AI errors, explaining:
“Now Google said those examples were for rare queries and claimed that the feature is working well overall.”
“The examples we’ve seen are generally very uncommon queries, and aren’t representative of most people’s experiences,” a spokesperson said. “The vast majority of AI Overviews provide high quality information, with links to dig deeper on the web.”
“We conducted extensive testing before launching this new experience to ensure AI overviews meet our high bar for quality. Where there have been violations of our policies, we’ve taken action – and we’re also using these isolated examples as we continue to refine our systems overall.”
“The company said that it had added guardrails to its system with a view to stopping harmful content appearing, that it had subjected the system to an evaluation process and testing, that AI overviews were built to comply with its existing policies.”
“According to Google, it has also worked recently to improve the system to make it better at giving factual answers to responses.”
“The problems appeared to have come about in part because of the data that is used to inform the responses, which may include jokes or other content that becomes misleading when it is re-used in an answer. But part of the issue may also be the tendency to “hallucinate” by large language models such as those used by Google.”
“Because those large language models are trained using linguistic data, rather than facts, they have a tendency to give answers that may be worded convincingly but actually include falsehoods. Some experts have suggested that such problems are inherent in those systems.”
You can decide for yourself of course, and read the above in detail, as well as these other stories on the Google AI search episodes. And read and/or watch this interview on the Google Gemini AI rollout by The Verge, which put critical, and sometimes ‘Gotcha’, questions directly to Alphabet/Google CEO Sundar Pichai. I was impressed by how Sundar Pichai laid out the AI case for Google in a balanced way.
And recognize that this episode could equally happen tomorrow to OpenAI, Microsoft, Meta, and any number of other AI companies innovating and launching AI products and services. Especially as they’re all on the cusp of rolling out AI for mainstream users.
But the Bigger Picture is simple. We’re in the earliest of days applying ‘AI Scaling Laws’, with AI software and hardware smoking traditional computing’s Moore’s Law: three times or better performance improvement every two years, while more than halving the costs. These AI systems are the worst they’ll ever be today. And the ones a year or two from now will be almost unimaginably better.
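To make that compounding concrete, here is a minimal back-of-the-envelope sketch in Python. The 2x-every-two-years Moore’s Law pace and the 3x-every-two-years AI pace are illustrative assumptions taken from the paragraph above, not measured benchmarks:

```python
# Back-of-the-envelope compounding: the classic Moore's Law pace (~2x every
# 2 years) vs. the ~3x-every-2-years AI improvement pace assumed above.
# Both rates are illustrative assumptions, not measured data.

def compounded_gain(factor_per_period: float, years: float, period_years: float = 2.0) -> float:
    """Total improvement multiple after `years`, compounding every `period_years`."""
    return factor_per_period ** (years / period_years)

for years in (2, 4, 6, 10):
    moore = compounded_gain(2.0, years)  # traditional Moore's Law pace
    ai = compounded_gain(3.0, years)     # assumed AI scaling pace
    print(f"{years:>2} yrs: Moore's Law ~{moore:,.0f}x  |  AI pace ~{ai:,.0f}x")
```

Even at these rough rates, the gap compounds fast: ten years at a 3x-per-two-years pace is roughly 243x, versus about 32x at the classic Moore’s Law pace.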
And these systems are simply, artificially, ‘learning’ from all human digital content, now and into the future.
They then apply incomprehensible amounts of probabilistic AI Matrix Math to the underlying data tokens, to give us approximations of what may be the right responses to our queries and prompts, emulating what humans do when asked questions by other humans. Some mistakes of course WILL occur.
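For a concrete feel for that probabilistic step, here is a minimal sketch of next-token sampling, the random draw at the heart of every LLM response. The tiny vocabulary and the scores below are made up purely for illustration:

```python
import math
import random

# Minimal sketch of next-token sampling in an LLM: the model scores candidate
# tokens (logits), softmax turns the scores into probabilities, and the answer
# is drawn at random, so a plausible-but-wrong token can always win the draw.
# The vocabulary and logit values here are invented for illustration.

vocab  = ["12 minutes", "15 minutes", "45 minutes", "add glue"]  # hypothetical candidates
logits = [2.0, 1.6, 0.4, 0.1]                                    # hypothetical model scores

exps  = [math.exp(z) for z in logits]
probs = [e / sum(exps) for e in exps]  # softmax: scores -> probabilities

for token, p in zip(vocab, probs):
    print(f"P({token!r}) = {p:.2f}")

# Sampling: even low-probability (and possibly wrong) tokens get picked sometimes.
print("sampled:", random.choices(vocab, weights=probs, k=1)[0])
```

The key point: the model never looks anything up. It draws the statistically plausible continuation, which is usually right, and occasionally, confidently, wrong.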
This is not about asking if 2+3 is 5. This is more about asking ‘reasoning’ questions like humans ask other humans. We need to give these AI systems the same slack in these early days as we would give friends, family, and strangers when we ask questions like ‘how long to bake the pizza?’, or ‘where is the best pizza in New York?’
The answers may vary, and have some incorrect opinions embedded, just like human responses do. Even when the AI computers are soon computing away at trillions of operations a second, they’re not infallible.
And may not ever be. Yes, AI systems will likely continue to Err just like humans. Even at Scale. But they’re going to be unexpectedly useful nevertheless. As we’re going to find out very soon as more of us have a chance to use AI systems in new ways in this AI Tech Wave. Just wait for it. Stay tuned.
(NOTE: The discussions here are for information purposes only, and not meant as investment advice at any time. Thanks for joining us here)