AI: CrowdStrike Lessons for AI. RTZ #424
...Scaling AI on top of traditional software is a daunting but doable task
The Bigger Picture, Sunday July 21, 2024
In this Sunday’s AI Bigger Picture, I’d like to discuss the fragile resiliency of IT networks worldwide, especially as the AI Tech Wave ramps up.
The fragility of global software systems was vividly on display this week in cybersecurity company CrowdStrike’s Microsoft update mishap, with all the drama of a mini Y2K event. The cascading events show how complex and vulnerable even the ‘simplest update fixes’ can be in the real world. As the WSJ outlines in “The Software Patch That Shook the World”:
“The outage, one of the most momentous in recent memory, crippled computers worldwide and drove home the brittleness of the interlaced global software systems that we rely on.”
“Triggered by an errant software update from the cybersecurity company CrowdStrike, the disruption spread as most people on the U.S. East Coast were asleep and those in Asia were starting their days.”
“Over the course of less than 80 minutes before CrowdStrike stopped it, the update sailed into Microsoft Windows-based computers worldwide, turning corporate laptops into unusable bricks and paralyzing operations at restaurants, media companies and other businesses. U.S. 911 call centers were disrupted, Amazon.com employees’ corporate email system went on the fritz, and tens of thousands of global flights were delayed or canceled.”
The whole piece is worth reading to understand the intricate, multi-layered, and complex networks that run our modern world. So is this piece on how the company, founded in 2011 and now headquartered in Austin, Texas, came to serve 15% of the global security software market.
This particular tale has all sorts of technical and regulatory historical implications that Ben Thompson reviews in detail: deep dives into the Operating System (OS) ‘kernel’-level access granted to third-party developers, regulators (the EU, for example) mandating that third-party developers get access INTO OS kernels, and such access becoming a ‘checklist’ regulatory item to encourage ‘third-party competition’, as pithily outlined by former Microsoft Windows exec Steven Sinofsky.
This is all particularly timely given the ongoing regulatory tussles between authorities in the EU and US and Big Tech, especially with an eye on the coming layers of AI software and the barely decipherable workings of myriad LLM AIs.
The reason to discuss it today is to ask how these systems, built on decades of traditional ‘deterministic’ software, will fare as we layer ‘probabilistic’ LLM AI software systems onto them in the coming years.
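The deterministic/probabilistic distinction can be made concrete with a toy sketch (not a real LLM call — the function names and candidate answers below are purely illustrative): traditional software returns the same output for the same input every time, which is what makes it testable in the classic sense, while a sampling-based AI layer can return different outputs on identical inputs, so you can only test the range of its behavior.

```python
import random

def deterministic_patch_check(version: str) -> bool:
    # Traditional software: same input, same output, every time.
    return version.endswith(".0")

def probabilistic_answer(prompt: str, temperature: float = 0.8) -> str:
    # Stand-in for an LLM call: sampling means the same prompt
    # can yield different outputs on different runs.
    candidates = ["apply the patch", "delay the rollout", "escalate to a human"]
    if temperature == 0.0:
        return candidates[0]  # greedy decoding is repeatable
    return random.choice(candidates)

# Deterministic layer: trivially repeatable, hence trivially testable.
assert deterministic_patch_check("7.11.0") is True
assert deterministic_patch_check("7.11.0") is True  # always identical

# Probabilistic layer: you can only bound the *set* of possible outputs.
outputs = {probabilistic_answer("ship on a Friday?") for _ in range(100)}
assert outputs <= {"apply the patch", "delay the rollout", "escalate to a human"}
```

The point is not the toy logic, but the testing asymmetry: a CrowdStrike-style update to the deterministic layer can in principle be verified exactly, while the probabilistic layer on top can only be validated statistically.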
Last Sunday, I described how OpenAI laid out its roadmap for the path to ‘AGI’, or Artificial General Intelligence, in five levels over the coming years.
Google and others are planning similar roadmaps, as I discussed here and here in two parts, as AI systems are ‘taught’ to better Reason and do ‘Agentic’ stuff. You can see this coming roadmap summarized in a video clip, as well as in the broader podcast on this week’s ‘Trends with Friends’ with my friends Howard Lindzon, Phil Pearlman, and JC Parets.
Now imagine how these coming layers of AI software infrastructure will add to the complexity of our global networks.
The current nascent state of this AI Tech Wave makes it all too easy to criticize in the face of these traditional software system mishaps. And there’s a lot more AI computational complexity coming as the technology Scales to do unimaginable amounts of computational ‘Matrix Math’.
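To give a rough sense of that ‘Matrix Math’ scale, here is a back-of-envelope sketch (the matrix sizes are hypothetical, chosen only to resemble a transformer-style projection, and not tied to any particular model):

```python
def matmul_flops(m: int, k: int, n: int) -> int:
    # An (m x k) @ (k x n) matrix multiply does m*n*k multiply-adds,
    # conventionally counted as 2*m*k*n floating-point operations.
    return 2 * m * k * n

# Hypothetical example: one token's hidden vector (1 x 4096)
# multiplied by a (4096 x 4096) weight matrix.
per_token = matmul_flops(1, 4096, 4096)
assert per_token == 33_554_432  # ~33.6 million FLOPs for a single projection
```

Multiply that by the dozens of such projections per layer, dozens of layers, and billions of tokens processed, and the scale of computation being layered on top of traditional software becomes clear.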
AI tech guru and skeptic Gary Marcus put it as follows in “Don’t look up: the massive Microsoft/Crowdstrike data outage is a huge wake-up call”:
“The world needs to up its software game massively. We need to invest in improving software reliability and methodology, not rushing out half-baked chatbots.”
“Twenty years ago, Alan Kay said “Most software today is very much like an Egyptian pyramid with millions of bricks piled on top of each other, with no structural integrity, but just done by brute force and thousands of slaves”. As Ernie Davis and I pointed out in Rebooting AI, five years ago, part of the reason we are struggling with AI in complex AI systems is that we still lack adequate techniques for engineering complex systems. That hasn’t really changed.”
“Chasing black box AI, difficult to interpret, and difficult to debug, is not the answer. And leaving more and more code writing to generative AI, which grasps syntax but not meaning, as a new benchmark from Stephen Wolfram shows, is not the answer, either.”
Granted, on current systems, building new stuff that works with the old stuff is going to be a pain. As the WSJ puts it:
“The CrowdStrike problem laid bare the risks of a world in which IT systems are increasingly intertwined and dependent on myriad software companies—many not household names. That can cause huge problems when their technology malfunctions or is compromised. The software operates on our laptops and within corporate IT setups, where, unknown to most users, they are automatically updated for enhancements or new security protections.”
And all of the above is BEFORE any ‘Cyberattacks’ by malicious actors worldwide, which is also a fast growing global underground industry:
“The rising frequency and impact of cyberattacks, including ones that insert damaging ransomware and spyware, have helped fuel the growth of CrowdStrike and such competitors as Palo Alto Networks and SentinelOne in recent years. CrowdStrike’s annual revenue has grown 12-fold over the past five years to over $3 billion.”
“But cybersecurity software such as CrowdStrike’s can be especially disruptive when things go wrong because it must have deep access into computer systems to rebuff malicious attacks.”
Despite the best, latest waves of security technologies, the biggest vulnerabilities remain human and organizational:
“Not all updates happen automatically, and computer attacks often occur because people or businesses are slow to adopt patches sent by software companies to fix vulnerabilities—in essence, failing to take the medicine the doctors prescribe. In this case, the medicine itself hurt the patients.”
Bottom line, I understand being challenged, and sometimes vexed, by the magnitude of the coming software integration and resiliency challenges. As I said on X/Twitter a few days ago, YES, this stuff WILL TAKE TIME.
But I do believe it’s ultimately manageable by the global technology community. We just need to go into this AI Tech Wave clear-eyed on this issue, as well as many others. Stay tuned.
(NOTE: The discussions here are for information purposes only, and not meant as investment advice at any time. Thanks for joining us here.)