This weekend saw thousands of hackers trying to break LLM AI models from the Who’s Who AI of companies, at the venerable DEF CON hacking conference in Las Vegas. And it has the backing of the White House, as Axios explains:
“The big picture: The White House announced its support for the AI Village's test back in Mayand has been helping to design the challenge. A who's who of major generative AI developers — including Anthropic, Google, Hugging Face, Microsoft, NVIDIA, OpenAI and Stability AI — are also participating in the DEF CON challenge.”
“The DEF CON hacking conference in Las Vegas is hosting a highly anticipated Generative Red Team Challenge throughout the weekend.”
“Between the lines: Much of the competition focuses on what the organizers are calling "embedded harms" that highlight the flaws that naturally occur in the models, rather than tricking the models into doing bad things.”
“What they're saying: "One of the challenges with this technology being so expensive to produce at the frontier is that it means that unfortunately, a lot of the knowledge and experience with these models is locked up within a small number of well-funded private companies," Michael Sellitto, head of geopolitics and security at Anthropic, told Axios ahead of the challenge.”
As this report by ABC News explains, don’t expect fireworks like in the movies (Hackers, 1995, for example). (Links mine):
“Findings won't be made public until about February. And even then, fixing flaws in these digital constructs — whose inner workings are neither wholly trustworthy nor fully fathomed even by their creators — will take time and millions of dollars.”
“Current AI models are simply too unwieldy, brittle and malleable, academic and corporate research shows. Security was an afterthought in their training as data scientists amassed breathtakingly complex collections of images and text. They are prone to racial and cultural biases, and easily manipulated.”
“Some 2,200 competitors tapped on laptops seeking to expose flaws in eight leading large-language models representative of technology's next big thing. But don't expect quick results from this first-ever independent "red-teaming” of multiple models.”
But the event should be useful nevertheless. We have a lot to learn about how these systems can be hacked, given that the key folks who’re building these systems are candidly telling us that they don’t really know how they work. Especially the next generation LLM AI models that are coming. In that context, as I’ve discussed at length, we’re in very different software waters than with traditional software over the last few decades.
In a world where all of us are programmers with prompts into LLM AI chatbots, it’s almost too easy to hack these systems:
“Tom Bonner of the AI security firm HiddenLayer, a speaker at this year's DefCon, tricked a Google system into labeling a piece of malware harmless merely by inserting a line that said “this is safe to use.”
“There are no good guardrails,” he said.
As the Wall Street Journal highlights in another piece titled: “With AI, hackers can simply talk computers into misbehaving”, highlighting a technique called ‘prompt injection’. The idea is that hackers can break AI systems using plain English:
“Prompt injection works because these AI systems don’t always properly separate systems instructions from the data that they process, said Arvind narayanan, a computer science professor at Princeton University.”
Regardless of all these unknowns and concerns, for now, am going with theoretical physicist Michio Kaku, who said this over the weekend in an interview with Fareed Zakaria:
“The public’s anxiety over new AI technology is misguided”.
Amen. Let the Red Teams at ‘em. Stay tuned.
Absolutely right! The danger is that people will accept AI outputs as facts, safe, actionable. They aren't. Nothing on the Internet is. Google Search results give you hundreds of links, and let you decide what to believe. It is more work to do your research, but safer.
thank you!