AI: OpenAI et al Baby Steps for AI Controlling our Screens. RTZ #613
...the real mainstream AI applications/services to come will be surprisingly different and useful
The next thing that most LLM AI companies are racing to do with their AI Reasoning and Agent products (aka ‘experiments’), on their way to AGI, in this AI Tech Wave, is teaching AI to ‘agentically’ control our screens.
First our desktops, and soon smartphones and beyond. The idea is that AI watches what we do on our screens and learns to do it itself, then does things for us while we are doing something else. Current examples are relatively banal things like ‘buy tickets’ or ‘stuff’ for us online. For now, they’re novelty demos, like teaching a horse to tap three times with its hoof for ‘three’.
OpenAI rolled out the latest version of this, dubbed ‘Operator’, which takes its place alongside other experiments by Anthropic, Google and others.
Axios explains this in “OpenAI's Operator agent clicks, types and buys for you”:
“OpenAI released a "research preview" of its first agent Thursday — Operator, a tool that can do web tasks for you.”
“Why it matters: With 2025 shaping up to be the year of the AI agent, AI firms are racing to free AI from the chat box and set it loose in the world. For now, that world is going to be the web browser.”
I’ve talked about the AI world to come beyond the browser, of course, and these are experiments for AI to first learn how to use today’s web browsers.
Despite the grander, long-term promise of AI espoused by OpenAI, Anthropic and others, these are relatively basic, early steps:
“The big picture: AI optimists' manifestos promise the new tech will save lives, reimagine education and make us more productive, but this new release only showed us slightly easier ways to buy concert tickets and order grocery deliveries.”
“All the demos involved tasks that are currently easy to do without an AI agent — and even without AI.”
So here’s the nitty gritty of what they do thus far:
“How it works: Users can tell Operator to fill out forms, order products, make reservations and more. It opens its own browser and starts clicking and typing while you watch.”
“In a blog post introducing the product, the company said Operator will "help people save time on everyday tasks while opening up new engagement opportunities for businesses."”
“In Thursday's demo, CEO Sam Altman and three colleagues prompted the AI to make dinner reservations, buy tickets and order from Instacart.”
“After typing the request, the four men simply watched as Operator did its thing.”
“In real life, you might use that time to do something else.”
Again, underwhelming for now, and expensive today due to the variable costs of running these agents at ‘test time’ and inference processing:
“But the time you save — especially when you consider how much time you may have to spend supervising the agent to fill in passwords or make sure it doesn't run wild with your credit card — is pretty minimal.”
“Operator is only available for now to $200/month Pro subscribers in the U.S. OpenAI says it will roll out more widely in coming weeks and months.”
And of course, a review of similar AI ‘operators’ from peers:
“State of play: Other AI companies have released similar tools designed to complete tasks on your behalf, but theirs were more focused on productivity.”
“Anthropic released its "computer use" feature in October that collects data from the web and adds it to spreadsheets, among other features.”
“In December, Google released Project Mariner, a tool that automatically completes tasks in Chrome.”
The technical challenges and innovations behind these early experiments are indeed impressive and notable:
“Between the lines: OpenAI views Operator as the first baby steps toward loosing its AI on the world beyond ChatGPT's window.”
"Teaching the model how to use the same basic interface we use on a daily basis ... unlocks a whole new range of software to use that was previously inaccessible," OpenAI's Reiichiro Nakano explained in the demo.That's what the core research project is about. It's about removing one more bottleneck on our path toward AGI and letting our agent move around and act in the digital world."
And of course to do it safely as well, since these agent/operators will be our ‘ambassadors’ browsing the world and beyond:
“Threat level: There's always a tradeoff between security and convenience.”
“OpenAI says it's releasing Operator slowly for safety, despite the "rigorous safety measures" already outlined in its System Card.”
“We already let browsers store passwords and other personal information, but letting agents act on our behalf introduces all sorts of new risks.”
“IT teams might not be prepared to secure agents if employees use them, per Axios cybersecurity reporter Sam Sabin.”
But the underlying capabilities being developed through these experiments, on the way to discovering true ‘AI product market fit’ (PMF), are likely going to end up being used for surprisingly different things.
It’s kind of the same thing that’s going on with every AI and non-AI software company rushing to offer to ‘summarize’ everything for us, from the documents we produce to the notifications that increasingly clutter our screens.
But for most of us, it’s not ‘summarization’ we want. What most people likely want is ‘expansion’ of the content on the screen: highlighting the key items on a web page or a document (PDF or otherwise) and expanding on what mainstream users need to understand from the points being made, preferably drawing on things they’ve worked on, done, or looked at in general. That AI Expansion is likely to be developed sooner rather than later.
The same goes for these baby AI steps on ‘controlling’ (i.e., using) our screens to do what we already do fairly well by ourselves. What we will most likely end up wanting from these capabilities is for AI to augment what we do fairly well with things most of us don’t know we should do, or don’t want to master the skills to do. For that, AI will need to be developed to use far more than our web browsers.
Right now these baby efforts are cool AI ‘Science Projects’. But they are worth our attention because they point to cooler, useful mainstream things to come as this AI Tech Wave rolls on. Stay tuned.
(NOTE: The discussions here are for information purposes only, and not meant as investment advice at any time. Thanks for joining us here)