AI: OpenAI & Anthropic build AI Agentic reasoning tools. RTZ #518

...early days, but accelerating progress

Oct 23, 2024

Through this year and into next, the leading LLM AI companies continue ramping up their ‘Level 2 AI Reasoning’ and ‘Level 3 AI Agent’ capabilities in this AI Tech Wave. It’s something discussed at length here, particularly in the context of OpenAI’s roadmap to AGI. And as discussed before, there is a TON of work to be done on both levels, much of it concurrently.

OpenAI used to call its AI Reasoning product o1 (in Preview and Mini versions) ‘Strawberry’. I discussed OpenAI’s recent ‘Swarm APIs’ for building an ‘agent orchestration intelligence layer’ just a few days ago.

Besides OpenAI, its partner Microsoft, Anthropic, and many other enterprise software companies like Salesforce are keenly focused on building the software infrastructure to make AI do reasoning and agentic work in a far more scalable, resilient, and reliable way. It’s early days, but the work to turn these capabilities from ‘Science Projects’ into commercially engineered products continues.

In particular, both OpenAI and Anthropic, along with other incumbents and startups, are pursuing these areas to improve AI reasoning and agentic capabilities for developers at scale.

As The Information reports in “OpenAI, in Duel With Anthropic, Doubles Down on AI That Writes Software”:

“OpenAI’s ChatGPT has become a multibillion-dollar business in large part because programmers use it to write and check their code, fix bugs and translate code into different programming languages.”

“Now, facing competition from rival artificial intelligence startup Anthropic, OpenAI is putting more effort into improving the tools it offers for software programming. Some products or features under development aim to make it easier to use OpenAI’s AI for coding tasks inside major code-editing programs like Microsoft’s Visual Studio Code, while others aim to take on bigger software development tasks.”

These products are in development stages, and not yet released.

“Coding tasks became an early application of large language models developed by OpenAI, in part because AI-generated code can quickly be tested to see if it works or not. Microsoft’s GitHub unit used OpenAI’s LLMs to power an AI Copilot product, starting in 2021, that gives code suggestions to programmers while they type.”

“But the launch of ChatGPT in late 2022 provided a more widely accessible and free alternative that quickly gained popularity. OpenAI then convinced millions of programmers to pay for an upgraded version of ChatGPT—and get access to upgraded LLMs before GitHub Copilot did—that could respond to conversational requests for handling such tasks. Those capabilities powered what is now a subscription product on pace to generate about $3 billion annually.”

“On Tuesday, Anthropic announced new software that can use computers the way humans do to take actions on behalf of people, such as moving a cursor, clicking buttons and typing text.”

“The Anthropic software can theoretically help programmers with tasks like building a website and improving the way it looks. OpenAI has been developing a similar product, known as a computer-using agent, for months but hasn’t launched it.”

Anthropic in particular is focused on getting its LLM AI product Sonnet to control computers by emulating mouse controls on screens. TechCrunch explains further in “Anthropic’s new AI model can control your PC”:

“Anthropic on Tuesday released an upgraded version of its Claude 3.5 Sonnet model that can understand and interact with any desktop app. Via a new “Computer Use” API, now in open beta, the model can imitate keystrokes, button clicks, and mouse gestures, essentially emulating a person sitting at a PC.”

“We trained Claude to see what’s happening on a screen and then use the software tools available to carry out tasks,” Anthropic wrote in a blog post shared with TechCrunch. “When a developer tasks Claude with using a piece of computer software and gives it the necessary access, Claude looks at screenshots of what’s visible to the user, then counts how many pixels vertically or horizontally it needs to move a cursor in order to click in the correct place.”

“Developers can try out Computer Use via Anthropic’s API, Amazon Bedrock, and Google Cloud’s Vertex AI platform. The new 3.5 Sonnet without Computer Use is rolling out to Claude apps, and brings various performance improvements over the outgoing 3.5 Sonnet model.”
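For developers reading along, the ‘count the pixels’ step in that description comes down to simple arithmetic, which can be sketched in a few lines of Python. This is only my illustration of the idea, not Anthropic’s code and not the Computer Use API. The names Box, cursor_offset, and plan_click are hypothetical, and the sketch assumes the target button’s position on the screenshot is already known.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Bounding box of an on-screen element, in screenshot pixels (hypothetical)."""
    left: int
    top: int
    width: int
    height: int

    def center(self) -> tuple[int, int]:
        # Aim for the middle of the element so the click lands inside it.
        return (self.left + self.width // 2, self.top + self.height // 2)


def cursor_offset(cursor: tuple[int, int], target: Box) -> tuple[int, int]:
    """Pixels to move horizontally and vertically to reach the target's center."""
    cursor_x, cursor_y = cursor
    target_x, target_y = target.center()
    return (target_x - cursor_x, target_y - cursor_y)


def plan_click(cursor: tuple[int, int], target: Box) -> list[tuple]:
    """Turn the offset into a tiny action list: move the cursor, then click."""
    dx, dy = cursor_offset(cursor, target)
    return [("move", dx, dy), ("click",)]


if __name__ == "__main__":
    # Assumed example: cursor at (100, 100), a 120x40 button whose top-left is (600, 400).
    submit_button = Box(left=600, top=400, width=120, height=40)
    print(plan_click((100, 100), submit_button))
```

In a real agent this step would repeat after every fresh screenshot, which is part of why these capabilities are still slow and brittle.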

I go into these details to outline how early and brittle these capabilities are in this AI Tech Wave. But progress is accelerating, and these efforts should bear fruit sooner rather than later. Whether it’s called ‘Strawberries’ or not. Stay tuned.

(NOTE: The discussions here are for information purposes only, and not meant as investment advice at any time. Thanks for joining us here)

AI: Reset to Zero
