This week I spent a fair bit of time highlighting how the AI Tech Value Stack is similar to, yet very different from, the PC and Internet waves that created trillions in value for private and public investors over the last four-plus decades.
We showed how this current AI iteration builds upon the approach popularized by Jeff Bezos’ Amazon Flywheel, and how one could envision today’s poster child of LLM AI, OpenAI, leveraging the same.
And then highlighted how the flywheel loops this time also uniquely leverage Reinforcement Learning from Human Feedback and from AI Feedback (RLHF and RLAIF), built on human and AI interactions.
But, there is one more ingredient that makes the AI stack very different from the PC and Internet stacks.
That, of course, is Data. It’s Box 4 in the AI Tech Value Stack below.
Data is the foundation upon which Foundation LLM AI models are built by OpenAI, Google, Meta, and a growing number of companies worldwide.
Let’s zoom into Box 4 to understand its components at a high level. Chart 2 below shows the Data box at a process level, while Chart 3 shows some representative sources and companies that many Foundation LLM AI models use.
The Data, once ‘extracted’, is then processed and optimized for use by users and businesses. This area in particular is seeing massive innovation, and it will be a key driver of the ultimate utility and reliability of AI.
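To make that ‘extract, then process and optimize’ step a bit more concrete, here is a minimal, illustrative sketch of the kind of cleaning, filtering, and deduplication pass a pretraining data pipeline might run. The function names and the word-count threshold are hypothetical, not any specific company’s pipeline.

```python
import hashlib
import re

def clean_document(raw_text: str) -> str:
    """Normalize raw extracted text: collapse whitespace, drop control characters."""
    text = re.sub(r"\s+", " ", raw_text)          # collapse runs of whitespace
    text = re.sub(r"[\x00-\x1f\x7f]", "", text)   # strip stray control characters
    return text.strip()

def deduplicate(documents: list[str]) -> list[str]:
    """Drop exact duplicates via a content hash; real pipelines also do near-dedup (e.g. MinHash)."""
    seen, unique = set(), []
    for doc in documents:
        digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

def build_corpus(raw_documents: list[str], min_words: int = 20) -> list[str]:
    """Clean, filter out very short documents, and deduplicate the extracted data."""
    cleaned = [clean_document(d) for d in raw_documents]
    filtered = [d for d in cleaned if len(d.split()) >= min_words]
    return deduplicate(filtered)
```

Even in this toy form, the point carries: two companies could start from the same raw crawl and end up with quite different corpora depending on how they clean, filter, and weight it.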
And Data today is, of course, more ubiquitous, varied, and voluminous, owing to our increasingly digital lifestyles and the proliferation of billions of connected devices.
The key point to note here is that while most LLM AI companies start with similar data sources, the way they process and ‘learn’ from this data can vary widely. It's not just about having a lot of data, but also about having the right models and processing algorithms to make sense of that data.
Each platform uses different techniques and approaches for processing the data and generating insights from it. Some may prioritize certain types of data over others, some may use different reinforcement learning algorithms, and some may have different goals and objectives that influence their data processing strategies.
Therefore, the value derived from the data doesn’t come just from the data itself, but from the processes and systems used to analyze and learn from it, including user-query feedback loops.
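As a rough illustration of how a user-query feedback loop adds value on top of the same raw data, consider a minimal sketch in which user ratings on competing model answers are accumulated as preference pairs, the kind of data that can later train a reward model in the RLHF pattern mentioned earlier. The data structures and the `generate_answer` stub are hypothetical placeholders, not any particular platform’s implementation.

```python
from dataclasses import dataclass, field

@dataclass
class PreferencePair:
    """One unit of feedback: the answer a user preferred versus the one they rejected."""
    prompt: str
    chosen: str
    rejected: str

@dataclass
class FeedbackStore:
    """Accumulates user preferences; in RLHF, pairs like these train a reward model."""
    pairs: list[PreferencePair] = field(default_factory=list)

    def record(self, prompt: str, answer_a: str, answer_b: str, user_prefers_a: bool) -> None:
        chosen, rejected = (answer_a, answer_b) if user_prefers_a else (answer_b, answer_a)
        self.pairs.append(PreferencePair(prompt, chosen, rejected))

def generate_answer(prompt: str, variant: int) -> str:
    """Hypothetical stub standing in for two sampled model completions."""
    return f"answer variant {variant} for: {prompt}"

# Each user interaction yields a preference pair -- the 'learning from queries' loop.
store = FeedbackStore()
prompt = "Summarize the AI Tech Value Stack."
a, b = generate_answer(prompt, 1), generate_answer(prompt, 2)
store.record(prompt, a, b, user_prefers_a=True)
print(len(store.pairs))  # 1 preference pair collected for later reward-model training
```

The sketch is deliberately simple, but it shows why two platforms with identical source data can diverge: the feedback they collect, and how they feed it back into the model, is itself a proprietary data asset.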
And contrary to the current view that all the logical Data sources online may have been tapped already, my contention is that today’s online data troves are but ‘the tip of the tip of the iceberg’, and far from a ‘Data Moat’ for the current crop of LLM AI leaders.
This is a key area we will be exploring much more in posts to come. Stay tuned.