Day 0 for Agentic AI

The agent opportunity is much bigger than we thought.

Today is day 0, not even day 1.

Look at Cursor and Manus: both companies are building the infrastructure
for agents. The difference is that Cursor starts from the local environment,
while Manus starts from the cloud.

If you look at agent companies, most of them today don't have APIs.

The reason is that agents have to live inside the OS to be useful. Eventually,
though, there are ways around this: open-source some of the infrastructure
(as Manus suggests), or move to the cloud (Cursor announced today that its
next step is the cloud, and I expect an API to follow).

These agentic companies might even beat foundation-model companies like
OpenAI. The reason is that product form changes quickly: ChatGPT today
cannot do agentic tasks as well as these native agentic products that live in
the OS. So by the time ChatGPT's distribution catches up, the product form
will already have shifted to something a ChatGPT-style UI cannot adequately
support.

Conviction is the fuel for progress

Conviction is the most critical ingredient for turning ideas into reality. The
greatest breakthroughs occur not by simply assembling intelligent individuals
but by cultivating strong, unwavering conviction.

I believe that a person with conviction can achieve almost anything physically
possible. The problem arises when we hold too firmly to beliefs about what is
possible and what is impossible; given how little we know about the true laws
of physics and the nature of reality, such beliefs narrow the scope of our
potential achievements.

Over time, those with a broader sense of what might be possible are the ones
who ultimately accomplish extraordinary deeds. This explains why smart
people often fail to achieve significant breakthroughs, especially when they lack
openness toward possibilities.

Believing that "something is impossible" can be detrimental. This stems from a
fundamental truth about how the world functions: genuine value and significant
achievements arise from steadfastly believing in something that others dismiss
as impossible.

Rejecting new ideas outright might make you correct most of the time, since
genuinely valuable outcomes are rare. But consistently denying possibilities
yields no meaningful progress, even when you are right.

AI Companies by Organizational Structure

Today's AI company structures fall into four broad archetypes. Each archetype
differs in how it mixes research, engineering, product, and distribution, and in
the moats it tries to build. Below is a high‑level map of those categories.

  • Foundation‑model leaders such as OpenAI, Anthropic, DeepMind, and
    DeepSeek are led by founders who treat models like physics systems and
    set ambitions decades out. OpenAI runs capital‑intensive, startup‑style
    sprints, while Anthropic and DeepMind remain research‑led.

They shape the global foundation‑model roadmap; later companies cluster
around their ecosystems. Their org charts braid research, engineering, and
product, with researcher and engineer density forming the deepest moat.

  • Product‑first players like Perplexity and Cursor placed early, accurate bets
    on product form. Their headcount skews toward design, engineering, and
    distribution, though they keep talented researchers in‑house, even if model
    R&D has yet to pay off.

They compete with category 1 but rely on upstream model upgrades outside
their control, which leaves them reactive. Meanwhile, category 1 now offers a
$20/month platform bundle, putting Perplexity‑style firms on the defensive in
distribution.

  • Vertical (and some horizontal) B2B companies such as Spur and Extend
    anchor on domain knowledge in healthcare, finance, law, and so on.
    Headcount tilts toward product and sales; the product may be an LLM
    wrapper, but lasting distribution, not the wrapping, is the key.

Their moat is founders’ domain insight plus launch, iteration, and distribution
speed. Big‑tech rivalry is light because giants ignore smaller niches. The
mission is to validate demand quickly and ship fast. Researchers are optional,
useful mainly for roadmap foresight.

  • B2B model‑innovation companies are rarer. Their core team is researchers
    and engineers who deliver genuine breakthroughs: enterprise fine‑tuning
    shops, for example. They build atop open‑source backbones like Llama,
    DeepSeek, or soon GPT‑4.1 to serve clients.

The risk is betting on the wrong branch: if an incumbent removes the
bottleneck, the niche disappears. The challenge is packaging the technology
into products that existing B2B firms can adopt; when those firms already
ship, displacement is hard, but a correct bet yields a moat.

In the end, distribution is both the hardest and the most vital piece. Products,
models, and papers are only means to enable durable distribution. Even a
brilliant product exists to reach users more easily; if distribution stalls, the
company soon falters.

A New Programming Paradigm

Compute is the fuel of system evolution, and data plus algorithm provide the
direction.

Language models might mark the beginning of a new programming paradigm.
Traditional programming tightly couples code and system behavior: we can
directly infer a system's decisions by examining its code. Language models
are very different.

With them, we instead use benchmarks to roughly judge and understand a
model's behavior. The intelligence and actions of these models are largely
independent of their code, emerging naturally during training. I believe this
approach to system programming may represent the early stage of a
fundamentally new paradigm, one in which systems evolve autonomously.

In this paradigm, we provide only high-level signals and metrics to align the
system broadly with human intentions. Even these high-level directions should
ideally be simple, granting the system greater flexibility to evolve on its own.
Consider the future of games as an example: game settings, environments,
physics laws, and even the purpose of the game itself might no longer be
explicitly defined in the game's programming logic, but instead exist in an
evolved state.
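
As a toy illustration (mine, not the author's), here is a minimal sketch of
metric-driven programming: nothing in the code specifies the final behavior;
we supply only a scalar score, and the system evolves toward it.

    import random

    TARGET = [1] * 32  # stands in for a high-level human intention

    def metric(candidate):
        # The only guidance we provide: a scalar score, not explicit logic.
        return sum(c == t for c, t in zip(candidate, TARGET))

    def evolve(generations=200, pop_size=50, mutation_rate=0.05):
        # Start from random behavior and let selection do the programming.
        population = [[random.randint(0, 1) for _ in range(32)]
                      for _ in range(pop_size)]
        for _ in range(generations):
            population.sort(key=metric, reverse=True)
            parents = population[: pop_size // 2]  # keep the best half
            children = [[1 - bit if random.random() < mutation_rate else bit
                         for bit in p] for p in parents]
            population = parents + children
        return max(population, key=metric)

    best = evolve()
    print(metric(best), "out of 32")

The interesting property is that the loop, not the programmer, discovers the
behavior; swapping in a different metric redirects the whole system without
rewriting any logic.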

Computational power may become the essential resource fueling this evolution. 
Today's language models may thus represent just the start of a new programming
approach, with future GPU resources or other methods continuing to drive
system evolution.

Prompt

Today, many prompts are more valuable than code. Prompts remain relatively
stable, while the quality of the code they generate improves as models improve.
There should be something akin to a "git commit history" for prompts, allowing
us to study their iterations and understand which prompts better activate a
model's abilities.
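
As a rough sketch of what such a tool might look like (my illustration; the
names here are hypothetical), each prompt revision could be content-addressed
and linked to its parent, much like a git commit:

    import hashlib
    import time

    history = []  # in-memory log; a real tool would persist this

    def commit_prompt(text, message, parent=None):
        # Content-address each revision, git-style, and link it to its parent.
        entry = {
            "hash": hashlib.sha1(text.encode()).hexdigest()[:8],
            "parent": parent,
            "message": message,
            "timestamp": time.time(),
            "prompt": text,
        }
        history.append(entry)
        return entry["hash"]

    v1 = commit_prompt("Summarize this article.", "initial prompt")
    v2 = commit_prompt(
        "Summarize this article in three bullet points, citing one "
        "quote per point.",
        "add structure and citations",
        parent=v1,
    )

    for entry in history:
        print(entry["hash"], entry["parent"], entry["message"])

Diffing two revisions against the quality of the code they produce would give
exactly the kind of iteration study described above.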

Good prompts can unlock the model's inherent intelligence, enabling the
generation of higher-quality code. I'm curious whether, in the future, as models
become more empathetic, they might reduce the importance of a user's
prompt-crafting skills.

Minecraft

Minecraft has several interesting features. First, the game doesn't
explicitly provide rules or instructions; players must explore and
discover the laws of physics on their own. These physics laws
resemble, but are not identical to, those of our real world.

Examples include gravity and contact forces, which make Minecraft
feel very similar to reality. Second, the game provides a series of
building blocks that players must assemble themselves; the entire
gameplay is essentially a scaling-up process.

This aspect closely mirrors nature and reality, as both provide
foundational elements that scale effectively. Recently, I have been
reflecting on machine learning, particularly on the emergent
intelligence of large language models (LLMs).

It is truly remarkable that the intelligence of LLMs evolved
naturally. During their training process, these models
spontaneously developed self-checking, error correction,
optimization, and human-like communication and empathy.

These naturally evolved behaviors suggest that the best scaling
happens when you lay a solid foundation and then allow things to
evolve naturally along a generally intended direction. Nature and
reality appear to follow the same scaling principle.

Why AI Companies Love Releasing Benchmarks First

I think it's because benchmarks serve as roadmaps. Publishing benchmarks
signals the direction of future development to the community and ecosystem.
As long as the direction is correct, the ecosystem will naturally align itself
accordingly.

Then, the company only needs to excel in the conditions defined by the
benchmark. If the benchmark itself is set incorrectly, the direction becomes
flawed. Disruptive companies often change the way success is measured; they
prioritize differently from mainstream ones.

For example, what truly disrupted AI when generative models arrived?
Traditional AI emphasized accuracy on classification tasks, essentially
multiple-choice questions. Generative AI, however, required models to
produce much broader outputs, something few people initially considered.

This meant that autoregressive models, which didn't seem important at the
time, became central. To recognize the value of autoregressive models, one
must first value AI's horizontal use cases or generalization capabilities.

In essence, different visions lead to different benchmarks, and the perceived
importance of certain directions often comes down to taste and overall strategic
perspective.

Trust fate, but trust algorithms even more (相信缘分更要相信算法)

In today's world, algorithms connect people in genuinely beautiful ways. What
algorithms actually do is solve the bandwidth problem of human-to-human
communication. We are very inefficient communicators, so it takes us a long
time to learn about others.

Face-to-face conversations are time-consuming, but algorithms allow us to
build an information highway among people. They do this by interacting with
humans through certain interfaces, learning, and creating digital
representations of users.

These digital representations can travel and communicate with other digital
representations at the speed of light. This is why recommendation algorithms
are amazingly efficient at either broadcasting our messages or showing us
information we'd miss otherwise.
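
A minimal sketch of what "digital representations meeting" means in practice
(my illustration, with made-up interest vectors): users are reduced to
embeddings, and comparing two embeddings takes microseconds rather than hours
of conversation.

    import math

    def cosine(a, b):
        # Similarity between two user embeddings, in [-1, 1].
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    # Hypothetical interest vectors learned from interaction history,
    # e.g. dimensions for (music, sports, films).
    alice = [0.9, 0.1, 0.4]
    bob   = [0.8, 0.2, 0.5]
    carol = [0.1, 0.9, 0.2]

    print("alice-bob:  ", round(cosine(alice, bob), 3))
    print("alice-carol:", round(cosine(alice, carol), 3))

A recommender scales this comparison across millions of users at once, which
is exactly the bandwidth gain described here.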

I believe we will entrust even more of our future to algorithms. Today's
algorithms are still far from perfect.

Interface of intelligent application

One thing I have found about intelligent LLM applications is that intelligence
primarily resides not in the application interface itself, but within the model.
The role of the application interface is to change the way humans and
intelligence interact.

Currently, it is more important for interface design to be native than to be
smart. By native, I mean the intelligence should sit close to the operating
system, ideally inside it.

How to design the interface to powerful AI should be the primary concern at the
application level. What makes Cursor successful is that its interface uses an
intelligent agent to lower the barrier to understanding and writing code.
Essentially, this was achieved through interface innovation: Cursor, being an
app rather than a browser-based tool, can directly manipulate the underlying OS.

A native interface allows the model to operate the underlying operating system
more effectively. The smartness of the interface matters less, because
cleverness artificially built into the interface can hinder the model's inherent
capabilities. This also aligns with the "Bitter Lesson": in the long run, only
the most native UI will scale well with increasingly capable intelligence.

Vision and Technology

Technology allows you to see the future. The difference between good companies
and great ones is that a great company knows where to double, triple, and
quadruple its bets, whereas a good company bets a little everywhere, and with
hesitation.

A great company can do this because of its unique vision and strong conviction
about a particular aspect of technology. For example, MoE with routed experts
and shared experts had been known for a long time, but few companies in the
industry recognized its value until DeepSeek tripled its bet on it.

The greatest moat for a technology company is a unique vision of where
technology is heading. It is very hard for copycat companies to lead without
the right vision. OpenAI took down Google because it saw the greatness of the
transformer architecture and GPT-1, among many other technology shifts.

These changes allowed OpenAI to see a completely distinct vision for the future, while
Google focused on search. Having the right vision for technology is the single greatest
force of innovation for a tech company.

From a business perspective, technology changes markets almost constantly.
Every day there is some subtle shift in technology that will affect the future of
a business. If you can see these trends consistently pointing toward a different
future, you have a chance to capture the opportunity before others who don't
understand the technology. Vision will be the most important differentiator in
the future of business.