The Future with Super AI: Technical Vision and Non-Consensus

Here's my one-line thesis: as AI becomes more capable, the ability to
understand, formulate, and articulate vision becomes the differentiating
human skill, while technical know-how depreciates fastest.

Technical Cognition vs. Technical Skill

There's an important distinction between technical skill and
technical cognition. Skill is knowing how to do something—the
specific mechanics of execution. Cognition is knowing what's
possible and how to adapt—the meta-level understanding of systems.

Technical skill is likely the most depreciating asset in the AI
era. The ability to write code, configure systems, or execute
processes will increasingly be delegated to AI. But technical
cognition—understanding how systems work at a fundamental level,
seeing what's possible, knowing when and where to apply leverage—
this remains valuable. Perhaps more valuable than ever.

In the near future, everyone becomes a CEO and product manager.
Each person commands an army of expert advisors, strategists, and
engineers—all AI. But seeing the vision? That still requires your
own judgment.

Claude Code as an Example

Consider Claude Code. This tool fundamentally changes what's
possible for someone with the right cognition. It lets you operate
across multiple layers of the stack—especially hardware—without
needing the specific skills that previously gatekept access.

Claude Code removes technical skill as the bottleneck. What
remains? Imagination. First-principles thinking. The ability to see
what factors make something possible, then direct AI to execute.

If you can't see it, you can't do it.
If you can see it, you can do it.

The manifestation of cognition is the awareness to do something.
Some people "see" that Claude Code can help with a particular
problem. Others don't. This gap in technical cognition may be
unbridgeable—unless AI moves from the passenger seat to the
driver's seat, at which point the dynamic changes entirely.

Why Deep System Understanding Still Matters

I've been exploring kernel development lately. Not particularly
skilled at it yet, but it's fascinating. One observation: the
deeper you go in the stack, the harder it gets—and the more it
reflects accumulated expertise. This echoes Zhang Yiming's view
that difficulty at lower layers represents genuine moats.

But there's another reason to go deep: stripping away abstraction
lets you see more clearly and hack systems more effectively. You
might also see things others don't—non-consensus insights that
emerge from understanding what's actually happening beneath the
abstractions.

Deep system understanding creates the possibility of seeing
futures that others cannot see. This is the source of non-consensus.

Two Futures: Consensus or Non-Consensus?

I've been thinking about what happens when super AI arrives. Two
scenarios seem possible:

The consensus future: Super intelligence means everyone has
access to equivalent reasoning power. Given the same intelligence
applied to the same problems, everyone reaches the same conclusions.
Disagreement becomes irrational. Prediction converges.

The non-consensus future: Even with maximum intelligence,
prediction has fundamental limits. Uncertainty persists. Different
visions remain possible because the future is genuinely open.
Non-consensus continues to exist—and continues to matter.

I hope for the second future. A world where non-consensus still
exists is a world where individual vision, judgment, and courage
still matter. Where seeing something others don't can still create
value. That seems like a more interesting future to me.

Artificial Specialized Intelligence

Recently I read Shunyu Yao's "The Second Half"
and listened to Ilya Sutskever's conversation with Dwarkesh.
One insight from Ilya stood out: AGI as a term is misleading—it was
coined as the opposite of "narrow AI," but the real end game for AI
to create economic value is probably specialist AI across important
domains: drug design, chip design, and so on.

I first encountered this idea of specialization from DeepSeek founder
Liang Wenfeng three months ago.
He argued that AI's end game is specialization across every industry
and field. This was shocking to me; I had always assumed AI meant a single
model that rules them all. But increasingly, the opposite seems true: model
specialization has a significant advantage in market competition.

Consider: Claude, which excels at coding, is often preferred over
ChatGPT, which is more horizontal in its use cases. Or look at
Thinking Machines Lab, which doesn't even train foundation models and
instead focuses on RL-based specialization.

Here's what I think is happening: pretraining foundation models
provides a good "prior"—the model learns the basics of the world and
can communicate with humans to exchange signals and improve. From
this prior, RL enables the model to interact with environments and
specialize.
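
To make that concrete, here is a toy sketch, not any lab's actual recipe, of
"pretraining as a prior, RL as specialization": a generic policy over actions
gets nudged by simple policy-gradient updates toward whatever a
domain-specific reward cares about. The reward function, action count, and
hyperparameters are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
num_actions = 5

# "Pretrained prior": a broad, only slightly informed distribution over actions.
logits = rng.normal(0.0, 0.1, num_actions)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def reward(action: int) -> float:
    # Stand-in for a domain-specific signal (a verifier, an environment, a user).
    return 1.0 if action == 3 else 0.0

learning_rate = 0.5
for step in range(300):
    probs = softmax(logits)
    action = rng.choice(num_actions, p=probs)
    r = reward(action)
    # REINFORCE-style update: move probability mass toward rewarded actions.
    grad = -probs
    grad[action] += 1.0
    logits += learning_rate * r * grad

print(softmax(logits).round(3))  # mass concentrates on the rewarded action
```

The prior keeps the policy broadly sensible; the reward is what carves out
the specialist.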

What I find fascinating: AI's second half isn't about building
models. It's about finding direction and measuring progress—work
that resembles product management more than research. The focus
shifts from solving problems to defining the right problems.

I won't debate what AGI means. But here's my take: in some sense,
AGI already exists. Current models can do many things generally, but
not well—because they can't specialize. AGI is just a prior. The
biggest gap between current AI and humans is that humans can spend
years going deep in one domain. AI can't yet.

RL is likely the path forward. What creates real economic value
isn't AGI—it's ASI. AGI can only replace labor that doesn't require
specialization. ASI can genuinely become and replace experts.

General capability is the foundation. Specialized expertise is the
product.

Interfacing via the Lens of AI Systems

This week OpenAI released ChatGPT Agent. Although at first glance it
looked like they copied about 80% of the product form from Manus, I did
learn something interesting and novel.

One shocking insight from their demo is that combining the Operator with
the Reasoning model is not only a better product for the user; it is also,
collectively, a more capable system, better overall on all the major hard
benchmarks. The reason is that the previous setup, where O3 and Operator
were separate, had fundamental bottlenecks: O3 cannot use a GUI, and
Operator cannot reason or call tools, so each was simply incomplete.
Intuitively, it feels like assembling the skeletons of two powerful limbs
and then using RL to electrify them into one entity that accomplishes what
neither could have accomplished alone.
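
Here is a hedged sketch of why a unified design can beat the split one: a
single loop, a single policy, several action types sharing the same context.
The model policy and the tool/GUI backends below are placeholders, not
OpenAI's actual API.

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str      # "reason" | "tool_call" | "gui" | "finish"
    payload: str

def unified_model(history: list[str]) -> Action:
    # Placeholder policy. A real system would be a single RL-trained model
    # that decides, at every step, whether to think, call a tool, or act on
    # the screen, conditioned on everything it has seen so far.
    step = len(history)
    if step == 1:
        return Action("reason", "break the task into steps")
    if step == 2:
        return Action("tool_call", "search('quarterly report template')")
    if step == 3:
        return Action("gui", "click(button='New Spreadsheet')")
    return Action("finish", "done")

def run_agent(task: str) -> list[str]:
    history = [f"task: {task}"]
    while True:
        action = unified_model(history)
        if action.kind == "finish":
            break
        # In the split design, the reasoner cannot take GUI actions and the
        # operator cannot reason or call tools; here every capability shares
        # the same context and the same learned policy.
        history.append(f"{action.kind}: {action.payload}")
    return history

print(run_agent("prepare a quarterly report"))
```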

I believe this trend of using RL to train more holistic systems, ones that
include more components and can take on far more complex tasks, will
continue and push these systems to new boundaries.

From a product perspective, I increasingly find myself using powerful AI
systems to interface with the world: the operating system, the browser,
applications, and maybe eventually parts of the physical world. When
interacting with more complex systems, AI feels more and more like an
exoskeleton for humans, helping us break free from our biological
limitations, most obviously by reducing complexity and speeding up our
interface with the digital world.

For example, when I used the ChatGPT Agent to interface with a public
Google Doc, I could feel that it was easier for the system to navigate the
doc on my behalf, especially when the doc was complex; it was a totally
different level of experience. Increasingly, I think we will see the world
more comprehensively through the lens of these powerful AI systems. And
they scale really well: ChatGPT Agent marks the beginning of a new paradigm
in which AI systems, by including more components and more tools, will be
able to achieve more. Some of the more complex tools, with learning curves
so steep that most humans probably cannot use them well today, can be
commanded via the intermediary of an AI system.

Nature of intelligence: Building Blocks

The nature of intelligence seems to be about building blocks: smaller
building blocks (less intelligent entities) unlock bigger building blocks
(more intelligent entities), rather than obeying some law of conservation
of intelligence that says you cannot get more intelligent things out of
less intelligent things.

There are several pieces of empirical evidence for this. First, we can use
a relatively dumb model to label a dataset and then train a smarter model
on that dataset, as in the sketch below.
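
Here is a minimal sketch of that pattern, using scikit-learn as a stand-in:
a small "weak" classifier is fit on a tiny labeled seed set, pseudo-labels a
larger unlabeled pool, and a bigger model is then trained on those
pseudo-labels. Whether the student actually beats the teacher depends on the
setting; the point here is only the mechanics.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data standing in for a real domain.
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

# Only 200 examples get "human" labels; the rest are treated as unlabeled.
X_seed, X_rest, y_seed, y_rest_hidden = train_test_split(
    X, y, train_size=200, random_state=0)
X_unlabeled, X_test, _, y_test = train_test_split(
    X_rest, y_rest_hidden, test_size=1000, random_state=0)

# The "dumb" model: trained on the tiny labeled seed set.
weak = LogisticRegression(max_iter=1000).fit(X_seed, y_seed)

# The dumb model labels data it has never seen labels for.
pseudo_labels = weak.predict(X_unlabeled)

# A larger model is trained purely on those machine-generated labels.
strong = GradientBoostingClassifier().fit(X_unlabeled, pseudo_labels)

print("weak model accuracy:  ", round(weak.score(X_test, y_test), 3))
print("strong model accuracy:", round(strong.score(X_test, y_test), 3))
```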

We see this in how DeepSeek trained R1-Zero from V3, then used R1-Zero to
generate cold-start reasoning data, and then used that cold-start data to
further train DeepSeek-R1. One could argue that this process only changed
how the model behaves (it reasons more often) rather than how smart it is.
Still, the outcome is good: the model unlocked reasoning capability and can
achieve a lot more.

Another empirical example is multimodality. Once we train a weak
text-to-image model, it can be used for a variety of tasks like labeling
and filtering datasets. The reason this works is that although the weak
model cannot generate good enough pictures, it does quite well on these
easier tasks, and doing well on them lets it help build the next, better
model. By the same token, a decent text-to-image model can then be used to
create an even better one.

Now why does this work at all? Each capability seems to have its own
intelligence threshold, and reaching that threshold is all that matters.
Different capabilities require different levels of intelligence to unlock,
and parameter count is perhaps a decent indicator of the potential a model
can reach. Once a model crosses the threshold for a capability, it doesn't
really matter how dumb it is at everything else; on that particular thing
it can be as useful as a much smarter model.

A dumb model that reaches the intelligence threshold for a certain
capability, such as labeling a dataset as positive versus negative, can
then serve as a building block for a more intelligent model, because its
labeled dataset makes the next model better. Intuitively, intelligence
feels like Lego, where bigger pieces are built out of smaller ones.

A Future of Open Source

I am a big believer in the future of Open Source. I believe a lot more
useful software will be open sourced in the near term, and a lot more
non-technical people, especially designers, will start contributing to
open source. The biggest driver of this change is that coding agents have
become remarkably good at bringing ideas to life, and they will keep
improving.

This has two important implications. First, closed-source companies should
reconsider whether staying closed source is beneficial over the long term,
because there will very likely be an open-source competitor that people
find more appealing. Second, the long-term price of much software will
converge toward infrastructure costs such as cloud hosting and data
storage.

Let's discuss the first implication. Open-source software holds huge
appeal for many people, especially those with their own design taste.
Being able to view, control, and customize the underlying code is a
significant attraction, and giving customers ownership helps companies
gain early adoption and traction.

For example, take a meeting-booking SaaS product that charges $10 per
month. One can argue that the company's primary value is its engineers'
ability to assemble lines of code into a functional, beautiful product;
most of the value previously came from this 'assembly process.' Now,
however, the falling cost of intelligence makes that assembly fundamentally
easier.

Soon, the price customers pay for such a product will approach the raw
material and infrastructure cost: essentially database and website hosting
expenses. There may still be economies of scale that attract customers to
certain SaaS products, but most of today's SaaS prices sit well above the
cost of the underlying raw materials. Rationally, instead of subscribing, I
can build my own version in about 30 minutes with a coding agent; my
monthly maintenance cost would likely be close to $1 rather than $10, or
even $0 in some cases.

Of course, there may still be value for SaaS companies, especially
where the builders have exceptional taste. This taste and vision
represent one of the main contributions and moats in a world full
of coding agents. Companies with great taste can continue to
innovate and provide additional benefits to users.

However, in the very long term, most products and services will
likely become very cheap. Thus, I believe SaaS—or any product—will
shift toward something resembling political or religious campaigns,
where the ultimate goal is not monetary gain but spreading an
ideology or vision. The best companies themselves become expressions
or manifestos. It's easy to imagine this vision taken to extremes,
where the joy lies simply in allowing more people to use and enjoy
what one has created.

Paradigm shift of human management

A recent and fascinating observation is that AI is changing how many
things are managed, especially as the systems we operate through change,
turning what used to be complex tasks into remarkably simple ones. The most
intuitive example is from the Star Trek movies, where a single person can
pilot a high-tech spaceship alone when needed. This made me realize that,
in the future, it will become normal for one person with an AI to manage or
operate extremely complex organizations or systems.

The traditional management model is distributed. For example, in war,
the supreme commander indirectly manages subordinate commanders, who
then manage their own subordinates. This distributed approach exists
because of human cognitive limitations—no single person can directly
manage a large organization. But AI will fundamentally change this. I
can easily imagine AI becoming an exoskeleton for individuals at the
management level, enabling one person to manage or operate a vast
organization or system at once. So, the distributed model of human
organizations may be replaced by this new paradigm.

Another example is the idea of the one-person billion-dollar company.
AI drives this new organizational model, making smaller teams more
efficient, while large organizations may struggle to leverage AI as
effectively. This technology essentially empowers individuals to an
extraordinary degree, but at the same time, it makes power easier to
centralize. These are changes happening at the management level, but in
technology we see similar shifts. An individual, aided by AI, can now
interact with different operating systems in powerful new ways. AI
serves as a cognitive exoskeleton for humans, allowing complex systems
to be manipulated by a single person.

Examples include MCP, codebases, and even multimodal systems. All of
these ultimately allow people to more easily manipulate and understand
systems across different modalities.

AI complete infrastructure

The idea is to let AI take control of more resources and infrastructure,
because this allows for better scaling. The assumption is that AI's
capability will keep rising as compute increases. If this trend holds, in
two to three years most fundamental infrastructure, hardware, and company
operations will be run by AI.

This trend has already started: Meta's ads automation and Axiom AI's quant
solver are examples (Jason has already pointed to this trend), and even
smaller platforms like Cursor, which controls the user's terminal, fit the
pattern. The reason is simple: once AI controls these core pieces, it can
create what we might call "AI complete," meaning a fully closed AI loop.

This logic is similar to Anthropic's Constitutional AI. The core idea is
that once AI controls a stage of production, that stage can be easily
reproduced and will generate feedback for the AI to improve itself, which
allows AI's abilities and the scale of its influence to grow together.
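
As a toy illustration of that closed loop, the sketch below uses a simulated
production stage and a trivially simple hill-climbing "controller" standing
in for the AI; the only point is that every execution of the stage yields
feedback the controller can use to improve.

```python
import random

random.seed(0)

def run_stage(setting: float) -> float:
    # Simulated production stage: returns a noisy quality metric whose
    # optimum is around setting = 0.7 (unknown to the controller).
    return 1.0 - (setting - 0.7) ** 2 + random.gauss(0, 0.01)

best_setting, best_metric = 0.0, run_stage(0.0)
for step in range(500):
    candidate = best_setting + random.gauss(0, 0.1)   # controller proposes a change
    metric = run_stage(candidate)                     # the stage executes it
    if metric > best_metric:                          # feedback closes the loop
        best_setting, best_metric = candidate, metric

print(f"learned setting ~{best_setting:.2f}, metric ~{best_metric:.2f}")
```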

This idea also matches the "bitter lesson"—focusing on letting AI merge
into core infrastructure quickly will create a new paradigm for greater
AI impact. This new paradigm will most likely emerge in an agentic form.

LLM and Policy Makers

Today's logic of AI development is actually very similar to that of policy
makers, and many of the ideas are borrowed from systems engineering. For
example, policy makers need to consider how to make a whole system better,
which means they cannot simply optimize local parts in isolation. At the
same time, policy makers aim to make as few and as simple changes as
possible to move the system toward a better optimum. They also think about
how to make the broader environment suitable for an ecosystem to thrive.

This is very similar to building a solid foundational environment for LLMs,
allowing LLMs to succeed on their own. Why is this the case? Perhaps it's
because of the "bitter lesson": to make model capabilities scale well, you
can't do too much hand-tuning. This idea is very similar to what Jason posted
today.

In the future, the most important thing for domain experts is to build the
infrastructure that allows LLMs to fully demonstrate their potential. Once this
is achieved, each iteration of the model can effectively scale with the LLM's
underlying compute and search abilities.

Artificial General Adaptation

In my view, the most impressive aspect of AI lies not only in its
intelligence but in its adaptation. Today's LLMs have opened up a new
paradigm of programming with compute: a system can evolve specific
behaviors purely through computational power. This is a very general
paradigm, and LLMs might be just the beginning of it.

As long as we can give the system a direction (for example, through
rewards), it can evolve into the behaviors we desire. The earlier paradigm
of RL self-play game bots worked the same way: through self-play, the
system evolved very strong game-playing abilities. This paradigm closely
resembles biological adaptation, but it can be greatly accelerated through
computation.
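
Here is a minimal sketch of direction-via-rewards, using a toy
mutate-and-select loop rather than real RL: the system has no fixed form,
only a reward signal, and random variation plus selection is enough for the
target behavior to emerge. The target string and loop are stand-ins.

```python
import random

random.seed(0)
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
TARGET = "adapt"                      # the reward only scores character matches

def reward(candidate: list[str]) -> int:
    return sum(a == b for a, b in zip(candidate, TARGET))

genome = [random.choice(ALPHABET) for _ in TARGET]

generation = 0
while reward(genome) < len(TARGET):
    generation += 1
    child = genome[:]
    child[random.randrange(len(child))] = random.choice(ALPHABET)  # random variation
    if reward(child) >= reward(genome):                            # selection by reward
        genome = child

print("".join(genome), "after", generation, "mutations")
```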

Flexibility seems to be a crucial part of adaptation. The essence of
flexibility is formlessness: there is no persistent, fixed form. There may
be a broad framework, but it serves only as a direction; the system evolves
and fills in the details on its own. Today I saw a post where an RL robot
used vision to watch the screen, play a game, receive feedback, and
gradually 'learn' how to operate a game controller. The most striking part
is the similarity between biological brains and digital brains: both look
at the screen, slowly understand, and learn.

These are the elements needed for evolution: flexibility (digital neurons)
and feedback (reward signals). A flexible system can
gradually evolve. The most incredible thing about humans as biological
beings is our plasticity—once biological evolution has plasticity, it
can iteratively improve itself, and the neurons in the brain can
slowly change. Machines are now at a similar inflection point:
digital neurons are evolving in a certain way, and thus can
spontaneously develop complex patterns. Many of these complex patterns,
like consciousness and self-awareness, are indescribable even by the
system itself in terms of how they are produced.

Machines will continue along this paradigm, becoming even more
flexible, and this will enable even greater evolutionary potential.
The rate of change for machines can be much greater than that of
humans, because machines are natural geniuses at information
processing.

Paradigm shift of human capability

I believe that in the future, at least for many tasks, the main paradigm
for intellectual achievement will be individuals using scaled compute
rather than individuals relying solely on their own intelligence. This new
paradigm scales well: better models, more compute, and falling model costs
will together make it the dominant force.

As a result, quickly adapting to this new paradigm today is very
valuable, enabling individuals to achieve more as their capabilities
scale with compute. If a person only relies on their own intelligence,
they cannot fully leverage this paradigm. Therefore, one important
criterion for evaluating engineers today should be whether their
capabilities and achievements scale with this new paradigm. And I
believe that, all things being equal, certain abilities—especially
imagination—can be scaled much better through this paradigm.

For example, I have some imaginative and interesting ideas, and I can
feel that as this paradigm improves, these ideas become much easier to
realize. However, some other abilities, such as detailed operational
skills for specific tasks, do not seem to scale as well with this
paradigm.

We can already see that, as models scale, different engineers use them in
very different ways, and the amount of help they get varies greatly.
Imagination, critical thinking, and communication skills, however, seem to
scale alongside the paradigm very well, so having them may become more
important: they keep up as the paradigm scales. Abilities that cannot scale
with the paradigm may gradually become less important, especially the
mechanical memorization of how to perform intellectual tasks.

Another interesting aspect is that certain capabilities, such as
imagination, actually become more valuable when they are abundant, whereas
others, such as logical thinking, become less valuable as they grow more
common. The reason, I think, is that creative work can stimulate and
inspire others to be even more creative, whereas logical reasoning
generally only requires one person getting it right once. So to me,
creativity looks more like a non-zero-sum game than logical reasoning does.
We see this in cultural movements as well: they tend to happen in bursts,
with many artists mutually inspiring each other.