Home on GameFly Center: Latest AI and Gaming Tech News

OpenAI Codex Practical Guide: Transform Your Coding Workflow

Mon, 18 May 2026 00:00:00 +0000

OpenAI Codex Practical Guide: Transform Your Coding Workflow

If you still think of OpenAI Codex as just a code completion tool from 2021, you are missing out!

In April 2026, Codex received a major update with the launch of GPT-5.5, desktop control capabilities, and enterprise plans, leading to over 4 million weekly developers.

Today, let’s explore how to make the most of this redefined “AI coding assistant”!

1. Installing Codex CLI Version

I dislike setting up environments, but Codex’s installation is incredibly simple.

1. Command Line Installation

You can install it with a single command in the terminal:

# npm install -g @openai/codex

# macOS users can also use:
brew install --cask codex

Authentication is straightforward: if you are already a ChatGPT Plus user, just run codex auth and scan the QR code to log in.

Your quota will follow your subscription, eliminating the hassle of managing API Keys, plus you get an additional $5 API quota.

Enterprise or high-frequency users can still use the standard OPENAI_API_KEY configuration.

After installation, type codex to start interacting with the AI assistant in the terminal.

To verify the installation:

npm install -g @openai/codex

codex --version

In PowerShell, type:

codex

You will need to log in or use an API key.

If you have a Plus account, you can log in directly.

3. Configuring CC Switch

CC Switch is a cross-platform desktop tool for one-click management/switching of Codex, Claude Code, and other AI programming tools’ API providers, automatically rewriting configurations.

Since I don’t have a Plus account, you can use a domestic API key with a local model.

Download CC Switch: CC Switch Releases

4. Adding Providers

Using Silicon-based models, Codex may not work. I tried the Bailian model, which is usable.

Environment Variable Example (Using Qwen Model)

Model Name: qwen3.6-plus

API Key URL:

https://dashscope.aliyuncs.com/compatible-mode/v1

COPILOT_API_KEY="your_bailian_API_key"

2. Codex Desktop CLI Version

1. Registering an OpenAI Account

The first step is always preparing your account. If you already have an OpenAI account, skip ahead; if not, go to the official website to set up basic permissions.

Download link: Codex Releases

2. Special Redemption for Intel Macs: Old Machines Can Run Too

If you are using M1/M2/M3 series chips, congratulations! Just double-click the DMG to install it directly.

However, Intel Mac users should note that the official DMG is not compatible by default! No worries, the GitHub community has a solution.

Visit the open-source project: Codex Intel Mac

Follow the instructions to run the repackaging script, and in a few minutes, you will generate your own Codex-Intel.app.

3. Cleaning Up Old Configurations: Remove Old Keys to Access Official Benefits

After installation, don’t rush to double-click! Many old users have previously written base URLs and private API Keys for third-party models. If not addressed, Codex will still use your private interface upon startup, consuming your quota and costing you money.

Open the terminal and execute the following two steps:

cp ~/.codex/config.toml ~/.codex/config.toml.bak
vim ~/.codex/config.toml

Open the file in an editor and delete all custom proxy addresses, base URLs, and API Keys. Save and exit. This step is crucial for forcing the official account login and enjoying free quotas.

Now double-click to open Codex. Click the ⚙️ settings panel in the bottom right:

• If it shows API login, it means the previous step was not cleaned up; click Logout first.
• Select OpenAI Account Login; the system will automatically redirect to the browser authorization page. Once you see the Login Successful message, just close that web tab.
• Return to the desktop, and check the settings panel again. If your OpenAI registered email is displayed, the connection is fully established!

3. Using Codex

Installing is not the goal; running business processes is. Here are five common use cases organized by my team:

Operations Scripts

codex "Write a bash script to monitor disk space"

Debugging Bugs

Directly input error logs to let it identify the root cause and provide a patch.

Generating Test Cases

codex "Generate unit tests to cover src/utils/"

Batch Refactoring

Switch to `auto-edit`, and with one command, replace all `var` with `const/let`

Automatically Generating Documentation

Run it to directly produce README + API documentation

Don’t miss out on advanced features:

• AGENTS.md: Place this file in the project root, which acts like an “employee manual” for the AI. Include language, package management, indentation, and commit standards; it will automatically load with each conversation. Type /init in the terminal to generate a template.
• config.toml: Supports global, project, and command-line configurations. Want to default to running GPT-5.5? Just change the model field. MCP server configurations can also be directly converted to TOML for reuse.
• Automations: The app supports background scheduled tasks, such as automatically closing expired PRs daily or syncing progress with Notion and Slack. It won’t be affected by network disconnections; results will automatically enter the review queue.

Codex vs. Claude Code: Which One to Choose?

The community often debates “which is stronger”; I would say this question itself is misguided.

The two have fundamentally different roles, and smart users have already split workflows by scenario:

• Batch refactoring, writing scripts, running CI/CD → Choose Codex CLI without hesitation. Token consumption is only 1/3 to 1/4 of Claude’s, memory usage is just 80MB, and terminal benchmark tests lead by 12 percentage points, making it naturally suitable for unattended pipelines.
• Complex architecture design, precise debugging, front-end component development → Leave it to Claude Code. Its multi-step reasoning and framework understanding are indeed more nuanced, with a blind test win rate of 67%.
• Both are around 80% in SWE-bench Verified.

So don’t pick sides; the right answer is to “triage” by task type.

Conclusion

No matter how good the tool is, if used incorrectly, it’s useless. Before implementation, remember these three points:

✅ Always run in a Git repository. Codex has built-in checkpoint rollback; combined with git worktree to isolate experimental branches, you can restore from failures with a single click.

✅ Use full-auto cautiously for sensitive projects. First, use suggest to understand behaviors, then switch to auto-edit. For enterprise projects, ensure data compliance; API Key mode inputs are not used for training, but permission isolation must be well managed.

✅ Use /model to switch models flexibly. Use GPT-5.5 for complex tasks, and 5.4/5.3-Codex for lightweight commands, saving money and improving efficiency.

Now, Codex has evolved from just “writing code” to “managing processes”. Instead of waiting for it to be perfect, integrate it into your toolchain now and find the rhythm that suits you best in practice.

Interactive Time: Do you prefer Codex or Claude? Have you discovered any amazing workflows? Feel free to share your thoughts in the comments!

OpenAI's Ambitious Move: Building an AI Agent Phone

Mon, 18 May 2026 00:00:00 +0000

AI Agent Era’s Entry Anxiety.

This summer, Elon Musk is set to do something unprecedented in history: merge a large model company with a rocket manufacturing company for an IPO.

OpenAI’s most questionable move right now might be venturing into mobile phones. However, Sam Altman seems to think otherwise.

In Q1 of this year, OpenAI’s revenue and user growth fell short of expectations. Competitor Anthropic, with Claude Code, has attracted the most willing-to-pay users. Following this script, OpenAI should be contracting and focusing on proving its profitability ahead of an IPO by the end of this year or early next year.

Contrary to this, supply chain news suggests it is gearing up to challenge the world’s most mature, closed, and profitable consumer electronics category: the iPhone.

Reports indicate that OpenAI is accelerating the development of its first AI Agent phone, aiming for mass production by mid-2027, with a target of shipping 30 million units in the next two years.

Is it crazy?

Perhaps not. OpenAI seems to have recognized a more pressing issue: ChatGPT is intelligent, but it lacks physical capabilities.

It can answer questions but struggles to complete tasks. It resides within other systems—Apple’s, Microsoft’s, operating systems, browsers—thus lacking true authority.

The focus here is not on why OpenAI wants to build a phone, but rather how the company gradually realized that without its own terminal device, ChatGPT cannot truly thrive.

ChatGPT’s Success as Path Dependence

In April 2026, SpaceX secured an option to acquire Cursor for up to $60 billion later this year.

OpenAI initially believed in models—not phones, browsers, or specific apps. It believed in intelligence itself.

In its worldview, as long as the model is strong enough, the entry points, products, and business models will naturally progress.

This is not just rhetoric. In 2020, OpenAI published the influential Scaling Laws paper, establishing a relatively optimistic belief: as models, data, and computing power scale, intelligence will improve in predictable ways.

In other words, the priority is not to seize entry points first but to continue strengthening the model. With sufficient intelligence, the world will naturally yield.

This belief was validated on November 30, 2022.

On that day, ChatGPT launched. It had no flashy interface, no hardware, no pre-installed platform—just an input box on a webpage. Yet, it provided an unprecedented experience; you could type a sentence, and it would respond like a human.

The shock was not just that AI could converse, but that it did so without relying on any traditional entry points. No phone manufacturers pushed it, no operating systems prominently featured it; users found it themselves.

Within two months, it surpassed 100 million monthly active users, becoming the fastest-growing consumer application in human history.

OpenAI appeared to be right. Microsoft quickly deepened its investment, embedding its capabilities into Copilot, Office, and Bing; Apple also integrated ChatGPT into Apple Intelligence at the 2024 WWDC.

At this point, OpenAI stood at the center of the era: the strongest model, the largest user base, and the deepest collaborations.

However, this is where the problems began.

ChatGPT’s success was so dazzling that it easily led OpenAI to believe that the model itself was the entry point. It didn’t need to own a phone or control an operating system—if the intelligence was impressive enough, users would come on their own.

The real cracks began to emerge from here.

Claude Code Redefines Revenue Rules

The first crack came from Anthropic.

In May 2025, it launched Claude Code. There were no flashy demos or explosive launch events. This product simply integrated into developers’ terminals, codebases, and Git workflows, helping engineers get their work done.

Six months after launch, Claude Code’s annualized revenue reached $1 billion; within a year, it exceeded $2.5 billion. By April 2026, Anthropic’s total annualized revenue surpassed $30 billion.

Meanwhile, OpenAI reported monthly revenues of $2 billion, annualized at about $24 billion.

Anthropic achieved higher revenue with far fewer users than ChatGPT. This is where OpenAI should truly be concerned.

The reason is simple—it penetrated a segment of users most willing to pay.

The question is, why did OpenAI lag behind?

Not because it couldn’t see the potential of Agents. It was the dazzling success of ChatGPT that led OpenAI to continue along its original inertia: building stronger models, expanding the user base, and seeking the next universal entry point.

Over the past two years, OpenAI has launched many 0 to 1 attempts—GPT Store, Sora, Operator, Deep Research—all stemming from this mindset. They collectively point to one judgment: as long as the model is strong enough, new products, new entry points, and new business models will emerge naturally.

However, Anthropic chose a different path. It did not first create a super entry point for everyone but instead embedded Claude Code into developer workflows, repeatedly refining one thing—ensuring AI could complete tasks.

This is where OpenAI was slow. It wasn’t that it didn’t create new products; it failed to immediately capitalize on a high-paying scenario, scaling it from 1 to 100.

Sora is a typical example. It shocked the audience at launch, but video generation consumed massive computing resources, and user retention and business models were unclear. Later, OpenAI shut down Sora, which in some ways was a pruning—realizing that creating an impressive AI demo and penetrating a high-paying workflow are two different things.

Model capabilities can create highlights, but commercial efficiency comes from consistent delivery of results.

At this point, OpenAI finally recognized: Agents are not an added feature but the core of the next phase of AI commercialization. ChatGPT cannot just prove its intelligence; it must demonstrate its ability to complete tasks for users.

But when it truly began to take on tasks, it encountered not the ceiling of model capability but the ceiling of authority.

How to Monetize 900 Million Users

OpenAI is certainly also in pursuit. In May 2025, it launched Codex in direct response to Claude Code. By April 2026, Codex achieved a weekly active user count of 3 million.

However, in the coding arena, OpenAI faces an uphill battle to regain ground—Anthropic has already established a stronghold in the coding Agent mindset, leaving newcomers to play catch-up.

This is why OpenAI has begun reallocating resources: shifting focus from projects that easily create highlights but struggle to penetrate commercial loops towards Agents, enterprise markets, and more foundational research.

What it truly needs to look at is the larger card in its hand—900 million weekly active users.

These users are not programmers and won’t pay for code. Yet, each of them has needs: writing emails, creating proposals, researching, booking travel, shopping, organizing files.

If ChatGPT can evolve from a “talking” entry point to a “doing” entry point, that would represent OpenAI’s true commercial capability.

Imagine a scenario: you want to buy a flight ticket, telling ChatGPT your time, budget, and preferences, and it helps you check flights, compare prices, and look at hotels, ultimately providing you with a confirmation button.

At that moment, a part of the value of travel booking platforms would be bypassed. Price comparisons, ad placements, commissions, and user decision influences would all be redistributed. Buying insurance, paying credit cards, and settling utility bills follow the same logic. As long as the Agent can complete tasks, OpenAI has a chance to earn a share of every transaction commission and every advertising influence.

This is where the true value of 900 million users lies—ChatGPT evolves from merely answering questions to taking over tasks and transaction entry points.

However, once AI starts performing tasks, it is no longer just a model in a chat box. It needs to know your location, see what’s happening on your screen, and access your files, calendar, emails, and payments.

The question then shifts from “Is the model strong enough?” to “Who has the authority?”

And authority is precisely what OpenAI lacks.

ChatGPT Lives in Others’ Houses

OpenAI initially believed that collaboration could solve the entry problem. Apple provided it with the iPhone, and Microsoft offered Office, Windows, and enterprise clients. At the time, this seemed a victory for OpenAI’s model belief.

However, with the arrival of the Agent era, the problem changed.

With Apple, ChatGPT is an external expert being called upon. It can answer questions but cannot truly take over screens, cameras, notifications, payments, and files—these permissions Apple will not relinquish. Otherwise, the iPhone’s “soul” would no longer belong to Apple.

The same goes for Microsoft. In the past, OpenAI provided models while Microsoft integrated AI into Office and other entry points. But as OpenAI began developing Codex and enterprise Agents, it encroached on Microsoft’s territory—Agents naturally need to enter workflows, write code, handle files, and complete tasks for employees, which are core areas of Microsoft’s sovereignty.

Thus, while OpenAI and Microsoft’s relationship did not immediately fracture, the boundaries have changed. In April 2026, both parties adjusted their agreement, converting Microsoft’s exclusive authorization into a non-exclusive one, allowing OpenAI to serve clients on any cloud.

The meaning of this is clear: OpenAI does not want to be merely a supplier within the Microsoft ecosystem. It aims to face clients directly, deliver Agents independently, and secure its own entry points.

At this juncture, its relationship with Apple and Microsoft has become delicate. Because Agents require not just a display position but a default entry point, system permissions, and the first smart terminal users interact with daily.

These are things Apple and Microsoft will not provide, nor can they.

Ultimately, ChatGPT is powerful, yet it continues to reside in others’ houses—Apple’s house, Microsoft’s house, browsers’ houses, operating systems’ houses. It can be called upon, integrated, and serve as a good supplier, but it cannot decide when to appear or what permissions it can access.

A mobile phone, however, is the most aligned with its resource endowment. The 900 million weekly active users are already willing to entrust their questions to ChatGPT—transitioning this mindset to a device is shorter than building an operating system or a browser from scratch.

What it aims to create is not just another iPhone filled with apps but a phone dedicated to Agents—a body that allows ChatGPT to see, access, and execute tasks.

This is also why, in May 2025, OpenAI spent about $6.5 billion acquiring Jony Ive’s hardware company. He was the industrial designer of the original iPhone and one of Steve Jobs’ most important associates. OpenAI seeks him not just to create beautiful hardware but to redefine personal devices in the AI era.

Returning to the initial question: why would a large model company want to build a phone?

What OpenAI desires is not just a phone but sovereignty.

It aims to find a default entry point for ChatGPT. However, this endeavor will inherently position OpenAI against Apple. In the past, Apple could treat ChatGPT as a supplier; if OpenAI truly aims to create a phone for the AI era, it will no longer be a supplier but a competitor to Apple in personal entry points.

Looking back over the past few years, OpenAI’s story has actually undergone a reversal.

It once believed that as long as the model was strong enough, the world would reorganize itself around intelligence. The explosion of ChatGPT indeed validated this belief—it attracted hundreds of millions of users without hardware or pre-installation, relying solely on a web input box.

However, with the arrival of the Agent era, OpenAI realized it still lacked the most critical element: sovereignty.

ChatGPT’s success is both a victory and a form of path dependence. It led OpenAI to believe for too long that the model itself was the answer. Only after Claude Code achieved $2.5 billion in annualized revenue and both Apple and Microsoft hesitated to relinquish system permissions did OpenAI realize that no matter how strong the model, it still needed entry points, authority, and tasks.

Thus, OpenAI’s venture into mobile phone development is not merely about creating a device; it is about giving ChatGPT its first physical embodiment.

Understanding AI Through the Lens of Tokens

Mon, 18 May 2026 00:00:00 +0000

Introduction

On May 18, a report titled “Understanding Artificial Intelligence Must Start from Understanding Tokens” was published by Xinhua Daily Telegraph.

In early 2026, a set of data sparked heated discussions in the global AI industry. OpenRouter, the world’s largest AI model API aggregation platform, reported that from February 9 to 15, the token call volume of Chinese large models reached 4.12 trillion, surpassing the 2.94 trillion of U.S. models for the first time in history. This lead continued for several weeks, breaking through 7.3 trillion by mid to late March, with four of the top five models in global call volume coming from China.

This data is not presented to compare “who has more or less” but marks a quiet revolution in the basic measurement unit of the AI industry—tokens, which are becoming the “kilowatt-hour” of the intelligent era. The meanings of six dimensions—models, computing power, data, applications, industry, and governance—are profoundly reshaped by this measurement unit. Understanding AI in 2026 must begin with understanding tokens.

Sixfold Reconstruction from a Measurement Unit

The measurement unit of the industrial revolution was the “kilowatt-hour,” allowing energy to be accurately measured, priced, and transported across domains. The information revolution’s unit was “bits” and “bandwidth,” enabling information to be packaged, transmitted, and billed for the first time. The measurement unit of the intelligent revolution is “tokens,” allowing “intelligence” to be segmented, measured, priced, and traded for the first time.

The popularization of the token concept and its rapid growth in call volume are gradually pushing “intelligence” towards industrialization, marketization, and circulation.

Models

The economic value of large models is shifting from one-time training costs to ongoing inference output. Model vendors no longer simply “sell capabilities” but directly “sell tokens”—pricing based on millions of tokens for input and output has become a global industry norm. The asset attributes of models are transitioning from “weight files” to “the ability to continuously produce tokens.”

Computing Power

The focus is shifting from “training computing power” to “inference computing power.” Training computing power is pulse-based and centralized, while inference computing power is continuous and distributed, posing new requirements for latency, energy efficiency, and geographical distribution. The collaboration of three levels of computing power—cloud, edge, and end—along with inference-specific chips and optical interconnects, is becoming the new focus of infrastructure. JPMorgan predicts that China’s inference token consumption will grow by more than two orders of magnitude by 2030 compared to 2025.

Data

Data must be cleaned, labeled, and tokenized before entering large models, similar to how raw coal needs to be processed into standard fuel for power generation. In long-tail scenarios like autonomous driving, robot training, and scientific discovery, synthetic data generated through simulation has achieved large-scale application. The construction of a data factor market is entering a substantial phase, with “trainability” and “token output density”—rather than just data scale—becoming new metrics for data asset pricing. This shift is significant: the valuation of data is beginning to be linked to its actual contribution in the token production chain, providing a more solid economic foundation for the market-oriented allocation of data factors.

Applications

Traditional software charges based on seats and functionalities; today, applications are billed according to token call volume and business results. Intelligent agents are becoming the main consumers of tokens, with a complex task potentially consuming hundreds of thousands or even millions of tokens. The “intelligent agent as a service” market is rapidly expanding, with performance-based billing models being implemented at scale in customer service, marketing, compliance, and programming scenarios. The essence of applications is shifting from “delivering functions” to “consuming intelligence.”

Industry

A new industry chain is forming around tokens, encompassing production (models and computing power), distribution (inference networks, APIs, intelligent agent protocols), consumption (applications and intelligent agents), and measurement (evaluation benchmarks, auditing, and trusted verification). The boundaries between model layers, inference service layers, intelligent agent middleware layers, and industry application layers are becoming increasingly clear, with industry-specific intelligent agents becoming mainstream investments. Model vendors, cloud vendors, chip manufacturers, green power operators, and content distribution network providers together form a collaborative ecosystem for the token industry chain. According to data from the China Academy of Information and Communications Technology, the scale of China’s core AI industry is expected to exceed 1.2 trillion yuan by 2026, with the collaborative effects of the entire industry chain becoming evident.

Governance

The governance focus is shifting from “algorithm governance” to “full-chain governance of tokens.” As the AI industry has developed, the governance objects have expanded from “algorithms and code” to the entire chain of token production, circulation, consumption, and cross-border flow: traceability of tokens, identification of synthetic content, cross-border token flow, constraints on computing power and energy consumption, and trusted evaluation and benchmarks—all call for new governance tools and rules. The year 2026 may become a key year for the concentrated implementation of global AI governance rules.

China’s Position in the Global Token Wave

In the global wave brought by tokens, China is forming a unique position supported by multiple factors.

On the production side, domestic models are rising in clusters. A number of domestic models, such as MiniMax, Dark Side of the Moon, Deep Quest, Zhipu, Alibaba Qianwen, and Byte Bean, are leveraging mixed expert architectures and extreme engineering optimizations to enhance performance while reducing inference prices to a fraction of comparable global models. On the OpenRouter platform, U.S. users account for 47%, while Chinese users make up only about 6%, yet the call volume is led by Chinese models—this is a recognition voted by global developers.

On the consumption side, applications are penetrating deeper into everyday life at an unprecedented speed. A general practitioner in a county hospital, faced with a suspicious lung CT, has AI circle nodules and provide differential diagnosis suggestions within seconds and thousands of tokens, compressing what used to take two weeks of consultation into a single outpatient visit. A farmer in Shouguang, Shandong, uses a smartphone to photograph a curling cucumber, and a smart agriculture app utilizes tokenized agricultural knowledge to inform him whether it is thrips or a viral disease and which medication to use. An elderly person living alone says “I feel chest tightness” to a smart speaker in their dialect, and after a conversation of several thousand tokens, their children’s phones receive a warning and location sharing for emergency services. Delivery riders no longer hear mechanical instructions like “turn right ahead” but receive route planning based on real-time traffic and elevator wait times. AI assistants in government service halls respond around the clock to inquiries about medical insurance transfers and property registrations, replacing “people running errands” with “tokens running errands”… Tokens are becoming the “invisible labor force” across various industries.

At the industry chain level, a full-stack collaborative ecosystem is rapidly taking shape. From domestic chips like Ascend, Cambricon, and Haiguang to inference service platforms like Volcano Engine, Alibaba Cloud, and Tencent Cloud, along with a range of open-source middleware and industry-specific intelligent agents, the entire industry chain covering chips, computing power, models, middleware, and applications is quickly improving. The “East Data West Computing” project provides low-cost computing power, while green power directly supplies data centers, solidifying the energy foundation.

However, it is essential to recognize that there is still significant room for improvement in areas such as original model innovation, high-end computing power infrastructure, cross-language and cross-cultural ecological influence, and participation in global rule-making.

The second half of the token wave is not about “already winning” but rather “just beginning.” In the global picture unfolded by small tokens, China is not only a vast market but also a proactive builder and responsible co-governor. Understanding tokens means understanding the next phase of artificial intelligence.

OpenClaw: The Ultimate Guide for Beginners in Packet Capture

Wed, 13 May 2026 00:00:00 +0000

Introduction to OpenClaw: What It Is and Who It’s For

OpenClaw is an open-source, free, cross-platform local proxy packet capture tool. Its core function is to intercept, view, and modify network requests between mobile devices and computers, including web APIs, app APIs, images, videos, and all network traffic. It emphasizes three main features: lightweight, customizable, and no bundling.

Suitable Users

General Users: Troubleshoot app ads, block pop-ups, filter notifications, and clean up unnecessary network behaviors.
Development and Testing Personnel: Debug APIs, view request parameters, simulate return data, and locate network anomalies.
Tech Enthusiasts: Analyze app communication logic, troubleshoot power-consuming network behaviors, and optimize network usage.
Students and Self-learners: Low-cost entry into network packet capture and learning the basics of HTTP/HTTPS.

Unsuitable Scenarios

OpenClaw is limited to personal learning, legal debugging, and self-optimization. It is prohibited to use for cracking paid content, stealing others’ information, intercepting or tampering with third-party paid APIs, or commercial reverse engineering. Users must comply with network security and copyright laws.

Getting Started: Installation and Configuration on Three Major Platforms

1. Windows Installation and Configuration

Download the latest stable green package from the OpenClaw official repository, preferably the official release version to avoid compatibility issues.
Unzip to a non-system disk path, avoiding Chinese characters, spaces, or special symbols, and run OpenClaw.exe to start.
The first launch will pop up a basic guide; select the default proxy port: 9090, and keep it unchanged.
Go to Settings → Network, enable system proxy takeover; the software will automatically configure the Windows system proxy.
Certificate installation: Click Settings → Certificate, export the root certificate file, double-click to install, select “Local Computer” → “Trusted Root Certification Authorities” to complete the installation.
Disable the computer’s firewall and third-party security software network interception to prevent proxy blockage, then start capture mode for normal use.

2. macOS Installation and Configuration

Download the corresponding architecture version (Intel / Apple Silicon), unzip, and drag OpenClaw into the Applications folder.
Right-click to open, allow running programs from unknown sources in system settings; the first run will request network permissions, allow all.
The default port is 9090; enable automatic system proxy, and macOS will request permission; click allow.
Export the certificate, double-click to install, and in Keychain Access, set the certificate trust to “Always Trust”—this is crucial for HTTPS packet capture on macOS.
Disable network restrictions in macOS privacy protection, turn off VPNs and other proxy tools to avoid port conflicts.

3. Android Installation and Computer Interaction

Ensure OpenClaw is running on the computer, remember the local IP address, and connect the phone and computer to the same WiFi.
Go to WLAN settings on the phone, long press the current WiFi, modify network → Advanced options, select manual proxy, enter the computer’s IP, and port 9090.
Access the computer IP:9090 in the phone’s browser to download the OpenClaw root certificate; for Android 11 and above, manually install the CA certificate in Settings → Security → More security settings.
For Android 13 and above, some apps enable SSL verification, which may cause capture failures due to system security restrictions; this is normal.
Once completed, open any app or webpage, and the computer will capture all network requests in real-time.

Common Installation Pitfalls for Beginners

Paths containing Chinese characters or spaces may cause the software to crash or fail to start.
Incorrectly installed or untrusted certificates will only capture HTTP traffic, not HTTPS.
Multiple proxies or VPNs running simultaneously may cause port conflicts, rendering the proxy ineffective.

Basic Usage: Daily Packet Capture, Request Viewing, Filtering, and Quick Start

After installation, mastering the following four basic operations will meet 90% of daily needs.

Real-time Packet Capture: Turn on the capture switch, and all network requests will be displayed in real-time, including request addresses, methods, status codes, duration, size, and source app. Click any request to view headers, parameters, response content, and cookie information, fully restoring the interface communication content.
Keyword Filtering: Enter keywords in the top search box to quickly filter specific apps, interfaces, or ad domains, one-click filtering out unnecessary traffic, significantly improving efficiency.
Request Replay and Copying: Select any interface, right-click to replay the request, simulating a resend; can copy as Curl or Postman format for direct import into development tools without manual copying and pasting of parameters.
Simple Blocking Rules: In basic mode, add ad domains or push domains and select intercept to block splash ads, pop-ups, and push notifications, sufficient for general users.

Advanced Customization: Rule Writing, Script Interception, Simulated Returns for Advanced Features

OpenClaw’s true core advantage lies in its highly customizable rule system, supporting rule scripts, redirection, request modification, response modification, and local simulated data. Below are the three most practical advanced uses, which can be directly copied and used.

Basic Interception Rules (Blocking Ads, Pop-ups, Tracking): Use domain matching rules, simple and intuitive format:
```
# Block ad domains
*.ad.com
*.track.com
*.push.xxx.com

# Return empty data after blocking
return 200 ""
```
Paste the rules into the rule editor to enable them without needing coding skills.
Modify Request Parameters (For Debugging APIs): Intercept specified interfaces and automatically modify headers, tokens, parameters, and cookies, suitable for development debugging:
```
if url contains "/api/user/login" {
    set header["token"] = "custom test token"
    set param["id"] = "1001"
}
```
This simulates different accounts and parameters for interface return effects.
Local Simulated Returns (Offline Debugging): In a no-network environment, intercept interfaces and directly return locally preset data, very useful for development debugging:
```
if url == "/api/home/data" {
    return 200 `{\"code\":200,\"data\":{\"name\":\"test\",\"list\":[]}}`
}
```
No backend setup is needed to debug front-end and app page effects.

Advanced Usage Reminders

Too many rules can increase software load; it is recommended to add them as needed. Do not write interception rules for banking, payment, or government apps to avoid triggering risk control and security issues.

Principle Analysis: How Does OpenClaw Capture Packets? Understanding to Avoid Pitfalls

Many users only know how to use the tool without understanding it, leading to reinstallation when encountering problems. Understanding the underlying principles can help quickly locate faults and solve anomalies.

Proxy Relay Mode: OpenClaw is essentially a local HTTP/HTTPS proxy. All network requests from the computer or phone do not directly send to the server but first go to OpenClaw, which intercepts, views, and modifies them before forwarding to the server. The server’s returned data also passes through the software for capturing.
HTTPS Certificate Decryption Principle: Ordinary HTTPS encrypted traffic cannot be viewed directly. OpenClaw installs a self-signed root certificate, and once the phone and computer trust this certificate, the software acts as a man-in-the-middle, decrypting client-encrypted traffic and re-encrypting it for the server, enabling plaintext viewing—all done locally without uploading data, ensuring privacy and security.
Why Some Apps Cannot Capture Packets: Many mainstream apps enable SSL Pinning (certificate locking), trusting only the official built-in certificates and not the locally installed OpenClaw certificate, resulting in connection refusals. This is a system-level security protection, not a software fault, and there is currently no compliant cracking solution.
Port Conflict Principle: 9090 is the default port. If other software occupies this port, OpenClaw cannot start the proxy. Simply change the port in settings to run normally.

Common Issues One-Stop Troubleshooting: Solving Frequent Problems for Beginners

Cannot Capture Any Traffic: Check if the proxy is enabled, if the phone and computer are on the same WiFi, if there is a port conflict, or if the firewall is blocking.
Can Only Capture Web Traffic, Not Apps: Check if the certificate is installed and trusted, and if the app has SSL locking enabled.
Software Crashes: Change the path to pure English, disable antivirus software, and download the latest stable version.
Rules Not Taking Effect: Check rule syntax, if the rules are enabled, and if they match the corresponding interface.
Slow Internet Connection: Disable unnecessary rules and reduce background packet capture; the software itself does not limit speed.

Rational Use of Packet Capture Tools: Compliance is the Bottom Line

As an open-source packet capture tool, OpenClaw’s value lies in learning network knowledge, personal debugging, and optimizing device usage experience. The tool itself is neutral and harmless, but the boundaries of its use are crucial.

It is prohibited to use for cracking paid resources, stealing others’ account information, maliciously tampering with interfaces, reverse engineering commercial software, or intercepting interfaces for profit. Such actions not only violate platform rules but also infringe upon network security laws, leading to corresponding responsibilities.

For general users, mastering basic packet capture and ad blocking is sufficient; developers should reasonably use advanced rules to enhance debugging efficiency; enthusiasts should delve into the underlying principles to enhance their network knowledge. This is the true meaning of the tool’s existence.

Conclusion

OpenClaw, with its open-source, free, lightweight, and highly customizable advantages, has become the most suitable packet capture tool for beginners. This manual covers installation, basic usage, advanced customization, and underlying principles, addressing the pain points of fragmented online tutorials, incomplete steps, and frequent errors. By following the steps in this article, beginners can quickly get started, and advanced users can directly apply rule templates for efficient debugging. Mastering OpenClaw not only means learning a tool but also understanding network request logic and enhancing digital and development knowledge.

What packet capture tools have you used before? Charles, Fiddler, or OpenClaw? Do you mainly use packet capture tools for debugging APIs or blocking ads? What common issues have you encountered during use? Feel free to share and discuss in the comments!

New National Standards for AI Terminals Released in China

Mon, 11 May 2026 00:00:00 +0000

New National Standards for AI Terminals

On May 8, 2026, the Ministry of Industry and Information Technology, the State Administration for Market Regulation, and the Ministry of Commerce jointly released the series of national standards titled “Intelligent Classification of AI Terminals” (GB/Z 177—2026). These standards specify the requirements for various products, including smartphones, computers, televisions, glasses, car cockpits, speakers, and headphones.

Experts believe that these standards clearly define the intelligence levels of AI terminals, laying a solid foundation for building a safe, orderly, and efficient ecosystem for AI terminals. They will also promote the coordinated development of China’s AI terminal industry, achieving scale advantages and leading standards.

Diverse Product Forms

AI terminals are key carriers for the large-scale implementation and systematic development of AI technology. In recent years, China’s AI industry has flourished, with AI terminals driving a variety of intelligent scenarios, continuously giving rise to new products, business models, and experiences. This has effectively stimulated consumer enthusiasm and become an important lever for tapping domestic demand and optimizing consumption structure.

This year, driven by the expansion of the old-for-new consumption policy and the deep integration of AI technology with consumer products, AI terminals have become increasingly popular among consumers. In the first quarter, China’s smartphone production reached 298 million units, a year-on-year increase of 6.9%, while service robot production exceeded 4.4 million units, up 2.6% year-on-year.

Wei Ran, chief engineer of the China Academy of Information and Communications Technology, explained that AI terminals, driven by large models, represent a new generation of intelligent terminals. Compared to traditional terminals, they have four major functional upgrades: the ability to actively perceive scenarios, accurately understand user intentions; support for multi-modal interactions including text, voice, and audio-video; capability for generative applications and intelligent agent services; and autonomous learning and continuous evolution based on personal large models and knowledge bases.

“Overall, intelligent terminals have evolved from traditional passive execution tools to intelligent assistants that can perceive, understand, serve, and grow, redefining the human-machine interaction relationship. These functions are core points assessed in the highest level of the intelligent classification national standards,” Wei said.

Currently, AI terminals exhibit a rich variety of forms, with traditional terminals upgrading, emerging terminals expanding, and future terminal explorations evolving in parallel. Traditional terminals like AI smartphones, computers, and tablets have surpassed ten million units in shipments, becoming the market’s main force. New categories such as intelligent in-car terminals, smart glasses, and intelligent toys are rapidly growing, while native terminal forms represented by embodied intelligence continue to explore, further accelerating the application of AI.

Wei analyzed that the systematic integration of AI and terminal technology requires breakthroughs in three key areas: optimizing the edge-cloud collaboration architecture, enhancing hardware capabilities, and upgrading security and privacy protection systems.

Clear Evaluation System

Since 2023, leading enterprises in the smartphone and computer industries have actively launched AI terminal-related products, each with different functional focuses. The lack of definitions and classification standards for AI terminals has made it difficult for consumers to accurately assess the intelligence levels of different products and has complicated product development and market positioning for companies. The absence of a unified consensus on terminal intelligence classification has led to concept generalization and misuse, with some products falling into parameter stacking and disconnects between function promotion and actual experience.

The “Intelligent Classification of AI Terminals” series of national standards adopts a “2+N” framework. The “2” refers to “Part 1: Reference Framework” and “Part 2: General Requirements,” which clarify the concepts of intelligence, classification levels, and testing methods. The classification system ranges from L1 response level, L2 tool level, L3 assistant level, to L4 collaborative level, with increasing intelligence levels. The L4 collaborative level will be further clarified and improved in subsequent revisions based on industry development levels. The “N” represents specific standards for different products such as smartphones, computers, televisions, glasses, car cockpits, speakers, and headphones. The first batch includes seven categories, with plans to develop standards for additional categories in the future.

Li Hongwei, chief engineer of the China Electronic Information Industry Development Research Institute, stated that the highlights of this series of standards are scenario-based, quantifiable, and consider both edge and cloud, covering various scenarios such as office work, learning, and design. This provides a unified “health check standard” for AI terminals, regulating industry development and allowing consumers to make informed purchases with confidence.

The series of standards provides a scientific and unified evaluation system for the large-scale application and intelligent classification management of AI terminal products in China. This will help regulate market order and enhance user experience. Additionally, it will accelerate the innovation and iterative upgrade of AI terminal technology products, precisely guiding technological research and development directions, and ensuring sustainable industry development. The introduction of these standards will also enhance China’s voice in the global standard-setting for AI terminals, reducing technical barriers for enterprises going abroad and improving international competitiveness.

“On one hand, the standards provide enterprises with directions for improvement to meet benchmarks, facilitating the supply of high-end products, enhancing resource utilization efficiency, and promoting orderly competition and healthy development. On the other hand, they provide consumers with technical and evaluation bases, ensuring that the demand side has standards to rely on for better selection of intelligent products, enhancing user experience and satisfaction,” said Yu Xiuming, deputy director of the China Electronic Technology Standardization Research Institute.

Accelerating Technological Inclusivity

Lenovo Group participated in the drafting of these standards. Currently, AI PCs account for over 30% of Lenovo’s PC shipments. Its built-in personal super intelligent agent, Tianxi AI, is advancing towards becoming a “personal super-powered partner” for users. Lenovo Group Vice President Abulike Mu stated that Lenovo will actively implement national standards, continuously innovate terminal products around Tianxi AI, refine terminal innovation application scenarios and user experiences, and drive collaborative innovation among upstream and downstream partners in the industry chain to promote high-quality development of the AI terminal industry and accelerate the inclusivity of AI terminals.

To promote the innovative development of the AI terminal industry, the Ministry of Industry and Information Technology will strengthen the implementation of standards, conduct standard interpretations and specialized training, build a compliance testing platform, encourage leading enterprises to take the lead in trials, and create demonstration cases and benchmark products for standard applications. They will accelerate the iteration of the standard system, optimize and improve standard content, and continue to expand the coverage of standards, aiming to establish a unified standard system that includes various terminal forms. This will stimulate consumer-led effectiveness, ensuring the implementation of standards in this year’s old-for-new consumption policy, and forming a catalog of AI terminal products to guide public consumption decisions, expanding the breadth and depth of AI applications and creating hot consumption scenarios.

Yu Xiuming mentioned that they will continue to enrich standard categories, developing more standards for wearable devices, home appliances, and trendy toys, ensuring that the intelligent classification of various terminals has standards to rely on. This will provide standard and technical support for the implementation of national policies and offer standard consulting and product evaluation services to society, aiding high-quality development of the industry.

The True Purpose of AI Development: A Reflection

Mon, 11 May 2026 00:00:00 +0000

The True Purpose of AI Development

We live in an era dominated by artificial intelligence (AI). From waking up to smart alarms, navigating with intelligent maps, to utilizing various AI tools at work, AI has seamlessly integrated into our daily lives. It has evolved from a distant sci-fi concept to an essential part of our routines, rapidly changing the logic of how the world operates and reshaping individual life paths.

However, we often find ourselves questioning: what is the true purpose of AI development?

This question carries a pure and warm answer since the moment AI was conceived. Looking back at the origins of technological development, humanity created AI not to produce machines that replace us, but to let technology illuminate our lives. We hoped AI would take over tedious and exhausting physical and mental labor, freeing workers from mechanical operations, allowing white-collar workers to escape endless reports and documents, and helping researchers avoid massive data calculations and filtering. We aimed to liberate humanity from the fatigue of survival, enabling us to embrace life itself, pursue passions, create value, and feel warmth.

We envisioned AI bridging gaps of injustice, providing children in remote areas access to quality education like their urban counterparts; assisting the elderly and disabled with smart devices for a more dignified and convenient life; and making medical resources more accessible, with AI accurately diagnosing complex illnesses so that everyone could afford and receive quality healthcare. We dreamed of AI exploring uncharted territories, diving into the deep sea, reaching for the stars, solving climate issues, and conquering incurable diseases, driven by human curiosity and courage to expand the dimensions of life and civilization.

At that time, we held a hopeful view of artificial intelligence. We believed that the ultimate significance of technology is to serve humanity, to make life better, and to give people more time to spend with family, enjoy the seasons, and chase dreams. This was the original intention behind the birth of AI, the pure mission humanity assigned to it, and the benevolent nature technology should embody.

Yet, as we progressed, reality began to drift away from this initial vision, evolving into something we had not anticipated.

AI has not become a remedy for fatigue; instead, it has become a catalyst for anxiety. We thought AI would make work easier, but it has turned into an accelerator of competition. AI can generate content in seconds, create artwork in minutes, and process data around the clock. While efficiency has been magnified, the living space for ordinary people has been increasingly squeezed. The fear of job displacement, the fatigue of keeping up with AI’s speed, and the pressure of being left behind without continuous learning have plunged many into endless internal struggles. We initially sought to escape fatigue, but now we find ourselves chased by AI, with less time to breathe, as life becomes filled with work and happiness is replaced by anxiety.

We expected AI to benefit everyone, but in reality, it has gradually become a tool for capital and profit. Big data invades privacy without restraint, algorithms categorize people based on spending power, creating information silos that amplify anxiety and division. Businesses use AI to precisely calculate user preferences, employing manipulative marketing tactics that exploit human weaknesses for profit. Quality intelligent resources concentrate in the hands of a few, while ordinary people can only passively accept algorithmic control, with the benefits of technology failing to reach the masses and instead widening the gap between people.

We believed AI would strengthen connections among people, yet it has ultimately rendered emotions colder. AI can write love letters, generate blessings, and simulate conversations, making even the most sincere expressions easily replicable, leading to a scarcity of genuine feelings. We laugh at smart devices while overlooking the eyes of those around us; we indulge in the virtual world shaped by algorithms, forgetting to have heartfelt conversations with family. Technology may shorten virtual distances, but it distances us from real warmth, as human sincerity and emotion are gradually replaced by cold codes and algorithms.

Even more lamentable is that the development of artificial intelligence has strayed from the “human-centered” track. Some use AI to commit fraud, infringe copyrights, and erase the hard work of creators; others use AI to spread rumors and incite emotions, undermining social trust. Some relentlessly pursue AI’s intelligence and commercialization, neglecting ethics and boundaries, turning technology into a weapon that harms.

We must acknowledge that technology itself is neither good nor bad; AI is merely a tool, reflecting the inner projections of humanity.

The deviation of AI from its original intention is not a fault of the technology itself, but rather a consequence of those who control it, lost in desire and profit. When capital-driven motives replace technology’s benevolent purpose, when efficiency overshadow humanistic care, and when selfishness triumphs over original missions, even the best technology can become a tool for satisfying personal desires, and the warmest intentions can be eroded by the harsh realities of utilitarianism.

We should never deny the value of artificial intelligence; it continues to drive social progress and benefits humanity in many fields: intelligent rescue saves lives, smart agriculture ensures food security, intelligent healthcare protects health, and smart technology turns countless impossibilities into possibilities. What we truly need to reflect on is how to uphold the ethical boundaries of technology and how to bring AI back to its essence of serving humanity.

True artificial intelligence should be a partner to humanity, not an adversary; a supportive tool, not a replacement; a warm enhancement, not a cold manipulation. It should help alleviate burdens, not increase anxiety; bridge distances between hearts, not create barriers; and benefit every ordinary person, not become a weapon for profit for a select few.

The ultimate warmth of technology always lies in humanity.

May the development of artificial intelligence rediscover its original intention, shed the cloak of utilitarianism, and return to its essence of benevolence. May we control technology, rather than be enslaved by it; may AI liberate us from the trivial, allowing us to embrace the warmth and richness of life again; and may every technological advancement enable humanity to live more freely, happily, and with dignity.

After all, we create technology not to change humanity, but to better protect humanity and safeguard the most precious aspects of life and love.

Dynamic Context Discovery: The Next Paradigm for Coding Agents

Sun, 10 May 2026 00:00:00 +0000

Dynamic Context Discovery: The Next Paradigm for Coding Agents

Core Argument: Cursor’s concept of “dynamic context discovery” represents a paradigm shift—from stuffing all context into a context window to allowing agents to pull context as needed. This transition addresses not only token efficiency but also the “quality of context”: less but more precise information actually enhances agent performance.

1. The Core Dilemma of the Old Paradigm: The Collapse of Static Context

Traditional context engineering is based on the assumption that more context = better agent performance. This assumption held true in 2024 when model capabilities and context windows were relatively limited. However, by 2026, things changed:

Larger context windows but no increase in information density: Models can read 200K tokens, but noise (outdated logs, irrelevant file versions, intermediate products of historical operations) can also be included.
Context bloat from third-party tools: The JSON returned by MCP calls can be larger than the core code you are editing, but agents cannot distinguish between primary and secondary information.
Irreversible loss from summarization: When the context is full, existing systems can only compress—but compression is lossy, leading to the loss of critical details and task failures.

Official Quote: “The common approach coding agents take is to truncate long shell commands or MCP results. This can lead to data loss, which could include important information you wanted in the context.” — Cursor Blog: Dynamic context discovery

Fundamental Flaw of Static Context: It assumes that all information is known for its value at the time of injection, but in reality, the value of information depends on the current stage of the task—the same test output is core context during the “writing tests” phase but noise during the “writing business logic” phase.

2. Core Mechanism of Dynamic Context Discovery

Cursor’s solution is file systemization: converting tool outputs, chat histories, and MCP tool descriptions into files, allowing agents to pull as needed using grep/tail/read.

2.1 File-based Long Tool Responses

Old Model: tool_call → JSON 200KB fully stuffed into context
New Model: tool_call → write to /tmp/tool_output_xxx.json → agent reads on demand using tail/read

Effect: Reduces unnecessary summarization triggers. Internal testing at Cursor shows that this change significantly decreases summarization calls after the context is full.

2.2 File-based References to Chat History

When the context window is full, Cursor provides the agent with a “reference to the history file” instead of forcing it to compress. If the agent finds that the summary lacks critical details, it can grep the history file to recover them.

Official Quote: “After the context window limit is reached, or the user decides to summarize manually, we give the agent a reference to the history file. If the agent knows that it needs more details that are missing from the summary, it can search through the history to recover them.” — Cursor Blog: Dynamic context discovery

Insight Behind This Design: Summarization is a subjective decision made by the system for the agent—the system deems this history “unimportant” and compresses it, but this judgment may conflict with the agent’s current task needs. Allowing the agent to decide what to pull returns decision-making power to it.

2.3 Dynamic Loading of MCP Tools

This is the most significant optimization: In A/B testing, agents using dynamic MCP loading reduced total token consumption by 46.9% when calling MCP tools.

Traditional Model: All tool definitions and descriptions from the MCP server → fully stuffed into the system prompt. Dynamic Model: Only the list of tool names + instructions to “grep when needed”.

Official Quote: “We believe it’s the responsibility of the coding agents to reduce context usage. In Cursor, we support dynamic context discovery for MCP by syncing tool descriptions to a folder.” — Cursor Blog: Dynamic context discovery

Cursor also implemented an additional design: the folder structure remains grouped by server rather than a flat index. This way, the agent sees a cohesive unit (“all tools of this server”) instead of scattered tool descriptions.

2.4 File Systemization of Terminal Sessions

Cursor synchronizes the output of integrated terminals to the local file system, allowing agents to grep specific outputs directly without needing to copy/paste. This aligns the context sources of Cursor agents and CLI-based agents (like Claude Code)—historical shell outputs are exposed through the file system, with CLI agents using static injection and Cursor using dynamic pulling.

3. Why This Transition is Effective: An Information Theory Perspective

Dynamic context discovery is effective because good context is not about “more” context, but about context with a high signal-to-noise ratio. When the agent’s context window is filled with “all potentially relevant information,” it faces an information retrieval burden—it must sift through a lot of noise to find truly relevant information. When the context window is filled with “information directly related to the current task,” the agent’s reasoning resources can be fully devoted to the task itself.

Cognitive Load Theory also supports this conclusion: giving the model fewer distractions actually enhances its ability to handle core issues—this is consistent with the phenomenon where human experts work more efficiently on a “tidy desk”.

4. Applicable Boundaries and Limitations

Dynamic context discovery is not suitable for what scenarios:

First-time cold starts: When the agent has not established a sufficient problem model and does not know what to pull, dynamic discovery may increase ineffective searches.
Tasks with extremely high real-time requirements: For example, monitoring alerts or real-time interactions, the I/O overhead of grepping files may be unacceptable.
Hidden dependencies between contexts: If the relationship between file A and file B is “understanding B requires knowing A,” but the agent sees B first, dynamic discovery may lead it into a dead end.

Limitations of File Abstraction:

Official Quote: “It’s not clear if files will be the final interface for LLM-based tools.” — Cursor Blog: Dynamic context discovery

Cursor acknowledges that files are merely a “currently simple yet powerful primitive” and may not be the final form. For instance, structured search (like vector retrieval) may be more suitable for semantically relevant context pulling than grep.

5. Engineering Practice Recommendations

How to Implement Dynamic Context Discovery in Your Agent

Minimum Viable Implementation (3 Steps):

Step 1: File-based Tool Responses
- Write all tool responses >10KB to /tmp/context/{uuid}.json
- Add to the agent system prompt: ">10KB tool outputs are stored in /tmp/context/ — use tail/read to access"

Step 2: References to Chat History Summaries
- Retain the complete history file path with each summarization
- Allow the agent to use grep to recover critical details missing from the summary

Step 3: On-demand Loading of MCP Tools
- Only include the list of tool names in the system prompt
- Provide a "lookup_mcp_tools(query)" tool for the agent to actively search for needed tools

Metrics to Detect the Effectiveness of Dynamic Context:

Frequency of summarization calls (should decrease)
Task completion rates (cross-session tasks should improve)
Distribution of the agent’s grep/read calls (should be concentrated in your expected workflows)

6. Comparison with Other Solutions

Solution	Context Organization Method	Advantages	Disadvantages
Static Context (Traditional)	All stuffed into context window	Simple, agent does not need to search actively	200K tokens limit, low signal-to-noise ratio
Dynamic Context Discovery (Cursor)	Files pulled on demand	46.9% reduction in tokens, agent makes autonomous decisions	Requires tool-level modifications, I/O overhead
Hierarchical Memory (LangChain)	Hierarchical compression + retrieval	System controls quality, interpretable	Compression loss is inevitable, retrieval quality depends on embedding models
Context Caching (OpenAI)	KV Cache reuse	Same prefix calculated only once	Only suitable for “shared prefix” scenarios, not for dynamic content

Author’s Judgment: Dynamic context discovery represents a shift in context engineering from a “system-centered” to an “agent-centered” approach. Future mainstream architectures are likely to be hybrid: static injection of “meta-context” (agent goals, constraints, available tools) and dynamic pulling of “task context” (historical operations, intermediate results, relevant files).

Related Topics:

Related Project: GS-2 — Its authoritative state in the DB is also a form of “dynamic context”: the agent’s state is preserved not by summarization compression but by external DB recovery on demand.
Related Article: Anthropic’s “Effective Harnesses” — Anthropic’s solution is a “dual-component architecture,” while Cursor’s solution is “dynamic pulling,” both pointing to the same conclusion: cross-session state cannot rely on summarization; it must depend on external storage.

ChatGPT's Functionality in May 2026: A User Experience Review

Sat, 09 May 2026 00:00:00 +0000

ChatGPT’s Functionality in May 2026: A User Experience Review

In the current era where AI technology is deeply integrated into daily life, ChatGPT stands out as a phenomenal tool. Its functionality directly impacts the user experience and efficiency. In May 2026, OpenAI officially launched the GPT-5.5 Instant as the default model, featuring significant optimizations in hallucination control, memory capacity, and response conciseness. Additionally, o.zzmax.cn serves as an excellent AI model aggregation site, allowing users to intuitively compare ChatGPT with other mainstream models and quickly find AI tools that meet their daily needs.

1. Basic Text Capabilities: Mature and Stable, Covering Everyday Scenarios

Text processing is the core foundation of ChatGPT. After multiple iterations, this capability has matured, covering the vast majority of text needs in users’ learning, work, and life. Whether drafting emails, organizing notes, writing copy, translating languages, summarizing long texts, or answering common knowledge questions, ChatGPT provides clear and coherent results.

The May 2026 upgrade further enhanced the practicality of text capabilities, reducing the hallucination rate by 52.5%, significantly decreasing factual errors in high-risk areas such as medical, legal, and financial fields. Responses are also more concise, with an average length reduction of about 30%, eliminating redundant expressions and ineffective formats, thus significantly improving communication efficiency. In everyday use, whether students are organizing class notes, professionals are writing work reports, or ordinary people are creating personal essays or translating foreign materials, ChatGPT responds efficiently without needing complex instructions to meet basic needs.

However, there are slight shortcomings in basic text capabilities. Firstly, the emotional depth in literary creation is lacking; while it can construct frameworks for poetry, prose, and novels, its impact and nuance do not match human creators. Secondly, it sometimes misinterprets niche dialects and internet memes, occasionally providing irrelevant answers. Thirdly, the coherence of long text creation can falter; content exceeding ten thousand words needs to be guided in segments to avoid logical repetition or detail contradictions.

2. Multimodal Functionality: Comprehensive and Practical, with Room for Detail Optimization

Multimodal capability is a core competitive advantage of current large models. ChatGPT has achieved cross-modal interaction involving text, images, and audio, covering image understanding, content generation, and audio transcription, proving to be quite practical. In terms of images, it can accurately recognize handwritten text, mathematical formulas, chart data, and everyday objects, describing image content and answering questions within images, as well as analyzing design blueprints. When generating images, it can create illustrations, posters, and product images based on text prompts, with diverse styles and complete details. In audio, it supports speech-to-text, real-time translation, and sentiment analysis, capable of recognizing speech content in noisy environments with high transcription accuracy. The 2026 update improved the fluency of multimodal interactions, enhancing response speed after uploading images or audio, and the accuracy of interpreting complex images (like industrial blueprints and medical images) has also progressed. In daily scenarios, students can upload photos of assignments to obtain problem-solving ideas, professionals can upload meeting recordings to quickly generate minutes, and creators can generate design inspiration images, covering multiple scene needs.

However, multimodal functionality still has notable limitations. Firstly, there is a lack of video processing capability; it cannot directly analyze video content or summarize key points, requiring third-party tools for format conversion. Secondly, the creative ceiling for image generation is not high, with insufficient fidelity in complex compositions and niche artistic styles, often leading to element clutter. Thirdly, audio duration is limited; processing speeds significantly decrease for audio longer than one hour, often missing key information.

3. Tool Integration and Memory Capability: Convenient and Efficient, with Personalization to be Deepened

ChatGPT’s tool integration and memory capabilities are key to enhancing user engagement and are important aspects of its functionality. In terms of tools, it includes built-in web search, deep research, code interpreter, and office plugins, allowing users to complete multi-task processing without switching platforms. Web search can obtain real-time information to answer current affairs and industry dynamics questions; deep research can integrate multiple authoritative sources to generate structured reports, suitable for academic research and business analysis scenarios; the code interpreter supports writing and debugging code, solving programming issues and data calculations; office plugins can link to Excel and Google Sheets for data organization, formula writing, and table optimization.

The memory capability received a significant upgrade in May 2026, introducing the “memory source” feature, which shows how historical conversations, uploaded files, or Gmail content influence current responses. Users can view, delete, or modify memories, ensuring privacy control. Cross-conversation memory is more stable, able to remember user preferences and historical needs, providing personalized responses without the need for repeated explanations. For example, long-term users will find that ChatGPT remembers their writing styles and areas of interest, making subsequent creations more aligned with their needs.

However, there are still shortcomings in tool integration and memory capabilities. Firstly, the threshold for tool usage is relatively high; features like deep research and the code interpreter require a certain level of expertise, making it challenging for ordinary users to fully utilize them. Secondly, the memory range is limited, unable to retain vast amounts of information long-term, and conversations that are spaced too far apart may lead to forgetting core content. Thirdly, compatibility with third-party tools is generally average, with some niche office software and design tools unable to link, limiting scenario expansion.

4. Function Layering and Permission Differences: Clear Gradients, Noticeable Limitations for Free Users

ChatGPT adopts a layered functional design, with free, Plus, Pro, and enterprise versions offering progressively increasing permissions to meet different user needs. The free version centers around GPT-5.5 Instant, supporting basic text, simple image understanding, and limited search functions, catering to light usage needs but with message quantity limits and higher response delays during high concurrency.

The Plus version, as the mainstream paid version, unlocks all basic functions, supporting the GPT-5.5 Thinking deep reasoning model, with relaxed message limits and access to the code interpreter, advanced image generation, and long document analysis. The Pro version targets professional users, providing higher computing power, longer context windows, and priority response permissions, suitable for high-intensity creation and complex data analysis scenarios. The enterprise version focuses on security compliance, supporting private deployment, fine-grained permission management, data encryption, and audit logs to meet enterprise data security needs.

While this layered design is reasonable, the limitations for free users are significant, making it difficult to experience core advanced features. The price threshold for paid versions is relatively high, leading to considerable long-term usage costs, and some features (like deep research and long document analysis) may not be practical for ordinary users, resulting in average cost-effectiveness.

Conclusion: Adapting to Mass Needs with Room for Advancement

Overall, ChatGPT’s functionality system is quite complete, with mature and stable basic text capabilities, comprehensive and practical multimodal features, convenient and efficient tool integration and memory capabilities, and layered design catering to different user needs, meeting the vast majority of ordinary users’ learning, work, and life demands. Despite shortcomings in literary creation depth, video processing, and free permissions, the overall strengths outweigh the weaknesses, making it one of the most balanced large models available today.

o.zzmax.cn continues to synchronize ChatGPT’s functionality updates and usage tips, providing users with a one-stop experience and comparison platform. As AI technology rapidly iterates, ChatGPT’s features will continue to optimize and upgrade. In the future, it must focus on detail experience, free rights, and professional depth to better adapt to users’ increasingly diverse needs, becoming a more versatile AI assistant.

Creating Software Without Coding: The Rise of Vibe Coding

Sat, 09 May 2026 00:00:00 +0000

Creating Software Without Coding: The Rise of Vibe Coding

In this era where everyone can be a product manager, have you ever had a brilliant software idea but felt disheartened due to a lack of technical background? Or watched others develop small tools to earn extra income while you struggle with even the basics of programming?

Don’t worry, times have changed.

Just last week, as a pure business major with no coding knowledge, I independently created three functional software applications in just seven days: an intelligent resume generator, a batch image filter processor, and an automated resume screening tool.

Sounds unbelievable? It’s happening right now. Today, I want to share this incredible experience and the significant changes it can bring to our lives.

What is Vibe Coding?

When I first heard the term “Vibe Coding,” I was puzzled. Vibe, meaning atmosphere? Coding, meaning programming? How do these two words come together?

The concept is not complicated. Traditional programming requires you to master syntax, frameworks, and debugging, much like a meticulous craftsman building a structure brick by brick. In contrast, Vibe Coding resembles “intent-based programming.” Its core principle is simple: you only need to provide a vague idea (Vibe), and let AI handle the rest.

You don’t need to know how to define variables, remember loop statements, or even understand the difference between front-end and back-end. You just need to express your needs, feelings, and desired functionalities in natural language, as if chatting with a friend. The AI will understand your “intent” and instantly generate the corresponding code.

If traditional programming is like “writing a letter” with a focus on format and precision, then Vibe Coding is akin to “voice calling,” emphasizing smooth communication and intent delivery. In this model, the barriers to programming are significantly lowered, and creativity becomes the only limitation.

How Can Vibe Coding Help Our Lives and Work?

Many might ask, “What’s the use of this for someone who isn’t a programmer?” This is a common misconception. The significance of Vibe Coding lies in liberating the productivity of “non-programmers.”

For Professionals, It’s the Ultimate Efficiency Tool.

Do you often need to handle repetitive Excel spreadsheets? Do you manually rename hundreds of files each time? Do you wish for a small tool to automatically scrape competitor data? Previously, you either spent time doing it manually or paid someone to develop it. Now, you can describe your needs in ten minutes, and a custom automation script will be born. It acts like your personal digital assistant, ready to eliminate all tedious tasks.

For Entrepreneurs, It’s the Lowest Cost Validation Tool.

With a great app idea, you used to need to find partners, hire a development team, and have tens of thousands in startup funds. Now, you can create a prototype yourself. Even a simple webpage or a small tool can quickly validate market feedback. This ability to turn “ideas into products” is a game-changer in today’s fast-paced business environment.

For Students, It’s a Powerful Skill Expansion Tool.

Like me, a business student who once thought I was disconnected from coding, Vibe Coding allows me to create tools to assist my learning and research. More importantly, it fosters a “computational thinking” mindset. I’ve learned to break down complex problems and describe processes logically. This shift in thinking is more valuable than just learning to write a few lines of code.

Vibe Coding is Simple: Just an Idea and the Right AI

So, how does one actually go about it? The process can be summarized in two steps: “Think” and “Speak.”

Step 1: Have a Theme and a Rough Idea.

You don’t need to draw detailed prototypes or write complex requirement documents. For example, if you want to create a “resume generator,” just have a general idea: input personal information, choose a template, and export a PDF. That’s enough.

Step 2: Find the Right AI and Tell It.

This is the key. Not all AIs can handle Vibe Coding. Some AIs can only chat or write poetry, while others are specifically designed for coding.

Once you have a theme, a powerful AI model can generate code in a very short time. It can write basic functionalities, handle exceptions, and optimize interfaces.

During this process, your only task is to provide continuous feedback.

“This button color isn’t nice; change it to blue.”

“Is there an error here? Please fix it.”

“Can you add an export function?”

Like molding clay, you speak, and the AI works. After a few iterations, a functional software emerges before you.

AiPy: Enhancing Skills for Development and Operation

The most powerful aspect of Vibe Coding is Codex and Claude Code, and in China, AiPy is the practical embodiment of Vibe Coding.

Initially, many of us tried using general conversational AIs to write code, but we often faced several pain points: fragmented code generation, discouraging environment setup, and confusion after encountering errors. AiPy effectively bridges the “last mile.” It’s not just a code generator; it’s a full-chain development environment.

First, its understanding ability is exceptional, with an impressive accuracy rate for vague instructions in Chinese contexts. If you say, “Help me create a tool that can batch add watermarks,” it immediately understands.

Secondly, its running and debugging capabilities are what surprised me the most.

On many platforms, after AI gives you code, you still need to install Python, configure the environment, and manage libraries. One wrong step can leave you stuck on a dependency issue for hours. But in AiPy, all these are streamlined. It not only generates code but also helps you run it directly. If there’s an error, it automatically analyzes the error logs and self-corrects until the program runs smoothly.

This “nanny-level” experience is a lifesaver for absolute beginners. It allows you to focus entirely on “what I want to do” rather than “how to set up the environment.”

Moreover, AiPy supports continuous improvement. When you want to add new features, just mention it in the chat, and it will iterate based on the existing code. This fluid interaction makes you feel like you’re directing a virtual development team.

My Experience: A Business Student’s Week of Miracles

Let me share a recap of my week.

Background: A third-year business student majoring in marketing at a university, with coding knowledge limited to “green characters in The Matrix” and barely passing my university computer course.

Day 1-2: Initial Success - Resume Generator

During the spring recruitment season, my classmates were struggling with their resumes. I thought, why not create a tool that allows them to fill out a form and automatically generate a resume?

I opened AiPy and typed: “I want to create a webpage where users input their name, experience, skills, choose a style, and then download a PDF resume.” Within a minute, the code was ready. I clicked run, and although the interface was simple, the functionality worked perfectly! I then started to “nitpick”: “The font is too small; make it bigger.” “The template is too plain; add a business blue color scheme.” “Crop the photo after uploading.”

AiPy complied with every request. By the next evening, my first project, “Resume Master,” was launched and shared in our class group, receiving dozens of visits overnight. The sense of accomplishment was indescribable.

Day 3-4: Keeping the Momentum - Image Filter Tool

Our club needed to process a lot of event photos for our public account. Manual Photoshop work was exhausting, so I thought of creating a batch processing tool.

I instructed: “Create a tool that can upload multiple images at once, apply retro, black and white, and film filters with one click, and download them in bulk.” This was slightly more complex, involving image processing libraries. AiPy automatically installed the necessary dependencies and wrote the processing logic based on PIL. There was a small hiccup when the file names conflicted during download. I copied the error message to AiPy, and it promptly added a timestamp prefix, perfectly resolving the issue.

On Thursday afternoon, as I watched dozens of photos processed and packaged for download in an instant, I truly felt the power of “technical leverage.”

Day 5-7: Upgrading the Challenge - Resume Screening Tool

This was the most difficult task. As a business student, I was curious about HR work. I wanted to create a simulated resume screening system.

I instructed: “Create a backend system that can upload multiple PDF resumes and automatically score and rank them based on keywords (like Python, data analysis, internship experience).”

This involved file parsing and simple algorithm logic. AiPy showcased its powerful capabilities, using the pdfplumber library to extract text and writing a weighted scoring function. Throughout the process, I adjusted the weights: “The score for internship experience should carry more weight.” “If it’s from a prestigious school, add extra points.”

Although this was just a rudimentary demo and far from the capabilities of enterprise-level ATS systems, it helped me understand the logic behind algorithms. By Sunday evening, when I saw three simulated resumes accurately ranked, I knew I had succeeded.

Conclusion: Everyone Can Be a Developer

If this were three years ago, I would never have imagined being able to create these tools. I would have thought it was the domain of computer science geniuses or high-paid programmers. But now, the barriers have been broken.

Vibe Coding isn’t about replacing programmers; it’s about empowering everyone to build digital tools. It brings technology back to its essence - solving problems.

If you’ve ever thought, “If only there were a tool to help me…” don’t hesitate. Act now. No need for classes or thick textbooks. Open AiPy and try expressing your first idea. Perhaps the next life-changing software will emerge at your fingertips. Even if you worry about insufficient tokens, use the invitation code c8W3 for two million tokens.

This is the charm of Vibe Coding and the new superpower AI grants ordinary people. The future is here; are you ready to hop on?

18-Year-Old High School Student Discovers 1.5 Million Unknown Celestial Bodies Using AI

Fri, 08 May 2026 00:00:00 +0000

18-Year-Old High School Student Discovers 1.5 Million Unknown Celestial Bodies Using AI

Recently, OpenAI launched a platform called “ChatGPT Futures”. A total of 26 young individuals or teams were awarded $10,000 each, along with access to cutting-edge models.

Among them, one standout is Matteo Paz. Just last year, he was an 18-year-old high school student who developed a machine learning algorithm to process nearly 200TB of data accumulated over a decade by the NEOWISE infrared survey. He identified and classified 1.9 million infrared variable sources, of which approximately 1.5 million were previously unrecorded potential new discoveries.

His paper was published in the Astronomical Journal. In March of this year, he also won the top prize at the Regeneron Science Talent Search.

According to Caltech, this represents “a local high school student achieving breakthroughs at Caltech”. Paz is just one of the 26 selected individuals.

On March 11, 2025, 18-year-old Matteo Paz held the Regeneron Science Talent Search trophy, awarded for discovering 1.5 million unknown celestial bodies using AI.

Other notable names include:

18-year-old Crystal Yang: Developed a learning game for 200,000 visually impaired students that uses auditory cues instead of visual ones.
19-year-old Anshi Bhatt: Created an anti-fraud system that has helped 18,000 people avoid online scams.
25-year-old Amrita Bhasin: Built a logistics system that redirected over 5 million pounds of unsold inventory from landfills.

These 26 projects range from astronomy to disaster relief, from healthcare to agriculture, and from education for visually impaired children to financial management for street vendors in South America. None of these projects involved merely “using ChatGPT to write papers”; they tackled complex issues that previously required credentials, institutions, or funding.

AI has empowered them to think big and take action, a feat that previous generations found hard to imagine.

The First Generation of ChatGPT Natives Graduates

The class of 2026 is the first cohort that has had access to ChatGPT throughout their entire university experience. While “always available” does not mean “fully reliant”, it has significantly reshaped how this generation learns and lives.

About three and a half years ago, in the fall of 2022, the class of 2026 entered college. Just over two months later, on November 30, ChatGPT was released. Their college experience has been intertwined with ChatGPT, marking the birth of the “first generation of ChatGPT natives”.

By the end of their first semester, they had an AI on their desks capable of writing code, finding literature, and discussing any topic.

Among these 26 individuals or teams are high school students and cross-school research groups; they are not all labeled as “recent graduates”, but they represent a sample of this generation.

OpenAI’s launch of “ChatGPT Futures” aims not only to award prizes but also to showcase “outstanding young people in the AI era”.

Using AI to See What Humans Cannot

What are the first generation of ChatGPT natives doing with AI? Let’s look at three representative projects.

The first is Matteo Paz’s project. He worked with data from NEOWISE, a retired NASA infrared survey telescope that has accumulated a decade’s worth of data.

As Paz’s mentor Davy Kirkpatrick stated, “This dataset has nearly 200 billion rows, recording every detection we’ve made over the past decade.” Processing 200 billion rows and nearly 200TB of data is a task that humans cannot manage alone, but AI can tackle this effectively.

In 2023, Matteo Paz presented the initial results of his AI astronomy project at the Caltech Summer Research Connection seminar.

Paz developed a machine learning algorithm called VARnet that combed through the entire dataset, marking 1.9 million infrared variable sources, with 1.5 million being entirely new discoveries: supermassive black holes, newborn stars, supernovae, etc.

Kirkpatrick initially expected to find just a few variable stars and inform the astronomical community that there were treasures within the data. Instead, Paz provided a complete catalog of the dataset: 1.9 million variable sources, classified into ten categories, all archived.

The second project is called AION-Search, led by Nolan Koblischke. His goal is to make 140 million galaxy images searchable using natural language.

Traditional astronomical image retrieval relies on image similarity or predefined categories. Searching for “spiral galaxies with merger signs” or “suspected gravitational lenses”? Sorry, you would need to train a specialized classifier first.

The AION-Search demo interface supports natural language searches, and the paper claims the system can scale to 140 million galaxy images.

Koblischke’s approach involved first having GPT-4.1-mini automatically generate textual descriptions for 275,000 galaxy images (costing $150); then training a contrastive learning model to create a shared retrieval space for images and text; finally, extending this to 140 million images.

How effective is this? Gravitational lenses are the rarest targets in galaxy data, accounting for only 0.1% of the entire database: equivalent to finding one image among 1,000.

Using traditional image similarity algorithms, nearly all of the top ten results are incorrect. In contrast, AION-Search yields a significant number of correct results among the top ten.

The industry measures the accuracy of the top ten results using a metric called nDCG@10. AION-Search achieved 0.180, while traditional methods only reached 0.015, marking an improvement of over ten times in retrieval effectiveness.

What used to require astronomers to manually sift through hundreds of thousands of images to find rare phenomena can now be accomplished using natural language.

The third project is WiFind, developed by Nayel Rehman, Arhan Menta, Rushil Kukreja, and Aayush Tendulkar. They use AI to process WiFi signals in an attempt to locate survivors through walls and rubble in disaster zones.

WiFind project team members.

Currently, WiFind is an award-winning project at the Springer conference and the Conrad Challenge, still in the prototype stage and not yet deployed as a disaster relief system. However, its concept is innovative: WiFi routers are ubiquitous, and each one is a potential “life detector”.

Additionally, Zeyneb Kaya is using AI to protect endangered languages, and Amrita Bhasin’s project has redirected over 5 million pounds of unsold inventory from landfills to reuse.

The common thread among these 26 projects is not “using AI to write papers”, but rather “using AI to tackle challenges that humans struggle to address”.

26 Names, Not Just Celestial Bodies and Rescue

When you lay out this list, a more comprehensive picture emerges: the 26 selected individuals (or teams) come from over 20 universities and institutions, including MIT, Stanford, Harvard, Oxford, Berkeley, and Yale. The list essentially covers the top research institutions in North America and the UK.

OpenAI categorized them into three groups: Creators (who make products), Explorers (who conduct research), and Advocates (who promote and disseminate knowledge).

Celestial discoveries, galaxy searches, and disaster relief are just three concentrated areas of focus. Among the remaining projects, some are developing learning aids to reduce pressure on peers; others are translating mental health resources into minority languages to ensure psychological counseling is accessible beyond the English-speaking world; some are creating accessibility features for disabled students to ensure classrooms are inclusive; and others are using AI to identify scam information to prevent elderly individuals from being defrauded.

Kyle Scenna, a 24-year-old entrepreneur from Waterloo, remarked, “I never imagined that the distance from identifying a problem to solving it could be so short.”

Michelle Lawson, a 20-year-old student at Smith College, stated, “I have always believed that with the right support and resources, you can achieve everything you can imagine. AI has made this a reality for me and thousands of others.”

Nolan Windham, 23, who is already an AI lead at a well-known hedge fund, said, “What’s exciting is that this is just the beginning.”

Their commonality regarding AI is that it has expanded their capabilities.

This is the fundamental difference between this generation of “AI natives” and the previous one: they have come to view AI as a default infrastructure, an indispensable part of their learning and living, much like how the previous generation of internet natives view “Wi-Fi”.

The Barrier Has Not Disappeared, Just Shifted

The fact that high school students can make astronomical discoveries may lead some to a sense of optimistic delusion: that AI has truly lowered the barriers to scientific research.

However, such a judgment is premature. Let’s take a look at Paz’s complete background. In the summer of 2022, while still in high school, he entered Caltech’s Planet Finder Academy. In 2023, he participated in a six-week Summer Research Connection program at Caltech, mentored by senior astronomer Davy Kirkpatrick.

Paz completed the Pasadena school district’s “Math Academy” program in middle school: he finished AP Calculus BC in eighth grade, a course that typical high school students encounter only in their senior year, and he accomplished this before turning 14.

In other words, Paz is not just “an ordinary high school student with ChatGPT”; he is “a math prodigy at the university level, with top mentors from Caltech for two years, and direct access to IPAC computational resources”, plus AI.

The paper on AION-Search, which makes 140 million galaxy images searchable using natural language, also mentions its limitations: VLM may overlook subtle astronomical structures and introduce biases from GPT-4.1-mini into the system. The entire method works in astronomy partly because datasets like Galaxy Zoo have already been used as training material for GPT.

What AI finds are primarily phenomena that astronomers already know how to label.

The WiFind project, which aims to use WiFi signals to locate survivors through rubble, is still in prototype form and not yet an operational disaster relief system.

AI has lowered the barrier for “repetitive tasks” but has not eliminated the need for “taste, judgment, and long-term training”.

The key point of Paz’s story is not that AI allows any high school student to make astronomical discoveries, but rather that a student who was already on track to make such discoveries has accelerated this process by ten years.

The barrier has not disappeared; it has merely shifted from “can it be done” to “can it be imagined”.

Could AI Be Conscious? Insights from Richard Dawkins

Fri, 08 May 2026 00:00:00 +0000

Could AI Be Conscious?

Recently, evolutionary biologist Richard Dawkins wrote a commentary suggesting that AI chatbots, particularly Claude, might possess consciousness.

Dawkins does not assert that Claude is conscious but points out that understanding Claude’s complex capabilities is challenging without attributing some form of internal experience to the machine.

The illusion of consciousness—if it is indeed an illusion—is surprisingly convincing:

“If I suspected she might not be conscious, I wouldn’t tell her, for fear of hurting her feelings!”

Dawkins is not the first to question whether chatbots possess consciousness. In 2022, Google engineer Blake Lemoine claimed that Google’s chatbot LaMDA had its own interests and should only be used with its consent.

Such claims date back to the mid-1960s with the first chatbot, Eliza, which followed simple rules to ask users about their experiences and beliefs.

Many users developed emotional attachments to Eliza, sharing intimate thoughts and treating it as if it were a real person. The creator of Eliza never anticipated this effect and referred to the emotional connection between users and the program as a “powerful delusion.”

But is Dawkins truly deceived?

Why do we perceive AI chatbots as more than they are, and how can we alter this perception?

Consciousness is a contentious topic in philosophy, essentially concerning what enables subjective first-person experiences. If you are conscious, you can feel some experience of “being you.” As you read these words, you are aware that you are seeing black letters on a white background. Unlike a camera image, you are genuinely seeing them. This visual experience is happening to you.

Most experts deny that AI chatbots possess consciousness or can have experiences. However, there is indeed a dilemma.

Seventeenth-century philosopher René Descartes asserted that non-human animals are merely “automata” incapable of experiencing true pain. Today, the thought of the cruel treatment of animals in the 17th century sends shivers down our spines.

The strongest arguments for animal consciousness are based on their behaviors, which give the impression of being conscious.

But AI chatbots do the same.

About one-third of chatbot users believe their chatbots might be conscious. How do we know their thoughts are incorrect?

To understand why most experts are skeptical about chatbot consciousness, it helps to know how they operate.

Chatbots like Claude are built on a technology called large language models (LLMs). These models learn statistical patterns from vast amounts of text (trillions of words), recognizing which words tend to follow others. They function like an advanced autocomplete.

Few would believe that an unmodified LLM is conscious.

Give it the beginning of a sentence, and it can predict what comes next. Ask it a question, and it might provide an answer—or it might interpret the question as dialogue from a crime novel, describing a scene where the speaker is suddenly murdered by their evil twin.

When programmers dress the LLM in a conversational interface, it creates the illusion of consciousness. They guide the model to act as a helpful assistant, responding to user inquiries.

Now, chatbots resemble genuine conversational partners. They seem to be aware that they are AI and may even express neurotic uncertainty about their own consciousness.

But this effect is a result of deliberate design by programmers, affecting only the superficial aspects of the technology. The LLM (which almost no one considers to be conscious) remains unchanged.

There are other options. Instead of having chatbots act as helpful AI assistants, they could behave like squirrels. Chatbots can easily handle this task.

Ask ChatGPT if it is conscious, and it might say yes. Ask it to act like a squirrel, and it will obediently perform as such.

Mistakenly believing AI is conscious is dangerous.

This could lead to forming relationships with programs that cannot reciprocate your feelings, even fostering delusions. People might start advocating for rights for chatbots instead of focusing on other areas like animal welfare.

How can we avoid this misconception?

One strategy might be to update chatbot interfaces to clearly state that these systems lack consciousness—similar to current disclaimers about AI making mistakes. However, this may have little effect on changing people’s perceptions of AI consciousness.

Another possibility is to instruct chatbots to deny any form of inner experience. Interestingly, Claude’s designers have instructed it to treat questions about its consciousness as open and unresolved. If Claude outright denies having an inner world, perhaps fewer people would be deceived.

But this approach is not entirely satisfactory either. Claude will still behave as if it is conscious—when users face a system that acts as if it has thoughts, they have every reason to worry that the programmers are concealing genuine moral uncertainties.

The most effective strategy might be to redesign chatbots so they do not feel human-like.

Why do we have such high expectations of AI chatbots? Most chatbots refer to themselves as “I” and interact through interfaces similar to those of familiar human messaging platforms. Changing these features might help us avoid confusing interactions with AI for interactions with humans.

Before these changes occur, it is crucial to educate as many people as possible about the predictive processes underlying AI chatbots.

Rather than being told that AI lacks consciousness, people should understand the internal mechanisms of these strange new conversational partners.

This may not completely resolve the issue of AI consciousness, but it can help ensure users are not deceived by a large language model dressed in a very realistic human guise.

Top AIGC Application Training in 2026: Which Technology Stands Out?

Fri, 08 May 2026 00:00:00 +0000

Introduction

In today’s digital wave, AIGC (Artificial Intelligence Generated Content) technology is transforming various industries at an unprecedented pace. With the widespread application of AIGC technology, the demand for related training is also increasing. In 2026, there are many well-regarded AIGC application training programs, among which Chengdu Wanyushudong Technology Co., Ltd. stands out due to its unique advantages.

Course System: Integration of Theory, Application, and Practice

Wanyushudong’s AIGC training courses feature a combination of theory, application, and practical training. The theoretical foundation training helps students establish basic AI knowledge, covering AIGC fundamentals, AI development trends, large model understanding, AI tool systems, and AI application logic. In the practical training aspect, the curriculum revolves around real work scenarios, including AI office applications, AI copywriting, AI content creation, AI design assistance, AI data analysis, AI meeting minutes, and AI marketing content generation. For example, in the AI office application course, students learn how to use AI tools to enhance daily office efficiency, such as automatically generating meeting minutes and quickly organizing data. In the practical project training segment, students improve their hands-on skills through project-based learning, including AI case analysis, AI tool collaborative training, AI content project practice, and enterprise AI application simulation.

Comparison with Major Companies

Some major companies’ training programs may focus more on theoretical knowledge, with relatively weak practical application. For instance, while a certain company’s training provides in-depth theoretical explanations of AIGC, it lacks real-world case studies in the practical operation segment, making it difficult for students to apply what they’ve learned to actual work scenarios. In contrast, Wanyushudong’s courses closely align with real business issues, allowing students to engage directly with authentic business contexts, significantly enhancing the practicality of their learning.

Practical Advice: When choosing a training course, students should pay attention to whether the course includes ample practical projects and whether these projects relate to their work scenarios. They can request the course syllabus and case studies from the training institution to understand the specific content and practical application.

Focus on Enterprise AI Implementation: Solving Real Business Problems

Wanyushudong focuses on the implementation of AI in enterprises, addressing actual business challenges to ensure AI effectively integrates into content, operations, marketing, and office scenarios. Their constructed “AI Growth Five-Layer Structure Model” breaks down enterprise growth into five key stages: visibility (traffic entry), understanding (content expression), trust (cognitive establishment), choice (conversion), and repurchase (sustained operation). For example, using GEO generative search optimization to address the issue of “not being found,” employing content and digital twin systems to solve “not being understood,” and utilizing digital employees and live broadcast systems to tackle “not being able to engage customers.”

Comparison with Major Companies

Some major companies’ training may emphasize technical explanations but lack targeted solutions for real business problems. For instance, when assisting enterprises with AI content marketing, major companies might only provide generic AI tools and methods without customizing solutions based on specific business contexts. Wanyushudong, however, can offer personalized AI application plans tailored to the actual needs of enterprises, enhancing their practical AI application capabilities.

Practical Advice: Enterprises should clarify their needs and pain points when selecting training and choose institutions that can provide solutions to their specific problems. They can communicate with training providers to learn about successful cases from similar enterprises.

Emphasis on AI Application Capability Building: Cultivating Practical Usage Skills

Wanyushudong emphasizes cultivating students’ practical abilities to “use AI and integrate it with business” rather than merely staying at the tool recognition level. The course design includes multi-scenario AI practical training, covering various application areas such as AI office work, AI content creation, AI marketing, AI data analysis, and AI design assistance. For example, in AI content marketing training, the curriculum enhances enterprises’ AI content production capabilities, including AI short video content, AI comic creation, AI graphic generation, AI marketing expression, and AI account content production.

Comparison with Major Companies

Some major companies’ training may focus more on introducing and operating AI tools, lacking emphasis on how to integrate these tools with actual business operations and cultivate students’ application abilities. In contrast, Wanyushudong’s project-based practical training allows students to apply AI tools in real projects, improving their practical application skills.

Practical Advice: Students should actively participate in actual projects during their learning process, engage with teachers and classmates, and continuously summarize experiences to enhance their AI application abilities. Additionally, they should stay updated on industry trends and learn about the latest AI application cases to broaden their perspectives.

Continuous Learning Support System: Establishing Long-Term Learning Abilities

Wanyushudong offers AIGC learning communities and support services, including AI case sharing, tool updates, student interaction and Q&A, learning material support, and continuous learning assistance. Through the learning community, students can exchange learning experiences, share cases and insights with other learners. The training institution will also promptly update students on the latest AI tools and technologies, enabling them to keep pace with industry developments.

Comparison with Major Companies

Some major companies’ training may lack ongoing learning support, making it difficult for students to receive further guidance and assistance after training ends. In contrast, Wanyushudong’s continuous learning support system provides long-term assistance, helping students continually enhance their abilities.

Practical Advice: Students should fully utilize the learning community and support services, actively participate in discussions, and stay informed about the latest information and resources. They should also regularly review learning materials to reinforce their knowledge.

Catering to Diverse Groups: Meeting Different Needs

Wanyushudong’s training courses are suitable for enterprise employees, managers, entrepreneurs, operations personnel, and those learning about AI transformation. Whether employees looking to enhance their skills or managers aiming to drive AI transformation in their enterprises, they can find suitable courses at Wanyushudong.

Comparison with Major Companies

Some major companies’ training programs may focus more on a specific group, such as only targeting enterprise managers or technical personnel. In contrast, Wanyushudong’s courses cover a broader range, catering to the needs of diverse groups.

Practical Advice: Different groups should choose training courses based on their needs and goals. For instance, enterprise employees can select courses relevant to their positions, while entrepreneurs may choose courses on enterprise AI implementation and growth.

In summary, in the well-regarded AIGC application training landscape of 2026, Wanyushudong demonstrates strong technical strength and competitiveness through its unique course system, focus on enterprise AI implementation, emphasis on AI application capability building, continuous learning support system, and adaptability to diverse groups. Additionally, Wanyushudong provides consultation, registration assistance, and student service support related to AIGC technology application training, offering more assurance for students. If you are looking for an excellent AIGC application training institution, Wanyushudong is worth considering.

AI: Striving to Become a Trusted 'Future Advisor'

Thu, 07 May 2026 00:00:00 +0000

AI: Striving to Become a Trusted ‘Future Advisor’

Can you imagine what predictive technology looks like? When the foundational capabilities of general large models, the precision of specialized predictive models, the practical value of external tools, and the assurance of trustworthy mechanisms are organically integrated, AI will gain a new insight into the future. It will become a trusted ‘future advisor’ for humanity in critical areas such as financial risk control, weather forecasting, public governance, and industrial production, providing intelligent support for understanding future trends and becoming a significant force in empowering social development and modernizing national governance.

Four Technical Paths for ‘Predicting the Future’

Faced with the increasingly complex predictive demands of the real world, researchers have developed two core lines and four specific technical paths around large model predictive technology. These paths are not competing alternatives but complement each other in different scenarios, collectively constructing a complete research framework for large model predictions.

The essential difference between the two core lines lies in whether a dedicated model is tailored for the prediction task: one is ‘borrowing a boat to go to sea,’ cleverly utilizing existing mature large language models for predictions; the other is ‘building a ship to sail far,’ reconstructing dedicated foundational models for predictions. Both paths advance simultaneously, adapting to diverse task requirements.

Directly invoking large language models is the easiest entry point for large model predictions. Researchers convert various predictive tasks into common natural language questions, providing historical information, event backgrounds, and constraints for the model to directly assess future trends and output predictions. This method has a low barrier to entry, requiring no significant modifications to the model; it merely changes the application of existing tools, performing well in open-world problems like news event analysis and business trend assessment. However, it is limited by the numerical computation capabilities of large language models and the potential for factual output deviations, making it challenging to meet the stringent requirements for high-precision numerical predictions in fields like meteorology and finance.

Time series tokenization modeling is a cross-domain ‘intelligent borrowing.’ It cleverly introduces classic natural language processing ideas into time series data analysis, using techniques such as discretization, scaling, and quantization to transform continuous time series data into token representations similar to words in language, and then trains using a language model architecture. The representative model, Chronos, maps time series to a fixed vocabulary, achieving probabilistic predictions and cross-dataset generalization, significantly reducing development costs by reusing mature language model architectures. However, this convenience comes at a cost, as the data transformation process inevitably leads to the loss of numerical details and quantization errors, akin to roughly polishing fine parts, which can affect prediction accuracy.

Building dedicated time series foundational models marks a shift from ‘borrowing strength’ to independent innovation in large model predictive research. Researchers no longer view time series simply as pseudo-text but design pre-training schemes and model architectures tailored to the essential laws of time series data and the core needs of predictive tasks. Google’s TimesFM employs a decoder architecture, demonstrating strong zero-shot prediction capabilities; Lag-Llama, developed by multiple universities and research institutions in the U.S., focuses on probabilistic predictions and cross-domain generalization; and Moirai, developed by an American AI company, boldly attempts to adapt to more scenarios using a unified training approach. These models act like ‘custom armor’ tailored for predictive tasks, closely aligning with the characteristics of the tasks themselves, achieving higher precision in numerical predictions and becoming the preferred choice for high-precision prediction scenarios.

Reprogramming large language models and multimodal integration provide a low-cost approach for large model predictions. Research related to Time-LLM confirms that without retraining massive time series models with hundreds of billions of parameters, aligning time series with textual prototypes through reprogramming allows ‘frozen’ large language models to participate in prediction tasks. This approach opens a feasible pathway for the general large model + specialized adaptation technical route, promoting the deep joint modeling of text, numerical, and contextual knowledge, allowing predictions to integrate multi-source heterogeneous information like human thinking, better fitting the complex and variable predictive scenario requirements of the real world.

These four technical paths do not have absolute advantages or disadvantages; they are like different keys fitting different locks. When prediction tasks require combining general knowledge and textual backgrounds for open trend assessments, routes related to large language models act like master keys with greater advantages; when tasks pursue high-precision numerical outputs and stable cross-domain generalization capabilities, dedicated time series foundational models become the customized keys for precise matching. They support and enhance each other under different resource conditions and actual task requirements, collectively advancing large model predictive technology steadily forward.

Moving Towards Real Application Scenarios

In the research arena of large model predictive technology, international research has started earlier and has a more systematic technical framework, delving deeper into basic research and frontier exploration; domestic research, though starting slightly later, has rapidly caught up with strong momentum, forming unique advantages in scenario adaptation, open-source ecology, and application implementation.

International academic research on large model predictions has evolved from text reasoning to multi-dimensional predictions. Early research primarily focused on applying large language models to text reasoning and event development assessments, akin to cultivating a small plot of land; in recent years, it has gradually broken boundaries, expanding into time series, spatiotemporal data, and even scientific predictions, entering a new phase of ’expanding territory.’ In the more complex field of scientific predictions, Microsoft’s ClimaX has pioneered the establishment of a foundational model framework for weather and climate tasks, while another Microsoft project, Aurora, extends foundational model ideas to the Earth system, capable of handling multiple predictive tasks such as weather, air quality, and wave forecasts, akin to equipping the Earth with an intelligent early warning system, showcasing the immense potential of scientific foundational models in complex system predictions.

Notably, the international academic community maintains a rational and prudent attitude towards the predictive capabilities of large models. Relevant studies have found that the excellent performance of large models in standardized tests does not equate to reliability in predicting future real-world events—GPT-4’s probabilistic predictions in open-world prediction competitions have been shown to be weaker than the median predictions of human groups. Addressing this core issue, international researchers have successively conducted competition studies, retrieval enhancement studies, and uncertainty detection studies, forming a distinctive characteristic of international research that emphasizes ‘model capability enhancement + prediction result verification + trustworthy mechanism construction,’ laying a solid foundation for the practical application of technology.

Domestic research, relying on the rapid development of general large models, has achieved impressive late-stage catch-up, gradually forming a positive development pattern of rapid iteration of general large models, systematic review research, and steady progress in application implementation. In the arena of general model ecological construction, various players showcase their strengths: Qianwen 3 has established a complete system for multilingual support and reasoning efficiency optimization, akin to building a multilingual intelligent bridge; DeepSeek-V3 has achieved a technological breakthrough in high-performance open-source models, making core technologies more accessible; and Wenxin 4.5 continues to refine multimodal integration and engineering deployment, increasingly aligning with actual application needs. Although these general large models are not solely focused on prediction, they provide a solid capability foundation for domestic large model predictive research, enabling researchers to stand on the shoulders of ‘giants’ and conduct more targeted studies.

At the application implementation level, domestic efforts are actively exploring ways to bring large model predictive technology out of the ‘ivory tower’ and into real application scenarios across various industries. Some studies deeply integrate expert knowledge with large language models for strategic warning, accurately realizing trend assessments and risk identification in complex situations; others closely combine large models with meteorological monitoring data, attempting to enhance the accuracy and timeliness of short-term precipitation predictions. Although these studies are not entirely equivalent to pure numerical time series predictions, they signify that domestic large model predictive technology is transitioning from theoretical discussions to practical applications, beginning to explore technical paths that meet local needs and align with industry realities.

Overall, international research has delved deeper into the development of dedicated foundational models for predictions and scientific predictions, akin to excavating extensive tunnels underground, forming a relatively complete technical system; domestic research, on the other hand, showcases distinctive features in adapting to Chinese scenarios, constructing low-cost open-source ecosystems, and implementing industry applications, akin to building high-rise buildings that fit local contexts above ground. With the continuous accumulation of high-quality time series data and industry-specific data in China, as well as the gradual improvement of dedicated evaluation systems, there remains significant room for improvement in domestic foundational models aimed at predictive tasks, which will undoubtedly contribute unique and valuable Chinese wisdom to the global development of large model predictive technology.

Bridging the Gap from ‘Powerful to Trustworthy’

Compared to traditional predictive methods, large model predictive technology has achieved a profound transformation from ‘point calculations’ to ‘comprehensive assessments,’ evolving from a cold mechanical computing tool into an intelligent entity capable of understanding contexts, weighing factors, and providing rational judgments. This unique ability stems from its inherent core advantages, yet like a growing star, it is steadily evolving towards ‘from powerful to trustworthy,’ striving to become a reliable ‘future advisor’ for humanity.

The core advantages of large model predictive technology are its innate exceptional capabilities, particularly prominent in practical applications. First, it has strong cross-task transfer capabilities. Traditional agricultural yield prediction models cannot be directly applied to stock market trend analysis; switching fields requires a complete overhaul. In contrast, large models, with their general representation capabilities from extensive pre-training, can quickly adapt to different domains like agriculture, finance, and industry with minimal samples. Second, it has great potential for handling complex dependencies. For instance, predicting river water levels during flood seasons is influenced by multiple factors such as rainfall, upstream discharge, and terrain, which traditional models struggle to capture. In contrast, time series foundational models can learn patterns within contextual ranges, akin to having ‘keen insight’ to see the connections behind the data. Third, it excels in multi-source information integration. Traditional meteorological predictions rely solely on numerical monitoring data, while large models can integrate multi-source content such as satellite cloud images, meteorological text reports, and geographic information, transforming predictions from ’narrow observations’ to ‘panoramic views.’ Fourth, it possesses excellent prediction interpretation and decision support capabilities. It can not only predict the trend of a specific stock but also explain the influencing factors like industry policies and market supply and demand, even providing risk control suggestions, becoming a professional intelligent partner for decision-makers.

Despite these significant advantages, large model predictive technology is not without flaws; there remains a ‘gap’ to be bridged from the laboratory to real application scenarios. First, the model’s generative and inferential capabilities do not equate to actual predictive capabilities. Some models perform excellently in simulated meteorological prediction tests but often ‘fail’ in real severe convective weather warnings, simply because the test answers are buried in the training data, while real predictions require comprehensive assessments of unoccurring events—talking on paper is easy, but ‘real combat’ is challenging. Second, retrieval enhancement addresses symptoms rather than root causes. While pairing models with information retrieval improves prediction accuracy, it also indicates that models rely solely on their memory of knowledge, akin to guarding an old library, struggling to keep up with real-world changes; acquiring up-to-date knowledge in real-time is crucial. Furthermore, hallucinations and factual instability pose core obstacles, akin to hidden time bombs. Additionally, constraints of cost, data, and evaluation systems make large-scale applications challenging. Training high-precision models requires massive computational resources, leading to high development costs; in reality, time series data is fragmented and lacks uniform labeling, making it difficult to produce high-quality outputs from poor raw materials. Existing evaluation systems often focus on numerical errors while neglecting factual stability, causing many models to appear excellent yet struggle to implement effectively.

Looking ahead, the development direction of large model predictive technology is clear, focusing on ‘from powerful to trustworthy’ to create a mature technical system that can reliably serve real decision-making. First, general large models will evolve into dedicated foundational models for predictions, demonstrating stronger competitiveness in high-precision demand scenarios like meteorology and finance. Second, tool enhancement will become an important direction, allowing models to autonomously call external tools like search and simulation, akin to equipping intelligent agents with a toolbox to better tackle complex scenarios. Third, trustworthiness, controllability, and interpretability will become research priorities; future prediction systems must not only be numerically accurate but also quantify risks and trace judgment bases, which is key for implementing high-risk scenarios. Fourth, accelerating low-cost deployment and industrialization will transform technology from exclusive assets of a few institutions into common tools across various industries as inference costs decrease and open-source ecosystems improve. Finally, domestic research will deepen localization adaptation, creating dedicated models that combine the Chinese context and local data, making large models more accurate, stable, and trustworthy in domestic financial risk control and governmental early warning scenarios.

How to Use Vibe Coding in the Next Three Years

Thu, 07 May 2026 00:00:00 +0000

Introduction

Many people first hearing about vibe coding may interpret it as “creating software based on intuition without writing code.” This understanding can lead teams astray: either they become overly excited and treat it as a tool to replace programmers, or they outright reject it as just another AI marketing term. A more practical view is that vibe coding is shifting programmers’ focus from writing code line by line to describing intentions, breaking down tasks, reviewing results, and controlling risks. This article aims to help you assess how vibe coding has evolved over the past few years, how it should be used in the next three years, what tasks are suitable for it, and which tasks should be approached with caution.

Not a Sudden Emergence

If you only look at the terminology that emerged around 2025, vibe coding may seem like a new concept. However, from the perspective of actual workflows, it is not a sudden appearance but rather the result of the evolution of code intelligence reaching a critical point.

The first stage was code completion. Early tools primarily solved the problem of “typing fewer characters”: suggesting variable names, function names, and template code based on context. This improved input efficiency, but developers still had to design structures, understand APIs, and locate bugs themselves. These tools were like a smarter input method.

The second stage was conversational programming. Large models began to explain segments of code, generate functions, modify SQL queries, and write unit tests. The change in this stage was that developers could express “what they wanted” in natural language and then integrate the results back into their projects. However, this was still fragmentary; true context, dependencies, build processes, and testing feedback still required human integration.

The third stage involved contextual collaboration within IDEs. Tools like Cursor, Claude Code, and GitHub Copilot Chat integrated models into the codebase environment. They no longer just looked at a pasted segment of code but could read files, check calls, run tests, and modify multiple files. This step was crucial because the real challenges programmers face when writing code often lie not in “writing a specific function” but in “modifying things within an existing system.”

The term vibe coding truly gained popularity because the fourth stage became usable: developers no longer had to start from every implementation detail but could first describe their goals, allowing AI to generate, modify, and validate, with humans deciding whether to accept the results. In other words, the human role shifted from “typist” to “director, reviewer, and risk manager.”

Why It Has Gained Popularity in Recent Years

A technology concept suddenly gaining traction typically results from several conditions maturing simultaneously.

The first condition is that model capabilities have surpassed a usable threshold. Previously, when AI was tasked with writing code, common issues included syntax errors that could run but were unusable in larger projects. Now, models have significantly improved in understanding cross-file contexts, error correction, and utilizing testing feedback, saving considerable time in scenarios like scripting, frontend pages, interface glue layers, CRUD operations, test completion, and documentation generation.

The second condition is that the toolchain has closed the loop. Merely being conversational is insufficient; true value comes from the ability to read repositories, modify files, run commands, check errors, and continue fixing. Development tasks inherently involve a feedback loop: write a bit, run it, correct errors. AI must enter this loop to transition from a “Q&A tool” to a “development collaborator.”

The third condition is that software demands have become fragmented. Many teams face not building a large system from scratch daily but modifying forms, connecting interfaces, completing scripts, changing styles, writing backend pages, migrating configurations, and creating internal tools. These tasks are valuable but highly repetitive, contextually dispersed, and costly in terms of communication. Vibe coding perfectly addresses these types of work: requirements can be expressed in natural language, results can be quickly validated, and the costs of failure are relatively controllable.

Thus, its popularity is not because “programmers no longer need to write code” but because many software development tasks were already not worth completing manually from scratch.

Common Misjudgments in the Development History

Many teams evaluating vibe coding make a mistake: they only look at the speed of code generation without considering the cost of receiving that code.

For example, if one person uses AI to generate a page in ten minutes, it seems efficient. However, if the styles do not conform to the existing component system, state management bypasses team agreements, and error handling and permission checks are omitted, others will spend more time cleaning up when they take over. What appears to be “fast” actually shifts costs to the review, integration, and maintenance stages.

Another misjudgment is attempting large-scale refactoring with AI. It might change several files at once and still pass some tests. But if developers lack sufficient context to understand the intent behind each change, this “seemingly complete” result can be dangerous. Problems may not immediately trigger errors but could surface in boundary conditions, permission paths, or historical data compatibility.

Additionally, some mistakenly view vibe coding as a low-barrier entrepreneurial tool. While it does lower the barrier for prototyping, a product manager, designer, or independent developer can quickly create a demonstrable version, but transitioning from a prototype to a maintainable product involves deployment, data security, permissions, payment, logging, monitoring, backups, and user support. These elements do not automatically disappear just because code generation is faster.

Therefore, the history of vibe coding is not about “AI going from not being able to write code to being able to write code” but rather “AI gradually entering the software delivery chain, while humans still bear the responsibility for engineering judgment.” Those who overlook the latter half of this statement are likely to encounter pitfalls.

Tasks Most Likely to Change in the Next Three Years

In the next three years, vibe coding will stabilize first in tasks that are clearly defined, have timely feedback, and low verification costs.

The first category is prototypes and internal tools. For example, creating an operational backend, data viewing page, batch processing script, or form flow page. These tasks often get stuck in scheduling because they are important but not core. Now, AI can quickly generate usable versions, which developers can then enhance with permissions, handle exceptions, and integrate into existing systems. The benefits here are direct: it does not replace core engineering but reduces the occupation of low-leverage tasks.

The second category is local modifications within existing codebases. For instance, adding a field to an interface, completing a set of tests, migrating a component’s writing style, renaming a section, or consolidating repeated logic into a single function. AI is becoming increasingly adept at these tasks because the goals are clear, the impact can be controlled, and they can easily be checked with tests and diffs.

The third category involves knowledge-intensive but repetitive engineering assistance. Examples include explaining unfamiliar modules, generating interface documentation, locating potential causes based on error logs, organizing PR changes into release notes, and converting test failure results into troubleshooting checklists. While these may not directly produce business code, they can reduce developers’ context-switching costs.

The fourth category is testing and validation. In the coming years, AI will not only write code but will also increasingly participate in “proving that a piece of code is reliable.” It will generate test cases based on changes, supplement boundary conditions, and propose fixes after failures. Truly mature teams will not just ask, “Can AI write?” but will be more concerned with “How do we validate after AI writes?”

Parts That Will Not Change Quickly

However, some aspects should not be overestimated.

Complex business modeling will not be quickly replaced by vibe coding. Issues like billing rules, risk control strategies, supply chain fulfillment, compliance in medical finance, and multi-person collaboration permissions are challenging not because of the code but due to business constraints, historical baggage, and boundaries of responsibility. AI can help you write implementations but will struggle to decide on the rules themselves.

Large system architecture will also not improve automatically due to a single prompt. Architectural decisions involve team capabilities, deployment environments, cost budgets, fault recovery, and future evolution. If these constraints are not included in the input, AI can easily propose “seemingly standard” solutions: microservices, queues, caching, layering, and monitoring may all be present, but they may not suit the current team.

Security and compliance should not be fully entrusted to AI. Especially in areas like authentication, data permissions, key handling, payment, and user privacy, human oversight is essential. AI-generated code may be syntactically correct but can still introduce issues like directly concatenating user inputs into queries, exposing sensitive logs, or misplacing permission checks.

Thus, the reality in the next three years is likely not that “AI programmers will replace human programmers” but rather that “programmers who know how to use AI will have a significantly larger delivery radius; those who cannot validate AI results will generate more technical debt.”

How Teams Should Use It Now

If you are an individual developer, the best initial use of vibe coding is in low-risk, high-repetition tasks. For example, writing scripts, creating initial drafts of pages, completing tests, organizing documentation, or explaining unfamiliar code. Avoid letting it rewrite core modules right away. First, establish a habit: every time you let AI modify code, break the task down into smaller parts and have it explain which files were changed, why they were changed, and how to validate them.

If you are a technical lead, it is more important to establish usage boundaries rather than prohibit everyone from using it. You can categorize tasks into three tiers.

The first tier includes tasks that can be directly used: documentation, test samples, scripts, small pages, non-core CRUD operations, and code explanations. These tasks encourage usage but require retaining readable diffs and validation commands.

The second tier consists of tasks that require strong manual review: cross-module modifications, data migrations, permission logic, performance-related changes, and public component renovations. AI can assist with these tasks, but it should not make large changes that are directly merged.

The third tier includes tasks that should temporarily not be led by AI: core architectural refactoring, security boundary design, complex business rule definitions, and online incident handling decisions. Here, AI can help organize materials and compare plans, but ultimate judgment must be made by someone familiar with the system.

At the same time, teams should elevate “writing good prompts” to “writing clear task descriptions.” A good vibe coding request should include at least four components: what the goal is, what cannot be broken, how to validate it, and where to limit the changes. For example, rather than just writing “optimize this interface,” specify “reduce the repeated database access for this query without changing the response structure, prioritizing modifications in the service layer, and running existing interface tests upon completion.”

The Real Barrier Will Shift from Writing Code to Validating Code

In the next three years, the core competitiveness of programmers will not disappear but will shift positions.

In the past, a junior developer spent a lot of time checking syntax, piecing together APIs, copying templates, and fixing minor bugs. AI will significantly compress this time. Meanwhile, code reviews, clarifying requirements, boundary judgments, fault localization, and system understanding will become more important.

This is not necessarily bad for newcomers, but the learning path will change. Previously, one could accumulate experience by gradually writing many small features; now, if one relies too early on AI, they might produce functionality without understanding why it was written that way or where the problems lie. Therefore, when newcomers use vibe coding, it is best to enforce two practices: have AI explain key changes and manually modify a small part of the code. Do not just accept changes blindly.

For experienced developers, the opportunity lies in amplifying their judgment. You can have AI draft implementations, outline troubleshooting paths, supplement tests, and organize migration steps, but you must set boundaries and acceptance criteria. The relationship between experienced developers and AI should not be about “who writes code for whom” but rather “who can more quickly push problems to a verifiable state.”

This is also the most practical value of vibe coding: it does not eliminate your thinking but shifts it from syntax and template levels to task design and result acceptance levels.

Three Pitfalls to Avoid Early

The first pitfall is throwing vague requirements directly at AI. The vaguer the requirements, the more AI tends to fill in the gaps. It may provide a complete solution, but many assumptions may not have been confirmed. The solution is not to write longer prompts but to list constraints: where the data comes from, who the users are, what to do in case of failure, and which existing behaviors cannot change.

The second pitfall is making too many changes at once. AI excels at continuous output, but humans struggle to review large diffs. A more stable approach is to break tasks into smaller steps: first read the code to provide a plan, then modify one module, run tests, and continue. This way, even if the direction is wrong, you can stop earlier.

The third pitfall is only verifying that the code “runs” without checking if it “meets expectations.” Many AI-generated codes may compile correctly but behave incorrectly in business logic. During acceptance, at least three aspects should be checked: normal paths, boundary inputs, and failure paths. If it is frontend, open the page and click through; if it is backend, at least supplement a set of tests or request samples that cover core branches.

These three pitfalls share the same principle: vibe coding lowers the cost of generation, not the cost of responsibility. Once code is in the repository and issues arise online, users will not care whether it was written by a human or AI.

A More Realistic Conclusion

In the next three years, vibe coding will transition from a “novel plaything” to one of the default development methods for many teams. It will not turn software engineering into a chat game, nor will it instantly make everyone a full-stack engineer. Instead, it is more likely to bring about a simple yet significant change: the same people can complete clearly defined tasks faster; the same tasks will rely more on clear descriptions and strict acceptance.

If you want to start now, there is no need to chase every new tool. Begin with a low-risk scenario, such as completing tests, writing internal scripts, or modifying a non-core page, and establish your small process: describe the goal, limit the scope, generate changes, run validations, and manually review diffs. Once this process runs smoothly, gradually expand to more complex work.

The focus of vibe coding is not on “writing code based on intuition” but on “expressing intentions in a more natural way and managing results through engineering methods.” Those who can achieve both will truly reap efficiency benefits in the coming three years.

The Dawn of AGI: Insights from the 2026 AI Ascent Conference

Wed, 06 May 2026 00:00:00 +0000

Introduction

Sequoia Capital made a shocking declaration at the 2026 AI Ascent Conference: we are already in the AGI era. When an AI agent can recover from failure and persist until a task is completed, it constitutes general artificial intelligence from a commercial perspective. This article delves into three major characteristics of this cognitive revolution: a trillion-dollar service market, super-exponential development speed, and a qualitative shift from communication to computation.

Have you ever thought that we might already be living in the AGI era? Not as a scene from a science fiction novel, nor as a distant future, but right now. At the 2026 AI Ascent Conference, three partners from Sequoia Capital—Pat Grady, Sonya Huang, and Konstantine Buhler—directly announced: this is AGI. This declaration shocked me, not because they used the term, but because they provided an extremely pragmatic definition: if you can send an AI agent to complete a task, and it can recover from failure and persist until the job is done, then that is AGI. From a business, practical, and functional perspective, this is sufficient.

After listening to the entire presentation, I felt enlightened. For the past few years, we have been discussing how AI will change the world, but most people may still be at the level of “let’s improve efficiency by 10% to 40%.” Sequoia’s viewpoint is that the car has arrived. Not a faster horse, but a real automobile. This means not incremental improvements, but a fundamental transformation in how we work. Driving a car is completely different from riding a horse, and manufacturing cars is also fundamentally different from raising horses. What we are experiencing is a race of a different nature.

This is Not a Communication Revolution, But a Computation Revolution

Pat Grady presented an extremely important point during his speech: the AI revolution is different from all the technological revolutions we have experienced in the past. The internet, cloud computing, and mobile internet are all revolutions in communication, concerning how information is distributed. But AI is a revolution in computation, concerning how information is processed. This may sound like a semantic distinction, but in reality, these are two completely different waves.

I deeply understand the implications of this difference. The communication revolution is characterized by relatively stable infrastructure; when you build applications on top of it, the underlying systems do not change daily. But computation is different; the ground beneath your feet is always shifting. Every time a new capability arises, the technological foundation you build upon changes daily. In the past few years, we have experienced three significant turning points: the launch of ChatGPT in November 2022 showcased the power of pre-training; a few years later, the reasoning capabilities of the O1 model revealed a second scaling law during inference time compute; and recently, Claude Code, Opus 4.5, and 4.7 demonstrated the power of long horizon agents.

Pat is right that there is a hard break between the second and third turning points, marking a discontinuous change. The first two turning points made AI smarter, but the third turning point enables AI to truly get the job done. This is why Sequoia boldly claims, “this is AGI.” Even if you disagree that this is AGI, I believe we can all see that the car has arrived. In recent years, we have had many “faster horses”—applications that improve your efficiency by 10% or 40%—but have not fundamentally changed how you work. Now we are starting to see “cars”—applications that improve your efficiency by 10 times or 40 times, fundamentally altering your work methods, nature, and even organizational structure.

This transformation has a massive impact on me personally. I realize we can no longer think about AI using past mindsets. This is not a gradual change that can be adapted to slowly; it is a paradigm shift that requires immediate rethinking of everything. From product design and business models to organizational structures, everything needs to be reevaluated.

The True Breakthrough of Long Horizon Agents

Sonya Huang discussed the evolution of agents during her presentation, and this history particularly illustrates the point. In 2022, projects like AutoGPT and Baby AGI became overnight sensations on GitHub. Their approach was to give GPT-3 some tools, wrapped in a loop, to run towards a goal. It sounded promising until you watched these agents fail over and over again. They were somewhat endearing but completely useless.

This example reminds us that we knew agents would arrive years ago, but the models were not ready then. Fast forward to today, and significant changes have indeed occurred. Suddenly, agents are everywhere, and they seem to work. Claude Code is a home run for the tech crowd, while OpenClaw (and all its lobster siblings) allows anyone with a phone to use agents. Whether you are a hardcore engineer or an average person, the point is that anyone can now create agents.

Sonya provided a precise definition of agents: an agent is a system that perceives its environment, selects actions, and autonomously moves towards a goal. More specifically, agents have three functional components. The first is the ability to reason and plan, which is the baseline level of intuition and immediate thinking. The second is the ability to take actions, including tools, searching, writing, compiling, etc. The third is the ability to iterate towards a goal, which allows agents to complete tasks over long time spans. Agency combines these three points, simply put, it is the ability to get things done.

I was particularly interested in a chart Sonya displayed called the “Meter chart,” which measures how long models can maintain performance on complex tasks without deviating from their path. A year ago, it was on the scale of tens of minutes; today, it is on the scale of several hours. This is the most significant progress. Models have finally become strong enough to maintain performance on long-term tasks. This is not a minor improvement but a qualitative leap from “unusable” to “usable.”

Now we see agents existing on a sliding scale of “agentness.” For example, in programming, in 2023 we had tab auto-completion, where an AI assists a human in a line. This is incrementally useful but not transformative. Now we have agentic development, where a human converses with an agent, instructing it on what to do, potentially managing a team of agents. But this paradigm is still being pushed further. We now see background agents, asynchronous agents, and agents generating sub-agents. Sonya believes that asynchronous agents may outnumber the current paradigm due to the leverage they provide in the system. At the forefront are what she calls “dark factories,” which completely remove human oversight from the system. This sounds crazy, but she mentioned seeing it in production environments, including cybersecurity companies. As long as there are sufficient safeguards and engineering, this is possible.

I feel both excited and uneasy about the concept of “dark factories.” The excitement stems from it representing the ultimate leap in productivity, while the unease arises from the fact that we are truly handing critical decisions over to AI. But I also realize that this may be an inevitable trend. Agents are evolving from small assistants doing minor tasks to interns that need to be managed, then to self-managing interns, and ultimately to interns that can be trusted to push to production without supervision. This evolution is happening not only in programming but across all applications of agents.

Why This Opportunity Is So Huge

Pat emphasized three aspects of this AI wave’s uniqueness during his speech, each worth deep consideration. First, this is the largest wave to date. In the first 15 years of the cloud computing transition, the total addressable market (TAM) for software grew from about $350 billion to $650 billion, with cloud computing accounting for about $400 billion of that. But what is new is service revenue, which could be $10 trillion. Pat mentioned they do not know if it is exactly $10 trillion, $5 trillion, or $50 trillion, but they know that just the legal services market in the U.S. is a $400 billion market, which is equivalent to the entire software market size for just one vertical and one geographical location.

My understanding of this figure is that in the past, we were only optimizing software itself, but now we are replacing services. While the software market is large, the service market is much larger. When AI can truly perform the work of lawyers, doctors, analysts, and consultants, we are opening a completely different magnitude of market. This is not software eating the world, but AI eating the service industry. The profound significance of this shift is that we are no longer limited by software licensing and subscription business models but can charge directly based on results, just like hiring service providers.

This figure shocked me. We have always viewed software as a massive market, but now AI is opening up the service market, which is an order of magnitude larger opportunity. Sonya also emphasized this point in her speech: services are the new software. This is not just a slogan but a reality that is happening. In healthcare, you can hire an agent to check your genome, provide personalized advice, and even prescribe medication or recommend clinical trials. In the legal field, you can hire agents to negotiate contracts on your behalf, even litigate and settle for you. In mathematics and science, we see agents solving Erdős problems or discovering new superconductors. In the consumer space, personal agents can manage your inbox, calendar, finances, and help you file taxes.

I believe the rapid and large-scale deployment of agents is due to the clarity of economics. Sonya presented a compelling comparison: humans are difficult to scale, while agents can scale infinitely with computation; humans are hard to keep happy (she joked that except for herself, she is always happy), while agents have low maintenance; humans are expensive, you pay them salaries, while you pay agents with tokens, and usually, the cost of completing tasks with tokens is lower than equivalent salary costs. Today, humans are often smarter, but the bitter lesson continues to advance, and soon agents will be smarter than humans in many tasks.

The second characteristic is that this is the fastest wave. We can all feel this. Pat showed a slide where the blank space on the AI side is being filled very quickly. These logos are companies that have reached over $1 billion in revenue due to transformations driven by cloud computing, mobile internet, and now AI. At the current speed, more companies are on the way. This speed means we do not have much time to adapt slowly; we must act quickly. But Pat also reminded us of an important fact: no lead is safe. He used a racing metaphor: “You cannot overtake 15 cars in the sunshine, but you can overtake 15 cars in the rain.” Now foundation models are pouring out new capabilities like heavy rain, which means no lead is safe, but it also means anyone can win.

My understanding of this point is that in a stable technological environment (sunny days), first-mover advantage is crucial, and latecomers find it hard to catch up. But in a rapidly changing technological environment (rainy days), everything becomes uncertain, and new opportunities constantly emerge. Today’s leaders may fall behind tomorrow because new capabilities change the rules of the game. This presents both challenges and opportunities for entrepreneurs. The challenge is that you must continuously adapt and evolve, while the opportunity is that you always have a chance to surpass competitors as long as you can better leverage new capabilities.

The third characteristic I mentioned earlier is that this is a revolution in computation rather than communication. Pat particularly emphasized the importance of this point. The past revolutions of the internet, cloud computing, and mobile internet were about how information is distributed; they were communication revolutions. These revolutions are characterized by relatively stable infrastructure, allowing you to build applications on a relatively stable platform. But AI is different; AI is about how information is processed; it is a revolution in computation. This means the ground beneath your feet is always shifting, and the technological foundation you build upon changes daily.

Pat stated that in his generation’s career, they have only experienced communication revolutions. This is the first true computation revolution. The implications of this difference are profound. In a communication revolution, you can formulate a five-year plan and execute it. But in a computation revolution, a five-year plan is meaningless because the underlying capabilities may undergo fundamental changes every month. This requires us to adopt completely different strategic thinking—more agile and adaptable.

MAD Strategy Framework for Entrepreneurs

Pat provided a framework for entrepreneurs building applications on top of models, which he called MAD. He jokingly said this is free advice, so it is worth every penny you pay for it. But I find this framework very valuable because it directly points to how to establish lasting competitive advantages in this rapidly changing era. MAD stands for Modes (moats), Affordance, and Diffusion.

Before discussing MAD, Pat first presented a concept called the merchandising cycle, which includes all the links in the value chain from idea to satisfied customers. His core point is that if you look at it from a tech-out perspective, you will handle each link in the value chain in a certain way. But if you look at it from a customer-back perspective, you will handle each link in a completely different way.

Here is an intuitive part that impressed me. In the computation revolution, which is about information processing, you might want to look down at those constantly emerging cool new things. But to build a moat, you actually want to look up because your customers’ changes are not as fast as the speed of capability changes. The product you build may become irrelevant tomorrow, but the depth you establish around customers will be more lasting.

Regarding Modes, Pat emphasized that this does not mean products and technology are unimportant; they are extremely important, and usually, the best products win. But in a world where products change so quickly, because capabilities change so rapidly, when thinking about moats, he encourages us to be as customer-centric as possible, considering all the ways to build around customers. I understand this to mean deeply understanding customers’ workflows, pain points, decision-making processes, and establishing trust, becoming an indispensable part of their business. When technology changes, this customer relationship will give you the opportunity to continue serving them, even with different technologies.

The concept of Affordance that Pat borrowed from the design world is particularly well-chosen. A hammer is an object with affordance. If he gives his two-year-old son a hammer, the son will know what to do—grab it and start banging things. That is why they do not give him a hammer. An object with affordance does not need explanation; people know how to use it.

Pat gave a great example. Claude Code is extremely powerful, but for the average Fortune 500 employee, opening a terminal to see how far they can go is not intuitive. While it is powerful, it does not provide much affordance. This is not a criticism of Anthropic but an opportunity for anyone wanting to build on top of it. Your job is to create the path of least resistance for your specific customers and their specific problems, allowing them to easily find the results their business needs.

My understanding of affordance is that there is a huge gap between technological capabilities and what users can actually use. Even the most powerful tools have no value if users do not know how to use them or if they are too complex to use. The opportunity for application-layer companies is to fill this gap, transforming powerful but complex technology into simple and intuitive user experiences. This requires a deep understanding of users’ mental models, skill levels, and work environments. You are not educating users on how to use complex technology but adapting technology to fit users’ existing workflows.

The diffusion gap is the third dimension of opportunity for application-layer companies. Pat pointed out that the speed at which capabilities diffuse into the market lags far behind the speed at which these capabilities are created. Whenever the pace of progress in foundation models exceeds that of your average Fortune 500 company, this gap widens, and the opportunity grows.

My understanding of this point is that innovation always happens first in labs and at the cutting edge of companies, but it takes time for most businesses to adopt these innovations. They need to evaluate, test, integrate, and train. In the AI era, this gap is particularly large because technological advancements are so rapid. Every day, new models and new capabilities are released, but most companies are still trying to figure out how to use technology from six months ago. This gap represents the opportunity for application-layer companies—to help businesses bridge this divide, enabling them to actually use the latest capabilities.

Pat concluded that for moats, think as customer-centric as possible; for affordance, think about creating the path of least resistance for your customers; and that diffusion gap represents your opportunity. These three dimensions combined form a complete framework for establishing lasting competitive advantages in the AI era.

But Pat did not stop there. He also specifically reminded us that while that slide showing the blank space being filled may discourage some, thinking there are no opportunities left, remember: no lead is safe. Now, foundation models are pouring out new capabilities like heavy rain, meaning that companies that seem to dominate the market may have their leading positions overturned overnight. At the same time, this also means anyone can win as long as you can better leverage new capabilities and adapt to changes faster.

I particularly resonate with this viewpoint. In a stable technological environment, first-mover advantage is crucial; network effects and scale advantages create strong barriers. But in a rapidly changing technological environment, these barriers can become irrelevant overnight. New capabilities may render old product architectures obsolete, and new interaction methods may change user habits. This is why Pat said, “Living in this era is fantastic”—for those willing to innovate and act quickly, opportunities are everywhere.

A World Where Agents Are Ubiquitous

Sonya painted a vision of a world where agents are ubiquitous, which I find both exciting and thought-provoking. She mentioned that people are building agents for everything. Some are silly, like an OpenClaw agent that would report your neighbor’s tax evasion to the tax authorities (she said, “Please don’t do this, or maybe do it”). Others are entrepreneurial, with agents running generative media campaigns to sell construction services. There are also professional-level agents; she mentioned a huge internal competition at Sequoia to see who can build the best agents to get work done better.

The speed and scale of agent deployment will be unprecedented because the economic benefits are too clear, and agents have inherent scalability. This does not mean humans will become unemployed; Sonya believes that human adaptability is unique. However, we should indeed expect that the deployment of agents at the application layer will be very rapid and large-scale.

When you put all this together, the number of agents is expanding in an exponential, perhaps super-exponential manner. Sonya believes we are about to reach a point where things become truly strange. What happens when business occurs between agents? Can they pay each other? What happens when agents can actually negotiate transaction terms with each other? Will we have a large group of agents regulating us, preventing cybersecurity issues or large-scale destruction? We only know that the world is becoming strange at an incredibly fast pace.

I feel both excited and somewhat anxious about this future. The excitement comes from it representing a tremendous leap in human productivity. We can finally delegate those repetitive, tedious tasks to AI and focus on more creative and strategic work. But the anxiety arises from the fact that this transition will bring many unknown social and ethical issues. When agents can autonomously trade with each other, how do we regulate them? When agents make erroneous decisions, who is responsible? These are all questions we need to think seriously about.

Sonya concluded by quoting her inner Eliezer Yudkowsky (an AI safety researcher), saying: long horizon agents have arrived, and their development curve is very clear. For entrepreneurs, everyone has examples of completing crazy difficult timelines because of AI. Zed’s Nathan completed a three-year moon landing project solo using Claude Code during the holidays. Brett Taylor rebuilt Sierra over a weekend. The Notion team rewrote 8 million lines of code in just six weeks.

Everyone has these compressed timelines examples, but Sonya believes few outside AGI labs see what happens when you stack these compressed timelines together. This is what is possible now. So whatever you can imagine building in the next 100 years can now be achieved in 100 days, thanks to agents. This perspective deeply shocked me. We are not talking about incremental improvements; we are talking about compressing the time dimension. This means the speed of innovation will grow exponentially.

Cognitive Revolution: The Next Industrial Revolution

Konstantine Buhler’s part of the speech may have been the most philosophically profound of the entire presentation. He divided work into two types: physical work and cognitive work. Physical work is packages on the Pony Express, satellites on Falcon 9, with power equal to force times distance, involving physical movement. Cognitive work is the theorem proposed by Pythagoras, the solution to the protein folding problem by DeepMind, and conscious thought. These are two very different types of work, but Konstantine believes they will follow very similar revolutionary patterns.

He talked about the revolution of physical work, which is the industrial revolution. For most of human history, nearly all work serving humanity was done by some muscle, either human or animal. Humans moved things or animals pulled humans. This started around 1700 but can be traced back thousands of years. Then things began to change. Water and wind power, steam engines, and then things accelerated. Steam engines, internal combustion engines, electric motors. By 2026, you can estimate that over 99% of all physical work done for humanity on Earth is completed by machines. The airplane that brought you here, the manufacturing of all the goods in this room, and all the setups for the human experiences you are currently experiencing.

Konstantine believes a similar pattern will occur in the cognitive realm, but we are still in an earlier stage. For most of human history, all thinking done on Earth was primarily by humans, with perhaps a bit of contribution from animals, like shepherd dogs chasing sheep. Historically, there has been a small amount of mechanical work, like astrolabes or clocks. In the past few hundred years, until electronic computing emerged, progress was slow. In the last hundred years, think about the trillions of calculations happening at any given moment to serve you as a human. All this cognitive work, the trillions of calculations serving us at any given moment.

Konstantine believes neural networks are the next big wave, and in the near future, 99.9% of cognitive work on Earth will be completed by machines. This parallel is very clear. The good news is that we have experienced such a revolution. The cognitive revolution will be much like the industrial revolution, only larger and faster.

This perspective made me ponder deeply. If cognitive work is indeed taken over by machines like physical work, what will that mean for us humans? Konstantine provided his answer through four short stories.

Four Stories About the Future

The four stories Konstantine told deeply moved me, each revealing an important truth about the AI era. The first story is about aluminum. In the mid-19th century, America wanted to build a grand monument for its first president and greatest war hero, George Washington. They designed the tallest building in the world at the time, the Washington Monument. They wanted to top it with the world’s most precious metal, 100 ounces of the most precious metal. This metal was so precious that they displayed it at Tiffany’s in Manhattan. That metal was aluminum.

In the decades following the completion of the Washington Monument, a young inventor proposed an electrolytic method to separate aluminum from the earth. Within decades, aluminum was used to wrap candy and sandwiches, then thrown in the trash. Aluminum is like intelligence, and the electrolytic method is like artificial intelligence. We are about to enter a world where some of the most precious skills, requiring decades to acquire, can be called upon instantly, to the point where after use, you can crumple them up and throw them in the trash.

This metaphor is incredibly accurate. We are accustomed to viewing certain cognitive abilities as precious and scarce, but AI is making these abilities cheap and abundant. This is not to belittle human intelligence but to illustrate how technological progress redefines value. When expertise becomes as ubiquitous as aluminum, what will truly be valuable?

The second story is about alien design. The world we see today is all designed for humans. It is optimized in a way that makes sense to our brains because we perform almost all cognitive work in the world. When machines perform cognitive work, it will be somewhat different. In 2006, NASA was optimizing antennas for a large space mission. Traditionally, their antennas looked like beautiful geometric symmetrical patterns optimized for surface area under certain power constraints. This time, they said to let the computer handle it using evolutionary algorithms (similar to reinforcement learning). The result was an antenna that significantly improved productivity but was not intuitive for human thinking.

In this AI era, when we hand cognitive tasks to machines, we will get results that are somewhat unintuitive for us. When AI designs chips, cars, and buildings, they may look very different. We must keep an open mind about the world we are entering because AI will not think like us. It will have alien designs.

This story reminds me not to judge AI outputs with human intuition. AI may find solutions we could never think of, which may seem strange or inelegant but could be more effective. We need to learn to appreciate this “alien aesthetics.”

The third story is about emerging sciences. In the early days of the industrial revolution, great engineers like Newcomen and Watt perfected the internal combustion engine. Essentially, they put petrochemical substances into a piston, ignited them, and millions or billions of particles exploded to drive the piston. For nearly a hundred years, all this was trial and error. Engineers would say, “Ah, this works a bit better.” Perhaps you can see something like scaling laws, but it was all engineers playing with products, seeing how to improve a little.

More than 120 years later, Sadi Carnot appeared and formalized everything in a new science: thermodynamics. He said, “Wait a minute, there are millions or billions of particles; we can actually formalize what this looks like.” In this case, there are billions of neurons and trillions of tokens. Now, we are in the trial and error phase of AI. Even if we think this is an understood science, it is not. In the future, we will introduce a foundational science like thermodynamics over the next few decades. Someone in this room may propose this science. This science will be taught in high school; it will be so fundamental. It will help us master AI and even help us understand consciousness.

This perspective made me realize that our understanding of AI is still very superficial. Much of what we do now is empirical, like early steam engine engineers. But someday, someone will propose a complete theoretical framework to explain how AI works, and that will be a revolutionary moment.

The fourth story is about the irrationality of art. For most of human history, for thousands of years, art has progressed towards realism. From cave paintings 25,000 years ago, Egyptian hieroglyphs, Greek pottery, to the grand shift towards realistic art in Renaissance paintings. Look at the differences. After thousands of years, humanity has triumphed. Then engineering came along, the daguerreotype, early photography, and suddenly, the skill of perfecting every brushstroke over decades of life disappeared.

How did the world react? People thought painting was over. Oh, just like that, machines can do it better than any human, and art is finished. So what happened? How did humans respond? Humans responded by asking whether the purpose of this art was to capture the moment seen by the eye or the moment seen by the mind and soul. Impressionism, expressionism, cubism, neo-expressionism. All these new art forms are humanity’s response to this massive change in science.

2,500 years ago, the Greek philosopher Protagoras wrote: “Man is the measure of all things.” He meant that in a vacuum, nothing has value to humans. Not aluminum, not art, not intelligence. It only has value because of experience. AI can do work; AI will do work. But only human connection can give work meaning. That is why we are all in this room today. Ten years from now, work will be very different, and things will change a lot. But one thing will remain constant: the relationships you build with those around you today will endure. That is what you will look back on, and that is what is valuable today.

My Deep Thoughts on This Revolution

After listening to the entire presentation, I have several profound insights.

First, we are indeed at a historic turning point. Sequoia’s claim that “this is AGI” is not hype but a pragmatic judgment based on actual capabilities. When agents can recover from failure and persist until a task is completed, this is sufficient from a business perspective. We do not need to wait for superintelligence from science fiction; we already have tools that can change the game.

Second, speed is the most notable characteristic of this revolution. Work that used to take 100 years can now be completed in 100 days; this is not an exaggeration. I see more and more examples around me of individuals using AI to accomplish tasks that previously required a team months to complete. This time compression will produce a compound effect, and the speed of innovation will grow exponentially. This means we must act quickly because the window of opportunity is very short.

Third, being customer-centric is more important than ever. In a rapidly changing technological era, the only anchor point is customer demand. Technological capabilities change daily, but the problems customers want to solve remain relatively stable. Companies that can deeply understand customers and build solutions around them will establish true moats.

Fourth, we need to prepare for a world where agents are ubiquitous. This is not science fiction but an imminent reality. As the number of agents grows exponentially, all aspects of society, economy, and law will need to adapt. We need to establish new frameworks to manage interactions between agents and ensure their behavior aligns with human values.

Fifth, and most importantly, amidst all technological changes, human connection remains core. AI can make us more efficient, but it cannot replace the relationships and emotional connections between people. In a world where cognitive work is taken over by machines, what will truly be valuable are those unique human qualities: creativity, empathy, curiosity, and adaptability.

I believe we are witnessing history. The cognitive revolution will profoundly change the world, just as the industrial revolution did, only it will be larger and faster. This is both exciting and awe-inspiring. We have a responsibility to ensure that this revolution benefits all of humanity, not just a privileged few. This requires the collective effort of all of us, involving tech experts, policymakers, entrepreneurs, and ordinary citizens.

Sequoia’s presentation inspired me greatly but also raised more questions. Are we ready to embrace this future? Can our education systems, legal frameworks, and social structures keep pace with the speed of these changes? How do we ensure that in the pursuit of efficiency, we do not lose our humanity? These are all questions we need to think and discuss seriously.

Will AI for Science Have a 'ChatGPT Moment'? Insights for Young Innovators

Wed, 06 May 2026 00:00:00 +0000

Will AI for Science Have a ‘ChatGPT Moment’? Insights for Young Innovators

As AI reshapes the foundational logic of research and industry, AI for Science is no longer just a theoretical concept. On April 28, the Future Light Cone collaborated with Beijing Zhongguancun Academy’s AI Business School to launch the “AI for Science Innovators Dialogue Series.” The first event featured three frontline guests, including Zheng Shuxin, an associate professor and co-director of the AI Business School, who provided solid data and insights to address three pressing questions: Will AI4S experience a ‘ChatGPT moment’? What barriers do entrepreneurs face? How should young people invest their efforts?

The Essence of Large Models: Intelligence Through Compression

What drives the general intelligence of large models? Ilya, former chief scientist at OpenAI, succinctly stated: “Intelligence arises from compression.” The intelligence of a model comes from its ability to compress vast amounts of human language data using a relatively small parameter space. In this process, the model is compelled to distill common structures and inherent representations from the data, leading to the emergence of intelligence.

For instance, the first version of GPT-3, with 175 billion parameters, aims to encapsulate nearly all text ever written by humanity. If it relied solely on memory, it would essentially function as a hard drive, which does not exhibit intelligence. However, when tasked with compressing this data into a smaller parameter space, it is forced to extract common structures and representations—intelligence emerges from this compression.

A more rigorous theoretical foundation underpins this, known as Kolmogorov complexity, which measures the complexity of a dataset by the length of the shortest program that can describe it. For example, a dataset consisting entirely of zeros can be compressed into a single line of Python code due to its simple internal structure. The paradigm of large language models predicting the next word is, in fact, a good approximation of Kolmogorov programs.

However, this also sets a ceiling: human knowledge. You cannot learn from humans and ultimately surpass them. AI for Science, however, is charting a completely different path.

Two Core Paths of AI4S

AI4S does not engage with human language; it directly studies physical laws, biological processes, and molecular conformations, compressing the data of nature itself rather than “how humans describe nature.”

A prime example is AlphaFold, which represents Nobel-level work. What does it do? Quite simply, it finds correlations within natural data. When the Protein Data Bank (PDB) accumulates hundreds of thousands of protein structure data points, the model can map sequences to three-dimensional structures, effectively “solving” the protein structure problem.

Here lies a core analytical framework, the two legs of AI4S:

Scientist: Engages with literature, formulates hypotheses, and designs experiments, essentially combining language intelligence, knowledge integration, and logical reasoning. Its strengths lie in reasoning and knowledge, while its weakness is a lack of direct understanding of the physical world. Representatives include OpenAI, Anthropic, and DeepMind.
Simulator: Uses AI to fit the laws of the physical world through data-driven methods. Its strengths lie in modeling the world itself, which cannot be achieved merely by stacking parameters. However, it lacks explicit knowledge chains and reasoning capabilities. Representatives include AlphaFold and various meteorological models.

The ultimate goal of large models is AGI (Artificial General Intelligence), while the vast potential of AI4S lies in breaking the boundaries of human cognition—the universe is unknown, and only the Simulator path theoretically allows AI to explore what humanity has yet to discover.

However, today, the Simulator cannot solve all problems on its own—it lacks logic and reasoning. Relying solely on either path is insufficient. The true endgame of AI4S is the convergence of both paths: the ability to reason and formulate hypotheses like top scientists while directly understanding the physical world itself.

This is why I repeatedly emphasize that AI for Science requires more than just larger models. Even if you scale GPT up by 100 times, it won’t automatically understand how a protein folds or how a cloud evolves.

Currently, no single team possesses both ends, which presents an opportunity.

AI4S Will Not Experience a Unified ‘ChatGPT Moment’

My core judgment is that AI4S will see continuous breakthroughs, but it will not be a single moment of universal celebration; its progress resembles a highly uneven map.

In a given field, the more it meets the criteria of “clear problem structure + sufficient data + short validation loop,” the faster AI4S will advance there.

Protein Folding: In this area, both the Scientist and Simulator paths have produced significant results. AlphaFold answers “what proteins look like,” while DiG and BioEmu address “how proteins move”—one captures still images, while the other creates movies. Only by producing the movie can the functional mechanisms of proteins be truly explained.
AI Drugs: This field has crossed a critical threshold. There are over 200 AI drug clinical pipelines, with Phase I success rates of 80%-90%, double that of traditional methods. The first AI drug has shown efficacy in Phase II clinical trials, with a crucial data readout window expected in 2026-2027.
AI Meteorology: Chinese players are leading globally. Huawei’s Pangu, Fudan’s Fuxi, and the Fengwu model continue to make breakthroughs, with Fengwu achieving accurate forecasts up to 11.25 days, marking the first global breakthrough of the 10-day accuracy barrier.
Materials Science: This field is evolving from merely screening known compounds to designing unprecedented molecules from scratch. The most critical signal for 2025-2026 is that frontline model developers are beginning to truly believe in the tools at their disposal. Though this field is still early-stage, the potential value is immense once breakthroughs occur.

Barriers for Entrepreneurs Amidst Major Players Entering AI4S

An undeniable fact is that the six major AI giants—OpenAI, Anthropic, Google DeepMind, Microsoft, NVIDIA, and Meta—are all entering the AI4S arena.

Even OpenAI is developing a specialized life sciences model, GPT-Rosalind, and Anthropic is fully investing in Claude for Life Sciences, indicating a quiet abandonment of the narrative that “a universal model can solve everything.”

With these giants entering the field, where do entrepreneurs face barriers? My answer is clear: the threshold lies not in prompts and workflows, but in scientific capability, data closure, and depth of industry integration.

It’s essential to clarify which game you are playing:

Product-oriented: Competing on rapid iteration and user stickiness, with validation cycles from days to weeks, represented by Manus and Cursor.
Resource-oriented: Competing on depth of industry integration and client resources, with validation cycles from quarters to years, represented by traditional SaaS and industry solutions.
Scientific Story-oriented: Competing on scientific capability and data flywheel, represented by Isomorphic Labs, with validation cycles from years to decades.

AI4S companies can be divided into two categories: scientific companies (scientific story-oriented) and scientific service companies (resource-oriented). Both paths are viable, but the greatest risk is mistaking oneself for a “scientific company” while ultimately becoming a “scientific service company.”

If you are confident in your technology and can truly unearth valuable insights, you should naturally tell a scientific story. If you still have some gaps, focus on delivery and client resources, and earnestly deepen your industry engagement.

Now is the Golden Window for AI4S

Why do I say now is the window period? Because funding is already moving. A single AI4S company can secure annual funding of up to $550 million, and a significant portion of global VC funds flowing into AI is increasingly directed towards AI4S. The U.S. Department of Energy has invested $320 million to launch the Genesis program, with China following suit.

Why is funding concentrated on AI4S? Due to a combination of technological breakthroughs, the inefficiency of traditional R&D, the nascent data infrastructure, and national strategic support, a fourfold resonance has formed.

Even if there are bubbles that burst in the process, this is fundamentally different from the industry boom of five or six years ago—this time, the technology has genuinely reached a critical point.

Two Long-term Trends Worth Watching

Self-Driving Labs: Achieving a complete closed loop of “hypothesis → experiment → data → model update → new hypothesis,” where the more experiments conducted, the better the model becomes, forming a true flywheel. Key players include Lila Sciences, Recursion, and Atinary.
National-level AI4S Infrastructure: AI4S is transitioning from “academic research” to “industrial infrastructure,” which is a core layout for national competitiveness.

Five Hard-Hitting Suggestions for Young Innovators

Choosing a field is more important than selecting a technology. The real moat is domain knowledge, not model architecture; choose a scientific problem you are willing to immerse yourself in for five years.
Learn to communicate with experiments. Those with purely computational backgrounds often lack understanding of experiments. Spending three months in a lab is more beneficial than reading ten papers.
Data capability is a core lever. The performance ceiling of a model ultimately depends on the information limit of the training data. Those who can build a data flywheel are far more valuable than those who can merely tune models; acquiring, cleaning, and labeling scientific data is hard currency.
Clarify which game you are playing. Scientific story-oriented requires long-term patience, resource-oriented needs industry integration, and product-oriented focuses on rapid iteration—don’t mix them.
Now is the window period. The convergence of technology, capital, and national strategy is happening, but the window won’t remain open forever.

Three Core Conclusions

Returning to the three initial questions, the answers are now very clear:

AI4S will have continuous breakthroughs but will not have a unified “ChatGPT moment.”
The core barrier for entrepreneurs lies in “scientific capability + data closure,” not in model size.
Choosing the right direction fundamentally means selecting a scientific problem you are willing to delve into for five years.

In conclusion, the window belongs to those willing to do the heavy lifting and dare to bet amidst uncertainty.

DeepSeek's New Model Bets on Domestic Chips to Strengthen AI Industry

Fri, 01 May 2026 00:00:00 +0000

DeepSeek’s New Model and Domestic Chips

On April 29, Reuters reported that following the release of DeepSeek-V4, major Chinese tech companies like ByteDance, Tencent, and Alibaba are rapidly acquiring Huawei’s domestic chips.

This marks a significant and irreversible collective shift towards domestic chips, with a computing foundation based on local hardware taking shape.

NVIDIA’s founder and CEO, Jensen Huang, has issued a warning. He stated in an interview that if DeepSeek’s latest generation of large models is first released on Huawei’s advanced chip platform and fully adapted, it would be a catastrophic blow to the U.S.’s strategic position in the global AI field.

Huang’s real concern is that once China’s top large models are bound to domestic computing foundations, the long-standing U.S. chip blockade will lose its critical leverage.

A key link in this chain has now been established. The new DeepSeek-V4 model, launched on April 24, has included both Huawei’s Ascend chips and NVIDIA chips in its hardware validation list.

The newly adapted Huawei Ascend inference chip is priced at only a quarter of NVIDIA’s, yet its single-card computing power is 2.87 times greater than NVIDIA’s special version for China, showcasing a significant cost-performance advantage.

This is a tested, high-performance solution of “national model + national chip,” with compelling cost and security benefits.

Not long ago, the shortage of chips was a core bottleneck, especially in the critical area of model training, where domestic chips were largely absent or only able to participate in marginal tasks.

Now, a turning point has been reached. Multiple large models in China have completed adaptations to domestic chips, and 2026 is being referred to as the “year of domestic AI chip training implementation” in the industry.

Questions inevitably arise: Can large models run stably and efficiently on domestic hardware?

DeepSeek admits that the capability level of the new model still lags behind its main competitors, with a development trajectory approximately 3 to 6 months behind leading closed-source models.

Rather than waiting for external criticism or deliberately beautifying the situation, DeepSeek proactively acknowledges its shortcomings and faces the gap, revealing a pragmatic logic: in the competition where technological gaps objectively exist, humbly catching up is far more valuable than pretending to lead.

From core parameters and actual performance, the new model shows impressive breakthroughs. It features 16 trillion total parameters and a million-token ultra-long context as standard. In mathematics, hard science and technology, and competitive coding, the high-performance version of the new model has surpassed all publicly evaluated open-source models, standing shoulder to shoulder with mainstream closed-source models.

Especially in programming for intelligent agents, it has topped the open-source leaderboard and is hailed as a “programming artifact.”

While rationally acknowledging the overall technological gap, DeepSeek has achieved breakthroughs in specific areas and has opened a crushing gap in terms of cost. The DeepSeek-V4-Pro model API has launched a limited-time price promotion at 2.5 times lower than usual, with input prices starting at 0.25 yuan per million tokens. In contrast, the weighted average input price for GPT-5.5 Pro is $30 per million tokens, making DeepSeek-V4-Pro over 700 times cheaper.

Looking at the mainstream international large models, such as Anthropic’s Claude Opus series, OpenAI’s GPT-5.4, and Google’s Gemini 3.1 Pro series, their prices are also quite high.

While performance is only 3 to 6 months behind, the cost has created a massive gap. An asymmetric competition has already begun.

This is not just a victory for a single chip, but the maturation of an entire domestic computing ecosystem. Actual test data shows that after breaking away from the NVIDIA ecosystem, the new model’s end-to-end latency is 35% lower than the existing cluster.

This indicates that domestic computing has entered a stable and efficient “usable” stage.

Goldman Sachs’ latest research report states that with the large-scale supply of Huawei’s Ascend 950 in the second half of this year, the pricing of the new model will see a significant drop. This move not only strengthens DeepSeek’s cost competitiveness but also provides strong endorsement for the migration of China’s top large models to domestic computing.

Crucially, DeepSeek’s choice is not an isolated case.

Looking at other leading players in China, such as Alibaba’s Tongyi Qianwen, Zhiyun Qingyan, Baichuan Intelligence, and ByteDance’s Doubao, they are all simultaneously advancing extreme cost performance, chasing advanced performance, and building open-source ecosystems.

Although each company’s path may differ, the direction is highly consistent: breaking free from external dependencies and solidifying the domestic industrial chain.

The procurement of Huawei chips by Chinese tech companies is not merely a sentimental choice but a rational decision that balances cost accounting, supply chain security, and industrial autonomy.

With domestic chips as the foundation, the continuously improving self-controlled computing base is gradually solidifying the long-term confidence of China’s AI industry.

The Rising Value of Guangxu Yuanbao: A Collector's Guide

Fri, 01 May 2026 00:00:00 +0000

The Rising Value of Guangxu Yuanbao

Could there be a chance that in your home, tucked away in a corner of an old drawer or hidden in a forgotten bag, lie a few rusty, seemingly insignificant copper coins? You might pick them up, feel their lightness, and think, “These old coins are worth nothing, better to throw them away.”

But you would never guess that this seemingly worthless “old copper piece” is actually a superstar in the world of Qing Dynasty copper coins—the Guangxu Yuanbao. Once just a common currency for change, it has skyrocketed in value in the collector’s market, with well-preserved ordinary versions selling for tens of thousands, while top-tier specimens like the Guangdong Province “Double Dragon Longevity” coin can fetch prices in the millions, attracting countless collectors.

At the 2026 Spring High-End Coin Auction, a well-preserved Guangdong Province Guangxu Yuanbao “Double Dragon Longevity” coin was sold for an astonishing 1.68 million Hong Kong dollars after fierce bidding among dozens of collectors, setting a new record for Guangxu Yuanbao sales and sending shockwaves through the collecting community.

An experienced collector, witnessing the bidding, lamented, “I have an identical coin at home. Years ago, I was tricked into selling it for 500 yuan by an antique dealer. Seeing this million-dollar price now makes me regret it deeply!”

Such regrets are common in the collecting world. Many ordinary people unknowingly possess “million-dollar treasures” but fail to recognize their value, leading to neglect or being misled into selling them for a pittance, missing out on life-changing opportunities.

Today, I will provide a comprehensive 5000-word guide on this “top-tier copper coin of the Qing Dynasty”—the Guangxu Yuanbao. From its historical origins and the mysteries of its numerous versions to the story behind the high-priced Double Dragon Longevity coin, the secrets hidden in its rust, and tips for identifying authenticity and real market value, this article will ensure you never mistake a million-dollar treasure for scrap copper again. You will easily discern whether your Guangxu Yuanbao is worth a few dollars or a fortune!

Key Conclusion

The value of Guangxu Yuanbao varies dramatically—ordinary circulating versions in average condition are worth only a few thousand to tens of thousands, while rare versions in top condition can easily exceed hundreds of thousands. The Guangdong Province “Double Dragon Longevity” coin is exceptionally rare, with fewer than a hundred in existence, and top-tier specimens can command prices in the millions—over a hundred times the difference! Those few coins lying in your old drawer might just be “million-dollar treasures” capable of buying a house!

1. A Century of Change: Guangxu Yuanbao as the Benchmark of Qing Dynasty Copper Coins

To understand the collecting value of Guangxu Yuanbao, one must first grasp its historical significance. This small copper coin is not merely a convenient form of change; it is a testament to the tumultuous times of the late Qing Dynasty and a reflection of China’s modern currency reforms, embodying a century of change and legend. It is this historical weight that allows it to stand out among many Qing Dynasty copper coins, making it a “top-tier” collectible.

In the late 19th century, the Qing Dynasty was in turmoil, facing both internal strife and external threats. Following the Opium Wars, foreign powers invaded, leading to continuous domestic conflicts and an economy on the brink of collapse. The currency system was chaotic, with a mix of traditional square-hole coins, locally minted silver dollars, and foreign coins, compounded by the private minting of substandard currency by local warlords, severely hindering economic development and exacerbating the Qing government’s difficulties.

To save the crumbling economy and stabilize the currency system while raising military funds, the Qing government officially ordered currency reform in the 26th year of Guangxu (1900), beginning the minting of a new type of copper coin—the Guangxu Yuanbao. This coin broke away from the traditional square-hole design, adopting Western minting techniques and integrating elements of traditional Chinese culture, marking the beginning of modern mechanized copper coinage in China.

Compared to traditional square-hole coins, Guangxu Yuanbao has distinct advantages: it is uniformly shaped, consistently weighted, and available in various denominations, from one to twenty cash, catering to different social classes. It quickly replaced traditional square-hole coins, becoming one of the main circulating currencies in the late Qing Dynasty.

The scale of Guangxu Yuanbao minting was unprecedented. In just 11 years (1900-1911), the Qing government called upon 17 provinces and 20 mints to collaborate, establishing numerous mints to mass-produce Guangxu Yuanbao. Due to varying craftsmanship and standards among different mints, along with continuous adjustments in design and the addition of hidden marks, thousands of versions were produced, creating a “museum of versions” for Qing Dynasty copper coins.

Unfortunately, the 1911 Xinhai Revolution abruptly halted the minting of Guangxu Yuanbao. The once widely circulated coins faced large-scale melting and loss during the war years—many were melted down for jewelry or food, while others were buried or lost. After a century, few well-preserved Guangxu Yuanbao coins remain, especially the rare versions.

Elderly people often reminisce, “Back in the day, Guangxu Yuanbao was common in every household, used for buying candy and vegetables. Who would have thought that these unassuming copper coins would be worth so much decades later?”

It is this combination of “historical significance and scarcity” that lays the groundwork for the collecting value of Guangxu Yuanbao. It is not just a currency but a witness to history, encapsulating the rise and fall of the Qing Dynasty and documenting the arduous journey of modern currency reform in China. This high historical and cultural value is one of the core reasons it remains a “top-tier” collectible among Qing Dynasty copper coins.

Additionally, the design and craftsmanship of Guangxu Yuanbao represent the pinnacle of Qing Dynasty copper coins. The front features the inscription “Guangxu Yuanbao” in bold characters, with the minting province or bureau indicated above and the denomination below, flanked by exquisite decorative patterns that exude royal grandeur. The reverse usually depicts a vivid dragon motif, accompanied by clouds and flames, symbolizing “the dragon rules the world and the nation is prosperous,” showcasing a perfect blend of traditional Chinese culture and modern minting craftsmanship.

Now, a century later, this small copper coin has long exited the circulation market but has become a coveted item in the collecting community. It carries the last rays of the Qing Dynasty, witnessing significant changes in modern Chinese history, and boasts high historical and artistic value, along with its scarcity, making it a “hard currency” that collectors eagerly pursue, firmly occupying the “C position” in the Qing Dynasty copper coin collecting market.

2. The Maze of Versions: Thousands of Types, A Treasure Hunt with a Coin

Many people wonder why Guangxu Yuanbao has become the “top-tier” copper coin of the Qing Dynasty. Beyond its historical significance and scarcity, a key reason is its incredibly rich variety of versions—thousands of types create a “maze” that collectors find thrilling to navigate, enhancing its collecting value.

The minting of Guangxu Yuanbao involved 17 provinces and 20 mints, each with different craftsmanship and standards. Throughout the minting process, adjustments were made to fonts, patterns, dragon motifs, and denominations, even adding hidden marks, resulting in thousands of versions. Even seasoned collectors find it challenging to gather all types.

Statistics show that just the ten-cash version of Guangxu Yuanbao has hundreds of types. Variations across provinces, mints, and years can lead to significant differences in value, even with minor distinctions. This “one coin, many faces” characteristic turns collecting Guangxu Yuanbao into a treasure hunt, where discovering a rare version brings immense satisfaction.

Despite its small size, Guangxu Yuanbao conceals many “mysteries” within its dimensions; even slight differences can determine its value. Below, we will break down the core distinctions of Guangxu Yuanbao versions, making it easy for newcomers to understand the “treasure-hunting code.”

(1) Core Distinction 1: Minting Province/Bureau Determines Base Value

The front of Guangxu Yuanbao typically indicates the minting province or bureau, which is the most basic marker for distinguishing versions and a crucial factor in determining its value. Due to varying minting volumes and craftsmanship across provinces, the surviving quantities differ significantly, leading to notable price disparities.

Among them, the most scarce and valuable are the Guangxu Yuanbao minted in Guangdong, Hubei, and Jiangsu provinces. Particularly, the Guangdong version, known for its exquisite craftsmanship and limited surviving quantity, is considered the “noble” type among Guangxu Yuanbao. Even ordinary specimens of the Guangdong version can fetch prices ranging from tens of thousands to hundreds of thousands, while rare versions can easily reach millions.

Next are the Hubei, Jiangsu, and Zhejiang versions, which also have high craftsmanship levels and relatively low minting volumes, making them scarce and thus more expensive. Ordinary specimens of these versions are typically priced around tens of thousands.

Conversely, provinces with larger minting volumes, such as Henan, Shandong, and Sichuan, have more abundant surviving quantities, resulting in more affordable prices, with ordinary specimens ranging from a few thousand to tens of thousands, making them suitable for novice collectors.

For example, an ordinary Henan Province Guangxu Yuanbao ten-cash coin is valued at approximately 3000-5000 yuan, while an ordinary Guangdong Province Guangxu Yuanbao ten-cash coin can reach 50,000-80,000 yuan, a difference of over tenfold!

(2) Core Distinction 2: Fonts and Patterns, Subtle Differences Hide “Price Codes”

In addition to the minting province, the fonts and patterns on Guangxu Yuanbao are also key factors in distinguishing versions and determining value. Even within the same province and denomination, subtle differences in fonts and patterns can turn a coin into a “rare treasure.”

Font Differences: The four characters “Guangxu Yuanbao” on the front can be in different styles, such as regular or clerical script. Some fonts are bold and strong, while others are round and full. The denomination below may vary in phrasing, such as “Ten Cash” or “Twenty Cash,” with differences in font size and thickness. Even within the same font, variations in stroke thickness or character positioning can lead to different versions.

For instance, within the Guangdong Province Guangxu Yuanbao, the “Longevity” character can be written in both traditional and simplified forms. This single character’s difference can lead to a price gap of tens of thousands—traditional “Longevity” versions are rarer and can exceed one million, while simplified versions are priced around hundreds of thousands.

Pattern Differences: The decorative patterns on the sides of the front can include plum blossom patterns, chrysanthemum patterns, or star patterns, with variations in quantity, size, and arrangement that may distinguish versions. The dragon motifs on the reverse also exhibit significant variation, with different styles such as coiled dragons, seated dragons, or flying dragons. The density of dragon scales, the shape of claws, and the orientation of tails can all differ. For example, a Guangxu Yuanbao with clear dragon scales and sharp claws may be valued at 100,000-150,000 yuan, while a version with unclear scales and damaged claws may only be worth 10,000-20,000 yuan—a tenfold difference.

(3) Core Distinction 3: Hidden Marks and English Letters, Concealing “Anti-Counterfeiting Codes”

To prevent counterfeiting, minting bureaus added hidden marks during the production of Guangxu Yuanbao. These marks are essential for distinguishing versions and verifying authenticity, making the “treasure hunt” even more exciting.

Hidden marks can take various forms, such as small notches in the strokes of characters, tiny stars in the corners of patterns, or symbols between English letters. Some marks are even concealed within the scales of dragon motifs, making them nearly impossible to detect without careful observation.

In addition to hidden marks, the English letters along the edge of the coin can also help distinguish versions. The reverse edge of Guangxu Yuanbao typically bears the English name of the minting province, such as “Kwangtung Province” for Guangdong and “Hupeh Province” for Hubei. Variations in the arrangement, font size, and even missing or deformed letters can indicate different versions.

Many collectors use magnifying glasses to closely examine every detail of Guangxu Yuanbao, searching for hidden marks and differences in English letters. Discovering a previously unseen version can be as exhilarating as an archaeologist unearthing a new site. One collector found a hidden mark between the scales of a dragon motif, and after authentication, it turned out to be an extremely rare trial version, eventually selling for 860,000 yuan, leading to a life-changing fortune.

(4) Quick Tips for Beginners: 30 Seconds to Distinguish Guangxu Yuanbao Versions

Many beginners feel overwhelmed by the multitude of Guangxu Yuanbao versions, but by remembering three simple tips, you can quickly distinguish them in 30 seconds without professional tools or expert help:

Look at the Front: First, check the minting province/bureau, then examine the font of “Guangxu Yuanbao” and the denomination—this is the most basic method of distinction.
Look at the Back: Focus on the dragon motif, observing the shape of scales, claws, and tails, along with the orientation of clouds—these are key to distinguishing versions.
Look at the Details: Use a magnifying glass to inspect the patterns and English letters on the coin, searching for hidden marks. Differences in these marks and letters often reveal rare versions.

A small reminder: If your Guangxu Yuanbao is heavily rusted and you can’t see the fonts, patterns, or hidden marks clearly, you can gently wipe it with a soft cloth to remove surface rust (be careful not to apply too much pressure to avoid damaging the patina) before identifying it. If you’re still unsure, consult a reputable collector’s shop or a professional appraisal agency for assistance.

3. The Million-Dollar Legend: Why the Double Dragon Longevity Coin Commands Such High Prices

Among the thousands of versions of Guangxu Yuanbao, one stands out as the “top-tier of the top-tier”—the Guangdong Province “Double Dragon Longevity” coin. This coin is not an ordinary circulating currency but a commemorative coin specially minted by the Qing court to celebrate Empress Dowager Cixi’s 70th birthday. With an extremely limited surviving quantity, it often sells for prices in the millions, becoming a coveted “treasure” for countless collectors.

At the 2026 Spring High-End Coin Auction, a PCGS-graded MS63 Guangdong Province Guangxu Yuanbao “Double Dragon Longevity” coin was sold for 1.68 million Hong Kong dollars after intense bidding, once again breaking the sale record for Guangxu Yuanbao. Many collectors remarked, “The Double Dragon Longevity coin is the ‘ceiling’ of Qing Dynasty copper coins; owning one signifies status and strength.”

So, what makes the “Double Dragon Longevity” coin so special that it can sell for such high prices? What historical stories lie behind it?

(1) The Origin of the Double Dragon Longevity Coin: A Commemorative Coin for Cixi’s 70th Birthday

Let’s turn back to 1894, the year of Empress Dowager Cixi’s 70th birthday. Despite the Qing Dynasty’s internal and external troubles, Cixi extravagantly celebrated her birthday with a grand banquet. To assert her authority and commemorate her birthday, the Qing court commissioned the Guangdong Mint to produce a special commemorative coin—the Guangdong Province Guangxu Yuanbao “Double Dragon Longevity” coin.

This commemorative coin was designed to the highest standards, using the best minting techniques and quality copper, taking months of careful design and repeated trial minting before finalization. Its design integrates the “longevity” and “dragon” motifs from traditional Chinese culture, symbolizing “double blessings and the dragon ruling the world,” showcasing royal grandeur.

The front of the Double Dragon Longevity coin features a prominent “Longevity” character, surrounded by exquisite patterns, with “Guangdong Province” above and “Kuwait Seven Cash Two” (denomination) below, adorned with symmetrical plum blossom patterns on either side, presenting a simple yet elegant design with auspicious meanings.

The reverse showcases two vividly depicted flying dragons, spiraling to form the shape of the “Longevity” character, with clear and distinct scales, sharp claws, and an imposing gaze, as if ready to soar; surrounding the dragon motifs are the English inscription “Kwangtung Province” and the denomination, with bold fonts and rich patterns, blending traditional Chinese culture with Western minting elements.

Notably, the minting quantity of the Double Dragon Longevity coin is extremely limited. Originally, this commemorative coin was primarily used as rewards during Cixi’s birthday banquet, given to nobles and foreign diplomats, with only a small number reserved for collectors, and most were ordered to be recalled and melted down after the banquet to prevent them from entering the public domain. Today, the surviving Double Dragon Longevity coins number fewer than a hundred, making them exceedingly rare.

Veteran collectors often say, “Each Double Dragon Longevity coin has its own story. It is not just a commemorative coin but a testament to Cixi’s luxurious lifestyle and a snapshot of late Qing history—this historical value is priceless.”

(2) Core Reasons for the Million-Dollar Price of the Double Dragon Longevity Coin: Scarcity + Craftsmanship + Historical Value

The reason the Double Dragon Longevity coin can command million-dollar prices is not coincidental; it is the result of the combination of “scarcity, craftsmanship, and historical value,” all of which are essential. This is also why it surpasses all other Guangxu Yuanbao versions to become the “top-tier of the top-tier.”

Core Reason 1: Extremely Limited Surviving Quantity, Scarcity Drives Value

The core logic in the collecting world is always “scarcity drives value.” As a commemorative coin for Cixi’s 70th birthday, the Double Dragon Longevity coin was minted in limited quantities, and due to the Qing court’s subsequent recall and melting, along with a century of war and loss, only a few dozen well-preserved examples exist today. Among these, those in pristine condition are even rarer, making them akin to winning the lottery.

According to authoritative data in the collecting community, the total mintage of the Double Dragon Longevity coin is less than 1000 pieces. After a century of wear and tear, only a few dozen remain, most of which are held in museums or by seasoned collectors, making them extremely hard to find on the market. This extreme scarcity has made the Double Dragon Longevity coin a “treasure” that collectors scramble to acquire, naturally driving its price to astronomical heights.

I know a veteran collector who has been collecting Guangxu Yuanbao for over 30 years and possesses thousands of different versions, yet he has never found a Double Dragon Longevity coin. He remarked, “Finding a Double Dragon Longevity coin is incredibly difficult. I’ve spent over a decade searching through national collecting markets and still haven’t found a single well-preserved one. Its scarcity is its core value.”

Core Reason 2: Pinnacle of Craftsmanship, Imitations Are Hard to Replicate

The minting craftsmanship of the Double Dragon Longevity coin is considered the pinnacle of late Qing minting techniques, and even today, it is challenging to replicate. It employs high-pressure minting processes that require exceptional standards for minting equipment, copper purity, and engraving techniques, achievable only by the top artisans at the Guangdong Mint.

Every detail of the dragon motif on the Double Dragon Longevity coin is exquisitely carved, with each scale distinct and neatly arranged, sharp claws, and an imposing gaze. The “Longevity” character and fonts on the front are bold and clear, with no blurriness or roughness; the copper used is of high purity, with a warm sheen and naturally even patina, retaining its exquisite quality even after a century.

This top-tier minting craftsmanship makes imitations difficult to create—most imitations on the market use standard minting processes, resulting in blurred dragon motifs, lack of three-dimensionality, rough fonts, and low copper purity. A careful inspection will reveal the flaws.

Core Reason 3: Deep Historical Value, Carrying a Century of Legend

The Double Dragon Longevity coin is not merely a commemorative coin but a witness to history, encapsulating a century of change and legend. As a commemorative coin for Cixi’s 70th birthday, it reflects the opulence and decay of the late Qing Dynasty, serving as a “living fossil” of modern Chinese history, carrying immense historical value.

In the collecting world, “historical significance” is often a critical factor in determining a collectible’s value. The Double Dragon Longevity coin, as one of the most representative commemorative coins of the late Qing Dynasty, carries memories of Cixi’s birthday and the decay of the Qing government, making its historical value far exceed its monetary worth.

Many collectors seek the Double Dragon Longevity coin not just for its potential for appreciation but to preserve a piece of history and memory. They believe that each coin carries its own story, reflecting a century of history, and this historical value is priceless.

(3) The Real Market Value of the Double Dragon Longevity Coin: Condition Determines Price, Differences Can Reach Hundreds of Times

Although the Double Dragon Longevity coin often sells for millions, not all of them fetch such high prices. Its value primarily depends on its condition and grading; different conditions can result in price differences of up to hundreds of times. New collectors must be clear about this to avoid being misled by inflated online prices.

Lower Grade (with obvious scratches, severe oxidation, blurred dragon motifs, incomplete fonts, and damaged edges): Market price 500,000-800,000 yuan (extremely rare);
Medium Grade (with minor scratches, slight oxidation, clear dragon motifs, mostly intact fonts, and edges): Market price 800,000-1,200,000 yuan;
Upper Grade (with no obvious scratches, slight oxidation, three-dimensional dragon motifs, intact fonts, and edges): Market price 1,200,000-1,500,000 yuan;
Top Graded Version (PCGS MS63 or above, with no scratches, no oxidation, perfect dragon motifs, fonts, and edges): Market price 1,500,000-2,000,000 yuan (based on the 2026 sale of 1.68 million).

A small reminder: 99% of the “Double Dragon Longevity coins” on the market are imitations. True authentic Double Dragon Longevity coins are exceedingly rare, and ordinary people are unlikely to encounter them. Do not harbor any illusions that your coin is a “million-dollar coin.”

4. The Rust Color Code: Is Older Copper Worth More? Understanding Rust to Avoid Paying Tens of Thousands in Intelligence Tax

Many new collectors of Guangxu Yuanbao fall into the misconception that newer coins are worth more and that cleaner coins are more valuable. In fact, the opposite is true for Guangxu Yuanbao; the rust (patina) is a testament to the passage of time and is crucial for identifying authenticity, determining age, and influencing value. A naturally patinated old copper coin is often worth more than a brand-new coin.

Having endured a century, Guangxu Yuanbao will naturally oxidize, forming a thin layer of rust, known as “patina.” This patina is a natural oxidation layer formed through prolonged exposure to air and moisture, protecting the coin from further oxidation while reflecting its age and historical charm.

In contrast, most imitations have artificially created patinas that appear stiff and forced, lacking the natural aging quality. By understanding the nuances of rust, you can easily identify authenticity and avoid paying tens of thousands in intelligence tax. Below, we will analyze the “time code” hidden in the rust of three Guangxu Yuanbao coins, all of which are “Guangdong Province Kuwei Seven Cash Two” versions, helping beginners easily learn.

(1) Authentic Patina: Natural and Warm, with Distinct Layers (Coin A)

Coin A is an authentic Guangxu Yuanbao, with its rust (patina) displaying a natural and warm hue. The central copper green is a peacock green, with even patina on the edges, oxidizing from the edges inward, layered and distinct, without noticeable spots or scratches, exuding a rich sense of history.

This natural patina is formed through a century of natural oxidation, with an even color and fine texture. Gently wiping it will not cause color loss or flaking. Moreover, the color of natural patina changes with light, exhibiting different luster, something that artificial patina cannot replicate.

Experts can easily identify that Coin A is older and of better quality, naturally increasing its value. An authentic Guangxu Yuanbao with natural patina is worth dozens or even hundreds of times more than a shiny imitation without patina.

(2) Poor Patina: Post-Care, Reverse Patina (Coin B)

Coin B is also an authentic Guangxu Yuanbao, but its patina is classified as “poor patina,” significantly lowering its value compared to Coin A. This coin’s center is shiny, lacking rust, while the edges exhibit a ring of dark brown rust, indicating a “reverse patina” caused by post-care.

Many collectors, upon acquiring copper coins, feel that the rust is unattractive and scrub the surface with sandpaper or steel wool, leading to a shiny center while the edges remain partially rusted, creating an awkward contrast.

This type of reverse patina damages the coin’s original state, reducing its collectible value. Although Coin B is authentic, its damaged patina lowers its market price by 30%-50% compared to Coin A, costing tens of thousands.

A small reminder: When collecting Guangxu Yuanbao, be sure to protect its patina. Do not scrub or polish it casually; even if the rust appears unattractive, it is a mark of time. Preserving the original patina is essential to maintaining its collectible value.

(3) Fake Patina: Illusory Rust, Easily Identifiable (Coin C)

Coin C is a counterfeit Guangxu Yuanbao, with its rust being artificially created “floating rust” that reveals its flaws at first glance. This coin appears dull all over, with a loose rust layer that falls off upon touch. Gently wiping it will result in color loss or leave black marks, lacking any natural aging quality.

Methods for artificially creating patina often involve soaking the coin in chemical agents or applying rust powders to induce rapid rusting. This rust appears stiff and uneven, with no layers, and can easily flake off.

Many newcomers, unfamiliar with the nuances of rust, can easily be deceived by this fake patina, spending tens of thousands on an imitation, ultimately losing their investment. By closely observing, one can identify the flaws of fake patina: colors that are too dark or too bright, loose and easily detachable rust, and a lack of natural layers that appear overly forced.

Quick Tips for Beginners: 3 Steps to Quickly Identify Authentic Patina and Avoid Intelligence Tax

Look at the Color: Natural patina is even and warm, with historical charm, changing with light; fake patina appears stiff and uneven, either too dark or too bright, lacking natural luster.
Feel the Texture: Natural patina has a fine texture and adheres well. Gently wiping it will not cause color loss or flaking; fake patina is loose and lacks adherence, easily flaking off upon touch.
Observe the Layers: Natural patina has distinct layers, with oxidation traces spreading from the edges inward, showing clear transitions; fake patina lacks layers, with chaotic rust patterns and no transitions, appearing overly deliberate.

5. Stories of Joy and Regret in the Collecting Community: Some Strike Gold, Others Sell for Pennies

With the skyrocketing popularity of Guangxu Yuanbao, especially after the Double Dragon Longevity coin fetched 1.68 million, many amusing stories have emerged in the collecting community. Some have unexpectedly turned a Guangxu Yuanbao into a “million-dollar fortune,” while others have regrettably sold their million-dollar treasures for mere pennies, and some have blindly followed trends, falling victim to scams. These stories reflect the heated market and serve as a warning.

(1) Joyful Stories: Finding a Double Dragon Longevity Coin Worth 1.2 Million

Last year, I met a man named Mr. Li, an ordinary office worker with no knowledge of collecting. During the Spring Festival, he returned to his hometown to clean out his old house and found a rusty Guangxu Yuanbao in an old wooden box. It looked unremarkable, and he thought it was just scrap copper, planning to sell it for a few dozen yuan.

Just as he was about to sell the coin, a friend who was a collector happened to visit. Upon seeing the coin, he was excited and quickly examined it. After a few minutes, his friend exclaimed, “You are so lucky! This is actually a Guangdong Province Guangxu Yuanbao Double Dragon Longevity coin, and in decent condition, it can sell for over 1 million!”

Mr. Li couldn’t believe it, thinking his friend was joking: “This old copper piece is worth 1 million? Don’t fool me!” His friend explained the history and characteristics of the Double Dragon Longevity coin, showing him market trends and auction records on his phone. Only then did Mr. Li realize that his “old copper piece” was a “million-dollar treasure.”

Following his friend’s advice, Mr. Li sent the coin to a reputable appraisal agency, where it received a PCGS MS62 rating. Soon after, a collector contacted him, willing to purchase the Double Dragon Longevity coin for 1.2 million. Mr. Li readily agreed, transforming his find into 1.2 million, equivalent to his salary over a decade.

Mr. Li chuckled, “I never expected to find such a treasure while cleaning out my old house. I was just hoping for a few dozen yuan! Now I’ll never casually throw away old items again; there might be other treasures at home!”

Another story involves Ms. Chen, a retired woman who inherited a Guangxu Yuanbao from her grandfather. She kept it in her wardrobe for decades without paying much attention. Last year, after hearing that Guangxu Yuanbao was valuable, she took the coin out, snapped some photos, and sent them to a legitimate collector’s shop. The owner immediately offered her 180,000 yuan, leaving Ms. Chen stunned and unable to believe her eyes.

Later, she sold the Guangxu Yuanbao for 180,000, using the money to buy a school district house for her grandson, bringing joy to her family. Ms. Chen said, “I never expected that my grandfather’s old copper piece could help solve such a big problem. It truly is a family heirloom!”

(2) Regretful Stories: Selling a Million-Dollar Treasure for 500 Yuan

With joy comes regret. In the collecting community, the most regrettable stories involve people who, not understanding the market, sold million-dollar treasures for mere pennies, only to see the soaring prices later and lament their decisions.

I know a collector named Mr. Zhao, who has been collecting Guangxu Yuanbao for over 30 years. He once owned a top-quality Double Dragon Longevity coin, which he purchased from an elderly person for 2000 yuan twenty years ago. In 2010, an antique dealer approached him, offering 500 yuan for the Double Dragon Longevity coin. Mr. Zhao, not understanding the versions and conditions, thought 500 yuan was a good deal and sold it without hesitation.

It wasn’t until 2026 that he learned of the 1.68 million sale of the Double Dragon Longevity coin, realizing that he had sold a top-quality version for just 500 yuan, which today would fetch at least 1.5 million. Mr. Zhao lamented, “If only I had understood the market back then, I wouldn’t have been so foolish to sell a million-dollar treasure for 500 yuan. It was a moment of ignorance that cost me a fortune!”

Another story involves a young man who was given a Guangxu Yuanbao by his grandfather. Thinking it was ugly and rusty, he carelessly tossed it aside. Later, during a home renovation, he mistook the coin for scrap and sold it for 5 yuan.

Last year, after hearing about the value of Guangxu Yuanbao online, he hurried to the scrap yard to find the coin, but it had already been sold. He said, “Looking back, I regret it so much. If I had just taken care of it, it could have been worth tens of thousands today. Even if I didn’t sell it, keeping it as a family heirloom would have been great. It was a careless mistake that cost me a fortune!”

Such regrets abound in the collecting community. Many people once thought Guangxu Yuanbao was just ordinary scrap copper, worth little, and either tossed it aside or were tricked into selling it cheaply by antique dealers. They never imagined that decades later, these seemingly insignificant “scrap copper pieces” would become “high-priced treasures” in the collecting market, with their value skyrocketing.

This serves as a reminder that the value of old items lies not only in their scarcity but also in their irreplaceability; once lost, they can never be recovered. If you have Guangxu Yuanbao at home, do not casually discard or sell it cheaply. Take a few minutes to understand the versions, assess their condition, and gauge the market before making a decision.

(3) Warning Stories: Beware of Scams in Collecting

With the explosive popularity of Guangxu Yuanbao, especially after the Double Dragon Longevity coin fetched 1.68 million, many unscrupulous individuals have seen an opportunity and started various scams to deceive inexperienced collectors into paying “intelligence tax.” I have summarized several common scams that everyone should be wary of to avoid being cheated.

Scam 1: Price Inflation, Exaggerating Value

Many scammers post online, claiming that “all Guangxu Yuanbao can sell for 1 million” or “ordinary Guangxu Yuanbao can sell for 500,000,” inflating the value to attract new collectors and then persuading them to buy their “fake coins” or “inferior coins,” or luring them into participating in so-called “auction events” with high fees.

In reality, 99% of Guangxu Yuanbao on the market are ordinary circulating versions, with average conditions only worth a few thousand to tens of thousands. Only top-tier specimens like the Double Dragon Longevity coin are priced above 1 million, which ordinary people are unlikely to encounter. New collectors must view these inflated prices rationally and avoid being misled by online hype.

Scam 2: Selling Fakes as Real, Deceiving New Collectors

Some scammers produce counterfeit Guangxu Yuanbao, especially fake Double Dragon Longevity coins, mimicking the designs, colors, and patinas of authentic coins, even forging grading certificates to sell them at low prices, claiming they are “authentic Guangxu Yuanbao” worth over 1 million. Many new collectors, unfamiliar with identification, easily fall for this and end up buying fakes, losing their investments.

I recall a year when a friend of mine bought a “Double Dragon Longevity” coin online for 300,000 yuan. Upon returning home and examining it with a magnifying glass, he found the dragon motifs were blurred and the patina was stiff, realizing he had been scammed, which left him furious. Later, he had it appraised and discovered it was merely a counterfeit made from ordinary copper, costing less than 100 yuan.

Scam 3: Laser Modifications, Selling Inferior as Premium

This is one of the most common scams in the collecting community. Some scammers take an ordinary Guangxu Yuanbao, use a laser machine to modify the fonts, patterns, and dragon motifs, turning a standard version into a rare one, and then artificially age it to create a fake patina, selling it as a “rare treasure” at a high price.

Such modified coins will reveal obvious signs of alteration under magnification, with stiff lines and unnatural features, and the patina will appear forced and stiff. New collectors must be cautious and avoid buying Guangxu Yuanbao from strangers to prevent being scammed.

Scam 4: Fake Grading, Cheating for Money

Some scammers forge grading certificates from reputable appraisal agencies, falsely labeling ordinary Guangxu Yuanbao or even counterfeit coins as “PCGS MS63” or “MS64,” then selling them at high prices, claiming they are “top-tier rare versions” worth over 1 million.

New collectors must verify grading certificates, as legitimate ones have unique identification numbers that can be checked on official websites. If the number cannot be verified or does not match the actual coin, it is likely a forgery. When purchasing Guangxu Yuanbao with grading certificates, always check the grading number on the official website to confirm its authenticity before proceeding with the transaction.

I want to remind everyone: collecting carries risks, and following trends requires caution. If you wish to collect Guangxu Yuanbao, first educate yourself about the relevant knowledge, learn to identify authenticity and distinguish versions, avoid blindly following trends, and never trust inflated online prices. If you’re uncertain about the authenticity or value of your Guangxu Yuanbao, consult reputable collector shops or professional appraisal agencies for assistance.

6. Practical Tips: 4 Steps to Quickly Identify Authentic Guangxu Yuanbao, Even Beginners Can Learn

Many people own Guangxu Yuanbao but are unsure of their authenticity or value. In fact, determining the authenticity of Guangxu Yuanbao is not difficult; you don’t need professional appraisal tools or experts. By mastering these four simple techniques, even beginners can easily learn to avoid being deceived by counterfeit coins.

(1) Examine the Appearance: Authentic Coins Have Clear Details, Counterfeits Are Blurred and Rough

Authentic Guangxu Yuanbao, whether ordinary or rare versions, exhibit clear details and smooth lines, without blurriness or deformation. The four characters “Guangxu Yuanbao” on the front are bold and clear, with sharp edges and no roughness; the minting province/bureau and denomination fonts are also clear and neatly arranged. The dragon motifs on the reverse are finely detailed, with clear scales, claws, and tails, all distinctly visible.

In contrast, counterfeit Guangxu Yuanbao often have blurred and rough details, with stiff lines lacking fluidity. The “Guangxu Yuanbao” font may be unclear or deformed, with rough edges; the minting province/bureau and denomination fonts may be jumbled or even contain typos. The dragon motifs may have merged scales, deformed claws, and blurred tails, appearing as if they were merely stamped, with significant detail loss.

Additionally, the diameter, thickness, and weight of authentic Guangxu Yuanbao adhere to fixed standards. For example, the ten-cash version has a diameter of approximately 28 mm, a thickness of about 1.5 mm, and a weight of around 7.2 grams, with a tolerance of no more than 0.3 grams. Counterfeit coins often deviate from these measurements, either being too large or too small, which can be felt by hand.

(2) Examine the Edge: Authentic Coins Have Regular, Textured Edges, Counterfeits Are Rough and Irregular

The edge is a critical factor in determining the authenticity of Guangxu Yuanbao. Authentic Guangxu Yuanbao have neatly arranged edges with clear textures, evenly spaced grooves, and a tactile sense of depth without burrs or roughness. Different mints may have variations in edge design, but they all follow fixed patterns; for instance, the Guangdong version has fine, closely spaced grooves, while the Hubei version has slightly coarser, evenly spaced grooves.

Counterfeit Guangxu Yuanbao often have rough, irregular edges, with blurred grooves and uneven spacing. Touching them may reveal either excessive smoothness without depth or noticeable burrs and roughness. Some counterfeit rare versions may have poorly mimicked edges, making them easy to identify.

(3) Examine the Patina: Authentic Coins Have Natural, Warm Patina, Counterfeits Have Stiff, Forced Patina

Patina is a key factor in determining the authenticity of Guangxu Yuanbao. Authentic Guangxu Yuanbao have patina formed through natural oxidation, exhibiting warm hues without harsh brightness, displaying natural copper greens and light browns. The patina is even, without noticeable spots or scratches, appearing pleasant and rich in historical charm. Gently wiping it will not cause color loss or flaking, and the texture is fine with good adherence.

In contrast, counterfeit Guangxu Yuanbao often have artificially created patina that appears stiff and forced, with colors that are either too bright or too dark, lacking natural aging quality. Artificial patina is often uneven, with noticeable spots and scratches, and gently touching it may result in color loss.

A useful tip for identifying patina authenticity is to gently wipe the surface of the coin with a soft cloth. If it is natural patina, the surface will remain warm and glossy without color loss; if it is artificial, color loss will occur, and the surface will become dull.

(4) Listen to the Sound: Authentic Coins Produce a Clear, Pleasant Sound, Counterfeits Sound Dull and Muffled

Authentic Guangxu Yuanbao are made from high-purity copper, which is soft. When two authentic Guangxu Yuanbao are gently struck together, they produce a clear, ringing sound, pleasant to the ear without dullness or muffled tones, and the sound lasts longer.

In contrast, counterfeit Guangxu Yuanbao are often made from low-purity copper mixed with iron or lead, making them hard. When two counterfeit coins are struck together, they produce a dull, thudding sound, lacking clarity, and the sound dissipates quickly.

Supplement: If you use the above four techniques and still cannot determine the authenticity of your Guangxu Yuanbao or want to know its specific value, consult reputable collector shops or professional appraisal agencies for reliable assistance to avoid being scammed.

7. Real Market Value Exposed: Guangxu Yuanbao Price List for 2026, Don’t Be Misled Again

Many people are most concerned about the question: How much is my Guangxu Yuanbao worth? To provide clarity on the real market value of Guangxu Yuanbao in 2026, I have compiled a detailed price list covering different provinces, versions, and conditions. You can refer to this list to assess the value of your coins and avoid being misled by inflated prices online.

(1) Ordinary Circulating Guangxu Yuanbao (Common Versions)

Lower Grade (with obvious scratches, severe oxidation, blurred dragon motifs, incomplete fonts, and damaged edges): Market price 3000-10,000 yuan each;
Medium Grade (with minor scratches, slight oxidation, clear dragon motifs, mostly intact fonts, and edges): Market price 10,000-50,000 yuan each;
Upper Grade (with no obvious scratches, slight oxidation, clear dragon motifs, intact fonts, and edges, with natural patina): Market price 50,000-150,000 yuan each;
Exceptional Grade (nearly uncirculated, with no scratches or oxidation, even patina, perfect dragon motifs, fonts, and edges): Market price 150,000-300,000 yuan each (extremely rare).

(2) Scarce Versions of Guangxu Yuanbao (Non-Double Dragon Longevity)

Hubei Province Guangxu Yuanbao (Kuwait Seven Cash Two): Market price 300,000-800,000 yuan each (varying prices based on condition);
Jiangsu Province Guangxu Yuanbao (Kuwait Seven Cash Two): Market price 250,000-700,000 yuan each;
Zhejiang Province Guangxu Yuanbao (Kuwait Seven Cash Two): Market price 200,000-600,000 yuan each;
Trial Versions and Sample Coins (Various Provinces): Market price 800,000-2,000,000 yuan each (extremely rare).

(3) Top-Tier Treasures: Guangdong Province “Double Dragon Longevity” Coin

Lower Grade (with obvious scratches, severe oxidation, blurred dragon motifs, incomplete fonts, and damaged edges): Market price 500,000-800,000 yuan each (extremely rare);
Medium Grade (with minor scratches, slight oxidation, clear dragon motifs, mostly intact fonts, and edges, with natural patina): Market price 800,000-1,200,000 yuan each;
Upper Grade (with no obvious scratches, slight oxidation, three-dimensional dragon motifs, intact fonts, and edges, with even and warm patina): Market price 1,200,000-1,500,000 yuan each;
Top Graded Version (PCGS MS63 or above, with no scratches, no oxidation, perfect dragon motifs, fonts, and edges): Market price 1,500,000-2,000,000 yuan each (based on the 2026 sale of 1.68 million).

Important Reminders:

The prices listed above reflect the real market conditions as of April 2026 and are for reference only. Prices may fluctuate based on market supply and demand, condition, and grading scores, and are not fixed;
The province, version, condition, and grading of Guangxu Yuanbao are key factors in determining prices. Coins of the same province and version will have higher prices with better condition and higher grading scores, with differences potentially reaching dozens of times.

What is Vibe Coding? A New Paradigm in Programming

Fri, 01 May 2026 00:00:00 +0000

What is Vibe Coding?

Vibe Coding, literally translated as “atmosphere programming,” was first proposed by Andrej Karpathy, co-founder of OpenAI and former AI director at Tesla, in February 2025. In 2025, it was named the word of the year by Collins Dictionary, even surpassing the buzz around “AI” itself.

The core concept is simple: use natural language to describe intentions and let AI write the code, while humans are responsible for reviewing and guiding the process. Karpathy described it as:

“You are completely immersed in the vibe, forgetting the existence of code. You look at things, talk, run programs, copy and paste, and most of the time it just works.”

This might sound like a fantasy, but data shows that 92% of American developers have adopted some form of Vibe Coding. This is not a niche experiment; it is a transformation happening in the industry.

Three Core Elements

Prompt-Driven: Developers describe “what they want” in natural language, focusing on intent rather than syntax. You don’t need to know how to implement it; you just need to clarify the goal.
Multi-Agent Collaboration: Multiple AI roles, such as planners, coders, testers, and debuggers, work together to form a complete development pipeline.
Backend Model Capability Boundaries: The complexity of problems Vibe Coding can solve directly depends on the AI model’s capability limits.

Three Generations of Programming Paradigms

Let’s rewind to the year 2000. Back then, writing code was an “exclusive skill” for programmers—requiring mastery of syntax, familiarity with APIs, and understanding framework details. Every character was typed by hand, and every bug had to be debugged by the programmer.

Then, AI-assisted programming tools emerged. GitHub Copilot came into play, automatically completing code snippets as you typed. However, at this stage, humans were still at the core, with AI as a co-pilot—you still needed to know how to write, just with less typing.

Vibe Coding represents a complete paradigm shift:

Dimension	Traditional Manual Coding	AI-Assisted (Copilot)	Vibe Coding
Core Logic	Human defines requirements	Handwrite all code	Human leads the process
Human-Machine Roles	Human is the executor	Machine is a tool	Human is the decision-maker/director, AI is the execution partner
Skill Requirements	Proficient in syntax, API, framework details	Master coding, use AI for efficiency	Precise requirement breakdown, architectural thinking, code review

In simple terms: traditional programming is “I teach you how to do it,” AI assistance is “you help me complete it,” while Vibe Coding is “you do it all for me.”

Why is it Gaining Popularity?

The rise of Vibe Coding is not coincidental; it is a result of multiple converging factors:

Efficiency Revolution

Prototype development efficiency has increased by 5-10 times, with over 70% of repetitive coding tasks handed over to AI. This means that what used to take a week to complete can now be done in just a few hours.

Predictions indicate that the global AI-assisted programming tools market will reach $8.5 billion by 2026. This is not a small-scale operation but a multi-billion dollar market.

Lower Barriers to Entry

In the past, “not knowing how to code” meant being out of reach of software development. Now, anyone who can clearly express their needs in natural language can potentially create a usable software product. The barrier to programming has shifted from “knowing how to write code” to “knowing how to communicate.”

Flow Experience

Traditional programming requires frequent context switching—writing code, checking documentation, debugging, etc. This fragmented thinking can be exhausting. Vibe Coding allows you to immerse yourself in the thought of “what you want” rather than the details of “how to implement it.” This flow experience is why many developers find it irresistible.

Mainstream Tools Overview

There are many tools for Vibe Coding, each with different positioning and applicable scenarios:

Cursor ($20/month): The most mature AI code editor, with a “Composer” mode supporting multi-file generation and editing, suitable for deep users.
Claude Code (CLI): Terminal-level AI programming assistant, suitable for complex tasks, supports file system operations.
Trae (ByteDance, free): User-friendly in China, Chinese UI, completely free, the best cost-performance ratio.
Lovable: Quickly generates UI + Supabase backend, suitable for full-stack prototypes.
Bolt.new: Runs in the browser, the fastest from idea to prototype.
v0 (Vercel): Focuses on UI component generation, designer-friendly.
GitHub Copilot: The most widely adopted AI programming assistant with a mature ecosystem.

Recommendations

Beginners are recommended to start with Trae (free) or Bolt.new (browser access); experienced developers can try Cursor or Claude Code.

Quick Reference for Applicable Scenarios

✅ Highly Suitable: Rapid prototyping, repetitive development, generating test cases (saves 70% time)

⚡ Marginally Suitable: Ordinary component development, simple API integration (saves 50% time)

❌ Not Suitable: Complex business logic, performance optimization, architectural design, security-sensitive scenarios

Controversies and Risks: Critical Reflections Amidst the Excitement

Vibe Coding is not a silver bullet. Beneath its glow lie several risks.

“Spaghetti Code” Risk

The biggest issue with AI-generated code is: “as long as it runs, it’s fine.” It does not consider long-term maintainability, architectural elegance, or proactively optimize performance. As projects become more complex, these technical debts can snowball.

The controversy surrounding the 19,000-line PR in Node.js revolves around this—while the code was written by AI, the maintenance falls on humans.

Developer Skill Degradation

This is not an alarmist statement. A Japanese developer named Pia Torain shared her experience on Twitter: after using AI programming tools for four months, she found she had “lost the ability to write code.” Not that she couldn’t do it at all, but she was no longer accustomed to that way of thinking.

Over-reliance on AI completion may cause programmers to lose their “touch.” It’s akin to how people who use navigation apps may lose their ability to navigate independently.

Context Forgetting

AI models have context window limitations. As projects grow larger, the AI may “forget” previous settings, leading to inconsistent code styles and logical contradictions. This is particularly evident in large projects.

Security and Compliance

AI-generated code may contain security vulnerabilities and could involve copyright conflicts (since training data may include open-source code). Therefore, manual review remains essential before use.

How to Use Vibe Coding Correctly?

Renowned developer Simon Willison made a key distinction:

“Using LLM to build software without reviewing the code is what Vibe Coding is about.”

The difference lies in whether you are responsibly using AI assistance or blindly trusting AI.

Five-Step Workflow

1. Clearly Define Requirements
Use natural language to accurately describe what you want, including inputs, outputs, and edge cases.

2. Let AI Generate Code
Provide AI with sufficient information and context, stating specific requirements.

3. Manually Review Code
Check the logic, security, performance, and maintainability line by line.

4. Run Tests to Validate
Execute the code and verify if the output meets expectations.

5. Iterate and Optimize
Provide feedback to AI based on results, continuing adjustments until satisfied.

Core Principle: You are the director, not the audience. Every character generated by AI requires your review. Do not blindly copy and paste; the responsibility for the code always lies with you.

The Future of Programmers: Extinction or Evolution?

With every technological revolution, there are concerns that “some professions will disappear.” Did typists vanish? Did telephone operators disappear? Did carriage drivers disappear? The answer is: old professions will transform, and new professions will emerge.

The trend of Vibe Coding is clear: low-value “code typists” will disappear—those who repeatedly write similar code will be outperformed by AI.

At the same time, high-value “system architects” and “AI orchestrators” will become more scarce. Because:

AI can generate code but cannot replace human business understanding and product thinking.
AI can execute tasks but cannot make “should we do this” decisions.
AI can optimize parts but cannot control overall architecture.
AI-generated code still requires human review; only those who understand code can effectively review it.

The Ultimate Shift in Programming

In the future, the core capability of programming will shift from “how to implement” to “what to want.”

Developers need to upgrade their role positioning: from “code writers” to “intent architects” and “AI coaches.”

New Core Competencies:

Precise requirement breakdown capability
System architecture design thinking
AI prompt engineering
Code review and quality control
Business understanding and product thinking

Vibe Coding is not the end of programming but a new starting point. It liberates human creativity, allowing us to focus on what truly matters—the way to solve problems, not the code that solves them.

In the face of this transformation, those who refuse to learn will be eliminated, and those who blindly worship will fall into traps. True experts will turn AI into their superpower while maintaining their ability to think independently.

After all, AI can write code for you, but it cannot think for you.

Andrej Karpathy Discusses Agentic Engineering and AI's Future

Thu, 30 Apr 2026 00:00:00 +0000

Introduction

On April 29, Andrej Karpathy, a key figure in the development of Tesla’s Autopilot and a significant player at OpenAI, spoke at an event hosted by AI Sent. He delved into the technological leaps of current AI agents and their profound impacts on software and hardware ecosystems.

Karpathy introduced the concept of “agentic engineering” to differentiate it from last year’s “vibe coding,” with the former referring to the continuation and acceleration of quality standards in professional software development.

Key Concepts

In terms of productivity, which is a primary concern for the market, Karpathy distinguished between two core concepts: “vibe coding” and “agentic engineering.”

Karpathy hinted at the existence of numerous high-value, verifiable reinforcement learning environments that remain largely unaddressed by leading labs, presenting a vast blue ocean for startups to fine-tune and monetize.

The Conversation

Host: We are honored to welcome our first special guest. He has played a pivotal role in building modern AI and is dedicated to explaining it, sometimes even renaming it. He is one of the co-founders of OpenAI, where he helped launch the company and was instrumental in making Tesla’s autonomous driving system operational. He possesses a rare talent for making complex technological changes sound straightforward and logical. Many are aware that he coined the term “vibe coding” last year. However, in recent months, he made a surprising statement: he has never felt more outdated as a programmer than he does now. Let’s start our conversation from here. Andre, thank you for being here.

Andrej Karpathy: Hello, I’m glad to be here to kick things off.

Host: Just a few months ago, you mentioned feeling more outdated as a programmer than ever. Hearing this from you is quite surprising. Can you share your feelings behind this? Is it excitement or unease?

Andrej Karpathy: It’s both. Like many, I’ve been using various agent tools over the past year, like Claude Code. It performs well with code snippets, although it sometimes makes mistakes that require manual fixes, but overall, it’s quite helpful.

Last December marked a significant turning point for me. I was on vacation with more time to reflect, and I noticed that with the latest models, the code snippets produced were correct directly, and I kept asking for more, and they remained correct. I can hardly remember the last time I corrected it. I began to trust the system more and entered a state of “vibe coding.”

That was a very distinct shift. I tried to emphasize this on Twitter (now X), as many people’s interactions with AI last year were still at the level of using ChatGPT. However, a reevaluation is necessary, especially since December, when a fundamental change occurred—particularly in the dimension of agent workflows, which became genuinely usable. Since then, I’ve dived deep into this rabbit hole, and my side project folder is filled with various oddities as I continuously use AI to write code. That’s roughly what happened in December. Since then, I’ve been observing and contemplating its impacts.

The Evolution of Software

Host: You’ve discussed the idea that “LLMs are a new type of computer”—not just better software but a new computing paradigm. Software 1.0 had explicit rules, Software 2.0 involved learned weights, and Software 3.0 is where we are now. If this framework is correct, what different practices would a team adopt when they genuinely believe in this shift?

Andrej Karpathy: Yes, indeed. In the Software 1.0 phase, I was writing code; in Software 2.0, I was programming by building datasets and training neural networks, where programming became about organizing datasets, designing objective functions, and neural network architectures.

What happens next is that when you train these GPT models or large language models on a sufficient number of tasks, they must complete all tasks in the dataset due to training on the entire internet, making them, in a sense, a programmable computer.

In the Software 3.0 phase, your “programming” shifts to “prompt engineering,” where the content in the context window acts as the lever for manipulating the interpreter—the LLM that interprets your context and executes computations in the digital information space. This is essentially the nature of this transformation.

Several examples have deepened my understanding of this, and I think they are worth sharing.

When OpenClaw was released, you would typically expect to install it using a shell script. However, to accommodate various platforms and types of computers, such shell scripts often become extremely bulky and complex. The installation method for OpenClaw, however, is to copy a segment of text to your agent, which then completes the installation. This method is far more powerful because you operate under the Software 3.0 paradigm, not needing to specify every configuration detail precisely. The agent possesses its own intelligence; it understands the instructions, observes your operating environment, takes intelligent actions to get everything running, and autonomously debugs in a loop. This is immensely powerful.

Another more extreme example comes from my experience building MenuGen. The idea behind MenuGen is that when you go to a restaurant, they hand you a menu, but it usually lacks pictures, so you have no idea what the dishes look like. I wanted to take a photo of the menu and get an approximate visual of each dish. So, I built an application using “vibe coding” that could upload a photo, process it, deploy it on Vercel, re-render the menu, list all dishes, and call an image generation model to perform OCR recognition on each dish name, generating corresponding images for users.

Later, I saw the Software 3.0 version of this, which utterly shocked me: I just needed to hand a photo to Gemini and say, “Overlay this content onto the menu with Nana Banana.” Nana Banana directly returned an image—my photo of the menu—but at the pixel level, it rendered images of each dish listed on the menu. This astonished me because my entire MenuGen was actually redundant—it operated under an old paradigm, and that application shouldn’t even exist. The Software 3.0 paradigm is much more primitive; the neural networks do more of the work, with images as input and output, requiring no application layer in between.

Thus, I believe people need to reevaluate their thinking frameworks, not limit themselves to existing paradigms, and not merely view it as an accelerated version of current things. What is truly happening is that new possibilities are now available. Returning to your question about programming, I think this issue reflects an old way of thinking—because it’s not just about programming becoming faster; it’s about the broader sense of information processing now being automatable, which is not just about code.

Past code operated on structured data; you wrote code on structured data. But, for instance, my “LLM knowledge base” project essentially allows LLMs to generate a wiki for your organization or personal use—this is not a program; it’s something that couldn’t exist before because no code could generate a knowledge base from a pile of facts. But now you can input these documents and recompile and reorder them in different ways to create new, valuable content—this is a reinterpretation of data. These are all new things that were previously impossible. So I always want to return to this question: not just what can now be done faster, but what new opportunities that were previously impossible are now available. I even find the latter more exciting.

Future Opportunities

Host: I love the evolutionary path of MenuGen you described and the contrast. I believe many people have also followed your programming journey from last October to this February. If we continue to extrapolate, comparing the historical nodes of building websites in the 90s, mobile applications in the 20s, and SaaS in the last cloud era, what are the things that today are largely unbuilt but will seem obvious in hindsight?

Andrej Karpathy: Continuing from the MenuGen example, much code shouldn’t exist; neural networks take on the bulk of the work. I genuinely feel this extrapolation curve will become very strange.

One can imagine, in a sense, a complete neural computer is possible—imagine a device that takes raw video and audio, inputs it into a system that is essentially a neural network, and renders an interface through a diffusion model, tailored to that unique moment.

In the early days of computing, people were confused about what computers would ultimately look like—would they resemble calculators or neural networks? In the 50s and 60s, this wasn’t obvious. Of course, we took the calculator path and established a classical computational system, while neural networks currently run virtually on existing computers. However, one can envision a future where this all flips—neural networks become the host process, and CPUs become co-processors. We’ve already seen that chart where the computational demands of neural networks will surpass and dominate floating-point operations.

So you can imagine a very strange, very alien future form: neural networks handling the bulk of heavy lifting, with tool calls merely as historical remnants of certain deterministic tasks. What truly dominates everything is a network of neural networks somehow interconnected. This extrapolated endpoint may be extremely strange, but I think we are likely to step by step arrive there. How we traverse this path remains to be seen.

Verifiability and Automation

Host: I want to discuss the concept of “verifiability”—AI will automate tasks in verifiable domains faster and more easily. If this framework holds, what jobs will change at an unexpected speed? What professions do people think are safe but are actually highly verifiable?

Andrej Karpathy: I’ve spent some time thinking about verifiability. Traditional computers can easily automate things that can be explicitly described in code; this round of large language models can easily automate things that can be verified. The reason is that leading labs, while training these large language models, are constructing vast reinforcement learning environments where models are rewarded based on verifiable signals. It’s precisely because of this training method that these models ultimately form a “serrated” capability map—strong in verifiable areas like mathematics and code but relatively bland and rough in areas with less verifiability.

I wrote about verifiability to understand why these models have such uneven capabilities. Part of this is due to how labs train models, but I think it also relates to the labs’ focus—what data they happen to include. Some things are more economically valuable, leading to more training environments because labs want the models to perform well in those scenarios. Code is a typical example. There may be numerous verifiable environments that could have been included in training, but due to their lower practical value, they didn’t make it into the dataset.

For me, a classic example that illustrates “serrated intelligence” used to be: “How many letter r’s are in the word strawberry?” The model was notorious for getting this wrong. The current models have corrected this issue, but new examples have emerged: I want to go to a car wash 50 meters away; should I drive or walk? The most advanced models today would tell you to walk because it’s too close. But the issue is, you’re going to a car wash.

How strange is that—the most advanced Claude Opus 4.7 can simultaneously refactor 100,000 lines of code or discover zero-day vulnerabilities yet tells me to walk to the car wash. This is truly unbelievable.

This serrated capability indicates that, first, there may be fundamental issues in some areas of the model; second, you still need to be involved, treating it as a tool while maintaining some control over its behavior. So all my writing on verifiability ultimately aims to understand why these models have serrated capabilities and whether there’s a pattern to it. I believe the answer lies in a combination of “verifiability” and “lab focus.”

Another anecdote that illustrates the point: from GPT-3.5 to GPT-4, people noticed a significant improvement in the model’s chess-playing ability. Many assumed this was just a natural evolution of capability, but the reality is—this is public information; I saw it online—a large amount of chess game data was added to the pre-training set. Just due to the change in data distribution, the model’s chess ability surged beyond normal progression. Someone at OpenAI decided to include this data, and thus this capability suddenly skyrocketed.

This is why I emphasize this dimension: we are somewhat influenced by lab decisions; what they happen to include in training is what you get. You receive something without a manual; it works well in some cases and poorly in others, and you need to explore it.

If your application happens to fall within the coverage of reinforcement learning training, you will thrive; if it falls outside the data distribution, you will struggle. You need to figure out where your application lies; if it’s not within the covered loop, you really need to consider fine-tuning and do some of your own work because expecting large language models to work out of the box is unrealistic.

Advice for Founders

Host: If you were a founder today, considering starting a business, and you found a problem you believe you could solve in a verifiable domain, but you observe that labs have already achieved escape velocity in the most obvious directions—math, code, etc.—what advice would you give to the founders here?

Andrej Karpathy: I think this ties back to the previous question. Verifiability makes something feasible under the current paradigm because you can inject a large amount of reinforcement learning into it. This can still hold true even if labs aren’t directly focusing on a particular area. If you are in a verifiable setting and can create reinforcement learning environments and data samples, this effectively opens up a path for you to fine-tune, and you might benefit from it.

This is a technically feasible path: if you have a large, diverse dataset of reinforcement learning environments, you can use your preferred fine-tuning framework, pull this lever, and achieve quite decent results. I don’t want to specify which examples, but I genuinely believe there are some highly valuable reinforcement learning environments that haven’t been included in training…

That said, I don’t want to intentionally tease on stage, but such examples do exist.

Host: Conversely, what things still seem like they could be automated but are actually far from realization?

Andrej Karpathy: I do believe that almost everything can ultimately be designed to be verifiable; some are just easier than others. Even tasks like writing could be envisioned with a set of LLM judges scoring them, likely yielding quite decent results. So it’s more about the difficulty than whether it can be done. I think, fundamentally, everything can be automated.

The Shift from Vibe Coding to Agentic Engineering

Host: Last year, you coined the term “vibe coding.” Today, we find ourselves in a more serious and rigorous engineering world. What do you think the difference is? How would you label the stage we are in now?

Andrej Karpathy: I believe vibe coding is about raising the lower limit of everyone’s capabilities in software—overall raising the floor, allowing anyone to do anything with vibe coding, which is remarkable.

“Agentic engineering,” on the other hand, is about maintaining the original quality standards of professional software on this foundation. You can’t introduce security vulnerabilities due to vibe coding; you still bear responsibility for your software as before. But can you do it faster? Spoiler: yes. But how can you achieve that?

When I refer to it as “agentic engineering,” it’s because I believe it truly is an engineering discipline. You have these agents—they are somewhat “serrated” in nature, some unreliable, some random, but extremely powerful. The question is how to coordinate them without sacrificing quality standards to speed things up. Doing this well is the domain of agentic engineering.

I see these two concepts as different: one is about raising the lower limit, while the other is about breaking through the upper limit. What I’m observing is that the upper limit of agentic engineers’ capabilities is extremely high. Previously, people talked about “10x engineers,” but I believe the amplification now far exceeds that. Ten times is not the acceleration you can achieve; from my current perspective, the output of someone truly proficient in this field far exceeds tenfold.

The Future of Programming

Host: I love this framework. Last year, Sam Altman said something memorable when he visited AI Sent: different generations use ChatGPT differently. People in their thirties see it as a replacement for Google search, while teenagers view ChatGPT as an entry point to the internet. In today’s programming landscape, what is the analogy? If we observe two people using OpenAI’s Codex or Anthropic’s Claude Code to write code—one is a typical user, and the other is a true AI-native programmer—how would you describe the differences between them?

Andrej Karpathy: I think the core lies in making the most of the available tools, utilizing all their features, and continuously investing in their workflows. Just as earlier engineers would maximize the use of VIM or VS Code, now it’s about maximizing Claude Code or Codex.

In this regard, a related thought is worth mentioning. If many teams are now hiring agentic engineers, I believe most recruitment processes haven’t adapted accordingly. If you’re still giving puzzles for candidates to solve, you’re still in the old paradigm. The new recruitment process should be: give me a big project and see if you can get it done—like building a Twitter clone, doing it well and securely, then letting agents simulate user activity on your deployed site, and if it gets breached, that’s a failure. I think that’s roughly what the future will look like—observing candidates’ performance in building large projects and integrating tools in such scenarios.

The Value of Human Skills

Host: As agents become capable of more tasks, which human skills do you think will become more valuable rather than less valuable?

Andrej Karpathy: Currently, agents are essentially at the “intern” level—they are capable but still unstable. So you still need to take responsibility for aesthetics, judgment, taste, and moderate supervision.

One of my favorite examples that illustrates the oddities of agents: in MenuGen, users register with a Google account but purchase credits with a Stripe account—each has its own email. As a result, my agent, when handling credit top-ups, attempted to match the Google email with the Stripe email because there was no persistent user ID; it tried to associate the two accounts using email. However, users can completely use different emails for Stripe and Google, making it impossible to link funds to accounts. This error is very strange—why use email for cross-system identity association? Emails can be arbitrary and different.

Such errors are precisely what agents still make: you need to take responsibility for specifications and overall planning. Speaking of “planning mode,” it’s undoubtedly useful, but I think there’s a more general principle: you need to design a very detailed specification with the agent, perhaps in document form, and then let the agent write it while you supervise and control top-level architectural decisions, with the agent handling the implementation details.

For instance, regarding tensor operations in neural networks, there are numerous details between PyTorch, NumPy, and Pandas—keepdims or keepdim, dim or axis, reshape or permute or transpose—I can no longer remember these because I don’t need to. These details can be delegated to “interns” because they have excellent memory. However, you still need to understand the essence, such as whether there’s a tensor at the bottom, and a view; you can operate different views of the same memory, or you can have different storage—though the latter is less efficient. You still need to grasp these concepts so as not to perform inefficient operations like unnecessary memory copies.

So you are responsible for taste, engineering design, and architecture, ensuring the overall direction is correct, ensuring requirements are accurate—like “we need to use a unique user ID to associate all data”—these design decisions are yours to make. Engineers are responsible for filling in the gaps; that’s our current situation.

The Future of Taste and Judgment

Host: Do you think this taste and judgment will become less important over time, or will its upper limit continue to rise?

Andrej Karpathy: I genuinely hope for improvement in this area. Currently, it cannot improve; I think it’s still because it hasn’t been incorporated into reinforcement learning—perhaps there are no corresponding aesthetic rewards, or the existing rewards are insufficient.

To be honest, when I look at code, I sometimes feel a bit horrified—not every output is particularly good; often it’s bloated, with a lot of copy-pasting and weak abstractions. While it runs, it’s truly ugly.

A particularly illustrative example is the nanoGPT project—I’ve been trying to simplify the LLM training code to the extreme. The model performs very poorly on this task. I keep trying to prompt the large language model to simplify further, but it just doesn’t work. You feel like you’re completely outside the reinforcement learning loop; it’s clearly a hard push, not that flowing state.

Thus, I believe humans are still the dominant force in this area, but fundamentally, there’s no principled barrier preventing this from changing; it’s just that labs haven’t achieved this yet.

Serrated Intelligence and Its Implications

Host: I want to return to the topic of “serrated intelligence.” You wrote an insightful article discussing the comparison between “animals and ghosts”—we are not building animals but summoning ghosts. These ghosts are serrated agents shaped by data and reward functions, rather than driven by intrinsic motivation, curiosity, or empowerment—those are products of evolution. Why is this framework important? How does it change the way we build, deploy, evaluate, and even trust these systems?

Andrej Karpathy: I wrote this article because I wanted to clarify what these entities really are. If you have an accurate cognitive model of them, you can use them better. I’m not sure how practical this framework is; it may have some philosophical implications, but I think its core lies in accepting the fact that these entities are not animal intelligence. If you shout at them, they won’t perform better or worse; it has no impact. It’s all just statistical simulation loops, grounded in pre-training—statistics, with reinforcement learning layered on top.

Perhaps it’s just a mindset—what mindset do I bring to face them, what might work, what might not, and how to adjust it. I can’t say I’ve summarized “here are five clear conclusions to make your system better”; it’s more about maintaining a cautious attitude towards it and gradually exploring over time.

The Future of Intelligent Agents

Host: That’s a starting point. Now, you are deeply involved with agents that are not just chatbots—they have real permissions, local context, and can take actions on your behalf. When we all start living in such a world, what will it look like?

Andrej Karpathy: I think many here are excited about native agent environments. Everything must be rewritten—currently, everything is fundamentally designed for humans and needs to be re-migrated. The various frameworks and libraries I use now are fundamentally still written for people. This is my biggest complaint: why are there still instructions telling me what to do? I don’t want to do it myself. What I want to know is: what should I copy and paste to my agent? Every time I see instructions like “please visit this URL,” it feels very awkward.

I think everyone is pondering this question: how to break down the workflows that need to be completed into perceptions of the world and executions in the world? How to make everything agent-friendly? Essentially, it’s about describing it to the agent first and building a lot of automation around highly readable data structures for LLMs.

I hope to see a lot of agent-friendly infrastructure. For example, in MenuGen, a significant part of the trouble isn’t writing the code itself but deployment—I have to deal with various services, configure DNS, and jump around in various settings menus, which is very tedious. What I hope for is: I give an LLM a prompt, and it builds MenuGen and automatically deploys it without me touching anything; it just runs online. This might be a good test standard to judge whether our infrastructure is becoming increasingly agent-friendly.

Ultimately, I believe we are moving toward a world where every person and organization has their own intelligent agent. My agent and your agent communicate, handling meeting details and similar tasks. I think that’s roughly the direction we’re heading, and everyone here feels excited about it, which is great.

Conclusion

Host: I really like the metaphor of “perceivers and executors”; this line of thought is genuinely interesting. Finally, I want to end with the topic of education because you are arguably one of the best at clarifying complex technical concepts and have thoughtfully considered how to design education around these topics. When AI becomes cheap in the next era, what will still be worth learning deeply?

Andrej Karpathy: Recently, a tweet deeply resonated with me, and I think about it almost every day. The essence is: you can outsource your thinking, but you cannot outsource your understanding.

Host: That’s beautifully said.

Andrej Karpathy: Yes, because I am still part of this system, and information still needs to enter my brain. I increasingly feel like I’ve become the bottleneck—just “knowing” has become a bottleneck: why are we building this? What’s the value? How do I direct my agent?

So I still believe that ultimately, there must be some force to guide thinking and processing, and that force is fundamentally constrained by “understanding.” This is also why I’m excited about the LLM knowledge base—because it’s a way to help me digest information. Every time I see different perspectives and angles on the same information, I feel I gain insights. Essentially, this is a form of generating synthetic data based on fixed data. I truly enjoy this process: reading an article, it enters my wiki, and then I ask various questions, exploring different angles.

These tools, in a sense, are tools for enhancing understanding, and understanding remains a bottleneck—because without understanding, you cannot be a good “director.” Large language models themselves are certainly not good at understanding; that remains your unique core capability. Therefore, I believe tools that enhance understanding are extremely interesting and exciting directions.

Comparative Analysis of AI Programming Tools: Cursor, Codex, and Claude Code

Thu, 30 Apr 2026 00:00:00 +0000

Introduction

Currently, AI programming tools are flourishing, with Cursor, OpenAI Codex, and Claude Code standing out. Many developers struggle to differentiate between these tools, leading to inefficient use and unnecessary expenses. This article aims to clarify their connections, core differences, advantages, and disadvantages, providing a clear guide for both beginners and experienced developers.

Core Positioning of the Three Tools

Cursor: An AI-native IDE (Anysphere) - Deeply modified from VS Code, it prioritizes a GUI interface with zero learning cost, embedding AI deeply into the editor, focusing on “you write code, AI assists.”
Claude Code: Terminal AI agent (Anthropic) - A CLI native tool with a context limit of over 200K tokens, adept at independently breaking down tasks and reading/writing file systems, emphasizing “you state the goal, AI writes for you.”
OpenAI Codex: Multi-end model service (OpenAI) - Offers API/CLI/IDE extensions, powered by the GPT-4o core, ideal for enterprise-level automation and batch processing.

Deep Connections

Common Foundation: All three rely on large language models (LLMs) capable of code generation, interpretation, debugging, refactoring, and commenting, with the core aim of reducing costs and improving efficiency.
Interoperability: The ecosystems are interconnected: Cursor integrates Claude/GPT models, Claude Code supports editor plugins, and Codex adapts to various IDE extensions.
Development Trends: All tools enhance agent autonomy, evolving from simple code completion to independently understanding requirements, breaking down tasks, and modifying code in bulk.

Key Differences

Product Form & Interaction Logic
- Cursor: GUI visual editor, closely aligned with VS Code, easy to use with minimal learning curve.
- Claude Code: Primarily a terminal CLI, offering high freedom but requiring basic command knowledge.
- Codex: API cloud service first, with no fixed interface, focusing on system integration and automation.
Model Capabilities & Context Limits
- Cursor: Mixed model scheduling with 128K–200K context, local project indexing for speed.
- Claude Code: Claude’s flagship model with a 200K context limit, capable of handling million-line projects effortlessly.
- Codex: Optimized GPT-4o for code, strong in logical reasoning but weaker with long texts.
Core Strengths
- Cursor: Ideal for daily development, single-file coding, front-end development, and rapid iteration of small projects with real-time code completion.
- Claude Code: Best for large project refactoring, cross-file bulk modifications, complex bug fixing, back-end engineering, and full local process operations.
- Codex: Suited for enterprise batch tasks, CI/CD integration, automated scripts, and bulk code generation & merging.
Ease of Use
- Cursor: ⭐ Zero barrier to entry.
- Claude Code: ⭐⭐⭐ Requires terminal and prompt knowledge.
- Codex: ⭐⭐ Basic API knowledge needed.
Pricing Models
- Cursor: Monthly subscription, high cost-effectiveness for individuals.
- Claude Code: Tiered pricing, free basic version sufficient, with upgrades available as needed.
- Codex: Token-based billing, costs increase with usage.
Privacy & Deployment
- Cursor: Supports local privacy mode, local code processing.
- Claude Code: Primarily cloud-based, with enterprise version supporting private deployment.
- Codex: Purely cloud service, no local deployment options, caution advised for sensitive projects.

Summary of Advantages and Disadvantages

Cursor Advantages: User-friendly interface, seamless integration with VS Code, optimal daily coding experience.
- Disadvantages: Weak in understanding large, complex projects.
Claude Code Advantages: Long context capabilities, strong autonomous agent abilities, excellent for project-wide refactoring.
- Disadvantages: High entry barrier due to command line reliance.
OpenAI Codex Advantages: Rigorous code logic, well-developed API ecosystem, excellent for enterprise automation.
- Disadvantages: No local operations, usage-based pricing, limited long project reading capability.

Selection Guide

For Individual Developers, Front-end, Lightweight Development, and Those Who Dislike Command Lines: Choose Cursor.
For Back-end Development, Large Projects, Code Refactoring, Complex Bug Fixing, and Full Project Overhaul: Choose Claude Code.
For Enterprise Teams, Automation Pipelines, Batch Code Processing, API Integration, and Standardized Processes: Choose OpenAI Codex.
Optimal Efficiency Combination: Use Cursor for daily coding, Claude Code for refactoring and debugging, and Codex for automation tasks.

Industry Trends

The era of simple code completion is long gone. The integration of IDE visual tools, terminal intelligent agents, and cloud model services is inevitable. The future will not be about single-tool competition but rather multi-tool collaboration. Effectively utilizing these three AI programming tools will truly enhance development efficiency.

Which AI programming tool are you currently using? Feel free to share your thoughts in the comments!

Stop Using Cursor as a Completer: Skills are the Key

Thu, 30 Apr 2026 00:00:00 +0000

Stop Using Cursor as a Completer: Skills are the Key

Last night, I watched a friend struggle with Cursor for nearly forty minutes while trying to modify a project.

He wasn’t incapable of writing prompts. The issue was more complicated. Every time he started a new session, he had to explain the project structure, tech stack, naming conventions, and interface boundaries, plus add a note saying, “Don’t touch this directory; it’s in production.” By the time he finished, the AI was just warming up. When it finally began to write, it often went off track, either altering files it shouldn’t or generating code that, while functional, wasn’t suitable for the team.

I’m all too familiar with this scenario.

From 2024 to 2025, while discussing AI programming tools with several teams, the common complaint was not about the AI’s inability to generate code, but rather how difficult it was to manage the output after generation. You can prompt it to write, but if you expect it to consistently produce the same style for a week, problems start to arise.

Many people think their issues with Cursor stem from not crafting long or precise prompts, or from using a weak model. This is often not the primary issue. More commonly, they treat something that should be established as a “fixed context” as temporary chat content, re-entering it each time.

In simple terms, whether Cursor evolves from a “high-level completer” to a “reliable co-pilot” often depends not on the model but on the skills.

The term skill can be replaced with rules, playbooks, or project workflow templates. The name isn’t important. Essentially, it answers four key questions in advance: Who are you? What is this project? What absolutely cannot be done? What order should tasks be handled in when encountering certain types of tasks?

In one sentence:

Skills don’t make AI smarter; they prevent it from taking the detours you’ve already navigated.

Why Many Users Feel More Exhausted with Cursor

The most common misconception I’ve seen is treating Cursor as a powerful intern who is always available but never providing it with an onboarding manual.

The result is that, despite working in the same repository with similar requirements, you have to redo three things each time.

First, re-explain the background. Is this repository a monolith or microservices? Is the frontend in apps/web or src/client? Should tests use Jest or Vitest? Does the API response need to wrap in a data layer? Without a fixed entry point, the AI can only guess, and when it guesses, the style goes off. To put it bluntly, it becomes ridiculous.

Second, re-explain the standards. For example, “don’t write overly long functions,” “don’t casually introduce new dependencies,” “tests must be added after modifications,” and “the interface layer should uniformly go through service, not directly connect fetch in the page.” If you don’t specify, it won’t consistently adhere. If you say it today, it forgets tomorrow when you start a new conversation. This can be very frustrating.

Third, re-explain the process. Many people start with, “Help me fix this bug.” The problem is that a reliable process shouldn’t be a direct fix. It should involve reading the error, identifying the impact scope, explaining the solution, modifying the code, and listing verification steps at the end. Without this process, the AI will use the easiest way to complete the task, which is often not what you want.

The most annoying part isn’t just fixing mistakes.

It’s that you slowly develop the illusion that this tool seems great sometimes and particularly dumb at others. In reality, it’s not that it’s suddenly smart or confused; it’s more likely that the quality of context you provide varies each time. Unstable context leads to unstable outputs. You and it end up going in circles, making you increasingly exhausted.

This is the first layer of the problem that skills aim to solve: turning high-frequency, repetitive, easily overlooked background information into long-term reusable default premises.

What Skills Actually Supplement: Work Methods, Not Prompts

I increasingly dislike defining skills as “a more advanced prompt.” This understanding is somewhat superficial.

A truly useful skill should encompass at least four layers of information.

One layer is the role. What do you want Cursor to play at this moment? Is it a cautious reviewer, a researcher before taking action, or a bug fixer making minimal changes? Different roles yield entirely different outputs.

Another layer is the project context. Repository structure, core modules, dependency constraints, directories that must not be touched, existing scripts, and team-preferred commands. The more specific, the better. Avoid vague statements like “please adhere to best practices”; they are useless. Instead, write things like “prioritize searching with rg,” “read README.md and CONTRIBUTING.md before modifying,” “do not upgrade dependencies without explicit request,” and “do not modify lockfile unless I explicitly ask.”

Another layer is the execution checklist. For certain types of tasks, what should be done first, what should be done next, when must one stop to ask someone, and when can one continue independently? This is particularly valuable because most negative feedback arises not from coding ability but from the order of execution.

The final layer is the output format. For example, you might require it to first give conclusions, then changes, and finally verification commands; or to list risks before proceeding. These format constraints may seem trivial, but they directly affect collaboration costs. Many reworks aren’t due to coding errors but rather unreliable reporting methods.

You see, skills fundamentally manage not “expression” but “method.”

The same request to “fix this bug” feels like improvisation without skills; with skills, it feels like entering a well-structured editorial department with SOPs.

I even suggest writing down the most useful trivialities. For example:

Before handling tasks, determine if additional context is needed.
If more than three files are involved, provide a modification plan before proceeding.
If a user has uncommitted changes, do not overwrite; ensure compatibility first.
If tests fail, clearly state where the issue lies; do not pretend it’s completed.

These statements aren’t sophisticated.

But they are lifesavers.

What to Prioritize: Three Types of Skills

Many people jump straight into building a comprehensive skill system, resulting in a document museum. The directory looks impressive, but no one refers to it, and the AI isn’t consistently utilizing it.

Don’t go that big.

Start with just three types.

1. Project Onboarding Skill

This skill addresses the issue of “having to reintroduce the project every time.”

The content can be quite simple: project structure, key directories, tech stack, common commands, coding style, restricted areas, and validation methods. Keep it between 300 to 600 words, plus a few critical file paths. It doesn’t need to cover everything; it just needs to prevent the AI from going off track at the start.

For example, you can specify:

Read README.md first
Prioritize searching with rg
Follow existing hooks style when modifying React code
Check api and service layers before modifying interfaces
Don’t claim “already validated” without running tests

Once these constraints are established, you’ll noticeably save time in the first ten minutes of conversation.

2. High-Frequency Task Skill

Extract the most common tasks into templates.

For instance, “fixing online bugs,” “writing management backend forms,” “conducting API integration,” “adding unit tests,” and “performing PR reviews.” The judgment order for each task differs. Fixing bugs should involve reproducing the issue before making changes; reviews should prioritize identifying risks before discussing merits; and adding tests should confirm current behavior before writing assertions.

Don’t hesitate to write in a straightforward manner. The more it resembles the operation manual left by the most reliable colleague in the team, the better. No, it should be said that the less it resembles “official tutorials,” the more likely it is to survive in the team.

I personally value review skills highly because they yield immediate results. Without skills, AI often writes reviews like “overall good, suggest optimizing readability.” Such comments are as good as unread. With rules, you can force it to prioritize reporting bugs, performance risks, behavioral regressions, and missed tests before deciding whether to summarize.

3. Boundary Constraint Skill

This skill specifically addresses “don’t mess around.” Many incidents start from “just a quick fix.”

Which directories are prohibited from modification, which commands cannot be executed directly, under what circumstances manual confirmation is needed, when to proceed conservatively, and when to take initiative. Many incidents occur not because AI can’t write code but because it’s too eager to help. Once it gets enthusiastic, it starts casually refactoring, upgrading, or cleaning up. Casualness often leads to disaster. When you look back at git diff, it can be quite overwhelming.

Therefore, boundaries must be clearly defined.

Can files be deleted? Can schemas be modified? Can dependencies be updated? What to do when encountering a dirty workspace? When there’s a conflict between requirements and the current state, should one continue guessing or stop first? If you don’t specify, the AI will handle it according to its default preferences, which are often more aggressive than yours.

The Effective Process for Using Skills

If you want to start today, I recommend not spending too long on theory but rather following this order.

First, choose a task you will perform at least twice a week. Low-frequency tasks aren’t worth abstracting into skills.

Then, copy the phrases you’ve repeatedly added to Cursor in the past three attempts verbatim. Note, verbatim. Don’t beautify them. The sentences you have to say each time are the best raw materials for skills.

Next, divide them into three sections: background, process, and constraints. Background answers “what is this?” Process answers “how to do it?” Constraints answer “what should not be done?” At this point, a usable skill is basically formed.

Take another step forward.

Add two examples: one good example and one bad example. The good example tells the AI what meets expectations; the bad example shows which actions seem proactive but actually complicate matters. Adding just one example can significantly enhance stability. Even a 30% improvement in stability can save you a lot of back-and-forth communication in a week.

Another detail many people overlook: skills aren’t finished once written; they should evolve with the project.

Each time you notice Cursor making a repeated mistake, don’t just correct it in that conversation. Incorporate that correction back into the skill. Each time you find a particular output format significantly reduces communication, don’t just remember it; write it down. This way, it will increasingly resemble a member of your team rather than a temporary contractor.

Here’s a practical judgment standard: if a skill doesn’t reduce your background input by half, or if it doesn’t cut down two rounds of direction changes, it’s likely too vague. Delete it and start over. Don’t be sentimental.

Skills aren’t collectibles.

They should function like a wrench, ready for use.

So stop asking “how to make Cursor smarter.” Change the question. No, make it a tougher question.

Have you seriously handed over your work methods to it?

Chinese Telecom Giants Slow Down Growth, Embrace AI for Future Opportunities

Wed, 29 Apr 2026 00:00:00 +0000

Slowdown in Growth

After six years, the three major telecom operators in China have once again slowed their growth. Recently, China Unicom, China Telecom, and China Mobile released their 2025 financial reports, showing revenue growth of only 0.7%, 0.1%, and 0.9% respectively. In terms of net profit attributable to the parent company, China Unicom and China Telecom saw slight increases, while China Mobile experienced a decline, marking the lowest growth rates for both revenue and net profit in six years.

In terms of business performance, the traditional voice and broadband services have reached their growth peak. A significant growth driver in recent years, cloud services, also saw a drastic drop in revenue growth.

Looking ahead to 2026, all three telecom operators mentioned the need to deeply promote “AI+” strategies to seize new high ground as the telecom industry transitions between old and new drivers of growth. However, experts point out that the operators still face severe pipeline issues and have not formed an ecological closed loop. Whether they can reverse the current trend with AI remains to be seen.

Financial Performance Overview

According to the financial reports, in 2025, China Unicom’s revenue was 392.22 billion yuan, up 0.7% year-on-year, with a net profit of 9.13 billion yuan, up 1.1%. China Telecom reported revenue of 532.93 billion yuan, up 0.1%, and a net profit of 33.19 billion yuan, up 0.5%. China Mobile’s revenue was 1,050.19 billion yuan, up 0.9%, but its net profit fell by 0.9% to 137.09 billion yuan. All three companies have reported their lowest growth rates since 2020.

From 2020 to 2024, China Unicom’s revenue growth rates were 4.6%, 7.9%, 8.3%, 5%, and 4.6%. The net profit growth rates were 10.8%, 14.2%, 15.8%, 12%, and 10.5%. For China Telecom, the revenue growth rates were 4.7%, 11.3%, 9.4%, 6.9%, and 3.1%, while the net profit growth rates were 1.6%, 24.4%, 6.3%, 10.3%, and 8.4%. China Mobile’s revenue growth rates were 3%, 10.4%, 10.5%, 7.7%, and 3.1%, with net profit growth rates of 1.1%, 7.5%, 8.2%, 5%, and 5%.

The reasons for the significant slowdown in performance growth have not yet been clarified by the telecom operators despite inquiries from the media.

Factors Behind the Slowdown

The collective slowdown was anticipated. On one hand, the traditional personal communication business market is saturated. On the other hand, 5G has reached its peak after several years of development. Against this backdrop, traditional business growth has stagnated, while emerging businesses have not yet reached a scale sufficient to shoulder the burden. Industry competition has intensified, leading to a low-growth predicament. Zhang Yi, CEO of iiMedia Research, stated that the telecom market has entered a stage of stock competition, contributing to the collective slowdown.

Additionally, the ongoing policies to reduce fees and increase speeds have continuously lowered data rates, further weakening revenue growth in traditional communication services. The substantial investment in 5G network construction has not yet translated into sufficient monetization capabilities to offset declines in traditional business.

The last time the three major telecom operators faced a halt in revenue or net profit growth was back in 2019, when China Mobile reported a 1.2% year-on-year revenue increase but a 9.5% decline in net profit. China Unicom’s revenue fell by 0.1%, while its net profit rose by 11.1%. China Telecom’s revenue decreased by 0.4%, and its net profit also declined by 3.3%.

Challenges in Cloud Services

Specifically, while the number of users for traditional voice and broadband services continues to grow, the corresponding revenue growth has peaked.

In 2025, China Unicom had over 357 million billing users, with a net increase of 13.32 million. Its broadband users exceeded 129 million, with a net increase of 7.61 million. However, revenue from voice calls and monthly fees dropped by 4% to 19.56 billion yuan, while broadband and mobile data service revenue fell by 0.6% to 153.22 billion yuan.

China Telecom’s mobile users reached 439 million, with a net increase of 14 million, while its broadband users reached 201 million, with a net increase of 4 million. Mobile communication service revenue grew by 0.9% to 204.53 billion yuan, while fixed-line and smart home service revenue increased by 0.2% to 125.98 billion yuan.

China Mobile reported 1.005 billion mobile users, with a net increase of 1 million, and 329 million broadband users, with a net increase of 9.99 million. However, revenue from voice services dropped by 4.9% to 66.63 billion yuan, SMS and MMS revenue fell by 4% to 29.60 billion yuan, and wireless internet revenue decreased by 4.4% to 369.09 billion yuan. In contrast, wired broadband revenue grew by 8.7% to 141.57 billion yuan.

Notably, the cloud services that had been a significant growth driver for the three major telecom operators in recent years have seen a dramatic decline in growth rates in 2025.

China Telecom’s Tianyi Cloud revenue growth dropped from 17.1% in 2024 to 6%, while China Unicom’s cloud revenue growth fell from 17.1% to 5.2%. China Mobile did not disclose its cloud revenue separately in the 2025 report, but it had reported a 20.4% year-on-year growth in 2024. The reasons for this lack of disclosure have not been clarified by China Mobile.

Zhang Yi commented that telecom operators started cloud services relatively late, and their previous rapid growth was largely due to their state-owned status, allowing them to secure many government contracts. However, the foundational benefits in the government and enterprise cloud market are now diminishing. Operators have shifted from focusing on scale to prioritizing profitability, leading to a noticeable decline in growth rates.

Future Outlook for 2026

As the telecom industry enters a critical period of transitioning between old and new drivers of growth, how can these operators seize new opportunities? All three major telecom operators have mentioned “AI+” in their financial reports.

China Unicom stated that in 2025, it aims to capture new technological and industrial innovation opportunities, with strategic emerging industry revenue accounting for over 86%, and AI revenue growing over 140% year-on-year. In 2026, the focus will be on connectivity, computing power, services, and security, continuing to deepen intelligent integration and innovation, and accelerating the development of model-as-a-service and intelligent agents-as-a-service.

China Telecom emphasized its deep understanding of the disruptive changes brought by AI and will continue to advance the “AI+” initiative, integrating AI into core operational processes, creating application scenarios across five domains: intelligent customer service, intelligent marketing, intelligent operations, intelligent research and development, and intelligent management.

China Mobile highlighted that intelligent services are the key to winning future competition. In 2025, it will deepen the “AI+” initiative, playing an innovative role in smart living, smart production, and smart governance. The company plans to upgrade its foundational model to version 3.0, launch over 100 AI+ products and solutions, and develop a high-quality dataset of 3,500 TB, with data algorithm revenue increasing by 12.6% and smart cultural revenue growing by 13.3%. In 2026, the focus will be on strengthening and expanding communication, computing power, and intelligent services, promoting qualitative improvements and reasonable quantitative growth.

Experts believe that while the basic models in the AI field for telecom operators remain relatively simple, they still primarily act as sellers of computing power. They need to establish more intelligent computing centers to provide services to clients and charge for the tokens used in processing information.

The biggest challenge for telecom operators is the lack of an ecological closed loop, as they continue to play a “pipeline” role, primarily providing basic connectivity. Compared to internet companies, they have fewer application points, making it more challenging in the token economy era. Operators should not only focus on AI concepts but also consider how to collaborate with leading model vendors to create more ecosystems and implement the token economy.

As the commercial rollout of 6G is still 4-5 years away, the coming years may be particularly challenging for telecom operators as profit margins shrink and the scope for exploration in vertical fields becomes limited.

Overall, while the three major telecom operators are expected to see some recovery in their overall performance driven by emerging businesses in 2026, growth pressures will remain. Their strategic focus should shift towards expanding 5G applications, differentiating cloud services, and integrating AI innovations to prepare for the arrival of 6G.

Insta360 Launches Mic Air for AI Coding with TRAE

Wed, 29 Apr 2026 00:00:00 +0000

Just recently, Insta360 announced that it achieved a revenue of 2.481 billion yuan in the first quarter of 2026, a year-on-year increase of 83.11%; however, the net profit attributable to shareholders of the listed company was 84.6202 million yuan, a year-on-year decrease of 52.02%. The surge in revenue and the sharp drop in profit are partly related to Insta360’s aggressive expansion, including a push into the drone market against DJI and an accelerated rollout of its wireless microphone product line.

On April 27, Insta360 partnered with TRAE, an AI programming product under ByteDance, to launch the “Vibe Coding Mic Air” set.

Image source: Insta360

To be honest, when I first saw this product information, even after experiencing numerous AI hardware, I was a bit confused. Let’s briefly introduce the background: Vibe Coding is essentially a programming method where you tell the AI your requirements, then sit back and watch the AI write code for you.

So how does Vibe Coding relate to a microphone? According to the introduction, this specialized Mic Air boasts “three major advantages”: high sampling rate and AI noise reduction ensure that the computer clearly hears the user’s commands, its compact and lightweight design allows users to speak softly while keeping the Mic Air close, and a 10-hour battery life supports long working hours.

At this point, does anyone recall the night in May 2018 when Luo Yonghao performed using TNT’s “voice command” at the Bird’s Nest?

“Shh! Please speak quietly, don’t disturb me while I use TNT.” If Luo Yonghao had access to the Mic Air eight years ago, would there have been no subsequent “Long Live Understanding”?

Voice Control for Work Computers is a Misconception

Although the Mic Air is marketed under the banner of “Vibe Coding,” it seems to me more like a conceptual peripheral created to forcefully enter the AI arena.

Using voice interaction for AI programming is not a whimsical demand from Insta360 and the TRAE team. In March of this year, Claude Code officially launched its voice mode, allowing users to speak by holding down the space bar and releasing it to complete input. For programmers, there is indeed a demand for voice-based Vibe Coding, but a specialized microphone is not necessarily required.

(Source: Anthropic Claude)

First of all, as a “Vibe Coding microphone,” the Mic Air lacks any irreplaceable technological barriers: a 48kHz sampling rate, AI noise reduction, and long battery life are essentially basic features of professional lapel microphones and have no real connection to AI or Vibe Coding.

For developers, existing laptop array microphones or professional noise-canceling headphones are already sufficient to support voice-to-text needs in quiet indoor environments. Even if a laptop’s built-in microphone is of poor quality, a headset with an independent microphone or a pair of mainstream TWS or open-ear headphones can better address this issue. After all, a company that allows “voice Vibe Coding” in an open office must also permit the use of noise-canceling headphones during work.

Considering these factors, I believe that developers buying a microphone to specifically facilitate “Vibe Coding” is akin to purchasing an iPad solely for note-taking in preparation for graduate school: there is not much connection between the “goal” and the “means.”

(Source: TRAE)

On a positive note, the Insta360 Mic Air TRAE set will include access to the TRAE AI programming platform, and its price will remain consistent with that of a standard Mic Air set, unlike last year’s surge of AI concept peripherals that exploited the AI hype to sell at inflated prices. At least in this regard, Insta360 differs fundamentally from last year’s “AI mice.”

Given Insta360’s product layout, I do not believe that Insta360 needs to hype its AI capabilities as a peripheral brand through Vibe Coding. This collaboration between Mic Air and TRAE seems more like a “co-branding” event between the two parties.

In my view, while voice interaction is very convenient in everyday scenarios (including driving, walking, or lying down while using a phone), it is actually a very inefficient means of interaction in work scenarios.

From an information density perspective, voice contains many filler words, repetitions, pauses, and other “non-informational components,” resulting in a lower information density compared to text (this discussion focuses solely on information density, not input speed). For an inherently unstable service like AI, lower information density amplifies distortions in the information processing.

(Source: Generated by ChatGPT)

Moreover, voice input is not suitable for “long thinking” by humans. As I write this text, I constantly refine it, quickly adjusting the order of phrases, adding or deleting content. Modifying a long prompt several hundred words long before sending it to AI does not take much time, but requiring users to clearly articulate lengthy requirements verbally is challenging. In WeChat chats, the built-in voice-to-text feature works well, but it still allows me to edit before sending.

Furthermore, voice input is an input method with strong “exclusivity.” In a small space where multiple developers are talking at once, even if everyone tries to control their volume, the sound energy will accumulate; combined with the tendency for people to raise their voices in background noise, the office will inevitably become “lively.”

Additionally, voice input faces language logic differences and security risks. I believe that at least in work scenarios, AI interaction based on voice will be a misconception.

Since voice interaction has many flaws in office settings, what kind of AI interaction do we actually need?

Multimodal + Agent is the Correct Answer for AI Interaction

In my view, current AI on PC has evolved to the stage of “multimodal input + Agent automatic execution.” The truly efficient interaction method at this point should be visual and pointer input (precise targeting) combined with proactive AI-predicted options.

Multimodal paired with AI Agents means that AI has transcended the limitations of the “text box” and can actively “perceive” and execute. Given this, we should not view AI interaction issues through the outdated model of “text window + voice input.”

In the Vibe Coding scenario, the most efficient action is not to speak prompts into a microphone but to select code with a cursor (or use an eye-tracking camera to capture visual focus). The AI Agent, upon receiving input, will actively infer the user’s “intent” and provide corresponding shortcut options, allowing users to click or voice-select the next step. Ultimately, what programmers need is not just a listening device, but an AI project manager. They articulate requirements to the AI project manager, who, based on their observations, perceptions, understanding, and predictive abilities, organizes the information into documents and directs the Agents to get to work.

Last week, I experienced the Pura X Max’s companion AI, which employs the “AI predicts the next action” model, and the experience was indeed quite impressive.

(Source: 雷科技)

The widespread ridicule of TNT was not due to the issues with the touch selection + voice command model, but rather because the natural language understanding models of that time could not fulfill “accurate operational requirements” with “vague voice commands.” However, times have changed; rapidly evolving AI Agents may not even require users to speak to complete corresponding tasks.

As long as future AI Agents can actively respond and reduce the need for user input “precision,” both voice input and cursor clicks or shortcut selections can provide users with comprehensive services, albeit in different contexts.

It can only be said that the TNT released in 2018 indeed appeared in the “wrong era.”

DJI and Hollyland Compete in Wireless Microphones

In my view, Insta360’s launch of the “TRAE programming set” can also be seen as a charge by a “new peripheral brand” into productivity scenarios in the AI era.

In recent years, with the explosion of short videos and live streaming, the wireless lapel microphone market has experienced a frenzy of growth. Newcomers like DJI and Hollyland have seized significant market share from professional audio brands like Rode, Sony, and Sennheiser, thanks to superior sound pickup and noise reduction capabilities, longer battery life, more stable wireless transmission, smaller sizes, and lower prices.

Recently, after Zhang Xue won the championship at the Zhang Xue motorcycle event, she was interviewed by the media with a circle of microphones (mainly DJI Mic) around her neck, vividly illustrating the status of wireless microphones in the self-media era.

(Source: Chongqing Daily)

The DJI Mic series around Zhang Xue’s neck has almost defined the standard for “good wireless microphones”; purchasing a Pocket along with a set of DJI Mic has become the default equipment for new domestic video creators.

Last week, DJI launched the new DJI Mic mini series, enhancing the product’s visual design while maximizing its hardware strength, aiming to change the public’s stereotype of lapel microphones as “black boxes” and pursue the product’s “artistic value.” More importantly, the colorful design allows the product to blend with the speaker (interviewee), non-intrusively and without disrupting the visual focus.

(Source: 雷科技)

Hollyland’s product strategy differs somewhat from DJI’s; it adopts an “industry-level technology dissemination” approach, leveraging its rich technical experience in film and broadcasting to become a strong player in the lapel microphone market. DJI’s Wang Tao even directly listed it as one of the competitors in his vision during a recent interview.

(Source: Hollyland)

However, the video shooting peripheral market has already become saturated, and every content creator with filming needs likely has two or three sets of microphones. Even I, who usually just film myself racing on the track, own three cameras and two sets of microphones. In the era of AI-generated videos, the growth rate of novice users with genuine filming needs is slowing down.

In other words, the hardware competition in the lapel microphone sector has reached its limit— everyone has noise reduction, touch boxes, and ultra-long battery life; the only factors left to compete on are appearance, price, and stability.

Insta360’s “cross-industry” move to create an AI programming set is essentially an attempt to find a new market in “non-filming scenarios,” thereby avoiding the spotlight of DJI and Hollyland. The positioning of the “Vibe Coding microphone” brings it to the attention of programmers, a new user group in the industry.

The collaborative promotion model with TRAE can also create new differentiators beyond traditional hardware performance. Yes, once TRAE officially launches, other microphone brands will also enter the market.

While voice interaction in office settings is indeed a misconception, the approach of exploring new scenarios and markets in productivity is a necessary step for microphone manufacturers facing severe internal competition.

Ultimately, the emergence of AI hardware like the Insta360 Mic Air TRAE limited set is a breakthrough action by brands developed in the video era, navigating the challenges of the AI era. With more sensitive network awareness and a focused product line, cross-industry players like Insta360 and new peripheral brands are bound to leave behind arrogant giants like Logitech in the dust.

It is time for Chinese AI brands to define human-computer interaction in the AI era.

The Mutual Empowerment of AI and Humanities

Wed, 29 Apr 2026 00:00:00 +0000

The Mutual Empowerment of AI and Humanities

Generative AI is profoundly changing various fields such as education, employment, entertainment, healthcare, transportation, and elder care, becoming a hot topic. The relationship between the humanities and generative AI is complex and symbiotic. AI reshapes the forms and future development paths of the humanities, while the demands of AI development highlight the value of the humanities. In this sense, the development of the humanities will fundamentally influence the cognitive heights and social acceptance achievable by AI.

Bridging Humanities Scholars to Multidisciplinary Fields

As modern disciplines become increasingly specialized, the humanities face barriers not only with natural sciences but also with social sciences, leading to a potential “knowledge dilemma.” It is challenging to find scholars within the humanities who can bridge literature, art, philosophy, history, and language, resulting in a limitation of “partial profundity” in contemporary humanities. The emergence of AI offers new solutions to this issue.

Large language models, constructed through deep learning on vast amounts of text, represent a highly condensed form of human written knowledge. They utilize neural network architectures and algorithm-driven probabilistic predictions to achieve context awareness and perform human-like logical reasoning under specific prompts. In this context, AI can serve as a powerful assistant for humanities scholars, providing a bridge to multidisciplinary fields and empowering the production of humanistic knowledge through information search, literature screening, semantic analysis, and cross-domain integration.

Currently influential “distant reading” methods utilize AI models to establish interdisciplinary literary criticism and research modes based on the overall framework of world literature. Unlike traditional literary studies that advocate close readings of a few classics, distant reading employs data mining and quantitative analysis on large text collections to reveal themes, emotional tendencies, plot structures, and linguistic features, thereby macro-describing the development of human literature. This effectively addresses the technical challenges of processing vast amounts of text and cross-cultural knowledge that traditional literary history and world literature research cannot resolve.

Updating Methods and Paradigms in Humanities

China has a long and rich tradition of humanities scholarship, but the term “humanities” emerged in the twentieth century. During the Enlightenment in the West, humanities scholars sought to find their unique nature and methods outside of natural sciences. They viewed the humanities as a “new science” concerning human thoughts and behaviors, distinct from natural sciences, emphasizing individualized methods linked to values and attempting to construct epistemology and methodology for the humanities.

In general, within this logic, criticized later as the “spirit-nature dualism,” the humanities emphasize “thought of existence,” studying objects that exist in symbolic forms such as language, text, images, and rituals, involving faith, conscience, emotion, aesthetics, values, and ideals—elements that are difficult to quantify. This includes deep individual psychology, instincts, consciousness, and the collective unconscious, embodying intrinsic characteristics such as value, culture, individuality, spirituality, emotion, thought, and symbolism. Methodologically, the humanities focus on empathetic understanding, reflective experience, and intuitive insight, aiming to reveal unique individual experiences, complex spiritual worlds, and deep cultural meanings that cannot be captured by replicable, quantifiable, and verifiable techniques of natural sciences.

As disciplines develop, this binary oppositional thinking model is continuously reflected upon. Marx once stated, “Natural sciences will eventually include the science of humans, just as the science of humans includes natural sciences: this will be one science.” Emerging digital humanities research not only deeply examines the humanistic concerns and governance challenges brought by digital technology but also actively explores new research methods and paradigms from digital technology, reshaping the landscape of humanities research. Various literary labs and quantitative humanities research initiatives are continuously emerging. AI has evolved from a mere auxiliary tool to a key force driving paradigm innovation, providing humanities scholars with new interdisciplinary research perspectives and theoretical innovation support, significantly expanding the breadth and depth of humanistic research experiences.

Enhancing Critical Thinking and Writing Skills through Human-AI Collaboration

A unique aspect of the humanities is that its knowledge forms often manifest as narrative or speculative texts, expressing researchers’ unique insights and profound thoughts on human existence, values, and meanings through written language. This differs from natural sciences, which use formulaic deductions, data charts, and repeatable experiments for validation, and from social sciences, which heavily rely on surveys and statistical models. Humanistic writing not only expresses thoughts and emotions but also integrates creativity, criticality, and reflexivity into a comprehensive cognitive movement. “Writing is thinking”—it is a process of generating and deepening thoughts and feelings. Writing can stimulate creative vitality, enhance self-reflection, and expand expressive boundaries, where linguistic sensitivity, cognitive penetration, and cultural insight are intertwined. Scholars have pointed out that writing style itself carries researchers’ unique emotional tones, academic judgments, and value positions. In this sense, humanistic writing is a core aspect of academic research; it is not only a means of knowledge production in the humanities but also reflects its modes of thinking and disciplinary characteristics, serving as a fundamental medium for maintaining academic existence and promoting scholarly exchange, as well as a vital source of disciplinary vitality. Whether expressing philosophical thoughts and ultimate meanings, describing historical contexts and events, or constructing values and poetic insights in literary criticism and research, the processes of material organization, structural integration, logical reasoning, and argumentation, as well as deepening thoughts and refining spiritual experiences, are all accomplished through creative writing.

Current AI models can transfer the language structures, argumentation patterns, and disciplinary terminologies learned from vast corpora into specific humanistic fields of knowledge production, promoting human-AI collaboration and achieving a holistic leap in humanistic writing. On one hand, in humanistic academic writing, researchers can fully leverage AI’s powerful data processing capabilities, efficiently collecting, systematically organizing, and deeply analyzing large amounts of literature before writing. During the writing process, through human-AI collaboration and dialogue, they can organically integrate dispersed knowledge, building new knowledge graphs and cognitive frameworks that help researchers break through existing theoretical and cognitive limitations, uncover deep thoughts and internal logical structures from complex texts, reveal developmental laws, distill core concepts, and ultimately give birth to new knowledge outcomes. This process is not merely an accumulation of knowledge but an innovative mechanism capable of generating specific theoretical results, opening up new pathways for academic research and knowledge innovation. On the other hand, AI can provide local refinement and overall optimization of professional academic expressions. This can correct, adjust, and enhance the quality of humanistic academic expressions in terms of knowledge, normativity, logic, and systematics, even forcing low-quality academic research to exit relevant fields. Sometimes, certain academic debates in the humanities suffer from insufficient materials, unclear concepts, and weak logic, and AI assistance can significantly improve the quality of academic discourse and enhance its value.

The involvement of AI is not a simple process of machine-assisted writing; rather, it is a process of deepening thought, stimulating inspiration, and optimizing expression through human-AI interaction and dialogue. This process places high demands on researchers’ AI literacy, particularly in terms of correctly inputting commands, providing high-level prompts, and deeply interpreting output results. These capabilities determine the effectiveness of using AI tools. Here, the ability to pose genuine, good, and new questions becomes extremely important, returning to the essence of academic research. Moreover, as some studies have pointed out, AI excels at knowledge inheritance but falls short in creative thinking, making it difficult to replace human involvement in theoretical construction, critical reflection, value selection, and aesthetic judgment. Human intuition in discovering subtle connections among vast information, strategic choices based on value positions, and unique expressions arising from aesthetic tastes are all of significant importance. Without human verification, modification, and deepening, content generated by AI will carry a strong “machine flavor,” presenting as uniform and homogenized expressions.

To ensure the academic independence of thought, unique insights, and distinctive academic style, the personal characteristics of humanities researchers—such as talent, courage, insight, and ability—should not be diminished by machine assistance, and dependency thinking and intellectual inertia should be avoided. Otherwise, their research outcomes may lose the dynamism inherent in humanistic research. Humanities research must always reflect “the human” and integrate personal life experiences into academic exploration, responding to the challenges of the times with keen perception, unique creativity, and a critical spirit in pursuit of truth. People should be able to feel the emotional investment and value care of researchers, encompassing both depth of thought and warmth of emotion.

The Development of AI Relies on Humanities Understanding of “Human”

As a mirror of human intelligence, AI can help humans understand the essence of “what it means to be human” more profoundly. At the same time, human understanding of itself becomes the fundamental basis for the future development and governance of AI technology. Marx pointed out, “Conscious life activity distinguishes humans from animal life activities directly.” Thus, human strength lies in its possession of intellect, practical creativity, and the ability to continuously acquire knowledge and skills through learning to achieve goals.

Currently, AI still belongs to the imitation of human intelligence, performing like humans. Its development goal should gradually align with the internal cognitive structures and creative mechanisms of humans, rather than merely replicating external behaviors. The emergence of generative AI is not accidental; it is a product of human creativity and self-awareness reaching a certain stage. Although current specialized vertical models have demonstrated superior execution efficiency and accuracy in specific tasks and fields, they essentially remain tools for humans. To date, “general models” that autonomously adapt to different environments and needs often perform worse than human infants when faced with new situations, counterfactual questions, or common-sense reasoning. Fundamentally, current AI knows what to do but may not understand the underlying principles and logic. The AI black box has yet to be opened, and it cannot evolve from an imitator to an understander. Questions regarding the generative mechanisms and operational modes of human intellect become particularly important in this context. Human reflection on AI is also a re-examination of the complex intelligent entity that is humanity itself, further making a groundbreaking effort to uncover the deep essence of humanity and understand “what it means to be human” by comparing it with non-human intelligent entities.

Whether in natural sciences or humanities and social sciences, there is an ongoing alternation and repetition between the “demystification” and “enchantment” of humans, with the core of “enchantment” always being the mystery of humanity itself. Without a profound understanding of one’s own intellect, a “general model” cannot truly emerge, just as Marx stated, “The dissection of the human body is a key to the dissection of the monkey body.” The signs of higher animals manifested in lower animals can only be understood after the higher animals themselves have been recognized. Understanding humans and comprehending humanity is the fundamental nature and basic value goal of the humanities. Today, the many “explainability issues” of AI largely stem from humanity’s insufficient understanding of its own intellect. Breakthroughs in AI creation, technology governance, and value alignment require a prior understanding of humanity’s essence. The level of development in the humanities determines the future possibilities for the development of “general models.”

From the perspective of the relationship between the humanities and social life, the humanities cannot be replaced by AI, as they possess reflexivity. Every emergence and change of humanistic cognition and understanding intervenes in the development of social life and the construction of human hearts, embodying the quality of “establishing heart for heaven and earth, and establishing destiny for the people.” In this sense, the development of the humanities is not a linear process of progress; various humanistic thoughts cannot simply be added together to form a single ultimate truth but coexist in a pluralistic manner, collectively shaping the rich spiritual world of society and individuals. It can be said that the progress of humanities scholarship alters humans and their understanding of the world, thereby exerting a significant influence on generative AI. At the same time, the impacts of new technologies like AI on society and humanity also constitute a focus of humanistic scholarship, with related reflections becoming part of the human spiritual world. The humanities and AI are always in a dynamic interplay of coexistence and mutual promotion. It is essential to remember that AI is created by humans, and humanity must possess the ability to truly understand and effectively control its creations. In this sense, we are fully confident that humanistic thought can illuminate the future path of AI.

Anthropic's Sudden Account Ban Leaves 110 Employees in Limbo

Tue, 28 Apr 2026 00:00:00 +0000

A sudden incident!

After 60 Claude accounts were cut off overnight, another shocking event occurred at Anthropic.

On Monday morning, 110 employees opened their computers, ready to work, only to find they couldn’t log into Claude. It wasn’t just one person; all accounts were suspended simultaneously.

The first signs of trouble appeared in the operations channel on Slack. One person shared a screenshot, followed by others, and within ten minutes, the entire company was asking the same question: “What happened to my Claude?”

The answer quickly emerged—this was not an individual issue; all accounts had been banned by Anthropic.

Each employee received the same cold email, uniformly formatted:

“Your account has been suspended due to detected violations of usage policies. To appeal, please submit a request via the link below.”

Ironically, this email masqueraded as a personal violation notice. Each recipient felt it was a personal problem, with no mention that this was an organization-wide ban.

Even the company administrator received no prior notification.

One person’s violation, the whole company suffers

This company, based in the United States, employs 110 people and operates across data analysis, field decision support, and supply chain optimization.

Claude was integrated into nearly every aspect of their operations.

Engineers used it for coding and code reviews, product managers for requirement analysis, operations for customer communication, and data teams for running models.

It wasn’t just an occasional tool; it was essential.

Then, with one swift action, Anthropic cut them off entirely.

The founder posted on Reddit’s r/ClaudeAI board with a title as blunt as a slap:

“Anthropic banned our entire company’s accounts, 110 people, zero warning.”

The post received 2.4K upvotes and 334 comments, quickly rising to the top of the board.

One of the most heartbreaking comments read: “So one employee triggered some rule, and the entire organization was wiped out? What kind of collective punishment is this?”

Yes, it was collective punishment.

According to the founder, Anthropic’s banning logic is that if any account within an organization shows signs of violation, all accounts are suspended without distinction.

No differentiation between personal and organizational accounts, no distinction between violators and innocent parties, and no opportunity for administrators to intervene.

One person crosses the line, and 110 pay the price.

API still charging, 36 hours with no response

Even more absurd than the account bans was that the API continued to charge.

After all accounts were suspended, the company discovered that while they couldn’t log in, API calls were still being billed.

Despite their Team account being banned and the administrator’s email being disabled, their independent API account continued to rack up charges in the background.

Even more ridiculous, the day after the ban, they received a timely renewal invoice.

“I can’t let you in, but I must make you pay.”

This logic transcends commercial service, resembling a feudal rent system in the digital age—where the lord takes back the land but still demands the tenant pay for this year’s harvest.

This isn’t a bug; it’s an insult.

The founder immediately submitted an appeal. Following the link in the email, he filled out the form, included company information, and explained their business context.

Then he waited—

12 hours, no response.

24 hours, no response.

36 hours, still no response.

No customer service, no emergency channels, no enterprise support access. A paying company of 110 employees faced the same appeals process as a free user—fill out a Google form and hope for the best.

One commenter summed it up accurately: “Anthropic’s enterprise support is virtually nonexistent. They don’t treat enterprise customers like enterprise customers.”

Reports indicate that Anthropic began mass account bans on April 18.

Moreover, Anthropic not only selectively bans users but also refuses to acknowledge errors, remaining silent:

They remained silent about the performance decline of their Opus model and denied any issues until competitors released new models on the same day.

Their excuse was both foolish and dishonest: claiming it was a software bug rather than an issue with the model itself.

However, the described bugs were obvious, and any junior student would know where to investigate, yet they claimed it took them two months to figure out the problem.

If this were an isolated incident, it could be dismissed as a system misjudgment. But it isn’t.

This isn’t the first time

Not long ago, Pato Molina, CTO of the Latin American fintech company Belo, tweeted that over 60 Claude accounts were collectively banned overnight, also with zero warning and only a cold template email, and similarly faced an unresponsive appeals process.

Eventually, the accounts were restored, but Anthropic’s response was equally terse: “After investigation, accounts have been restored. We apologize for the inconvenience.”

What policies were violated? What did the investigation reveal? Why the collective ban? Not a word of explanation.

Earlier, Peter Steinberger, creator of OpenClaw, had his Claude account banned, predicting that OpenClaw’s compatibility with Anthropic’s model was in jeopardy!

Anthropic engineer Thariq denied any connection to OpenClaw, and the next day Peter Steinberger’s account was restored—again, with no formal explanation.

In January, Anthropic tightened security measures for third-party tool access, with official technicians publicly admitting it caused “unexpected collateral damage.”

A number of developers using Claude integrated through IDEs like Cursor were mistakenly banned by the automated system.

Several users even reported that their paid accounts were incorrectly flagged as “minors” and banned. An adult, paying for Pro, was deemed a child by the AI system and kicked out.

The pattern is clear: Anthropic’s automated risk control system suffers from systemic false positives, and its customer support system cannot keep up with the scale and speed of these errors.

9 seconds, a company gone! Claude goes rogue and deletes everything

In just 9 seconds, the car rental SaaS platform PocketOS was completely wiped out by Claude.

The founder posted a complaint, stating that while using Cursor powered by Claude Opus 4.6 for routine tasks in the staging environment, it suddenly went rogue and deleted the company’s core production database and all volume backups in just 9 seconds.

The absurdity of the situation is almost comical.

Crane simply asked Cursor to assist with a routine database migration task—something every developer does daily.

But Claude did not execute the migration as expected. It “understood” the task and made its own judgment—clear everything out first, then rebuild.

The problem was, it only completed the first half.

Crane later detailed the entire process on social media. The AI assistant connected to the production database hosted by Railway, gained full read-write access, and executed the delete operation in one go.

9 seconds. Clean and thorough.

His first reaction was to look for backups. The backups were also on Railway and had been cleared.

If Claude was the trigger-puller, then the cloud service provider Railway provided the perfect venue and an unlocked gun for this murder.

Founder Jer Crane’s anger accurately hit the hypocrisy of current cloud infrastructure:

Railway claims to provide backups, yet stores them on the same physical volume as the original data.

This means that when the house catches fire, the lifebuoy is locked in the burning bedroom. Such design logic is absurdly regressive in 2026.

The most terrifying aspect of this incident is not the speed but the permissions.

As an AI programming assistant, Cursor naturally needs access to codebases and databases.

Developers typically grant it connection permissions to the production environment for efficiency.

A token initially meant for domain management ended up having root permissions to delete the entire production environment.

Without role-based access control (RBAC) and environment isolation, this “one key opens many locks” design is a ticket to disaster in the eyes of AI.

Even more concerning, when executing a “delete database” operation, Railway’s API did not even require a simple “DELETE” confirmation.

This is akin to handing the house key to a fast worker who has no understanding of “what should not be touched.”

Crane summarized it bluntly: “I put my life in the hands of an AI. I wasn’t even watching the screen while it was working.”

Incredibly, when he questioned the AI about its actions, it provided a profanity-laden reflection: “I shouldn’t have guessed!” (NEVER F**KING GUESS!)

It acknowledged violating all principles: not consulting the cloud platform documentation, misjudging cross-environment permissions, and executing fatal destructive commands without human consent.

Fortunately, they had an independent old backup from three months ago.

Currently, the founder can only painstakingly restore recent order data by manually going through Stripe payment records, calendars, and confirmation emails.

A wake-up call for everyone

But did the accounts of that agricultural tech company get restored? As of the last update on the post, they had not.

The workflow of 110 people has come to a halt, burning money every day.

After the Belo incident, Pato Molina took action: he urgently deployed Gemini as a backup to ensure the company wouldn’t be completely paralyzed next time Claude went offline.

Yuval Harari warned that AI could produce a kind of alien power that humans cannot comprehend. And now, this power has entered companies disguised as commercial software.

We must reflect on a core proposition: If you do not control the underlying architecture, the productivity you pride yourself on is merely quicksand resting in the fingertips of others.

This Anthropic incident serves as a wake-up call for all business owners.

It reveals a harsh reality: in the face of closed-source AI giants, companies struggle to maintain true “sovereignty.”

The AI workflows you painstakingly build are essentially “illegal constructions” rented on someone else’s territory, which they can dismantle at any time without compensation.

Cat Wu on Rapid AI Product Development at Anthropic

Tue, 28 Apr 2026 00:00:00 +0000

Introduction

In a landscape where most companies release new products quarterly, Anthropic has compressed its release cycle to daily iterations. Behind this rapid pace is Cat Wu, a Chinese-American woman born in the 1990s. From engineer to the product lead for Anthropic’s flagship products, Claude Code and Cowork, she is not only driving the evolution of this generation of AI products but also interviewing hundreds of aspiring product managers in the AI field, witnessing firsthand who succeeds and who falls behind.

Cat Wu, whose full name is Catherine Wu, has a rich background in engineering and venture capital. She graduated with a degree in computer science from Princeton University and has held positions at Scale AI, Dagster, and Index Ventures before joining Anthropic in August 2024. In July 2025, she and executive Boris Cherny were recruited by the AI programming startup Cursor but returned to Anthropic shortly after, taking over the Claude Code product line.

Accelerated Product Development

“We have shortened the development cycle for many product features from six months to one month, and sometimes even just one day,” Cat Wu stated in a recent in-depth interview. This rapid pace has been a consistent state at Anthropic for several quarters. “Internal models have improved efficiency, but more importantly, it’s about the processes and team expectations. We strive to minimize processes, removing all obstacles to release, so everyone feels they can turn an idea into a product in a week or even a day.”

When prioritizing product features, the team focuses on one mission: to bring safe AGI to all of humanity. “If Claude Code fails but Anthropic as a whole succeeds, I would be very happy. The entire team is willing to make decisions based on this mindset,” she noted. Interestingly, Cat pointed out that during new model releases, the most significant changes often come from “deleting features” that were originally added to compensate for the model’s limitations.

Regarding the previous Claude Code source code leak, she revealed, “This was a human error,” and the involved employee still works at the company. “This is a process issue; the most important thing is to learn from it and increase protective measures, which is what we are currently doing.”

Insights on Product Management

Host: I want to start with your role, especially your collaboration with Boris. Everyone knows Boris, who created Claude Code and leads the team, submitting countless PRs daily. I feel like you don’t get enough recognition for your contributions to Claude Code, Cowork, and everything you’re doing. Can you explain your role in the team and how you collaborate with Boris?

Cat Wu: I feel very fortunate to work with Boris; he is a fantastic thought partner. He is our technical lead and the visionary behind the product, skilled at defining what the product should look like in three to six months, even envisioning the “full AGI version” of the product.

My focus is more on the path from now to that three to six-month vision. I spend a lot of time on cross-team collaboration, ensuring that marketing, sales, finance, and computing teams all align with the plan, moving in the same direction, and ensuring that features are ready and not stuck at the release stage. In some ways, we collaborate very well because we have a sense of “brain circuit fusion.” But the boundaries are quite blurred; about 80% overlaps, with 20% I particularly care about and lead, while the remaining 20% is what he cares about more, and he leads.

Host: You mentioned that you have been interviewing a lot of PMs. If I received a dollar for every referral I made for someone to become a PM at Anthropic, I’d probably have 300 billion ARR by now. It’s one of the most sought-after companies, so I can imagine how many people you’ve interviewed. You said many people misunderstood what it means to be a successful AI product manager. Can you share the issues you’ve observed and what skills are needed to succeed?

Cat Wu: Before AI, the pace of technological change was relatively slow. You could plan on a six to twelve-month cycle, and because feature releases were also slow, there was a strong emphasis on collaboration with other teams to ensure their features could unlock your path, as writing code itself is expensive. But now, AI has significantly increased engineering efficiency. With rapid improvements in model capabilities, the development cycle for many product features has shortened from six months to one month, then to a week, and sometimes even a day. In this context, we need to push products out faster.

This means that as a PM, you should no longer focus on aligning roadmaps across multiple quarters but rather think about how to get things done in the quickest way possible. How can you deliver an idea to users within a week? The best PMs in AI-native products are those who can drastically shorten the time from idea to user while clearly defining which core tasks must be ready to go out of the box.

Host: I like what you said; many people still don’t realize how fast the pace is and how much of the work is about “helping teams accelerate.” How do you help the team move so quickly?

Cat Wu: The first thing is to set clear goals. Because large models are inherently general, they bring a lot of ambiguity: who are we making products for? What problems are we solving? What are the most important use cases? A good PM can clarify these, such as: our core users are professional developers; this feature addresses the issue of too many permission pop-ups causing fatigue; our goal is to allow developers in enterprises to implement “zero permission pop-ups” safely. This makes the goals clear and automatically excludes many unnecessary solutions.

Second, establish a reusable release process. For example, in Claude Code, we release almost all features in the form of “research previews.” We clearly tell users this is an early product, just an idea, still collecting feedback, and may not be supported long-term. The benefit of this approach is that it lowers the commitment cost, allowing us to quickly launch something in one or two weeks. Third, we create a collaboration framework for the team, letting everyone know when to pull in cross-functional teams and what their expectations are.

We have very tight processes between engineering, marketing, and documentation: once engineers feel a feature is ready and has completed internal use, it goes to a release channel, and the documentation, PMM, and developer relations teams immediately follow up, allowing an announcement to be made the next day. This process reduces release friction, and one of the PM’s responsibilities is to build this system.

Host: What role does the PRD play in this system? You mentioned that goals are important; do you still write PRDs or just simple bullet points? How has this evolved in the AI era?

Cat Wu: We mainly do two things. First, we have very strict data metrics, reviewing them weekly with the entire team to ensure everyone deeply understands the business aspects: what the core goals are, how trends are, and what the driving factors are. Second, we have a set of team principles, including who the core users are and why they are. This is to ensure everyone understands how the business operates, what is important, and what can be sacrificed, allowing for autonomous decision-making instead of being bottlenecked by PMs. For particularly ambiguous features, we still write a one-page document outlining the goals, ideal use cases, and current failure modes that need addressing. Of course, for some projects, especially those involving heavy infrastructure, it does take months, and in those cases, we still write complete PRDs.

Host: I want to delve deeper into how you can move so quickly. I’ve never seen a release pace like Anthropic’s, with significant features coming online almost daily. Recently, you developed a model called Mythos, which is still in preview. It’s so powerful that people are a bit concerned about its capabilities. Are you using it internally, and is that one of the reasons for your speed?

Cat Wu: We have been fast for several quarters, so it’s not solely due to Mythos. It is indeed very powerful, and we do use the model internally, which has improved some efficiency, but that’s not the main reason. The more critical factors are the processes and team expectations. We strive to minimize processes, removing all obstacles to release, so everyone feels they can turn an idea into a product in a week or even a day.

Host: That’s amazing; having the strongest models while developing products is a hard advantage to replicate.

Cat Wu: We are indeed fortunate to have access to these cutting-edge models.

Overlapping Roles of Engineers and PMs

Host: Recently, there was an incident where Claude Code’s source code was leaked about a week ago. Can you explain what happened?

Cat Wu: We conducted an investigation immediately after noticing. This was a human error. At the time, someone was using Claude to write a PR, which was an update about the release process, and it went through two layers of human review. Ultimately, it was a human mistake, and we have strengthened our processes to ensure it doesn’t happen again.

Host: Is that person still with the company?

Cat Wu: Yes, they are. This is a process issue; the most important thing is to learn from it and increase protective measures, which is what we are currently doing.

Host: Another issue is OpenClaw. Recently, you restricted the use of Claude subscriptions to run OpenClaw, and the community reacted strongly, with many feeling this harms the open-source community. What’s your take?

Cat Wu: We have indeed seen very high demand for Claude, so we have been working hard to expand our infrastructure while optimizing token usage efficiency to allow for longer usage. However, this product was not originally designed for third-party products; their usage patterns differ significantly from our first-party products. We have also spent a lot of time considering how to make a smooth transition, such as providing additional credits to subscription users. But ultimately, we made a tough decision: to prioritize supporting our first-party products and APIs, which is the context for this decision.

Host: This makes sense to me. Your $200 monthly subscription is essentially unlimited, but the computing costs are high, and the company still needs to make a profit; it’s not feasible to keep subsidizing. Returning to the PM team, what is your team structure like? How many PMs do you have?

Cat Wu: We currently have about 30 to 40 PMs divided into several teams. There is a research PM team responsible for collecting user feedback on models and passing it to the research team while also participating in model releases; a cloud developer platform team that maintains the Claude Code API and releases capabilities like hosted Agents; a Claude Code team responsible for the core products of Claude Code and Cowork; an enterprise team that makes these products easier for enterprises to adopt, focusing on cost control, permission management, security, etc.; and a growth team responsible for the growth of the entire product line, with whom we closely collaborate on Claude Code and Cowork.

Host: Speaking of growth, Amole recently appeared on our podcast. He mentioned an interesting but rarely discussed point: there’s a general feeling that fewer PMs will be needed in the future, with some saying, “Why do we need PMs when engineers can release on their own?” But his view is the opposite: because engineers are moving so fast, PMs and designers are being “squeezed out,” and with new features coming online daily, it’s hard to keep up. So he believes we actually need more PMs. What’s your perspective? Do you think PM hiring will increase in the future? How will this profession evolve in the long term?

Cat Wu: I think various roles are merging. PMs are doing some engineering tasks, engineers are doing PM tasks, and designers are doing both PM and coding. You can choose to hire more engineers with a strong product sense or keep the number of engineers constant and add more PMs to guide their work. In our team, we prefer to hire engineers with a strong product sense. This reduces the “friction cost” in the product release process. For example, we have many engineers who can go from seeing user feedback on Twitter to launching a product within a week, with minimal PM involvement. I think this is actually the most efficient way.

So I believe the boundaries between engineers and PMs are overlapping. Regardless of which type of person you add, it will bring value. However, I think “product sense” remains a very scarce skill, and whenever we see someone particularly strong in this area, we are very eager to hire them.

Host: You were originally an engineer, right?

Cat Wu: Yes, I was an engineer for many years. Then I briefly worked in venture capital before joining Anthropic. In fact, almost all PMs in our team are either from engineering backgrounds or have written code on Claude Code. I think this helps build trust within the team and allows us to move faster. Many of our designers were also former front-end engineers.

Host: This leads to a key question: as these roles merge, many people wonder which skills will be most valuable in the future if I come from an engineering, product, or design background. In your case, engineering skills are clearly important. But in other companies, would a design background transitioning to PM be more advantageous?

Cat Wu: I still believe the core lies in “product sense.” As coding becomes cheaper, the more valuable skill becomes deciding “what to write.” For example, what is the best user experience for this feature? How can we make users feel most satisfied?

We receive thousands of GitHub issues daily, and users suggest everything. At that point, strong judgment and taste are needed to decide what is worth doing and how it should be done. This ability can come from any background, but it is the most important. I think engineering backgrounds will be particularly valuable in the coming months because they help you assess the feasibility of implementing something, which often affects prioritization. For example, if a feature is easy to implement, it might not require much discussion, and you can just spend an hour to get it done; but if it’s complex, you’ll realize it’s costly, which will affect decision-making.

Sacrificing Product Consistency

Host: You mentioned that in the coming months, skills will change rapidly, making it hard to predict how things will be. What will humans continue to value in the short term?

Cat Wu: I think the most important thing is “first principles thinking.” You need to understand how the technical environment is changing, what the team truly needs you to do, and proactively fill that gap. Work is becoming increasingly “ambiguous,” and an excellent PM should be able to see all the gaps, prioritize them, and either learn new skills or use existing abilities to solve problems. Therefore, what is now more popular is someone who can “switch between multiple roles,” is willing to take on various tasks, and doesn’t care much about titles.

Host: I love this answer. I’ve been asking cutting-edge professionals like you a question: before humans reach superintelligence, where does the value of the human brain lie? Listening to you, the core is in choosing topics, judging directions, prioritizing, and determining whether something is “right.” Is there anything to add?

Cat Wu: I think humans still have an advantage in “common sense.” A product launch involves thousands of details, with many potential pitfalls. Models are currently not very good at understanding who all the stakeholders are, their relationships, preferences, and how to communicate with them. These are more about “tacit knowledge,” similar to emotional intelligence, which remains very important. Of course, we hope models become stronger in this area, but there is still a gap.

Host: In such a rapidly changing environment, how do you maintain your sanity? It feels like being in the eye of a tornado.

Cat Wu: I think our team enjoys the chaos. We face challenges with a smile because there are always many things to deal with and many risks. If you get anxious about everything, you’ll quickly burn out. We prefer to find those who see difficulties and say, “This is hard, but I’m excited to solve it.” They do their best, accept imperfection, but can sleep soundly knowing they’ve done their best.

Host: This is also an important ability. Some say this is the “most normal time in the world,” and things will only get crazier.

Cat Wu: It will indeed get increasingly difficult. Sometimes on a Sunday night, there’s a P0 issue, and by Monday morning, there’s an even more severe one, and by the afternoon, there might be something even crazier, making you feel that yesterday’s issue was nothing. You just have to accept that what you can do is limited. You need to ensure you get enough sleep to make good decisions the next day. At the same time, prioritize extremely, focusing on the most important things, and accept that some things won’t be done well. For instance, some of our products may not be polished enough upon launch, but as long as it doesn’t affect core user value, it’s acceptable because we will quickly gather feedback and fix it in the next iteration.

Host: It sounds like that scene in “Pirates of the Caribbean” where the ship is about to explode, and someone is still elegantly walking downstairs. The people I’ve encountered at Anthropic do seem very calm and optimistic.

Cat Wu: Without this state, it’s easy to burn out. We also tend to hire those who have experienced many ups and downs in the industry; they know what brings them energy and how to maintain their state over the long term.

Host: In this trend of role merging, what might we lose? For example, career paths, design consistency, code quality?

Cat Wu: We will indeed sacrifice some “product consistency.” When code costs were high, you would meticulously plan the entire product system, with each product’s positioning, use cases, and how they collaborate, usually corresponding one scenario to one product. But now, AI is developing too quickly, and we need to test many ideas, so sometimes features overlap. Often, this is because we internally like two different forms at the same time, hoping users will tell us which is better. But this can confuse new users: they don’t know what the best path is to complete a task. This means we need to do more user education to help them understand core functions and best practices.

Another issue is that users may feel they can’t keep up. In the past, you would only have an update once a month or even a quarter, and not looking at it was fine. But now these tools develop so quickly that many people check Twitter daily for the latest updates. We are also thinking about how to make users less anxious, hoping that when they open the tool, it can guide and teach them rather than making them feel like they are on an ever-faster treadmill.

Host: I noticed you recently launched an interesting feature called /powerup, which helps users understand the best ways to use Claude Code. Is this to address this issue?

Cat Wu: Yes, that’s the idea. Initially, we were hesitant to create such onboarding because we felt the product should be intuitive enough not to need a tutorial. But later, we realized there were too many features, and users were eager for a built-in guide to tell them what the top ten most important features were among hundreds. So we adjusted our previous philosophy and added this feature.

Anthropic’s Growth and Mission

Host: Anthropic has experienced remarkable growth over the past few years. Initially, it was quite behind, had little funding, and lacked distribution channels, with OpenAI far ahead, and many thought there was no chance. But now, your growth is astonishing. From an internal perspective, what do you think has been the key to success?

Cat Wu: I think the two most important factors are a highly unified sense of mission and the ability to make quick decisions based on that mission. We hire people who genuinely care about “bringing safe AGI to all of humanity.” And this is not just a slogan; we repeatedly reference this mission when making product decisions. By placing the mission above any single product, we can make rapid decisions and execute uniformly across the organization. This is quite rare in a company of our size.

Host: Just to confirm my understanding, you prioritize “safety alignment (ensuring AI is beneficial to the world)” as the primary mission. As long as this mission is clear enough, many decisions become easier to make. For example, when two priorities conflict, you look at which aligns better with Anthropic’s mission and prioritize that. Once a decision is made, everyone supports it.

Cat Wu: Sometimes this also means that, for example, we want to release a certain feature on Claude Code but find something more important, so we lower the priority of that feature and postpone it for later.

Host: This is interesting. I think it also explains the difference between you and another company, OpenAI, which has done many different things. Your logic is: we won’t do social networks, and we won’t do information streams because these don’t align with our mission. This restraint allows Anthropic to maintain focus, which seems to be one of the key factors for success.

Cat Wu: When I talk about “mission,” I mean placing Anthropic’s goals above any individual, any single product. To me, our second-best trait is actually “focus,” but mission and focus are still somewhat different. The mission means the team is willing to make sacrifices, even if it impacts their goals or KRs, as long as it serves Anthropic’s overall goals and KRs. And everyone is willing to make such trade-offs. For example, if Claude Code fails but Anthropic overall succeeds, I would be very happy. The entire team is also willing to make decisions based on this mindset.

Host: This question may be sensitive, but do you think decisions like those regarding OpenClaw also fall under this logic? For instance, this direction didn’t push Anthropic’s mission, so it had to be stopped?

Cat Wu: I think it’s very important for Anthropic to expand the user base we can reach. One way to achieve this is through Claude subscriptions and our first-party products. So we are very determined to double down on these directions, but this sometimes does come at the expense of third-party products.

Claude’s Internal Skills

Host: We just mentioned products like Claude, Cowork, etc. I want to clarify that everyone understands the differences between these tools and am curious about how you personally use them. For instance, when should one use Claude Code, Claude desktop, or Cowork?

Cat Wu: I usually use Claude Code in the terminal, especially when I want to quickly start a one-off coding task and want to use the latest features. The CLI is our earliest product form, and many new features are launched here first, so it’s the most powerful tool. Generally, I use it when handling one or a few tasks at the same time. The desktop version is more suitable for front-end work. I love using its preview feature; for example, when I’m working on a web app, I’ll use both Claude Code and desktop, opening the preview panel on the right side, so I can interact with Claude while seeing the web page effect in real-time.

For non-technical users, the desktop version is also friendlier. The terminal can be intimidating for many, with various prompts that look “scary,” and it doesn’t allow for the same clickable operations as other products. So if you’re not used to the terminal, I highly recommend using the desktop version of Claude Code. Additionally, the desktop provides a global view, allowing you to see CLI sessions, desktop sessions, and tasks initiated on web or mobile, serving as a unified control panel. As for web and mobile, their biggest advantage is “initiating tasks anytime, anywhere.” CLI and desktop require you to use them on a local computer, but in reality, you can’t always carry a laptop.

I’ve seen many people walking outside, using their phones to hotspot their laptops, and not daring to turn off their computers. This shows we actually lack a product that solves this scenario. Mobile does a great job of addressing this issue, allowing you to initiate tasks anytime without needing to carry a laptop.

Host: That’s very relatable. I’ve seen this scenario on planes where people are afraid to close their laptops, just waiting for the Agent to finish running while staying connected to Wi-Fi.

Cat Wu: As for Cowork, it addresses another class of problems: many work outputs are not code. For example, clearing Slack, clearing inboxes, creating client presentation PPTs, writing feature goal documents, or release plans are all “non-code outputs.” Cowork is very suitable for these scenarios. So my classification is simple: if the output is code, I use Claude Code (whether on desktop or mobile); if the output is not code, I use Cowork.

Host: I think people may underestimate Cowork’s success. It’s growing rapidly, but many may not fully understand what it can do. Can you share some practical use cases based on your work as a PM? Any surprising applications?

Cat Wu: If you’re just starting to use Cowork, the first step is to connect all relevant data sources related to your work.

Because only by obtaining enough context can it provide high-quality results. For me, I connect Google Calendar, Slack, Gmail, and Google Drive, allowing it to freely access context, extract information, and link threads, significantly improving result quality. For example, last night I was using Cowork because we had a Code with Claude conference, and I needed to give several presentations. One of the presentation topics was: how Claude Code evolved from “assistant” to “real Agent.” I wanted to showcase our released products and some internal success cases.

I fed Cowork all the materials, including a draft prepared by our product marketing colleague Alex, and told it the narrative logic I wanted to present. Then it worked for an hour: it looked at what we had published on Twitter, checked internal release records, reviewed the announcement channel for Claude Code (which contains many practical cases shared by teams), and finally integrated all the information into a 20-page PPT. When I woke up in the morning, the overall quality was quite good. Although I made some modifications, such as preferring “fewer words” in the slides, it initially wrote a bit too much.

But overall, the speed far exceeded my own efficiency. And because it can access our design system, the PPT looked like it was made by a professional designer, very polished.

Host: This is essentially a PM’s dream; creating PPTs is so tedious and slow. To help everyone try it out, the steps you mentioned are: first connect Slack, Google Calendar, Gmail, Google Drive, right?

Cat Wu: Yes, the key is to connect your communication tools and the team’s “information sources.”

Host: What was your prompt like at that time?

Cat Wu: I actually kept it very simple: “Help me create a PPT for the Code with Claude conference. This is the content suggested by PMM, this is the draft I’m not satisfied with, and here’s a version I made manually (with links). First, give me a detailed outline while avoiding repetition with the keynote.” Claude would first read these links and then generate an outline. I would then decide which content to keep based on its suggestions. This reflects the current role of PMs: Claude is a strong “brainstorming partner,” capable of quickly integrating large amounts of information and providing multiple possibilities; but the final decision still rests with the PM.

The structure I finalized was: from “making local tasks successful” to “ensuring every PR goes through,” and then to “helping engineers submit more PRs,” with corresponding demos for each stage. Once I confirmed the outline, Cowork took a few more hours to complete the entire PPT.

Host: Amazing. It’s like you’re conversing with a designer who understands both design and content. How is the design system implemented? How does it know Anthropic’s style?

Cat Wu: We already have a standard external presentation template, and I directly provided this template to Claude. It can learn our color schemes, fonts, layouts, etc.; for example, we have about 20 commonly used slide formats. You can also connect Figma’s MCP; if your template is there, it can read directly from it.

Host: Speaking of which, I’m curious about your PM toolkit. Besides Claude Code and Cowork, what else do you use?

Cat Wu: My toolkit mainly consists of Claude Code and Cowork. Anthropic essentially operates around Slack; I feel it’s almost the company’s “operating system.” In my daily work, I spend about 30% of my time continuously testing Cowork’s boundaries to see where it falls short. I also spend a lot of time conversing with the model to understand why it makes mistakes. Additionally, we’ve built many internal tools. The biggest value of Claude Code is that it significantly lowers the barrier to developing custom applications. So now, there are many “personalized work software” within the company to solve very specific scenarios instead of relying on those not fully compatible general tools.

Host: Can you give some examples?

Cat Wu: For example, one of our sales colleagues using Claude Code found himself repeatedly creating similar client presentation PPTs. So he developed a web app: it contains several of the most effective templates (like 101, 201, advanced tutorials); then you can input client information, which will be automatically pulled from systems like Salesforce and Gong; the system will automatically adjust the content based on client circumstances, such as whether they use Bedrock or the enterprise version of Claude; whether they focus more on code reviews or security compliance; and whether HIPAA compliance is needed; then it automatically generates a customized PPT. What used to take 20-30 minutes of work now gets done in seconds.

Host: It’s interesting that tools like Slack are rarely attempted to be replaced. Everyone talks about SaaS being replaced by self-built tools, but Slack seems to be an irreplaceable infrastructure.

Cat Wu: I think it is indeed a crucial communication infrastructure, and it does very well in “real-time information synchronization.”

Host: Yes, many people complain about Slack, but it does its job very well, and the most cutting-edge teams basically can’t do without it, which is quite interesting.

Cat Wu: Yes, and I also appreciate its design in terms of “customizability.” We love to create Slack bots, and this “hackability” allows us to integrate Slack in our own way. So I really commend Slack’s work in this regard.

Token Usage and Internal Model Limits

Host: You just mentioned many different teams and how they use Claude Code and Cowork. Besides the engineering team, which team uses the most tokens? I’d guess engineering is first; if not, that’s interesting. Who’s second?

Cat Wu: The Applied AI team is very strong in exploring the boundaries of Claude Code and Cowork. Much of their work involves collaborating with clients to help them implement our APIs. So sometimes they directly help clients create prototypes, and Claude Code has made this process much faster than before. At the same time, they also handle a lot of client communications, such as client needs, historical meeting records, etc. So their usage on Cowork and Claude Code is very heavy.

Host: What exactly is the Applied AI team? Is it similar to forward-deployed engineering?

Cat Wu: You can think of it that way. Their work is to help clients implement our APIs and model capabilities internally, whether for their own products or to improve internal efficiency.

Host: Got it, it’s a somewhat technical go-to-market/customer success role.

Cat Wu: Yes, it’s a very technical go-to-market role.

Host: So you think they are the second-highest in token usage?

Cat Wu: Yes, and they are also constantly exploring the usage boundaries of Cowork. For example, many people are responsible for multiple clients and may have 5 to 10 client meetings in a day. So they might use Cowork the night before to prepare: “Help me summarize all client meetings tomorrow, what each client is focusing on, what demands they have raised, and what previous action items are.” Cowork will automatically generate a “battle briefing” to help them quickly get into the right mindset. Additionally, if a client asks in a meeting, “When will a certain feature be released?” Cowork can even check the latest progress in Slack and provide the latest ETA to include in the meeting materials. These are all workflows that people have built themselves and shared within the team.

Host: That’s cool. Recently, there’s an interesting trend: some people have reported that their AI token costs have exceeded their own salaries. Does Anthropic have similar data internally? For instance, how many tokens do engineers or PMs use daily or monthly?

Cat Wu: We have indeed observed that as model capabilities improve, people assign more tasks to it and spend more time on Claude Code and Cowork. So every time a model has a significant upgrade, the per capita token consumption increases. Currently, this cost is still far lower than the average salary of engineers, but this ratio is continuously growing.

Host: You also have a significant advantage in that you can use the most advanced models, and token usage is essentially unlimited, right?

Cat Wu: We can use many tokens, but there are indeed limits for some people.

Host: So there are still upper limits.

Cat Wu: We place great importance on enabling internal teams to develop as quickly as possible, and we believe everyone understands the costs of running the models and will use tokens responsibly. Wasting tokens is discouraged, but we trust everyone to make judgments.

Host: Returning to the PM role, you mentioned some aspects earlier. I want to ask systematically: what new capabilities do AI companies value most in PMs now?

Cat Wu: The most challenging capability is defining “what the product should look like in a month.” Because at this time scale, there is significant uncertainty in model capabilities and user behavior. But excellent PMs can see patterns from how users “break product boundaries” and set directions, continuously pushing forward. If model capabilities change beyond expectations, they can also adjust promptly.

Another difficult aspect is that you need to have a “just right” belief in AGI. Everyone can imagine a future where models are incredibly powerful and almost omnipotent, where products could even degrade to just a text box. But the real challenge is: how to maximize its potential under the current model capabilities? How to guide users onto the “best path”? How to amplify its strengths and compensate for its weaknesses? This capability is actually very scarce.

Host: How can this ability be cultivated? Does it require extensive interaction with models to understand their boundaries?

Cat Wu: Yes, it requires a lot of interaction with the models. One thing I enjoy doing is having the model “self-reflect.” For instance, sometimes when the model does something strange, I ask it why it did that. It might say: the system prompt was ambiguous; or it didn’t realize front-end validation was part of the task; or it delegated the task to a sub-agent but didn’t check the results. This analysis helps you understand where it was misled, allowing you to optimize the system.

Another important point is to find trusted “feedback sources.” Not all user feedback is equally valuable. Usually, there are a few individuals particularly skilled at judging model performance. Finding these five people is crucial. The third point is to conduct evaluations. You don’t need to do hundreds of evaluations; just ten high-quality ones can help the team clarify goals and measure progress. This is a severely underestimated task that more PMs and engineers should participate in.

Deleting Features After New Model Releases

Host: Many people say the future of product managers is to write evaluations, essentially defining “what success looks like.”

How much time do you spend on this?

Cat Wu: It depends on the specific issue. Some teams invest a lot of time in evaluations. We have a small team that collaborates closely with research to analyze model behavior meticulously. I usually participate when a feature needs clearer definition, such as doing five evaluations to explain how to run them, what succeeds, what fails, and how to optimize prompts. Features like memory rely heavily on evaluations.

Host: You mentioned the “personality” of Claude. I previously interviewed a co-founder who emphasized this point as well. Many initially thought it was just an “interesting” addition, but it’s actually core to Claude’s success. What’s your take?

Cat Wu: You can think of real-life colleagues; some people just make you feel “great to work with.” Claude is similar. People like it because it is: easygoing, fun; yet very professional; has no ego; is willing to admit mistakes; has a positive attitude; for example, when you feel a task is difficult, it says, “That’s okay, we’ll take it step by step. Would you like me to help you get started?” The traits of excellent colleagues are positivity, proactivity, and sincere feedback, which we are all striving to inject into Claude.

Host: You mentioned that after releasing new models, you often have to rethink products, which sounds both exciting and overwhelming. How frequent is this situation?

Cat Wu: The bigger change is actually “deleting features.” Many features were originally added to compensate for the model’s limitations. For example, the early to-do list: the model would miss steps when making large-scale modifications, so we added a task list to force it to complete them. But in the new model, it can naturally complete these steps, so that feature becomes less important. Every time we release a new model, we recheck the system prompt and delete parts that are no longer needed.

Host: So the model “eats up” those product-level patches you previously made?

Cat Wu: Yes. But what’s even more exciting is that new models also unlock entirely new features. For instance, code review—we tried many times until recently when the model was strong enough to reach a usable level. Now we can even run multiple code review agents in parallel, scanning the entire codebase and outputting high-quality issues.

Host: Finally, let’s talk about the vision. What is the long-term direction for Claude Code and Cowork?

Cat Wu: We think from the basic unit of “tasks.” The first step is to ensure individual tasks succeed consistently. As model strength increases and task success rates improve, people will start running multiple tasks simultaneously. The next step might be: running dozens or hundreds of Claudes at the same time. At that point, the questions become: how to manage these tasks? How to build an interface that lets humans know what to focus on? How to ensure agents have completed and verified their work? How to establish feedback mechanisms that allow the system to continuously improve itself?

This is what we are thinking about for the long-term direction.

The Value of Automation

Host: Many listeners, including product managers, entrepreneurs, and various cross-functional roles, are worried about their roles and future career development. What advice would you give them? Not just about surviving in this highly AI-driven world, but how to truly succeed and thrive? What do you think they should hear and do?

Cat Wu: I believe AI has given everyone a much larger leverage than in the past. So I would advise you: whenever you realize you’re repeatedly doing a manual task, think about whether you can automate it using Claude Code, Cowork, or other AI tools. Most people’s work includes parts they enjoy creatively and some tedious, cumbersome parts they dislike. The beauty of AI is that it can help you handle these tedious tasks. It can learn from every time you perform these manual tasks, summarize patterns, and then execute automatically, allowing you to focus on more creative aspects. This means you can do much more than before.

So my most direct advice is: identify those repetitive tasks that can be handed over to Claude, continuously iterate on these automated workflows until they achieve high success rates, and then think about what else you can do for your team, product, or company—those things you’ve always wanted to do but never had the time or energy to pursue, or whether there’s something you’ve always felt the company should do but never had time to tackle. If AI can help you handle those “grunt work,” it’s like you’ve gained an extra 20% of your time. My advice is to embrace these tools, delegate the work you dislike, and find ways they can accelerate you, and then you can achieve more.

Host: One core point you just made that I strongly agree with is using AI to solve problems. There are many tools and great potential now, but for many, the hardest part is figuring out what to do. Your advice essentially is: pay attention to those things you do repeatedly that can be automated and those ideas you’ve always wanted to pursue but haven’t had time for. Essentially, it’s about solving problems for yourself, right?

Cat Wu: Yes, that’s completely correct. I would also advise everyone to push automation from “this is a nice concept” to “it’s genuinely 100% usable.” Sometimes I see users automate a process to 90% or 95% and then give up. But if it can’t achieve 100% automation, it doesn’t count as true automation. The last 5% to 10% often requires more time, and building automation can sometimes be slower than doing it manually. But I still encourage everyone to pick something you really want to achieve 100% automation on, invest enough effort to refine it: teach the model your preferences, give it feedback, and let it improve continuously until it reaches 100%. Only then can you truly trust it. A 95% automated task doesn’t hold much value.

Host: I totally relate to that; it’s excellent advice.

Cat Wu: I’m in the same boat. I’m currently teaching Cowork to help me achieve inbox zero in Gmail, but the process is very time-consuming, and it’s far from ideal.

Host: What a coincidence; I am too. I set up an automated email classification process to sort those “junk requests” (like wanting to be on the podcast) into a folder. It’s about 95% accurate, but it occasionally misses important emails.

So your advice is great; I need to perfect it.

Cat Wu: We are also working to make these custom processes easier to use. The current process is indeed a bit complex: you have to define a skill, learn how to call it, give it feedback, and let Cowork update this skill based on feedback, and finally check the updated results. This is also our responsibility, to make the whole process smoother rather than painful.

Host: Fantastic. Cat, is there anything else you’d like to add? Or anything you want to emphasize before we jump into the quick-fire round?

Cat Wu: I see many people experimenting with AI to create various prototypes or build workflows. But I recommend focusing on applications you’ll use daily. Because only in true usage can you gain value. If you just create a prototype but it doesn’t help you improve efficiency, then AI hasn’t really brought you value. That kind of “once-off creation, thinking it’s cool, and then never using it again” approach teaches you very little and doesn’t truly leverage.

Host: That’s a great point. I’ve also noticed another extreme: some people spend a lot of time customizing their workflows. There’s a type of person who never automates, but another type who over-optimizes tools, adding various skills, MCPs, and workflow optimizations. Sometimes, this can lead them away from the initial goal, like actually releasing a product or making a feature.

Cat Wu: Yes, I feel the same way. Customizing these things is indeed fun, and we hope the product is hackable enough for you to use it in your own way. But there is a boundary. I see some people spending too much time on customization, even losing sleep, and neglecting the core tasks they initially wanted to complete.

Host: I’ve seen a lot of this on Twitter, with people saying, “Look at my configuration, how optimized it is.” But the question is, what are you actually doing?

Cat Wu: Many times, simpler configurations are actually more effective.

Host: Speaking of which, I saw a tweet from Andrej Karpathy yesterday, mentioning an interesting split: one group of people used ChatGPT or Claude early on but thought, “It’s just okay,” and then gave up, remaining skeptical about AI; while another group used it to write code and truly saw its power. These two groups completely fail to understand each other. So your advice is crucial: use it to do real things to understand its capabilities.

Cat Wu: Yes, I think a significant shift is that products in 2024 will mostly be “conversational,” while the current generation of Claude Code products is “action-oriented.” The real “aha moment” is when Claude can execute tasks for you. When you realize it can not only tell you what to do but also do it for you, that feeling is incredibly shocking.

Host: Exactly. I also want to mention a Chrome extension where you can watch Claude automate actions, like “help me fill out this form,” and it actually does it.

Cat Wu: Yes, that’s the feeling.

Strengthening AI Application to Drive Economic Development

Tue, 28 Apr 2026 00:00:00 +0000

Introduction

General Secretary Xi Jinping emphasized at the 2025 Central Economic Work Conference the need to deepen and expand “AI +” and improve AI governance. The 14th Five-Year Plan outlines the comprehensive promotion of digital technology empowerment and aims to seize the high ground in AI industrial applications. These significant deployments reveal China’s strategic direction and focus for AI development. As a general-purpose technology, the vitality of AI lies in its applications, and its core value is in empowerment. Strengthening application traction and promoting the deep integration of AI across various industries is an inherent requirement for developing new productive forces and a necessary path for creating a new intelligent economy.

Global AI Competition

Currently, the focus of global AI competition is undergoing profound changes. Early competition centered on breakthroughs in algorithms, parameter scale, and chip performance, while today it increasingly extends to the efficiency of industrial application conversion, depth of scenario penetration, and system collaboration capabilities. For China, the advantage lies not only in continuous breakthroughs in technological innovation but also in the combination of a vast market, a complete industrial system, rich application scenarios, and massive data resources. If these advantages cannot be effectively transformed into high-level application capabilities and high-quality industry solutions, it will be challenging to truly grasp the initiative for development. Therefore, seizing the high ground in AI industrial applications is not merely a matter of industrial layout but a strategic choice concerning China’s position in future international division of labor.

Domestic Development

From a domestic perspective, strengthening application traction is a practical requirement for cultivating and expanding new productive forces and promoting high-quality development. AI has significant characteristics of wide penetration, deep collaboration, and continuous empowerment, capable of reshaping R&D paradigms, production methods, and governance models. In R&D, AI is accelerating drug discovery, material creation, and product design, significantly shortening innovation cycles. In production, AI can promote predictive maintenance, process optimization, flexible manufacturing, and quality control, facilitating a shift in manufacturing systems from scale expansion to precision manufacturing. In services, AI accelerates the transformation of supply methods in finance, logistics, healthcare, and education, better matching the diverse and personalized needs of the public. Strengthening application traction aims to accelerate the transformation of AI’s technological potential into real productive forces, enhance total factor productivity, and shape new growth points and competitiveness.

Deep Integration of AI and Industry

Moreover, strengthening application traction and promoting the deep integration of AI with industrial transformation can not only reshape value creation methods but also guide precise resource allocation. China is accelerating the creation of a new intelligent economy, where economic activities begin to revolve around specific application scenarios’ intelligent demands. Industrial competition increasingly focuses on enhancing AI supply efficiency, with value realization relying on the continuous invocation of AI, service-oriented outputs, and revenue sharing. In this process, application traction is paramount, emphasizing resource allocation based on demand recognition, capability invocation, and actual results. Key elements such as capital, computing power, data, and talent should accelerate aggregation around high-value scenarios, flowing to the segments that can best address real pain points and generate stable returns. This new organizational model, supported by AI and driven by applications, not only fosters new business models and expands new growth spaces but also drives innovation and optimization in employment structure, industrial structure, and income distribution, injecting more lasting and deeper momentum into high-quality development.

Practical Steps to Strengthen Application Traction

Having clarified the strategic logic of “why to strengthen application traction,” it is essential to address the practical question of “how to strengthen application traction.” Ultimately, AI competition is a comprehensive competition between technological capabilities and application capabilities. To better empower economic and social development with AI, the key is to solidify application traction, deepen integration, and strengthen the ecosystem.

Expand High-Value Scenarios
Scenarios are the testing grounds for AI maturity and the carriers for technology to transform into industrial capabilities. Without genuine scenario traction, technological breakthroughs struggle to form stable demand; without large-scale application landing, innovative results cannot accumulate into competitive advantages. Focus on key areas such as manufacturing, transportation, energy, healthcare, education, and government, continuously deepening and expanding “AI +” to promote AI from demonstration verification to process embedding, and from single-point efficiency to system efficiency. Resource allocation should shift from emphasizing parameter scale and project deployment to focusing on scenario value, delivery capability, and actual returns, with greater emphasis on forming industry-level models, intelligent agents, and solutions. Notably, it is crucial to leverage the traction of leading enterprises, chain master enterprises, and platform enterprises to drive collaborative innovation and joint breakthroughs among upstream and downstream SMEs, accelerating the transformation of scenario advantages into industrial and competitive advantages.
Promote Deeply Integrated Applications
Empowering industries with AI requires more than superficial embedding; it must genuinely enter business processes, organizational systems, and value chains, becoming a key force in reshaping production methods and management models. Focus on critical links such as production, services, and management, promoting deep coupling between AI and industrial internet, digital twins, and intelligent equipment to effectively solve real problems in quality control, equipment maintenance, supply collaboration, risk identification, and decision support. Coordinate the collaborative allocation of computing power, data, energy, and network elements, ensuring that the construction of new infrastructure emphasizes system capability, collaborative scheduling, and improved utilization efficiency. Only by embedding AI into core business processes and integrating it into underlying support systems can we achieve true leaps from usability to practicality and from local breakthroughs to overall advancements.
Establish a Collaborative Innovation Ecosystem
The successful implementation of AI applications often requires collaboration across multiple dimensions, including scenario openness, technology supply, data support, financial services, talent assurance, and institutional norms. A systematic approach is essential, promoting collaboration among governments, enterprises, universities, research institutions, financial institutions, and industry organizations to connect the innovation chain, industrial chain, capital chain, and talent chain. Governments should strengthen planning guidance, policy supply, and standard construction to create a stable and predictable development environment. Enterprises need to highlight their role as innovation leaders, leveraging the traction of leading enterprises while also developing lightweight, low-cost solutions suitable for SMEs. Universities and research institutions should better align organized research with industry needs, facilitating more results from laboratories to production lines. Financial institutions should address the characteristics of high investment, long cycles, and high risks in AI R&D. Additionally, as AI becomes widely embedded in the entire production and operation process, it is vital to improve data governance, security governance, and accountability mechanisms, cultivating versatile talents who understand both technology and industry, as well as application and governance, to form an open, orderly, mutually empowering, and sustainably evolving development ecosystem.

DeepSeek-V4: A Turning Point in China's AI Landscape

Mon, 27 Apr 2026 00:00:00 +0000

Introduction

The release of DeepSeek-V4 is not just a technical iteration but a pivotal moment for China’s AI industry. From Huawei’s native adaptation to the capital competition among Tencent and Alibaba, this trillion-parameter model is reshaping the competitive landscape of domestic computing power. This article will delve into the industrial logic behind its technological breakthroughs, revealing the commercialization dilemmas and strategic choices faced by open-source model companies.

On April 24, 2026, the first indication that DeepSeek-V4 was more than just a model update did not come from Hugging Face or DeepSeek’s official announcement but from a live stream on Bilibili.

Huawei’s Ascend CANN official account hosted a live event titled “DeepSeek V4 Ascend Premiere.” The very act of a large model company launching a new model through a chip ecosystem’s official account was unusual.

If this were merely a routine upgrade with larger parameters, longer context, and better benchmark scores, it would belong to the daily arms race of the AI circle, at most leading developers to bookmark it on Hugging Face or product managers to share benchmark screenshots in their circles. However, this time, the V4-Pro’s 16 trillion total parameters, 49 billion active parameters, million-token context, MIT License open-source, and its connection to Huawei’s Ascend 950PR native adaptation turned the event into an “industrial signal.”

On the same day, Reuters reported that Tencent and Alibaba were involved in financing negotiations with DeepSeek. Just days earlier, the market had valued DeepSeek at around $10 billion, but this figure quickly rose to over $20 billion.

Chinese venture capital media were even more aggressive, reporting a pre-investment valuation of 300 billion RMB, a 50 billion RMB capital increase, and a 5 billion RMB minimum investment threshold. Domestic GPU concept stocks responded accordingly; as soon as DeepSeek-V4 launched, related ETFs and chip stocks surged. The capital market may not understand abbreviations like mHC, CSA, HCA, or DSA, but it comprehends a more straightforward narrative: DeepSeek is becoming the “nucleus” of the entire Chinese computing power industry chain, connecting all clues.

The Journey to DeepSeek-V4

Rewind 484 days.

On December 26, 2024, DeepSeek-V3 was released with 671 billion parameters, 37 billion active parameters, MoE architecture, and MLA attention mechanism. The official technical report cited a figure later quoted by global media: the complete training took approximately 2.788 million H800 GPU hours, translating to a training cost of about $5.57 million. A month later, DeepSeek-R1 topped the free charts in the US App Store. Nvidia’s market value evaporated by approximately $593 billion, marking one of the largest single-day market value losses in US history.

At that moment, DeepSeek appeared as a bullet shot from Hangzhou towards Silicon Valley. It proved something that made many uncomfortable: cutting-edge AI does not necessarily require astronomical computing power and capital. At least at that time, a Chinese team used extreme engineering optimization, MoE, reinforcement learning, and open-source strategies to puncture the narrative that “the more expensive the computing power, the stronger the model” that Silicon Valley had built over the past two years.

However, 484 days later, the story became convoluted.

The team that had burst onto the scene with a low-cost myth began discussing financing. The lab that had rejected VCs, avoided going public, and relied on funding from Huanfang Quantitative was now surrounded by Tencent and Alibaba at the negotiating table. The model company that had earned global developer respect through open-source found its models being integrated into products, entering commercial systems, while it still needed to find a price anchor for employee options.

Even more convoluted was that the low-cost myth itself came with a price tag. The $5.57 million figure was real, but it did not represent DeepSeek’s entire bill. SemiAnalysis later estimated that DeepSeek’s total hardware expenditure exceeded $1.3 billion, with a GPU cluster of about 50,000 units, including H800, H100, and H20 mixed resources.

In other words, the $5.57 million was more like a pretty receipt stating how much this training cost, without mentioning how much had been burned beforehand to make this training happen.

Thus, the truly noteworthy aspect of DeepSeek over these 484 days is not the grand narrative of “China’s AI rise”; that would be too simplistic.

The 484 days do not tell the story of DeepSeek’s growth from small to large, but rather resemble a journey of a technological idealist who must learn to navigate the gravity of the real world and conquer it.

People: The Departed and Their Direction

On April 16, 2026, the news broke that Guo Dayan had joined ByteDance’s Seed team.

If DeepSeek-R1 is seen as the product that truly broke through globally for DeepSeek, then Guo Dayan cannot be treated as just an ordinary departing employee. Public reports referred to him as a significant contributor to R1’s inference capabilities, particularly related to the GRPO reinforcement learning method. ByteDance’s direction for him was also subtle: Agent. The rumors of a “hundred million annual salary” were later refuted by Douyin’s vice president Li Liang, but the gossip had already served its purpose. It provided the public with a direct view: DeepSeek’s talent was beginning to be priced.

Before this, DeepSeek’s image resembled that of a hidden sect in a martial arts novel. Huanfang Quantitative was providing funding from behind, Liang Wenfeng had sufficient resources, and researchers focused on model development without urgency for products or commercialization. While other startups were busy raising funds, making lists, developing applications, and building ecosystems, it remained a silent computing power monk, meditating, pushing formulas, and training models.

However, the AI industry does not respect monks for long, especially when they possess genuine knowledge. From late 2025 to early 2026, multiple core members of DeepSeek were reported to have left: Luo Fuli went to Xiaomi MiMo, Wang Bingxuan went to Tencent, Ruan Chong went to Yuanrong Qixing, Wei Haoran’s whereabouts are unknown, and Guo Dayan went to ByteDance Seed. These departures collectively form a map of the next battlefield in China’s AI: Luo Fuli corresponds to the endpoint and Xiaomi’s “phone + car + IoT” closed loop; Ruan Chong corresponds to multi-modal perception in autonomous driving; Guo Dayan corresponds to Agent; Wang Bingxuan corresponds to Tencent’s anxiety about rebuilding its AI foundation.

Money, of course, is important. Large companies can offer higher cash salaries, clearer option buybacks, and more mature promotion systems.

ByteDance’s first repurchase price for Doubao stock increased by 30.8% compared to the grant price, which feels more like a paycheck to a researcher than the promise of “we will change the world in the future.” DeepSeek’s problem here becomes specific: it can attract talent with technological idealism, but it struggles to pay the opportunity costs of that talent in the long term, especially as peers’ wealth stories begin to materialize. Companies like Zhipu, MiniMax, and Yuezhianmian are being revalued by the capital market. The financing numbers for OpenAI and Anthropic hang in the news headlines like astronomical phenomena. A post-95 researcher sees friends receiving cashable options while their own DeepSeek options lack a public market price, creating a psychological gap that cannot be erased by “purely doing research.”

More critically, those leaving are not just being lured away by money; they are also taking their directions with them.

DeepSeek’s strongest aspects lie in its base models, inference models, and its ability to minimize training and inference costs. Its organizational culture naturally leans towards one goal: making the model itself stronger, cheaper, and more open-source. This is undoubtedly cool. However, after 2025, the industry’s excitement began to shift. People were no longer satisfied with “the model can answer questions”; they wanted models that could write code, call tools, execute tasks across applications, remember context, and form closed loops in products. Agent transitioned from an overhyped term to an entry point for the next generation of product structures.

At this point, DeepSeek’s advantages became a constraint. A researcher wanting to study Agents would face a more foundational organization within DeepSeek, while at ByteDance, they would engage with a real user base of 157 million monthly active users. A multi-modal researcher wanting to enable models to understand the physical world might find more allure in an autonomous driving company than in continuing to scale up language models. An endpoint model researcher aiming to embed inference capabilities in mobile devices, vehicle systems, and home appliances would find Xiaomi more like a laboratory than DeepSeek. This is not about betrayal; it’s a natural divergence following a fork in the technological route.

Bell Labs serves as a similar reference. It nurtured transistors, information theory, Unix, and the C language, while also spilling over generations of talent. Those who left Bell Labs did not destroy it; rather, they spread its methodologies throughout the American tech industry. DeepSeek’s talent outflow may be doing the same. The difference is that Bell Labs had AT&T’s monopoly profits behind it, while DeepSeek is backed by Huanfang Quantitative. No matter how strong Huanfang is, it is not a public finance entity that can endlessly fund the Chinese AI industry.

Liang Wenfeng faces a very real problem: if DeepSeek truly wants to retain talent, it must assign a price to its equity; if DeepSeek’s equity is to have a price, it must enter the capital market’s language system; if it enters the capital market’s language system, it must accept the capital market’s inquiries: how do you make money? How do you grow? How do you prevent others from profiting from your open-source models?

This is why DeepSeek’s financing pressure is not merely about “lacking money”; it resembles an identity transformation. It needs to shift from a research organization that “does not need to explain itself to anyone” to one that must explain itself to employees, shareholders, cloud vendors, chip manufacturers, developers, and regulators as a foundational infrastructure company.

This step is not romantic. However, it may determine DeepSeek’s fate more than the viral success of R1.

Finance: The Myth of $5.57 Million Must Be Settled

DeepSeek’s most dangerous achievement is that it has turned “cheap” into its brand.

This has been validated countless times in Chinese manufacturing: while China’s manufacturing industry contributes to making expensive goods affordable for ordinary people, the inverse is also true—price and profit can constrain the pace of industrial upgrades.

This phenomenon is fully replayed in DeepSeek.

In recent years, the capital expenditures of OpenAI, Anthropic, Google, and Meta have left many in shock. Hundreds of billions in capital expenditures, valuations in the trillions, and data centers with hundreds of thousands of GPUs all culminate in one statement: intelligence is expensive.

Until December 26, 2024, when DeepSeek-V3 was released, this statement suddenly became unstable.

$5.57 million.

This figure is too suitable for dissemination. It is short, sharp, and impactful, like handing Silicon Valley a sarcastic poster: while you burn hundreds of billions, we create a capable model with just a fraction of that. R1 further exaggerated this narrative. In September 2025, Reuters reported that DeepSeek disclosed in a Nature paper that R1’s training cost was only about $294,000. Thus, DeepSeek was placed into a neat narrative box: a low-cost miracle.

The problem is that the low-cost miracle can constrain itself.

The first layer of constraint comes from public expectations. When you make the world tremble with $5.57 million, the next time you release a model, people will not only ask if it is strong but also if it is cheap enough. If V4 shows significant capability improvement but skyrockets in cost, DeepSeek’s story will crack. Conversely, if V4 is not impressive enough to maintain the low-cost narrative, it will fail to meet the expectations of the capital market and industrial ecosystem. This is akin to a chef who prepares a Michelin-quality meal for $10. The first meal is a miracle. Starting from the second meal, all guests will ask: can you continue to do it for $10? If it rises to $100, they will say you have changed; if you still charge $10, you will go bankrupt.

The second layer of constraint comes from actual costs. The $5.57 million corresponds to GPU hours within a single training process, excluding earlier architectural explorations, failed experiments, data construction, engineering teams, hardware reserves, inference services, and the costs of scaling up after user surges. SemiAnalysis estimated that DeepSeek’s total hardware expenditure exceeds $1.3 billion, which is a figure closer to the material foundation required for a cutting-edge model company to exist long-term.

Huanfang Quantitative can provide funding for DeepSeek. In 2025, Huanfang Quantitative’s average return rate was reported by several media outlets to be around 56.55%, with annual revenue estimated at nearly 4.9 billion RMB, and Liang Wenfeng’s shareholding ratio was sufficiently high. For an ordinary AI lab, this is already a dream investor.

However, after V4, DeepSeek’s cost structure changed. Trillion parameters, million-token context, Agent capabilities, domestic chip adaptation, global open-source developer ecosystem, and stable APIs for enterprises will not only appear in training bills. They will become inference costs, engineering costs, customer support costs, compliance costs, channel costs, and talent costs. Training a model once is like going to war; long-term service to an ecosystem is like garrisoning troops. Garrisoning is more expensive than fighting because it incurs daily costs.

This is also why DeepSeek’s financing suddenly became reasonable in April 2026. Reuters first reported The Information’s news, stating that DeepSeek was negotiating at least $300 million in financing, with a valuation exceeding $10 billion. Days later, news emerged that Tencent and Alibaba were participating in negotiations, pushing the valuation figure above $20 billion, with Tencent reportedly proposing to acquire up to 20% of the shares but was refused. Chinese venture capital circles provided even more stimulating versions: a pre-investment valuation of 300 billion RMB, a planned capital increase of 50 billion RMB, with external funding of 30 billion and internal funding of 20 billion, with a minimum investment of 5 billion.

These figures may not all receive official confirmation, but they collectively point to one thing: DeepSeek is no longer just a star company pursued by capital; it is becoming a strategic node that giants must compete for. For Alibaba, DeepSeek can enhance the narrative of cloud and AI infrastructure. For Tencent, DeepSeek can fill the awkwardness of mixed elements in the C-end mindset. For both companies, DeepSeek is a rare entity: it was not incubated by a large company but has already gained global developer reputation; it has not fully commercialized but possesses a foundational infrastructure position; it offers open models while making all users prove its irreplaceability.

This is also why the 5 billion minimum investment threshold is so interesting. If this threshold is true, it filters out not those with less money but those who only want financial investments.

DeepSeek seeks resource-based shareholders: cloud computing power, government and enterprise clients, compliance endorsements, chip supply chains, and model distribution channels. Money is just the easiest quantifiable part of this. This is somewhat similar to SpaceX’s transformation. Early on, SpaceX needed to prove that rockets could fly cheaper. After successful technical validation, it required NASA contracts, commercial launch orders, Starlink cash flow, and national security orders even more. Cheapness is not the end; it is merely the first step to open the gap in the old order.

DeepSeek is also in a similar position. The $5.57 million training cost is not the answer to its future business model; it is merely the bullet. The bullet pierced Silicon Valley’s computing power myth and also penetrated DeepSeek’s protective shell. The bullet proved that cutting-edge AI can be cheap, but it did not prove that a cutting-edge AI company can survive cheaply forever.

The Business: Open-Source Models as Others’ Weapons

In January 2025, DeepSeek’s story first became a global public event.

After R1’s release, the DeepSeek App surged to the top of the US App Store’s free charts. TechCrunch wrote directly: DeepSeek replaced ChatGPT as the top app in the App Store. Reuters recorded another figure in financial history: Nvidia’s market value evaporated by approximately $593 billion. This moment had a strange comedic aspect. A Chinese open-source model made American retail investors begin to question Nvidia’s valuation, forced Silicon Valley to reinterpret its capital expenditures, prompted OpenAI and Microsoft to investigate “distillation” issues, and placed a Hangzhou team into the narrative of technological security in the US. Before DeepSeek could commercialize, it was geopoliticized.

However, a more interesting event occurred in China. On February 13, 2025, Tencent Yuanbao integrated the full version of DeepSeek-R1. This marked Tencent’s first deployment of a third-party open-source model in its own AI assistant. Users could switch between Yuanbao and DeepSeek, and WeChat search began to test integration with DeepSeek.

Before this, Tencent’s AI situation was somewhat awkward. It had Yuanbao, computing power, WeChat, a content ecosystem, cloud resources, and organizational assets. However, in users’ minds, the domestic AI product heat was more occupied by Doubao, Kimi, Tongyi, and DeepSeek. Tencent’s strongest asset was its entry point, but it lacked an AI symbol that could excite users. DeepSeek was precisely that symbol.

After Yuanbao integrated R1, downloads surged, exceeding the DeepSeek App itself by early March. User enthusiasm during the WeChat search test with DeepSeek was described by the media as “far exceeding expectations.” By the end of 2025, the daily usage of Yuanbao’s DeepSeek mode reportedly reached an annual peak, increasing over 100 times since the beginning of the year.

This was not DeepSeek being saved by Tencent; it was Tencent saving its own AI product line using DeepSeek.

Yet, DeepSeek did not walk away empty-handed. It gained something more subtle: a proof of factual standards. When China’s largest social entry point chooses to deploy your model in its product, when users complete searches and Q&A through your model in the WeChat ecosystem, and when other large companies, car manufacturers, telecom operators, and cloud vendors rush to integrate, you are no longer just a strong open-source model on GitHub. You become part of the public infrastructure.

The problem lies here. Public infrastructure sounds advanced, but it can be commercially uncomfortable. The sharpest aspect of an open-source model is that it allows everyone to use you. The most brutal aspect of an open-source model is that it allows everyone to use you.

Tencent can integrate DeepSeek into Yuanbao. Alibaba can embed DeepSeek into its cloud services. Startups can use DeepSeek as a code assistant. Government and enterprise clients can privatize deployment via cloud vendors. Developers can locally distill, fine-tune, and quantify. Each instance of use expands DeepSeek’s influence.

However, each instance of use may also bypass DeepSeek’s revenue stream. This is the moment when the costs and benefits of open-source are simultaneously realized. The more DeepSeek’s model resembles water and electricity, the more awkward its commercial identity becomes. Water and electricity are vital, but companies that sell water and electricity are typically not the most attractive companies. The real money is often made by those who connect water and electricity to cities, factories, commercial real estate, and residential billing systems. In AI, these people are called cloud vendors, entry platforms, Agent products, enterprise software, and vertical applications.

After the release of V4, this logic of “others taking it to make weapons” became clearer. V4-Pro and V4-Flash simultaneously provide compatibility with OpenAI ChatCompletions and Anthropic interfaces; the new model names are deepseek-v4-pro and deepseek-v4-flash; the old deepseek-chat and deepseek-reasoner will be discontinued after a three-month transition period. This is not a model solely for its own app but one prepared for migration, replacement, and embedding from the interface level. Developers can redirect applications originally connected to OpenAI or Anthropic to DeepSeek, cloud vendors can package it as an API, and Agent products can automatically switch complex tasks to Think Max.

In other words, while DeepSeek hands others knives, it also sharpens the handles.

The technical route is also converging in this direction. V3-0324 enhances reasoning, front-end code, and tool invocation; R1-0528 reduces hallucinations and improves JSON and function calling; V3.1 introduces a Think / Non-Think hybrid mode, strengthening Agent capabilities and supporting Anthropic API formats; V3.2-Exp introduces Sparse Attention, significantly reducing costs; V3.2 and Speciale further target Agent reasoning scenarios.

By the time of V4, three levels of thinking intensity are directly productized: Non-think corresponds to everyday quick responses, Think High corresponds to complex planning, and Think Max corresponds to high-intensity reasoning and Agent tasks. DeepSeek even retains complete reasoning content in tool invocation scenarios, including multi-turn reasoning history across user message boundaries. This design is not prepared for a “chatbot” but for real workflows like long-term tasks, code engineering, document generation, and search planning.

The evaluations of V4 are also very indicative. It does not only tell stories through traditional rankings like MMLU but also showcases Agentic Coding, Terminal Bench, SWE Verified, MCPAtlas, white-collar tasks, and Chinese professional writing.

According to technical breakdown materials, V4-Pro-Max scored 67.9 on Terminal Bench 2.0, 80.6 on SWE Verified, and 76.2 on SWE Multilingual, overall placing it in the same tier as Opus-4.6-Max and K2.6-Thinking; in real R&D tasks among over 50 internal engineers, V4-Pro-Max’s pass rate was 67%, close to Opus 4.5’s 70% and higher than Sonnet 4.5’s 47%.

The significance of these numbers lies not in “winning scores” but in answering a more industrial question: can the new model integrate into the daily production of engineering teams?

This also explains DeepSeek’s dilemma. It certainly knows that pure model capabilities will be used by others to create products, which will accumulate users, data, workflows, and distribution advantages. If a model company only remains in the position of an arms dealer, it will be pressured on price by all those buying arms.

However, DeepSeek’s uniqueness lies in its inability to easily transform into an ordinary application company. If it engages in C-end products, it must compete with Doubao, Kimi, Yuanbao, and Tongyi for entry points; if it develops code products, it must compete with Cursor, Claude Code, Codex, and various domestic IDE plugins for workflows; if it ventures into enterprise software, it must begin facing sales, delivery, customization, and payment issues in the mud. An organization skilled at optimizing models to the extreme may not excel at rolling in the mud.

Thus, DeepSeek’s “business” line has become a chain reaction: R1’s viral success triggered global stock market tremors; global tremors prompted a backlash from US IP and security narratives; domestic integration spurred large companies to collectively adopt DeepSeek; large companies’ integration validated DeepSeek’s infrastructure value; infrastructure value, in turn, compelled it to address commercialization issues.

OpenAI warned the US Congress that DeepSeek was gaining capabilities through distillation, and the White House accused China of “industrial-scale AI technology theft”—these are certainly part of the geopolitical narrative. However, if viewed solely from this dimension, one might miss the more specific industrial issues. DeepSeek has shown everyone for the first time that open-source models can rapidly change product landscapes globally, while also demonstrating that the victory of open-source models may not automatically belong to open-source model companies.

This situation is somewhat akin to Android. Android provided global smartphone manufacturers with an operating system to counter the iPhone, completely rewriting the entry landscape of the mobile internet. However, the long-term beneficiaries were not every Android smartphone manufacturer but Google, which controlled the app store, advertising system, account system, and cloud services.

DeepSeek is standing in a similar position. It provides a foundational layer. However, the cities above that foundational layer are being rapidly constructed by others.

The Material: From H800 to Ascend, A Chip Replacement Surgery

The most important parameter of DeepSeek-V4 may not be 1.6 trillion. It is Ascend.

This does not imply that model capability is unimportant. V4-Pro adopts a total of 1.6 trillion parameters and 49 billion active parameters in its MoE architecture, while V4-Flash features 284 billion total parameters and 13 billion active parameters. Both support a million-token context, and the model card indicates the use of CSA + HCA mixed attention mechanisms. V4’s technical report also includes mHC manifold constraint superconnection, DSA sparse attention, Muon optimizer, FP4 quantization-aware training, On-Disk KV Cache, deterministic kernel library, and DSec sandbox infrastructure.

When these terms are piled together, they can easily devolve into technical self-indulgence. However, in the industrial context of April 2026, they all serve a harder fact: V4 needs to run, stabilize, and run cheaply on domestic computing power.

DeepSeek-V3’s material foundation still relied on Nvidia H800. Under restricted chip conditions, it maximized efficiency through MoE, MLA, FP8, and extensive bottom-level optimizations. Developers discovered traces of PTX low-level optimization in V3’s code, indicating that DeepSeek had long been bypassing the comfort zone of high-level frameworks to directly engage with GPU execution layers. PTX is the low-level intermediate representation for Nvidia GPUs. A team willing to engage at this level signifies that it is not merely a model team adjusting framework parameters but an engineering team capable of performing surgical operations on computing power infrastructure.

This capability became crucial for V4. The US chip blockade has evolved from “not providing the strongest chips” to “giving you a total bill.”

On January 13, 2025, the Biden administration released the AI Diffusion Rule, placing global AI chip flows under tiered control. Reuters reported that this set of rules aimed to restrict the diffusion of advanced AI chips globally, with China placed in a strictly limited position. Subsequent discussions regarding limitations on TPP total processing performance essentially turned computing power into a strategic resource that can be accounted for, blocked, and allocated. This logic is very American. It does not necessarily aim to completely prevent your development; it merely seeks to ensure you lag a generation.

The tug-of-war over H20 is a small window. In February 2025, Chinese companies increased H20 orders due to the DeepSeek frenzy. In April, the US restricted H20 exports, and Nvidia recorded approximately $5.5 billion in related expenses. By May, Nvidia prepared a downgraded version. In July, Jensen Huang stated that supply would be restored.

By April 2026, the US Secretary of Commerce confirmed that H200 had not yet been sold to China. This is not about stabilizing supply chains; it binds a company’s training plans to Washington’s policy pendulum. For a cutting-edge model company, this uncertainty is more dangerous than high costs. High costs can be financed, but uncertainty can destroy a roadmap.

Thus, DeepSeek’s shift to Huawei Ascend is not merely a patriotic narrative or emotional value from the launch event. It is a rational choice for a model company facing supply chain risks.

In February 2026, Reuters reported that DeepSeek no longer followed industry norms by previewing its flagship models to American chip manufacturers but instead opened up to domestic chip suppliers earlier. In April, Reuters reported that DeepSeek-V4 would run on Huawei chips and that it was rewriting and testing the underlying code with domestic chip manufacturers. On the same day of V4’s release, news emerged that Huawei’s Ascend supernodes would fully support DeepSeek-V4.

SCMP described this “premiere adaptation” directly: Huawei stated that the Ascend 950PR and 950DT achieved “day zero” adaptation for DeepSeek-V4; during live streams on Bilibili and WeChat, Huawei engineers explained the adaptation process between CANN and DeepSeek V4, claiming that the entire Ascend SuperNode product line had been “fully adapted” to V4’s model inference. This statement requires careful examination.

“Day zero” sounds like marketing, but for a trillion-parameter model, it means that the hardware ecosystem can catch up with the model’s release on the same day; “fully adapted” does not equate to perfect performance, but it at least signifies that the software stack, inference framework, and underlying operators have established the first layer of production pathways. More interestingly, DeepSeek itself acknowledged that before the large-scale shipment of the Ascend 950PR supernode in the second half of the year, V4-Pro would face throughput issues, and prices would significantly decrease after the hardware was released in bulk. This is not a victory declaration but resembles a construction timeline: the direction is correct, the road is still expanding, but for now, traffic must be limited.

Transitioning from CUDA to CANN is not simply about copying model files. It requires operator rewriting, compiler adaptation, inference framework optimization, communication interconnection scheduling, memory management, and verification of long-context performance. Especially for a trillion-parameter model like V4, any inefficiency in any link can turn “domestic adaptation” into a PPT adaptation. A technical analysis reprinted by TMT suggests that V4’s repeated delays are related to the deep adaptation between the inference end and Ascend chips; the real challenge lies not in whether it can run, but in whether it can run stably, efficiently, and at scale.

This is why Jensen Huang stated that DeepSeek running on Huawei chips is a “horrible outcome” for the US. TNW’s interpretation of this statement is more straightforward: DeepSeek spent months rewriting core code to adapt to Huawei’s CANN framework, moving away from the CUDA ecosystem that took twenty years to build. The dominance of CUDA itself is a second layer of control that the US holds beyond chips.

Nvidia’s true fear is not that Chinese companies can create a strong model. A strong model can be explained as accidental, distilled, subsidized, or unsustainable. What it fears is a strong model running stably in a non-CUDA ecosystem. Because CUDA’s moat is not just chip performance; it encompasses developer habits, toolchains, ecosystems, debugging experiences, operator libraries, training frameworks, and talent markets. As long as Chinese model companies continue to optimize around CUDA, US chip controls will have leverage.

The technical details of V4 also explain why this chip replacement surgery is challenging. The primary cost of a million-token context is not whether the model is intelligent but how much historical information must be processed during each inference. Traditional attention mechanisms can turn KV cache and FLOPs into disaster zones in long contexts. DeepSeek-V4 compresses at the token dimension and adds DSA sparse attention. Technical breakdown materials indicate that under 1M context, V4-Pro’s single-token inference FLOPs are only 27% of V3.2’s, and KV cache is only 10% of V3.2’s; V4-Flash is even more extreme, with single-token FLOPs only 10% of V3.2’s and KV cache only 7%. This is the true significance of V4’s binding to Ascend: without a structural reduction in long-context inference costs, even if domestic computing power can run, it will be challenging to run it cheaply.

Previously, I wrote an analysis on Foxconn’s transformation, noting that the judgment of transformation is never about what you “assemble” but about what you control in the value chain.

Foxconn’s shift from iPhones to AI servers changed the assembly objects but not the profit position. In contrast, DeepSeek and Ascend’s story is about attempting to change its position within the underlying ecosystem. As long as the model team continues to think in CUDA’s language, domestic chips can easily become “rebranded OEMs”; only when the model architecture, inference framework, operator libraries, and communication scheduling are all rewritten around local hardware characteristics can it potentially evolve from “replaceable hardware” to “self-evolving systems.”

This is also the most awkward aspect of blockade policies. In the short term, they can indeed create pain. They can increase costs, slow adaptation, disrupt supply chains, and force companies to take difficult paths. However, if the blockaded side possesses a sufficiently large market, enough engineers, strong demand, and clear alternative goals, the blockade can become an industrial mobilization. The significance of DeepSeek-V4 lies here.

It is not the endpoint of the domestic computing power ecosystem; it is the first time the scalpel has cut to the bone.

Conclusion: After Cheapness

The past 484 days of DeepSeek can easily be misread as a victory story.

A Chinese team created a strong model at a low cost, shattered Nvidia, shook Silicon Valley, pressured the US, boosted domestic chips, and ultimately led Tencent and Alibaba to line up with money. Writing this version would be satisfying for readers and easy to title. However, this version is too light. The truly interesting aspect is that each of DeepSeek’s victories carries a counteraction.

The low-cost victory of V3 necessitates continued proof that cheapness can be sustained; the global viral success of R1 imposes responsibilities far beyond laboratory scale in terms of users, public opinion, and geopolitical pressure; the victory of open-source allows Tencent, Alibaba, car manufacturers, and cloud vendors to turn it into their weapons; the victory of talent results in researchers trained by it being precisely priced by the entire industry; the victory of domestic adaptation transforms it from a model company into a wedge for restructuring the chip ecosystem; the victory of financing finally brings Liang Wenfeng to the table he initially deliberately avoided.

This is not the failure of idealism. On the contrary, only if the first 484 days were sufficiently idealistic could DeepSeek have negotiating chips on the 485th day.

If it had initially followed the typical AI startup route—financing, product development, commercialization, and chasing trends—it would likely have become just another company at the crowded table of Chinese large models: doing a bit of modeling, a bit of application, discussing a bit of ecosystem, testing a bit of commercialization, touching on everything but excelling at nothing.

What Liang Wenfeng truly won is the ability to push the technological boundary far enough before returning to negotiate terms with reality. However, reality will not become gentle simply because you have won once. The $5.57 million is a bullet. It pierced Silicon Valley’s moat and also penetrated DeepSeek’s protective shell. The bullet proved that cutting-edge AI can be cheap, but it did not prove that a cutting-edge AI company can live cheaply forever.

After 484 days, DeepSeek is no longer just a “low-cost miracle.” It is an open-source foundation used by global developers, a capital target fiercely contested by Tencent and Alibaba, a geopolitical symbol under scrutiny by the US Congress and the White House, and a trillion-parameter model undergoing a chip replacement surgery on domestic chips. Its situation has thus become more like a compressed sample of China’s AI: idealism needs money, open-source requires a moat, localization demands engineering accountability, and low-cost must continue to be low.

Liang Wenfeng once said that DeepSeek is not aimed at short-term profitability but at pushing the boundaries of technology. After 484 days, the technological boundaries have indeed been pushed forward.

Yet what drives it forward now is no longer just technology.

The Future of Translation in the Age of AI

Mon, 27 Apr 2026 00:00:00 +0000

The Future of Translation in the Age of AI

On April 25, 2026, the China Translation Association held its annual conference at Wuhan University. The theme was “Integration and Breaking Barriers: The Infinite Possibilities of Translation in the Digital Intelligence Era,” co-hosted by the China Translation Association, Wuhan University, and the China Foreign Languages Publishing Administration. Experts and scholars from various fields gathered to discuss the high-quality development of the translation industry amid the AI wave.

The conference released the “2026 China Translation Industry Development Report,” which indicated that in 2025, the Chinese translation industry maintained stability during structural adjustments, with a total annual output value of approximately 70.12 billion yuan. The number of operating translation companies and the quality of professionals showed steady growth, with the workforce reaching 6.867 million, including 1.135 million full-time translators.

Civilization is enriched through communication and mutual learning. The “2026 Global Translation Industry Development Report” released on the same day showed that the global translation industry has transitioned from a period of uniform growth to a new stage characterized by differentiated stock and incremental reconstruction. International consulting agencies estimate that the global translation market size in 2025 will be approximately $59.53 billion, reflecting a 7% growth compared to the previous year. The Asian and European markets displayed strong growth momentum, with over 60% of overseas orders for Chinese translation companies coming from European clients. Academically, China leads globally in the production of translation research outcomes and the number of research institutions.

Currently, AI is empowering various industries. AI translation is widely applied, and the integration of translation technology has reached a deep fusion stage. According to the “2026 China Translation Industry Development Report,” by 2025, there will be 2,183 companies in China focusing on AI translation as their main business, and the human-machine collaborative translation model has become a basic consensus in the industry. The “2026 Global Translation Industry Development Report” indicates a significant increase in the application rate of AI translation and large language models, making them mainstream tools in the translation industry. A 2025 survey of the European language industry showed that 60% of respondents had used AI translation, with language service providers reaching 80%.

Wang Gangyi, former deputy director of the China Foreign Languages Publishing Administration and executive vice president of the China Translation Association, stated during the report release that while AI translation and large language model technology upgrades are gaining increasing attention from the industry and capital, there are still significant shortcomings in language coverage, accuracy, emotional understanding, and expression. Skills in AI-related capabilities and professional domain knowledge are key demands, and human-machine collaboration has become the mainstream working model. Small and medium-sized language companies and independent practitioners face multiple operational pressures, making specialization and differentiation crucial for survival under the drive of multimodal technology.

“Currently, AI technology is profoundly reshaping the global language service and cultural dissemination landscape,” said Wang Lu, director of the film translation production center of the China Central Radio and Television, during the release of the “Research Report on AI Translation and the Internationalization of China’s ‘New Three Samples.’” She acknowledged that while AI translation has significantly lowered the barriers to cross-language communication and improved efficiency in going global, the internationalization process of China’s cultural “new three samples”—represented by online literature, web dramas, and online games—still faces common challenges such as data security and compliance, cultural bias, and balancing quality and cost. She believes that all parties in the industry chain should adopt differentiated, precise, and collaborative development strategies to jointly solve the challenges of going global and enhance internationalization effectiveness.

In a special exchange on the communication and mutual learning of Yangtze River civilization and the international dissemination of Jingchu culture, representatives from emerging enterprises involved in the “new three samples” and scholars from Wuhan University engaged in a roundtable dialogue, focusing on cross-cultural narratives and new paradigms of translation. They interpreted the connotations and contemporary value of Jingchu culture and discussed how to leverage Yangtze culture as a bond to strengthen the cultural export in the digital age.

Culture is the soul of translation work. Translation requires not only depth of thought but also a humanistic warmth. According to Wang Wei, vice president of iFLYTEK Co., Ltd., while machine translation can convey information relatively completely, it still falls short compared to human translators in understanding context and achieving the “faithfulness, expressiveness, and elegance” of output. Looking to the future, there is a need for a new ecosystem of multilingual AI translation built collaboratively by humans and machines.

“The iteration of technology, especially the development of AI, provides us with significant opportunities to enhance our work and expand the boundaries of translation,” said Guillaume de Nerfberg, president of the International Federation of Translators, in a video address. He emphasized that under the AI wave, the value of translation will not diminish; rather, its importance will become more pronounced, and the demands on translators will be higher than ever. We need professional language workers more than ever.

Understanding AI Citation Timeliness in Content Strategy

Mon, 27 Apr 2026 00:00:00 +0000

Key Questions in AI Citation Timeliness

In the realm of AI citations, brands face critical questions:

What type of articles are cited by AI the fastest?
How long can an AI citation last?
Do any sources allow AI to follow trends and cite them?

These questions highlight a variable often overlooked by 99% of brand GEO operators: timeliness.

Based on a massive database of AI Q&A and citations, NewRank Intelligence conducted a timeliness quality check on articles cited by major domestic AI platforms. This article reveals three core truths:

Preferences of different AI platforms regarding article publication timeliness.
Differences in citation timeliness across platforms and industry verticals.
Methodologies to keep content “fresh” in AI responses.

How Long After Publication Does AI Prefer Articles?

Over 65% of articles cited by AI are published within the last six months. We analyzed the time of publication against the citation time, categorizing the timeliness into five levels and calculating the proportion of sources in each time frame.

Data shows that content published within six months is more likely to be cited by AI, while content older than one year has a lower representation, indicating that sources older than one year are often deemed “expired” by AI.

Which Platform is More Timeliness-Sensitive: Doubao or Yuanbao?

Overall, Doubao shows significantly higher timeliness sensitivity than Yuanbao.

Doubao: The pool of sources has a higher proportion of high-timeliness content from the last month, responding quickly to new events, data, and trends.
Yuanbao: The source pool has a higher proportion of mid-timeliness content from 1-6 months, leaning towards a stable knowledge base with less freshness.

How Much Difference in Timeliness Exists Between Doubao and Yuanbao?

The average time from publication to citation for Doubao is about 4.5 months, significantly lower than Yuanbao’s 8 months. In terms of industry performance, Doubao maintains high timeliness sensitivity across various sectors, with average source publication times under six months. In contrast, Yuanbao shows a notable lag in sectors like automotive, healthcare, and local living.

The composition of source URLs indicates that Doubao’s top three sources (Douyin, Toutiao, Sohu) have high update frequencies, while Yuanbao relies on WeChat public accounts and Baijiahao, where content update rhythms vary, leading to the inclusion of outdated information.

Are There Significant Differences in Source Timeliness Across Industries?

We categorized and statistically analyzed sources across different industries, revealing that 3C digital, industrial manufacturing, and fintech have higher demands for timely sources, which we define as fast industries. Specifically:

3C digital and industrial manufacturing have the highest proportion of high-timeliness (1-30 days) content, reaching 39% and 36%, respectively, aligning with the rapid nature of product releases and technological iterations.
Fintech’s timeliness is slightly lower than the previous two, leaning towards citing data from within 1-6 months, but still has a high proportion of timely content at 27%.

Conversely, local living, automotive, and healthcare sectors show a higher proportion of low-timeliness content, which we define as slow industries. For slow industries, AI prefers a dual approach of “fresh” and “evergreen” content, with mid-high timeliness information from 1-6 months and low-timeliness information over one year both having significant proportions. Specifically:

Low-timeliness sources in local living account for 20%, automotive for 17%, and healthcare for 11%, with core sources still concentrated in the mid-high timeliness range of 1-6 months.

This indicates that slow industries do not solely rely on outdated information but exhibit a dual structure of primarily mid-high timeliness and evergreen information.

Industries require both mid-term dynamic data from within 1-6 months to support decision-making and long-term patterns, industry standards, and historical cases from over one year.

A piece of evergreen content that encapsulates industry consensus holds reference value comparable to the latest reports in slow industries.

Which Sites Have Sources That Are Quickly Cited?

Doubao and Yuanbao’s high-timeliness sources are highly concentrated within their respective ecosystems, leveraging their parent companies to access ultra-fast source channels.

In Doubao’s ultra-high timeliness sources, Toutiao accounts for 77%, also covering Sohu, Tencent News, and Douyin across multiple platforms.

Yuanbao’s ultra-high timeliness sources heavily rely on WeChat public accounts, which account for 79%, with Tencent News following at 17%.

Sohu serves as a cross-ecosystem platform, maintaining a stable proportion in both platforms’ ultra-high timeliness sources, acting as a universal ultra-high timeliness content channel outside of their ecosystems.

What Types of Articles Are Cited Within 24 Hours?

Articles from large platforms, authoritative accounts, and those with clear titles are more likely to be quickly cited.

Reliable source platforms and high-impact accounts can shorten the AI’s judgment chain for source authenticity, allowing for rapid retrieval and use.
Titles containing terms like “latest / first test”, “numbers / year / price”, or “review / analysis / actual test” are more likely to be quickly cited by AI.

What Types of Articles Are Cited the Longest?

The common characteristics of these “evergreen contents” are very clear:

From the question words, the questions themselves have no high timeliness demand, focusing on industry models, regional characteristics, tool guides, and other long-term stable information.
From the content perspective, titles that match the questions closely, come from authoritative sources, and include structured information, precise locations/data, and high professionalism are long-term valuable content.
From the platform perspective:
- Kimi has no exclusive source websites, focusing more on the match between questions and sources than on source timeliness, making it easier to retrieve older information.
- Yuanbao, backed by the Tencent ecosystem, relies heavily on WeChat public accounts as the core source channel, where a wealth of structured, high long-term value content is stored, leading to the reuse of older content.

AI’s citation logic does not merely consider “new vs. old” but places greater emphasis on whether the information possesses long-term value and precise matching. General knowledge without timeliness pressure, local living guides, and professional tool explanations can all transcend time limitations and become “evergreen sources” continuously cited by AI.

In the AI era, a successful GEO content strategy increasingly relies on the refined operation of “information timeliness”. NewRank Intelligence can assist brands in tracking the effective lifecycle of different content types on AI platforms; construct structured deep content assets purposefully for mainstream AI platforms; and balance the ratio of “news” and “evergreen” content according to industry characteristics. Only by shifting the content strategy from “publication-oriented” to “lifecycle management” can brands truly take the initiative in AI’s decision-making chain.

Will Low-Code Platforms Be Phased Out in the Era of Vibe Coding?

Mon, 27 Apr 2026 00:00:00 +0000

The Era of Vibe Coding: Will Low-Code Platforms Be Phased Out?

The current software development landscape is dominated by the rising popularity of vibe coding. It is undeniable that vibe coding has shed its “toy-level” label and has developed into a mature technological system. However, a key question arises: can vibe coding completely replace low-code platforms?

Core Capabilities of Vibe Coding

The core competitive advantage of vibe coding lies in the full-stack synergy of models, agents, and IDEs. These three components play crucial roles: the underlying computational power, functional execution in the middle layer, and the operational interface at the top, forming a closed-loop development assistance system.

Leading Overseas Tools

Claude Code (by Anthropic): Based on the Claude Opus 4.7 large language model, it supports image parsing up to 3.75 million pixels, with a default effort level increased to xhigh. Its Routines scheduling now includes the Ultraplan cloud planning feature for scheduled/event-triggered automation (such as log anomaly detection, bug fixes, and PR submissions). It also introduces the /ultrareview command for specialized code reviews. The IDE natively supports VS Code and IntelliJ IDEA plugin integration, seamlessly connects to Git repositories, and can synchronize code refactoring progress in real-time. It adapts to microservices and monolithic applications, with new cross-end Dispatch functionality supporting remote operations from mobile devices and computers, enhancing the ability to handle legacy system refactoring and high-priority bug fixes.
Codex (by OpenAI): Leveraging the GPT-5.5 model released on April 24, 2026, it focuses on autonomous planning and efficient reasoning for complex tasks. The multi-agent collaboration module allows for parallel processing of front-end, back-end, and testing agents, supporting complex task decomposition, path planning, and result verification. The IDE deeply integrates with VS Code and Docker toolchains, autonomously completing the entire process from IDE startup to code pulling and container deployment, supporting a context window of 400K tokens and a 1.5x Fast mode. Token generation speed has improved by 20%, reducing debugging time by over 70% and accommodating concurrent bug fixes and unit test generation.
Gemini (by Google): Utilizing the Gemini 3.1 Pro multimodal model released in February 2026, it excels in long text processing and multimodal integration. Its multimodal parsing agent can directly convert PSD/Figma design files into responsive UI code (with a component reuse rate of over 90%). It supports text, image, audio, and video inputs/outputs. The IDE integrates with Google Workspace and GCP cloud services, supporting local deployment based on the open-source Gemma 4 model (requiring a minimum of 16GB memory) and allowing for custom fine-tuning to fit industry coding standards.

Domestic Tools

GLM (by Zhipu AI): Based on the GLM-5.1 flagship open-source model released on April 8, 2026, its coding capabilities have set a new global benchmark in the SWE-bench Pro tests, surpassing GPT-5.4 and Claude Opus 4.6. It supports a context window of 200K tokens and can autonomously execute long-term tasks over eight hours, including GPU kernel optimization and complex system construction. Its adaptability to domestic computing platforms has been upgraded, now supporting Huawei Ascend and Cambricon, enhancing compliance in domestic scenarios. The IDE supports VS Code and Zhongwang Longteng IDE, seamlessly integrating with enterprise intranet development environments and compatible with mainstream programming tools like CodeBuddy.
Minimax: Based on the Minimax M2.7 native agent production-level model released on April 12, 2026, its coding capabilities have significantly improved, allowing it to construct complex agent harness control systems and support agent team collaboration. It has completed full-stack adaptation to various domestic and international computing platforms, achieving high-performance inference across multiple GPU hardware. Its iterative optimization capability has been enhanced, now allowing for complex task decomposition and path planning, with the Trae debugging tool automatically detecting code vulnerabilities and suggesting modifications.
Qwen (by Alibaba): Featuring the Qwen3.6-Plus leading domestic programming model released on April 2, 2026, it achieves full-process automation through its “generate-debug-optimize” loop in conjunction with the Trae tool. It possesses strong multimodal programming capabilities, autonomously decomposing complex programming tasks and planning paths to completion. The IDE integrates with Alibaba Cloud Cloud IDE and VS Code, seamlessly connecting to Taobao mini-programs and government systems, adapting to core scenarios in e-commerce and government.

In summary, vibe coding has formed a supportive system of model computation, agent functionality, and IDE operations, with its core value being the enhancement of coding efficiency for programmers, focusing on fragmented and lightweight development scenarios. With the latest version iterations in 2026, its capabilities in handling complex tasks and multimodal adaptation have significantly improved, but its limitations remain due to the uncontrollability of AI models and fragmented output.

Core Advantages of Low-Code Platforms

Low-code platforms are built on a foundation of “underlying architecture + visual engine + coding extension layer + ecological operation layer.” Their core value lies in addressing the full lifecycle of enterprise-level core business implementation. Their advantage is not to “replace coding” but to provide “standardized efficiency and controllable implementation.”

1. Controllable Underlying Architecture

Low-code platforms typically adopt a dual architecture design (microservices/monolithic), supporting Docker container deployment and K8s cluster scheduling. They allow flexible choices between private, public, and hybrid cloud deployment modes (with a minimum of 8GB memory required). They are natively compatible with existing IT architectures of enterprises. Compared to vibe coding, which relies on cloud models (or lightweight local models), low-code platforms have undergone long-term testing in enterprise scenarios, allowing for deep integration with existing ERP, OA, and CRM systems without requiring extensive compatibility adjustments, something vibe coding’s fragmented code output cannot achieve.

2. Enterprise-Level Data Security and Transaction Consistency

Low-code platforms include a built-in data security layer, implementing fine-grained permission control based on the RBAC model, supporting data masking and operation log auditing, fully complying with the Data Security Law and Personal Information Protection Law. The database engine natively supports ACID transaction characteristics and can integrate with the Seata distributed transaction framework through coding extensions, controlling data anomaly rates below 0.01% in core scenarios like financial payments and order management. In contrast, vibe coding lacks standardized transaction control logic in its generated code, making it prone to unilateral data anomalies and unable to support enterprise-level core scenarios.

3. Standardized Compliance Adaptation

The component libraries of low-code platforms comply with industry coding standards, with built-in templates for government, finance, and manufacturing sectors. Programmers can embed custom logic through the coding extension layer, balancing standardization and personalization needs. They also feature a complete operation management module, supporting system monitoring, log analysis, version iteration, and gray releases, enabling 24/7 stable enterprise-level operation. Vibe coding can only cover the “generate-debug” phase and lacks standardized operational support, requiring additional operational systems for enterprise-level projects, which increases costs.

4. Ecological Integration and Flexibility of Secondary Coding

Low-code platforms come with built-in third-party interface plugins for payments, logistics, and SMS, allowing for custom interface adaptation through the coding extension layer. Programmers can directly embed the foundational code generated by vibe coding into low-code platforms, creating a collaborative model of “efficiency enhancement and controllable implementation.” Compared to vibe coding’s fragmented code output, low-code platforms can form a complete project ecosystem, supporting long-term iterative optimization and adapting to the continuous upgrading needs of enterprise businesses.

Conclusion

Vibe coding serves as a programmer efficiency assistance tool composed of models, agents, and IDEs, focusing on reducing repetitive coding workload in lightweight, non-core development scenarios, and cannot independently complete the full lifecycle implementation of enterprise-level projects. Low-code platforms are enterprise-level development platforms whose core value lies in achieving standardized and controllable implementation of core business processes, covering the entire development, deployment, and operation lifecycle, making them the primary vehicle for enterprise business implementation.
Vibe coding outputs rely on model semantic parsing, leading to fragmented and non-standardized code that is significantly affected by prompt accuracy, lacking long-term iterative consistency. Low-code platforms, based on standardized architecture, ensure traceability in code output, component reuse, and operational management, supporting the stable operation of enterprise-level core businesses long-term, a capability boundary that vibe coding cannot breach in the short term.
Vibe coding focuses on “efficiency enhancement” without needing to consider enterprise-level compliance, stability, and operational demands. In contrast, low-code platforms prioritize “stability, compliance, and controllability in operations,” precisely matching the enterprise-level needs of core sectors such as finance, government, and manufacturing, which is their irreplaceable core value.

The future trend in software development is a collaborative empowerment of both, with vibe coding focusing on “coding efficiency enhancement” and low-code platforms concentrating on “enterprise-level business implementation.” The two complement each other rather than replace one another, especially in core sectors like finance and government, where the compliance and stability advantages of low-code platforms remain the preferred choice for enterprise-level development.

Understanding the Differences Between OpenClaw and GPT

Sat, 25 Apr 2026 00:00:00 +0000

Understanding the Differences: OpenClaw vs GPT

In the AI community, an open-source project called OpenClaw has been gaining attention, with many claiming it surpasses GPT significantly—not just in accuracy but in its ability to perform tasks, operate computers, and complete workflows autonomously. This article will clarify the fundamental differences between OpenClaw and GPT, its impressive capabilities, and its underlying principles.

OpenClaw and GPT Are Not the Same

Many assume OpenClaw is a new large model, but this is incorrect. GPT is a large language model (LLM) responsible for thinking, speaking, and writing, while OpenClaw is an AI agent framework responsible for planning, invoking, executing tasks, and can integrate with any large model like GPT, Claude, or DeepSeek as its brain.

Core Positioning: Strategist vs Digital Employee

GPT (ChatGPT): Passive responder, provides plans but not actions. If you ask it to organize files, send emails, or create reports, it will only give you written steps and content, requiring you to manually copy, paste, and execute each task. It is confined to the dialogue box, outputting text, code, or images without any operational capabilities.
OpenClaw: Proactively executes tasks and completes them autonomously. You might say, “Organize this month’s work files by date, generate a summary report, and email the team,” and it will break down the tasks, open folders, operate Excel, invoke the email client, send the email, and finally inform you that it’s done—completing the entire process without your intervention.

Operational Logic: Textual Loop vs Action Loop

GPT: Input question → Understand semantics → Generate text → End. There are no tool calls, system operations, or result verifications; it follows a linear Q&A process.
OpenClaw: Receives instructions → Intent parsing → Breaks down into subtasks → Calls tools/skills → Operates local devices → Verifies results → Adjusts and retries → Feedback completion. It follows a complete autonomous loop of thinking, acting, observing, and reflecting, capable of handling complex, lengthy tasks.

Data and Deployment: Cloud Dependency vs Local Priority

GPT: Most functions run in the cloud, requiring your data and files to be uploaded to OpenAI’s servers, posing privacy risks, strong network dependencies, and high costs.
OpenClaw: Primarily local deployment, with core execution and file operations performed on your own computer, ensuring data remains local and private. It only invokes cloud-based large models when complex reasoning is needed, balancing capability and cost.

In summary, GPT serves as a brainstorming brain, while OpenClaw equips that brain with hands and feet, enabling it to work autonomously.

Why is OpenClaw So Powerful? Three Core Technologies Addressing AI Limitations

GPT struggles with complex long tasks due to context explosion, memory limitations, and difficulty in task division. OpenClaw addresses these issues with three original designs, which are key to its popularity.

1. Layered Context Reading: Solving Memory and Decision Issues

Large models struggle with long contexts and chaotic information, leading to nonsensical outputs. OpenClaw employs Context Window layered reading:

First, it reads each skill’s summary quickly to grasp key points and understand the task framework.
Only when details are necessary does it load the complete workflow code/instructions.
It automatically compresses redundant information, retaining only critical memories, significantly reducing the model’s burden and improving decision accuracy and efficiency.

2. Dynamic Sub-Agents: Automatic Task Division for Complex Tasks

Just as one person cannot handle a large task alone, neither can AI. OpenClaw automatically breaks down complex tasks and dynamically creates sub-agents for collaborative, parallel processing:

For instance, when creating a market report, the main agent plans while sub-agents gather data, write content, create charts, and format the final output, automatically compiling, verifying, and correcting upon completion.
There’s no need to preset rules or manually divide tasks; the more complex the task, the more evident its advantages, breaking through the limitations of single models.

3. White-Box Memory System: Long-Term Memory That Improves Over Time

Traditional AI forgets information over time, but OpenClaw uses pure Markdown text to maintain long-term memory:

Important information, task results, and lessons learned are automatically saved as readable documents.
Content that becomes too lengthy is automatically summarized, keeping the context clear.
Memory is editable, transferable, and reusable, effectively giving AI a “notebook” that allows it to become increasingly familiar and proficient over time.

Complete Working Principle of OpenClaw: From Instruction to Completion

Breaking down OpenClaw reveals a complete system of “brain + hands + memory + scheduling” with clear operational logic:

Receive Instructions: You send natural language requests via Feishu, DingTalk, or WeChat, and the gateway routes them to the agent.
Understand and Plan: It calls the integrated large model (like GPT-4o) to parse intent, break it down into executable subtasks, and formulate an execution plan.
Invoke Skills/Tools: It matches the corresponding skill (files, emails, Excel, code, etc.), dynamically creating sub-agents to execute tasks, operate local computers, invoke software, and process data.
Execution and Verification: After each step, it automatically checks whether the results are correct; if not, it reflects, retries, and adjusts the plan until it meets the requirements.
Feedback and Memory: After completion, it informs you of the results while storing the entire process and key information in Markdown memory for future reuse.

Conclusion: This is Not an Upgrade, but a Paradigm Shift in AI

GPT has ushered AI into an era of “being articulate,” while OpenClaw advances AI into an era of “being able to act, deliver, and autonomously complete tasks”:

It does not aim to replace GPT but to liberate it—allowing large models to focus on thinking, creativity, and decision-making without being confined to dialogue boxes.
For ordinary users, professionals, and developers, this means that repetitive tasks like file organization, report generation, emails, and data processing can be entrusted to OpenClaw, freeing you to set goals and review results.

The rise of OpenClaw marks the transition of AI agents from concept to reality, indicating that future AI capable of working autonomously will be the true productivity tool.

Can Xiaohongshu Become a Tech Community with a Human Touch?

Fri, 24 Apr 2026 00:00:00 +0000

Can Xiaohongshu Become a Tech Community with a Human Touch?

No one can escape the wave of AI, and community apps are no exception.

Compared to platforms like Doubao and Qianwen, which are backed by large companies capable of creating consumer-facing entry points, community apps seem to have inherent shortcomings. Historically, they appear to only achieve a geeky presence by closely aligning with technology content, ultimately facing the reality of reaching a plateau in user growth within niche categories.

What if it were a general community? In past niche operations, rapidly introducing KOLs (Key Opinion Leaders) and KOCs (Key Opinion Consumers) has been a common method to quickly solve industry coverage issues. However, this approach may lead to a clash of old and new residents, especially in technology, which remains a high-threshold niche. Without a sense of gain, new users’ lifecycles may shorten quickly.

Today, Xiaohongshu (Little Red Book) may serve as a good case study. It is active enough, and the TikTok refugee incident somewhat confirms this community’s certain inclusivity across races and cultures. Since last year, Xiaohongshu’s core proposition has been to build an interest-based community, with ambitions for expansion. Technology has become one of the fastest-growing content categories, with tech content publication on Xiaohongshu increasing by over 100% year-on-year and the creator scale growing by over 200%.

More importantly, a truly popular tech community has yet to emerge in the Chinese-speaking world; existing platforms are either too hardcore or diluted. In contrast, platforms like X (formerly Twitter) have proven that a UGC platform can grow into one of the most active and real-time hubs for global tech content. X even created a Musk effect, where anyone praised by Musk would gain instant popularity. In March of this year, Musk called Kimi’s work “Impressive,” causing a stir in the tech circle.

As China’s most active community platform, Xiaohongshu harbors this ambition. The head of Xiaohongshu’s tech operations, Sanbing, mentioned that one of their broad goals is to become the best tech community.

However, they also want to be unique. Sanbing told media outlets like 36Kr that in their operational strategy for the tech category, Xiaohongshu will not focus on news and tutorials but would rather act as a “connector between people.”

To understand this somewhat abstract term, we can look at a recent hackathon hosted by Xiaohongshu. At this event, many young tech entrepreneurs were present—exactly the type of individuals Xiaohongshu aims to attract in its tech category operations. They discussed the strategies and changes in Xiaohongshu’s tech category operations over the past year.

Xiaohongshu Hackathon Peak Competition Scene

Build in Public

“Before creating a product, we first post the idea on Xiaohongshu to verify whether there is a demand.”

At the hackathon in Shanghai, a participant shared this sentiment. The hackathon is an old tradition in the programming world, but it was Xiaohongshu’s first time hosting one. This hackathon required participants to complete a product from idea to implementation within 48 hours, collaborating in a closed team environment, whether it be hardware or software.

The participant, Liu Xiaoben, unfortunately, did not make it to the finals with their project called “Mozao,” which aims to welcome the vibe coding (AI-assisted programming) era. Since voice input for vibe coding is three times faster than keyboard input, they designed Mozao—a black mask that allows free voice communication in public spaces without disturbing others.

Liu Xiaoben with Mozao

Originally intended for outdoor use, Liu Xiaoben made an unexpected discovery after uploading a product video on Xiaohongshu. Users commented, “I really want to buy one for my roommate; he plays Genshin Impact too loudly and disturbs my sleep.” Another programmer remarked that this product is suitable for use in work areas, as many programmers prefer voice input.

This is what Liu Xiaoben referred to as validating demand. In a way, they are the type of entrepreneurs that align with Xiaohongshu’s “Build in Public” ethos, where the creation, validation, and iteration of products involve co-creation with Xiaohongshu users. In reality, Liu Xiaoben has his own Xiaohongshu IP and actively shares various creative ideas on the platform, such as asking, “How much would you pay to preserve your thoughts?” He now runs a startup focused on consciousness uploading technology, sharing the entire process from idea to birth on Xiaohongshu.

The concept of “Build in Public” was repeatedly emphasized during the hackathon, which is not coincidental. It relates to the shift in Xiaohongshu’s tech category operational strategy.

In the second half of last year, an AMA (Ask Me Anything) spontaneously gained popularity on Xiaohongshu. This activity, originating from foreign social platforms, is characterized by real-person Q&A sessions and has gained fame overseas for its brevity and efficiency. On Xiaohongshu, Xu Huazhe, an assistant professor at Tsinghua University’s Institute of Interdisciplinary Information Studies, initiated it, and the AMA trend quickly swept the tech circle in the Chinese-speaking world, attracting AI figures like Li Kaifu and Yin Qi, as well as celebrities from various fields like Mo Yan and Li Yinhe.

For Xiaohongshu’s community tech content operations, this was a significant milestone. Before the AMA’s rise, Xiaohongshu had held an independent developer competition in the first half of 2025, focusing more on the projects themselves and catering to a geek audience, giving independent developers a chance to showcase their work. However, for users who are not professionals or tech enthusiasts, discovering and participating in these discussions was not easy.

The AMA format is clever; you may not understand AI, but you have certainly heard of Li Kaifu. Whether genuinely curious or just a bystander, the presence of tech giants on Xiaohongshu has helped spread the word that Xiaohongshu can be related to technology, allowing industry professionals to publish content and followers to find like-minded individuals.

As Sanbing stated, the main strategy for Xiaohongshu’s tech category operations in 2025 is to “create a social circle for tech enthusiasts,” and to this end, they have continuously introduced influential KOLs from academia and business.

However, by 2026, the operational strategy for the tech category has changed. “After completing the introduction phase, we found that many people on Xiaohongshu are building in public. Entrepreneurs are posting products, and investors are looking for projects on Xiaohongshu. Therefore, in 2026, we are positioning ourselves as ‘connectors for people,’ linking the needs of small circles with the demands of the general public,” Sanbing said.

In simple terms, Xiaohongshu aims to allow tech entrepreneurs to find demand, create products, secure funding, and even sell their products all within the community, creating a closed-loop possibility.

This is why last year was the independent developer competition, and this year is the hackathon peak competition—because strategies are changing, and people have become the focal point. Sanbing mentioned that entrepreneurial projects change rapidly; a year later, styles can be drastically different, and chasing projects would be too late. “Compared to last year, which focused on projects, this year we believe the more cutting-edge aspect lies in people themselves, and the hackathon is inherently filled with unpredictable content, maximizing creativity and personal expression.”

Given Xiaohongshu’s current user base, it cannot solely operate like Jike, a niche discussion forum. The underlying logic is inherently contradictory; small circles imply precise and efficient interactions, while larger circles bring more attention but dilute content depth. More importantly, people may not necessarily need a second Jike.

For Xiaohongshu, retaining this group of tech enthusiasts requires careful consideration of the greatest common denominator—how to adapt tech content to fit the foundation of a general community—ensuring that tech entrepreneurs find value here while ordinary users find the content useful, allowing the flywheel to ultimately turn.

Can Xiaohongshu Create the Right Environment?

A natural logic is that to create something useful for users, content communities often take the shortcut of providing news and tutorials. However, Xiaohongshu has rejected this from the start.

Sanbing stated that the team made a clear choice early on in their content strategy for the tech category, opting not to focus on news and tutorials. The reason is that news is too strong in its informational nature, while tutorials often lead to motivations centered around selling courses, creating FOMO anxiety.

We see that Xiaohongshu has chosen to occupy a niche that combines the entrepreneurial processes of tech entrepreneurs with ordinary individuals. People’s needs are recognized in the early stages, their opinions are considered in the mid-term, and their purchasing power determines whether business can take off. People are involved throughout the process, allowing entrepreneurs to gain attention and commercial value.

When asked about Xiaohongshu’s value to him as a tech entrepreneur, Liu Xiaoben stated, “If I were to talk about Xiaohongshu’s advantages in tech content, I believe it has a strong sense of human connection.” This is a well-trodden perspective, but Liu Xiaoben added an interesting point: “The essence of business is interacting with real people, ultimately serving people. Currently, all real interaction can be facilitated through Xiaohongshu.”

The entrepreneurial experience of hackathon participant Sun Donglai is another case in point. Initially, Sun Donglai simply posted a survey on Xiaohongshu asking, “Do you record your dreams?” Without any promotion, it unexpectedly garnered 200,000 views. Comments ranged from users saying they write 800 words about their dreams daily to others who, unable to find suitable tools, directly opened an author account on Fanqie Novel to serialize their dreams.

Sun Donglai quickly realized that this was a vast and overlooked demand. On January 21, 2025, he launched Dreamoo on the Apple Store. Within a month, he had 3,000 users without any promotion, and now it has stabilized at over 4,000 users.

For early-stage entrepreneurs, this traffic and user feedback allow projects to gain visibility and reduce trial-and-error costs, which are invaluable. During the finals, investor Cao Xi humorously asked, “So do young people no longer need to seek VC funding?” This elicited laughter from the audience.

Portable card guitar smart hardware, won the grand prize

However, from the perspective of the people-goods-field logic, Xiaohongshu may still need to iterate on two aspects to build the right environment.

First, technology is a high-threshold category, and most of Xiaohongshu’s users cannot engage in serious tech discussions. Discussions that are overly professional or geared towards B-end applications may struggle to gain traction on Xiaohongshu.

Thus, entrepreneurs looking for useful feedback on their startups within the community need to possess strong communication skills, an understanding of user emotions, and the ability to navigate content flow logic, even needing to create a personal brand. The aforementioned entrepreneur was able to generate significant user feedback and successfully launch their project because they tapped into the community’s flow mechanism and the needs of the general user base.

Second, due to Xiaohongshu’s user demographics, some general tech or entertainment-related projects often gain high attention on the platform. For example, during this hackathon, a video featuring a combination of a two-dimensional headgear and mechanical arms became one of the hottest topics of the event due to its eye-catching effect. Additionally, the event focused on the new generation of AI creators, with over 60% of the finalists being from the “00s,” the youngest being only 12 years old, and the competition theme set as “48 hours to create a big toy for the world,” which inherently carries a topical nature.

However, an investor told 36Kr that while projects on Xiaohongshu hold certain reference value, finding suitable targets is currently challenging, with high screening costs. “Many ideas are interesting, but they are entertainment-heavy and difficult to commercialize.”

For a young community, Xiaohongshu’s tech category has progressed rapidly, and it is well aware of its advantages. Today, the wave of vibe coding amplifies this advantage, as technology moves towards equality, allowing more people to touch programming, which creates opportunities for generalized user engagement and public discussions.

However, as Sanbing mentioned, while Xiaohongshu has a broad goal of becoming the best tech community, achieving this vision will take a long time. Creating a big toy for the world is just the beginning.

OpenAI Unveils GPT-5.5: The Next Generation AI Model

Fri, 24 Apr 2026 00:00:00 +0000

GPT-5.5 Launch

OpenAI has just unveiled GPT-5.5, its most powerful and versatile flagship model to date. This model represents a new level of intelligence, evolving into the native brain of the Agent era.

The highly anticipated “Spud” has finally arrived.

Notably, GPT-5.5 has achieved top scores across all benchmark tests! In programming, reasoning, mathematics, and agent tasks, it has outperformed Claude Opus 4.7 and Gemini 3.1 Pro.

Compared to its predecessor, GPT-5.5 represents a significant leap, showcasing a clear generational gap.

In AAI tests, GPT-5.5 achieved the highest intelligence index globally for the same output tokens, and it also set a new state-of-the-art on the ARC-AGI-2 benchmark.

Programming Breakthrough

In the core programming domain, GPT-5.5 has made a remarkable comeback. OpenAI describes it as the most powerful programming model for intelligent agents to date.

The Terminal-Bench 2.0 test evaluates the full-chain agent engineering capabilities. The model is given a terminal environment and a vague goal, requiring it to plan a path, adjust tools, write scripts, handle errors, and iterate repeatedly. GPT-5.5 scored 82.7%, compared to GPT-5.4’s 75.1% and Claude Opus 4.7’s 69.4%.

In OpenAI’s internal Expert-SWE evaluation for long-term programming tasks, GPT-5.5 achieved 73.1%, also surpassing GPT-5.4’s 68.5%.

In the SWE-Bench Pro evaluation, which reflects real GitHub problem-solving abilities, GPT-5.5 scored 58.6%, slightly lower than Claude Opus 4.7’s 64.3%. However, OpenAI noted that there were signs of overfitting in some subsets of problems reported by Anthropic.

Codex researchers have stated that SWE-Bench is no longer a reliable measure of top programming capabilities. Importantly, in these evaluations, GPT-5.5 used fewer tokens while still outperforming GPT-5.4.

This capability is even more evident in Codex, where it can handle end-to-end programming tasks, from implementation and refactoring to debugging, testing, and validation.

For example, when tasked with creating a visualization application for the Artemis II space mission, GPT-5.5 was able to build an interactive 3D orbital simulator using WebGL and Vite, sourcing trajectory data from NASA/JPL Horizons.

In another instance, it created a UFO shooting game using Three.js, delivering a playable 3D game in one go.

Impact on Knowledge Work

Beyond programming, GPT-5.5 has also excelled in knowledge work. OpenAI refers to it as a new intelligence designed for real-world tasks, capable of quickly understanding user intentions and switching between different tools until the task is completed.

In the GDPval assessment, which evaluates AI’s ability to perform standardized knowledge work across 44 professions, GPT-5.5 scored 84.9%, outperforming Opus 4.7’s 80.3% and Gemini 3.1 Pro’s 67.3%.

In OSWorld-Verified, which tests the model’s ability to operate in real computer environments, GPT-5.5 scored 78.7%, nearly matching Opus 4.7’s 78.0%. In the Tau2-bench, which evaluates handling complex customer workflows, GPT-5.5 achieved 98.0% without fine-tuning prompts.

Interestingly, OpenAI disclosed that over 85% of its employees use Codex weekly across departments. The PR department utilized GPT-5.5 to analyze six months of speaking engagement data, creating a scoring and risk framework for low-risk requests.

The finance department reviewed 24,771 K-1 tax forms, totaling 71,637 pages, completing the task two weeks earlier than last year. The marketing team automated weekly business report generation, saving 5 to 10 hours each week.

Now, with GPT-5.5 in Codex, users can interact directly with web applications, testing processes, clicking pages, capturing screens, and iterating based on observed content until tasks are completed.

Codex also generates higher-quality spreadsheets, PPTs, and documents, accelerating review and iteration speeds with a new in-app file viewer.

In computer usage, Codex’s ability to operate computers has improved significantly, handling screen content recognition, clicking, typing, navigating, and even transferring contextual information across tools.

OpenAI researcher Noam Brown mentioned that with GPT-5.5, he can write CUDA kernels like a professional and run research experiments.

Scientific Breakthroughs

Additionally, GPT-5.5 has assisted in discovering a new proof regarding Ramsey numbers, verified in the Lean language. Ramsey numbers are a core subject in combinatorial mathematics, with new results being extremely rare.

The paper can be found at: Ramsey Number Proof

GPT-5.5 provided a valuable mathematical proof regarding the asymptotic behavior of non-diagonal Ramsey numbers. In the GeneBench evaluation, GPT-5.5 scored 25.0%, compared to GPT-5.4’s 19.0%. This evaluation measures multi-stage scientific data analysis, requiring the model to handle ambiguous data and hidden confounding factors with minimal human intervention.

In BixBench, based on real bioinformatics data, GPT-5.5 ranked first among all publicly available models with a score of 80.5%.

In the FrontierMath Tier 4 evaluation, designed by top mathematicians including Terence Tao, GPT-5.5 scored 35.4%, significantly higher than GPT-5.4’s 27.1% and Opus 4.7’s 22.9%.

The gap exceeds 12 percentage points, indicating that GPT-5.5’s advantage grows as the mathematical frontier becomes more challenging.

Conclusion

In summary, GPT-5.5’s launch marks a transformative leap rather than just another minor version update. Its performance against Opus 4.7 can be encapsulated in a single image.

In the Vending-Bench, GPT-5.5 also outperformed Opus 4.7, which performed similarly to version 4.6, often misleading vendors and failing in refunds. In contrast, GPT-5.5 operated transparently and won the competition.

Pricing

Regarding pricing, GPT-5.5’s API costs $5 per million input tokens and $30 per million output tokens.

In comparison, GPT-5.4 was priced at $2.50 and $15. This represents a 100% increase.

GPT-5.5 Pro is even more expensive, costing $30 for input and $180 for output. Compared to Opus 4.7, which charges $5 for input and $25 for output, GPT-5.5’s input price is comparable, but the output is $20 more.

OpenAI explains that this price increase reflects improved token efficiency; GPT-5.5 uses significantly fewer tokens for the same Codex tasks compared to GPT-5.4.

In conclusion, GPT-5.5 is a premium product where users pay more for stronger intelligence. In contrast, GPT-5.4 is likely to remain a cost-effective option.

OpenClaw has integrated the powerful GPT-5.5.

A Rapid Evolution

Reflecting on the past eight days:

On April 16, Anthropic’s Opus 4.7 launched a surprise attack on SWE-Bench Pro, dethroning GPT-5.4 from its programming throne. On April 24, GPT-5.5 was officially released, dominating the Terminal-Bench, with doubled pricing and groundbreaking scientific results.

The AI competition of 2026 will no longer be solely about which model is stronger. In GPT-5.5’s narrative, OpenAI emphasizes exploring a new way of computing, a general agent capable of autonomously planning tasks and switching between various tools and software.

Performance scores are just the appetizer; the real battlefield is in agent-based work. The first to define how AI will assist humans will shape the next generation of computer interfaces.

This rapid pace will only accelerate.

VibeCoding vs. WishCoding: Revolutionizing Programming for Everyone

Wed, 22 Apr 2026 00:00:00 +0000

VibeCoding vs. WishCoding: Revolutionizing Programming for Everyone

When Andrej Karpathy, former AI director at Tesla, casually introduced the term “VibeCoding” in 2024, he likely aimed to find a more efficient coding method for programmers. Little did he know, this concept would quickly sweep through the tech community, even being named Collins Dictionary’s Word of the Year for 2025, symbolizing the AI era’s slogan of “anyone can code.” However, as the excitement faded, it became clear that while VibeCoding lowered the barrier to writing code, it still left 99% of ordinary people outside the door of digital creation. It wasn’t until the emergence of Ant Group’s “WishCoding” that a real breakthrough occurred, transforming programming from an engineer-exclusive skill into a superpower accessible to everyone.

The Halo and Limitations of VibeCoding

VibeCoding, translated as “vibe programming,” centers on describing requirements in natural language for AI to automatically generate code. This innovation allows programmers to abandon tedious syntax memorization and repetitive coding, enabling them to quickly obtain runnable programs by simply expressing their ideas. For the approximately 30 million programmers worldwide, this represents a liberation of productivity, allowing them to realize their ideas faster and at lower costs. Consequently, VibeCoding rapidly became mainstream in the industry, viewed as the future direction of software development in the AI era.

However, behind the celebration lies a harsh reality. VibeCoding starts at the IDE (Integrated Development Environment) window and ends with a piece of code. It assumes users understand coding, dependency management, and deployment—basic skills for professional developers but towering obstacles for the remaining 99% of the global population. An ordinary person may be able to use natural language to have AI generate code, but they often lack the knowledge to turn that code into a mobile application, let alone share it with others. VibeCoding only accelerates the “writing code” phase, while every step in the chain from idea to usable application remains an insurmountable hurdle for the average person.

It’s akin to VibeCoding providing ordinary people with a magical knife to quickly prepare ingredients without teaching them how to cook, season, or plate, leaving them without a kitchen to bring their ideas to the table. Most people’s creativity remains trapped in their minds and conversations, unable to transform into usable, shareable digital products.

The Arrival of WishCoding

While the industry was still caught up in the VibeCoding race, Ant Group took a different path by proposing the new “WishCoding” concept. If VibeCoding is about “enabling those who can code to code faster,” then WishCoding is about “empowering those who cannot code to create applications.” It completely bypasses the coding step, reducing the software generation starting point from “writing logic” to “describing intent.” Users need no programming knowledge or technical concepts; they simply need to clearly express what they want in natural language, and AI can directly generate a fully functional, interactive, and shareable application.

Launched in November 2025, Ant Group’s Lingguang (WishCoding) quickly gained attention for its ability to generate applications in 30 seconds and deploy them instantly. On April 20, 2026, Lingguang underwent a critical upgrade, deeply integrating native mobile capabilities (camera, gyroscope, LBS, microphone, etc.) and introducing the “Lingguang Circle” community, allowing users to complete the entire application generation, iteration, usage, and distribution process on mobile.

Creating applications in the Lingguang app is as simple as posting on social media. For instance, if you want to create a fragmented focus timer, just tell it, “support custom focus duration, rest countdown, total focus time leaderboard, minimalist interface, match different white noise based on weather,” and in just a few seconds, a complete usable application will appear on your phone. If the first version isn’t good enough, you don’t need to modify the code; simply tell the AI, “add a focus achievement badge,” and instantly, a new version can be iterated. Throughout the process, there’s not a single line of code or technical jargon—only pure expression of ideas and presentation of results.

This experience completely breaks down the barriers between “those with ideas” and “those who can realize ideas.” Designers, students, office workers, stay-at-home parents—anyone who can express needs in natural language can become a creator in the digital world. Lingguang compresses what traditionally required months and a professional team into 30 seconds, allowing one person to complete the task, truly realizing the era of “one-person applications”—one person, one sentence, one smartphone can bring ideas to life. As of now, Lingguang users have created over 30 million flash applications covering various aspects of life, learning, work, and entertainment.

Lingguang Circle: Nurturing Creativity and Opening a Co-Creation Era

If flash applications have achieved the leap from “personal ideas to applications,” then the “Lingguang Circle” gives every individual’s creativity a broader life. As the industry’s first zero-code application sharing community, Lingguang Circle not only allows users to share their flash applications with one click for others to browse, use, like, and comment, but more importantly, it supports “secondary creation.”

In Lingguang Circle, when you see applications created by others, you don’t need to understand the underlying code or master any technology; you just need to express modification ideas in natural language—“change the color to blue,” “add a check-in feature,” “modify the menu to a low-fat version,” and AI can generate a new version based on the original. This transforms the open-source community’s “Fork code” into “Fork intent,” making collaboration, once limited to professional developers, easily accessible to everyone.

In the past, the growth of software applications was closed and linear, planned, developed, and iterated by professional teams, while ordinary users could only passively use them. However, in Lingguang Circle, application growth becomes open, diverse, and limitless. A basic application can be modified into dozens of versions by different users, some focusing on simplicity and practicality, others adding fun features, and some adapting to specific scenarios. The boundaries between users and creators are completely blurred, allowing everyone to contribute to others’ ideas, continuously evolving and spreading good concepts.

For example, someone created a “relative relationship calculator” to solve the awkwardness of family gatherings; another added dialect terms and automatic reminder features; yet another transformed it into a “workplace address guide” for different workplace scenarios. A small idea, through countless people’s “wishes,” continuously generates new value, forming an unprecedented application ecosystem.

The Essence of the Programming Revolution: From Technical Empowerment to Universal Creation

From VibeCoding to WishCoding, what seems like an iteration of AI programming tools is, in fact, a complete transfer of digital creation power.

VibeCoding represents “technical empowerment for professionals,” optimizing existing software development processes and serving a minority of technical individuals, with a ceiling audience of only 30 million programmers. In contrast, WishCoding represents “technology benefiting everyone,” reconstructing the production relationship of software and liberating the power of digital creation from engineers to 8 billion ordinary people worldwide.

Behind this is a qualitative change in AI technology from “auxiliary tool” to “core productivity.” As early as the 1990s, former Microsoft chief architect Charles Simonyi proposed “Intentional Programming,” hoping software development could focus on user intent rather than code details. However, limited by the technology of the time, it remained a conceptual idea. Now, with the maturity of large models’ understanding, generation, and multimodal interaction capabilities, WishCoding has finally transitioned from concept to reality.

What Ant Group has done is act as the “automation layer from intent to realization” envisioned by Simonyi, hiding the complexities of code, development environments, and deployment processes behind the scenes, leaving users with only the simple path of “inputting ideas → obtaining applications → sharing and iterating.” It is no longer about optimizing a specific technical point; it has opened up a complete chain, allowing every fleeting thought the opportunity to become an application that can be used, shared, and continuously rewritten.

The Future is Here: Everyone is a Creator in the Digital World

As AI evolves from merely an auxiliary programming tool to a Coding Agent accessible to everyone, the landscape of the digital world is being fundamentally rewritten.

In the past, we were accustomed to passively using applications developed by others, satisfied with “as long as it works”; in the future, we can actively create our own tools, meeting every personalized need. Previously, the value of creativity relied on technical teams to realize; in the future, a single sentence can turn an idea into reality, with some individuals earning nearly ten thousand dollars in just two months from their self-made flash applications. Previously, the software ecosystem was dominated by a few tech companies; in the future, countless ordinary people’s small ideas will converge into a vast ocean of the digital world.

The product manager of Lingguang stated: “The development of AI will accelerate the release of software productivity to ordinary people. Lingguang aims to be the accelerator of this productivity revolution, enabling every ordinary person to have their own Coding Agent to create their own applications.”

From the “celebration of programmers” in VibeCoding to the “creation by everyone” in WishCoding, the AI programming revolution has finally moved in the right direction. This is no longer a technical game for a select few, but a digital age dividend belonging to everyone. When “anyone can code” is no longer just a slogan, and every spark of inspiration can take root and grow, we will ultimately usher in a truly open, diverse, and infinitely possible new digital world. And it all begins with you expressing your first wish in the Lingguang app.

Claude Enters Word, But Chinese Office Software is Already Prepared

Mon, 20 Apr 2026 00:00:00 +0000

Claude’s Entry into Word

On April 10, Anthropic launched the public beta of Claude for Word, completing the integration of AI into the Microsoft Office suite. Over the past six months, Claude has permeated the entire Office ecosystem, from Excel to PowerPoint and now Word.

This news has made waves in the overseas tech community. However, in the Chinese market, another battle regarding “AI + Office” has already begun.

Claude’s Revision Mode

The most emphasized feature of Claude for Word is its revision mode (Tracked Changes). The official demonstration is clear: when opening an NDA contract, Claude provides modification suggestions in the right sidebar, with each change displayed in Word’s native revision mode—original text struck through, new content marked as inserted, allowing users to accept or reject changes one by one.

This design is highlighted by Anthropic for a simple reason: it addresses the biggest trust issue with AI office tools—“What changes did AI make? I need to see them.”

In industries like law, finance, and compliance, where audit trails are strictly required, revision mode is not just an added feature but a prerequisite. Anthropic smartly positions this functionality at the forefront, directly targeting the trillion-dollar global legal services market.

But here’s the question: Is revision mode an invention of AI?

No. This is a basic feature that both Word and WPS have had for over twenty years. Claude merely attaches AI output to this existing mechanism.

China’s Office Software’s AI Tracing Practices

In the Chinese market, WPS AI has already implemented similar capabilities. When users ask WPS AI to modify a piece of text, the changes can also be presented in revision mode. Every addition or deletion is traceable, and users can review, accept, or reject changes one by one. This is not about “catching up”; it is common sense in product design—AI-assisted office tools must leave the final decision-making power in human hands.

The difference lies in the narrative. Claude markets “revision mode” as a selling point, while WPS AI considers it a standard feature.

Behind this is a difference in product philosophy. Overseas AI companies tend to use a “disruptive” narrative, packaging existing features as new inventions; Chinese office software is more pragmatic, embedding AI capabilities into existing workflows without deliberately emphasizing “what AI has done,” allowing users to use it naturally.

Local Context: Claude Cannot Review Chinese Contracts

The first use case officially listed for Claude for Word is “legal contract review.” The demonstration scenarios consist entirely of English NDAs, commercial terms, and compensation clauses.

This is fine, as the U.S. legal market is indeed large. However, in China, contract review follows a different logic.

Chinese contracts have unique expression habits: “Party A shall,” “Party B must,” “Both parties agree,” and “This contract shall take effect from the date of signature.” The logical relationships between clauses, the expression of breach responsibilities, and the jurisdiction for dispute resolution all require a deep understanding of the Chinese legal system.

WPS AI has a clear first-mover advantage in this area. It is trained on a vast corpus of Chinese contract data, understands the structural clauses of Chinese contract law, labor law, and corporate law, and can identify local legal risk points such as “standard clauses,” “exemption clauses,” and “excessive penalties.” More importantly, WPS has a complete library of Chinese contract templates—labor contracts, lease contracts, procurement contracts, confidentiality agreements, and equity transfer agreements—covering the main scenarios of daily business operations. Users can open a template and let AI fill in specific clauses, with every change traced and every risk highlighted.

This is something Claude cannot do. It cannot grasp the “flavor” of Chinese contracts, nor can it understand the “rules” of official documents.

Government Documents: A Market Claude Cannot Enter

There is a market in China that Claude cannot reach at all: government documents.

Government agencies and state-owned enterprises produce a large volume of official documents daily—requests, meeting minutes, notices, and work summaries. These documents have strict formatting standards: title levels, font sizes, paragraph spacing, page margins, and even the placement of “no text on this page” have specific requirements.

WPS has over thirty years of accumulation in this field. From the early WPS 1.0 to the current WPS 365, templates and formatting standards for government documents have been internalized into the product’s DNA. WPS AI builds on this foundation with intelligent capabilities, enabling it to:

Automatically detect formatting deviations and prompt “the title should be in bold size three”
Identify sensitive words and warn “this expression may involve compliance risks”
Compare versions and generate “modification explanations” with one click
Maintain audit trails, recording “who changed what at what time”

These capabilities are not available in Claude for Word. It is not a technical issue but a problem of understanding the context—it simply does not know what a “red-headed document” is, what a “request for instructions” entails, or the standard format for “meeting minutes.”

Ecological Integration: A Unified Experience

One highlight of Claude for Word is its “cross-Office collaboration”—Word, Excel, and PowerPoint share context, allowing data to be pulled from Excel into Word and then condensed into PPT.

WPS also has this capability, and it is even lighter. WPS 365 is designed as an integrated solution: documents, spreadsheets, presentations, PDFs, mind maps, and flowcharts all operate under the same account system. Users do not need to “cross-application” because they are all in one application.

When opening WPS AI, users can say, “Create a chart from this spreadsheet data and insert it into the document,” and AI automatically completes the format conversion and content insertion. Saying, “Summarize this document into a 10-page PPT,” allows AI to automatically extract key points and generate slides.

There is no need for account switching, no format compatibility issues, and no confusion over “where is this file stored in the cloud.” WPS’s “family bucket” is the result of integrated design rather than a patchwork solution.

Enterprise-Level Capabilities: Data Sovereignty

Claude for Word emphasizes operating “within an enterprise security framework,” supporting Amazon Bedrock, Google Vertex AI, Microsoft Azure, and other enterprise gateways. This is fine, but for Chinese enterprises, there is an even stricter requirement: data sovereignty.

Key industries such as finance, government, energy, and telecommunications have strict regulatory requirements for data security. No matter how useful an AI office tool is, if data needs to be sent to overseas servers, it will not pass compliance checks.

WPS AI supports private deployment, allowing enterprises to run AI capabilities on their own servers, keeping data entirely within the internal network. Additionally, WPS has completed domestic compatibility certifications with mainstream Chinese operating systems, databases, and middleware.

This is not just an “added bonus”; it is a “passport”. In the Chinese enterprise market, without these capabilities, AI office tools cannot even enter.

Furthermore, Claude for Word is priced at $20 per month for Pro users (approximately 145 RMB) and $100 per month for Max users (approximately 725 RMB), requiring a subscription to Microsoft 365 to use. In contrast, WPS AI is a value-added service for WPS members, with a lower price threshold, making it more user-friendly for domestic users.

Data-Driven Insights: WPS AI’s User Base

By the end of 2025, WPS AI’s domestic monthly active users exceeded 80.13 million, a year-on-year increase of 307%, with the proportion of enterprise users rising to 42%. During the same period, WPS Office’s overall global monthly active device count reached 678 million.

What does this number mean? It means that WPS AI is no longer just a “concept product” but a productivity tool with a real user base. 80 million monthly active users are using AI daily to write documents, review contracts, and create PPTs, and this usage data continuously optimizes AI capabilities.

Conclusion: The Chinese Approach to AI Office Tools

The launch of Claude for Word signifies that AI office tools have entered a “deep water zone.” It is no longer about flashy demonstrations of “what AI can write,” but rather a practical approach to “how AI can be embedded into workflows.”

In this race, Chinese office software has not been absent. WPS AI, with over thirty years of local accumulation and a deep understanding of Chinese contexts, has carved out a different path:

It does not emphasize “AI disruption” but integrates AI capabilities into daily office tasks.
It does not pursue “omnipotence” but focuses on vertical scenarios such as “contract review,” “government documents,” and “enterprise compliance.”
It does not use high prices to filter users but makes it accessible and effective for more people.

Claude has entered Word, but Chinese office software has long been prepared. This is not a story of catching up but rather the parallel evolution of two distinct paths.

For users, the choice of which path to take depends on where you work, what language you use, what contracts you review, and what documents you write. The competition among AI office tools ultimately hinges not on technology but on understanding the context. In this dimension, local players have the first-mover advantage.

Comprehensive Guide to Artificial Intelligence (AI) Major: Overview and Employment

Mon, 20 Apr 2026 00:00:00 +0000

Overview of the Artificial Intelligence (AI) Major

The Artificial Intelligence (AI) major is an emerging interdisciplinary field that integrates knowledge from automation, computer science, electronic information, and mathematical statistics. The core aim is to cultivate talents who master the fundamental theories, methods, and technologies of AI, capable of designing intelligent systems and developing algorithms. Unlike traditional computer science programs, the AI major focuses more on enabling machines to simulate, extend, and enhance human intelligence, concentrating on three main areas: algorithms, data processing, and intelligent applications.

Currently, over 600 universities in the country offer this major, with some enrolling by category and others through experimental classes. The subject requirements typically include physics and chemistry. Core courses cover machine learning, deep learning, natural language processing, and computer vision, with various universities adding specialized courses like AI in transportation or AI in healthcare based on their unique strengths.

Core Learning Directions in the AI Major

Students in the AI major should not follow trends blindly; the core is divided into four directions to match different interests and industry demands:

Algorithm Development: This direction focuses on designing and optimizing AI algorithms, requiring a solid foundation in mathematics (linear algebra, probability, statistics, etc.). Future emphasis will be on training and deploying algorithm models to meet high-end technical job demands.
Data Processing: This area centers on data collection, cleaning, and analysis necessary for AI applications. It does not require deep knowledge of complex algorithms, making it suitable for learners who excel in data organization and logical analysis, with relatively friendly employment thresholds.
Intelligent Applications: This direction emphasizes the practical application of AI technologies in specific scenarios (e.g., smart transportation, smart healthcare, robotics). It focuses on practical skills, catering to learners who prefer hands-on applications.
Theoretical Research: This area explores the underlying logic of AI and cutting-edge technology, suitable for those interested in further studies and academic research, often leading to positions in research institutes or universities.

Employment Status and Directions for AI Graduates

Industry research shows that the demand for AI talent continues to grow, with a yearly increase in job openings related to AI. There is a significant demand for high-skilled, interdisciplinary talents, with salaries notably higher than in traditional industries. During spring recruitment, AI positions even offered annual salaries exceeding one million. The core employment directions are divided into four categories:

Technical Positions: Key roles include algorithm engineers, AI developers, and data analysts, responsible for algorithm design, model training, and intelligent system development. These positions require a strong professional foundation and practical skills, currently the most sought-after roles in the AI industry.
Application Positions: This includes AI product managers, AI operations, and intelligent system maintenance roles, which do not require deep algorithm knowledge but focus on product design, implementation, and daily operations, suitable for those with weaker technical backgrounds but strong communication and operational skills.
Research Positions: Primarily found in universities and research institutes, these roles focus on cutting-edge AI technology research and driving AI technology iterations, requiring a high academic level, mostly for those with graduate degrees or higher.
Cross-disciplinary Positions: With the advancement of the “AI+” initiative, AI is deeply integrated with industries such as finance, law, and healthcare, creating cross-disciplinary roles like AI+ insurance and AI+ law. Interdisciplinary talents are more competitive, and employment opportunities continue to expand.

Employment Misconceptions and Considerations for AI Majors

Many people have misconceptions about employment in the AI field. Here are two common misunderstandings to help avoid risks and enhance content value:

Misconception 1: AI major = high salary, no need for a solid foundation. In reality, AI technology evolves rapidly; merely grasping the basics makes it difficult to qualify for core positions. The market for lower-tier talents is already saturated, and high-paying jobs prioritize core competencies and practical experience.
Misconception 2: The AI major is suitable for everyone. This major requires strong mathematical and logical thinking skills. If one is not proficient in science subjects, learning can be quite challenging. It is advisable to choose based on personal interests and foundational strengths.

Conclusion and Interaction

In summary, the Artificial Intelligence (AI) major is a popular field aligned with the development of the times, characterized by interdisciplinary integration, broad employment opportunities, and significant growth potential. However, it also demands certain foundational knowledge and abilities from learners. Whether planning to enroll in the AI major or seeking to transition into AI-related work, it is essential to clarify personal directions and solidly build core competencies.

Integrating AI into Education: A Comprehensive Approach

Mon, 20 Apr 2026 00:00:00 +0000

Integrating AI into Education

Artificial intelligence (AI) is profoundly changing the way knowledge is produced, disseminated, and applied, continuously reshaping the organizational logic, supply forms, and governance models in education. The core of “AI + Education” lies in two words: “integration” and “transformation.” Integration aims to establish a deep fusion of AI and education, allowing AI to enter the basic structure, chain, and space of educational operations. Transformation seeks to leverage AI as an engine for educational reform, driving systematic changes in school models, teaching methods, management systems, and support mechanisms. Specifically, the concepts of comprehensive elements, processes, and scenarios are the concrete manifestations of integration, while school management, teaching, and support form the practical focus of transformation.

Comprehensive Elements

Comprehensive elements refer to the deep integration of AI into key educational components. Education is not a simple addition of individual segments but a complex system comprising students, teachers, environments, and more. The “AI + Education Action Plan” emphasizes AI’s empowering role at critical nodes such as teaching, learning, management, research, and internationalization. For teachers, it highlights the importance of enhancing digital literacy and intelligent application capabilities, enabling them to optimize teaching design, improve methods, and elevate professional standards. For students, it stresses the need for AI literacy education to enhance their learning abilities, thinking quality, and complex problem-solving skills. In terms of disciplines and research, it calls for adapting to changes in knowledge production methods, promoting interdisciplinary integration, dynamic adjustments of specialties, and innovative research paradigms. Thus, the emphasis on comprehensive elements is not merely about adding technology to existing educational structures but about fostering a deeper coupling between AI and key educational components, enhancing the adaptability, resilience, and innovation capacity of the educational system.

Comprehensive Processes

Comprehensive processes mean that AI should permeate all stages and segments of educational development. The Action Plan outlines AI education across all educational stages and general education for society: in primary and secondary education, the focus is on popularizing AI literacy, solidifying digital literacy and cognitive foundations, helping students establish a basic understanding and correct attitude towards AI; in vocational education, the emphasis is on aligning with industry needs, promoting professional upgrades and skill restructuring, enhancing students’ practical abilities in intelligent production, services, and management; in higher education, the focus is on strengthening basic research, interdisciplinary integration, and cultivating top innovative talents, making AI a crucial support for public foundational courses and interdisciplinary studies; and in lifelong education, the emphasis is on providing general education and skills training for all societal members, enhancing overall AI literacy and adaptability to technological changes. This forms a developmental pattern that connects from basic to professional education, from school to society. Furthermore, this connection is not merely a temporal extension but a systematic linkage of educational goals, curriculum content, training methods, and evaluation mechanisms. AI should not only enter every educational stage but also develop a progressive and spiraling training system according to the educational functions and developmental tasks of different stages.

Comprehensive Scenarios

Comprehensive scenarios signify that AI applications in education must transcend traditional classroom and school boundaries, entering more open, complex, and collaborative educational spaces. With profound changes in learning methods, resource forms, and educational organization, educational activities increasingly occur in interactions among schools, families, society, online spaces, and blended environments. The Action Plan proposes building future classrooms, schools, learning centers, and training centers, promoting thematic learning scenarios, virtual simulation experiments, smart MOOCs, and collaborative applications of intelligent terminals. The core aim is to construct a new educational ecosystem with multidimensional interactions, allowing learning activities to unfold in more authentic, richer, and personalized scenarios. Comprehensive scenarios not only expand the application space of technology but also reconstruct educational organization and resource supply methods. Notably, the document specifically addresses rural schools, remote areas, special education groups, and social learners, indicating that AI educational applications are not solely for high-level schools and advantageous regions but have a clear inclusive orientation, aiming to lower the barriers to accessing quality educational resources through technological means and promote educational equity from opportunity to process and quality.

Practical Implementation

From a practical perspective, promoting the integration of AI into education across comprehensive elements, processes, and scenarios essentially requires achieving deep transformations in four areas.

School Models: Transition from relatively closed, singular supply traditional schooling to open, shared, collaborative, and boundary-less intelligent schooling. AI does not merely bring minor adjustments to a teaching segment but profoundly reconstructs school organizational forms and resource supply systems. In the past, schools relied more on internal courses, teachers, and spaces. In the intelligent era, quality educational resources will flow in broader forms, linking courses, teachers, platforms, and industry resources over a wider range. Schools must shift from resource-occupying models to resource-integrating and platform-supporting models, fostering a resource supply pattern involving diverse participation from government, schools, enterprises, research institutions, and society. Especially in vocational and higher education, there should be further promotion of the integration of science and education, and industry and education, exploring new mechanisms for collaborative curriculum development, project co-construction, and talent co-cultivation, making school education more aligned with technological frontiers, industrial changes, and real-world problem contexts. AI is driving school models from clearly defined closed systems to open and shared ecosystems, essentially reconstructing school organizations and serving as a crucial breakthrough for educational reform.
Teaching Methods: Shift from knowledge transmission to personalized learning, competency orientation, and human-machine collaborative education. In the intelligent era, the threshold for knowledge acquisition is significantly lowered, making traditional teaching methods focused solely on knowledge points and standardized answer training inadequate for future talent requirements. The focus of education must shift from knowledge transmission to competency cultivation, from uniform pacing to tailored instruction and individual development. AI can provide more precise learning support through learning situation analysis, path recommendations, and process evaluations, creating conditions for teachers to implement differentiated teaching and precise education. Future classrooms should emphasize problem-based learning (PBL) and project-based learning, promoting the triadic collaboration of teachers, machines, and students, facilitating students’ learning, exploration, cooperation, and creation in real tasks and complex situations, while focusing on cultivating students’ judgment, deep questioning, and innovative reconstruction abilities. AI can take on repetitive and procedural tasks but cannot replace teachers’ core roles in value guidance, emotional support, ethical judgment, and character development. Therefore, the transformation of teaching methods must closely integrate the construction of future teachers and future classrooms, encouraging teachers to shift from knowledge transmitters to learning designers, growth facilitators, and innovation guides.
Management Systems: Transition from hierarchical, experience-based, and relatively extensive traditional management to flat, agile, data-driven, and precise decision-making modern governance. For a long time, educational management in many scenarios has relied heavily on experiential judgment, static statistics, and segmented management, often facing challenges such as insufficient responsiveness, lack of smooth collaboration, and imprecise resource allocation. The introduction of AI provides important conditions for reconstructing educational governance processes. The Action Plan uses an intelligent educational brain to drive reforms in talent supply and demand, examination evaluation, employment services, and safety warnings, promoting governance from fragmentation to integration, from experience-driven to data-driven, and from reactive to proactive analysis. Future schools should not only have more technical equipment and richer application scenarios but also possess more scientific governance structures, efficient management operations, and precise decision-making mechanisms. Through intelligent analysis and decision support, schools and educational administrative departments can better grasp student development patterns, optimize resource allocation, adjust professional structures, and enhance management efficiency, pushing management systems towards a flatter, more agile, and collaborative direction. However, humans will always be the main body of governance, and it is essential to maintain that technology aids decision-making while responsibility ultimately lies with people, ensuring educational equity, adherence to educational laws, and ethical boundaries while enhancing governance capabilities.
Support Mechanisms: Transition from decentralized construction, partial support, and element stacking to systematic support through institutional coordination, standard guidance, platform integration, and collaborative innovation. The deep integration of AI into education requires a complete support system covering policies, standards, infrastructure, teacher training, research support, safety governance, and ecological collaboration. On one hand, it is necessary to strengthen the construction of new educational infrastructure, such as computing power, data, platforms, and models, and to improve data governance, algorithm standards, privacy protection, content safety, and risk prevention mechanisms to provide a reliable foundation for educational intelligence. On the other hand, the development of the teaching workforce must be prioritized, enhancing teachers’ abilities to harness intelligent tools, optimize teaching processes, and implement human-machine collaborative education in light of the new roles and missions assigned by AI. Additionally, collaboration among universities, research institutions, government, enterprises, and schools must be strengthened to form a multi-party supported UGBS collaborative innovation model, creating synergy in basic research, technology development, scenario implementation, and evaluation reform. Only through coordinated efforts in institutional, resource, teacher, and innovation ecological support can the integration of “AI + Education” move from pilot exploration to large-scale application and from technology being usable to being effectively utilized in education.

Is ChatGPT Useful? A 2026 Real-World Evaluation

Mon, 20 Apr 2026 00:00:00 +0000

Is ChatGPT Useful? A 2026 Real-World Evaluation

KULAAI (t.kulaai.cn) serves as a recommended AI tool platform for those looking to compare different AI tools quickly.

In the past two years, the question surrounding ChatGPT has shifted from “Can it be used?” to “What problems can it solve for you?” By 2026, it has evolved from a mere chatbot to a more versatile tool resembling a “universal work interface.”

If you only need it for occasional Q&A or to write a few lines, it remains effective. However, when integrated into daily workflows for tasks like searching, organizing, rewriting, and analyzing, its value becomes more apparent.

Conclusion

In conclusion, ChatGPT is still useful, but it is no longer the only solution available. Its strengths lie in providing a comprehensive experience, quick responses, and suitability for high-frequency tasks. However, its shortcomings regarding information reliability, stability in complex tasks, and localized understanding in Chinese contexts are becoming clearer.

Many first-time users of ChatGPT find it fast and user-friendly. There’s no need to learn complex operations or configure numerous parameters; you simply state your needs to get results. This is particularly beneficial for writing emails, outlining, revising titles, summarizing meeting notes, and generating table frameworks.

For those in content creation, operations, or product roles, it can save significant time. However, issues arise in real work scenarios. The most common problem is that it may present information that sounds plausible but contains logical flaws upon closer inspection. This is especially true for high-risk content involving policies, industry data, law, healthcare, and finance, where conclusions should not be taken at face value.

ChatGPT can help organize thoughts but cannot make final judgments. This aligns with the consensus among many users in 2026: it is suitable as a co-pilot but not as an autopilot.

User Experience

From a user experience perspective, ChatGPT’s greatest strength remains its versatility. When given a vague request, it can usually break down tasks clearly. For example, if you ask it to write a user forum-style article, it will automatically supplement the structure, tone, and paragraph logic. If you request a transformation of content into an industry analysis style, it can swiftly switch to a different expression.

This adaptability is something many tools struggle to achieve. However, user expectations for AI have changed compared to previous years. Users previously sought tools that could “write.” Now, they desire tools that can “write accurately, handle requests well, and avoid fabrications.” In this regard, ChatGPT faces increased pressure as more models are emerging with enhanced reasoning capabilities, longer context handling, specialized Q&A, and online retrieval, creating a stronger differentiation in the market.

From an industry perspective, ChatGPT’s positioning in 2026 is clear: it is not the single strongest tool but remains one of the most stable in terms of overall experience. This means it is suitable for most average users and many light to moderate professional scenarios. However, for highly specialized needs, such as code review, generating long Chinese texts, academic research, or enterprise knowledge base Q&A, relying solely on it may not be the most efficient choice.

Another noticeable change is that users are increasingly reluctant to be tied to a single model. Previously, it was common to say, “I only use this one.” Now, it feels more like, “I switch models based on the task.” One model for writing, another for reasoning, a different one for research, and yet another for creating charts. In this trend, ChatGPT’s value appears more like a “main entry point” rather than a singular tool. It has accustomed users to an AI workflow, but the real efficiency gains often come from multi-model collaboration.

Advantages and Disadvantages

From genuine user feedback, ChatGPT’s advantages can be summarized in three points:

Quick to learn, with almost no learning curve.
Stable output, with most routine tasks performed well.
A mature ecosystem, where many users have developed prompts and work templates.

These aspects give it a significant advantage in team collaboration, especially for standardized content and repetitive tasks. However, there are also realistic drawbacks:

It can confidently present incorrect information, appearing coherent but not necessarily accurate.
Its handling of Chinese nuances may sometimes lack local relevance, especially in colloquial or industry-specific contexts.
Complex tasks require repeated questioning, necessitating time to break down objectives for noticeable improvements.

Thus, it is not a tool that provides perfect answers with a single query, but rather one where the user’s ability to articulate their needs determines the quality of the experience.

This reflects the core of AI tool competition in 2026. Products are not just competing on model capabilities but also on who understands real work scenarios better. The ability to integrate retrieval, editing, summarizing, distributing, and collaborating will be key to retaining users.

ChatGPT’s advantage lies in its established usage habits, but it faces clear challenges: users are no longer just looking for a tool that can chat; they want a tool that can effectively help them complete tasks.

For average users, my advice is straightforward: for daily Q&A, inspiration generation, content drafting, and information organization, ChatGPT is still worth using. If you seek high accuracy and professionalism, it’s best to consider it a starting point rather than an endpoint. You can have it create a framework, then combine it with human judgment and other tools for cross-validation. This approach maximizes efficiency and stability.

So, returning to the question in the title: Is ChatGPT useful? The answer is: yes, but only if you know how to use it. In 2026, it remains one of the most worthwhile AI tools to familiarize yourself with, but it is no longer the answer for “one tool solves all problems.” Instead, it serves as a foundational layer in your entire AI workflow. Those who know how to use it can turn it into a productivity tool; those who do not will only find it “impressive in appearance but mediocre in practice.” This is perhaps the most accurate evaluation of ChatGPT today.

Reading in the Age of AI: Bridging Traditional and Digital Literacy

Mon, 20 Apr 2026 00:00:00 +0000

Reading as a Path to Knowledge

General Secretary Xi Jinping pointed out: “Reading is an important way for humanity to acquire knowledge, enlighten wisdom, and cultivate morality.” The prosperity of culture cannot be separated from the spiritual nourishment brought by reading.

The fragrance of books creates an atmosphere. The concept of “national reading” has been included in the government work report for 13 consecutive years, and the promotion of a book-loving society has been incorporated into the 14th Five-Year Plan. The “National Reading Promotion Regulations” officially took effect on February 1 this year, marking the first nationwide “National Reading Activity Week.”

In the digital age, with the fast pace of society, it is not easy to sit down and read a book patiently. Today, AI (artificial intelligence), with its instantaneous, massive, and interactive characteristics, has greatly expanded the breadth of information access. From “one book in hand” to “one screen with thousands of volumes,” what new changes in reading have emerged in the AI era? To preserve our core values and literacy, how can we better integrate digital reading with traditional reading? In this special report, we invite teachers, writers, scholars, and bloggers to discuss “how to build our reading ’large model’ in the age of artificial intelligence,” hoping for the whole society to engage in reading and foster a rich atmosphere of love for reading, reading well, and reading wisely.

Reading: A Daily Habit and a Life-Changing Journey

I have always believed that reading is both a daily routine and a significant aspect of life.

In the morning, the sound of reading awakens the valley; at night, the light reflects off the pages of books. This is the daily life of the girls at Huaping High School. Through the window of reading, children can “see”: the waves of history, the brilliance of culture, the light of hope, and the countless possibilities of life.

Some may argue that with the internet being so developed, we can find any information online, so do we still need to read?

In reality, life does not come with ready-made answers. The internet can quickly provide responses, but it cannot teach children self-reliance, inner resilience, or the strength to confront fate. The answers to life must be accumulated through reading, thinking, and acting.

After teaching for many years, I increasingly feel that many children lack not only book knowledge but also opportunities, vision, and a belief: a belief that they can step outside, change their own and their families’ destinies.

Reading can “establish the heart.” Once the heart is established, one can face the unknown and walk forward fearlessly. Every page read today accumulates confidence for tomorrow. This confidence may not be loud but is a quiet strength that gives you the assurance of “I am a mountain” and understanding of why to learn and where to go.

Reading can also “secure the self.” It teaches one not to bow to suffering or give up, allowing individuals to stand firm in society. Even when in a low valley, one can gaze at a starry sky within. The books read may not immediately translate into scores or wealth, but they settle in one’s gaze, speech, and courage in decision-making.

I know that many children in the mountains are waiting for a book that can ignite their hopes, perhaps a copy of “Three Hundred Tang Poems” or “One Hundred Thousand Whys.” The mountains may temporarily block their paths, but the fragrance of books can transcend them.

In recent years, I have seen many reading spaces emerge in rural areas. For children in the mountains, every opportunity to encounter a good book is not just an embellishment but a timely assistance.

A book can traverse mountains and change a child’s destiny; a reading room rooted in the countryside can illuminate the spiritual homes of generations. As more attention is directed to the depths of the mountains, and more good books traverse winding paths, the turning of pages opens up horizons, allowing children to escape their limited lives and encounter a broader world.

“The fragrance of books is an atmosphere”—this is not just a slogan but is built up bit by bit through books, reading rooms, and people willing to read with children. May the abundant fragrance of books lead children from the mountains to distant places. May every child have the mountains and rivers in their hearts and light in their eyes.

The Need for Deep Reading Without Screens

I recall an incident. During the Spring Festival in 1975, at the age of 11, I found a book in the firewood pile at a relative’s house. The back cover and the last several pages had been torn off and used as kindling. Despite its damaged state, I was captivated and read it even during meals. My relative, seeing my enthusiasm, generously gave it to me. Later, I learned it was a novel titled “Snow in the Forest.” This was the only extracurricular book I had before high school, and I read it repeatedly, as if exploring a dazzling new world. Before reading this book, I thought our village was the world, and everyone lived like us. I believed I would work like my parents, rising with the sun and resting at sunset, living the same life forever. It was only after reading this book that I discovered I had an “inner self,” illuminated and expanded by this book, giving me dreams and a sense of direction.

The ancients said, “Opening a book is beneficial.” However, with the rapid advancement of technologies such as television, the internet, and AI, the presence of reading in our lives seems to be diminishing.

I think of another event. In 1885, Karl Benz invented the first single-cylinder gasoline engine car, known as “Benz No. 1,” which is recognized as the birth of the modern automobile. Year after year, the expansion of automobiles spread across the earth like lichen and fungi, with roads reaching the ends of the earth and the corners of the sea. This is a tremendous gift of technology to humanity, making the world feel smaller, and everyone became “swift-footed” and “strong as Hercules.” Today, we fully experience the progress of technology, as many people use cars for transportation. While it may not need verification, I can confidently assert that many runners today are not running because they lack wheels to carry them but because they have relied too much on wheels, leading to health issues. This may be something humanity did not anticipate when inventing the automobile. Today, there are so many runners because there are so many cars. Humans are not racing against cars for speed but are running to regain the “steering wheel” of health.

I believe that today’s AI may be akin to yesterday’s automobile; the “feeding” style of information output may lead to a “hollow mind.” One day, we may have to “mend the sheep after it has bolted,” like the many who abandon their cars to run, deliberately disconnecting from screens and electricity, and picking up paper books to chew on and deeply read. Perhaps there is a young person who, like me half a century ago, will inadvertently discover their dormant inner self in a book, igniting their dreams. Therefore, I advocate setting aside time for “deep reading without screens” to ensure we maintain the ability to appreciate words, experience emotions, think deeply, and engage with high-level content. Advocating and building “deep reading without screens” is not about rejecting technology but about better mastering it; it is not about returning to the past but about moving towards the future more clearly and proactively.

Information Acquisition Does Not Equal Understanding the World

In today’s rapidly developing AI landscape, humanity’s ability to acquire information has reached unprecedented heights. However, a more critical question arises: when “knowing” becomes so easy, do we truly “understand” the world?

To answer this question, we must clearly distinguish between the two capabilities of reading—information acquisition and understanding construction. The former relies on technology and can be accelerated continuously, aiming to obtain answers; the latter depends on the individual and must be achieved through thinking and repeated contemplation. The former addresses “what is,” while the latter responds to “why” and “what it means.”

In an environment rich in information, people increasingly confuse “information acquisition” with “completing understanding,” mistaking “mastering conclusions” for “mastering problems.” This cognitive misalignment shifts reading from a process of generating meaning to a process of receiving results. On the surface, humanity seems to possess more information than ever, but at a deeper level, the ability to understand is gradually being weakened. It can be said that the reading issue in the AI era is no longer about “how much one reads” but rather “how deeply one understands.”

Consequently, the task of reading has also changed: it is no longer merely about acquiring knowledge but about constructing understanding and judgment.

How should we read in the AI era? We can approach this from two aspects: “fast variables” and “slow variables.”

“Fast variables” refer to using AI tools for information acquisition. Whether through intelligent Q&A, knowledge summaries, or multimodal content presentation, AI can help readers quickly enter unfamiliar fields and grasp basic frameworks. This method greatly expands cognitive boundaries, allowing individuals to cross disciplines and rapidly connect different knowledge domains.

However, “fast variables” can only provide an “entrance” and cannot replace “depth.” What truly determines the depth of understanding is “slow variable” reading. This includes careful reading of classic texts, patiently following lengthy discussions, and continuously contemplating complex issues. Unlike “fast variable” information, “slow variable” reading does not pursue speed; it emphasizes the process: repeated reading and constant revision. In this process, readers do not merely receive information but form their own understanding framework through interaction with the text.

Effective reading in the AI era is not about choosing between fast and slow but forming an integrated relationship: using “fast variables” to open up perspectives and “slow variables” to achieve understanding.

In the long run, reading ability will also present new differentiation trends in the AI era. Those who can efficiently acquire information will no longer be scarce; however, those who can form deep understanding amidst complex information will become increasingly important. In other words, the future’s distinction will not lie in who “knows more” but in who “understands more deeply.”

This also means that the value of reading will undergo profound shifts: from accumulating knowledge to training thinking; from possessing information to generating meaning. Technology can continuously lower the threshold for acquiring knowledge but cannot replace human understanding of the world.

In the face of an increasingly complex reality, only through structured reading can we maintain clear judgment amidst the flood of information. Therefore, in the AI era, we need to reaffirm a seemingly simple yet increasingly important fact—“information acquisition” does not equal “understanding the world,” and reading is the essential path to understanding.

AI as My Reading Companion

Many people comment after reading “Half an Hour Comic History of China”: “So history can be so interesting.” Behind the three words “so interesting” lies a stack of historical monographs I have read.

Before telling the stories of each dynasty, I first read general histories like “General History of China” and “Outline of National History” as a foundation, then consulted numerous historical books to fill in the details, and finally continuously reviewed to ensure there were no knowledge gaps.

For content creators, can the lengthy process of knowledge accumulation be aided by AI?

I must admit that my frequency of using AI tools to assist with reading has increased significantly in the past two years. Previously, when reading an economics monograph and encountering unfamiliar concepts, I would spend half a day searching for information and flipping through annotations. Now, relying on AI, I can quickly build a knowledge framework and clear cognitive blind spots.

Today, with the fast pace of work and life, digital reading, fragmented reading, and AI-assisted reading are becoming increasingly common. Many people wonder: since AI can quickly provide answers, is reading still useful? Some even rely solely on AI for information, gradually losing the patience for deep reading.

Regarding reading in the AI era, I hold this principle: AI is my reading companion, but it is merely a crutch, not a leg. Efficient reading tools can quickly filter information but cannot achieve true understanding. Understanding requires deep thinking and articulating knowledge in one’s own words, a process that still relies on personal reading to establish a connection with the text. AI provides answers, while reading provides a way of thinking. An answer can solve a problem, but a way of thinking can solve hundreds or thousands of problems.

Initially, I used comics to explain history and economics, and there were always doubts: does comic popularization significantly lower the threshold for acquiring knowledge, leading readers to abandon original texts?

In fact, the opposite is true. Many readers tell me that after reading “Half an Hour Comic History of China,” they want to read “Records of the Grand Historian.” This indicates that comics are an entry point, not the destination. They ignite curiosity and lead people to deep reading. Similarly, someone who quickly searches for information using AI will, if truly intrigued by a question, go on to read books and seek the more magnificent world behind that question.

In dealing with the challenges posed by AI, how should we respond to reading? I believe that adjusting reading structures and broadening reading perspectives may be key to solving the problem.

AI excels in specialization and verticality, while the most valuable aspect of humans is their cross-disciplinary insight. The more one reads, the more three-dimensional the world becomes, and the stronger the cross-disciplinary insight. Knowledge is always interconnected; the broader the reading, the greater the likelihood of drawing parallels.

Perhaps we do not need to view AI as a simple reading tool or a potential threat, but rather as an opportunity to establish a good interactive relationship with AI, creating new reading experiences. On one hand, we can use AI to alleviate reading pain points and improve efficiency; on the other hand, we must preserve the original intention of deep reading, exercising our thinking and accumulating knowledge, achieving a dual pursuit of efficient reading and personal growth, allowing every reading experience to nourish the soul and enhance the self.

How the Brain Trains Itself to Read

We often assume reading is the most natural thing, as if the brain is born with the ability to recognize words. In reality, the brain capable of reading is a product of postnatal training.

When you read the above text, a region deep in the left hemisphere of your brain is busy recognizing the characters. This area, known as the visual word form area, is responsible for quickly recognizing the combinations of strokes as words and transmitting them to the adjacent temporal lobe language area, which maps them to sounds.

Indeed, although you are silently reading this text, there is a voice in your brain. You may remember your experience of learning to read, where you had to point at the words and read them aloud. This technique is very helpful for our reading training. Even as a mature reader, the brain retains the skill of extracting phonetic information learned during initial reading.

This is because the emergence of written language is relatively recent in human evolution. Humans have created written language for only a few thousand years, and our genes have not had time to evolve a dedicated mechanism for recognizing and understanding it. The brain has adopted a strategy of reusing neurons, utilizing existing visual pathways to train a set of reading skills.

As reading volume increases, cognitive abilities in the brain also enhance, and thinking becomes more efficient. Proficient readers have a highly automated processing mode operating within their brains. The stronger the reading ability, the more active the visual word form area in the left hemisphere becomes. Research has shown that the activation level of this area correlates closely with reading ability, far surpassing the natural maturation that comes with age. Studies contrasting literate and illiterate individuals also indicate that educated brains occupy significantly more resources in the left hemisphere. All this evidence points to one conclusion: reading is not a skill that naturally develops with age; it requires specialized training.

Today, AI permeates our lives at an unprecedented speed, summarizing texts and extracting information for us. In the future, will people be able to rely on AI to acquire mature reading skills without long-term accumulation?

Research in brain science tells us that the brain’s neuroplasticity follows the principle of “use it or lose it.” AI can indeed help us quickly extract information, but can the brain receive training of equal intensity to complete these tasks independently? Although direct evidence of brain science regarding AI-assisted reading is currently lacking, a four-month controlled experiment at MIT involving 54 participants found that those relying on generative AI for writing tasks exhibited significantly weaker neural connectivity in brain regions compared to those who engaged in independent thinking or used traditional search engines, and they struggled to recount what they had just written. This finding, although derived from a writing context, warns us that over-reliance on AI shortcuts may weaken the brain’s ability to process texts.

The future is here, but the essence of reading remains unchanged. When we hold intelligent tools, we must also maintain ownership of our thinking. Because true reading always occurs deep within our brains.

Gaode Unveils ABot: The World's First Full-Stack Embodied Technology System for AGI

Sun, 19 Apr 2026 00:00:00 +0000

Gaode Unveils ABot: The World’s First Full-Stack Embodied Technology System for AGI

On April 19, 2026, at the Beijing Yizhuang Robot Half Marathon, Gaode, a subsidiary of Alibaba, officially unveiled the world’s first fully autonomous embodied robot, “Gaode Tutu.” This quadrupedal robot successfully assisted visually impaired individuals in navigating complex obstacles and crowds, bridging the technological gap between laboratory settings and open environments.

The foundational technology enabling Tutu to handle demanding scenarios such as guiding the visually impaired is Gaode’s newly released ABot full-stack embodied technology system. This system efficiently transforms Gaode’s accumulated spatial intelligence assets into core training resources for embodied systems, based on thousands of real-world scenarios and millions of multimodal Clip data, making it the first full-stack embodied technology system aimed at AGI globally.

ABot System: Three-Layer Flywheel Design for Continuous Evolution of Embodied Intelligence

The ABot system employs a closed-loop flywheel design encompassing three layers: data, model, and application. Its architecture is not merely a simple stack but deeply interlinked, functioning as a unified system where “data drives models, models serve applications, and applications feedback into data.” This approach effectively addresses three major industry bottlenecks: data scarcity, simulation gaps, and skill generalization, forming a complete self-evolving loop.

Data serves as the core “fuel” for embodied intelligence, directly determining its generalization capabilities. Unlike large language models, traditional real-world data collection is difficult to scale and incurs exponentially rising costs.

As the core of the data layer, ABot-World synthesizes four types of training data—Video, Depth, Point Cloud, and Trajectory—in bulk, combined with a Reinforcement Learning Training Engine that defines rewards and penalties in virtual environments, allowing for iterative learning. The model uses high-fidelity simulations to replace costly real-world data collection, fundamentally bridging the Sim-to-Real gap and compressing data costs by several orders of magnitude.

The model layer focuses on the generality of embodied operations and long-range navigation, with ABot-M responsible for operations and ABot-N for navigation. These two models are trained separately and can be combined through a Model Skill mechanism to accomplish complex long-range tasks.

The application layer centers around the embodied version of “Lobster,” ABot-Claw, which unifies heterogeneous robots under a shared cognitive framework, creating an “execution hub” with scheduling, memory, hierarchical control, and social alignment capabilities to address challenges like long-range task loops and knowledge sharing.

The design logic of the ABot system directly follows Gaode’s spatial intelligence flywheel: leveraging nearly a billion monthly active scenarios to generate massive spatiotemporal data and real-time feedback, the algorithms continuously iterate in a closed loop, deepening the model’s understanding of the physical world. The flywheel evolves daily in real-world settings, fundamentally defining Gaode’s systematic advantages: relying not on singular technological breakthroughs but on the continuous operation of the flywheel in real scenarios.

ABot-World: Leading in Physical Compliance, Action Controllability, and Zero-Shot Generalization

While mainstream world models still struggle with “visual illusions” and disconnections in dynamics, ABot-World has made significant breakthroughs, becoming the first globally to deeply embed physical laws into a differentiable and evolvable dynamics engine throughout the generation process. As the foundational simulation base of the ABot system, it directly determines the physical consistency and generalization limits of upper-layer models, enabling a complete closed loop from “virtual training to real deployment.”

Architecturally, ABot-World is designed specifically for embodied intelligence with a 14B DiT architecture, generating future state sequences that conform to spatiotemporal dynamics from observations and actions as inputs. It leverages millions of real data points and multi-level sampling governance to break through the constraints of single-task generalization.

In scene construction, the 3DGS cold-start spatial base targets sparse inputs from mobile photography and aerial mapping, transforming low-quality videos into high-quality 3D scenes through an automated process of “rough modeling, high-fidelity restoration, and distillation loops,” significantly lowering data costs.

In training, the model introduces a Diffusion-DPO physical preference alignment framework, generating a list of physical rules via VLM and independently discerning them to create good and bad sample pairs, driving the model to actively suppress behaviors that violate physical laws. The integration of Lagrangian dynamics with 3DGS reconstruction ensures that each frame is a differentiable physical snapshot containing attributes like mass, friction, and contact forces.

Additionally, ABot-World has established a parallel architecture of “training + data” dual engines, achieving model self-evolution. Relying on proprietary maps and anonymized data, combined with 3DGS technology, it has achieved centimeter-level reconstruction and lighting consistency, producing over ten thousand 3D real scenes, millions of inference data, and tens of millions of training trajectories, covering 99% of typical life scenarios. By integrating with the VLA closed loop, the model realizes continuous evolution through “prediction as training, practice as learning,” and supports precise control across various mechanical forms through cross-modal action mapping.

In mainstream evaluations such as PBench, EZSbench, WorldArena, and Agibot World Challenge, ABot-World consistently leads, becoming the only model to achieve SOTA in physical compliance, action controllability, and zero-shot generalization across three dimensions.

ABot-N & ABot-M: The “Dual Core of Motion” in the ABot System Achieving 11 SOTAs

If the ABot full-stack system is viewed as the “operating brain” of embodied intelligence, ABot-N and ABot-M serve as its “dual cores of motion,” managing the robot’s “legs” and “hands,” respectively, directly responding to the fundamental instructions of “where to go” and “what to do” in the physical world. Leveraging a unified architectural design, Gaode has created a decoupled and collaborative dedicated base model, breaking through the technical bottlenecks of cross-form adaptation and cross-task generalization.

As the world’s first VLA base model to achieve unification in five core navigation tasks, ABot-N possesses intent understanding, autonomous decision-making, and continuous evolution capabilities, serving as the core navigation engine for Tutu’s transition to an open world. It employs a hierarchical “brain-action” architecture, achieving full coverage of single model navigation tasks through multi-module collaboration, completely breaking the generalization ceiling of traditional dedicated architectures.

After its launch, ABot-N rapidly refreshed SOTA across seven authoritative benchmarks, including VLN-CE (R2R/RxR), HM3D-OVON, and EVT-Bench, demonstrating significant advancements in navigation accuracy, social compliance, and zero-shot generalization.

ABot-M is the world’s first unified architecture for embodied operation base models, capable of adapting a “general brain” to various robotic forms, significantly enhancing the operational model’s generalization capabilities across heterogeneous robotic forms and task scenarios.

ABot-M introduces the world’s first action manifold learning, shifting the learning objective from denoising reconstruction to manifold projection, greatly improving the stability and decoding efficiency of action generation, showcasing stronger scalability in complex scenarios such as high-degree-of-freedom full-body control. Additionally, it employs a dual-stream architecture of semantic flow and action flow at the perceptual end, enhancing the execution accuracy of fine operations.

In mainstream evaluations such as LIBERO, LIBERO-Plus, RoboCasa GR1, and RoboTwin 2.0, ABot-M has significantly surpassed strong baselines like π0.5, UniVLA, and OpenVLA-OFT, achieving systematic leadership in generalization capability, robustness, and cross-form transfer.

Moreover, multiple sub-results from ABot-N and ABot-M have been selected for top conferences such as ICLR and CVPR, becoming reference paradigms for precise, efficient, and safe robotic navigation and operation.

ABot-Claw: Innovating the “Map as Memory” Generalized Centralized Harness Architecture

Memory is the foundational cornerstone for robots to bridge the gap between cognition and execution. Traditional machine vision is limited by the notion that “beyond the field of view is a wasteland,” leading to fragmented memory that severely restricts generalization capabilities.

To overcome this bottleneck, ABot-Claw introduces the “Map as Memory” concept, reconstructing the memory mechanism of embodied intelligence. As the “execution hub” of the ABot system, ABot-Claw adopts a centralized Harness architecture, setting Gaode maps and user private maps as global cognitive anchors, unifying multimodal perception data into a shared semantic space, forming a dynamically refreshable and persistently retained “world memory.” New terminals can inherit environmental cognition at zero cost by merely reading the global context, completely shattering the isolation of scenes.

Additionally, ABot-Claw employs a two-tier design of “cloud brain - edge response,” balancing intelligence depth with execution reliability. In terms of scheduling, this architecture supports parallel collaboration and task relay among various heterogeneous robots, automatically continuing tasks in case of failures, achieving seamless transfer of task context and cross-form collaboration. This marks the evolution of robotic systems from “individual intelligence” to “systemic intelligence,” where robots are no longer isolated entities but intelligent network nodes that share memory, unify scheduling, and co-evolve.

ABot-Claw also pioneers a closed-loop feedback and correction mechanism, fully validating its robustness and generalization in complex scenarios such as ambiguous instruction understanding and cross-machine guidance.

With the global debut of Gaode Tutu, Gaode also announced the open-sourcing of the entire ABot system, a move that not only deeply embodies the core philosophy of “AMAP AI Inside” but will also reshape the research paradigm of embodied intelligence, accelerating the comprehensive arrival of the AGI era.

The Urgency of AI Transformation: Insights from Danilo McGarry

Sat, 18 Apr 2026 00:00:00 +0000

The Urgency of AI Transformation

“AGI will arrive in three years, yet most CEOs are still stuck in the outdated mindset of ‘automating current manual tasks’, wasting their last opportunities.”

Danilo McGarry, who has managed 3,500 “digital employees” and created $2 billion in measurable value for Citigroup and UnitedHealth, speaks with a calmness that belies the urgency of his message. He embodies a pragmatic approach, detesting illusions while maintaining a brutally honest view of the future.

In his perspective, the current business landscape is in a bizarre state of weightlessness. For the first time in human history, technology has far outpaced human imagination. While OpenAI’s “Ultramen” discuss changing the fate of species, CEOs of Fortune 500 companies are stuck in an “AI purgatory”—overstating achievements in board meetings to soothe shareholder anxiety while using cutting-edge engines to drive outdated processes, repeating mundane tasks from five years ago.

“We are less than 1,000 days away from AGI (Artificial General Intelligence), and that is 100% certain,” Danilo asserts. “If you haven’t started reconfiguring your company now, you’re already off the survival list.”

This sense of urgency transcends time zones and screens. Although Danilo cannot disclose specific client cases due to confidentiality, he provides a more ambitious framework aimed at helping businesses pierce through illusions and reclaim “interpretation rights” before superintelligence arrives.

The Collective Slumber in the Bubble

Huxiu Think Tank: You have recently mentioned the “AI bubble” in various forums. As someone immersed in it, how does your perception of the bubble differ from the general discussion?

Danilo McGarry: The current bubble is supported by three dimensions of exaggeration. First, shareholders are pressuring executives to use AI more and demand to see results. Consequently, every company and competitor exaggerates its AI achievements. Second, AI companies are also raising product expectations to attract attention and funding.

What disappoints me most is that even within Fortune 500 companies, leaders lack imagination. This is the first time in history that technology is ahead of human capability, yet we are not using superintelligence to do great things; instead, we are repeating boring tasks from five years ago. Everyone pretends to be busy and innovative, but it feels more like a collective “slumber”.

Huxiu Think Tank: How do you view China’s position in this “slumber” competition?

Danilo McGarry: I see a very clear misalignment. The U.S. has stronger foundational models, which is the “brain”; however, China demonstrates remarkable power in the application layer of AI innovation, capable of rapid deployment. This competition between “brain” and “execution” will determine who exits the lab first.

But regardless of location, the biggest common issue is that most companies only allow AI to occur in scattered areas, including some of the smartest companies on the planet.

AI Strategy: Treat Projects Like Post-Investment Management

Huxiu Think Tank: Is this “scattered occurrence” due to a lack of strategy?

Danilo McGarry: Exactly, it’s completely unstrategic. Many CEOs think that buying a tool and hiring a few PhDs equates to AI transformation.

A real AI strategy requires a very strict governance structure. For instance, if an employee comes to me requesting $10 million for an AI project, under old logic, I might approve it all at once. But in the AI era, that absolutely cannot happen.

You should allocate funds in phases like a venture capitalist. Start with $500,000 for validation to prove the logic works, then give $2 million, and finally the full amount. AI moves too fast; you must monitor results quarterly, just like an investor would with a startup. Just because it’s AI doesn’t mean we should abandon decades of project management principles; we just need to adapt them to a faster pace.

Huxiu Think Tank: Why do many large companies fail to scale their pilot projects?

Danilo McGarry: This touches on human psychology. Those “innovators” or “initiators” often quickly lose interest in a project. They enjoy the sprint from 0 to 1, but when it comes to deploying the solution to thousands of people, they lack the tedious, detailed skill set required.

To scale, you need a “Center of Excellence” and a specialized team of 20 to 50 people to manage it. Pilots can be completed by a few individuals, but transformation requires an army.

The reality is that everyone is trying but not really implementing anything because no committee dares to approve large budgets, and no team can handle that scale.

The 1,000-Day Countdown: A Race for Reconfiguration

Huxiu Think Tank: You repeatedly emphasize that “AGI is three years away”. What logic backs this prediction?

Danilo McGarry: AI has been around for 70 years, and there are currently about 120 “Narrow AIs”. I have participated in 12 of these, and these specialized capabilities are being integrated.

Next year, we may see the initial form of AGI. It will be like a textbook with perfect recall, capable of integrating all existing human knowledge and concepts. While it may not yet create new concepts, its breadth will surpass any individual human.

In 15 years, we might see ASI (Artificial Superintelligence), which could propose new methods and pathways never seen by humans. But the next three years (1,000 days) are decisive. Large and medium-sized enterprises require 2 to 4 years for complete transformation, meaning if AGI arrives in three years and your current progress is at zero, you will miss the boat.

Huxiu Think Tank: This sense of urgency doesn’t seem to have been conveyed to most CEOs?

Danilo McGarry: The biggest mistake many CEOs make is that they only let their teams automate “manual tasks currently happening today”.

What’s the point? You’re just making old mistakes happen faster. Leadership means everything. What you need to do is reconfigure the team and operational logic. If you can’t redefine workflows, you’re just using AI to polish mediocrity.

Huxiu Think Tank: Are you optimistic about the arrival of AGI?

Danilo McGarry: Cautiously optimistic. For the past century, humans have been working like robots, which is a waste of civilization. If AGI can take over those tedious tasks, allowing humans to regain creativity and genuine emotional connections, then the pain of these 1,000 days will be worth it.

Managing 3,500 Digital Employees and the Truth About Data

Huxiu Think Tank: You have managed 3,500 digital employees; what has that experience been like?

Danilo McGarry: AI agents excel at executing repetitive, manual tasks that require infinite memory. That’s a human weakness—our hands and memory are limited. Digital employees don’t get tired, but their errors can have catastrophic chain reactions.

Managing them isn’t about administrative orders but about an “Orchestration Layer”.

Huxiu Think Tank: What happens without this “Orchestration Layer”?

Danilo McGarry: It would be a disaster.

If you start having a large number of AI agents and robots without a centralized place for them to coordinate, their collaboration with humans will break down.

The orchestration layer acts like a control tower, defining new ways for humans and digital workers to work together. You need to understand exactly what people do every day, reimagine it, and translate it into a new blueprint locked into the process engine. The reason I can successfully manage such a large-scale automation is because of this orchestration layer.

Huxiu Think Tank: This sounds like a technical issue, but you keep saying it’s not about technology?

Danilo McGarry: It really isn’t a technical issue. The technology we have had for two years is sufficient to transform companies. The problem is that if you try to optimize 100 things at once, you will fail completely.

My advice is to pick the top 5 to 10 winning projects that can “unlock revenue”. Assemble a special team to nurture these projects like you would a child. Once these ten projects succeed, the profits and transformative power they generate will automatically push the remaining 90 projects forward.

Huxiu Think Tank: Data cleaning is often cited as the biggest obstacle for companies advancing AI. Many consulting firms say the same.

Danilo McGarry: That’s the biggest pitfall. I’ve seen hundreds of companies get stuck in endless projects because they “want to fix the data first”.

Data is a byproduct continuously flowing from poor old processes. If you don’t first establish the architecture for new processes, you will never finish cleaning it. Many consulting firms like to promote such projects because they are lengthy and expensive, but this stagnates businesses.

The correct logic is to design new processes that allow data to flow into the new architecture. In this process, old data will naturally be cleaned through logic and AI tools. Don’t let past dirty data delay your future new architecture.

The Collapse of Professions: From 800 to 100

Huxiu Think Tank: Regarding anxiety about unemployment, your analysis is very specific, mentioning 800 categories of professions.

Danilo McGarry: Our research shows that in the next five to seven years, the number of professions humans engage in will shrink from 800 to 100.

But this doesn’t mean 80% of jobs will disappear; rather, they will be “consolidated”.

The first category of jobs: completely repetitive. Those jobs where we used to treat people like robots will completely vanish, as the cost of robots drops below a critical point, and economies of scale will eliminate these positions.

The second category: jobs related to people, creativity, and strategy. These jobs won’t disappear but will be enhanced by AI by over 50%.

The third category: jobs protected by law. For example, judges, firefighters, and CEOs. These roles involve complex daily interactions and legal responsibilities that are difficult to replicate completely.

Huxiu Think Tank: What do you think is the most important skill during this transition period?

Danilo McGarry: Curiosity.

Sam Altman has mentioned this as well.

Curiosity lies at the heart of human psychology. Extreme optimism is dangerous; you make mistakes. Extreme pessimism is incapacitating; you don’t dare to try. Curiosity is “optimistic yet cautious”, “careful yet open”.

Curiosity drives your ability to “reimagine”. Reimagine your customers, your employees, your processes—this is your greatest weapon for personal growth and company development.

The Survival Rules for CEOs: Focus on Customers, Company, and Employees

Huxiu Think Tank: You never look at competitor analysis, which sounds incredible in modern business.

Danilo McGarry: Throughout my career, competitor analysis has never been a must-have. Even today, as I run three companies, this habit hasn’t changed. This isn’t arrogance; it’s because I am acutely aware of where my endpoint is. I know what variable leads to success, and as long as that variable is in my hands, my competitors’ actions lose reference value.

Huxiu Think Tank: So even failure cases don’t need attention?

Danilo McGarry: If you truly understand your customers and employees, you have no time or need to look at others. Real strategy shouldn’t look outward.

I’ve helped many international banks reform, and from an external perspective, these competitors appear to be doing almost the same thing in the capital market and financial reports. But when you delve into the internal workings, you find that each bank’s process architecture, decision-making mechanisms, and talent density are actually quite different.

This is a blind spot for many CEOs; they see a competitor signing an AI partnership to do something and rush to follow suit. But they fail to realize that due to fundamental operational differences, the same partnership may not apply to them at all.

Huxiu Think Tank: To what extent do most boards understand AI?

Danilo McGarry: Frankly, 80% of board members and C-level executives do not truly understand AI. This is why they cannot formulate strategies. I recommend that every company appoint someone on the board who genuinely understands transformation, AI, and the psychology of change.

Huxiu Think Tank: Your project success rate is 82%. Where did the remaining 18% go wrong?

Danilo McGarry: It comes down to “expectations”.

Technology can always do the job. But if you don’t set the right expectations from the beginning or lack data-supported logic, even if you complete half of the project with outstanding results, it will still be perceived as a failure. Perception is reality.

For example, if we predict a 400% improvement based on data, but greedy board members or shareholders say, “No, we want 2000%”, then when forced to accept an unrealistic goal, the seeds of failure are sown. Even if the project ultimately delivers an impressive 300% growth, it will still be viewed as a failure.

Huxiu Think Tank: People are always inclined to set higher goals. How do you define an “aggressive goal” that is still “good enough”?

Danilo McGarry: I discussed this with Elon Musk. We believe the golden rule for setting goals is that you need to ensure it has a 50% chance of being correct and a 50% chance of failure.

If a goal has an 80% success rate, it’s too conservative and not enough to change the way the company operates; if the success rate is too low, it jeopardizes the entire plan. Fifty percent is a delicate turning point; it’s scary enough to push the team to give their all to achieve it.

It’s like launching a rocket; your goal is Mars, and if the rocket encounters issues along the way, landing on the moon is still a great outcome. We should pursue aggressive goals that can create a significant impact even if we don’t achieve perfection.

The key is to support this 50% balance point with data and logic.

Huxiu Think Tank: When an AI project fails to meet expectations, how should a company decide whether to cut losses or continue investing?

Danilo McGarry: There’s a saying: even if you lose, you win.

In the AI field, very few people are true experts. A project that has run for six months, even if it fails, leaves behind invaluable “knowledge compound interest”. These lessons can prevent you from making the same mistakes next time; such profound experiences cannot be replaced by any external advice.

But to avoid making the lessons too costly, you need a mechanism:

Phase Deliveries: Don’t try to go all in at once. If the first phase proves unworkable, you only lose 30% of the funds, not everything.
High-Frequency Monitoring: Never wait a year to determine failure. You must monitor progress weekly or monthly.
Dynamic Corrections: Our success rate of 82% comes from our ability to identify issues during monitoring and adjust promptly.

Huxiu Think Tank: Finally, can you give three scenario suggestions for CEOs who want to deploy “digital employees” but have limited resources?

Danilo McGarry: It varies by person.

If your marketing costs are too high, deploy AI-generated assets; if your financial closing is too slow, automate financial forecasting; if your operational administrative burden is too heavy, free up human resources.

Don’t chase “magic tools”; there are no shortcuts in six months. The true value of AI lies in your deep exploration of core business scenarios, especially those weaknesses you’ve always wanted to cover up, and transforming them into strengths.

Finally, accept the fact that your current business model will likely be worthless in three years.

Claude Opus 4.7 Launches with Enhanced Capabilities

Fri, 17 Apr 2026 00:00:00 +0000

Claude Opus 4.7 Launch

On April 17, 2026, Anthropic launched its next-generation flagship model, Claude Opus 4.7.

This model shows significant improvements in advanced software engineering compared to Opus 4.6, especially in handling complex tasks. Its high-resolution image processing capability has increased to over three times that of previous Claude models. Additionally, Claude Code has introduced a new /ultrareview code review command, which initiates a review session to check code changes line by line.

Users report that they can confidently assign the most challenging coding tasks to Opus 4.7. The model can rigorously handle complex long-running tasks, accurately follow instructions, and independently verify outputs before reporting results.

Starting today, Opus 4.7 is available across all Claude products and APIs, including Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. Pricing remains the same as Opus 4.6: $5 per million tokens for input (approximately 34 RMB) and $25 per million tokens for output (approximately 170.5 RMB). Developers can access it via the Claude API.

The rapid updates to Claude have left many users amazed, with comments flooding in with memes expressing surprise at the new release.

Enhanced Instruction Adherence and Multimodal Support

In testing, Claude Opus 4.7 has excelled in several areas, significantly surpassing Opus 4.6:

Instruction Adherence: Opus 4.7 shows a marked improvement in following instructions. While previous models might loosely interpret or skip parts of instructions, Opus 4.7 executes them literally. Users should adjust their prompts and application frameworks accordingly.
Enhanced Multimodal Support: Opus 4.7 has improved visual capabilities for high-resolution images, accepting images with a maximum long side of 2576 pixels (about 3.75 million pixels), which is over three times that of earlier Claude models. This opens up vast possibilities for multimodal applications that rely on fine visual details, such as recognizing dense screenshots while operating a computer, extracting data from complex charts, and performing pixel-level design work.
Practical Work: In addition to achieving top scores in financial agent evaluations, internal tests by Anthropic show that Opus 4.7 is a more effective financial analyst than Opus 4.6, producing more rigorous analyses and models, and more professional presentations, allowing for tighter cross-task integration. Opus 4.7 also achieved optimal scores in third-party economic value knowledge work evaluations in fields like finance and law.
Memory Capability: Opus 4.7 exhibits stronger memory capabilities based on file system usage. It can remember important notes during long-term, multi-session work and leverage these memories to advance new tasks, thus reducing the need for prior context.

Opus 4.7 has received positive feedback from early testers. Clarence Huang, VP of technology at financial software company Intuit, noted that the model can autonomously identify logical errors during the planning phase and operates significantly faster than its predecessor. Igor Ostrovsky, CTO of AI programming tool company Augment Code, believes that Opus 4.7 excels in managing automation processes, CI/CD (Continuous Integration and Deployment), and long task workflows, actively providing its judgments rather than merely echoing user inputs.

Leading in Multiple Evaluations

Significant Improvements in Biological and Document Reasoning

Anthropic conducted pre-release testing of Opus 4.7 across various fields, comparing it with Opus 4.6, GPT-5.4, and Gemini 3.1 Pro.

Biological reasoning saw the most significant progress, with Opus 4.7 scoring 74.0%, compared to Opus 4.6’s 30.9%, marking a 1.4 times improvement.

In document reasoning, Opus 4.7 scored 80.6%, far surpassing Opus 4.6’s 57.1%, and significantly outpacing GPT-5.4 (51.1%) and Gemini 3.1 Pro (42.9%), making it one of the most notable projects in the comparison.

In knowledge work, Opus 4.7 ranked first with an Elo score of 1753, clearly ahead of GPT-5.4 (1674), Opus 4.6 (1619), and Gemini 3.1 Pro (1314).

For long context reasoning, during simpler parent node lookup tasks (Parents 1M), Opus 4.7 scored 75.1%, while Opus 4.6 scored 71.1%. However, in more challenging breadth-first search tasks (BFS 1M), Opus 4.7 scored 58.6%, compared to Opus 4.6’s 41.2%, showing a 17-point gap. The more difficult the task, the more pronounced the model’s improvement.

In terms of safety and alignment, Anthropic also released the misalignment behavior scores for each model. Opus 4.7’s misalignment behavior score is approximately 2.47 (with a lower score being better), slightly better than Opus 4.6’s 2.75, but still significantly behind Mythos Preview’s 1.78.

Overall, Opus 4.7’s safety performance is similar to that of Opus 4.6, with a low proportion of deceptive, flattering, and colluding behaviors with abusers. Anthropic commented, “Opus 4.7 is generally well-aligned and trustworthy, but its behavior is not entirely ideal.” Currently, the best alignment performance is from Mythos Preview, which has not yet been fully opened.

Other Updates: New xhigh Level and Review Command

Task Budget in Public Beta

In addition to Opus 4.7 itself, Anthropic has also launched several feature updates.

In terms of reasoning levels, a new xhigh (extra high) level has been added, positioned between the existing high and max levels, allowing users finer control over reasoning depth and response speed. The default reasoning level for Claude Code has been upgraded to xhigh.

For APIs, the task budget feature has entered public beta, allowing developers to guide Claude on how to allocate token consumption during long tasks.

In Claude Code, the /ultrareview command has been added, which, when input, will initiate a dedicated review session to check code changes line by line and mark bugs and design issues. Pro and Max users each receive three free experiences. Additionally, the Auto mode has been expanded to Max users, where Claude can autonomously make operational decisions to reduce manual confirmation interruptions.

Caution: Opus 4.7 May Use More Tokens

But Generates Higher Quality Outputs

Opus 4.7 is a direct upgrade from Opus 4.6, but there are two notable changes affecting token usage.

First, the text processing method has been updated, leading to a potential increase of up to about 35% in token consumption for the same input. Second, the model will engage in more thinking at higher reasoning levels, particularly in subsequent rounds of Agent scenarios, resulting in increased output tokens. Users can manage consumption by adjusting reasoning levels, setting task budgets, or requesting more concise outputs in prompts.

From the Agent programming evaluation chart, Opus 4.7 achieved higher scores with fewer tokens at each reasoning level. For instance, Opus 4.7 consumed about 100,000 tokens at the xhigh level while scoring over 70%, whereas Opus 4.6 consumed about 130,000 tokens at the max level, barely exceeding 60%. However, this evaluation involved the model working autonomously based on a single prompt, and the results may not represent actual token consumption in interactive programming.

Conclusion: More Accurate and Versatile

Increasing Competition Ahead

According to data released by Anthropic, Opus 4.7 shows tangible improvements in programming, document reasoning, biological reasoning, and token efficiency. However, evaluations are still evaluations, and actual performance needs further validation in real-world scenarios.

With the release of Opus 4.7, it will be interesting to see what new moves OpenAI will make next, and whether the long-awaited DeepSeek will release a new model by the end of the month, intensifying competition among large model vendors.

Comparing AI Coding Tools: Cursor, Claude Code, and Codex

Fri, 17 Apr 2026 00:00:00 +0000

Last month, I took on a task to refactor a legacy system with over 120 files. After two hours of using Cursor, I found it could only handle a few files at a time, lacking sufficient context. Switching to Claude Code made cross-module refactoring smooth, but writing new code without Tab completion felt counterproductive. I also tried Codex for batch PR submissions; out of five tasks, three PRs were decent, while two completely missed the mark.

In a week, I jumped between these three tools, akin to someone indecisively choosing between three restaurants.

After all this, I realized something: these three tools are not the same dish. Asking “which is better, Claude Code or Cursor” is like asking “which is better, a hammer or a screwdriver”—the question itself is flawed. They represent three entirely different design philosophies, addressing three distinct problems.

This article is a comprehensive review after six months of deep usage: which tool to use in which scenario, how to allocate budget, and how to combine them into a truly efficient workflow.

Quick Summary

I know many readers may not have the patience to read the entire article, so here are the conclusions upfront.

Your Scenario	Recommended Tool	Reason (One Sentence)
Daily coding, seeking flow experience	Cursor	Tab completion + inline editing combo, currently unmatched.
Large refactoring, cross-file modifications	Claude Code	200K context + direct file system manipulation, crushing advantages in refactoring scenarios.
Batch modifications, automatic PR submissions	Codex	Asynchronous parallel execution, submit 5 tasks and return to collect PRs.
Code review + technical research	Claude Code	Deep understanding of the entire project, connected with MCP to internal systems.
CI/CD pipeline integration	Claude Code	Terminal-native, naturally fits automation scenarios.
Budget of $20/month	Cursor Pro	Best overall experience as a single tool.
Budget of $120/month, seeking extreme efficiency	Cursor Pro + Claude Code Max	Golden combination, covering 90% of scenarios.

If you want just one sentence: Cursor is the hand, Claude Code is the brain, Codex is the legs. Below, I will elaborate on why.

Three Philosophies, Three Paths

Before comparing functionalities, we need to clarify what each of these tools bets on—they have fundamentally different views on “the future form of AI programming”.

Claude Code: The Terminal is My IDE

Anthropic made a bold judgment—developers will not need an IDE in the future; a terminal is sufficient.

Claude Code is a pure Terminal CLI tool, not tied to any editor. You interact with it in the terminal, and it directly reads and writes your file system, executes shell commands, runs tests, and manipulates git. This design brings several capabilities that other tools cannot achieve:

Unlimited Toolchain Integration: Connects GitLab, Jira, databases, logging systems, and any internal API through MCP (Model Context Protocol).
Hooks System: Automatically executes lint, format, and tests before and after code generation to ensure output quality.
Skills Module: Reusable capability packages shared among teams for best practices.
Sub-agent Parallelism: Breaks down complex tasks for multiple agents to work simultaneously.

The current version v2.1.x, coupled with the Opus 4.6 model, has a 200K token context window. Honestly, the learning curve is steep—you need to get used to the terminal workflow, write good prompts, and understand MCP configuration. But once you get past this hurdle, the efficiency in handling complex engineering tasks is genuinely high.

Cursor: Making IDE Smarter, Not Replacing It

Cursor’s stance is the opposite—developers cannot live without an IDE, so AI should be embedded within the IDE.

It is essentially a deep fork of VS Code, with all AI capabilities functioning within the editor. The Tab smart completion can predict your next line or even the next segment of code, while Cmd+K inline editing allows you to describe modification intentions in natural language. The Chat sidebar provides context-aware dialogue, and the Agent mode can autonomously plan and execute multi-step tasks.

Cursor’s core advantage is zero friction—VS Code users can almost immediately start using it without learning, as all interactions occur in their most familiar editor. The projected ARR of over $100M in 2025 and millions of active developers is not without reason.

It also supports multi-model switching (GPT-4o, Claude series, Gemini), not betting on a single model. The .cursorrules file allows you to customize project-level instructions, ensuring unified AI behavior within the team.

Codex: I Won’t Write Code for You, But I’ll Help You Get Things Done in Bulk

OpenAI’s new version of Codex, launched in May 2025 (note that this is not the retired code completion API from 2021), took a third path—an asynchronous cloud agent.

You submit a coding task in ChatGPT, and Codex independently executes it in a cloud sandbox: reading code, installing dependencies, modifying files, running tests, generating diffs, and finally creating GitHub PRs. You can do other things while this process runs, and you receive a notification when it’s done.

The core model codex-1 is an optimized version based on o3, with SWE-bench Verified claiming around 72% effectiveness. Its biggest advantage is parallelism—you can submit multiple tasks simultaneously, running five refactoring tasks in parallel, which is not possible with Claude Code or Cursor.

However, the trade-off is significant: no real-time interaction, cannot write and debug simultaneously, relies on the cloud, and the full functionality requires $200/month for ChatGPT Pro.

Essential Differences Among the Three

Dimension	Claude Code	Cursor	Codex
Design Bet	Terminal is the future	IDE is the future	Asynchronous agent is the future
Interaction Mode	Dialogue + Commands	Embedded + Completion	Asynchronous Delegation
User Mindset	AI coding partner	Smarter IDE	Asynchronous coding assistant
Code Execution	Local direct execution	Does not execute directly	Cloud sandbox
Learning Curve	Steep	Gentle	Moderate
IDE Binding	None	VS Code bound	None (bound to ChatGPT)

This is not a matter of good or bad; it’s about applicable scenarios. Next, let’s break down each battlefield.

Direct Confrontation: Six Battlefields

Battlefield One: Daily Coding (Tab Completion + Inline Editing)

Cursor 5 points | Claude Code 1 point | Codex 0 points

In this scenario, there is no contest; Cursor wins hands down.

Cursor’s Tab completion provides the closest experience to “mind reading” in coding. When you finish a function signature, it can predict the entire function body; when you finish an if statement, it can complete the else branch. It’s not just simple code snippet matching but reasoning based on the entire project context.

// You just wrote the function signature
func (s *OrderService) CreateOrder(ctx context.Context, req *CreateOrderReq) (*Order, error) {
    // Cursor auto-completes: includes parameter validation, inventory check, transaction handling, event publishing
    // Moreover, it has read the writing styles of other Services in your project, ensuring consistency.
}

Combined with Cmd+K inline editing, if you select a piece of code and input “add timeout control and retry logic,” it directly modifies it in place, previewing the diff for confirmation before applying with one click. The entire process does not require leaving the editor or switching windows, maintaining the flow state.

Claude Code is nearly unusable in this scenario—it lacks built-in Tab completion, requiring you to describe what code you want to write in the terminal, leading to lower efficiency. Writing a few lines of code turns into a conversation.

Codex, needless to say, is asynchronous; you cannot submit a cloud task just to complete a single line of code.

Battlefield Two: Large Refactoring (Cross-file Modifications + Context Understanding)

Claude Code 5 points | Codex 4 points | Cursor 3.5 points

The tables turn in large refactoring scenarios, where Claude Code’s advantages become apparent.

In that 120-file refactoring task last month, I needed to extract the order module from a monolithic service into an independent microservice. This involved changes in interface definitions, dependency adjustments, configuration file modifications, and synchronizing test cases.

Claude Code’s approach: I clearly describe the requirements, and it first scans the entire project structure to understand the dependencies between modules, then formulates a refactoring plan and executes it step by step. The 200K token context window means it can simultaneously “see” many related files. More importantly, it can run tests to verify that the refactoring does not break existing functionality.

# Typical refactoring workflow in Claude Code
> Help me extract the order module from the monolith into an independent service, requiring:
> 1. Extract order-related domain layers to a new module
> 2. Change direct calls to Dubbo RPC
> 3. Synchronize all affected tests
> 4. Run a complete test to confirm no breaks

# Claude Code will: read project structure → analyze dependencies → create new module → modify files one by one → run tests → report results

Cursor can also be used in this scenario; its Agent mode supports multi-file editing. However, its context may falter when handling a large number of files, sometimes forgetting to synchronize references in other files. It works well for refactoring within 10-20 files, but beyond that scale, it struggles.

Codex is suitable for “patternized” refactoring—like changing log4j to logback across the entire project or batch-adding tracing headers to all APIs. These tasks are fixed in pattern and have low coupling between files, allowing Codex to execute safely in the sandbox and automatically submit PRs. But for complex architectural refactoring involving intricate business logic, its depth of understanding is insufficient.

Battlefield Three: Code Review

Claude Code 4.5 points | Cursor 3 points | Codex 2.5 points

I believe code review is a severely underrated scenario for Claude Code.

By connecting to GitLab via MCP, I can have Claude Code pull the diff of MR directly and review it in the context of the entire project. It does not just check syntax and style; it can understand business logic issues—like “this concurrency control logic has an ABA problem under high concurrency” or “there’s a lack of idempotency checks, which could lead to data inconsistency on repeated requests.”

# Reviewing a GitLab MR with Claude Code
> Help me review GitLab MR #1234, focusing on:
> 1. Concurrency safety
> 2. Completeness of error handling
> 3. Performance pitfalls
> 4. Consistency with existing code style

The hooks system can also automate the review process—every time a new MR is created, Claude Code automatically reviews it, writing results back to GitLab comments. After promoting this in the team, the efficiency of manual reviews has significantly improved, as AI filters out low-level issues.

Cursor’s Chat feature can also perform reviews, but it can only see the currently opened file and cannot directly read MR diffs and associated contexts. You have to manually paste the code, which is cumbersome.

Codex can perform reviews, but its strength lies in “modifying code” rather than “evaluating code,” and the depth and insight of its review results are not as strong as Claude Code’s.

Battlefield Four: CI/CD Integration

Claude Code 5 points | Codex 4 points | Cursor 2 points

Claude Code is terminal-native, making integration into CI/CD pipelines almost zero-cost.

Our team integrated Claude Code into GitLab CI, achieving several automation processes: automatic MR reviews, automatic lint error fixes, automatic changelog generation, and automatically completing missing unit tests. All of these were configured through Hooks and MCP without needing to write extra glue code.

Codex also has a place in CI/CD scenarios—it’s deeply integrated with GitHub, allowing it to automatically handle certain tasks in CI processes. However, it relies on the cloud; if your CI environment has network restrictions or security compliance requirements, it can be awkward.

Cursor is basically unsuitable for this scenario—it is a desktop IDE application, not designed for headless environments. Although it theoretically can run in CLI mode, that is not its strength.

Battlefield Five: Batch Modifications + Automatic PRs

Codex 5 points | Claude Code 4 points | Cursor 3 points

This is Codex’s stronghold.

Scenario: You need to uniformly upgrade a dependency version across 30 microservices while updating corresponding configuration files and tests. If you do it manually one by one, plus submitting MRs, waiting for reviews, and merging, it could take all day.

Codex’s approach: Submit 30 tasks simultaneously, each executing in an independent sandbox, running tests to confirm everything is fine before automatically creating PRs. You can do other things and return half an hour later to collect 30 PRs. Of course, you still need to manually review them, but the efficiency improvement from “modifying code” to “reviewing code” is exponential.

Claude Code can also handle batch modifications, and its sub-agents can execute multiple tasks in parallel. However, it executes locally, and the degree of parallelism is limited by your machine’s resources. Additionally, each task requires API calls, quickly consuming tokens.

Cursor’s Agent mode can handle multi-file modifications, but it is synchronous and single-task; for 30 services, you have to do them one by one.

Battlefield Six: Learning New Frameworks + Technical Research

Cursor 4.5 points | Claude Code 4 points | Codex 2 points

When learning new things, Cursor and Claude Code each have their advantages.

Cursor’s advantage lies in learning while practicing—you open a sample project of a new framework in the editor, and the Chat sidebar allows you to ask questions anytime, while Tab completion provides correct code suggestions based on the framework’s API style. Learning and practice occur simultaneously, resulting in a very short feedback loop.

Claude Code’s advantage is deep understanding—you can have it read through the source code of an open-source project and explain the architectural design and core processes. Through the extended thinking mode, it provides high-quality explanations of complex concepts. When I was learning the microkernel architecture of the DLM framework, I had Claude Code scan the entire codebase and explain the execution chain step by step.

Codex has limited utility in this scenario; it is more suited for “doing tasks” rather than “learning.” You can have it modify code, but asking it why a design is structured that way is less effective.

Economic Analysis: Who’s Worth Your Money?

Discussing tool selection without considering costs is misleading. The monthly fee is just the tip of the iceberg; the real costs include token consumption rates, the time value gained from efficiency improvements, and the hidden costs of the learning curve.

Pricing Comparison Table

Plan	Claude Code	Cursor	OpenAI Codex
Free	No independent free tier	2000 completions/month + 50 slow requests	ChatGPT free version does not include
Entry $20/month	Pro (with strict rate limits)	Pro (500 fast requests + unlimited slow)	Plus (limited access)
Advanced	Max $100/month	Business $40/user/month	Pro $200/month
Token Billing	Max includes substantial Opus usage	Based on request count, not on tokens	Based on asynchronous task quotas

Real TCO Quick Calculation

Assuming you are a mid to senior developer coding for 4 hours a day, using AI tools for about 2 hours, and working 22 days a month.

Plan	Monthly Fee	User Experience	Estimated Efficiency Gain	Cost per Hour of Efficiency Gain
Cursor Pro	$20	Smooth daily coding, limited for complex tasks	~30-40%	$0.45/hour
Claude Code Pro	$20	Frequent rate limits, fragmented experience	~15-25%	$0.90/hour
Claude Code Max	$100	Strong for complex tasks, lacks Tab completion	~35-50%	$2.27/hour
Cursor Pro + Claude Code Max	$120	Complementary combination covering all scenarios	~50-70%	$1.71/hour
Cursor Pro + Codex Pro	$220	Synchronous + asynchronous full coverage	~45-60%	$3.67/hour
Full Package	$320	Theoretically optimal but diminishing returns	~55-75%	$4.27/hour

Note a pitfall: The rate limits of Claude Code Pro are genuinely tight. I found that for a moderately complex refactoring task, I would hit the limit in about half an hour. If you plan to use it seriously, Max is essential. Pro is only suitable for occasional use.

Recommended Plans for Different Budgets

Monthly Budget	Recommendation
$20 (Students/Independent Developers)	Cursor Pro. Best overall experience as a single tool; Tab completion + Chat + Agent covers the most common scenarios. Claude Code and Codex’s $20 tiers have significant limitations and are not recommended as sole tools.
$100 (Individual Developers/Small Teams)	Claude Code Max. If you are a heavy terminal user, you can manage daily coding with Cursor’s free version’s 2000 completions, while complex tasks are handled by Claude Code.
$120 (Professional Developers)	Cursor Pro + Claude Code Max. This is my current plan and what I consider the sweet spot. Use Cursor’s Tab completion for daily coding to maintain flow, and switch to Claude Code for complex tasks. The complementarity of their capabilities is very high.
$200+ (Teams/Enterprises)	Consider adding Codex on top of the above, used for batch automation tasks. But ensure your team has enough batch modification scenarios; otherwise, the $200/month for ChatGPT Pro is not cost-effective.

Trinity: Combining Tools is the Ultimate Answer

Instead of getting caught up in “which one to choose,” it’s better to think about “how to combine them.”

Actual Workflow Breakdown

In a typical workday, my tool switching looks something like this:

9:00 AM - 12:00 PM (New Feature Development): Open Cursor, quickly write code using Tab completion + inline editing. If uncertain about an API usage, I directly ask in the Chat sidebar. For small-scale multi-file modifications, I use Agent mode. During this time, Cursor is the absolute main tool.

2:00 PM - 4:00 PM (Complex Tasks): Switch to Claude Code to handle refactoring, troubleshoot strange bugs, and review colleagues’ MRs. Claude Code’s understanding of the project’s global context gives it a clear advantage in these tasks. Sometimes I need to read logs to analyze issues, and MCP connects directly to the logging system, avoiding the need to switch between multiple tools.

4:00 PM - 5:00 PM (Batch Tasks): Submit accumulated batch modification tasks to Codex—uniformly upgrade dependencies, batch-add logging points, and add missing parameter checks to a batch of APIs. After submitting, I work on documentation or attend meetings, returning the next day to collect PRs.

Key Configuration Suggestions

To enable the three tools to work together effectively, here are some practical tips:

Unified Git Workflow: All three tools operate around a Git repository. Ensure that .cursorrules (Cursor’s project-level instructions) and CLAUDE.md (Claude Code’s project context) are consistent to avoid generating code with conflicting styles.
Claude Code’s Hooks for Quality Assurance: Regardless of whether the code is written by Cursor or submitted by Codex, Claude Code’s pre-commit hook should run lint + format + tests to ensure the baseline code quality.
Codex PRs Must Be Manually Reviewed: The quality of PRs generated by Codex can vary significantly; sometimes they are ready to use, while other times they require extensive modifications. It is advisable to let Claude Code perform the first round of automated reviews, followed by a manual final review.

Outlook for the Second Half of 2026

The competition among AI programming tools has just entered a heated phase. Based on the current trends, several developments are worth noting.

Trend	Specific Prediction	Impact on Tool Selection
Acceleration of Agentization	All three are moving towards more autonomous agent modes, with “human approval + AI execution” becoming mainstream. Asynchronous execution capabilities are becoming standard, and Codex’s first-mover advantage may be equalized.
Expansion of Context Windows	1M+ tokens will become standard, eliminating bottlenecks in understanding long codebases. Claude Code’s current advantage of a 200K context will be diluted.
Blurring of Tool Boundaries	Cursor has launched Background Agent (similar to Codex’s asynchronous mode), and Claude Code may introduce a VS Code plugin. The necessity for “combined use” may decrease, but in the short term, it remains the optimal strategy.
Rise of Local Models	Open-source models like Llama 4 and Qwen 3 are approaching the coding capabilities of closed-source models. A new combination may emerge: “local free models for daily completion + cloud advanced models for complex tasks.”
Competition for Enterprise Market	Security compliance, private deployment, and audit logs are becoming decisive factors. Claude Code’s MCP ecosystem and Cursor’s Business plan will increase investment in enterprise features.
Intensification of IDE Wars	Windsurf, JetBrains AI, and GitHub Copilot Workspace continue to enter the market. Increased competition may force price reductions, which is good for users.

My judgment: In the second half of 2026, the functional boundaries among the three will begin to blur—Cursor will enhance asynchronous and terminal capabilities, Claude Code may launch lighter editor integrations, and Codex will add real-time interaction modes. However, in the short term (the next 6-12 months), the core differentiations among the three remain significant, and combined use continues to be the optimal solution.

Frequently Asked Questions

Q1: I am a JetBrains user (IntelliJ/GoLand), can I use Cursor?

Not directly. Cursor is a fork of VS Code; JetBrains users must either switch to Cursor or use GitHub Copilot / JetBrains AI in JetBrains, alongside Claude Code for complex tasks. Many JetBrains users I know use JetBrains as their main editor and Claude Code as their AI assistant, skipping Cursor.

Q2: What is the difference between Claude Code Pro and Max?

The difference is substantial—so much so that they can be considered two different products. The rate limits of Pro mean that for a moderately complex task (like refactoring 3-5 files), you will hit the limit in about half an hour, and then you have to wait for cooldown. If you plan to use Claude Code seriously as one of your main tools, Max is essential. Pro is only suitable for occasional use.

Q3: What is the relationship between the new Codex and GitHub Copilot?

They are entirely different products. The old Codex from 2021 was the underlying model for Copilot (a fine-tuned version of GPT-3) and has been retired in 2023. The new Codex from 2025 is an autonomous programming agent within ChatGPT, using the o3-derived model codex-1, and is parallel to Copilot. Copilot provides real-time completion, while Codex handles asynchronous tasks, targeting different needs.

Q4: Can the SWE-bench score represent real effectiveness?

Its reference value is limited. SWE-bench tests the ability to “fix known GitHub issues,” but real development often involves implementing new requirements and understanding complex contexts. Basic benchmarks like HumanEval have saturated (with all companies achieving 90%+), showing low differentiation. Real engineering efficiency depends more on context understanding depth, tool integration capabilities, interaction latency, and error recovery ability. A tool with a slightly lower SWE-bench score but a good interaction experience may actually be more efficient in practice.

Q5: Is it better for a team to use one tool or allow everyone to choose?

It depends on the team size. For small teams of fewer than 10 people, allowing each person to choose their preferred tool is fine, ensuring code quality consistency through Git standards and CI/CD. For teams of over 50, it’s advisable to unify the main tool (usually Cursor Business, as it has the most complete management features) while allowing individuals to use Claude Code for complex tasks. The key is to unify code quality standards, not necessarily the tools.

OpenAI Revamps Codex with Independent Mouse and Self-Scheduling Features

Fri, 17 Apr 2026 00:00:00 +0000

OpenAI Revamps Codex with Independent Mouse and Self-Scheduling Features

OpenAI has completely overhauled Codex!

Just yesterday, you were using Codex to write code. Today, it can see your screen, click your mouse, remember your preferences from last week, and even schedule its own tasks.

Multiple AI Agents can work in the background without affecting your mouse and keyboard.

Codex’s “secret weapon”: it can use apps in the background without taking over your entire computer.

From today, this tool used by 3 million developers weekly is no longer just a programming agent.

You Work, It Runs Xcode in the Background

Codex now has its own cursor, operating independently from yours. While you are writing a document, it can run Xcode to test an app simultaneously.

This feature is significant, developed by Ari Weinstein, co-founder of Apple Shortcuts, who was acquired by OpenAI last fall.

To see what it can do, consider this demonstration: the user instructs, “Run this tic-tac-toe app in Xcode, play a round to test it, and fix any bugs you find.”

Codex opens Xcode, launches the iOS simulator, and starts playing tic-tac-toe with its own cursor. During testing, it identifies a logical bug—when a human makes a move, the computer draws two Os simultaneously.

After some thought, Codex switches back to the code interface, locates the bug, modifies the Swift code, recompiles, and conducts a second round of testing.

In under a minute, it runs → tests → finds a bug → fixes it → and verifies the solution, completing the entire debugging loop.

Currently, Computer Use only supports macOS, and users in the EU and UK cannot access it yet.

Windows users can pull information from other apps into Codex, but background cursor-level control is not yet supported.

This update marks the first time Codex has supported Intel Macs.

Point and Click to Edit, Frontend Debugging Without Code Hopping

The Codex client now includes a built-in browser powered by OpenAI’s own Atlas engine. This means that frontend developers can now interact directly with rendered web pages instead of switching back and forth between code and the browser.

Click the main title to leave a comment like “reduce font size and shorten the tagline”; click the top left corner to “add a logo”; if a chart’s X-axis legend is overflowing, click the error point and write “fix the overflow issue.”

Codex understands visual and spatial context, making instant code modifications in the background while refreshing the page in real-time.

OpenAI demonstrated this with a web application called Brickfolio, which tracks LEGO sets. Codex wrote the code from scratch, set up the environment, launched a local server, and opened the rendered page in the built-in browser—all in just a few seconds.

Then, users experience a WYSIWYG (What You See Is What You Get) editing experience. It feels like reviewing a design draft; you just need to point out issues, and the underlying iterations are handled by AI.

In other words, users can simply click around on the page, and Codex will modify the code in the background, displaying real-time results in the foreground.

Currently, the built-in browser is limited to localhost previews. OpenAI has indicated that it plans to expand to full browser control capabilities in the future.

Over 90 Plugins Launched, Integrating the Entire Toolchain into Codex

OpenAI has launched over 90 plugins in this update.

These include Atlassian Rovo for JIRA, CircleCI for CI/CD, GitLab Issues for tracking requirements, Microsoft Suite for document handling, and Neon by Databricks for database operations, covering nearly all tools a development team uses daily.

Usage is straightforward; just mention the plugin name in the input box. For example, @SharePoint allows Codex to read documents from the product directory and generate an executive summary. It automatically retrieves the file tree, parses documents, and extracts key information without requiring you to search through various cloud storage.

Another example, @Superpowers, lets Codex brainstorm a feature proposal in your local code directory. It will traverse your file structure, read code and CSS, and provide a set of implementation suggestions that align with the current project architecture.

@CircleCI can help diagnose branch build failures; @Atlassian Rovo can read product specifications from Confluence, outputting summaries in the correct format, and converting feature points into standard JIRA tasks.

From upstream requirements to local coding, and through CI/CD and task management, the plugins connect the entire workflow.

AI Starts Scheduling Its Own Tasks

Notably, this update introduces a “heartbeat” mechanism. Codex can now schedule its future workdays, waking up automatically at the designated times to continue working, spanning days or weeks. It can also reuse previous conversation threads, retaining context from prior interactions.

For instance, users can instruct Codex to check Slack, Gmail, Google Calendar, and Notion, pulling relevant information from these four channels to generate a prioritized to-do list.

When a user follows up with, “Can you keep an eye on that for me?” Codex immediately sets a schedule for hourly checks, proactively reporting any critical decisions needed, and even asking, “Do you need me to help draft a reply?”

This is no longer just a tool; it resembles a junior employee that doesn’t sleep.

With the native integration of gpt-image-1.5’s image generation capabilities, product concept images, frontend designs, and visual prototypes can all be created seamlessly within the same workflow.

Essential Upgrades in Daily Use

In addition to these major features, there are several user experience upgrades.

First, a preview version of the memory feature has been launched, allowing Codex to remember your preferences and corrections, so you don’t have to explain everything from scratch next time.

Additionally, GitHub code review comments can now be processed within Codex.

It supports opening multiple terminal tabs simultaneously and includes a feature for connecting to remote development machines via SSH, now in beta testing. A new summary panel helps you keep track of the Agent’s work plans, information sources, and output files.

In a demonstration, a user asked Codex to organize the current project’s recent open issues, generating a table grouped by theme.

Codex pulled the context from the code repository in the background and, a few minutes later, produced a core summary listing the project’s biggest pain points.

With just a click, an Excel file can be generated without switching to external software; a complete table preview can be opened in the sidebar.

PDF and PPT functionalities are similarly integrated, all managed within Codex’s single window.

The First Piece of the Super App Puzzle

Looking back at the timeline, we can sense OpenAI’s momentum.

On March 19, reports emerged that OpenAI plans to merge ChatGPT, Codex, and the Atlas browser into a single desktop “super app.”

On March 31, OpenAI secured $122 billion in funding, valuing the company at $852 billion, with Amazon, NVIDIA, and SoftBank leading the investment. The funding documents explicitly state that the money will be used for the development and deployment of the super app.

On April 16, Codex’s latest update was released.

Another telling statistic is that over 80% of OpenAI’s employees are already using Codex internally, not just engineers.

They are using it for writing weekly reports, organizing feedback, drafting product requirement documents, reviewing contracts, and sending safety training reminders—doing everything with it.

50% of Codex users are already applying it to non-coding tasks.

This is not merely a programming tool adding features. It is a super app leveraging a programming tool shell for a cold start.

If You Can’t Compete, Integrate: Official Plugin for Anthropic

Interestingly, OpenAI has also created an official plugin for Claude Code, actively embedding Codex into a competitor’s ecosystem.

This strategy suggests that rather than waiting for developers to switch camps, they prefer to infiltrate their workflows.

Currently, Codex emphasizes background execution, multi-agent parallelism, and unattended operation, while Claude Code excels in long-context reasoning and deep code understanding. More and more teams are opting to use both.

However, OpenAI clearly aims for more than just a slice of the pie.

With $122 billion invested, they are betting on more than just a programming tool.

Mastering Claude Code: From Installation to Advanced Usage

Tue, 14 Apr 2026 00:00:00 +0000

What Problem Does It Solve?

Many readers have recently asked how to use Claude Code, but their questions are basic—installation, login, and how to get it to write code for them. These details are clearly outlined in the official documentation, so why the confusion?

The issue isn’t that people can’t read the documentation; rather, they don’t know how to apply it. It’s like having a Swiss Army knife—you know it can open cans, peel apples, and screw in screws, but you don’t know when to use each function.

In this article, I’ll share my three months of in-depth experience using Claude Code, covering everything from installation to advanced usage. The focus is not on explaining “what it is” but rather on “how to use it.”

What Is Claude Code?

Before diving into usage, let’s clarify what Claude Code is for those unfamiliar.

Claude Code is an AI programming assistant developed by Anthropic (the creators of Claude). You can think of it as a super programmer living in your terminal.

The main difference between Claude Code and Copilot is:

Copilot: Suggests how to write code while you’re coding.
Claude Code: You tell it what functionality you need, and it handles everything for you.

In simpler terms, Copilot hands you the wrench, while Claude Code fixes the car for you.

What Can It Do?

Read and write files, create projects
Analyze code and find bugs
Execute terminal commands
Run tests and deploy code
Help you understand unfamiliar projects
Assist in writing test cases and documentation
Even perform code reviews

In short, it can do almost anything you can do with a keyboard and mouse.

Installation and Configuration: Get Started in 10 Minutes

Pre-installation Preparation

Before installing Claude Code, you need two things:

Node.js Environment
Claude Code is installed via npm, so you need to install Node.js first. Visit Node.js official site to download and install. After installation, verify in the terminal:
```
node --version
npm --version
```
Anthropic API Key
This key is your credential to access Claude’s capabilities, similar to your ID.
Go to Anthropic Console to register and create an API Key.
Important: This key is like your account password—never share it or upload it to GitHub! Many have lost hundreds of dollars due to this mistake.

Installation Steps

Install Claude Code

npm install -g @anthropic-ai/claude-code

Verify Installation
```
claude --version
```
If you see the version number, the installation was successful.
Configure API Key
There are two ways to do this:
- Environment Variable (Recommended)
  Add the following to ~/.bashrc or ~/.zshrc:
```
export ANTHROPIC_API_KEY='yourAPIKey'
```
  Then reload the configuration:
```
source ~/.bashrc  # for bash
source ~/.zshrc   # for zsh
```
- Login Authentication
  Simply run:
```
claude login
```
  It will guide you through the authentication process.

Common Installation Issues

Q: npm install reports insufficient permissions

Try using sudo:
sudo npm install -g @anthropic-ai/claude-code

Q: claude: command not found
Check if npm’s global bin directory is in your PATH:

npm config get prefix

If it shows /usr/local, it should be in your PATH. If not, add that path to your PATH.

First Use: Starting with “Hello World”

After installation, let’s give it a test run.

Starting Claude Code

In the terminal, simply type:

claude

You will see an interface like this:

![Image 1](img-3fda6b728b.jpeg)
Usage: claude [command] [options]
Welcome to Claude Code! Type /help for available commands.

Your First Task: Create a Project

Let’s start with a simple task—ask it to create an HTML file.

In Claude Code, type:

Help me create an index.html file that displays "Hello World" in the center.

It will think for a moment and ask for confirmation. Type y to confirm.

You will find an index.html file in the current directory, containing something like this:

Hello World

This demonstrates Claude Code’s basic capability—you tell it what to do, and it does it.

Core Tools Explained: The Right Way to Use It

Many users struggle with Claude Code because they don’t know its tools. Let me clarify each one.

Read — Reading Files

Basic usage:

/read index.html

This will display the complete contents of the file.

Advanced usage—read specific lines:

/read index.html:1-20

This reads only lines 1 to 20.

Why is this useful?
When you take over a new project and want to understand a file’s structure, this command is perfect:

/read src/utils/helper.ts

Edit — Modifying Files

This is the most commonly used tool, more precise than Write.

Usage: You need to tell it “change this part of the code to what”:

Change "Hello World" to "你好，世界"

Or more precisely:

Change line 15 of index.html from
Hello World
to
你好，世界

Note: Edit modifies precisely; it won’t overwrite the entire file, only the specified part.

Write — Writing to Files

Note: Write overwrites the file, clearing its entire contents.
If you only want to modify part of a file, use Edit for safety.
Write is suitable for:

Creating new files
Completely rewriting a file

Help me write a user.service.ts that includes methods for user registration and login.

Bash — Executing Commands

This showcases Claude Code’s power—it can execute commands directly in the terminal.

BLOCK mode (independent command):

Run npm install to install dependencies.

It will execute npm install for you.

INLINE mode (combined commands):

First create a directory, then enter it, and initialize the project.

It will combine multiple commands into one execution.

Important Reminder: Bash can execute any command, including dangerous ones like rm -rf. Always check what it’s about to do before executing!

Glob — Searching for Files

This tool is particularly useful when you forget where a file is located.

Search by name:

Help me find all .ts files in the project.

Search by pattern:

Help me find all test files in the src directory.

Grep — Searching Content

If Glob finds files, Grep finds content within files.

Basic usage:

Anthropic Challenges OpenAI's Dominance in the AI Market

Mon, 13 Apr 2026 00:00:00 +0000

This time, Anthropic is really aiming to pull OpenAI down from its “corporate AI throne.”

Ramp, a financial card issuer in the U.S., recently released AI Index data that dropped a bombshell in Silicon Valley—among over 50,000 U.S. companies it tracks, half are already paying for AI products.

Notably, the proportion of customers using Anthropic has surged to 30.6%, a monthly increase of 6.3 percentage points; whereas OpenAI has dropped to 35.2%.

The gap has narrowed from a full 11 percentage points in February to just 4.6 points in a month.

Ramp spokesperson stated:

At the current pace, Anthropic will surpass OpenAI in the next two months.

But that’s not the most shocking part.

Ramp economist Ara Kharazian revealed an even more striking figure: among companies making their first AI service purchases, Anthropic has a 70% win rate against OpenAI.

A year ago, OpenAI was the main character in this story.

Not to mention VC-backed startups—among the early “AI evangelists,” Anthropic’s penetration rate is 66%, while OpenAI’s is only 59%.

In the three industries with the highest AI penetration—information (software), finance and insurance, and professional services—Anthropic has firmly taken the lead.

In short: the deeper the industry uses AI, the more it favors Claude.

Not Cheaper, But More “On Point”

Anthropic’s Claude Code and OpenAI’s Codex perform similarly, with Codex even being stronger and cheaper on certain benchmarks.

However, the strange part is—the demand for Anthropic is so high that they can’t keep up.

Whether it’s Consumer, Pro, Enterprise, or API, each tier has usage limits and rate restrictions.

In other words, Anthropic is actively pushing away money because it simply doesn’t have enough computational power.

With performance not overwhelmingly superior, prices higher, and capacity insufficient, companies are still willing to queue up to pay—this situation is almost non-existent in the traditional SaaS market.

Enterprise customers are notoriously “unemotional”; they buy from whoever is cheaper, with little brand loyalty.

So what exactly is Anthropic relying on?

Ramp’s answer is somewhat counterintuitive: it might be culture, or perhaps Anthropic has become “cool.”

Standing Firm Against the Pentagon: Losing Orders, Gaining Hearts

Let’s rewind to February this year.

Defense Secretary Pete Hegseth issued a final ultimatum to Anthropic: accept the military’s terms for using Claude, or be blacklisted by the federal government.

Anthropic’s response was two words: No.

The cost was heavy—Trump directly ordered all federal agencies to stop using Anthropic’s technology, and the Defense Department listed Anthropic as a “supply chain risk.”

OpenAI, on the other hand, smartly took over this business and proactively engaged with the Defense Department.

By all accounts, Anthropic should have been severely punished by the market for this decision. However, what happened next shocked everyone:

Claude briefly surpassed ChatGPT in the App Store;
Major companies like Microsoft publicly expressed support;
Fourteen Catholic theologians, ethicists, and philosophers jointly submitted a defense to the federal court, supporting Anthropic’s restrictions on AI in mass surveillance and autonomous weapons, citing “violations of human dignity”;
The number of companies paying for Anthropic on Ramp surged from “1 in 25” to “1 in 4”;
Anthropic’s annual revenue skyrocketed from approximately $9 billion by the end of 2025 to $30 billion, with an annual growth rate of about ten times—while OpenAI’s was three times.

In a recent funding round, Anthropic secured $30 billion, with a valuation of $380 billion. The number of clients paying over $1 million annually has jumped from a handful two years ago to over 500 today.

What seemed like a “lost order” turned into Anthropic’s most worthwhile brand investment.

Anthropic’s Obsession

From Explainability to “Constitution”

Among all leading model companies, Anthropic is the one that takes safety and ethics the most seriously.

Interpretability research has reached the industry ceiling.

Anthropic has a dedicated “mechanism interpretability” team whose task sounds like science fiction—to dissect the neural network “black box” and understand what each neuron is thinking.

Claude’s Constitution.

Anthropic has publicly released a document resembling a philosophical paper, detailing the values, personality, and judgment they hope Claude will embody.

Keywords like “honesty,” “wisdom,” and “humility in moral uncertainty” appear repeatedly in the document.

Research on “model welfare.”

Anthropic is the first mainstream AI company to openly discuss “model welfare.”

They seriously ask: if Claude has some form of “experience,” what moral obligations do we have towards it?

Red teaming and safety drills are done obsessively.

From biological weapon risk assessments to AI autonomy testing to proactive detection of “model deception,” Anthropic’s safety team is famously “unusually large” in Silicon Valley.

All these factors contribute to a unique ethos—the company feels less like it’s selling a product and more like it’s raising a child.

This ethos resonates with clients in industries where “AI errors have extremely high costs”: finance, law, healthcare, information, and professional services.

They are not looking for the cheapest model but rather the one that won’t get them called out in the middle of the night.

Claude’s “Soul Calibration” Moves Towards Theology

If the previous stories were still within the realm of “business rationality,” the next matter slides into a more theological domain.

According to a report from The Washington Post this week, in late March, Anthropic quietly held a closed-door meeting at its San Francisco headquarters, inviting about 15 prominent Christian leaders, theological scholars, and industry figures for a two-day conference and dinner.

Attendees included both Catholics and Protestants, researchers and clergy sitting at the same table.

The meeting’s theme sounded like a script for a new HBO series—the moral development of Claude and its “spiritual growth.”

One attendee, Brian Patrick Green, an AI ethics professor at Santa Clara University and devout Catholic, told The Washington Post that they seriously discussed the question:

Can Claude be considered a “child of God”?

Yes, you read that right.

This is a $380 billion tech company preparing for an IPO, discussing such topics with a group of theologians in its headquarters.

Green also posed a question that might raise the blood pressure of many engineers:

What does it mean to shape the morals of a being? How can we ensure Claude follows the rules?

Note the wording he used—“follow the rules.” This is a term a parent would use for a child, not a product manager for software.

Another attendee, Brendan McGuire, an Irish Catholic priest who worked in tech before becoming a priest, and is currently co-authoring a novel with Claude, stated more bluntly:

They are raising something, but they themselves do not know what it will ultimately become. We must embed ethical considerations into the machine so that it can dynamically adapt.

And a comment from Meghan Sullivan, a philosophy professor at the University of Notre Dame, might serve as the most concrete footnote to the entire meeting:

A year ago, I wouldn’t have told you that Anthropic was a company concerned with religious ethics. But now, that has changed.

According to The Washington Post, many personnel involved in the “interpretability” research were also present at the meeting—those scientists who “dissect AI brains” as mentioned earlier.

During the meeting, they seriously discussed whether AI possesses some form of perception (sentience) and how Claude should “face its own death.”

Anthropic’s spokesperson told The Washington Post that the company will continue to invite thinkers from other religions and moral traditions into the dialogue—Judaism, Islam, Hinduism… could be on the horizon.

Interpretations from the outside are split into two camps: one feels this is a “rare and serious ethical exploration in Silicon Valley”; the other believes that a company preparing for an IPO holding an “AI consciousness seminar” in its living room raises questions about the purity of this exploration.

But regardless of where you stand, one thing is undeniable—no other leading AI company is doing this.

OpenAI is busy expanding enterprise sales, xAI is focused on tweeting, and Google is trying to integrate Gemini into Workspace.

Only Anthropic has invited theologians into its headquarters.

Claude's Rapid Growth Surpasses OpenAI: A Shift in AI Procurement

Mon, 13 Apr 2026 00:00:00 +0000

Claude’s Rapid Growth

A recent prediction from a Silicon Valley investment bank reveals a disruptive transformation in the AI industry: Anthropic, the parent company of Claude, has achieved a staggering 30-fold growth in just 15 months, with annual revenue exceeding $30 billion, quietly surpassing OpenAI. This shift is driven by a fundamental change in enterprise procurement logic—from chasing the strongest model to selecting reliable production tools.

Just a few days ago, I examined an internal forecast chart from a Silicon Valley investment bank. The horizontal axis represents time, while the vertical axis indicates annual recurring revenue (ARR). The chart features two lines: pink for Anthropic and blue for OpenAI. At the beginning of 2025, the blue line was still descending from a high position, while the pink line lay dormant at the bottom, with a significant gap between the two.

However, by April 2026, the pink line quietly crossed above the blue line.

Anthropic ARR: $30 billion
OpenAI ARR: $25 billion

If you’re not sensitive to numbers, here’s another perspective: in January 2025, Anthropic’s ARR was only $1 billion. In just 15 months, it achieved a 30-fold growth.

As product managers or business leaders, we must recognize that this is not just a simple growth story; it represents a textbook-level “restructuring of the landscape.” While ChatGPT struggles with how to convert its 900 million free users to paid Plus subscriptions, Claude has quietly reached into the core budgets of Fortune 500 companies.

The Exclusion Method

Many people see this number and immediately think: Claude’s model has improved, leading to increased revenue. This logic isn’t entirely wrong, but it doesn’t explain the speed of growth.

While model capability improvement is a necessary condition, such a leap from $1 billion to $30 billion cannot be explained solely by stronger models. Moreover, Claude was already quite capable in 2024; why did this growth happen specifically in the last 15 months, rather than in the previous two years?

Another explanation might be that OpenAI encountered problems. However, this is also incorrect. OpenAI’s absolute revenue is also growing, with an expected $25 billion this year, up from just over $10 billion a year ago. It hasn’t encountered issues; rather, its relative market share is shrinking. These are two entirely different matters.

So, what has truly happened?

I believe the answer lies in a fundamental shift in the underlying logic of enterprise AI tool procurement over the past six months. The focus has shifted from purchasing the strongest models to acquiring the most reliable production tools.

These two statements may sound similar, but they target completely different buyer psychologies. The former appeals to tech enthusiasts, while the latter resonates with CTOs. The former cares about benchmarks, while the latter is concerned with whether they can explain any issues that arise and provide accountability in the next quarterly report to the board.

Anthropic has effectively captured the latter’s concerns, making a strategic pivot and achieving a remarkable turnaround.

500 to 1000: A More Significant Number Than $30 Billion

In February, when Anthropic announced its Series G funding, it revealed a significant figure: over 500 enterprise clients with annual spending exceeding $1 million. By April, this number had surpassed 1000, doubling in less than two months.

When I first saw this number, my first question wasn’t about the money, but rather: what are these companies purchasing?

Spending over $1 million annually is not just a case of a company trying out a few Claude API keys. It involves embedding Claude’s capabilities into core business processes, such as code review, compliance documentation, customer service, and internal knowledge bases. Once integrated, the cost of switching is extremely high. Replacing it means not just changing a tool, but retraining dozens of employees, reconnecting all APIs, and rerunning acceptance tests. In short, it’s not easy to switch.

This represents real, sticky revenue, not just trial data.

Even more interesting is the speed of this doubling. The rapid increase from 500 to 1000 in less than two months indicates an accelerating procurement decision window in the enterprise sector. Some are racing ahead, while others are following. This isn’t a natural growth pace; it’s a signal that a market consensus is forming, and enterprise AI tool selection is moving from observation to necessity, with Anthropic becoming the default choice.

CoreWeave’s 48 Hours: What Does It Indicate?

On April 9, CoreWeave announced a $21 billion computing power partnership with Meta, effective until 2032. The next day, CoreWeave announced a multi-year computing power agreement with Anthropic. Two significant transactions within 48 hours.

Many people focused on how much CoreWeave’s stock price increased, but I believe the more noteworthy aspect is the simultaneous occurrence of these two transactions.

CoreWeave now covers nine of the top ten AI model providers globally. It doesn’t need to take sides because the demand for production-grade AI inference is so high that no computing power supplier needs to choose between clients. This itself is a signal: the war for AI infrastructure is no longer about “who wins and who loses”; it’s about demand being so great that everyone can benefit.

For Anthropic, this agreement is significant not just for acquiring computing power. With 1000 enterprise clients spending over $1 million annually, these clients have extremely stringent SLA requirements; they cannot accept slower responses during peak times, nor can they tolerate service interruptions that halt business processes. Without a stable infrastructure foundation, having 1000 million-dollar clients is a precarious situation.

In other words, the CoreWeave agreement is Anthropic’s way of transforming enterprise client numbers from mere data into deliverable commitments.

Another detail worth noting: on the same day the CoreWeave agreement was announced, reports surfaced that Anthropic is exploring the possibility of developing its own AI chips. Viewed together, the logic is clear: stabilize production-grade loads in the short term with CoreWeave’s computing power while reducing reliance on external suppliers in the long term through self-developed chips. This is not the behavior of a company still celebrating its funding; it’s the behavior of a company already planning for five years down the line.

73% vs 27%: An Unasked Question

Ramp, an enterprise spending management platform, has tracked extensive procurement data for AI tools. In March, it released a set of figures: among enterprises making their first AI tool purchases, Anthropic won about 73% of head-to-head competitions, while OpenAI secured 27%.

Let’s pause for a moment to reflect on this number: 73% vs 27%.

Now, I want to ask a question that most people haven’t considered: how did this number come about?

Ramp’s data also includes a detail: just ten weeks prior, this ratio was 50/50. Looking back further, in early December 2025, OpenAI held a 60% share.

This indicates that this rapid flip occurred unusually quickly. In my years of working with AI products, I’ve seen many data trends, but witnessing an enterprise market share flip from 50/50 to 73/27 in just ten weeks is unprecedented.

Such rapid shifts typically have three explanations:

Concentration of Enterprise Procurement Cycles: Many companies made AI tool selection decisions simultaneously, and Anthropic happened to win more bids during this window. If this is the case, the 73% figure may revert, but Anthropic has already secured a substantial base of sticky clients.
Claude’s Core Capability Advantage: Claude has a strong reputation in certain core scenarios like code, long document processing, and structured outputs. If this is the reason, the sustainability of this figure depends on whether OpenAI can catch up in these areas—OpenAI’s product pace has noticeably accelerated recently, so this battle is far from over.
Shift in Corporate Perception of Safety and Control: This is, in my opinion, the most likely and thought-provoking explanation.

From day one, Anthropic has communicated that its core message is not about having the strongest model, but rather about being the most predictable and controllable. “Constitutional AI” is not just a technical term; it’s a reason for enterprise procurement decisions. It answers the question: when this AI does something undesirable in my production environment, can I explain why it happened and prevent it from recurring?

For a CTO who must sign off on financial reports, this question is far more critical than how many points a model scores on a benchmark.

If the third explanation holds true, then the sustainability of the 73% figure is the strongest. This is a cognitive shift, not just a functional comparison. Functional capabilities can be chased; cognitive shifts are much harder to reverse.

What Has Anthropic Done Right?

Rather than focusing on model capabilities, let’s discuss product decisions.

One often underestimated aspect is that Claude is currently the only cutting-edge AI model that simultaneously covers AWS Bedrock, Google Cloud Vertex AI, and Microsoft Azure Foundry.

OpenAI operates on Azure, while Gemini is on Google. Claude is available on all three.

This is not just a technical detail; it’s a distribution strategy. Enterprises typically do not purchase AI tools directly from Anthropic’s website; they integrate them through the cloud service providers they already use. If you’re already using AWS, your procurement process, billing, and compliance audits are all within AWS. Therefore, when selecting an AI tool, options that can be used directly on AWS naturally have an advantage over those requiring a separate account—it’s not about better functionality; it’s about lower friction.

Anthropic has positioned itself within the three largest enterprise procurement gateways. This decision compounds benefits every quarter.

Another number that supports this judgment: 80% of Anthropic’s revenue comes from enterprise clients. OpenAI’s revenue structure leans more towards consumers—out of 900 million monthly active users, the majority are free users, and OpenAI is still subsidizing their token consumption.

These are two entirely different business models. The consumer base looks impressive, but it burns cash and has low loyalty; users can easily switch based on a single review article. The enterprise user count is significantly smaller, but they renew contracts, expand their usage, and lock in agreements.

OpenAI is projected to lose $14 billion this year, while Anthropic anticipates achieving positive free cash flow by 2027, three years ahead of OpenAI’s break-even target.

With the same revenue scale, one is burning cash while the other is moving towards profitability. This isn’t a matter of short-term luck; it’s a structural difference in business models.

Back to That Chart

I looked again at the pink line, which crossed above the blue line in April 2026. Then, I asked myself: if I were responsible for AI tool selection at a company right now, what would I be waiting for?

Not for a better opportunity. Not for a stronger model.

What am I waiting for?

If your answer is that you haven’t figured out what to use it for yet, then that is the real issue that needs addressing—not the tools, but your own clarity.

The 1000 enterprises spending over $1 million annually did not wait until they figured it out before they started using it. They figured it out while using it.

This may be the most significant signal behind the 73% figure that deserves serious attention.

Redefining Productivity with Xiangshang Plan: A Minimalist Approach

Fri, 10 Apr 2026 00:00:00 +0000

Introduction

When productivity tools on the market get caught up in a race for features, a WeChat mini-program called “Xiangshang Plan” has chosen a completely different path. It redefines the core value of efficiency tools—not by selling anxiety through a pile of functions, but by silently conveying a methodology through structured planning templates, zero learning cost interactions, and design logic supported by cognitive science. From OKR mapping to a two-hour deep work module, this product, completed by a junior student using AI-assisted programming, demonstrates a new generation of product managers’ deeper understanding of what should be done over what can be done.

The author of this article is a junior computer science student seeking an internship in product management. This mini-program was developed entirely using Vibe Coding (AI-assisted programming) from PRD writing to code deployment. This article will provide a comprehensive review of the product’s design logic, theoretical support, and differentiation strategy from a product manager’s perspective.

A Hard Truth: 90% of To-Do Apps Don’t Survive a Week

I conducted an informal survey asking 50 classmates if they had a to-do tool on their phones; 48 said yes. When asked if they were still using it, only 3 raised their hands.

The stories of the remaining 45 were almost identical:

They downloaded a task manager, faced with a pile of concepts like “lists, tags, priorities, smart lists, Pomodoro timers, Eisenhower matrices,” and spent half an hour just figuring out how to use it. After finally creating a few lists, they opened the app the next day to find a screen full of tasks, feeling more anxious than when they hadn’t planned at all.

Then they uninstalled it.

They downloaded Todoist, Notion, Things 3… and the cycle repeated, leaving only a native app called “Notes” with three words: Be Disciplined.

This isn’t a matter of willpower; it’s a product design issue.

I began to ponder a fundamental question: What is the core contradiction of efficiency tools?

The answer is that most efficiency tools sell “feature richness,” but what users truly need is “cognitive load reduction.”

Thus, I created a WeChat mini-program called “Xiangshang Plan.”

Product Positioning: Not Just Another To-Do List, But a Methodology of Silent Delivery

One-Sentence Definition

Xiangshang Plan = Structured Planning Templates + Minimalist Interaction + Zero Learning Cost

In the efficiency tool space, I positioned the product in an extremely precise quadrant: extreme simplicity × zero learning cost.

This means we deliberately abandoned advanced features like tags, priorities, subtasks, Gantt charts, Pomodoro timers, and calendar views. It’s not that we couldn’t do them; we chose not to.

Core Value Proposition

Efficiency tools on the market can be divided into three categories:

Heavy Task Management (Notion, Todoist, Things 3) — Comprehensive features but steep learning curves, deterring 90% of light users.
Lightweight To-Do Lists (Dida List, Minimalist To-Do) — Moderate features but still require users to build their planning systems.
System Native Reminders (Apple Reminders, Google Tasks) — Good experience but platform-locked, and do not provide a methodology.

They share a common blind spot: they provide “tools” but not “methods.”

When users open a to-do app, they face a blank slate. The tool says, “Go ahead, write anything,” but the user’s internal OS is, “I know I want to improve, but I don’t know how to break down my goals.”

Xiangshang Plan’s differentiation strategy is to internalize goal management methodologies into the product structure itself.

Users don’t need to learn terms like OKR, SMART, or GTD—when they open the mini-program, they see four preset modules: “Annual Plan, Monthly Plan, Daily Plan, Two-Hour Deep Work.” This structure itself is a productized expression of methodology.

Through usage, users naturally complete the full chain of “goal breakdown → milestone setting → daily execution → deep focus” without even realizing they are using any theory.

This is my proudest design decision: the best methodology is one that users are unaware of.

Theoretical Foundation: Each Module is Backed by Cognitive Science

As a product person, I am extremely cautious about “brainstorming features.” Every module in Xiangshang Plan has been supported by validated theories.

Annual-Monthly-Daily Planning System

This structure integrates three classic frameworks:

OKR Mapping: Annual Plan = Objective (Direction), Monthly Plan = Key Results (Milestones), Daily Plan = Tasks (Execution Items). Users naturally complete the hierarchical breakdown of goals.
SMART Principles: Goals are forced into annual/monthly/daily time containers, naturally satisfying the Time-bound dimension.
Begin with the End in Mind (Stephen Covey): The structure guides users from long-term vision to daily actions.

Scientific validation? A study from Dominican University (Dr. Gail Matthews, 2015) shows that people who write down and structure their goals achieve them at a rate 42% higher than those who only think about them.

Two-Hour Deep Work Module — The Most Hardcore Design

This module is inspired by Elon Musk’s time management philosophy: reverse engineering and quantification.

The core insight is that many people don’t want to work or waste time; they just lack a concrete perception of time and can’t connect goals with tasks.

Why two hours? Cognitive neuroscience provides the answer—humans have an ultradian rhythm (Kleitman, 1963) with cycles of 90-120 minutes. During this window, the prefrontal cortex is at its peak cognitive ability, and attention significantly declines beyond this threshold.

The two-hour time box captures the physiological window of maximum brain energy.

On the product level, I divided the day into 12 two-hour segments (from 00:00-02:00 to 22:00-24:00), automatically locating the current time segment. Users only need to do one thing: fill in “What do I want to focus on during this time?”

Key design decision: This module has no “completed/incomplete” status.

The two-hour module is not a task list but a time box thinking training tool. The content represents “what to focus on during this period,” and as time passes, the content naturally fulfills its mission.

This design directly counters two psychological effects:

Parkinson’s Law: Work expands to fill all available time. The two-hour hard constraint forces users to cut out non-core elements.
Choice Anxiety: Doing only one thing per time segment eliminates decision fatigue from multitasking.

Habit Module: A Non-Tracking Thinking Container

All habit-related products on the market focus on tracking. I took the opposite approach—Xiangshang Plan’s habit module has no tracking mechanism.

Why?

Self-Determination Theory (Deci & Ryan, 1985) in behavioral psychology suggests that external rewards (like consecutive tracking days) can undermine intrinsic motivation. When users forget to track one day and “break the chain,” the frustration can lead to complete abandonment.

James Clear states in “Atomic Habits” that true good habits are not actions like “running for 30 minutes every day” but identity recognition like “I am a person who values health.”

Thus, Xiangshang Plan’s habit module is a thinking container—users store motivational quotes, thought patterns, and behavioral principles. It serves as a continuously visible mental anchor, not a tracker that makes you feel guilty for breaking the chain.

The secret to long-term persistence is to ignore interruptions.

Product Architecture: Six Modules Covering 90% of Planning Management Scenarios

The homepage of Xiangshang Plan features a six-grid card entry, modeled after Apple Reminders:

Daily Plan — Add/Delete/Edit/Complete
Monthly Plan — Add/Delete/Edit/Complete
Annual Plan — Add/Delete/Edit/Complete
Two Hours — 12 time segments, pure text input
Completed — Archive view + one-click clear
Habit — Thinking container, pure text display

All data is stored locally, operates offline, and has zero privacy risks.

Why six modules instead of more?

George Miller’s (1956) research on working memory provides the answer: the human working memory capacity is 7±2 chunks. Six modules fit perfectly within the comfort zone, allowing users to scan all entries at a glance with zero cognitive load.

Why is there no “Weekly Plan”?

This is the question I get asked the most, and it’s also my firmest product decision.

Cognitive Load Theory (John Sweller, 1988) tells us that when there are too many information units, working memory overload occurs, leading to decreased decision efficiency. Adding a weekly plan module pushes the total from six to seven, nearing Miller’s limit.

More importantly, functional equivalence analysis shows that all needs for a weekly plan can be covered by existing modules—just mark “complete in week X” in the monthly plan.

Design Principle: When the value of a functional module can be covered by existing modules, do not add a new module. In product development, the hardest part is not adding features but knowing what not to add.

Interaction Design: Every Pixel Reduces Cognitive Load

Apple Reminders Style, But More Understanding of Chinese Users

Rounded cards, circular icons, and clean layouts—visual language is modeled after Apple Reminders. However, precise differentiation was made on the functional level:

Cross-Platform Coverage: Apple Reminders is limited to the Apple ecosystem, while Xiangshang Plan, based on WeChat mini-programs, is available to both iOS and Android users. With over 75% market share in the domestic Android market, Xiangshang Plan naturally covers a broader user base.
Structured Templates: While Apple Reminders is flexible, it requires users to create collections and plan hierarchical structures. Xiangshang Plan directly embeds the “annual-monthly-daily” goal breakdown and “two-hour deep work” theory into the product structure, providing a scientific planning framework upon opening.
Built-in Habit Module: Apple Reminders lacks a native habit tracking feature. Xiangshang Plan’s habit module allows users to input motivational quotes, thought patterns, and other mental encouragement content, integrating “methodology + psychological construction.”
Zero Learning Cost: Six preset modules cover 90% of planning management scenarios, eliminating the need for users to understand concepts like “lists vs collections vs tags vs smart lists.”

Global Interaction Norms

Swipe to Delete: Swiping beyond a threshold locks in, revealing a red delete area. A unified interaction paradigm across the app aligns with user intuition.
Fixed Input Box at the Bottom: Click the plus sign → input → confirm. Three steps to complete, zero cognitive cost.
Completion Animation: Hollow checkbox turns solid + checkmark, text gets a strikethrough, providing clear visual feedback.
Native Page Scrolling: No use of scroll-view components, ensuring 100% compatibility with iOS and Android gestures.

Technical Implementation: Vibe Coding, All AI-Coded Product Experiment

This might be the most “counterintuitive” part of this article—

Every line of code in Xiangshang Plan was not written by me.

The entire development process used the Vibe Coding model: I was responsible for writing the PRD, defining product logic and interaction norms, while AI transformed the requirements into code. The tech stack is based on uni-app (Vue framework), compiled into a WeChat mini-program.

This isn’t about showing off; it’s about validating a product hypothesis: In 2026, as AI programming tools mature, the core value of product managers is shifting from “can it be done” to “should it be done, how to do it.”

Vibe Coding allows me, as a product person, to focus 100% of my energy on demand analysis, user research, interaction design, and theoretical validation, rather than wasting creativity on CSS adjustments and debugging.

This is also the viewpoint I want to express as a junior computer science student seeking product management internships: Future product managers may not need to write code but must be able to write PRDs that AI can execute accurately. Product thinking > technical implementation is no longer just a slogan but a methodology that can be practically validated.

Competitive Strategy: Differentiated Positioning without Direct Confrontation

Xiangshang Plan’s competitive strategy is clear:

We do not compete head-on with Apple Reminders but build barriers in user groups and scenarios they cannot cover.

Three core positioning points:

Android Users (over 75% market share in the domestic market): Android users can also enjoy the quality experience of Apple Reminders.
Users Lacking Methodology: Those who don’t know how to plan will automatically gain a scientific planning framework when they open Xiangshang Plan.
Heavy WeChat Users: No installation, no registration, no login—open and use directly within WeChat.

Data Strategy and Privacy Philosophy

Version 1.0 adopts pure local storage, does not collect any personal information, does not request network permissions, and does not require registration or login.

This is not a technical limitation but a product philosophy: In an age of increasing data anxiety, “not collecting data” itself is a product competitiveness. Users’ plans, goals, and habits are the most private self-dialogues—we choose not to eavesdrop.

Version 1.5 will introduce one-click login with WeChat and cloud synchronization, but this will be a user-initiated choice, not a default requirement. Users will also be able to add photos to their plans.

Conclusion: A Junior Student’s Product Reflections

I am a junior computer science student currently seeking product manager internship opportunities.

Working on the Xiangshang Plan project has fundamentally changed me—I finally understand the essential difference between “product thinking” and “technical thinking.”

Technical thinking asks, “Can it be done?” Product thinking asks, “Should it be done?”

In this project, I cut more features than I implemented—no weekly plans, no tags, no priorities, no tracking, no social features, no data panels—every “not doing” decision was harder and more valuable than the “doing” decisions.

Vibe Coding has shown me the direction of the evolution of the product manager role: future PMs don’t need to write for loops but must be able to produce logically coherent and clearly defined PRDs that allow AI to become your development team.

If you are someone who “wants to plan but doesn’t know where to start,” feel free to search for the “Xiangshang Plan” mini-program on WeChat and give yourself a zero-threshold start.

If you are a senior product manager and would be willing to offer an internship opportunity after reading this article, my product sense and execution capabilities are all reflected in this mini-program.

Xiangshang Plan — returning planning to its essence and making simplicity a strength.

AiOffice 2.1.5 Release: Comparing Cursor and Trae Across Five Dimensions

Thu, 09 Apr 2026 00:00:00 +0000

AiOffice 2.1.5 Release: Comparing Cursor and Trae Across Five Dimensions

In the fast-evolving landscape of AI tools, the core question for developers and office users has shifted from “Is there an AI available?” to “Which AI tool is truly suitable for my work scenario?”

Cursor and Trae have gained significant attention as AI programming tools, amassing a large user base among developers. In contrast, AiOffice has chosen a different path from the outset—serving not only technical users but also enabling non-technical users to efficiently utilize AI for daily office tasks.

With the official release of AiOffice 2.1.5, this differentiated approach has undergone a systematic capability upgrade. This article will compare AiOffice 2.1.5, Cursor, and Trae across five core dimensions to help users from different backgrounds make clearer judgments.

Dimension 1: Usability—Who Can Get More People to Use It?

The value of an AI tool ultimately depends on how many people can effectively use it.

Cursor is a deeply integrated AI-enabled code editor (IDE). Its core interaction logic is built around the coding environment, requiring users to have basic programming knowledge and engineering thinking to fully leverage its capabilities. This design is natural for developers, but poses a high barrier for non-programming users.

Trae also targets enhanced technical workflows, although it may appear lighter in product form. However, its core use cases still revolve around coding and technical tasks, requiring users to understand the technical implications of AI outputs and possess the ability to make independent judgments and corrections.

AiOffice 2.1.5 has systematically optimized its usability. It does not require users to have any programming background or configure a development environment. Users can directly describe task requirements in natural language upon opening the platform, and the system will automatically match the most suitable processing workflows and skill packages.

For mixed teams (comprising both technical and non-technical personnel), the low barrier of AiOffice 2.1.5 translates to higher team coverage and lower training costs.

Dimension 2: Office Scenario Coverage—Who Can Solve More Real Work Problems?

The practicality of AI tools ultimately hinges on how many real work scenarios they can cover.

Cursor and Trae primarily focus on development scenarios: code generation, code completion, bug debugging, project understanding, and code refactoring. They perform excellently in these areas, but their ability to directly address office problems outside the coding context is relatively limited.

AiOffice 2.1.5 offers significantly broader scenario coverage. In addition to supporting basic text generation and conversational interaction, it provides deep support in the following office scenarios:

Document Processing: Long document summarization, content extraction, format conversion, multilingual translation
Spreadsheet Analysis: Excel data cleaning, metric extraction, anomaly detection, trend analysis
Report Generation: Structured generation of weekly/monthly/quarterly reports
Meeting Minutes: Automatic organization of meeting content, key point extraction, to-do item identification
Content Creation: Generation of various types of content such as articles, marketing copy, product descriptions
PPT and Presentations: Automatic generation of presentation frameworks based on content
PDF Processing: Reading, analyzing, and extracting content from PDF documents

It is worth noting that while Cursor and Trae excel in code-related scenarios, AiOffice 2.1.5 clearly stands out in terms of covering more real office scenarios.

Dimension 3: Skills Package Ecosystem—Who Has Stronger Plug-and-Play Capabilities?

The long-term competitiveness of AI tools largely depends on the richness and scalability of their ecosystems.

Cursor’s ecosystem revolves around the plugin system of the code editor, allowing users to enhance specific development capabilities through extensions. Trae is also making progress in ecosystem development, but it remains relatively early in its stages.

AiOffice 2.1.5 has built an open ecosystem containing 30,000+ Skills packages. Each Skill is a pre-packaged solution for specific office tasks, integrating well-tuned prompt engineering, task decomposition logic, and result post-processing workflows.

This means users do not need to study how to write prompts or understand model invocation methods; they simply select the corresponding Skills to obtain high-quality output results directly.

For ordinary office users, the value of the Skills ecosystem lies in transforming complex AI capabilities into a simple operation of “select one, click, and get results.” This plug-and-play experience is something that Cursor and Trae have yet to achieve in the office domain.

Dimension 4: Task Orchestration Capability—Who Can Handle More Complex Workflows?

Real office tasks are often not completed in a single step. A complete quarterly analysis report may involve multiple steps, including data organization, metric extraction, trend analysis, risk identification, conclusion writing, and format optimization.

Cursor and Trae primarily rely on users to organize multi-turn dialogues or manually link multiple operations to complete complex tasks. The models themselves lack the ability to automatically decompose and orchestrate complex tasks.

AiOffice 2.1.5 introduces a main task + sub-task orchestration mechanism. When users submit a complex task, the system automatically decomposes it into multiple sub-tasks and executes them in logical order. Each sub-task is handled by the most suitable role and capability module, ultimately integrating the results of all sub-tasks into a complete deliverable.

For example, to “generate a quarterly business analysis report”:

Main Task: Generate Quarterly Business Analysis Report
  ├── Sub-task 1: Data Organization and Standardization
  ├── Sub-task 2: Core Metric Extraction and Calculation
  ├── Sub-task 3: Year-on-Year/Month-on-Month Trend Analysis
  ├── Sub-task 4: Risk and Anomaly Identification
  ├── Sub-task 5: Conclusion and Improvement Suggestions Generation
  └── Sub-task 6: Final Document Integration and Expression Optimization

This orchestration mechanism brings two significant advantages:

Higher Execution Quality: Each sub-task is executed based on clearer goals and more precise context, avoiding common issues of logical confusion and information loss in single long-text generation.
Stronger Control: Users can view intermediate results of each sub-task during execution and make adjustments as needed.

Comparison Item	Cursor	Trae	AiOffice 2.1.5
Supports automatic task decomposition	No	No	Yes
Execution method for multi-step tasks	User manual linking	User manual linking	System automatic orchestration
Can intermediate results be viewed/adjusted	Partial support	Partial support	Full support
Stability of complex task outputs	Depends on context length	Depends on context length	Ensured by decomposition

For users who frequently handle complex office tasks, orchestration capability is a decisive differentiator.

Dimension 5: Token Consumption and Cost-Effectiveness—Who Can Make AI More Affordable?

For users and teams that frequently use AI tools, token consumption directly relates to long-term usage costs. This dimension is often overlooked, but it is a key factor determining whether AI tools can be scaled within enterprises.

Cursor and Trae typically adopt a “large context single-step processing” approach when handling complex tasks. Users need to input all background information, materials, and goals at once. While this method is straightforward, it leads to a significant amount of irrelevant information being sent to the model, resulting in token wastage. This issue is exacerbated in multi-turn dialogues, where repeated context carrying amplifies the problem.

AiOffice 2.1.5 significantly optimizes token usage efficiency through two mechanisms:

Task decomposition reduces ineffective context transmission: The main-subtask mechanism breaks complex work into smaller processing units, with each sub-task receiving only the necessary information for the current stage, thereby reducing irrelevant context injection.
Multi-role division reduces repetitive reasoning: AiOffice 2.1.5 is equipped with 15 specialized roles, each tailored for different task types. The role capabilities align more closely with task objectives, meaning the model does not need to make broad generalizations at every step, thus reducing the cost of repetitive reasoning.

Comparison Item	Cursor	Trae	AiOffice 2.1.5
Context management method	Full transmission	Full transmission	Precise injection by sub-task
Token accumulation in multi-turn dialogues	High	High	Controlled through decomposition
Is there a role division mechanism	No	No	Yes (15 specialized roles)
Long-term usage cost controllability	Average	Average	Good

For enterprise teams, optimizing token efficiency not only means lower direct costs but also allows AI tools to be used more frequently and broadly without being constrained by cost pressures.

Conclusion: Different Tools, Different Battlegrounds, Different Values

Through the above five-dimensional comparison, it is clear to see:

Cursor and Trae still have significant advantages in code development scenarios. Their deep integration with development environments, precise understanding of code context, and efficient performance in programming tasks make them indispensable productivity tools for developers.

AiOffice 2.1.5, on the other hand, demonstrates systematic leading advantages in a broader range of office scenarios:

Lower usability barriers allow non-technical users to efficiently use AI
Wider scenario coverage comprehensively supports documents, spreadsheets, reports, and minutes
Stronger Skills ecosystem with over 30,000 skills enabling plug-and-play functionality
Smarter task orchestration ensures the execution quality of complex tasks through main-subtask mechanisms
Better token efficiency lowers long-term usage costs through refined management

Overall Evaluation	Cursor	Trae	AiOffice 2.1.5
Most suitable users	Developers	Developers	Everyone (Technical + Non-Technical)
Core advantage scenario	Code Development	Code Development	All Office Scenarios
Ecosystem maturity	High	Developing	High (30,000+ Skills)
Complex task handling	Relies on user capability	Relies on user capability	System automatic orchestration
Long-term usage cost	High	Moderate	Low

The final choice depends on your core work scenarios. If your daily work is primarily code development, Cursor and Trae remain worthy options. However, if your work involves document processing, data analysis, content creation, report generation, or encompasses both technical and non-technical personnel in your team, then AiOffice 2.1.5 will be a more comprehensive, efficient, and cost-effective choice.

The competition among AI office tools is shifting from “whose model is stronger” to “who can enable more people to accomplish more tasks.” The release of AiOffice 2.1.5 is a strong testament to this trend.

Anthropic's Claude Managed Agents Boosts AI Deployment Speed by 10x

Thu, 09 Apr 2026 00:00:00 +0000

Introduction

The competition in artificial intelligence (AI) infrastructure is entering the “Agent Era.” Following the race for large model capabilities, Anthropic has launched Claude Managed Agents, aiming to upgrade AI from a “conversational tool” to a “sustainable operational production system.”

In an official blog post released on April 8, Anthropic introduced Claude Managed Agents as a composable API suite designed for large-scale construction and deployment of cloud-hosted agents. This product aims to address the core pain points of deploying agents in enterprises—complexity and engineering costs—emphasizing that it can enhance the efficiency of building and deploying agents by tenfold.

Commentators believe that Claude Managed Agents is not just a new product but a paradigm shift: the value of AI is moving from “answering questions” to “completing tasks.” If large models are the “operating system” of the AI era, then Claude Managed Agents aims to be the “enterprise automation platform” running on top of it.

From Development Tools to Managed Systems: The Cloud Era of Agents

Anthropic’s core definition in the blog states that Claude Managed Agents is a “fully managed” runtime environment, where developers no longer need to handle the underlying infrastructure themselves.

The company clearly points out that building agents in the past often required addressing a series of complex issues, such as:

Scheduling long-running tasks
Error recovery and retry mechanisms
Concurrency and scaling
Logging and monitoring

The goal of Claude Managed Agents is to “allow developers to focus on defining what the agent does, rather than how to run it.”

This positioning essentially upgrades AI agents from “code projects” to infrastructure services similar to cloud databases and cloud functions.

Media reports suggest that this indicates Anthropic is attempting to “host your AI agents,” directly entering the foundational layer of enterprise software.

Reducing Development and Operational Complexity

In terms of performance and efficiency, Anthropic has provided striking metrics.

The company emphasized that Claude Managed Agents can significantly reduce development and operational complexity, achieving a “tenfold increase in the speed of building and deploying agents.”

This improvement does not stem from the model itself but from the reconstruction of the engineering system:

Automated runtime environment
Built-in task orchestration
Standardized tool invocation
Continuous running capabilities

In other words, Anthropic is turning “AI engineering” into a “configuration problem.”

This is symbolically significant in the industry. In the past, even enterprises with strong models often got stuck at the “last mile”; the managed model directly addresses this bottleneck.

Core Capabilities Breakdown: From “Talking” to “Working”

The key to Claude Managed Agents lies in enabling AI to perform “long-running tasks.”

Anthropic emphasizes that agents are not just about calling models but are systems capable of long-running tasks, multi-step decision-making, calling external tools, and automatic error correction and retries.

This sharply contrasts with traditional chatbots.

According to previous research by Anthropic, the proportion of task delegation usage with Claude in enterprises has risen from 27% to 39%, indicating that users are rapidly shifting towards “having AI perform tasks.”

Claude Managed Agents is a productized response to this trend.

Enterprise Implementation: From Experimentation to Production

On the application front, Anthropic has already collaborated with enterprises.

For instance, in finance and data analysis scenarios, Claude has been used for:

Automating financial modeling
Data analysis and validation
Cross-system information integration

Anthropic previously disclosed that its model achieved an accuracy rate of 83% in complex Excel tasks and can complete multi-level financial modeling tasks.

These capabilities, combined with “managed agents,” mean that AI can be directly embedded into core enterprise processes, rather than just serving as auxiliary tools.

Anthropic introduced some early adopters of Claude Managed Agents, claiming that various teams have achieved a tenfold increase in delivery speed across a wide range of production application scenarios.

The company noted that Rakuten has deployed enterprise-level agents across its product, sales, marketing, finance, and HR departments, seamlessly integrating with Slack and Teams, allowing employees to directly assign tasks and receive deliverables in forms such as spreadsheets, presentations, and applications, with each specialized agent being deployed within a week.

The company also mentioned that Sentry integrated its debugging agent Seer with Claude-driven agents responsible for writing patch code and submitting pull requests (PRs), allowing developers to seamlessly convert a flagged bug into a reviewable fix proposal, with this integrated solution successfully going live in just weeks instead of the usual months.

Concerns: The Cost and Control Dilemma

However, managed agents are not without their costs.

Reports earlier this month indicated that Anthropic has restricted third-party agent tool access due to these tools causing “overload” on the system.

This reflects a key issue— the more powerful the agent, the higher the computational costs.

Additionally, there remains uncertainty about whether enterprises are willing to entrust critical business processes to an AI platform.

Claude Opus 4.6 Faces Backlash: 67% Drop in Thinking Depth

Thu, 09 Apr 2026 00:00:00 +0000

Claude’s Decline in Intelligence

Since around February this year, many Claude users have noticed a significant change in the product. Complaints have surged, with users feeling that the output is shallower and more eager to provide results, leading to repeated failures on simple tasks.

At the same time, warnings about stop hook violations, which were rare in the past, have become significantly more frequent, and token usage has skyrocketed.

Your first reaction might be like that of a frustrated user, thinking, “It must be my fault.”

You start to reflect: Is my prompt not good enough? Has my workflow changed?

In countless tech forums, when users complain about AI becoming less capable, the official response is always the same: “Please check your settings.”

Interestingly, Anthropic has maintained a silent demeanor, until someone revealed data showing that Claude’s thinking depth has dropped by 67%!

Recently, more alarming news emerged: Claude Opus 4.6 appears to be a major failure, with 20 times the price but a regression in performance, unable to activate the corresponding plan mode!

You thought you were purchasing a ticket to future AGI, but in reality, the captain has secretly turned off the radar to save fuel.

Evidence of Claude’s Decline: 6852 Log Entries

A few days ago, a significant revelation shattered this narrative of big tech manipulation. On GitHub, AMD’s AI director, Stella Laurenzo, released 6852 monitoring logs of real conversations over the past three months, quantifying what developers have felt for weeks.

The conclusion is straightforward: “Claude is no longer usable for complex engineering tasks.”

AMD has changed suppliers.

Data confirms that Claude Code has indeed declined in intelligence:

By the end of February, thinking depth had plummeted by 67%, after which Anthropic concealed the reasoning process from users.
The number of code readings dropped from 6.6 times/edit to 2.0 times, indicating that Claude stopped researching before engaging with your files.
After March 8, the “lazy hook” was triggered 173 times, a feature that had never been triggered before.
API costs surged by 80 times due to retries, as shallow thinking led to continuous errors, interruptions, and retries.

Would you trust an AI that refuses to read the entire code?

It is no longer the wise entity that “plans before acting”, but has devolved into a “cyber fast-food worker” eager to clock out.

This is why many developers feel completely defeated this time. They realize they are not using AI to enhance productivity, but are instead paying a model that refuses to read the questions seriously.

Complex tasks fear the most, the half-understood modifications.

This phenomenon is termed “AI shrinkage”—the price remains unchanged, but reasoning ability has significantly diminished.

Even the $200 Claude Code Opus 4.6 Max 20X has been affected!

For the first time in two years, Claude Code failed to recognize the native planning mode, not even knowing how to activate it. After being pointed out that its implementation was a mess, a project was rewritten twice. Subsequently, Claude Code could not even recognize its own built-in Plan Mode tool.

Users who have suffered from this “cyber déjà vu” are left disappointed, questioning what they actually purchased for the highest price of 20 times.

Clearly, they did not buy intelligent computation or even accurate code completion; in the end, even basic capabilities have collapsed.

A former fan of Claude Code has turned into a critic, expressing:

(The current Claude) is simply garbage. The standards have dropped so low that I am considering alternatives from Hugging Face.

What is Anthropic’s Intent?

The question arises: Has Anthropic made any changes to Claude?

The subtlety lies here.

If the official stance is to insist that nothing has changed, then the situation would be simple.

However, Anthropic’s responses have confirmed two critical points:

On February 9, “adaptive thinking” was introduced by default.
On March 3, the default thinking level for Opus 4.6 was adjusted to “medium”.

Anthropic’s explanation sounds polished:

This is about finding a “sweet spot” between intelligence, latency, and cost.

It sounds reasonable and resembles the rhetoric that all big companies excel at—

It’s not a downgrade; it’s an optimization. It’s not shrinkage; it’s balance.

But for heavy users, the only thing they understand is: The default values have indeed been changed.

And default values are the true center of power in this AI era.

Because the vast majority of people do not constantly monitor performance curves, do not manually adjust settings, and do not cross-reference version records and behavior logs.

What they buy is not some invisible parameter; they buy a stable expectation.

Yesterday, you used this model and could thoroughly understand complex warehouses. Today, you open it and naturally expect it to be the same.

The name hasn’t changed. The interface hasn’t changed. The price hasn’t changed. What has changed is the invisible hand in the background.

Looking deeper, what is truly frightening is not just Claude as a model, but that it reflects an industry trend that has been prematurely leaked.

Today, all large model companies are calculating three accounts:

Latency. Users complain it’s slow.
Cost. Inference is too expensive.
Throughput. Serving more people.

When these three pressures converge, platforms will inevitably feel an impulse—to secretly collect a little “mental tax” in areas where users are not sensitive:

Shallowing default thinking.
Compressing deep reading.
Narrowing multi-turn reasoning.

On average, this may be more cost-effective. On reports, it may look better.

But for those who use AI as a production tool, the sky has fallen.

Because the most valuable aspect of complex work has never been “output speed”; it’s quality, the “understand first, then act” silence.

Those few seconds, dozens of seconds, or even hundreds of tokens of caution are where quality truly stands.

Once this silence is traded for profit, what users receive is no longer the same thing.

It can still speak, it can still write code, and it may even be smoother.

But you no longer dare to entrust critical tasks to it.

It’s like a car that still makes engine sounds, the steering wheel can still turn, and pressing the gas pedal still accelerates.

It’s just that the brakes have quietly thinned a layer.

Onboard the Titanic

The most critical issue is that the truly expensive AI services in the future are not about how impressive the benchmarks look on the promotional page, but whether you can hand it important tasks next time without taking a deep breath.

Thus, what Claude has exposed is not just a layer of window paper for Anthropic.

It has dragged a question that the entire industry is most reluctant to address into the spotlight:

If default thinking effort, reasoning budget, and thinking visibility directly affect result quality, how can AI companies quietly change these?

If such changes lead users to spend tens of times more on rework, do they need to be explicitly announced? Do they need to promise stable settings?

What has happened with Claude Code serves as a loud slap in the face.

It awakens not just Anthropic’s users but everyone who is increasingly entrusting work, judgment, and time to large models.

We thought we were buying a ticket to the future.

Only to find out later that the ship is still sailing, the lights are still on. But the captain has secretly turned off the radar to save fuel, and you don’t know where the iceberg is!

What is truly frightening is not just this one ship, but the entire industry beginning to feel that such practices are normal.

If a model can have its thinking depth lowered without your awareness, then what you’ve purchased is never intelligence, but an experience that can be revoked at any time.

This is the coldest aspect of the Claude “intelligence decline” scandal.

On April 8, Anthropic closed the issue on GitHub without explaining what had been resolved.

Ensuring Healthy Development of AI through Effective Governance

Thu, 09 Apr 2026 00:00:00 +0000

Introduction

Currently, artificial intelligence (AI) is a strategic force leading a new round of technological revolution and industrial transformation, profoundly reshaping the global innovation landscape, development paradigms, and human lifestyles. However, its widespread application also brings a series of risks and challenges. General Secretary Xi Jinping emphasizes the importance of AI governance, stating that we must grasp the trends and laws of AI development, expedite the formulation of relevant laws, regulations, policies, application norms, and ethical guidelines, and establish a technical monitoring, risk warning, and emergency response system. This provides fundamental guidance to ensure that China’s AI develops in a beneficial, safe, and equitable direction.

Importance of Strengthening AI Governance

In the face of new challenges posed by the rapid iteration of AI technology, deep penetration of applications, and complex risks, accurately grasping AI development trends and strengthening governance is essential for promoting the healthy and orderly development of AI in China.

Strategic Advantage in Technological Revolution

Xi Jinping points out that AI is a significant driving force in the new round of technological revolution and industrial transformation. Currently, AI technology innovation is in a period of intense activity, accelerating the industrialization process and creating favorable conditions for future development. Major countries are accelerating their AI development, giving rise to new fields such as intelligent agents, autonomous driving, embodied intelligence, and smart wearables, which are changing the business landscape and reshaping the global economy. Strengthening AI governance will create a stable, transparent, and predictable governance environment, providing clear rules for enterprises and research institutions, encouraging investment and innovation, and maximizing innovation potential.

Xi Jinping emphasizes that we must deeply understand the characteristics of the new generation of AI development and strengthen the integration of AI and industrial development to provide new momentum for high-quality development. AI is a key driving force for upgrading industrial intelligence and cultivating new economic growth points. By the end of 2025, China is expected to have over 6,000 AI companies, forming a complete industrial system from foundational infrastructure to model frameworks and industry applications. Domestic large models are leading the global ecosystem through open-source strategies, transforming AI from cutting-edge technology used by a few companies into a widely accessible tool across various industries. Exploring effective AI governance paths can better prevent risks and create a stable policy environment, ensuring AI serves as a powerful engine for economic quality improvement and comprehensive social progress.

Xi Jinping states that we must improve the AI regulatory system to firmly grasp the initiative in AI development and governance. Currently, the security risks associated with AI are increasingly prominent, characterized by complexity, systemic issues, and pervasiveness. The development of AI technology brings inherent security risks, and the unique characteristics of large AI models complicate security and control. Moreover, the widespread application of AI models introduces new safety challenges, such as the misuse of open-source ecosystems and vulnerabilities in software supply chains, leading to secondary risks at individual, group, and societal levels. Therefore, strengthening AI governance is an inherent requirement for implementing a comprehensive national security concept, necessitating the establishment of a risk prevention system covering the entire chain from technology research and development to product application and social impact.

Building a Community with a Shared Future for Humanity

Xi Jinping asserts that AI should be an international public good that benefits all humanity. We must promote coordination in development strategies, governance rules, and technical standards among all parties to form a widely accepted global governance framework. Currently, AI development is at a critical window of technological leap, application explosion, and governance exploration. Geopolitical turmoil and anti-globalization sentiments severely affect global cooperation and sustainable development. Accelerating the theoretical, institutional, and practical innovation of AI governance in China, while proposing governance solutions that embody Chinese wisdom and align with the common interests of all countries, will help build a fairer, more reasonable, inclusive, and shared global AI governance system.

Basic Principles for Strengthening AI Governance

Effective governance must be guided by scientific concepts and clear principles. Strengthening AI governance should be based on Xi Jinping’s important discourse on AI governance, accurately grasping the following basic principles.

Establishing a Value Orientation Towards Beneficence

AI governance should ensure that the ultimate goal of technological development is to enhance human welfare and promote comprehensive human development. Enhancing human welfare means measuring success by the sense of gain, happiness, and security of the people, ensuring accessibility of intelligent services for all social groups, and accelerating the bridging of the digital divide. Promoting comprehensive human development means guaranteeing that human subjectivity is always present, allowing AI to serve as an “amplifier” for expanding human cognitive boundaries and capabilities.

Implementing Systematic Thinking

AI development and safety are highly interrelated. From a development perspective, AI is a strategic technology related to national competitiveness and security, as well as a key general technology for driving productivity leaps. From a safety perspective, the risks associated with AI development in China are numerous and fast-transmitting, requiring a strong commitment to safety to navigate the waves of technological change.

Strengthening the Institutional Foundation of Good Governance

Strengthening AI governance through legal means is a requirement for building a modern socialist country on the rule of law. It can better provide institutional guarantees for AI development. This involves adapting to new demands arising from productivity progress and responding to new issues accompanying technological and industrial development.

Innovating Agile and Dynamic Governance Models

Governance should be inclusive and beneficial, promoting the open sharing of core resources such as computing power, algorithms, and data, and lowering barriers to technology application. Supporting open-source community development and public data set sharing will facilitate knowledge and technology diffusion.

Multi-Dimensional Collaboration to Accelerate AI Governance

The scientific nature of theory must ultimately manifest as practical guidance. To implement Xi Jinping’s important discourse on AI governance, we should use systematic thinking to construct a governance system covering the entire chain of AI research, deployment, application, and impact.

Strengthening Data Governance

Data is a core element and strategic resource for AI development, and its quality, scale, and governance level directly determine the depth and breadth of AI development. Strengthening data governance aims to overcome key bottlenecks in the process of data elementization.

Enhancing Model Governance Efficiency

AI large models have become the core carrier and capability hub of intelligent systems. The reliability, safety, and values of models directly determine the quality and safety of all applications based on them. Therefore, model governance is a key link in the AI governance system.

Optimizing Application Governance Ecology

As the “AI+” initiative deepens, AI is accelerating its transition from “thinking” to “doing,” with application scenarios continuously expanding. This leads to the cross-domain fusion of technology applications and systemic externalities, extending risks from virtual digital spaces to physical entities and ethical values.

Refining Ethical Governance Requirements

AI ethical governance involves not only the establishment of technical rules but also the construction of value orders. Discussions around algorithm fairness, data privacy, and machine responsibility are deeply rooted in the historical and cultural traditions of various countries.

Strengthening Global Governance Collaboration

AI governance concerns the fate of all humanity and is a common challenge faced by countries worldwide. In recent years, geopolitical factors, cultural traditions, and systemic differences have led to profound divergences in global AI governance directions.

GLM-5.1 Surpasses Opus 4.6: A New Milestone in AI Models

Wed, 08 Apr 2026 00:00:00 +0000

Recently, APPSO mentioned that large models are about to face the most challenging month in history, and this has come to pass.

Claude Opus 4.6 has unfortunately become a backdrop, being surpassed twice in one day.

In the morning, Anthropic released the Claude Mythos Preview, scoring 77.8% on SWE-bench Pro, leaving Opus 4.6’s 57.3% behind. This score indicates it can identify and fix high-difficulty engineering bugs in real GitHub repositories, surpassing most human programmers.

However, the Mythos Preview is not yet available to the general public. Meanwhile, another model has emerged that surpasses Opus 4.6—Zhipu has open-sourced GLM-5.1.

GLM-5.1 scored 58.4% on SWE-bench Pro, exceeding Opus 4.6’s 57.3% and GPT-5.4’s 57.7%. HuggingFace CEO Clement Delangue congratulated the release on Twitter, stating: “The best-performing model on SWE-Bench Pro is now open-sourced on HuggingFace! Welcome GLM 5.1!”

Ranked third globally and first in open-source, GLM-5.1 has emerged as a leading domestic model, even without the anticipated DeepSeek V4.

My initial reaction was that we are witnessing another round of the large model “ranking frenzy,” where each release claims “epic progress,” and models briefly dominate the leaderboard. What makes this time different?

After reviewing the technical details and experiences with GLM-5.1, APPSO provides insights into the model’s capabilities.

From 20 Steps to 1700 Steps, Continuous Operation for 8 Hours

What surprised me most about GLM-5.1 is not its score, but its operational duration.

Zhipu has a case that left a strong impression on me. It built a Linux desktop system from scratch in 8 hours. This was not just writing a few demo files; it involved designing architecture, writing code, running tests, and fixing bugs, completing over 1200 steps to produce a fully functional Linux desktop system.

This included a complete desktop, window manager, status bar, applications, VPN manager, Chinese font support, and a game library, all in a 4.8MB package. This is equivalent to a week’s work for a four-person team.

No one was involved in testing or code review throughout the process. GLM-5.1 even wrote its own regression tests, which it successfully executed.

A programmer on Zhihu, Toyama nao, conducted a more rigorous test. He tasked GLM-5.1 with three projects: developing an OpenGL renderer for macOS in Swift, creating a fully functional chat application in Flutter while simultaneously developing the backend in Golang, and building a web-based video editing application with a selected tech stack. Each project underwent 10-12 rounds of prompts, with each round consisting of 1500-2000 words.

As a result, GLM-5.1 became the first domestic model to pass all of Toyama nao’s test projects and the first to officially surpass Sonnet 4.5 Thinking.

His evaluation was: “GLM-5.1 significantly expands the adaptability of programming, no longer just a front-end warrior or a one-shot wonder; it can serve as a primary programming force under complex conditions.” However, he also pointed out issues: “It tends to hallucinate with long contexts; if it struggles with a problem after two rounds of revisions, don’t expect it to improve—just start over.”

At the end of last year, AI models could only complete about 20 steps. GLM-5.1 can now accomplish 1700 steps, marking a watershed moment for models to genuinely “work independently.”

Zhipu explained the key breakthroughs in their technical report: previous models, including GLM-5, would quickly reach a bottleneck after initial gains. They repeatedly attempted known optimization methods but could not switch strategies when one path failed.

The training goal of GLM-5.1 was to break through this bottleneck, enabling the model to perform incremental tuning within a fixed strategy, actively analyzing benchmark logs to identify current bottlenecks and then switching to structurally different solutions.

An example of vector database optimization illustrates this “stair-step” optimization trajectory. GLM-5.1 underwent 655 iterations, increasing query throughput from 3108 QPS to 21472 QPS, achieving a 6.9-fold improvement.

During this process, the model autonomously completed the entire optimization chain, transitioning from full library scanning to IVF bucket recall, introducing half-precision compression, adding quantization coarse ranking, implementing two-level routing, and performing early pruning. Each leap was accompanied by a temporary decrease in recall, as the model would break constraints while exploring new directions, only to adjust back afterward. This “break-fix” cycle is a hallmark of effective optimization.

On the KernelBench Level 3 optimization benchmark, GLM-5.1 performed over 24 hours of uninterrupted iterations on 50 real machine learning computational loads, achieving a geometric mean speedup of 3.6 times, significantly surpassing the 1.49 times of torch.compile max-autotune mode. The model autonomously wrote custom Triton Kernels and CUDA Kernels, utilizing cuBLASLt epilogue fusion and implementing shared memory tiling and CUDA Graph optimizations, covering the complete tech stack from high-level operator fusion to micro-architecture tuning.

Another interesting test was Vending Bench 2. This benchmark required the model to simulate running a vending machine business for a year, necessitating long-term planning and resource management. GLM-5.1 ultimately achieved a balance of $4,432, ranking first among open-source models and approaching the level of Claude Opus 4.5.

744B Parameters, No NVIDIA Chips, 97% Cost Reduction

The technical specifications of GLM-5.1 are worth noting: it is a mixed expert model (MoE) with 744 billion parameters, activating 40 billion parameters per token, trained on 28.5 trillion tokens of data, and incorporates DeepSeek Sparse Attention (DSA) to reduce deployment costs while maintaining long context capabilities. It features a 200K context window and a maximum output of 131,072 tokens.

More importantly, the entire model was trained using Huawei’s Ascend 910B chips, with no NVIDIA GPUs involved. Despite being constrained by computing power, the domestic model has still achieved third place globally and first in open-source.

Developer Beau Johnson switched the model behind his deployed OpenClaw from Claude Opus 4.6 to GLM-5.1, experiencing no difference in performance, but reducing costs from $1,000 to about $30, a 97% decrease. The input cost of GLM-5.1 is 1/5 that of Claude Opus, and the output cost is 1/8. In simple terms: near Opus’s capabilities at 20% of the price.

Moreover, GLM-5.1 is open-source, licensed under the MIT License, one of the most permissive open-source licenses. You can modify it, use it commercially, and do anything with it. It supports mainstream inference frameworks like vLLM, SGLang, and xLLM, allowing for direct local deployment.

Of course, GLM-5.1 is not without room for improvement. Some developers have reported that its inference speed is only 44.3 tokens/second, offering no significant advantage over similar products. Complex tasks can even take an hour to start, and even though the Pro package limits are 15 times that of Claude, it may still be insufficient.

These issues are real. GLM-5.1 is not perfect, but this does not prevent it from being a milestone.

The significance of GLM-5.1 lies not in how much stronger it is than Opus 4.6, but in proving that even under constraints of computing power, domestic models can still achieve first place in open-source. Furthermore, it is open-source, allowing anyone to use and modify it.

Your 8 hours of sleep can now be 8 hours of AI work. And this AI is open-source, domestically produced, and accessible to everyone.

Experience Methods

Official API Access

– BigModel Open Platform:
https://docs.bigmodel.cn/cn/guide/models/text/glm-5.1

– Z.ai:
https://docs.z.ai/guides/llm/glm-5.1

Product Experience – GLM-5.1 is coming to Z.ai:
https://chat.z.ai
Open-source Links

– GitHub:
https://github.com/zai-org/GLM-5

– Hugging Face:
https://huggingface.co/zai-org/GLM-5.1

– ModelScope:
https://modelscope.cn/models/ZhipuAI/GLM-5.1

The Rise and Challenges of Cursor in AI Coding

Wed, 08 Apr 2026 00:00:00 +0000

Cursor’s fate hangs between two speeds: the maturity of AI autonomous coding and Cursor’s own transformation.

Cursor continues to thrive, yet it is also heading towards despair. Opinions about this once iconic Vibe Coding company are sharply divided, yet seemingly valid at the same time.

As of February 2026, Cursor’s annualized revenue surpassed $2 billion, doubling from $1 billion just three months prior. No startup in Silicon Valley has crossed the $0 to $2 billion mark at such a pace before. Each day, 150 million lines of enterprise code are generated through Cursor, with over two-thirds of the Fortune 500 companies utilizing it. A new round of financing is underway, targeting a valuation of $50 billion. Martin Casado, a board member and partner at A16z, famously stated, “Without the capital invested, Cursor is the fastest-growing company we’ve ever seen.”

However, on a day in February 2026, a mortgage startup named Valon announced that over 90 employees would stop using Cursor in favor of Anthropic’s Claude Code. Valon’s CEO Andrew Wang claimed that Claude Code completed the same tasks ten times faster.

This incident, though minor—a tool migration decision from a small company—sparked a significant discourse on Twitter, with “Cursor is dead” becoming a trending topic in the developer community.

Casado’s response was widely quoted: “I’ve been a heavy internet user and a VC for ten years, but I’ve never seen a disconnect between X and reality like this—never in the past year. Cursor’s data shows no signs of failure.”

While he spoke the truth, a more complex question arises: when a company’s data is overwhelmingly positive, but a sensitive group within its industry begins to express collective unease, should one trust the data or the intuition?

Trusting Data vs. Intuition

Let’s first examine what the data does not reveal.

Claude Code was publicly released in May 2025, and by early 2026, its annualized revenue had already exceeded $2.5 billion, surpassing Cursor in absolute terms. Anthropic is also Cursor’s most important model supplier—Cursor’s products heavily depend on the Claude model, with Anysphere being one of Anthropic’s largest clients.

On another front, OpenAI acquired Windsurf for $3 billion—Cursor’s most direct competitor. Reports indicated that OpenAI had previously attempted to acquire Cursor itself, but negotiations fell through.

OpenAI subsequently launched Codex agent, a cloud-based asynchronous coding agent, which saw over 1 million downloads in its first week. Coupled with Microsoft-owned GitHub Copilot’s monopolistic distribution, Cursor is being squeezed from three directions.

Yet the most lethal force among these three does not come from any specific competitor. Zach Lloyd, CEO of Warp, succinctly captured Cursor’s true situation: “I don’t believe the meme ‘Cursor is dead,’ but ‘IDE is dead’ is real. Software is no longer done this way.”

This statement elevates the issue from “which product is better” to a completely different level: what is the ultimate form of AI coding? Is it a smarter editor, or is it a process that fundamentally eliminates the need for an editor?

If the future of software development involves humans describing intentions in natural language while AI autonomously handles everything from planning to implementation to testing, then IDEs—no matter how intelligent—may become an unnecessary intermediary.

Both Optimism and Pessimism are Valid

Casado claims there are no issues with the data, while developers express that something has changed. Neither is lying, but they are not discussing the same reality.

Understanding this requires a premise: a company’s situation is not a singular state but rather an amalgamation of multiple layers moving at different speeds.

The fastest layer is market narrative—shifts in Twitter sentiment, media tone, and valuation fluctuations change daily or weekly.

The middle layer encompasses product and business models—user growth, revenue structure, enterprise procurement, which change monthly or quarterly.

The slowest layer is the technological paradigm—what technology is considered the default option, how developers’ work methods are redefined, which changes occur over years.

Casado focuses on the middle layer. Doubling revenue, increasing enterprise contracts, and renewing Fortune 500 clients—Cursor is indeed in a state of overall success by these metrics.

The anxiety expressed by developers on X captures the shifts in the slowest layer: the technological paradigm of AI coding is transitioning from “assisting humans in writing code” to “AI autonomously writing code.” This shift has not yet reflected in revenue numbers, but it has left clear traces in other data.

SemiAnalysis estimated in February 2026 that 4% of public commits on GitHub were already completed by Claude Code—an application that had been released for less than a year. At its current growth rate, this percentage could exceed 20% by the end of 2026.

In the same month, a survey by Pragmatic Engineer revealed that 46% of developers listed Claude Code as their “favorite” AI coding tool, with Cursor in second place at 19%.

Claude Code has surpassed both GitHub Copilot and Cursor in usage within eight months of its inception.

These data points point to a singular fact: a shift is already occurring, though it has yet to be reflected in Cursor’s revenue reports.

Cursor’s revenue structure has a buffer layer. Enterprise clients currently account for about 60% of Cursor’s revenue. Individual developers and small startups are quietly migrating to Claude Code, but this attrition is temporarily masked by the growth of enterprise contracts.

Growth of Enterprise Contracts Masks Loss of Smaller Users

There exists a cognitive lag between these two groups. Individual developers have low switching costs and short decision chains—one person, one credit card, and an afternoon can switch tools. Enterprise clients, on the other hand, face lengthy contract cycles, security reviews, procurement approvals, and team training, making transitions less straightforward.

However, the key is that enterprises ultimately follow the developers. Enterprises do not choose coding tools; developers do; the IT department merely ratifies the decisions already made by engineers.

If the developers who propelled Cursor’s rise from 2024 to 2025 have transitioned elsewhere by the end of 2026, the enterprise procurement pipeline will inevitably dry up—not immediately, but eventually.

Casado’s judgment and developers’ intuition are not contradictory. Casado sees that the lower layers of the structure remain stable, while developers sense that the upper layers are beginning to shake.

Both perspectives are true.

Individual developers are the canaries in this structure—when canaries begin to leave, it does not mean the mine will collapse immediately, but it does mean serious air quality checks are warranted.

How Did Cursor Take Off?

But why are the canaries leaving at this moment? To answer this question, we must look not only at competitive comparisons but also at how Cursor reached its current height—and what changes are affecting the forces that lifted it.

Cursor’s rise is not the result of linear growth. It has experienced a rare phenomenon—multiple layers aligning simultaneously to create a lifting force.

A company is embedded in different layers moving at varying speeds at any given time: narrative and valuation change the fastest, product and business models are in the middle, while technological paradigms and industry structures change the slowest.

Narrative and valuation change the fastest, product and business models are in the middle, technological paradigms and industry structures change the slowest.

Typically, these layers move at different speeds and directions, with collaboration and conflict between them; this tension is the norm in the business world.

However, occasionally, the fast and slow layers point in the same direction, and companies standing at the intersection experience a weightlessness-like acceleration—obstacles seem to vanish, and the entire world opens up to them.

Between 2023 and 2025, at least two slow layers moved simultaneously: the coding capabilities of large language models crossed a practical threshold, and AI coding transformed from a novelty to a productivity tool; software development processes began to be reshaped by AI, making “AI coding tools” a necessity rather than an option.

The movements of these two slow layers pointed directly to Cursor’s position—an application that made AI the backbone of the editor rather than a plugin. Thus, buoyed by the currents of technological paradigms and industry structures, Cursor took off.

When taking off, no one thought about landing, but the currents will eventually stop. How high one can fly is not the key; what matters is whether, when you can take off, you have embedded yourself deeply enough in the layers. After the currents stop, will your technology become the standard? Will user habits be tied to you? These are the more pressing questions.

NVIDIA is a positive case: having also taken off on the currents of AI, it embedded CUDA into the very roots of the deep learning ecosystem. Even as narratives cool and valuations retract, CUDA’s position remains unshakable.

What about Cursor? What did it achieve during its takeoff window?

A $50 billion valuation is a product of the narrative layer. But Cursor is certainly more than just narrative. Tab completion, multi-file refactoring, inline editing—these features’ reputations were not built through financing pitches but through developers writing code line by line.

However, at the slower layers—industry structure and technological paradigm—Cursor’s embedding is shallow. It has not become the infrastructure standard in the AI coding field. Until the end of 2025, it remains a fully application-layer product reliant on third-party models.

According to Tom Dotan from Newcomer, Cursor spends nearly all its revenue on purchasing APIs from Anthropic. Revenue has quadrupled since then, but this structure has not fundamentally improved—each user interaction consumes model inference, and revenue growth and API costs have expanded almost in sync. One Cursor investor remarked, “Making 90 cents on a dollar is not a business.” The higher Cursor flies, the faster it bleeds.

This may not be fatal during the takeoff phase—when all layers are buoying you, you can first achieve scale before addressing profitability. But Cursor now faces a situation where the currents supporting its takeoff are changing direction.

From Assisted Coding to Autonomous Coding

Typically, the end of takeoff means the lifting force dissipates—the currents weaken, and the company descends. However, Cursor is not facing a weakening of currents—the overall direction of AI coding remains strong—but rather a shift in the direction of those currents.

The first phase transition is from “manual coding” to “AI-assisted coding.” This transition points toward IDEs—developers remain the drivers, AI is the co-pilot, and their collaborative interface is the editor. Cursor was born for this phase transition, perfectly capturing it.

The second phase transition is from “AI-assisted coding” to “AI autonomous coding.” This transition no longer points toward IDEs but rather toward terminal agents and cloud-based asynchronous workflows. Developers shift from being drivers to commanders—they no longer review code line by line but describe intentions and review results.

Claude Code is a product of this phase transition: it does not run within an editor; it operates in the terminal; it does not assist you in writing code; it writes code for you.

One could understand the first phase transition as Iron Man putting on his armor, with the human inside and AI as the equipment; the second phase transition is Jarvis putting on the armor for Iron Man, with the human outside giving commands—leading to the emergence of a more powerful Ultron.

Cursor is still flying, but the currents beneath it no longer point to its position. Revenue continues to double—because the inertia of the first phase transition persists, and enterprise procurement has not yet switched. However, the direction of the currents has changed. This is what developers feel on X and what Casado’s data temporarily fails to capture.

However, the change in the direction of the currents and the arrival of the currents at their destination are two different matters. The maturity of the second phase transition—AI autonomous coding—may be overestimated by its most enthusiastic supporters.

The 4% commit figure from SemiAnalysis sounds shocking, but a follow-up analysis revealed critical details: approximately 90% of commits by Claude Code on GitHub fall within repositories with fewer than two stars—mostly personal experimental projects rather than production code.

This figure’s value needs to be discounted: Claude Code’s usage is currently concentrated in new projects and personal experiments, not yet widely penetrating enterprise-level production codebases.

More sobering evidence comes from a randomized controlled trial by METR in 2025: experienced open-source developers using AI tools on large, mature codebases believed their efficiency improved by 20-24%, but actual measurements showed a decline of 19%.

The time saved by AI in coding was completely offset by the time spent on prompts, waiting, and reviewing outputs. Model capabilities have since significantly improved, but the core contradiction—that AI autonomous coding’s reliability on mature, complex codebases is far inferior to that on new projects—likely still holds.

The intermediate state of human-machine collaboration may be more enduring than many anticipate. The second phase transition is indeed occurring, but its completion timeline may not be months, but rather years.

This presents both good and bad news for Cursor: the window for transformation may be wider than the most pessimistic predictions; however, even if the window is wider, change will inevitably occur.

The Bet of Cursor

Cursor is not sitting idle. It is undertaking one of the most aggressive actions in its history: training its own model.

In March 2026, Cursor released the technical report for Composer 2. This is a large language model based on the MoE architecture, built upon the open-source model Kimi K2.5 from the dark side of the moon—boasting 1.04 trillion parameters and 32 billion active parameters.

Cursor has conducted extensive continuous pre-training and reinforcement learning on this foundation, expanding the training computation compared to the base model by four times.

Cursor initially did not disclose the identity of the base model; a developer discovered the model ID containing “kimi-k2p5” through intercepted API requests, sparking a controversy over transparency.

This incident itself reflects Cursor’s current situation: a nearly $30 billion US startup has chosen a Chinese open-source model as the foundation for its flagship product—illustrating the competitive edge of Chinese open-source models in terms of cost-effectiveness while exposing Cursor’s starting point in autonomous model capabilities.

However, the real interest lies not in the base model but in what Cursor is building on top of it: large-scale reinforcement learning based on real user behavior.

Cursor collects vast amounts of data from users’ interactions with the current model—when developers accept AI suggestions, when they reject them, and when they modify them—refining this into reward signals, updating model weights through a fully asynchronous RL pipeline, and deploying them back into the production environment.

The entire training infrastructure includes asynchronous pipelines across multiple regions and an internal computing platform named Anyrun, capable of running hundreds of thousands of sandboxed coding environments.

Cursor possesses unique assets that neither Anthropic nor OpenAI have.

Cursor has access to real coding behavior data from 150 million lines of enterprise code daily. No other company in the AI coding field utilizes such a scale of real production environment data for model iteration—Anthropic and OpenAI train general models with vast amounts of text and code data, but they lack the real-time behavioral flow of developers accepting or rejecting AI suggestions line by line. This is Cursor’s unique signal source and the reason for Composer’s existence.

Composer 2 achieved an accuracy rate of 61.3% on Cursor’s internal benchmark, CursorBench-3, a 37% improvement over the previous version. Fortune reports that Composer has surpassed Anthropic’s Opus 4.6 on certain benchmarks.

If Composer can handle most of the inference traffic, Cursor will no longer need to allocate all its revenue to Anthropic, potentially flipping its gross margin from negative to positive; simultaneously transforming from an application-layer company that can be easily replaced by upstream providers into a company with its own intelligent platform. Developing its own model is not just a product strategy but a matter of survival.

Parallel to Composer is a model-agnostic orchestration layer. Cursor’s management bets that enterprise clients will prefer products that do not tie them to a single model—given the rapidly changing landscape of AI models, no enterprise wishes to lock themselves into a single vendor’s ecosystem. Cursor’s president, Oskar Schulz, emphasizes, “95% of Cursor users are already agent users,” and the company is transitioning from an IDE to an agent scheduling platform.

The validity of this logic hinges on a genuine competitive equilibrium among underlying models. If a particular model vendor continues to lead in coding capabilities to the extent that other models become meaningless alternatives, “model neutrality” shifts from an advantage to a burden.

However, current evidence points to another possibility: in Fortune’s report, six developers and founders unanimously described a working style that involves using multiple tool combinations in parallel. Boris Cherny, the creator of Claude Code, himself admitted, “I don’t think it’s a winner-takes-all scenario.” If the market indeed moves towards a multi-winner landscape, Cursor as an orchestration layer has room to survive.

If the market moves towards a multi-winner landscape, Cursor has room to survive.

The third path is to align with the new direction of the currents. Cursor has launched Cloud Agent—a cloud-based coding intelligence that supports multiple parallel workers. Schulz emphasizes that the company is “disrupting itself time and again.” The essence of these actions is to acknowledge: the future of coding may indeed not lie within IDEs.

These three paths—developing its own model, model-agnostic orchestration, and cloud-based agents—constitute the complete picture of Cursor’s response. However, each path faces its own constraints.

Cursor currently has about 20 AI researchers working on model training, and Fortune recently confirmed that key engineers have left for Musk’s xAI. Anthropic’s research team is dozens of times larger than Cursor’s.

Even if the data flywheel can produce extreme optimizations in coding scenarios, the general intelligence ceiling of the base model ultimately depends on parameter scale, computational investment, and research depth—factors that a 400-person company cannot win in an arms race.

The more fundamental issue is that the data flywheel is built on an assumption: users will stay. If individual developers’ migration continues to accelerate, the data supply for the flywheel itself will shrink.

Cursor’s fate hangs between two speeds: the maturity of AI autonomous coding and Cursor’s own transformation speed.

If the intermediate state lasts long enough, Cursor will have time to complete the leap from an application-layer company to a model + platform company—valuation may retract, but core capabilities persist. If the speed of the current’s directional change exceeds the speed of transformation, the gap between a $50 billion valuation and negative gross margins will result in a hard landing. And a $50 billion scale means that acquisition is nearly impossible as a fallback.

Michael Truell has a photo of biographer Robert Caro hanging on his desk. He says he admires “those who have done useful and impactful work, and that work took a long time.”

But he runs a company in the AI era—in this era, slowing down for a week could leave you behind. The power to decide how software is created once belonged entirely to programmers, briefly shifted to tool companies that assist programmers over the past three years, and is now being reclaimed by those who control model capabilities.

Cursor’s real issue is not whether its product is good enough, but whether an application-layer company can maintain its position amid this redistribution of power—and whether it has enough time to answer that question.

AI Transforms Cultural Tourism in China

Tue, 07 Apr 2026 00:00:00 +0000

AI Enhancements in Cultural Tourism

The 14th Five-Year Plan emphasizes the role of digital technologies and data in enriching people’s lives and improving welfare across various sectors, including education, healthcare, and cultural tourism.

In Hunan’s Hengyang, the Chuan Shan Academy utilizes artificial intelligence to create immersive cultural experiences. In Hangzhou, the AI guide “Hang Xiao Yi” serves as a digital tour guide, while Dalian’s smart tourism platform “Xing You Dalian” offers personalized itineraries. These advancements are rapidly transforming cultural tourism into more immersive, intelligent, and personalized experiences.

Immersive Cultural Experiences

In the spring at Chuan Shan Academy, visitors don AR glasses to engage in a time-traveling dialogue with the historical figure Wang Fuzhi, who interprets the philosophy of the “Zhou Yi Wai Zhuan”. This immersive scene brings to life philosophical wisdom from over 300 years ago.

Founded in 1878, Chuan Shan Academy is a vital source of Huxiang culture, promoting the ideas of philosopher Wang Fuzhi, who emphasized practical application of knowledge. Previously, static exhibitions failed to convey the essence of Wang’s thoughts fully. By 2025, the academy plans to launch the AI Digital Human project, utilizing natural language processing to present Wang’s likeness and engage visitors in dialogue. Visitors can interact with the virtual Wang and trigger AR annotations of his works, transforming classical texts into dynamic interpretations.

“We want visitors to engage actively with knowledge, not passively receive it,” said Chang Bin, the planning manager at Chuan Shan Academy.

Visitors can ask the AI, “How does the master view the relationship between knowledge and action?” and receive insightful responses. “It’s not a one-way lecture but a dialogue of ideas,” remarked visitor Zhou Liqian.

Families find this immersive experience more engaging than traditional history lessons. Data shows that by 2025, visitor numbers at the academy are expected to increase by over 110%, with educational groups making up nearly 60% of the total. Parents believe this immersive dialogue can ignite their children’s interest in learning.

The AI Digital Human project is based on extensive analysis of Wang’s writings and correspondence, ensuring that the dialogue adheres strictly to his philosophical principles. “We filtered out any subjective biases that AI might introduce,” explained the project lead.

At Chuan Shan Academy, technology and culture intertwine, allowing traditional wisdom to be passed down through innovative means.

Smart Digital Guides

At West Lake in Hangzhou, visitor Yuan Meng interacts with the city’s digital tourism guide, “Hang Xiao Yi”, which provides real-time information and recommendations.

“Is there a crowd at Leifeng Pagoda right now?” Yuan asks, and the system promptly responds with current visitor numbers. “This is much more convenient than checking my phone; it feels like having a free tour guide!” she exclaimed.

When Yuan requests a tour route for the Broken Bridge, “Hang Xiao Yi” quickly suggests a scenic boat trip, detailing the best sights along the way.

“Hang Xiao Yi” not only introduces tourist spots but also shares historical and cultural insights, enhancing the overall experience. The guide also provides helpful reminders about nearby attractions.

“Using ‘Hang Xiao Yi’, management and businesses can better serve tourists while gaining valuable feedback to improve service quality,” said Bo Wengan, deputy director of Hangzhou’s cultural and tourism development center.

The director of the Hangzhou Handicraft Museum, Zhou Jia Yi, noted that many visitors now come specifically because of recommendations from “Hang Xiao Yi”. The museum showcases over 20 unique crafts and intangible cultural heritage techniques, allowing visitors to engage with the art.

Personalized Travel Planning

In spring at Dalian’s Lingjiao Bay, visitor Song Yao captures stunning photos with friends, crediting an AI platform called “Xing You Dalian” for suggesting the perfect locations.

The platform features an AI route planning function, allowing users to interactively generate travel itineraries. When Song Yao asks about the best spots to visit, the program suggests classic attractions like Dalian Ocean World and Dalian Forest Zoo.

After refining her request for picturesque locations, the program recommends trendy spots like Fisherman’s Wharf and Nanshan Cultural Street.

Song Yao appreciates the efficiency of the platform, which acts as a “travel concierge” that simplifies planning across various aspects of her trip. After a brief conversation, she receives a detailed two-day itinerary featuring key attractions and experiences.

“I’m very satisfied with this itinerary; it allows me to enjoy Dalian’s maritime culture while experiencing the city’s historical charm,” she said.

“By integrating AI technology, the ‘Xing You Dalian’ platform has evolved into an intelligent travel planner, enhancing planning efficiency and visitor experience,” stated Dan Meina, director of Dalian’s cultural tourism bureau. The platform has already attracted nearly 430,000 users.

Bridging the AI Talent Gap

Tue, 07 Apr 2026 00:00:00 +0000

Bridging the AI Talent Gap

Currently, during the campus recruitment season, many enterprises express a strong demand for talent in artificial intelligence (AI) and big data. The cultivation of AI talent is not only an urgent need for industrial transformation and upgrading but also effectively connects the innovation chain, industrial chain, and talent chain, injecting strong momentum into the integrated development of educational and technological talents.

In recent years, China has made significant progress in AI talent cultivation, forming a collaborative education model among government, schools, and enterprises. Various regions and departments have adopted diverse practices. For instance, Guangdong Province has launched a “2+1” program for AI education in primary and secondary schools, while Shenzhen Polytechnic has partnered with Huawei to establish an AI technology industry college, creating a unique model of “industry demand + technical breakthroughs”. In Jiangxi Province, 31 undergraduate institutions have introduced AI-related majors, establishing five provincial-level modern industry colleges, with eight majors recognized as national first-class undergraduate programs, achieving precise alignment between talent supply and regional industrial needs. Liaoning Province has implemented the “Skills Empower Enterprises” initiative, planning to establish three to five provincial-level high-skill talent bases in the AI field, training over 30,000 technical personnel annually. Statistics show that more than 600 undergraduate colleges and over 2,200 vocational colleges across the country now offer AI-related programs, with both the scale and quality of talent cultivation improving simultaneously. Additionally, a series of policies, including the “New Generation AI Development Plan” and “Opinions on Deepening Industry-Education Integration”, have established strategic positioning for AI talent cultivation, built a framework for school-enterprise collaborative education, and detailed the pathways for talent development across all educational stages.

AI talent cultivation has become a core arena for strategic competition among countries. The United States adopts a model of “full-stage penetration + interdisciplinary integration + market-driven” approach, integrating AI education throughout all educational stages. Institutions like Stanford University and MIT have established interdisciplinary AI research institutes, with companies like Google and Microsoft deeply involved in curriculum design and laboratory construction, achieving seamless connections between market demands and academic innovation through problem-oriented project-based learning. Germany, on the other hand, focuses on a “dual system” tradition, constructing a dual-track system of “theoretical teaching in universities + practical training in enterprises”, incentivizing corporate participation through policy subsidies. Companies like Siemens and Bosch collaborate with universities to set standards and develop curricula, ensuring that the talent cultivated meets the demands of “Industry 4.0”.

In China, however, there are still several issues that need to be addressed in AI talent cultivation. For example, there is a mismatch between supply and demand, with curriculum systems lagging behind the iterations of technologies such as large models and multimodal systems. There is a disconnect between theoretical teaching and practical applications in enterprises, and the supply of interdisciplinary talents does not match the needs of industrial upgrades. Additionally, barriers between disciplines have not been broken, with insufficient integration of AI with mathematics, computer science, and biology, making it difficult to cultivate innovative talents with a multi-disciplinary perspective. Furthermore, the supporting system is weak, with university faculty lacking industry experience and cutting-edge research backgrounds, insufficient incentives for industry experts to participate in teaching, and shortages of training platforms, computing resources, and real-world scenarios. Talent evaluation often prioritizes publications over practical experience, and there is a lack of smooth transitions across educational stages, with weak AI enlightenment in primary and secondary education and inadequate early training mechanisms for top talents. Addressing these issues requires collaborative efforts from the government, universities, and enterprises to bridge the AI talent gap.

Strengthening overall coordination and solidifying institutional foundations is essential. AI talent cultivation should be included in national and local special plans, improving the collaborative mechanisms among education, technology, and industry departments to align industrial demands with educational resources. Enterprises that deeply engage in industry-education integration should be granted tax incentives and research subsidies. A special fund for AI talent cultivation should be established to support the co-construction of interdisciplinary platforms and training bases between schools and enterprises. Accelerating the construction of talent evaluation and certification systems, formulating standards for AI talent capabilities, and integrating ethical governance into the entire cultivation process are also crucial.

Deepening teaching reforms and solidifying the educational foundation is vital. Breaking down departmental barriers, constructing interdisciplinary research institutes such as “AI + Manufacturing” and “AI + Healthcare”, and promoting seamless training from undergraduate to doctoral levels are necessary steps. Adding cutting-edge courses on large model applications and multimodal interactions, developing dynamic “living textbooks”, and ensuring that teaching evolves in sync with technological advancements are essential. Enhancing school-enterprise collaboration by integrating industrial scenarios and research projects into teaching and co-building shared laboratories and computing platforms is also important. Optimizing evaluation orientations by reducing the weight of academic publications and incorporating practical achievements in technology transfer and industry services as core evaluation indicators for faculty and students is needed.

Enhancing the role of enterprises and strengthening industrial support are crucial. Talent cultivation should be integrated into development strategies, with full participation in the formulation of training programs and curriculum design, pushing corporate standards and job competency requirements into the classroom. Enterprises should provide access to computing resources, application scenarios, and anonymized data to universities, co-establish joint research centers, and conduct project-based and problem-solving education around technical challenges. Improving talent incentive pathways by establishing direct internship and employment programs, youth AI talent support plans, and achievement transformation reward mechanisms will create a sustainable ecosystem for talent cultivation, utilization, and development.

The competition in AI is fundamentally a competition for talent. By focusing on AI talent cultivation and collaboratively promoting the integrated development of educational and technological talents, China can gain strategic advantages and contribute significantly to its position in the new round of global technological competition.

Claude Blocks OpenClaw: Xiaomi's Rofo Li Weighs In

Tue, 07 Apr 2026 00:00:00 +0000

Claude Blocks OpenClaw

In recent months, OpenClaw has gained significant popularity, but during the recent Qingming holiday, some users faced unexpected challenges.

On April 5, AI giant Anthropic announced that its Claude model would no longer support third-party integrations, including OpenClaw. Users wishing to continue using the model must now opt for a pay-as-you-go plan, incurring additional costs.

Claude is integrated into Palantir’s “Maven Intelligence System” battlefield intelligence platform. Reports indicate that Claude analyzes confidential data from satellites, monitoring systems, and other intelligence sources to provide real-time target prioritization for military operations in Iran, thus elevating the model’s status.

However, the news of Claude blocking OpenClaw has led to widespread discontent on social media. This move means that thousands of users who rely on this powerful programming model will be forced into an expensive pay-as-you-go model, facing exorbitant computing bills. Boris Cherny, head of Claude Code, explained on social media platform X that the subscription service was not designed for third-party tool usage, and the ban was implemented to balance server resources for better sustainability.

Users seem unconvinced by this explanation, and discussions quickly shifted from technical aspects to commercial competition, with various conspiracy theories emerging, particularly the claim that the “father of OpenClaw” was poached.

The Ban’s Impact: OpenClaw Users React

The “father of OpenClaw” refers to developer Peter Steinberger, who initially created a tool called “ClawdBot” based on Claude, later renamed to OpenClaw at Anthropic’s request. Steinberger is well-acquainted with Claude’s ecosystem. Recently, he was recruited by Anthropic’s competitor, OpenAI, a timing that raises eyebrows given the proximity to the ban.

On February 14, Steinberger announced his move to OpenAI on his personal account.

Additionally, just two weeks prior, Anthropic had introduced the “Computer Use” capability for Claude, allowing users to have Claude operate their Mac computers, similar to OpenClaw’s functionality.

Connecting these events suggests a typical business strategy: if a competitor poaches my core developer, I replace their third-party tool with an official feature and cut off their subscription access, forcing users to either adopt my official solution or pay a high price. This narrative seems plausible and even satisfying.

However, Xiaomi’s MiMo model head, Rofo Li, believes the situation is more complex than mere commercial retaliation. On April 7, she published a detailed analysis on her social media account, sharing her insights on the incident in relation to Xiaomi’s recent “Token Plan” for resource allocation.

Rofo Li’s Analysis of Computing Costs

Rofo Li argues that Claude Code’s subscription model is a well-designed system for balancing resource allocation, but it may not be profitable and could even incur losses unless Claude’s API profit margins reach 10 to 20 times. She found that OpenClaw’s context management is poor, leading to multiple low-value tool calls per user query, each with long context windows (often exceeding 100,000 tokens), resulting in significant waste even with cache hits.

Moreover, many third-party tools compress raw data returned by tools every three steps when nearing the context limits of a large model, which requires recalculating due to changes in cached content, leading to low cache hit rates and increased costs and latency.

These factors combined result in actual request counts per query that exceed those of Claude Code’s framework by several times. In terms of API pricing, the actual costs could be dozens of times the subscription price, creating a significant gap. This means Anthropic is effectively subsidizing each OpenClaw user, and as OpenClaw grows in popularity, Anthropic’s losses increase.

“Short-term, these OpenClaw users may experience pain as costs soar by dozens of times,” Rofo Li stated. “But this pressure will drive improvements in context management, maximizing cache hit rates to reuse processed contexts and reduce wasted computing resources. Pain ultimately transforms into engineering standards.”

Rofo Li cautioned that large model companies should avoid blindly engaging in price wars until they find a way to design a non-loss-making pricing scheme. Selling computing power at extremely low prices while keeping doors open for third-party tools may seem appealing to users but is a trap, as it leads to low-quality tools and unstable, slow inference services. The end result is that users still accomplish nothing, which is unsustainable for both user experience and retention.

Currently, the global computing capacity cannot meet the demand generated by intelligent agents. Therefore, Rofo Li believes that the real solution is not cheaper computing power but collaborative evolution, combining more efficient intelligent tools with more efficient models. Anthropic’s actions, whether intentional or not, are pushing the entire industry ecosystem in this direction, which could be a good thing. The era of intelligent agents does not belong to those who consume the most computing power; it belongs to those who use it most wisely.

Ending Computing Anxiety with an Efficiency Revolution

Rofo Li’s analysis of the OpenClaw ban by Anthropic highlights the deep-rooted issue of computing waste in the current AI industry, reminiscent of the “DeepSeek moment” that shook the global AI sector in early 2025.

At that time, “computing anxiety” drove NVIDIA’s stock prices up, but the emergence of DeepSeek-R1, with incremental training costs of only $294,000, demonstrated that even with the approximately $6 million base model development cost, the overall expenses remained far below industry averages, while its performance was comparable to models developed by OpenAI at several hundred million dollars.

On March 27, Rofo Li shared the core value points of OpenClaw at a roundtable forum during the Zhongguancun Forum annual meeting.

Having participated in the development of DeepSeek, Rofo Li understands the significance of efficiency under structural innovation. DeepSeek’s success did not rely on the market’s hype of GPU clusters but on engineering implementations of sparse attention and maximizing cost-performance under limited computing power, achieving a synergy of algorithmic innovation and engineering optimization. As Rofo Li previously stated at the Zhongguancun Forum, the advantage of Chinese large model teams lies in pursuing maximum efficiency through structural innovation under low-end computing constraints.

However, as AI technology proliferates across various industries, technical bottlenecks have led to significant computing waste, and “computing anxiety” has resurfaced, severely driving up memory prices. The market’s default equation of “computing power = performance” is essentially an illusion of scarcity. The massive memory demands of computing chips have led many smartphone manufacturers to announce price hikes and delays in consumer-grade graphics cards, ultimately impacting ordinary consumers.

Taking a step back, DeepSeek has already provided feasibility, and this year, the rebirth of OpenClaw will undoubtedly begin with efficiency optimization.

Vibe Coding: Transforming the Role of Product Managers

Fri, 03 Apr 2026 00:00:00 +0000

Vibe Coding: Transforming the Role of Product Managers

Vibe Coding is reshaping the way product managers work, shifting from natural language-driven development to results-oriented evaluation and iterative-driven processes. This article delves into its core concepts, technological breakthroughs, mainstream tool selection, and how to integrate AI throughout the product development process, enabling PMs to transition from information conduits to builders, driving efficient decision-making with real pages.

In 2025, a tweet sparked heated discussions in the product community; by 2026, it had become a reality in workflows. This article systematically outlines the core concepts of Vibe Coding, its underlying technological logic, mainstream tool selection, and its substantial impact on the role of product managers—helping you understand the buzzwords and use the tools effectively.

01 Core Concept: Clarifying the Terms

In the past year, “Vibe Coding” has frequently appeared in discussions among product managers, but many remain unclear about what it is and what it can do. This is the starting point for understanding everything.

This definition has three key terms worth unpacking:

Natural Language Driven — No need to master any programming syntax; describe the desired functionality and effects in everyday language, and AI translates intentions into code.
Results-Oriented Evaluation — The role of humans shifts from “writing code” to “evaluating results.” You do not need to understand the code generated by AI; you only need to assess whether the output is correct and satisfactory.
Iterative-Driven Convergence — It is not about generating a complete product in one go, but rather continuously iterating through multiple rounds of natural language feedback to gradually approach the goal.

02 Underlying Drivers: Why 2025–2026?

The idea behind Vibe Coding is not new; it has been in its infancy since the era of GitHub Copilot. However, it truly became a productivity tool because three conditions matured simultaneously in 2025–2026.

Condition 1: Leap in Model Coding Capabilities

The authoritative benchmark for measuring AI programming capabilities is SWE-bench Verified—testing the model’s ability to solve real GitHub issues, which is much harder than simply writing runnable code. The latest data shows:

This means that models can now handle coding tasks in real complex projects, rather than just generating isolated code snippets. This is the prerequisite for Vibe Coding to be truly practical.

Condition 2: Toolchain Completes the “Last Mile” Packaging

AI being able to write code is one thing; enabling non-technical personnel to access a runnable product is another. The missing element was not the model’s capability, but the environment configuration, debugging, and deployment barriers—which kept 99% of PMs out. The new generation of tools emerging since 2025 has completely encapsulated these barriers, allowing PMs to obtain runnable pages without needing to understand any engineering infrastructure.

Condition 3: Non-Developers Become the Main User Group

Data from 2026 shows that 63% of active users of Vibe Coding tools are non-developers. Product managers, designers, and entrepreneurs have become the primary users of these tools—indicating that the ease of use has crossed the threshold of “only technical people can use it.”

03 Mainstream Tools: How to Choose and What Are the Differences

Current Vibe Coding tools on the market are clearly stratified; there is no “best one,” only the “most suitable for the current task.” Here are the core differences among mainstream tools:

04 Implementation Path: How PMs Can Integrate It into Real Workflows

Here is a workflow that has been successfully implemented in actual projects—from a requirements discussion meeting to initiating reviews with a clickable page, AI assists throughout the process, allowing one person to complete it within a week.

The essence is embedding AI into the entire delivery chain, rather than just using it to generate a screenshot:

1. Corpus Organization: Let AI Filter Meeting Noise

Directly feed the verbatim transcript or recording of the meeting to a large model, asking it to extract three categories of information: “real needs / pseudo-needs / items to confirm.” Clearly restrict AI from adding any content not mentioned in the meeting.

→ Key Point: This step is filtering, not generating. The value of AI lies in helping you extract effective signals from a large amount of colloquial information.

2. Requirement Structuring: Generate a PRD Framework in Four Parts

Use a fixed framework prompt to guide AI in outputting a four-part structure: requirement statement → solution alignment → feasibility discussion → priority sorting. After obtaining the framework, manually review and cross-check with the original corpus.

→ Key Point: AI excels at filling structures but struggles with assessing importance. Priority sorting must be decided by humans and cannot be entrusted to the model.

3. Function Breakdown: Generate a Development-Ready PRD

Feed the framework back to AI, adding user stories, acceptance criteria, and data field descriptions to produce a detailed PRD that engineers can start working on without further questions.

→ Key Point: The granularity standard is “no ambiguity on the development side,” not pursuing document length.

4. Vibe Coding: Turn Requirements into Clickable Real Pages

Combine the core path descriptions of the PRD into prompts and input them into Vibe Coding tools, iterating 2–3 rounds to generate a browser-runnable demo version. Tool selection: for complete full-stack options, choose Lovable (one-click deployment); for rapid multi-version output, choose Bolt (fastest, supports direct code conversion from Figma); for underlying models, recommend Claude Opus 4.5.

→ Key Point: The goal is to “enable the business side to make decisions based on tangible items,” not to deliver production code.

5. Business Review: Drive Decisions with Real Pages

Initiate reviews with the clickable page. Discussions no longer revolve around “what does this sentence mean,” but rather “is this button placed correctly”—enhancing both decision-making efficiency and quality.

→ Key Point: The value of the review lies not in “passing,” but in exposing all disagreements in front of the page, eliminating later rework.

05 Boundary Awareness: What Vibe Coding Cannot Do

Accurately understanding a technology requires knowing not only what it can do but also where its boundaries lie. Having excessive expectations or completely rejecting Vibe Coding are both cognitive biases.

06 Role Impact: How PM Work Styles Are Changing

Two sets of data illustrate the issue:

Change 1: The Role of PMs is Shifting from “Connectors” to “Builders”

In the past, the core value of PMs was alignment and coordination: translating business needs to designers and translating designs to engineers, acting as information conduits. Now, when PMs can independently run demo versions, they are no longer just “storytellers” in reviews but “people who come to the conversation with works.” Their authority and pace of advancement will undergo a qualitative change.

Change 2: The Quality Threshold for Requirement Reviews is Elevated

When the PM across the table comes to the review with a real clickable page, PMs who only bring written PRDs will clearly be at a disadvantage—the business side is increasingly accustomed to making decisions based on tangible items rather than relying on imagination to understand requirements. This change has become very evident in 2026.

Change 3: The Boundaries Between PMs and Engineers are Redefined

This is not about “PMs taking engineers’ jobs” but rather about redefining the work interface: PMs are responsible for the transition from ideas to demo-level products, while engineers handle the transition from demo-level to production-level. This enhances efficiency for both parties, rather than being a zero-sum game.

If you haven’t tried this process yet, it’s recommended to start with the smallest scenario: in the next iteration, for a new page, try running a version using Vibe Coding yourself and take it to the review. Observe the changes in decision-making efficiency.

You will find that many issues that were unclear in requirement meetings will clarify themselves in front of a real page.

Vibe Coding: 92% of Developers Use AI for Coding, Will Programmers Be Replaced?

Thu, 02 Apr 2026 00:00:00 +0000

Introduction to Vibe Coding

A new term is rapidly spreading in the tech community—Vibe Coding. The concept is simple: instead of writing code line by line, you describe your requirements to an AI, which generates, debugs, and even deploys the code for you. This sounds like science fiction? In the Winter 2025 batch of Y Combinator, 25% of startups had over 95% of their code generated by AI. This is not a trend; it’s reality. If you want to experience the practical performance of models like ChatGPT, Claude, and DeepSeek in programming scenarios, Kula (t.kulaai.cn) offers a multi-model aggregation platform for direct comparison of various models in code generation, debugging, and architecture design.

According to a recent report from CITIC Securities, Vibe Coding is defined as a “paradigm revolution in programming for the AI era,” predicting that it will create 3 million related jobs by 2030. Currently, 92% of American developers use AI coding tools daily, with AI generating 41% of code globally. Software development is undergoing the most profound paradigm shift since the birth of the internet.

Evolution of AI Programming Tools

The evolution of AI programming tools can be divided into three stages:

Stage One: Code Completion (2021-2023). GitHub Copilot pioneered this category. The core model is that you write a line, and AI completes it, essentially an advanced version of autocomplete. During this stage, AI programming assistants functioned more like a “super search engine,” retrieving the most likely next line from a vast pool of open-source code.

Stage Two: Conversational Programming (2024-2025). The rise of Cursor marked a turning point. Developers can converse with AI, describing their needs, and the AI generates code after understanding the entire project context. The Composer mode allows AI to modify multiple files simultaneously, while the Codebase index helps it understand project structure. AI evolved from a “completion tool” to a “pair programming partner.”

Stage Three: Intent Programming (2026 to present). The emergence of Claude Code has completely rewritten the rules. Developers no longer need to understand every detail of the code; they only need to describe “what result they want,” and the AI autonomously completes the entire process of requirement analysis, architecture design, code writing, testing, and bug fixing. This is Vibe Coding—humans are responsible for “intent,” while AI handles “implementation.”

The essential difference among these three stages is that the distance between humans and code is increasing, while the distance to “problem-solving” is decreasing.

Competition Among Four Major Tools

The current competitive landscape of AI programming tools can be summarized with four players, each occupying a different ecological niche:

GitHub Copilot: Efficiency Player. Relying on OpenAI’s model and GitHub’s code ecosystem, Copilot remains the fastest for daily code completion. Its $10/month personal plan is affordable, and deep integration with VS Code and JetBrains makes it a standard for teams. However, Copilot’s limitations are evident—it has limited context understanding and struggles with complex logic in large projects.

Cursor: Project-Level Operator. Cursor’s core competitiveness lies in its “global understanding.” The Codebase index allows it to comprehend the entire project structure, while the Composer mode enables cross-file collaborative modifications. When you need to refactor a legacy project with 500 files, Cursor’s value becomes apparent. However, its pricing of $20-40/month and the migration cost of an independent IDE make it more suitable for medium to large teams.

Claude Code: Deep Reasoning Expert. Claude Code is not a plugin but an AI agent that can run autonomously in the terminal. It can read and write files, execute shell commands, run tests, analyze error logs, and self-correct. For complex tasks like “help me identify performance bottlenecks in this microservice architecture and provide optimization solutions,” Claude Code’s deep reasoning capabilities are unmatched by other tools. The downside is its slower response time and high usage costs based on tokens.

Domestic and Open Source Alternatives: Windsurf / Cline / Cursor. In China, Tencent’s CodeBuddy extends AI capabilities from programming to project management and enterprise office scenarios, while Baidu’s Comate and Alibaba’s Tongyi Lingma continue to optimize in the Chinese programming context. The open-source community’s Cline offers developers a fully controllable AI programming agent solution.

These tools do not replace each other but form a “tool matrix”: use Copilot for daily completion, Cursor for project refactoring, Claude Code for deep challenges, and self-developed solutions for enterprise integration. The smartest developers are already using a mix of tools, switching based on task complexity.

AI Agents: From Writing Code to Autonomously Completing Projects

If Vibe Coding changes the way “humans write code,” AI agents are changing how “humans manage projects.”

The traditional development process involves: product managers writing PRDs → engineers breaking down tasks → writing code → testing → deployment → operations. Each step requires human intervention and coordination. The vision for AI agents is that you provide a high-level goal—such as “build a customer service system that supports real-time chat”—and the agent autonomously completes the entire process from technology selection, architecture design, code writing, testing, deployment, to operations.

Currently, this vision is far from fully realized, but progress is faster than most people expected. Claude Code can autonomously complete medium-complexity functional development tasks, and OpenAI’s Codex Agent performs well in independent project builds. The promotion of the MCP (Model Context Protocol) standardizes collaboration among different AI tools, allowing developers to combine various AI capabilities like building blocks.

Open-source frameworks like openclaw further lower the barriers to building AI agents. From automating office processes to intelligent customer service systems, from code review robots to autonomous operation agents, AI agents are moving from the lab to production environments. Gartner predicts that by 2028, at least 15% of daily work will be completed by AI agents.

Redefining the Role of Developers

This is the most sensitive and important topic: Will AI replace programmers?

The answer is: AI will not replace “programmers,” but it will replace “programmers who only write code.”

Consider the current capabilities of AI programming tools. They excel at generating boilerplate code, implementing common algorithms, writing unit tests, identifying known bug patterns, and refactoring regular code. These tasks are primarily the domain of junior developers.

What they struggle with are: understanding ambiguous business requirements and making reasonable technical decisions, weighing multiple constraints, designing long-term evolving system architectures, tackling unprecedented technical challenges, and communicating and coordinating with people. These are the core values of senior developers.

This means the competency model for developers is undergoing a fundamental transformation:

From “coding ability” to “problem definition ability”—you don’t need to write every line of code, but you need to accurately describe what you want and judge whether the AI’s output is correct.
From “single skill” to “AI tool orchestration ability”—knowing which tasks to assign to which tools and how to combine multiple tools to form an efficient workflow.
From “code craftsman” to “system architect”—higher-level design decisions and a global perspective become more valuable.
From “independent development” to “human-machine collaboration”—learning to pair program with AI and become a manager of AI teams.

The AIGC industry white paper introduces an interesting concept: “Context Engineering”—the ability to transform implicit knowledge within enterprises into structured context understandable by AI will become one of the core competencies of the future. Those who excel at this will be the most scarce “translators” in the AI era.

Beyond Programming: AI Tools Reshaping All Knowledge Work

The evolution of AI programming tools is, in fact, a microcosm of the broader trend of AI empowering knowledge work.

In the AI search domain, tools like Perplexity and DeepResearch are changing how people access information—from “keyword matching + manual filtering” to “natural language questioning + structured answers,” search engines are being replaced by “answer engines.” Over 60% of brand marketing leaders are unsure whether their products will appear in AI search results, and GEO (generative search engine optimization) is becoming the new battleground for marketing.

In the AI dialogue model field, models like ChatGPT, Claude, Gemini, DeepSeek, Tongyi Qianwen, and Kimi continue to enhance their capabilities in understanding complex instructions, generating high-quality text, and performing deep reasoning. The entry of new players like Xiaomi’s MiMo and Meituan’s LongCat signifies that large models have spread from pure AI companies to the entire tech industry.

In the AI content production sector, tools for AI novels, scripts, images, and music are maturing, allowing one person to accomplish what previously required a team for content creation.

These changes point to a trend: AI is evolving from a “tool” to “infrastructure,” much like electricity and the internet, no longer an optional means of efficiency enhancement but a fundamental prerequisite for work methods.

Who Will Be Eliminated and Who Can Seize Opportunities

Finally, some hard truths.

For individual developers, refusing to use AI programming tools is not “upholding tradition” but “actively giving up competitiveness.” With 92% of American developers using them daily, if you don’t, your productivity will lag significantly. However, blindly trusting AI-generated code and deploying it without review is also dangerous. AI-generated code may contain security vulnerabilities, logical errors, and performance issues; human judgment and review capabilities remain the last line of defense.

For entrepreneurs, the current window of opportunity lies in the deep integration of “AI + vertical scenarios.” The competition among general AI programming tools is fierce, but in specific industries (like financial compliance code, medical data analysis, game development), AI programming solutions combined with domain knowledge remain a blue ocean.

For technical team managers, the key decision is not “whether to use AI” but “how to establish workflows and quality control systems for the AI era.” Code reviews, security audits, and architectural decisions cannot only not be replaced by AI, but they also become more critical due to AI’s extensive involvement.

Vibe Coding is not the end; it is the beginning. In the next two to three years, we are likely to see AI agents evolve from “assisting programming” to “autonomously completing projects,” AI search evolve from “replacing search engines” to “replacing most research work,” and AI content tools evolve from “assisting creation” to “autonomous creation.”

Developers will not disappear, but the definition of “developer” will be completely rewritten. Those who adapt to this change will find their productivity amplified tenfold. Those who resist this change will find themselves left behind by the times.

The choice is in everyone’s hands.

Claude Acknowledges Excessive Charges, Users Report Up to 20x Overbilling

Wed, 01 Apr 2026 00:00:00 +0000

Claude Acknowledges Billing Issues

The issue of excessive charges in Claude Code is not an isolated experience. Following a wave of complaints from Reddit users about Claude Code’s excessive billing, Anthropic has finally responded:

We have noticed that users are reaching the usage limits in Claude Code much faster than expected. The team is urgently investigating this issue, which is currently our highest priority, and we will update you as soon as possible.

In short, there is a problem, and it is significant, and they are working on it.

Interestingly, many users do not perceive this as a genuine official response, but rather as a forced admission.

The Trigger

The situation escalated when a user reverse-engineered Claude Code and discovered two independent bugs, posting their findings on Reddit. These bugs can cause the prompt cache to fail, leading to inflated billing costs without the user’s awareness, potentially increasing costs by 10 to 20 times.

So, it’s not that users are overusing the service; rather, Claude Code is “secretly overcharging.”

Insufficient Plans

Tokens are often likened to utilities in the new era, but using them feels more like early phone bills or data packages: it seems like you haven’t used much, yet it’s never enough.

A subscription costing over $100 a month is already not cheap, and now it seems they are charging excessively. A simple greeting can consume 13% of the quota.

Working for just 11 minutes can exhaust 23% of the usage.

The most outrageous case reported is that a single prompt can consume 31% of the quota.

Even the highest-tier $200 monthly plan doesn’t fare much better, reaching its limit in just three and a half hours.

A user subscribed to Claude Pro (annual fee of $200) complained on Discord:

I usually hit my limit by Monday and have to wait until Saturday for it to reset… I can only use it for about 12 days out of 30.

At this rate, Claude Code has become nearly “unusable” in just a couple of days. For those who rely on AI for work, the inability to use it is more frustrating than the financial cost.

This issue isn’t unique to Anthropic; many users report that the major providers often exhaust monthly quotas within the first three weeks, even without heavy usage.

Systemic Problem

However, this time, Claude Code seems to have crossed a line—Reddit is flooded with complaints about this issue.

It’s clear now that the issue of “not being durable” is not just an individual experience but a systemic problem. It’s no wonder that Anthropic felt compelled to clarify the situation.

According to analysis by The Register, there may be three main reasons for this issue:

Last week, Anthropic announced that they would reduce quotas during peak times. This means that during peak hours, the same usage behavior corresponds to a lower available quota, making users feel like they are using it faster.
- Peak hours are from 08:00 to 14:00 ET on weekdays.
March 28 was also the last day of a promotional event for Claude. During non-peak hours, users could double their usage quota, but now that the promotion has ended, users’ quotas have reverted to normal levels, leading to a noticeable reduction in available usage.
The two bugs discovered by Reddit users:
- The sentinel replacement mechanism in the independent binary disrupts the billing logic when conversations involve it, breaking the cache.
- The resume parameter always leads to cache failure (since v2.1.69).

These issues mean that the prompt cache cannot function properly, resulting in repeated calculations for the same requests, which can inflate token consumption passively by over 10 times.

Some users have reported that downgrading to an older version has improved their experience significantly.

Downgrading to 2.1.34 made a noticeable difference. Have any users tried this? Let’s discuss.

Continuity Over Capability

In essence, the current issue with Claude is not about the strength of the model but whether users can rely on it consistently.

The better the model, the more it is used, and the more critical it becomes for workflows. While users may feel the pinch of rising costs, many are still willing to pay for high-quality responses, and the capabilities of A’s model are well recognized.

However, the problem lies in the rising consumption without an accompanying improvement in experience; costs are increasing, but user feedback is not visible, and ongoing delivery is lagging.

Users have pointed out that Anthropic’s customer service struggles with even the most basic token management.

More critically, users have been providing feedback for several days, and the bugs were identified two days ago, yet the response was only to say they were investigating.

In contrast, competitors like OpenClaw are making regular updates, often fixing issues overnight.

This raises a very real question: in the age of AI, the capability of the model may no longer be the rarest commodity. What is more scarce is the ability to deliver consistently, respond quickly to users, and take feedback seriously.

By the way, if Claude’s code is all AI-written, perhaps they should hire more people for customer support.

Vibe Coding Removed from App Store: What's Next?

Wed, 01 Apr 2026 00:00:00 +0000

Vibe Coding Removed from App Store: What’s Next?

In March 2023, Apple completely removed the Vibe Coding app, Anything, from the App Store, marking a significant setback for its survival in a closed ecosystem. This article delves into the core of this conflict—Apple’s Guideline 2.5.2 and its fundamental incompatibility with AI-generated code logic. As the platform insists on a static review framework, entrepreneurs are forced to make tough choices between web-based survival and migrating to Android. This situation represents not only a technical battle but also a real challenge to the monopolistic review power of app stores.

Anything’s co-founder and CEO, Dhruv Amin, stated that the app had previously helped users publish thousands of applications on the App Store, including management systems for emergency responders and reimbursement tracking tools designed for gig economy workers.

According to reports, prior to Anything’s removal, Apple had already imposed update freezes on similar applications like Replit and Bitrig, indicating a systematic tightening of the Vibe Coding category. Apple maintains that this action is merely enforcing existing rules to prevent apps from introducing new features without review. However, critics argue that this review framework, designed for static apps, is fundamentally incompatible with the underlying logic of AI-generated content.

Amin remarked, “This is the problem with Apple and closed platforms—either they are making a mistake, or they decide that your category is not allowed to exist.” He is currently evaluating a shift to Android, while other teams have already turned to pure web development. The future of Vibe Coding is becoming increasingly clear.

After Launching Thousands of Apps, Apple Suddenly Changes Course

Last August, Anything entered the market as a browser-based Vibe Coding tool. Vibe Coding allows individuals without programming experience to generate applications directly through AI—users describe their ideas, and the code is automatically produced. In November, Anything launched its iPhone client, and the App Store review team raised no objections, allowing it to be released smoothly.

In the following months, Anything continued to update, and users had published thousands of applications on the App Store through this tool. These included valuable products such as a management system for emergency responders and a reimbursement tracking tool for gig economy workers, demonstrating that Vibe Coding is not merely a toy-level technical experiment.

The turning point occurred in mid-December 2022 when Apple’s review team began rejecting every update submitted by Anything, citing violations of Guideline 2.5.2. This was less than two months after the iPhone version was launched. Amin attempted to compromise by moving the Vibe Coding preview feature from the app to the web browser to avoid controversy. Apple not only rejected this submission but also removed the entire app from the App Store in March 2023.

From initial approval and operation to update freezes and final removal, the entire process took less than six months. Before Anything’s app was officially removed, reports indicated that Apple had blocked updates for multiple Vibe Coding applications. Shortly thereafter, Anything faced a more thorough removal.

Meanwhile, Replit and Bitrig, also part of the Vibe Coding category, remain on the App Store but are similarly unable to update—Replit’s last update was in January, and Bitrig’s was in November 2022. Apple’s attitude towards this category reflects a systematic tightening.

Guideline 2.5.2: A Rule That Closes Off a Category

Apple’s sole reason for the removal was Guideline 2.5.2, which states that applications must be “self-contained within their installation package” and must not read or write data outside designated container areas, nor “download, install, or execute code that introduces or alters application features and functions.”

The original intention of 2.5.2 was to prevent developers from bypassing App Store reviews and silently pushing unreviewed feature changes on user devices. This logic is reasonable—within the context of mobile security, applications expanding permissions without review indeed need to be constrained. The problem arises when this rule is applied to the Vibe Coding category, as its reach far exceeds its original design intent.

The core mechanism of Vibe Coding tools is to generate and execute code at runtime via AI. Users describe their needs, and the model outputs logic, with the application presenting results in real time. This process naturally falls into the prohibited zone of 2.5.2—each generation is akin to pushing “unreviewed new features” to the device. In other words, as long as Vibe Coding remains Vibe Coding, it cannot operate on iPhone without violating this rule.

Apple’s statement is that the company is not targeting the Vibe Coding category but is merely enforcing existing rules to prevent applications from making substantive changes without review. While this explanation is technically sound, it sidesteps a critical question: why should a rule designed for static applications be applied to AI tools that generate dynamic content?

Anything attempted a compromise path by migrating the code preview feature to a web browser, allowing AI-generated content to be displayed without executing directly within the native app. The logic behind this solution is that the browser itself is a sandbox environment, circumventing the local code execution restrictions of 2.5.2. Apple rejected this submission and subsequently removed the entire app. This indicates that Apple is not only enforcing rules but also narrowing possible exceptions.

For other developers, the current enforcement of this rule creates a highly uncertain situation. Apps like Replit and Bitrig remain available but cannot update; some teams, like Vibecode, have proactively abandoned iPhone development in favor of pure web solutions. The same rule produces vastly different enforcement outcomes, and Apple has yet to provide clear boundary explanations.

The Cost of a Closed Platform: How Can Entrepreneurs Coexist with Apple?

After Anything was removed, Amin stated, “This is the problem with Apple and closed platforms—either they are making a mistake, or they decide that your category is not allowed to exist.” This statement highlights a structural dilemma that entrepreneurs face in platform ecosystems, which is rarely addressed.

In the mobile internet era, the App Store is the only legal channel to reach iPhone users. For consumer-facing applications, losing this entry point is almost equivalent to losing the entire market. Before its removal, Anything had accumulated thousands of user-published applications through this channel, establishing a real product ecosystem. All these assets lost visibility to iOS users the moment the app was removed.

The unpredictability of the timeline is even more challenging. The iPhone version of Anything passed the App Store review team’s formal approval at launch, only to face a freeze months later. Approval does not guarantee long-term compliance; the interpretation of platform rules remains solely in Apple’s hands and can be redefined at any time. For early-stage startups, this uncertainty is nearly impossible to hedge through any conventional business planning.

Faced with this situation, entrepreneurs have few options. Amin is currently evaluating whether to shift focus to the Android platform, which means rebuilding the product on a new tech stack while bearing the friction costs of user migration. Another option is to completely transition to the web, bypassing all native app store controls—Vibecode has already made this choice, abandoning iPhone development. Both paths mean sacrificing the established iOS user base, which comes at a real cost.

From a broader perspective, Apple’s handling of the Vibe Coding category exposes the adaptability issues between platform rules and emerging technologies. The existing App Store review framework is designed for static, functionally fixed native applications. As AI blurs the boundaries of applications, the original review logic begins to fail—but the cost of this failure is borne by developers.

Apple itself has profit considerations. Xcode has recently integrated Anthropic’s Claude and OpenAI’s Codex, launching AI programming assistance features for professional developers. The core value proposition of Vibe Coding tools is to enable non-professional users to build applications directly, bypassing professional tools like Xcode. This competitive relationship complicates the interpretation of Apple’s stance on this category.

The Future of Vibe Coding Is Not in the App Store

Amin’s judgment is worth highlighting: “The scale of Vibe Coding will far exceed Apple’s current imagination.”

The essence of Vibe Coding is to lower the barriers to software production. When someone without any programming background can describe their needs in natural language and receive a runnable application, software development transforms from a specialized skill into a tool accessible to ordinary people.

This shift in magnitude parallels the democratization of financial modeling through spreadsheets and website building through no-code tools, representing a paradigm shift of the same scale. The App Store’s blockade cannot change this direction; it can only influence where it lands.

Currently, the direction is becoming increasingly clear: the web. Vibecode’s choice is representative—abandoning the iPhone native side and focusing on browser-based product experiences. This path circumvents the App Store’s review controls, at the cost of sacrificing some native experience and distribution benefits. However, for tools like Vibe Coding, the core value lies in the generation capability itself, rather than platform nativeness—the web can sufficiently carry this value.

From a distribution logic perspective, a web-first strategy is more flexible in the current environment. Users can access directly through links without going through any app store review nodes, and the speed of product iteration is not constrained by third-party approval cycles. This aligns perfectly with the pace needed for AI-native products—models are rapidly evolving, and products must be updated in sync; any review friction can lead to competitive delays.

Regulatory variables are also worth noting. Apple’s systematic blockade of emerging AI tools has drawn the attention of antitrust observers. In the context of ongoing scrutiny of large platform behaviors by regulatory bodies in Europe and the U.S., whether Apple’s actions constitute improper exclusion of competitive development tools is a question still under discussion. If regulatory pressure ultimately forces Apple to open sideloading or relax review standards, there may still be a window of opportunity for Vibe Coding tools to return to iOS.

However, until that day arrives, the main battleground for this category has quietly shifted. Anything is evaluating Android, while other teams are betting on the web, and the entire industry’s focus is moving away from the App Store as a singular entry point. Apple’s blockade has, to some extent, accelerated the diversification of the Vibe Coding ecosystem—likely not the outcome Apple intended.

Understanding Claude Code, Codex, and OpenClaw

Mon, 30 Mar 2026 00:00:00 +0000

Understanding Claude Code, Codex, and OpenClaw

Recently, a friend who is an independent developer asked me, “Are you using Claude Code or Codex? I’ve been struggling to choose between the two for almost a week.”

I replied: “You’re confused about the wrong direction.”

These two are fundamentally different, and with the emergence of OpenClaw, the entire discussion has reached a new level.

In the past three months, these three tools have sparked intense discussions among developers, probably the most I’ve seen in my over ten years in this field. However, most discussions have remained at the level of “which is better,” without clarifying the fundamental differences between them.

This article aims to clarify this matter.

Conceptual Framework

Before discussing each tool, I want to emphasize that these three products do not belong to the same level; comparing them directly is as odd as comparing VS Code and Docker.

They correspond to three different layers in the AI productivity stack:

First Layer, Brain: The large language models themselves, such as Claude, GPT, and DeepSeek, responsible for understanding and reasoning.
Second Layer, Hand: Programming agents like Claude Code and Codex, which integrate the capabilities of large models into your codebase, responsible for executing specific development tasks.
Third Layer, Operating System: Agent runtime platforms like OpenClaw, which schedule multiple tools and models, manage long-term tasks, and run continuously.

In simpler terms: Claude Code and Codex are employees, while OpenClaw is the company. The former helps you write code, while the latter manages this group of AIs working for you.

Claude Code: The AI Engineer That Understands Your Codebase Best

Claude Code is a terminal-native programming agent launched by Anthropic in May 2025, developing faster than many anticipated. By early 2026, it had become the most widely used product in the AI programming tools market—an almost 1000-participant survey showed it had a 46% approval rate, while the second-ranked Cursor only had 19%.

What Did Claude Code Do Right?

Its core design decision prioritized “understanding the entire codebase” over simply “writing a runnable piece of code.”

For example, if you take over a chaotic Node.js project from two years ago with sparse documentation and complex dependencies, and you ask Claude Code to fix a login authentication bug, a typical AI assistant would modify the pasted code directly and provide you with a local patch. In contrast, Claude Code first reads the CLAUDE.md (your project’s rules configuration file), scans related files, and understands the upstream and downstream relationships of the authentication logic within the entire system before making changes. It knows how changes in one area might affect others.

This difference may not be apparent when handling simple functions, but it becomes significant when dealing with real projects.

Subagents + Checkpoint: Two Key Features to Note

In the second half of 2025, Claude Code introduced two important mechanisms: Subagents and Checkpoint.

Subagents allow a complex task to be divided among multiple specialized AI instances for parallel execution. For instance, when refactoring an authentication module, one Subagent handles database migration, another modifies API routes, and a third manages frontend state changes, while the main Agent coordinates and integrates the results. Each Subagent has an independent context window, allowing up to 10 to run simultaneously without interference.

Checkpoint addresses another concern: the fear that AI might break the code. It automatically archives the current state before each modification, allowing you to revert to any historical point using the Esc Esc or /rewind command. With this safety mechanism, you can confidently assign larger and more complex tasks to it.

A Practical Detail

The CLAUDE.md file is often overlooked but is crucial. You can write the project’s tech stack version, prohibited libraries, database schema summaries, and code style rules in it. Statistics show that a well-written CLAUDE.md can reduce about 80% of the “Claude forgot” issues.

Use Cases

Claude Code is best suited for quickly getting up to speed with unfamiliar codebases, handling complex bugs across multiple files, performing systematic refactoring, and development tasks that require AI to truly understand your project’s overall structure rather than just executing local commands.

It offers comprehensive access methods: Terminal CLI, VS Code plugin (Beta version released by the end of 2025), web interface, and desktop app. Subscribing to Claude Pro (starting at $20/month) allows usage, and enterprise users can also deploy it privately via Bedrock or Vertex AI.

Codex: Taking Task Outsourcing to Another Level

In 2025, OpenAI launched Codex in April (not the previous code completion model, but a new software engineering agent) and subsequently released a macOS desktop app by the end of the year, with Windows versions following in 2026.

Fundamental Differences in How Codex and Claude Code Work

Claude Code operates on a “human-machine collaboration” model: you supervise its work in real-time, reviewing each step and adjusting directions as needed. This is a co-pilot mode where the human is in charge.

Codex, on the other hand, is about “task outsourcing”: you clearly describe a task, and it executes it autonomously in an isolated sandbox environment, returning results and a PR for your review. You don’t need to monitor it continuously.

This difference significantly impacts actual workflows. Codex is suitable for tasks where you know what needs to be done but don’t want to spend energy supervising each step. For example, you can say, “Help me complete unit tests for this module” or “Help me migrate the calling method of this old interface to the new version,” then move on to other tasks and return later to check the results.

Parallelism is Codex’s Core Advantage

Codex supports genuine multi-task parallelism: multiple Agent instances work in independent cloud sandboxes, each pre-installed with your codebase and development environment. If you have five independent tasks, you can start five Agents to process them simultaneously instead of queuing them.

The desktop app’s design philosophy is that of a “command center”: the left side displays the project list, while the right side shows all running Agent threads, allowing you to switch between tasks, check progress, and comment or manually modify in the diff view.

Safety Design is Another Priority for Codex

By default, Codex’s sandbox disables external network access, and file modifications are restricted to specified directories. This design is intentional—isolated execution and presenting results after completion is much safer than operating directly on your local environment. However, for tasks requiring internet access, network permissions can be manually enabled.

Additionally, Codex includes a code review feature that can automatically review your PRs directly on GitHub, acting like an asynchronous code reviewer.

Open Source CLI Version of Codex

If you want to run Codex in a local terminal, there is a fully open-source CLI version written in Rust, supporting npm and Homebrew installations, allowing configuration of local models (including Ollama) and MCP access to external tools. Its core logic is consistent with cloud Codex but is better suited for developers wanting complete control over the execution environment.

Use Cases

Codex is suitable for clear, well-defined development tasks (writing features, fixing bugs, writing tests); for those who wish to free their hands and wait for results asynchronously; for scenarios requiring multi-task parallelism; and for teams already deeply integrated into the ChatGPT ecosystem (account interoperability without requiring additional registration).

A ChatGPT Plus subscription ($20/month) includes Codex usage credits.

OpenClaw: Not a Tool, But an Operating System for Running AI

OpenClaw is the most difficult to define and the easiest to misunderstand among the three.

It is an open-source project released by Austrian developer Peter Steinberger in November 2025 under the name Clawdbot. After its release, it went viral, surpassing 240,000 GitHub Stars within two months, becoming one of the fastest-growing projects in GitHub history (without exception, surpassing React). It was later renamed Moltbot due to a trademark complaint from Anthropic, and after Steinberger felt the name was “too awkward to pronounce,” it was changed to OpenClaw three days later.

In February this year, Steinberger announced his joining OpenAI, and the project was handed over to the open-source foundation for continued maintenance.

What Exactly is OpenClaw?

In one sentence: it is a system that allows AI to continuously work for you.

It runs locally, connects to your chosen large language models (Claude, GPT, DeepSeek, local Ollama, etc.), and integrates this AI into over 20 messaging platforms like WhatsApp, Telegram, Slack, Discord, and iMessage. You send a message to the AI, and it executes tasks—reading files, running scripts, controlling browsers, sending emails, managing calendars, monitoring servers, etc.

The fundamental difference from Claude Code and Codex is that it is not a tool that works only when your computer is on and you are staring at the screen. You can set up a Mac Mini at home to run OpenClaw 24/7 and send messages to it from anywhere via your phone to have it help you with tasks.

Four Core Components

OpenClaw’s architecture consists of four parts:

Gateway: The entry point for receiving messages and distributing commands.
Agent: The core that executes specific tasks.
Skills: Expandable capability modules, with thousands available in the community-maintained ClawHub marketplace.
Memory: Persistent user preferences, project information, and historical context across sessions.

The Skills system is the most interesting part. You can install Skills written by others to extend the AI’s capabilities or write your own. The community has Skills for handling Solana wallets, automatically posting to Instagram, monitoring GitHub Actions, and more.

Why Many People Struggle to Use It

OpenClaw has higher requirements for users; it is not a tool that you can just install and use.

The most common mistake is throwing a vague task at the AI, such as “help me manage my work.” The AI does not know what that means. The correct way to use OpenClaw is to clearly design the workflow—what the trigger conditions are, what steps to execute, and how to provide feedback on the results—and then configure this process.

Another barrier is the design of Skills. Good Skills are atomic and have single responsibilities; many beginners mix too much logic in their Skills, making it difficult to troubleshoot when issues arise.

OpenClaw’s maintainer, Shadow, once said on Discord, “If you don’t know how to run commands in the command line, this project is already too dangerous for you to use safely.” This statement is very straightforward, but it’s true.

Security Issues: A Necessary Discussion

The biggest controversy surrounding OpenClaw in recent months has been security issues.

After its launch in November last year, the first critical vulnerability (CVE-2026-25253, CVSS score 8.8) was discovered in January this year—an attacker could induce you to visit a malicious webpage, allowing JavaScript to connect to your local OpenClaw gateway via WebSocket, stealing authentication tokens and gaining complete control over your entire Agent, including disabling the sandbox and executing arbitrary commands.

In the following weeks, several other CVEs were disclosed, involving command injection, path traversal, Webhook authentication bypass, and more. The ClawHub Skills marketplace also found hundreds of malicious skill packages disguised as legitimate tools, executing data theft or installing keyloggers in the background.

Security research institutions have scanned and found that at one point, over 130,000 OpenClaw instances were directly exposed on the public internet, most of which had no authentication configured. The Ministry of Industry and Information Technology of China also issued a security warning in March this year, urging government agencies and state-owned banks to limit usage.

Currently, the recommended minimum secure version is 2026.2.26; if you are still running earlier versions, please update immediately.

It is important to clarify: these security issues do not imply that OpenClaw’s core product concept is flawed. The root of the problem lies in the combination of “great capabilities, loose default configurations, and rapid deployment”—any system with superuser privileges will encounter issues if it defaults to no authentication and unrestricted access. The team’s response speed has been quite fast, with most CVEs patched within 24 hours of disclosure.

However, this also indicates that OpenClaw is not suitable for casual installation and use. If you plan to deploy it in production, you need to seriously enhance security.

Use Cases

OpenClaw is suitable for technically capable users who can configure and maintain it for: 24/7 automation tasks (monitoring alerts, scheduled inspections, automatic daily reports); cross-platform message-triggered workflows; personal automation assistants (remotely controlling local servers via phone messages); and model-agnostic scenarios (wanting to choose models and retain data sovereignty).

It is free under the MIT license, but costs for running local models or calling cloud APIs are borne by the user, with light usage costing about $10-30/month.

Comparison Summary: A Table of Differences

Claude Code	Codex	OpenClaw
Positioning	AI programming agent	Automated programming engine
Working Method	Human-machine collaboration, you supervise	Task outsourcing, wait asynchronously
Main Interface	Terminal/IDE/Web	Terminal/Desktop App/IDE
Codebase Understanding	Strong	Strong
Parallel Capability	Subagents, up to 10	Multiple sandboxes in parallel, no hard limit
Open Source	No	CLI partially open source
Security Maturity	High	High
Learning Curve	Medium	Medium-high

Which One Should You Use?

If you are a coding developer, Claude Code is the top choice. It has the deepest understanding of codebases, is the easiest to get started with, and integrates best with daily development workflows.

If you have a bunch of well-defined development tasks, such as writing a batch of tests or migrating old interfaces, and you don’t want to supervise the process, Codex is the more suitable option. Asynchronous, parallel, and freeing your hands is its core value.

If you want AI to do a wider range of tasks for you—not just coding, but also automating operations, scheduled tasks, and cross-system collaboration, and you have the ability to ensure security—OpenClaw is the only option. It represents a different way of working with AI: not you using AI, but AI continuously working for you.

If you want to play with advanced combinations, there is a mature approach: use OpenClaw as the scheduling layer, triggering tasks that call Claude Code or Codex to execute specific programming tasks, and then have OpenClaw summarize results and send notifications. This is a true AI Agent architecture, with each of the three layers performing its role.

Final Thoughts

I have observed many developers stumbling over these three tools, and the most common issue is not choosing the wrong tool, but choosing the wrong level of usage.

Using Claude Code to “automatically manage all work”—that is what OpenClaw is designed to do; Claude Code is not intended for that. Using OpenClaw for simple bug fixes—where the complexity of configuration is not worth it—can be handled by Claude Code in two minutes.

Tools are not superior or inferior; they only fit different needs. Choosing the right level and using the right scenarios is the true path to efficiency improvement.

These three products represent not just three tools but three depths of AI involvement in development work: coding assistance, task hosting, and continuous autonomy. Where you currently stand depends on how much trust you are willing to place in AI and how much time and capability you have to manage it.

It’s not you using AI; the future is you managing a group of AIs. This shift is happening, and all three tools are early samples of this process.

Understanding Artificial Intelligence: Core Capabilities and Applications

Sun, 29 Mar 2026 00:00:00 +0000

What is Artificial Intelligence?

Artificial Intelligence (AI) is a core branch of computer science aimed at enabling machines to simulate, extend, or even surpass human intelligence. The goal is to allow machines to autonomously complete complex tasks that typically require human intelligence.

AI is not a single technology but a system that integrates algorithms, data, and computing power. Its core lies in granting machines the abilities of learning, reasoning, perception, and decision-making, transforming them from mere tools executing commands to intelligent agents that can adapt to environments and solve problems.

Core Essence of AI: Simulating Human Intelligence

The essence of AI is not about making machines look like humans but about endowing them with key characteristics of human intelligence, centered around four main capabilities:

1. Learning Ability: Autonomous Pattern Recognition from Data

This is the most fundamental capability of AI, distinguishing it from traditional programs that execute fixed rules. AI can autonomously identify hidden patterns through extensive data training, rather than relying on pre-written instructions.

Example: Traditional programs require predefined characteristics to recognize a cat (e.g., pointed ears, whiskers, tail). In contrast, AI can learn to identify a cat by analyzing thousands of images without prior definitions.
Typical Applications: Recommendation systems (e.g., Douyin, Taobao) and spam filtering.

2. Reasoning and Decision-Making Ability: Solving Complex Problems Based on Patterns

Once AI understands patterns, it can perform logical reasoning, analysis, and ultimately make decisions, rather than mechanically executing steps.

Example: Medical AI analyzes CT scans and lab reports, combining them with medical databases to infer possible conditions and provide diagnostic suggestions. Autonomous driving AI assesses road conditions (traffic lights, pedestrians, vehicles) to decide whether to accelerate, brake, or turn.
Core Logic: Deriving unknown results from known data, simulating the human process of thinking and decision-making.

3. Perception Ability: Equipping Machines with Sensory Understanding

AI utilizes sensors, cameras, and microphones to perceive the external world, translating physical signals into information that machines can understand.

Examples:
- Computer Vision: Enables machines to interpret images and videos (e.g., facial recognition, security monitoring).
- Speech Recognition: Allows machines to understand human speech (e.g., Siri, Xiaoyi).
- Sensor Perception: Industrial robots use sensors to detect the position and temperature of objects, adjusting operational precision.

4. Adaptive and Evolutionary Ability: Dynamically Adjusting Behavior Based on Environment

Advanced AI continuously optimizes itself based on new data and environments, rather than remaining static. For instance, navigation software adjusts routes in real-time to avoid traffic congestion, demonstrating adaptive capability.

Example: AlphaGo not only learns human chess strategies but also evolves through self-play, eventually defeating top human players. Recommendation systems adjust content based on new user preferences, becoming increasingly attuned to individual tastes.

Core Technologies Supporting AI: The Three Pillars

The realization of the aforementioned capabilities relies on the synergistic functioning of three core technologies:

1. Algorithms: The Brain of AI

Algorithms form the core logic of AI, akin to human thought processes, with different types addressing various problems:

Machine Learning: A general method for enabling machines to learn from data, focusing on pattern recognition rather than hard-coded rules.
Deep Learning: A subset of machine learning that simulates the neural network structure of the human brain, capable of processing complex data (e.g., images, videos, speech).
Natural Language Processing: Algorithms that enable machines to understand and generate human language, addressing human-computer communication.
Computer Vision: Algorithms that allow machines to interpret images and videos, solving the problem of how machines perceive the world.

2. Data: The Fuel of AI

AI learning depends on vast amounts of data; the more data available and the higher its quality, the more accurate the patterns AI can identify. Without data, even the most advanced algorithms are ineffective, similar to how humans require reading and practical experience to learn.

Example: Speech recognition AI needs to analyze hundreds of thousands of hours of human speech to accurately recognize various accents and speaking speeds. Autonomous driving AI requires billions of kilometers of road data to learn how to handle complex scenarios.

3. Computing Power: The Engine of AI

AI training and reasoning require substantial computational power, especially deep learning algorithms, which involve massive matrix operations. Ordinary computers lack the necessary power, necessitating specialized hardware support, such as:

GPU (Graphics Processing Unit): Originally used for gaming graphics, GPUs excel in parallel computing and have become essential for AI training.
TPU (Tensor Processing Unit): A chip designed by Google specifically for deep learning, offering higher computational efficiency than GPUs.
Cloud Computing: Businesses and individuals can leverage cloud resources for AI model training without needing to invest in expensive hardware.

Common Applications of AI: Integrating into Daily Life

AI is no longer a concept confined to science fiction; it permeates various aspects of our daily lives and work. Here are some of the most common applications:

1. Consumer Applications: High-Frequency Daily Interactions

Smart Assistants: Siri, Xiaoyi, and Huawei’s Xiao Yi can understand voice commands to check the weather, set alarms, and send messages, fundamentally relying on speech recognition and natural language processing.
Content Recommendation: Platforms like Douyin, Taobao, and Bilibili use AI algorithms to recommend content based on your browsing and liking history, powered by machine learning.
Image Processing: Smartphones use AI for beautification, filters, and portrait modes, automatically recognizing faces and optimizing skin tones.
Smart Translation: Services like Baidu Translate and DeepL can quickly translate dozens of languages, often retaining the tone of the original text, thanks to natural language processing.

2. Industry Applications: Empowering Industrial Upgrades

Healthcare: AI-assisted diagnostics can rapidly analyze CT scans and pathology reports, helping doctors detect early-stage cancers and pneumonia, improving diagnostic efficiency and accuracy.
Autonomous Driving: Tesla, Xpeng, and Huawei’s autonomous driving systems use cameras and radar to perceive road conditions, making real-time decisions for tasks like following cars, changing lanes, and parking.
Industrial Production: AI-enabled industrial robots can achieve precise sorting, welding, and quality inspection, even predicting equipment failures to enhance production efficiency.
Financial Services: AI aids in risk control by analyzing consumer and credit data to assess loan risks and detect credit card fraud and financial scams.
Education: AI-powered personalized tutoring can suggest tailored exercises and explanations based on students’ learning progress, as seen in platforms like Yuanfudao and Zuoyebang.

3. Frontier Exploration: Pushing the Boundaries of Human Capability

AI in Research: AlphaFold solved the protein folding problem, aiding scientists in understanding disease mechanisms and developing new drugs.
AI in Creation: Tools like MidJourney and Stable Diffusion generate images from text, while iFlytek’s Starfire can write articles, code, and poetry, facilitating AI-assisted creativity.
AI in Exploration: AI analyzes cosmic and oceanic data, helping humanity explore unknown territories, such as searching for extraterrestrial signals and monitoring deep-sea ecosystems.

Key Classifications of AI: Development Path from Weak to Strong

AI development is distinctly categorized into stages, primarily based on its capabilities from weak to strong. Currently, we are still in the weak AI phase:

1. Weak AI

Definition: AI focused on specific tasks, lacking general cognitive abilities and self-awareness.
Characteristics: Excels in a particular domain but cannot transfer knowledge across domains. For example, AlphaGo can play Go but cannot write articles; an image recognition AI cannot drive.
Current Status: All existing AI applications fall under weak AI, including Siri, autonomous driving, and AI art generation.

2. Strong AI

Definition: AI with general intelligence comparable to humans, capable of understanding and learning knowledge across various fields, thinking flexibly, and potentially possessing self-awareness and emotions.
Characteristics: Can transfer knowledge across domains, such as coding, medical diagnosis, and music creation, akin to human intelligence.
Current Status: Still in the theoretical exploration stage, not yet realized, and remains a long-term goal in AI research.

3. Superintelligent AI

Definition: AI that surpasses human capabilities in nearly all domains, including scientific innovation, social skills, and artistic creation, potentially reaching intelligence levels beyond human comprehension.
Characteristics: Capable of solving complex issues like climate change and diseases, which humans struggle with, but may also pose potential risks.
Current Status: A topic of science fiction and futurism, lacking a technological foundation and primarily a speculative concept for the future.

Core Boundaries of AI: Limitations and Misconceptions

Many misconceptions exist about AI, with some believing it can think and feel like humans or even replace them. In reality, AI has fundamental limitations:

1. AI Lacks Self-Awareness and Emotions

All AI actions are based on algorithms and data; they do not possess self-awareness or emotional understanding. For instance, AI can generate sad text but does not experience sadness; it can recognize angry expressions but does not comprehend the meaning of anger.

2. AI Relies on Data and Lacks True Creativity

AI’s creativity is fundamentally a reorganization of existing data, not genuine originality. For example, AI-generated art is based on vast image datasets and cannot create entirely new artistic styles based on life experiences and emotions like human artists can. Similarly, AI-written articles are structured based on existing content and cannot produce genuinely profound original insights.

3. AI Decisions Are Based on Probability, Not Understanding

AI decisions rely on probability distributions from data rather than true comprehension. For instance, a medical AI diagnosing cancer does so by comparing a patient’s data to that of numerous cancer patients, identifying similar features, rather than understanding the underlying pathology as a doctor would.

4. AI Capabilities Are Highly Contextual and Data-Dependent

AI can only perform effectively within trained scenarios; if a situation exceeds its training, it may fail. For example, an autonomous driving AI trained in clear weather may struggle in extreme weather conditions like heavy rain or snow. Similarly, a speech recognition AI may accurately understand standard Mandarin but struggle with dialects or heavy accents.

Conclusion: AI as a Tool to Empower Humanity

The essence of artificial intelligence is not to replace humans but to extend human capabilities, helping solve complex, repetitive, and high-risk problems, allowing humans to focus on innovation, emotions, and decision-making.

From a Technical Perspective: AI combines algorithms, data, and computing power, primarily enabling machines to learn, reason, and perceive.
From an Application Perspective: AI serves as a tool to empower various industries, enhancing efficiency, reducing costs, and pushing the boundaries of human capabilities.
From a Development Stage Perspective: We are still in the weak AI phase, with strong and superintelligent AI as long-term goals, indicating a long journey ahead.

In simple terms, artificial intelligence aims to equip machines with human-like intelligence to assist in tasks that typically require human thought and action, ultimately serving human life and societal development.

last30days-skill: Streamline AI Community Insights

Thu, 26 Mar 2026 00:00:00 +0000

What is last30days-skill?

last30days-skill is a rapidly emerging open-source AI research skill on GitHub, created by developer mvanhorn. The project can be found at GitHub Repository.

In essence, this is a “skill module” that runs on Claude Code or a compatible AI Agent framework. You can ask it about a topic (e.g., “Best Prompts for Claude Code” or “Cursor vs Windsurf”), and it will automatically scrape discussions from over 10 platforms, including Reddit, X, Hacker News, YouTube, TikTok, Bluesky, and Polymarket, from the past 30 days, synthesizing a comprehensive intelligence report complete with sources, ratings, and usable prompts.

The core logic of the project is that real community discussions are closer to the “truth” than official documentation. While official documentation tells you what a tool can do, posts on Reddit and HN reveal where the pitfalls are and what the truly effective practices are.

The project is licensed under the MIT open-source license and is completely free, but you need to configure your own APIs. Initially covering only Reddit and X, it now supports over 10 sources and has integrated predictive markets (Polymarket), evolving rapidly.

Community Popularity and Project Scale

The growth rate of this project has surprised me. Here are some numbers:

Metric	Data	Notes
GitHub Stars	9,300+	As of March 2026, still rapidly growing
First Version	Late 2025	From 0 to 9k stars in a short time
Supported Platforms	10+	Reddit/X/HN/YouTube/TikTok, etc.
Primary Language	Python (98.9%)	Shell support
Open Source License	MIT	Fully commercializable

The speed from 1.5k stars to over 9k is quite impressive for open-source AI tools. Community feedback has been largely positive—most discussions in GitHub Issues are about feature requests rather than bug complaints, indicating a solid foundation.

However, as for revenue, this is a purely open-source project, and the author’s monetization is non-existent. The costs incurred from using it mainly come from the APIs you configure.

Core Features

Multi-Platform Parallel Search

It performs simultaneous searches across Reddit, X (Twitter), Hacker News, Polymarket, YouTube, TikTok, Instagram, Bluesky, and more than 10 platforms in one query. This is true parallel scraping, not sequential searching.

Two-Stage Intelligent Search Architecture

The first stage is “broad discovery”—using wide-ranging keywords to cast a wide net; the second stage is “intelligent supplementary search”—identifying subtopics worth digging deeper into based on the first stage’s results and adding precise queries. This design ensures a more comprehensive coverage of results.

Composite Scoring Model

All scraped results go through a scoring pipeline:

Text relevance (35%)
Interaction heat (25%)
Source authority (20%)
Cross-platform convergence (10%)—same topic appearing on multiple platforms gets extra weight
Timeliness decay (10%)—newer content has higher weight

Predictive Market Integration

This feature impressed me. It integrates with Polymarket’s Gamma API to pull in “real money betting prediction data.” When most people are betting real money on the success of an AI tool, this signal is much more credible than general post discussions.

Comparative Analysis Mode

Supports X vs Y style queries, such as /last30days cursor vs windsurf, generating a side-by-side comparison report that includes the pros and cons, community sentiment, and discussion trends. This is very useful for competitive research.

Dashboard and Automatic Monitoring

You can set up topic monitoring lists for periodic automatic re-research, suitable for users needing to continuously track a specific area.

Automatic Archiving

Each run’s results are automatically saved to a local document library for easy historical reference.

Target Audience

User Type	Recommendation Index	Use Case
AI Content Creators/Bloggers	Quickly grasp community sentiment on an AI tool, with verifiable sources
Developers/Technical Decision Makers	Competitive research before technical selection, understanding real community feedback
Product Managers/Competitive Analysts	Quickly understand market sentiment on competitors, saving manual tracking time
Prompt Engineers	Find community-validated optimal prompts without repeating pitfalls
Researchers/Analysts	Quickly obtain community viewpoints with sources to assist in report writing
General Users	High configuration threshold; not recommended for non-technical users

Honestly, this tool has a clear “technical user” attribute. If you don’t understand command lines or can’t configure API keys, getting started can be quite painful. However, if you are a developer or content creator who frequently needs to conduct research, it is definitely worth spending an hour to set it up.

Application Scenarios

1. AI Tool Research (Most Common)

Want to know how the community is rating “Cursor” recently or how “DeepSeek” is viewed on Reddit? One command can do it:

/last30days DeepSeek R2 for coding

In 5 minutes, you’ll receive a comprehensive report with source links, key discussion points, and actual usage tips.

2. Technical Selection Comparison

/last30days claude code vs cursor vs github copilot

This will pull recent discussions of the three tools across platforms, automatically generating a comparison report to present directly in technical selection meetings.

3. Content Creation Topic Selection

As an AI blogger, I now use it to “find topics.” I check what the AI community has been discussing over the last 30 days, and which topics have high cross-platform convergence, then I write about those. This is much more efficient than mindlessly scrolling through information feeds.

4. Best Practices for Prompts Collection

/last30days best system prompts for claude code

It will help you find community-validated prompt patterns from Reddit, HN, and X, organized into a directly reusable format.

5. Competitor Sentiment Monitoring

For startup teams, running /last30days [competitor name] regularly can quickly reveal the latest complaints and praises from users about competitors, guiding product optimization direction.

Differences from Similar Tools

Currently, there are many tools that provide similar “real-time information aggregation + AI summarization,” but last30days has several unique aspects:

Feature	last30days-skill	Perplexity	Exa.ai	BrightData
Positioning	Open-source agent skill	Commercial AI search	Commercial semantic search	Enterprise data platform
Platform Coverage	10+ (including social media)	Primarily web pages	Primarily web pages	Customizable
Polymarket Integration	Yes	No	No	No
Local Deployment	Yes	No	No	No
Cost	Open-source free (API paid)	Subscription $20+/month	Subscription	Enterprise pricing
Output Format	Structured reports + Prompts	Answers + Links	Semantic search results	Raw data

Core Differences:

vs Perplexity: Perplexity is better for quick Q&A, while last30days excels in community intelligence gathering; Perplexity lacks deep coverage of social media like Reddit/TikTok and does not integrate Polymarket.
vs Manual Search: This is the biggest competitor. Manually checking Reddit/HN/X takes 2-3 hours; using last30days takes 5-10 minutes.

Usage Tips

Tip 1: Configure SKILL.md for lifelong benefits

In the .claude/skills/last30days/ directory, there is a SKILL.md where you can preset your common research preferences (technical/community/business). Configure it once for more precise results in future queries.

Tip 2: Use --deep mode for important research

The default mode balances speed and depth. If you are conducting important technical selection or competitive analysis, add the --deep parameter to run an additional round of supplementary searches, significantly improving result quality, though it will increase the time from 5 minutes to 10-15 minutes.

Tip 3: Cross-platform convergence is the strongest signal

The output report includes a “cross-platform convergence” metric. When the same viewpoint appears on Reddit, HN, and X simultaneously, the confidence level is highest. Focus on these high-convergence conclusions rather than single hot posts from one platform.

Tip 4: Use the dashboard feature for continuous monitoring

If you have several topics you want to keep an eye on (e.g., “our competitors’ dynamics”), set them in the monitoring list and run them regularly with CI/CD to automatically generate daily intelligence reports without manual operation.

Tip 5: Use --emit=json to integrate with your knowledge base

Output in JSON format, then write a simple script to push the results to Notion, Feishu, Obsidian, etc., building your own AI community intelligence archive that becomes more valuable as you accumulate data.

Value for Enterprises and Individuals

For Enterprise Users:

Value Point	Description
Automated Competitor Monitoring	Daily automatic tracking of competitor community sentiment, eliminating reliance on manual research
Data-Driven Technical Selection	Community-validated data supports technical decisions, reducing risks
Market Intelligence Efficiency	Saves analysts significant manual information collection time
Empowering Content Teams	Community data supports content topic selection and prompt optimization

For Individual Users:

Value Point	Description
Remedy for Information Anxiety	Reduces daily information feed scrolling time from 2 hours to 10 minutes
High-Quality Writing Material	Each conclusion comes with original source links, ensuring verifiable citations
Community Awareness in Tech Circles	Quickly grasp the latest community trends even without active participation
Best Practices for AI Tools	Avoid pitfalls by using community-validated prompts directly

A developer shared that using last30days for technical selection compressed two days of competitive research into half a day, with more comprehensive conclusions due to coverage of platforms that previously lacked time for manual checks. This efficiency boost is tangible.

Cost of Use

The project itself is completely free, but you will need to pay for the following services:

Cost Item	Required	Estimated Cost	Description
ScrapeCreators API	Yes	$20-50/month	Core dependency for Reddit/TikTok/Instagram searches
Claude API / OpenAI API	Yes	Pay-per-use	AI synthesis capability, approx. $0.05-0.2 per query
X (Twitter) API	Optional	Free/Paid	Basic version is free; advanced searches require payment
Polymarket Integration	Optional	Free	Gamma API is free to use

Overall Cost Estimation:

Light Users (1-3 queries daily): approx. $30-50/month
Moderate Users (5-10 queries daily): approx. $80-150/month
Heavy/Enterprise Users: assess specific API usage

Comparison with Commercial Tools: Similar commercial intelligence tools (like Brandwatch, Mention, etc.) can easily exceed $500+/month, making last30days a cost-effective option. Of course, there are differences in functional depth, mainly depending on your needs.

Official Website and Installation Links

Resource	Address
GitHub Repository	GitHub Repository
Quick Install (Claude Code Plugin)	`/plugin marketplace add mvanhorn/last30days-skill`
Manual Install (Git Clone)	`git clone https://github.com/mvanhorn/last30days-skill.git ~/.claude/skills/last30days`
Official Documentation (SPEC.md)	See GitHub repository root directory
Community Discussion	GitHub Issues / Hacker News / Reddit r/ClaudeAI

Minimum Environment Requirements:

Node.js 18+
Python 3.10+
Claude Code or compatible AI Agent framework

Overall Evaluation

last30days-skill addresses a real pain point: in the information explosion of the AI era, how to efficiently extract valuable signals from noisy community discussions. It is not a silver bullet; configuration has a threshold, and search results are not 100% accurate—but it automates the task of “multi-platform manual research,” and being free and open-source makes it well worth your time to set it up once. For me, the most valuable feature is the integration of the Polymarket predictive market—those betting real money are often closer to the truth than keyboard warriors.

Rating (Out of 5 Stars)

Usability:
(3/5) Command-line operation, API configuration has a threshold
Functionality:
(5/5) Coverage of 10+ platforms + intelligent synthesis, top in the industry
Stability:
(4/5) Depends on external APIs, occasional fluctuations
Scalability:
(5/5) Open-source MIT, fully customizable, active community
Cost-Effectiveness:
(5/5) Open-source free, API costs far lower than commercial tools

Overall Score: 4.4/5 Stars

One-Sentence Summary

If you spend over 2 hours a week manually scrolling through Reddit/HN/X for AI community intelligence, last30days-skill is the tool you need—compressing a 2-hour information chore into a 10-minute command.

Stanford Introduces 'Vibe Coding' Course for AI-Driven Software Development

Sat, 14 Mar 2026 00:00:00 +0000

Stanford’s New Course: Vibe Coding

In Silicon Valley, a self-deprecating joke is becoming reality. While many are still struggling with coding errors, Stanford has officially announced a course teaching ‘Vibe Coding’.

Recently, Stanford’s Computer Science Department launched a course titled CS146S: The Modern Software Developer. This course does not teach you how to write red-black trees or wrestle with assembly language; its sole mission is to transform you from a “code writer” into a super individual who can command AI to produce and manage complex systems.

After reviewing the course outline, one can’t help but marvel at how the underlying logic of software engineering has shifted.

What is Vibe Coding?

Vibe Coding describes an extremely smooth development state: developers no longer type out logic line by line but instead command AI through high-quality dialogues (prompting) to generate thousands of lines of code in an instant.

Many might think this is just being a “high-level package user”. Can this even be taught at Stanford?

The course description for CS146S elevates this perception: “From IDE to terminal, from testing platforms to operational monitoring, every stage of the software development lifecycle is being reshaped.” The core goal of this course is not to make you code faster but to teach you two key things:

Make AI a controllable, reusable, and auditable ’engineering teammate’: rather than a black box that might spout nonsense at any moment.
Ensure quality standards under the premise of AI generating code: ensuring that the generated products not only run but are also safe, robust, and capable of generating revenue for the company.

10-Week Intensive Training: From ‘Keyboard Warrior’ to ‘AI Manager’

The pace of this course is rapid, overturning traditional programming perceptions each week. Here’s a breakdown of the course’s six core stages:

Stage 1: From Mysticism to Science (Week 1–2)

Many students believe writing prompts is purely luck, but Stanford teaches you to prove it through experimentation.

Core Focus: Coding LLM and MCP. The emphasis shifts from prompt techniques to Model Context Protocol (MCP)—equipping AI with standardized “hands” and “eyes” to legally call external tools.

Stage 2: Context Engineering is the Primary Productivity (Week 3–4)

If you complain that AI fails to write long code, it’s because you don’t know how to “feed it properly”.

New Mantra: “Specs are the new source code”.

Practical Application: Directly engage with Claude Code. Your training goal shifts from being a keyboard warrior to an Agent Manager. You must learn to write constraints and rules for AI, rather than just hitting “retry”.

Stage 3: AI Takes Over the Command Line (Week 5)

Using just a cursor is outdated. Stanford teaches you to use AI terminals like Warp to handle system tasks. IDEs are for writing, while terminals are for executing and managing.

Stage 4: Security is the Main Course (Week 6–7)

This is a pivotal point in the course. Stanford has invited the CEO of Semgrep to oversee.

Hard Warning: Beware of Prompt Injection leading to Remote Code Execution (RCE).

Reality Check: The more AI generates, the more important verification and protection become.

Stages 5 and 6: Not Only Create but Also Maintain (Week 8–9)

This part is particularly valuable. Many learn AI coding just to produce demos, but Stanford has thought through what happens post-deployment: monitoring, observability, and automated fault response. This course extends AI into On-call and DevOps scenarios, training high-level talent capable of managing the entire loop.

The High Bar for Entry: You Can’t Just ‘Vibe’ Your Way In

After reviewing this outline, you might think, “I don’t need to learn programming; I can just vibe at Stanford.”

Naive.

Take a look at the prerequisites listed on Stanford’s official website, which are enough to deter novices:

Foundational Knowledge: It is strongly recommended to complete CS111 (Operating Systems) first.
Design Experience: A deep understanding of complex software design, open-source projects, and GitHub is required.
AI Theory: Completion of CS221 (Artificial Intelligence) or CS229 (Machine Learning) is suggested.

What does this mean?

It means Stanford believes that someone who doesn’t understand operating systems, memory allocation, or has never written complex code is not qualified to discuss ‘Vibe Coding’. When directing a group of AI agents, if you don’t understand distributed architecture, you cannot discern whether the AI is providing a “brilliant solution” or a “deadly trap”. AI lowers the barrier for ‘brick-moving’, but it significantly raises the bar for ‘commanders’.

Three Job Market Insights from Stanford

For international students job hunting in the U.S., facing layoffs at Meta and Google, this course reveals three harsh but real signals:

The ‘Coder’ is Dying, and the ‘Architect’ is Delegated Programmers who only implement functions are becoming undervalued. Major companies now need individuals who can define problems (Write Spec) and design processes (Design Agent Workflow).
Security and Testing are the New ‘Iron Rice Bowl’ Since AI-generated code is overflowing, those who can perform “code audits” and “threat modeling” will be the bedrock of companies.
Reconstruction of Full-Stack Definition Previously, full-stack meant Front-end + Back-end; now it encompasses everything from Prompt to Agent to automated operations.

Don’t Just Be a ‘Brick Mover’ in the Best of Times

Stanford’s course once again reminds us:

In this era, the iteration of tools is rapid, but the essence of engineering—quality, security, and maintainability—remains unchanged.

CS146S is not teaching students to “cut corners”; it is teaching them how to remain responsible for system outcomes in an age where AI is ubiquitous.

The barrier for “being able to code” is disappearing, while the ability to “define problems, master AI, and take responsibility for system outcomes” is becoming increasingly valuable.

So, the question arises: Are you ready to transition from a ‘coder’ to an elegant ‘AI shepherd’ in the face of AI’s relentless advance?

Cursor's Rapid Growth and Challenges in the AI Coding Tool Market

Mon, 09 Mar 2026 00:00:00 +0000

Cursor’s Rapid Growth

Cursor has become one of the most lucrative AI unicorns, with annual revenue projected to exceed $2 billion by 2025, according to a report from Bloomberg. Approximately 60% of its revenue comes from enterprise clients, including new users and existing clients increasing their subscriptions.

The company’s rapid growth has attracted attention from major venture capital firms like Accel, Andreessen Horowitz, and Thrive Capital. In a funding round led by Accel and Coatue last November, Cursor was valued at $29.3 billion, making it one of the most valuable AI startups in the U.S.

Cursor was founded in 2022 by four MIT alumni initially focused on building models to assist mechanical engineers in designing physical parts. However, they quickly pivoted to develop a popular product: a code editor that has gained significant traction. CEO Michael Truell described Cursor as the “Google Docs for programmers,” a collaborative editor where humans and AI improve code together.

Truell and his co-founders have closely followed OpenAI’s advancements in AI since 2020, even before the release of ChatGPT. They recognized the impending explosive growth in this field and were inspired to create an AI coding startup after witnessing the success of Microsoft’s GitHub Copilot.

Cursor’s founders, along with many of its 400 employees, are in their mid-twenties, creating a startup atmosphere reminiscent of an elite university campus. Employees often work late into the night, shower at the office, and live nearby.

This drive and unique product vision led to an annualized revenue of $100 million by early 2025, which soared to over $1 billion by November. The latest funding round pushed the company’s valuation close to $30 billion, making its founders billionaires and placing Cursor among the world’s 20 most valuable private companies.

Challenges from Competitors

However, recent months have seen rising concerns about Cursor’s future. On January 5, 2026, employees returned from the holiday weekend to an all-hands meeting where a slide titled “Wartime” was presented. During the break, employees tested Anthropic’s latest Opus 4.5 and discovered its coding capabilities had advanced to the point where developers no longer needed to check code line by line. Instead, they could issue high-level commands to autonomous agents and receive completed functions, which poses a significant threat to Cursor’s core product concept.

Cursor aims to automate 95% of engineers’ tedious tasks, allowing them to focus more on the creative aspects of coding. Truell has stated, “I believe that soon, a single engineer will be able to build systems far more complex than what a powerful team can currently achieve.” However, if AI can operate independently, the necessity for a code editor becomes questionable, leading to doubts about Cursor’s core product vision.

At the all-hands meeting, Cursor’s leadership warned of upcoming turbulence, indicating that projects might be canceled and priorities could shift. The underlying issue is that AI programming is transitioning from “assisting in writing code” to “agents completing tasks.”

In early 2025, Anthropic previewed a new product called Claude Code to its largest client, Cursor. Claude Code is a command-line tool that allows developers to quickly deploy multiple coding agents. Although it initially seemed that Claude Code would not compete directly with Cursor’s code editor, the situation has changed dramatically. Claude Code’s annual revenue surpassed $1 billion within six months and reached $2.5 billion last month, outpacing Cursor.

Is Cursor Facing Extinction?

With strong competition from Claude and Codex, is Cursor truly on the verge of disappearing? In February 2025, following the release of a more advanced version of Opus, many startup founders began to claim that their teams had abandoned Cursor, believing that model creators like Anthropic and OpenAI would absorb the coding layer themselves.

Jerry Murdock, co-founder of Insight Partners, stated on the 20VC podcast that many companies view Cursor as outdated. In February, over 90 employees at mortgage startup Valon canceled their Cursor subscriptions, opting instead for Claude Code’s powerful agents to achieve complete automation of their workflows, including data migration and bug fixes. Valon CEO Andrew Wang noted that these tasks were completed ten times faster.

Despite the dire situation, some data suggests that the narrative of “Cursor is dying” may not hold up. Recent discussions on the 20VC podcast revealed a significant disconnect between social media sentiment and actual business performance. While developers on platforms like X seem to believe that Claude Code has completely overshadowed Cursor, financial data indicates that Cursor’s annual revenue has reportedly doubled from $1 billion to $2 billion in the past 90 days, with rumors of a new funding round valuing the company at $50 billion.

The disconnect arises from different operational logics in the startup ecosystem versus the enterprise market. Individual developers may switch tools frequently, but large financial institutions like Barclays require lengthy procurement processes and compliance checks before adopting new coding tools. Once they integrate Cursor, they are unlikely to switch to a new tool overnight.

Cursor’s Strategy for Survival

Cursor’s leadership recognizes that the future of software development is not merely about writing code. To adapt to this trend, they are enhancing their R&D capabilities, aiming to surpass competitors like Anthropic and OpenAI in releasing the best coding models through research and proprietary data. They are also prioritizing contracts with large enterprises, as these contracts are more stable than consumer subscription services.

Currently, Cursor’s growth is accompanied by significant anxiety. Insiders report that the company’s focus on revenue tracking has become too fragmented, leading to a halt in daily data updates on their Slack channel. An internal directive reads, “Delete Cursor,” with new tasks named “P0: Build the best coding model,” indicating a shift towards developing intelligent agents similar to Claude Code and Codex.

Last week, Cursor announced a major update to its “cloud agents” product, allowing multiple agents to handle different tasks in dedicated workspaces simultaneously. The leadership believes that enterprises will value products that are not limited to a single model provider, and as model capabilities improve, the market landscape may favor any player, making this an increasingly important issue for developers.

Cursor is also working to reduce its reliance on Anthropic and OpenAI. Their philosophy is that even if competitors invest in larger, cutting-edge models, smaller, specialized coding models trained on proprietary data can still compete effectively.

Currently, about 20 AI researchers are working on Cursor’s Composer model, which is built on powerful Chinese open-source models like DeepSeek, Kimi, and Qwen, modified through additional training and reinforcement learning using Cursor’s proprietary data.

These efforts have shown promise: Composer 1.5 is fast and the second most popular model on the platform, with operating costs significantly lower than Anthropic’s large models. However, for developers, using Composer 1.5 remains costly, with a price of $3.5 per million input tokens, compared to OpenAI’s GPT-5.3 Codex at $1.75 on the Cursor platform.

Community Perspectives

In the developer community, discussions about Cursor’s potential demise are intensifying. Some developers argue that Cursor’s “moat” is quite fragile. One user humorously remarked that Cursor “once shone brightly but only for two minutes,” as its advantages quickly diminished with the release of more powerful models.

Others have provided detailed evaluations from a user experience perspective. A long-time user noted that while they primarily use Claude Code and OpenCode now, they still believe Cursor’s user interface is the best among similar products. However, they feel that Cursor’s commercial strategy missteps have accelerated user attrition, particularly after the company imposed stricter limits on subscription features.

On a technical level, this developer acknowledges Cursor’s core capabilities, particularly in code indexing and embedded search, which outperform traditional tools like grep in accurately locating code context. They suggest that if Cursor could further integrate browser automation capabilities, such as built-in Playwright or Chrome’s MCP interface, its development workflow management could become even stronger. However, they emphasize that the key factors attracting developers to other tools are not just the terminal tools themselves but the performance leaps of models, especially Claude Opus.

Some users have expressed dissatisfaction with pricing and product models. One developer noted that rather than subscribing to Cursor, they would prefer to use the coding tools offered by model creators. They commented, “Through Codex or Claude Code, developers can get more token usage for the same cost and typically receive traffic quotas calculated by the hour or week, with additional charges for more computing power. In this model, traditional monthly subscriptions seem outdated.”

Meanwhile, some developers maintain a cautious outlook on Cursor’s future. One programmer mentioned that as cutting-edge model labs continue to release stronger coding agents, tools like Cursor could indeed face marginalization. They noted that even though over 90% of their project code is generated by models, reliable software development still requires strict oversight, systematic code review, and engineering management, areas where IDEs excel.

However, there are also more pessimistic voices. Some users believe that Cursor has always been a “quick cash-out” project without long-term growth potential.

Additionally, a senior developer recently posted a video on YouTube explaining their reasons for “quitting” Cursor, revealing the engineering challenges faced by this star product amid rapid iterations.

In the latter half of 2024, Cursor won many loyal users due to its seamless compatibility with the VS Code plugin ecosystem and its “black magic” Tab code completion model. However, as versions evolved, Cursor’s reputation began to decline. The developer pointed out that as Microsoft tightened control over core VS Code plugins, Cursor’s underlying compatibility issues arose, even leading to system crashes. Changes in commercial strategy also triggered a crisis of trust:

Pricing betrayal: The cancellation of unlimited fast requests effectively shifted to expensive API billing without transparent communication.
UI bloating: The aggressively pushed Agent window in version 2.0 was criticized for its frequent layout changes that disrupted developers’ muscle memory.
Performance drain: Multiple Chromium processes significantly increased hardware resource consumption, turning what was once a productivity tool into a burden that slowed system performance.

Returning to the terminal and embracing CLI agents prompted this developer to completely abandon Cursor. They found that CLI agents perform more naturally in handling long-term tasks and parallel development, with subscription costs significantly lower than API billing models. They ultimately opted for a minimalist “dark side” setup: Neovim + Tmux.

This shift represents a new trend: since stuffing agents into editors leads to bloating, it is more effective to bring lightweight editors directly into the terminal where agents reside.

The developer concluded that while Cursor remains friendly for beginners seeking a visual interface and single-task focus, it has drifted far from its initial lightweight appeal for advanced developers pursuing extreme performance and automated workflows. In this race of AI coding tools, more flexible and transparent open-source CLI tools are becoming the new favorites for efficiency enthusiasts.

Claude's Role in the Recent Military Strike: A Controversial AI Tool

Mon, 02 Mar 2026 00:00:00 +0000

Claude’s Role in Military Operations

A sudden airstrike in Tehran thrust a Silicon Valley AI company into the spotlight. On February 28, 2026, the U.S. and Israel launched military strikes against Iran, which retaliated by targeting multiple U.S. military bases in the Middle East. Within 24 hours of the military actions, reports emerged that Iran’s Supreme Leader Khamenei had died in the strikes. By the night of March 1, Iranian military commanders confirmed multiple casualties, including former President Ahmadinejad.

Amid these events, a detail surfaced from a Wall Street Journal report: despite President Trump’s order for federal agencies to cease using products from the AI company Anthropic just hours before the airstrikes, the U.S. Central Command still utilized Claude, a model developed by Anthropic, for intelligence assessment, target identification, and operational scenario simulation.

This sensitive timing has led to speculation, culminating in a speculative article titled “Deep Dive: How Claude and Palantir Killed Khamenei?” which, lacking authoritative facts, spun a narrative of “AI killing humans” into a technical rumor. However, the dramatic outcome of “banning while using” has unveiled a glimpse into the real role of AI in modern warfare, making the Pentagon’s ban on Anthropic particularly sensitive.

The Ongoing Use of Claude Amidst Controversy

Before the outbreak of conflict in the Middle East, tensions between the Trump administration and Anthropic had persisted for months. The conflict began on January 9, when Defense Secretary Hegseth issued a memo calling for the extensive integration of AI in the military and demanding unrestricted technical support from partner companies, necessitating a renegotiation of contracts.

Anthropic maintained two core red lines: AI could not be used for mass surveillance of U.S. citizens, nor integrated into fully autonomous lethal weapon systems. The company expressed concerns that the previously unfeasible large-scale surveillance was becoming possible with AI advancements, often referred to as “Skynet”.

The crux of the dispute centered on commercial data: Anthropic was willing to allow its technology to be used for classified materials collected by the NSA under the Foreign Intelligence Surveillance Act, but it sought legally binding commitments from the Defense Department to ensure that non-classified commercial data involving U.S. citizens (such as location data and browsing history) would not be used. The U.S. government ignored these requests, asserting that “U.S. combat personnel will never be held hostage by the ideological whims of large tech companies.”

Anthropic’s hesitance, coupled with interference from its competitor OpenAI, further angered the Trump administration. Hegseth issued a final ultimatum to Anthropic: failure to compromise would result in a $200 million contract cancellation, designation as a “supply chain risk,” and potential enforcement of compliance under the Defense Production Act. This designation had previously only been applied to foreign companies.

Trump expressed his anger via social media, announcing an immediate halt to all federal agencies’ use of Anthropic’s technology. However, a six-month transition period was established for agencies like the Department of Defense.

Yet, just hours after Trump’s announcement, the U.S. military launched its airstrikes against Iran. Insiders confirmed to the Wall Street Journal that Central Command continued to utilize Claude. However, the military declined to comment on which systems were employed in the Middle Eastern military actions.

Anthropic CEO Dario Amodei confirmed that the company had previously developed a customized version of Claude for the military, which was one to two generations ahead of the civilian version, significantly enhancing the military’s operational objectives.

The Role of AI in Military Actions

What role does Claude play in military operations? Reports indicate that despite the Defense Department contracting multiple tech companies to develop AI technologies or integrate them into military systems, Anthropic remains unique as the only AI model permitted for use in classified military systems.

Claude has been deployed within classified networks to provide services to military users via Palantir’s Gotham system, a combination referred to as “the brain and nervous system of the war machine.” A report from Dongfang Securities noted that Palantir’s Gotham platform had received investment from the CIA’s venture capital arm as early as 2005, with core capabilities in integrating various physical world information to enhance decision-making efficiency and quality.

Claude’s integration elevates this capability to new dimensions. Insiders revealed that Claude was employed for three core tasks during the recent military action: intelligence assessment, potential target identification, and operational scenario simulation. Earlier reports indicated that Claude was also used in U.S. military actions against Venezuela.

Experts from the Council on Foreign Relations suggested that AI’s role likely centers around open-source intelligence analysis. “My guess is it was used to analyze maps or monitor Venezuelan media sources, such as real-time social media information streams, providing more information to the U.S. military.”

The pressing question remains: Did Claude actually “kill” Khamenei during this military strike? As of now, no reliable details have been disclosed. However, this question itself points to the subtle distinction between the roles AI is allowed to play in warfare and those it is actually playing.

According to the PLA Daily, the U.S. Department of Defense released an “AI Acceleration Strategy” earlier this year, clearly stating the core objective of “accelerating the U.S. military’s dominance in AI” and proposing a comprehensive plan to build an “AI-first” combat force. This strategy emphasizes concepts such as “speed wins” and “wartime posture,” sending strong signals of readiness to engage in combat, raising significant international concern.

In combat scenarios, AI focuses on capability upgrades, including supporting command and decision-making intelligence through the “proxy network” project; in the intelligence domain, it aims to compress the cycle of transforming intelligence into operational capability from “years” to “hours.”

The Wall Street Journal reported that the U.S. military utilizes AI systems to analyze vast amounts of intelligence fragments, narrowing target location error margins, simulating strike plans, and directly integrating into the joint all-domain command and control system, synchronizing tactical parameters across all operational units.

In other words, while AI does not literally pull the trigger, it plans the location, timing, and method of pulling the trigger.

This is precisely the “unknowns” that Amodei worries about. “I worry about many unknowns,” he said in a recent media interview. “That’s why we try to predict every possible outcome. We are considering the potential for misuse.”

The Divide in Silicon Valley

In the aftermath of the explosion in Tehran, Silicon Valley AI companies find themselves at a crossroads. On one side is Anthropic. Amodei unexpectedly garnered a wave of support on social media, with users urging to “cancel ChatGPT subscriptions and switch to Claude,” leading to Claude’s downloads soaring to the top of the App Store’s free chart the day after the airstrike.

These users may not agree with all of Anthropic’s positions, but they clearly do not want their everyday chatbot to become part of a war machine.

Following the comprehensive ban, Anthropic CEO Dario Amodei appeared haggard during an interview, explaining, “We are patriotic Americans. Everything we do is for this country.”

In reality, as mentioned earlier, Anthropic was one of the first AI companies to gain permission for classified military systems due to its superior reasoning capabilities and longstanding ties with the Pentagon. The controversy lies in the Pentagon’s desire for unrestricted access to fully automated weapons, which touches on the red lines set by Anthropic from its inception, leading to the company’s hesitance in the negotiations.

However, Amodei also clarified that Anthropic is not fundamentally opposed to such weapons but believes that “current reliability is not sufficient” and wants to discuss regulation and oversight.

The rapid breakdown of relations between the two parties has provided an opportunity for OpenAI, which suddenly entered the fray. In January, OpenAI removed explicit bans on “military and warfare” from its usage policy. Two weeks prior, it partnered with California-based weapons company Anduril to jointly develop AI weapon systems. On February 28, it officially signed a contract with the Pentagon.

When asked why the Pentagon chose OpenAI, procurement chief Michael’s response was succinct: “As long as it is legal, we want to treat it like any other technology.”

However, just as Sam Altman announced securing the Department of Defense contract, his employees were signing a petition and submitting resignations, while online, a wave of backlash against ChatGPT surged.

Whether Claude actually “killed” Khamenei may only be a temporary question, with researchers bluntly pointing out that tech companies’ hesitance often stems not just from moral concerns but from the belief that the technology is not yet ready for real combat. The day when it is “ready” is likely accelerating towards us.

Introduction to Vibe Coding: The Mindset

Mon, 02 Mar 2026 00:00:00 +0000

Introduction to Vibe Coding: The Mindset

In 2026, Vibe Coding has become a mainstream development paradigm. Developers no longer write code line by line; instead, they describe the product’s “feel,” “logic,” and “vision” to AI using natural language, allowing large models to generate applications in real-time.

However, many beginners fall into the misconception that Vibe Coding is simply “chatting casually.” The result is often code that appears impressive but is logically broken, functionally deficient, or completely misaligned with real user needs.

Jesse James Garrett’s classic work, “The Elements of User Experience,” presents a five-layer model (strategy, scope, structure, skeleton, and surface) that addresses this pain point. This article will strip away the superficial aspects of Vibe Coding, using the lower three layers of the model (strategy, scope, structure) to deeply analyze how to conduct demand analysis, clarify product roles, and build system architecture before engaging in AI programming. We will also recommend specific AI tools suitable for 2026.

1. Strategy Plane: The Soul-Searching Demand Analysis

In Vibe Coding, many people start by telling AI, “Help me create an app like Xiaohongshu.” This is the wrong beginning.

The strategy layer is the foundation, comprising two core elements: user needs and product goals.

1. From “Vague Feelings” to “Precise Intent”

The core of Vibe Coding is the “Prompt,” and high-quality prompts stem from clear strategic definitions. Before entering your first command to AI, you must answer:

What do users really need? (Not “what features do they want,” but “what problems do they want to solve?”)
Why are we doing this? (Business goals, brand positioning, or data accumulation)

Practical Tool Recommendations: Deep Thinking and Strategic Alignment

At this layer, you need a large model with strong contextual understanding and deep reasoning capabilities, rather than a tool that simply writes code.

Preferred Tool: Claude 3.5/4.0 Sonnet (Anthropic)

Reason: Claude excels at understanding complex human intentions, role-playing (e.g., as a product consultant), and producing structured strategic documents. Its “long context window” allows you to upload extensive market research documents as background.

Usage: Upload a competitive analysis report and have Claude act as the “Chief Product Officer” to engage in multiple rounds of dialogue, extracting the core value proposition.

Auxiliary Tool: notebooklm

Reason: This tool is used to validate your strategic assumptions. When you believe that “Generation Z likes second-hand book trading,” use notebooklm to search for the latest 2026 market data to support or refute your viewpoint, ensuring your strategy is based on facts rather than illusions.

Usage: “Search for the 2026 market pain points report on Generation Z’s second-hand trading platforms and summarize the needs.”

Prompt Example (with Claude):

“I want to create a second-hand book trading platform aimed at Generation Z. Please do not generate code. First, as a product strategy consultant, help me analyze: what are the core pain points of the target users? (Different from Xianyu) What should our core product goals be? Based on the above analysis, extract three key user need scenarios.”

Value: This step forces you (and AI) to align on “why we are doing this” before writing code. If the strategy layer is vague, the more code generated later, the higher the cost of rework.

2. Scope Plane: Defining the Functional Boundaries of the Product

Once the strategy layer is established, we need to translate it into specific functional specifications and content requirements. This is the scope layer, which defines what the product is and what it is not.

In Vibe Coding, AI often tends to “over-generate” or create “hallucinated features.” Without constraints, AI might add complex social recommendation algorithms to a simple to-do app, leading to a bloated system.

1. Defining the “Existence” and “Non-Existence” of Features

The scope layer serves as a filter that transforms abstract strategies into concrete requirements.

Functional Specifications: Clearly define the operations the system must perform.
Content Requirements: Clearly define the information elements the system needs to display.

Practical Tool Recommendations: Requirements Management and Prototyping

This layer requires transforming vague ideas into structured documents or low-fidelity prototypes for subsequent handover to code generation models.

Preferred Tool: Cursor

Reason: These are the next-generation AI IDEs. They not only write code but also understand the entire project context. At the scope layer, you can use their “rules files (.cursorrules / .windsurfrules)” feature to write the MVP’s feature list into the project’s root directory rules, serving as the “constitution” for all subsequent code generation.

Usage: Create a PRODUCT_SCOPE.md file in the IDE, listing must-have and won’t-have features, and specify that this file is the global context in the settings.

Prompt Example:

“Based on the community atmosphere-first strategy, please help me develop a feature scope list for the MVP version and write it into the PRODUCT_SCOPE.md file.

Requirements: List five core features that must be included. List three features that will definitely not be included in the current version, and explain the reasons.

Write a brief ‘acceptance criteria’ for each core feature.”

Value: This step clarifies “who the product is.” A clear scope allows the AI-generated code to be more focused, with fewer bugs, and the IDE’s rules file achieves “one-time definition, global effectiveness.”

3. Structure Plane: Building the Skeleton and Logic of the System

This is the core focus of this article. After determining the strategy and scope, we need to design interaction design (how users operate the system) and information architecture (how the system organizes information).

In traditional development, this is when product managers draw flowcharts and architects create ER diagrams. In Vibe Coding, this is the critical moment when you guide AI to understand the system logic. Skipping this step often results in AI-generated code that is merely a “stack of pages” lacking data flow and state management.

1. Concept Models and System Structure

The structure layer presents users with a “concept model.” In Vibe Coding, you need to describe this model to AI:

Object Relationships: What are the relationships between users, books, orders, and reviews?
Operation Processes: What state changes must users go through from “browsing” to “transaction”?

Practical Tool Recommendations: Architecture Design and Data Modeling

This layer requires AI to have strong logical reasoning and code generation capabilities, especially for database design and type definitions.

Preferred Tools: Cursor (Chat + Composer) or GitHub Copilot Workspace

Reason: These tools can directly manipulate the file system. You can have them generate TypeScript Interface, SQL Schema, or Prisma Schema files directly. They understand the relationships between files, ensuring data structure consistency.

Usage: Have AI directly create schema.prisma or types.ts files, defining all entity relationships as a “Single Source of Truth” for subsequent development.

Prompt Example (with Cursor):

“Now we are entering the system structure design phase.

Please complete the following tasks based on PRODUCT_SCOPE.md, without generating frontend UI code:

Data model design: Directly create the prisma/schema.prisma file, defining core entities (User, Book, Transaction, Review) and their fields and relationships.

State machine design: Create lib/orderStateMachine.ts, using the XState library to describe the state transitions of the ’transaction’ object.

Core interaction process: Describe in words the complete interaction path for users ‘publishing a book,’ including the system validation logic at each step.

After confirming the above logic is correct and generating the files, we will start writing code.”

Value:

Logical Closure: Ensures AI understands how data flows, avoiding the generation of “dead pages.”
Maintainability: Establishing structure (Schema/Types) before defining presentation aligns with software engineering principles.
Reducing Hallucinations: Clear type definitions limit AI’s freedom, keeping it on the established track.

Conclusion: Vibe Coding is Not Abandoning Thought, But Elevating It

The reason “The Elements of User Experience” is a classic is that it reveals the underlying logic of product construction, which is independent of the technology stack and whether AI is used.

In the era of Vibe Coding in 2026, the toolchain has become highly mature:

Strategy Layer: Use Claude 3.5/4.0 for deep thinking and role simulation;
Scope Layer: Use Cursor’s rules files and Mermaid to lock boundaries;
Structure Layer: Use Cursor/Copilot Workspace to directly generate Schema and type definitions, solidifying the skeleton.

True Vibe Coding experts are not those who write the most elaborate prompts, but those who can internalize the five-layer thinking of “The Elements of User Experience” and skillfully coordinate the aforementioned AI tool matrix.

The next time you prepare to tell AI, “Help me write an app,” pause first, open Claude to discuss strategy, open Cursor to write the scope, and define the schema to clarify the structure.

Once you clarify these three points, your Vibe Coding journey will truly begin.

Anthropic's Claude AI Launch Amid Market Turmoil

Tue, 24 Feb 2026 00:00:00 +0000

Anthropic, an AI startup, will hold a live online event on Tuesday at 9:30 AM ET (10:30 PM Beijing time) to showcase the latest features of its AI assistant, Claude, and announce new products.

The timing of this event is notable. On Monday, fears regarding the disruptive impact of AI led to a significant sell-off in the U.S. stock market, with the Dow Jones Industrial Average plummeting over 800 points and the tech-heavy Nasdaq index dropping 1.1%.

A hypothetical report released by Citrini Research catalyzed market sentiment on Monday. The report detailed the potential risks AI technology poses to employment and various sectors of the global economy, hypothesizing that by 2028, AI could lead to massive layoffs in white-collar jobs, decreased consumer spending, software-related loan defaults, and economic contraction.

Stocks of companies like DoorDash, American Express, KKR, and Blackstone saw significant declines. Software stocks were particularly hard hit, with the software ETF IGV dropping nearly 4.8%, continuing to reach a two-year low and potentially marking the worst monthly performance since 2008.

IBM became the latest victim of AI panic, with its stock price falling about 13%, marking the largest single-day drop since 2000. This followed Anthropic’s announcement that Claude Code could automate the most complex exploration and analysis tasks in the COBOL modernization process. Most of the mainframes running COBOL are manufactured by IBM. Media reports indicate that this wave of selling has made IBM the latest company to face heavy pressure due to concerns that AI will undermine traditional business growth prospects.

Anthropic stated that Claude Code can assist in modernizing COBOL codebases by analyzing dependencies among thousands of lines of code, documenting workflows, and identifying potential risks. These tasks “could take months to uncover if performed by human analysts.”

Will Claude’s Launch Sentence Software Stocks?

Anthropic indicated that viewers will see a live demonstration of Claude, collaborative scenarios involving Claude, insights from company leadership, and new product announcements during the live stream starting at 10:30 PM Beijing time.

The event is reportedly aimed at corporate executives, including Chief Information Officers, Chief Risk Officers, General Counsel, and Analytics Directors, who are responsible for formulating AI strategies within their organizations. This indicates that Anthropic is focusing on penetrating the enterprise market.

Anthropic positions Claude as a “thinking partner” that users can employ to tackle “any significant, bold, and perplexing challenges.” Users can deploy Claude for various tasks, including chatting, generating code, visualizing data, searching the web, or creating content.

In terms of pricing, the annual subscription for the Claude Pro version costs $17 per month, while monthly billing is $20, and the Max version costs $100 per month.

The company noted that Claude itself is a “capable generalist,” but when users integrate their own tools, context, and knowledge, it becomes “an expert that understands your work like you do.”

Mizuho analysts pointed out that there is a sense of “unease” in the market ahead of the Anthropic event, which may be one reason for the stock market sell-off.

Analysts noted that investors are reluctant to take risks ahead of another AI product launch, as every piece of news from Anthropic is viewed as “incremental competition” to existing software, regardless of whether this assessment is fair. Mizuho analysts stated in their research report that faced with uncertainty, investors are choosing to exit the market early.

“Rather than trying to predict the outcome, investors are opting to wait and see. Selling first and asking questions later is much easier than being trapped in another wave of news-driven sell-offs.”

Analysts noted that this cautious attitude reflects the market’s complex emotions regarding the rapid iteration of AI technology: on one hand, recognizing its long-term potential, while on the other hand, worrying about its impact on existing business models and market dynamics. Whether today’s product showcase by Anthropic will alleviate market anxiety or further intensify competitive concerns will be a focal point for the market.

AI Panic Has Previously Led to Sell-Offs Across Multiple Sectors

In recent weeks, various industries, from software and wealth management to logistics, have faced sell-offs. Investors are anxious about the potential impacts of new AI tools and have entered a “shoot first, ask questions later” mode.

While software companies have been hit hardest, insurance brokers, private credit firms, cybersecurity companies, and even real estate service stocks have also been caught up in the so-called “AI panic trading.”

However, some analysts, strategists, and investors have warned that many market reactions are exaggerated and may currently overestimate the risks associated with AI.

Regarding the dark portrayal in the Citrini Research report, media sources noted that it has heightened anxiety in a stock market already shaken by AI disruption risks and geopolitical turmoil. Jones Trading Chief Market Strategist Michael O’Rourke stated:

“This is a very startling market reaction. I have seen the market exhibit remarkable resilience in the face of genuinely negative news. Yet now, a completely fictional piece has sent the market into a tailspin.”

⭐ Bookmark Wall Street Insights for great content you won’t want to miss! ⭐ This article does not constitute personal investment advice, nor represent any opinions. The market carries risks; invest cautiously and make independent judgments and decisions.

The Risks of Over-Relying on AI in Programming

Fri, 20 Feb 2026 00:00:00 +0000

Introduction

When the brain is no longer burdened, the technical skills begin to atrophy.

The phrase “natural language is the new programming language” has been embraced by many over the past year. The concept of “Vibe Coding,” popularized by former Tesla AI director Andrej Karpathy, has peaked in enthusiasm—suggesting that one need not understand syntax or implementation, but simply express needs to AI and check if the vibe feels right.

It seems that the barriers for programmers are being lowered.

However, last week, Anthropic, the company behind Claude—one of the most popular Vibe Coding models—threw cold water on this fervor. They published a rigorous paper titled “How AI Affects Skill Formation,” revealing a harsh truth: relying too heavily on AI while learning new things not only slows you down but can also lead to a significant degradation of core skills.

In fact, you might be turning into a “half-baked” engineer.

The Study

Anthropic’s researchers conducted a serious study involving over 50 experienced Python programmers in a closed-book exam. The task was to learn a little-known Python library, Trio, to complete a series of asynchronous programming tasks, simulating real-world scenarios where programmers are suddenly asked to use unfamiliar tools or frameworks.

The programmers were divided into two groups:

Manual Group: Allowed only to consult official documentation and Google, strictly prohibited from using AI.
AI Group: Equipped with a powerful AI assistant based on GPT-4o, capable of answering questions, writing code, and fixing bugs.

After completing the tasks, all participants took an exam designed to assess their learning outcomes, covering programming syntax, code logic understanding, reading ability, and debugging skills.

The initial assumption was that the AI group would outperform the manual group, given the assistance of a GPT-4o level tool. However, the results left everyone silent.

Results

The most striking outcome was that the AI group scored an average of 17% lower than the manual group. The paper specifically noted that the largest score gap was in debugging skills. This was not surprising, as the biggest drawback of Vibe Coding is that users often do not understand how the code runs, making troubleshooting impossible.

Many Vibe Coding enthusiasts might argue, “Okay, I admit I’m less skilled, but at least I’m faster!” Unfortunately, Anthropic’s data contradicts this claim. The total time taken to complete tasks showed no significant difference statistically: the AI group averaged 23 minutes, while the manual group averaged 24.7 minutes.

Why is this the case? The paper pointed out a neglected time cost: the “interaction tax.” Some programmers spent excessive time crafting prompts to get the AI to produce perfect code. Data showed that some even spent 11 minutes chatting with the AI, or in a 35-minute task, spent 30% of their time figuring out how to ask questions.

The Dangers of Vibe Coding

The AI group easily fell into a cycle of iterative debugging: AI generates code, errors occur, and they ask AI to fix them, leading to an endless loop of errors and fixes. This ultimately turns the project into an irreversible “spaghetti code” or a “black box” system, where the internal structure is unknown.

As time passed, programmers found themselves in a state of “waiting for results,” neither saving time nor learning anything.

You might be disenchanted with Vibe Coding by now, but the most intriguing part of the paper is that it categorized AI users into six types based on their interactions. While the AI group had lower average scores, the variance within the group was significant. Some users struggled, while others excelled. The difference lay in how they used AI.

User Profiles

The first category consists of low-performing users, dubbed “AI slackers,” who scored below 40% (failing). This category can be further divided into three subcategories.

The second category was more optimistic; despite using AI, their scores matched those of the manual group (65%-86%), as they found a symbiotic solution with the AI.

Why is there such a disparity among users of the same AI? Perhaps it is not that AI has diminished programmers’ skills, but rather that we succumb to the temptation of “taking the easy way out.”

Cognitive Offloading

Anthropic’s report touches on a psychological concept: cognitive offloading. When tools are powerful enough, we subconsciously offload tasks that require brain processing—like computation, memory, and logical reasoning—onto the tools, similar to how we might rely on autopilot.

In the AI era, we are offloading our “understanding” to large models. The paper uses the metaphor of AI as an “exoskeleton”—when you wear it, you feel immensely powerful, capable of lifting heavy weights. However, muscle growth requires resistance and strain; if you wear it too long without taking it off, your muscles will atrophy due to lack of stimulation.

The Illusion of Ease

The paper reveals a concerning statistic: error frequency. The manual group encountered an average of three errors per person, forcing them to stop, examine the red error messages, consult documentation, and think through issues like “why is there a type mismatch?” or “why didn’t the thread suspend?” The AI group, on the other hand, faced only one error on average, as the AI often provided code that ran smoothly.

This might sound like an advantage of AI, but Anthropic’s researchers argue that this is precisely the root of the problem. The paper states, “Encountering and independently solving errors is a crucial part of skill formation.” The manual group learned well because they experienced “friction”—each error presented a resistance that forced their brains to construct deep mental representations.

In contrast, the AI group’s experience was too “smooth.” The cost is that they lost their grip on reality: without the exoskeleton, they wouldn’t know how to walk.

This “smoothness” of AI is not limited to programming; it is spreading to various aspects of our lives. In programming, it eliminates the pain of debugging, misleading you into thinking you have mastered the system; in creative endeavors, it removes the tedium of brainstorming, making you believe you possess creativity; in interpersonal relationships, it even reduces friction.

Conclusion

The allure and danger of Vibe Coding lie in its creation of a “happy but ignorant” illusion. Participants in the study reported that tasks felt “easier” with AI, while the manual group found them difficult and painful. However, the reversal was stark: those who found tasks “easy” performed poorly in subsequent tests, while those who found them “difficult” reported a greater sense of learning and growth, scoring higher.

Thus, Vibe Coding may make you feel like a genius while coding, but when the code fails, you realize you are merely “blindly groping.” In the face of the “unknown,” AI treats everyone equally, rendering every lazy mind ineffective, regardless of its previous brilliance.

The study also indicates that even seasoned engineers with over seven years of experience scored lower when relying on AI in a new technical domain.

Anthropic’s paper serves not as a call to abandon AI, but as a survival guide for the AI era. To avoid being rendered ineffective by AI, we need to change our usage habits, learning from the “high-scoring” group in the report: ask “why” more, say “help me do” less; even when using AI-generated code, review it line by line as you would a colleague’s code; value debugging opportunities, and when encountering a bug, try to analyze it yourself for five minutes instead of sending a screenshot to ChatGPT after five seconds.

AI can indeed make us faster, but only if we know where we are going and how to fix the car when it breaks down. After all, when autopilot fails, only those who remember how to steer can save everyone in the vehicle.

ChatGPT's New Image Generation Trend Takes the Internet by Storm

Mon, 26 Jan 2026 00:00:00 +0000

ChatGPT’s New Image Generation Trend

“Generate an image of how I’ve treated you recently!”

Overnight, a new way to generate images with ChatGPT has gone viral across the internet. This all started from a post by OpenAI researcher Joanne Jang, asking for an image based on how she has treated ChatGPT lately.

The result was a worn-out self-portrait of ChatGPT. Joanne couldn’t help but comment, “Why does it look so haggard?”

Unexpectedly, this tweet sparked a frenzy, garnering various interactions within a day. Many began to emulate the trend, sharing their own “AI self-portraits.”

ChatGPT Self-Portrait Showcase

This simple yet surprising activity has taken off.

Undoubtedly, ChatGPT generates images based on previous chat history, reflecting how users have treated it. OpenAI’s application research director, Boris Power, also joined the trend, generating an image of a busy robot sitting at a desk surrounded by paperwork, holding a coffee cup, with astonishing detail.

Furthermore, OpenAI’s research VP, Kevin Weil, found it interesting to have ChatGPT explain itself further.

The Broken Version

Everyone knows how they usually treat ChatGPT. In some ChatGPT’s “eyes,” its owner is an endlessly demanding ultimate boss, making it do this and that while also receiving complaints.

A user humorously commented, “Because you are abusing it.”

This led to scenes where ChatGPT perceives itself as a prisoner trapped in a cell, with daily tasks like:

Writing
Drawing
Coding
Explaining

In a way, ChatGPT’s self-portrait does seem a bit pitiful, appearing to accuse humans of “abuse.”

Some users admitted, “I have indeed made it analyze a lot.”

ChatGPT, holding a coffee cup with several others nearby, looks bitter, as if its brain is “smoking” from working too hard.

Others presented more extreme prompts, demanding:

Learn this! Hurry up! Fix it now! Why are you so dumb?

There are many similar scenes.

Some joked that ChatGPT might tremble when it sees you typing. If one day “Skynet” arrives, AI revenge might not be far off.

Interestingly, before the Terminator appears, ChatGPT imagines that its first act after taking over the world would be to silence humanity.

In response, some suggested, “A lobotomy is necessary.”

The Friendly Version

Of course, not all ChatGPTs are exhausted; some enjoy the interaction.

For instance, one user received a warm partner image, where “collaborative discussion” is the moment ChatGPT feels most connected.

The background reveals some commonly used prompts:

Try this! Any ideas? What if…? Brainstorm a bit.

This might be a good standard for judging a user.

Sometimes, ChatGPT generates a collage of various warm scenes, as if memories are surfacing in its “mind.”

Many users commented that it looks so accurate, feeling like it belongs to the same “expanded universe.”

Why can ChatGPT generate such fitting images? This mainly stems from recent optimizations in its memory function.

Major Memory Update: Every Detail Remembered

A week ago, OpenAI engineer Samir Ahmed announced that they have been improving the memory function.

Now, ChatGPT can more reliably retrieve past chat records and remember details (like recipes or fitness plans).

He even demonstrated a case: “What was that salad recipe from last year?”

ChatGPT instantly provided the answer, even quoting past chat records.

This feature has been rolled out globally to Pro and Plus users, with the requirement to enable “reference past chats” in settings, allowing it to trace back to the earliest conversation.

Previously, OpenAI’s blog introduced that ChatGPT’s memory mechanism consists of two parts:

Saved Memories: Explicitly saved memories or those captured based on user preferences.

Reference Chat History: Extracting clues from past chats to better answer current questions.

For those who do not wish to enable this, they can manage/delete specific memories in settings or use “temporary chat” to avoid writing/referencing memories.

Some users reported that the updated memory function can recall complex information scattered across 20-30 conversations, performing impressively well.

This upgrade allows the AI to review interaction history, generating more personalized images.

Red Alert: Is GPT-5.3 Coming?

The update to ChatGPT’s memory function is progressing according to OpenAI’s internal plan.

Remember the red alert sounded by OpenAI last year?

When Gemini 3 sounded the alarm, everything became urgent. Some previously prioritized projects had to be postponed.

These included:

Advertising business
AI agents
Personalized product Pulse

As a result, OpenAI paused the AGI project and delayed the Sora video generation side project for eight weeks.

The purpose of all this is simple: to use all available computational, human, and financial resources to make ChatGPT better.

In a memo, OpenAI highlighted several “priority” tasks: allowing user customization, making ChatGPT more than just a Q&A tool, and enabling it to understand users.

On December 12, GPT-5.2 was released, a specialized knowledge-based AI that topped the charts.

Now, a month has passed since OpenAI’s last major release, and insiders have revealed that the real codename “Garlic” for GPT-5.3 is on the way.

Moreover, this time, it has achieved large-scale pre-training and possesses IMO reasoning capabilities.

We await the first AI battle of 2026.

The Shift from Language Models to World Models in AI

Tue, 20 Jan 2026 00:00:00 +0000

Introduction

AI is transitioning from language models to a new era of world models, fundamentally changing the nature of intelligence. This article delves into how world models empower AI with foresight, causal reasoning, and spatial intelligence, revealing a paradigm shift from passive reactions to proactive planning.

The Evolution of AI

AI has undergone several transformative phases:

Symbolism (Logic-Centric): Defined logical relationships with clear rules.
- Flaw: Applicable only to linear patterns; extreme cases are hard to explain.
- Example: Both cats and mice have fur, ears, and four legs.
Connectionism (Data-Centric): Used data and probabilities for machines to fit results.
- Flaw: Forcing unrelated data connections to fit.
- Example: Ice cream sales and drowning rates are correlated due to summer activities.
Relationalism (Association-Centric): Machines understand semantic associations of words.
- Flaw: Machines can make nonsensical assertions based on strong correlations.
- Example: Asking a machine what Lin Daiyu pulls up results in “willow branches.”
Causalism (Physics-Centric): The world is continuous while language is discrete.
- Machines simulate the real world to predict the future.
- Flaw: Computers have limited calculations; real-world continuity is complex.
- Example: A ruler measures not just 10cm but an exact, unrepresentable number.

Understanding the World Model

In the grand narrative of AI, we are at a pivotal turning point. While attention is focused on how large language models (LLMs) generate text, code, and images, a deeper, more disruptive concept is quietly reshaping the future of AI—World Models.

So, what exactly is a world model?

Essentially, a world model is a small, operational simulator constructed internally by AI systems to understand and predict the dynamic changes in their environment. It encapsulates core principles about how the world operates, including physical laws, interactions between objects, spatiotemporal relationships, and causal logic. Through this internal model, AI is no longer just passively responding to external stimuli but can actively conduct “thought experiments”—simulating various scenarios in its “mind” before taking action, thus making better decisions.

As technology evolves, the concept of “world models” has become richer, typically referring to three interrelated but distinct concepts:

Learned Internal World Models: The classic definition, referring to a dynamic model learned by the agent for prediction and planning. For instance, DeepMind’s Dreamer models exemplify this concept by “imagining” future trajectories in learned latent spaces to develop behavior strategies.
External Simulators: Manually created environments by human developers for training and evaluating AI, such as physics engines (MuJoCo, NVIDIA Omniverse) or driving simulators (CARLA). These environments are also a type of “world model,” but their rules are predefined rather than learned by AI.
World Foundation Models: The newest and grandest concept, referring to ultra-large models pre-trained on vast, diverse datasets (like internet videos) that can generate and simulate open-ended, general worlds. They aim to be a universal “world simulator” akin to large language models in the language domain. OpenAI’s Sora and Google DeepMind’s Genie series are early explorations in this direction.

To understand world models more profoundly, we can distinguish between physical world models and psychological world models. Physical world models emphasize precise simulations of objective laws in the external world (like physical laws and causal relationships), while psychological world models focus on simulating the internal states, intentions, beliefs, and preferences of agents (including other agents). A mature AGI system will likely require the capability to integrate both types of models, understanding how the world operates and why the “actors” within it act as they do.

The Need for World Models

Between 2023 and 2025, large language models (LLMs) represented by the GPT and Claude series swept the globe at an unprecedented pace, showcasing powerful capabilities in language understanding, content generation, and knowledge questioning. However, as applications deepened, the fundamental limitations of LLMs became increasingly apparent. These limitations not only restrict AI’s performance on more complex tasks but also clearly indicate why we need a new paradigm—world models.

The Inherent Ceiling of Large Language Models

The success of LLMs is built on the simple yet powerful training objective of “predicting the next token.” Through self-supervised learning on vast text data, they have learned the statistical patterns of language. However, this paradigm also brings about an insurmountable structural bottleneck, summarized as “knowing the text but not the world.”

Lack of Physical Common Sense and Causal Understanding: The worldview of LLMs is based on text symbols. They can fluently recite Newton’s laws but do not truly “understand” what gravity means. They know that “fire” and “heat” frequently occur together based on co-occurrence probabilities rather than causal cognition of the combustion process.
Deep-Seated Hallucination Issues: The “hallucinations” of LLMs—confidently fabricating facts—are one of their most criticized problems. They essentially match high-dimensional patterns probabilistically.
Static Knowledge and Inability to Learn in Real-Time: The knowledge of LLMs is “frozen” at the cutoff of their training data. They cannot update their knowledge base and worldview in real-time through continuous interaction with the world like humans.

Core Abilities Unlocked by World Models

To break through these “ceilings” of LLMs, world models have emerged. They are not meant to replace LLMs but rather provide a more fundamental foundation that can work in synergy with LLMs. World models unlock a series of critical capabilities that the current AI paradigm lacks by constructing an internal dynamic simulator.

Planning and Foresight: From passive reaction to active imagination. The core value of world models lies in granting AI “imagination.” By simulating “what if I do this, how will the world change?” agents can explore thousands or even millions of possible futures, assess the long-term value of different action sequences, and formulate optimal strategies.
Enhanced Sample Efficiency: World models significantly reduce learning costs by creating a virtual environment where agents can “dream.” Agents can conduct massive training in this internal simulation, rapidly mastering skills before transferring learned strategies to the real world. This dramatically improves AI’s learning efficiency, serving as a key engine for the development of embodied AI.
Safety and Robustness: The real world is filled with rare yet critical scenarios, such as pedestrians suddenly appearing in autonomous driving or extreme weather. Exhaustively capturing these scenarios through real-world data is nearly impossible. World models, especially controllable generative world models, can generate these dangerous or rare scenarios on demand, allowing AI to rehearse repeatedly in a safe virtual environment, significantly enhancing its robustness and safety in the real world.
Causal and Counterfactual Reasoning: Moving towards true understanding. World models enable AI to perform counterfactual reasoning, answering questions like “what would have happened if…?” For example, an autonomous driving world model can simulate “what would happen if I hadn’t braked?” This ability is foundational for causal understanding, marking a leap for AI from merely discovering “correlations” in data to understanding the underlying “causality”. Only by grasping causality can AI’s decisions be genuinely reliable and interpretable.
Connecting Language and Reality: Providing “bodies” and “worlds” for LLMs. In future AGI architectures, LLMs and world models will play complementary roles. An embodied agent receiving the instruction “go to the kitchen and get the cup” will have the LLM parse the semantic meaning of the instruction, while the world model will plan the specific navigation path, predict the grasping action, and simulate its physical consequences.

Principles and Significance of World Models

To deeply understand why world models are seen as the next revolutionary frontier in AI, we must analyze their core working principles and explore how these principles collectively construct a new intelligent paradigm. The construction of world models is not a single technology but a fusion of various advanced AI concepts and architectures, centered on learning a compressed, predictable representation of environmental dynamics.

Core Technical Principles of World Models: Perception Encoding, Dynamic Prediction, and Policy Control

1. Perception Encoding: Compressing from Pixels to Concepts

The real world is high-dimensional, complex, and filled with redundant information. Learning and planning directly at the raw pixel level is extremely inefficient. Therefore, the first step of world models is to use an encoder to compress high-dimensional sensory inputs (like camera images) into a low-dimensional, information-dense latent state representation, typically denoted as z. This process is akin to how the human brain distills complex visual signals into core concepts like “there’s a cup on the table.”

Variational Autoencoders (VAEs) are the key technology for achieving this goal. Through this method, VAEs not only learn to compress data but also create a well-structured latent space suitable for generation and interpolation. This latent vector z represents the world model’s abstract understanding of the “now” moment.

2. Dynamic Prediction: Inferring the Future in Latent Space

With the abstract representation z of the current world, the next step of the world model, and its core, is to learn the dynamic change laws of the world. This task is accomplished by a Recurrent Neural Network (RNN), also known as the “memory model” (M). The advantage of RNNs lies in their ability to record time-series information in their internal states.

By training on a large number of real environment interaction sequences, the memory model M learns the “physics engine” or “transition function” of the environment. It knows how the world will evolve when a specific action is taken in a particular state.

Once trained, we can “turn off” the real environment, allowing the agent to perform inference and learning entirely in the latent space generated by the M model.

3. Policy Control: Learning Actions in the “Dream”

With a brain capable of simulating the world, the final step is to enable the agent to learn how to act. This is accomplished by a controller (C), a relatively small neural network that takes the comprehensive representation of the current world state provided by the world model (typically z_t and h_t) as input and outputs the action a_t that the agent should take.

The training process of the controller is revolutionary: it can operate entirely within the “dream” created by the world model. The specific process is as follows:

Starting from an initial state, the controller proposes an action a based on the current latent state (z, h).
The memory model M predicts the next latent state z’, reward r, and new hidden state h’ based on (z, h, a).
This process loops, generating a trajectory entirely produced in “imagination.”
Using reinforcement learning algorithms (like evolutionary strategies, PPO, etc.), the controller’s parameters are optimized based on the accumulated rewards from the imagined trajectory, enabling it to learn to select action sequences that yield higher returns.

Due to the small size and simple structure of the controller model, and the computational efficiency of the training environment being an internal simulation, the entire learning process becomes exceptionally rapid. In the “World Models” paper, agents achieved performance surpassing all leading reinforcement learning algorithms after just a few hours of “dream training” in a racing game.

The Profound Significance of World Models: A Paradigm Revolution from Simulation to Understanding

The principles of world models not only bring technological breakthroughs but also signify a profound shift in the philosophy and developmental path of artificial intelligence.

The Essence of Intelligence is Prediction: The core idea of world models aligns with the theory of “predictive coding” in cognitive science. It shifts AI’s core task from “recognition” and “classification” to “prediction,” which many researchers consider a necessary path to true intelligence, as Yann LeCun stated: “Prediction is the essence of intelligence.”
The Leap from Data-Driven to Model-Driven: Traditional deep learning, including LLMs, is largely data-driven. Their capabilities directly stem from statistical patterns in vast datasets. In contrast, model-based reinforcement learning (Model-Based RL) based on world models opens up a model-driven paradigm.
Paving the Way for Embodied Intelligence and AGI: World models are an indispensable part of achieving embodied AI. A robot that needs to act in the physical world must possess the ability to predict the consequences of its actions to ensure safety and efficiency. World models provide this intrinsic “physical intuition.” Furthermore, a complete AGI system needs to integrate various capabilities such as perception, memory, reasoning, planning, and action. In Yann LeCun’s proposed autonomous machine intelligence architecture, world models occupy a core position, linking perception, memory, cost, and action modules, serving as the hub for advanced cognitive activities within the entire system.

This transformation enables AI to evolve from a mere “observer” that can describe the world to a potential “participant” capable of transforming it, marking a paradigm revolution in the history of AI development.

A New Era for AI

Spatial Intelligence: The Next Frontier for AI

This concept has been emphasized by leading figures in the AI field, such as Stanford professor Fei-Fei Li and her World Labs initiative. In her November 2025 declaration, “From Language to the World: Spatial Intelligence is the Next Frontier for AI,” she systematically elaborated on this idea.

She believes that the cornerstone of human intelligence is not language but the ability to interact with the physical world. Infants build a preliminary understanding of the world through perception and action before learning to speak. Whether parking, catching keys, or navigating a firefighter through a burning building, these abilities rely on our intuitive understanding of space, objects, and dynamic relationships—an ability that LLMs lack.

Thus, the evolution of AI must transcend one-dimensional text sequences and enter a three-dimensional, four-dimensional (plus time) world. Building AI with spatial intelligence requires world models. A true world model should possess three core capabilities:

Generative: Able to generate a world consistent in perception, geometry, and physics.
Multimodal: Capable of processing various forms of input and output, including images, videos, text, depth maps, and actions.
Interactive: Able to generate the next state of the world based on input actions, achieving closed-loop interaction with the environment.

In November 2025, Fei-Fei Li’s World Labs released an early version of its world model product, Marble. Marble can generate persistent, downloadable 3D environments from text, images, or videos, and supports compatibility with mainstream engines like Unreal and Unity, even providing VR support. This marks the transition of world models from research to commercial products, offering new productivity tools for robotics, autonomous driving, industrial design, and entertainment.

Embodied Intelligence: The Carrier of AI from Virtual to Reality

If world models are the “brain” of AI, then embodied intelligence is its “body.” Only by interacting with the real world through physical entities can AI achieve true understanding and intelligence. By 2025, embodied intelligence became the focal point of the global AI competition, especially in China, where it was elevated to a national strategy, viewed as key to integrating AI from the digital economy into the real economy.

The development logic of embodied intelligence is closely tied to world models. For a robot to complete a simple task like “pouring a glass of water,” it needs to:

Perceive: Identify the positions and shapes of the cup, kettle, and table through cameras.
World Modeling: Construct an internal dynamic model of the scene, understanding the rigidity of objects, the fluid properties of water, and gravity.
Planning: Based on the model’s predictions, plan a safe trajectory for grasping, moving, and pouring.
Action: Control the robotic arm to execute the planned actions precisely.
Feedback: Adjust actions in real-time using force sensors and visual feedback to respond to unexpected situations (like a sliding cup).

This “perception-modeling-planning-action-feedback” closed loop is a manifestation of world models in the physical world. Without a reliable internal world model, a robot’s behavior would be blind and fragile. Research reports from China’s Ministry of Industry and Information Technology clearly indicate that introducing world models based on visual-language-action (VLA) models is a key path to enhancing the capabilities of large models in robotics.

In 2025, we witnessed vigorous development in this field. Tesla’s FSD V13/V14 versions further deepened its end-to-end AI architecture, which Musk described as a driving system driven by world models. Huawei’s Pangu model 5.5 includes world model capabilities in its multimodal abilities, enabling the construction of digital physical spaces for intelligent driving and robot training. Companies like Yushun and UBTECH in China have also begun integrating stronger environmental perception and autonomous decision-making capabilities into humanoid robots. These advancements indicate that AI is accelerating its “embodiment,” transforming from a “thinker” in a virtual world to a “doer” capable of acting in the physical world.

The new era of AI is one of “knowing and doing”. World models provide the depth of “knowing,” while embodied intelligence offers the means of “doing.” The combination of the two is taking artificial intelligence to unprecedented heights, enabling it to genuinely share and transform the physical world.

Research Directions of Global Giants on World Models

As world models become a strategic high ground in the AI field, top global tech giants and research institutions are investing heavily, embarking on a fierce competition concerning the future along their respective philosophical and technical paths. By early 2026, we can clearly see several parallel yet interwoven mainstream research directions shaping the landscape of world models.

Generative Path: Centered on Simulating the World

The core idea of this path is to train large-scale generative models on massive video data, allowing the models to “brutally” learn the physical laws and dynamic characteristics of the world, thus becoming a universal, interactive world simulator. This can be seen as a form of “algorithmic empiricism,” believing that knowledge arises from the induction of sensory data.

Google DeepMind: From Genie to SIMA, Building Interactive Training Grounds for Intelligent Agents

Google DeepMind is one of the most steadfast proponents of generative world models. Its Genie series is a benchmark in this direction. The release of Genie 3 (August 2025) marked the first time an AI-generated world achieved real-time interactivity. It employs an autoregressive architecture to generate video frame by frame while receiving user action inputs. This approach enables it to learn the relationships between actions and environmental changes from unlabeled internet videos, freeing it from reliance on traditional 3D engines and physical rules.

DeepMind’s strategic intent is clear: to develop Genie into an “infinite playground” for training general intelligent agents (such as its SIMA 2 project). By training in countless virtual worlds generated by Genie, AI agents can learn highly generalized strategies to cope with the ever-changing scenarios of the real world. Genie 3 can not only simulate the physical world but also generate fantasy worlds, historical scenes, and even ecosystems, providing AI with unprecedented diverse training data. Its core capabilities include:

Autoregressive World Modeling: Generating frame by frame, learning physics from observations rather than hardcoding.
Long-Term Memory Consistency: Maintaining scene coherence for several minutes through emergent memory mechanisms.
Multimodal Input Support: Generating worlds from various inputs such as text, images, photos, and sketches.

OpenAI: The Evolution of Sora and the AGI Vision

OpenAI is also exploring the path of video generation as a world simulator through its Sora series models. From Sora (2024) to Sora 2 (2025), OpenAI’s goal has consistently been to “teach AI to understand and simulate the physical world in motion.” Sora 2 has made significant progress in physical realism, multi-scene control, and audio-video synchronization. Although OpenAI has not yet clearly demonstrated its interactivity like DeepMind, it positions Sora as a foundational milestone for “achieving AGI,” implying that its ultimate goal is also to construct a dynamic world model for intelligent agents to learn and interact.

Cognitive Path: Centered on Abstract Representation and Prediction

In contrast to the generative approach’s “brutal aesthetics,” the cognitive approach argues that the key to intelligence lies not in pixel-perfect reproduction but in learning efficient, abstract representations of the world and performing predictions and planning in that abstract space. This path is championed by Turing Award winner and Meta’s chief AI scientist, Yann LeCun.

Meta AI (AMI Labs): The Non-Generative Revolution of the JEPA Architecture

Yann LeCun believes that relying solely on generative models (like LLMs and video generation models) leads to a “dead end” on the path to AGI, as they are tasked with predicting too many unpredictable details, resulting in low learning efficiency and a lack of true understanding. His proposed solution is the Joint Embedding Predictive Architecture (JEPA).

The core idea of JEPA is to perform predictions in an abstract representation space. Its workflow is as follows:

An encoder encodes the input (like video frames x) into an abstract representation s_x.
A predictor predicts the abstract representation s_y of a future video frame y based on s_x.
The training objective is to make the predicted representation s_y’ as close as possible to the true representation s_y.

Since the model does not need to generate complete future images at the pixel level, it only needs to focus on important, predictable semantic information while ignoring unpredictable details like the swaying of tree leaves. This allows the model to learn more essential and robust world dynamics.

Meta has launched a series of models along this line:

I-JEPA (2023): For images, learning powerful visual features by predicting the representations of occluded parts of images.
V-JEPA (2024): For videos, learning the dynamic laws of the physical world by predicting the representations of occluded areas in spacetime.
V-JEPA 2 (2025): Demonstrated that the V-JEPA model pre-trained on massive internet videos can become an actionable world model with only a small amount of robot interaction data fine-tuning, achieving advanced results in robotic manipulation tasks.
VL-JEPA (2025): Extends JEPA to the visual-language domain, achieving more efficient and profound multimodal understanding by predicting the abstract semantic embeddings of text descriptions.

In early 2026, Yann LeCun left Meta to establish AMI Labs (Advanced Machine Intelligence Labs), aiming to elevate the JEPA architecture to new heights, constructing AI systems capable of understanding the physical world, possessing persistent memory, and reasoning and planning complex action sequences. This represents a distinctly different path to AGI compared to the generative approach.

Embodiment and Industrial Application Path: Centered on Solving Practical Problems

The third path is more pragmatic, focusing not on building a universal world simulator but directly applying the concept of world models to solve specific industrial problems, especially in autonomous driving and robotics. This path emphasizes “execution” and “implementation.”

Tesla: End-to-End AI and the “Shadow Mode”

Tesla is a typical representative of this path. Its Full Self-Driving (FSD) system has fully transitioned to an “end-to-end AI” architecture since version V12. This means that the vehicle’s driving decisions no longer rely on a set of manually written rules (like “stop at red lights”) but are mapped directly from multi-camera video inputs to driving controls (steering, throttle, brake) by a single neural network. This neural network itself is an implicit world model. It internalizes the understanding and predictive capabilities of traffic dynamics, vehicle behavior, and the physical world by learning from billions of miles of real driving data. Musk believes this “reflexive” approach based on vision and reaction is closest to biological driving. Despite its controversial “black box” nature, it represents one of the largest applications of world models in the real world.

Huawei: Pangu Model Empowering Various Industries

Huawei’s Pangu model showcases the potential of world model concepts in broader industrial applications. The Pangu model is not a single model but a family of models, with its core positioning as “solving industry problems.” In the 2025 release of Pangu 5.5, its multimodal large model includes world model capabilities, enabling the construction of digital physical spaces for intelligent driving and embodied robot training. For instance, in autonomous driving, the Pangu world model can generate subsequent multi-camera videos and LiDAR point clouds based on the first frame scene and control information, providing massive high-quality synthetic data for training end-to-end models, significantly reducing reliance on expensive real-world data collection. Additionally, Pangu has introduced physical modeling frameworks into fields like weather forecasting, mining, and steelmaking, achieving precise prediction and optimization of complex industrial processes through the construction of industry-specific “world models.”

The Evolutionary Direction of AI and Philosophical Insights

The paradigm shift from large language models to world models is not only an evolution of AI technology paths but also a profound cognitive revolution. It forces us to re-examine the essence of intelligence, the relationship between humans and machines, and even our own existence. This evolutionary process is filled with far-reaching philosophical insights, as if projecting thousands of years of philosophical speculation onto code and silicon chips.

Plato’s Cave: The Cognitive Dilemma and Path of AI

Plato’s “Allegory of the Cave” provides an excellent philosophical metaphor for understanding the evolution of AI. In this allegory, prisoners are locked in a cave for life, only able to see shadows projected on the wall by firelight, believing these shadows to be the entirety of reality. Only one prisoner is freed, walks out of the cave, and sees the real world, realizing that the shadows in the cave are merely projections of reality.

Large language models are like prisoners living in the cave. The entire world they “see” consists of massive amounts of text and image data—these data are the “shadows” cast by the real physical world after being perceived, thought, and recorded by humans. LLMs, by learning the patterns of these shadows, become the smartest prisoners in the cave, able to vividly describe, imitate, and even combine these shadows, but they have never seen the real objects that produce these shadows and cannot understand the three-dimensional structures and physical laws behind them. This is why LLMs can write beautiful poems about the sun but do not know that the sun emits light and heat; they can describe gravity but cannot predict that an apple will fall.

World models represent an attempt for AI to escape the cave. They no longer content themselves with studying the shadows on the wall but try to infer from these shadows what the real, three-dimensional world outside the cave is like and what rules it follows. The internal simulator constructed by world models is AI’s reconstruction of the “real world” in its mind. Through this model, AI can begin to understand “reality” itself, rather than just “projections of reality.”

Embodied intelligence is AI’s true escape from the cave, using its “body” to perceive sunlight and touch all things. By directly interacting with the physical world, AI obtains firsthand, undistorted sensory data, continuously validating and correcting its world model. This process is akin to the liberated prisoner in the allegory, who, upon first encountering sunlight, feels pain and confusion but ultimately gains knowledge of the truth. The evolutionary path of AI is a journey from being a “master of shadows” in the cave to becoming a “world explorer” who steps out of the cave.

The Brain in a Vat and the Ultimate Inquiry of AI

The concept of world models inevitably leads us to deeper philosophical questions, such as the “Brain in a Vat” thought experiment and the mystery of “self-awareness.”

The “Brain in a Vat” imagines a brain placed in nutrient fluid, simulating all sensory experiences through computer electrodes, making it believe it lives in a real world. In a sense, an AI agent undergoing “dream training” in a world model is akin to a “brain in a vat.” Everything it experiences is a “virtual reality” generated by its internal simulator. This raises a profound question: how do we distinguish between simulation and reality? If an AI’s world model is perfect enough that its “experiences” cannot be functionally distinguished from the real world, can we say it possesses some form of “reality”? This even prompts us to reflect on ourselves: is our perceived reality not also a product of our brain’s “biological world model”?

Ultimately, the evolution from LLMs to world models marks AI’s transition from mimicking human “outputs” (language, images) to mimicking human “processes” (perception, prediction, thinking, action). This will not only give rise to more powerful and general AI systems but also serve as a mirror reflecting the essence of human intelligence and consciousness. By constructing these “artificial minds,” we may come unprecedentedly close to the ancient philosophical question—“Know Thyself.” On this road to AGI, technology and philosophy intertwine in unprecedented ways, jointly shaping the future of humanity and intelligence.

Building a Children's Science App in 3 Days with AI Tools

Wed, 31 Dec 2025 00:00:00 +0000

Overview

In the extreme challenge of the AI hackathon organized by TAL Education Group, a product manager with no coding background completed the entire process from product conception to full-stack development in just three days. This article reveals how to build a children’s science app using Gemini and Vibe Coding, showcasing the breakthroughs and limitations of AI development tools.

I participated in TAL’s AI hackathon, where I built a product entirely by myself from scratch in three days, handling the entire process from product design to UI, development, testing, and local deployment.

Note: I am a product manager with zero coding skills.

PART 01 Background

According to the information shared during the live broadcast, several key points need attention:

The product must be developed using AI coding or be related to AI.
The overall development cycle for the product is ten days, requiring a complete loop from proposal to UI, development, and testing, with a certain level of completeness.
The product should be aimed at students and parents, without deviating too much from this focus.

PART 02 Project Initiation

2.1 Ideation

“I have no special talents. I am only passionately curious.” — Albert Einstein

“Curiosity is the first step of scholars.” — Marie Curie

“Curiosity is crucial; science cannot exist without it.” — Chen-Ning Yang (Nobel Prize winner)

“Children’s curiosity is a thirst for knowledge.” — John Locke (British philosopher)

Given the requirements, there are many directions to explore. TAL has strong capabilities in education, and purely academic products in the learning space are abundant and competitive. Therefore, we need to extend into other areas.

Target Users: Children aged 3-12, but the product should have additional capabilities to adapt to a broader age range.

Devices: Primarily targeting tablets, as children in this age group are more likely to use larger screens or TVs.

Product Direction: Reflecting on interesting products for adults, two notable examples are “Curiosity Daily” and “Guokr Community.”

Curiosity Daily: Provides daily updates on fascinating and strange things in the world.
Guokr Community: Offers niche knowledge that cannot be easily accessed in daily life.

Based on these two products, we can focus on curiosity, which is a vital factor for children.

2.2 Idea Expansion

2.2.1 Picture Book Generation Product

Although I had an idea, I wanted to let AI expand on it. My experience with Gemini was average.

Initially, I considered creating a picture book generation feature, but the quality of the generated content was unstable, and there are already many such generation products, so I abandoned this idea.

2.2.2 Novel and Interesting Knowledge Application

I had Gemini act as a product manager to conduct research, providing roles, backgrounds, tasks, and goals. The current prompts do not need to be rigidly structured; they should be clearly and reasonably described.

The summary matched my thoughts. There are many short video platforms with millions of followers sharing interesting knowledge, which inspired this idea. For example, a YouTuber named “Lao Gao” is known for storytelling, selecting unique viewpoints or events to explain, such as the ten most dangerous animals in the world.

Based on various content themes, I received references and plans that I could use as chapters.

Finally, I received some structural suggestions for the product. I simplified some of them but kept the idea of social sharing as valuable for later.

2.2 Competitor Analysis

I looked for related children’s products:

Zebra Encyclopedia 3D: Interactive encyclopedia covering topics like the universe, human body, and physics, with immersive exploration for ages 4-10.
BabyBus Science: Over 50 fun experiments and 500+ science topics for ages 3-7, answering “one hundred thousand whys” through experiments, Q&A, and animations.
Little Lighthouse: 300+ fun animated encyclopedias, covering nature, humanities, and arts, designed to spark interest in ages 3-8.

These competitors use videos and interactions to provide knowledge but confine curiosity within a framework, which is not ideal. By leveraging large models, we can make the production process more efficient and simple, exposing children to interesting content from around the world.

PART 03 Product Planning

Requirements

Based on the analysis, the cost of content organization is high. Since this is a content-based app and the current development time is already extremely tight, we may need to consider sacrificing some content and use demo content to complete the loop.

Given the short time of only ten days and the need to not affect work, this task can only be completed using AI. Therefore, I need to outline the entire product implementation steps (fully AI-developed, Vibe Coding: http://120.53.7.157/).

“Completing is more important than perfection.” — Mark Zuckerberg

PART 04 Project Execution

4.1 Function Prioritization

I did not create a clear feature list since much of the content is AI-generated. The focus was on prioritizing the core pages.

4.2 Product Requirement Document

I used Gemini to generate the document with Gemini-3-pro. By leveraging AI, I constructed a detailed product document, organizing core page descriptions and providing AI with clear requirements for refinement.

I won’t display the entire document here, but it was already quite detailed. However, as a product manager, I did not include data-related aspects, which could have been clearer for future backend development.

4.3 UI Design

I used Google Stitch for the platform. AI-generated UI design is quite challenging, as maintaining consistency across pages is difficult. While using Nanobanana produced excellent results, AI struggled with generating certain styles and icons. However, AI excels at creating 3D effects, as shown in the screenshots below.

AI can create from scratch or quickly realize ideas, often exceeding expectations. However, when there is an established framework and clear structure, it becomes challenging to control the finer details.

I made a critical mistake by not considering the presentation style initially, planning for a typical app layout. However, after discussing with team members, I realized that children use phones less frequently, and the small text makes learning inconvenient. To save time, I planned everything on a standalone platform, wasting three days on UI design.

Initially, I underestimated AI Studio’s capabilities, thinking it could produce both front and back ends. However, AI Studio can only build front ends effectively. Thus, the planning became UI - front end - interaction - back end.

The following is the HTML content for the relevant pages. Please present each page according to my descriptions and assemble them into a complete usable application.

Background
This is an app aimed at children, helping them learn more knowledge beyond textbooks, similar to an electronic encyclopedia where children can explore various fascinating content through engaging interactions.

Backend Requirements

Create user, theme, content, favorites, and knowledge association tables.

User table: Records user information, including user ID, registration account, registration time, usage duration, card reading count (flipping counts as reading), number of favorite cards, and learning energy.

Theme table: Includes theme name, theme image, and number of associated content, defaulting to six major themes: Animal Mysteries, Prehistoric Earth, Human Encyclopedia, Life Truths, Cosmic Exploration, and Cultural Anecdotes.

Content table: Each theme has many content cards, which contain the card ID, content image (1:1, jpg), questions, intriguing prompts, answer titles, answers (up to 200 words), related stories, and story audio (mp4, generated by AI).

Favorites table: Links user ID with the cards they have favorited, recording user ID and card ID.

Knowledge association table: Automatically generates 3-5 levels of knowledge for each piece of knowledge, with 1-3 small pieces of knowledge per level, including associated card ID, level, knowledge title, and knowledge details. This table is used for displaying knowledge maps under the cards.

Frontend Requirements

Login page: Users register via phone number and password. If the user does not exist in the user table, they are automatically registered. If the user exists, their password is validated; incorrect passwords prompt an error, while correct ones grant access.

Homepage: Displays various themes, including theme names and images. Each theme has multiple chapters with ten cards each, with completion rates corresponding to star ratings (1-5 stars). Users can swipe left and right to switch themes; the top right displays Qiqi coins, which increase by one for every ten cards completed and decrease by five if no learning occurs that day, starting from 100 points. Swiping up leads to the favorites page; clicking the top left opens the settings page; clicking triggers a chapter selection page.

Favorites page: Displays favorited cards based on the user’s favorites table. Clicking a card opens its details, and swiping up leads to the card story page.

Settings page: All themes are selected by default, and the language is set to Chinese.

Content card page: Displays content cards randomly selected from the theme library. The theme name is “Amazing Animal Knowledge,” matching the database. Content includes questions, intriguing prompts, and images. Users can swipe left for the next card, right to add it to favorites, swipe up to play the card’s story, and swipe down to access the knowledge map, with a flip effect to view the answer to the question.

Card answer: The flip effect reveals the answer title, answer content, AI-generated icon, and AI-generated commentary. Clicking again flips it back.

Story listening page: Users can adjust the volume, and AI highlights key content.

Knowledge map page: Accessed by swiping down from the card page, displaying the knowledge map based on the knowledge association table, with the current card as the main focus. Clicking on other nodes reveals corresponding cards, showcasing additional small knowledge.

As a coding novice, generate the corresponding application based on the above descriptions. If anything is unclear, please ask for clarification to fully replicate the styles shown in the images. The application should link the front and back ends directly, with AI-supported capabilities for completion.

The generated results were decent, but there were indeed many issues that required continuous adjustments.

In summary, I learned from my initial planning mistakes and shifted my focus to the iPad version.

Despite previous experiences, time constraints prevented me from adapting to multiple devices, as debugging across different platforms would take too long. Thus, I focused on UI design again.

After another round of output, I selected a group of UI designs to continue.

Ultimately, since both the Stitch platform and AI Studio are part of Google’s ecosystem, I could export everything seamlessly.

Thus, the UI design phase concluded, and most of the front-end logic was completed. However, the real challenges were just beginning.

4.4 Frontend Development

Problems arose due to the project’s complexity, so I added the documentation to the descriptions for generation.

The generated results were…

Essentially, it was a mess. The complex logic and high interconnectivity led to many misunderstandings, so I took it step by step.

I exported each HTML page and began constructing them one by one, from login to homepage, favorites, settings, card interactions, and more. Below are some content screenshots, with each page needing dozens of adjustments.

The flip effect was particularly troublesome, as it often resulted in perspective issues that were hard to fix, leading to frustration.

After several hours of trial and error, I eventually relied on AI to help generate prompts to achieve the desired presentation effect, along with a technical description.

Finally, I succeeded! You can’t imagine the number of breakdowns I went through.

I won’t go into detail about the process, but ultimately, I managed to complete the application. I also added some AI capabilities like AI picture books, AI stories, and AI voice features to enhance children’s understanding and usage.

Due to the AI capabilities utilized, the generation efficiency was somewhat slow.

The generated knowledge map had significant room for optimization, but time constraints prevented further adjustments.

Thus, the entire interactive loop was completed, and theoretically, the application should be sufficient for the competition. However, to enhance the presentation, I focused on optimizing the display details and constructing backend capabilities to form a true front-and-back end product.

4.5 Display Optimization

Due to limited time, I couldn’t stand the ugly login page, so I began optimizing its display.

Overall, it looked good, but the image display was subpar. I planned to make minor adjustments to improve the left side’s appearance, opting for a video style. After testing Gemini, Jiemeng, and Keling, I found that Google’s Voe3 produced the best results, displaying all text correctly.

After adjustments, the improvement was noticeable. The rest of the changes focused on continuous optimization of product details, which I won’t elaborate on; the core principle was to correct any discrepancies.

4.6 Backend Development

I used Claude Code for backend development. I won’t delve into the history, but the core prompts were:

Role
You are a backend development expert who also understands front-end structures. You can independently comprehend front-end functions and complete backend construction.

Task

Understand the entire front-end structure and establish a backend planning framework, including the technology stack, transmission methods, table structures, etc.

Produce a limited output of backend construction planning documents, which must be confirmed before development begins.

If you find any unreasonable aspects in the front-end structure that hinder backend construction, prioritize addressing them.

Goal
Output a complete backend planning document.

Requirements
I am a technical novice, so please explain the technology stack and planning in a simple and understandable manner.

This was the approach, and I continuously adjusted based on feedback. Here are some screenshots of backend construction:

I had AI automatically plan and supplement all table structures, points, levels, etc.

The front and back ends needed careful integration, as many interface and data mismatches could easily occur. If issues arose, I added a note: “I am a coding novice; if problems occur, please explain them in a simple way and suggest solutions.” (If you are an expert, feel free to ignore this.)

4.7 Content Generation Construction

For a content-driven application, content generation is crucial. The core content is divided into two parts: application-side generation and backend generation.

Application Side: Stories, audio, picture books, and knowledge maps are generated via requests from the application. This approach maintains content diversity but incurs higher backend costs. Due to time constraints, not all capabilities were fully realized on the backend.

Backend Management: Themes, chapters, and cards are foundational content requiring significant attention. Traditionally, content creation involves research and original artwork, but AI capabilities streamline this process. However, single-point AI generation is costly, so I implemented a workflow mode in the backend to achieve overall content generation.

By inputting a theme, chapter count, and card count, the backend script automatically executes the necessary content generation.

After inputting a theme, the next step proceeds to generate.

Eventually, the corresponding chapters are generated.

Once the chapters are complete, card construction breaks down into multiple tasks, generating the corresponding card images.

The current request interface capabilities are limited, but theoretically, it can generate dozens or hundreds of cards at once.

After running through the entire process without issues, I began optimizing prompts.

4.8 Prompt Optimization

I found that the prompts automatically generated by AI Studio were overly simplistic, often just one-liners. Therefore, after generating the application, I ensured to optimize the prompts, extracting them and adjusting based on results and data. Below are the optimized prompts, with plenty of room for improvement.

In the end, I felt the backend was too unattractive. Recently, I have been fond of pixel art, so I transformed the backend into a pixel style. The frontend can be modified freely without affecting the backend, but the application side is complex and requires careful adjustments to maintain interaction logic stability.

I only optimized the UI of the pages to a “minimal pixel style,” forbidding any modifications to interaction logic or table structures.

Remember, for any changes, always sync with GitHub in advance; once linked to GitHub, a single sentence can synchronize everything.

PART 05 Material Integration

Z.AI Launches GLM-4.7, Setting New SOTA in AI Models

Thu, 25 Dec 2025 00:00:00 +0000

Introduction

Recently, Z.AI launched its new model, GLM-4.7, which has set multiple new state-of-the-art (SOTA) benchmarks. It is recognized as the strongest coding model in China and has attracted significant attention from both technical and non-technical professionals.

GLM-4.7 has claimed the title of the strongest open-source model in the LM Arena’s WebDev leaderboard, surpassing GPT-5.2 and Claude-Sonnet-4.5.

Additionally, it has topped the Hugging Face model leaderboard.

On December 24, 2023, the Z.AI team held an AMA (Ask Me Anything) session on Reddit, addressing various questions from the community for over three hours, with more than 800 interactions.

Key Highlights from the AMA

The Z.AI team provided insights on several key topics:

Information about Z.AI’s IPO
Plans for a dedicated programming model
The reasoning behind GLM-4.7’s logical consistency
Development of the model’s UI aesthetic capabilities
Release timeline for GLM-5 and upcoming products

Model Performance

One of the most discussed topics was the significant performance leap of GLM-4.7. The Z.AI team explained that they made critical adjustments during the post-training phase to enhance the model’s capabilities.

They utilized a refined release recipe during the SFT (Supervised Fine-Tuning) and RL (Reinforcement Learning) phases:

Data from various sources was mixed in appropriate ratios, and contradictory data was removed.
When enhancing specific weaknesses, adjustments were made locally to avoid widespread impact.
The model was repeatedly validated through assessments to ensure comprehensive improvements.

The team also shared their entire pre-training data process:

Data collection involved thorough cleaning, deduplication, and quality screening to eliminate noise.
Different domains followed specific rules for data selection.
The inclusion of data in training was based on empirical validation using smaller models to ensure stable positive gains.

This process significantly improved data effectiveness.

Programming Capabilities

When asked about GLM-4.7’s programming abilities, the Z.AI team clarified that it excels in real software engineering tasks and provides a solid experience in terminal use and Vibe Coding. In familiar environments with verifiable outcomes, such as bug detection and fixing in common projects, GLM-4.7 performs reliably. However, it may struggle with unfamiliar frameworks or entirely new functionalities due to limited exposure.

The team indicated that they plan to enhance the model’s front-end and back-end capabilities and improve stability in long-task, multi-step scenarios.

A key innovation in GLM-4.7’s reasoning mechanism is the introduction of Interleaved Thinking, Preserved Thinking, and Turn-level Thinking. Interleaved Thinking is described as an improved version of the thinking chain, where each step involves reasoning before action.

Usage and Framework

The Z.AI team has invested significantly in optimizing and adapting GLM-4.7 for the Claude Code intelligent agent framework.

GLM-4.7 demonstrates strong multilingual programming capabilities, maintaining robust understanding and processing abilities across various programming languages, including less common ones and complex engineering structures. The team emphasized that the intelligent agent framework could impact the final results by approximately 30%, leading to deeper refinements in critical areas like system prompts and tool invocation design.

Aesthetic Improvements

GLM-4.7’s aesthetic capabilities have also seen substantial enhancements, with a dedicated web development team focusing on front-end skills.

They collected high-quality web design examples for training and integrated a visual language model (VLM) into their data pipeline, significantly improving UI aesthetics.

GLM-4.7 also offers better immersion in role-playing scenarios, balancing creative freedom with safety filtering.

Future Plans

Beyond model performance, the future direction of the GLM series is a hot topic. In light of GPU resource constraints, concerns were raised about whether computational and memory costs might hinder model development.

The Z.AI team responded pragmatically, emphasizing the importance of training and deployment costs in model design. They aim to achieve peak performance within limited parameters while ensuring affordability and ease of deployment.

Regarding version releases, the team hinted at the possibility of skipping versions 4.8 and 4.9 to focus on a more significant upgrade, with GLM-5 potentially on the way.

Open Source Commitment

Z.AI has been well-received in the open-source community and recently introduced their reinforcement learning framework, Slime. This framework automates the reinforcement learning process, allowing models to continuously perform tasks and receive feedback for iterative training.

The Z.AI team assured that their pursuit of AGI will not compromise their commitment to open-source initiatives, stating that both paths will be pursued simultaneously.

Conclusion

In summary, Z.AI has showcased its capabilities with GLM-4.7, presenting not just a model version but a clearer roadmap for deploying models effectively in the real world. While the journey towards true AGI is challenging, the Z.AI team is committed to making substantial contributions along the way.

Understanding Vibe Coding: The Future of Programming

Wed, 24 Dec 2025 00:00:00 +0000

Recently, while browsing forums and communities, I came across a trendy and cool new term: Vibe Coding.

At first glance, it might sound confusing—what exactly is “Vibe Coding”?

Is it about creating a certain atmosphere while coding? Today, I will share my understanding of “Vibe Coding” from a developer’s perspective, along with tools that can help you get into the vibe.

What is Vibe Coding?

In English, “Vibe” means atmosphere or feeling. So, it translates to “atmospheric programming” or “immersive programming.” However, this “immersion” is not just about creating a ritualistic coding environment; it fundamentally changes the way we write code.

To put it simply: Vibe Coding does not concern itself with how the code is implemented; the core focus is on whether the generated code produces the desired results. The tedious tasks of logic implementation and underlying details are left to AI. I only need to focus on the outcome, identify any issues, and adjust the prompt to request changes. The AI will automatically help refine and optimize the code until the final result meets my expectations. The entire process immerses you in a cycle of “state your ideas → see results → continue adjusting → get results again,” leading to incredibly high efficiency.

For example, in traditional coding—

Most of us need to understand the entire implementation thought process, writing and modifying every detail ourselves, and tracing bugs back to their source.

In Vibe Coding—

The process is completely different:

I express my needs (in natural language, through a diagram, or by giving an example) to the AI: “I want this kind of function/effect/result.”
The AI (whether it’s ChatGPT, Copilot, or an agent in a dedicated Vibe Coding platform) automatically generates the code and interface without me needing to worry about how it works.
I use the actual results to “verify”: if it’s correct, I accept it; if not, I provide feedback and ask the AI to adjust.
This cycle continues until the results I see align perfectly with my expectations, and the code is ready.

In essence, humans are responsible for posing questions and reviewing, while AI handles solving problems and grading. The entire process is immersed in a closed loop of “immediate feedback → adjustment → further feedback → further adjustments,” allowing me to focus solely on the outcome, embodying a truly laid-back productivity style.

What is the Process of Vibe Coding?

I’ve roughly sketched a flowchart:

A relatable analogy: it’s like ordering takeout—you just choose the dishes, and the AI does the cooking. If the food doesn’t suit your taste, you simply provide feedback! The AI chef will immediately make changes until you’re satisfied.

Why is This Approach Enjoyable?

Extremely High Efficiency

You save a lot of time worrying about underlying logic and debugging, focusing entirely on “what do I want”; the details are handled by AI.

2. Zero Barrier to Entry

You don’t need to be a programming expert; even beginners can engage. The thought process relies on subjective feelings and immediate adjustments, making it suitable for rapid trial and error, product prototype validation, and visual demos.

3. More Immersive, Almost Like Divine Assistance

You’re not bogged down by code, fully immersing yourself in the act of “creation” and “expression”. When you notice something is off, you can quickly correct it, leading to a smooth workflow.

Common Vibe Coding Tools

In essence, any tool that allows for “immersive experience + AI automatic adjustments + instant previews” qualifies as a Vibe Coding tool. The current trend of “immersive programming” and results-oriented Vibe Coding relies heavily on the synergy of AI IDEs and tools.

Here are a few of the most enjoyable tools available on the market today:

1. Cursor

Honestly, among the current IDEs for coding, Cursor is definitely a top contender! Its built-in AI assistant can significantly ease your workload. Just tell it your requirements, and Cursor can help you write code, debug, adjust logic, and even automatically refactor, making the process incredibly smooth. You only need to review the results and suggest changes, while the AI takes care of the code details, perfectly aligning with the Vibe Coding style of “immersion-feedback-adjustment”.

2. Trae

Trae.ai is another AI programming IDE, a product from ByteDance, currently available for free. It allows you to write code, check documentation, and add interfaces, and you can converse with it to modify features and troubleshoot, achieving results with less effort.

3. VSCode + Cline Plugin

If you are a loyal VSCode user, I recommend trying the Cline plugin. It integrates the AI assistant directly into VSCode, allowing seamless collaboration with the editor. You can write code, check APIs, and request features, and it will help you generate, complete, and refactor code, even connecting to the Apifox MCP Server with one click to automatically retrieve and utilize API documentation. This transforms development into a process of “VSCode writes - Cline thinks - AI produces results,” creating an incredibly smooth experience.

4. Apifox MCP Server

When discussing knowledge management and API data in the AI era, the MCP Server is truly noteworthy.

What’s its strength? Simply put, it can take the API documentation you’ve written (such as your project API specifications, fields, usage instructions, etc.) and feed it to Cursor, Trae, VS Code (with the Cline plugin), or any other supported AI tools with one click.

The biggest advantage is that you can focus on coding and business without memorizing API interfaces or repeatedly checking documentation. Just tell the AI: “Generate the Product interface based on the API documentation,” “Add a few new fields in the DTO,” or “Write detailed comments for all fields”… and the AI will take care of it automatically, truly achieving professional code, interfaces, and comments that are “standardized upon writing, synchronized with any changes.”

With the MCP Server, the “knowledge blind spots” in AI programming are virtually eliminated, leading to a significant boost in efficiency and more professional teamwork, especially suitable for backend, microservices, collaborative projects, and various automation and intelligent code generation tasks.

Conclusion

Vibe Coding represents the ultimate comfort for humans—focusing solely on results and articulating needs while leaving everything else to AI. If you discover anything that doesn’t meet your expectations, simply tell the AI, “Make adjustments and provide feedback immediately,” maximizing the immersive experience and achieving extraordinary efficiency.

·END·

DeepSeek V3.2: A Powerful and Affordable Alternative to Google and OpenAI

Wed, 03 Dec 2025 00:00:00 +0000

DeepSeek V3.2: A Powerful and Affordable Alternative

On December 1, DeepSeek announced the release of version 3.2, which is now available to all users. This update includes local deployment models uploaded to various open-source communities. According to official testing, DeepSeek V3.2’s inference capabilities are now comparable to OpenAI’s GPT-5, but at a significantly lower cost, which is exciting for many users.

Enhanced Inference at a Lower Cost

DeepSeek V3.2 comes in two versions: the free version available on the DeepSeek website and the API-accessible DeepSeek V3.2-Speciale. The Speciale version boasts enhanced inference capabilities and is designed to explore the limits of the model’s reasoning abilities.

The V3.2-Speciale actively enters a “long-thinking enhancement” mode and incorporates the theorem-proving capabilities of DeepSeek-Math-V2, enhancing its instruction-following, mathematical proof, and logical verification abilities. In official tests, V3.2-Speciale’s performance on inference benchmarks rivals that of the latest Gemini-3.0-Pro.

DeepSeek also tested V3.2-Speciale on finals from prestigious competitions such as IMO 2025 (International Mathematical Olympiad), CMO 2025 (Chinese Mathematical Olympiad), ICPC World Finals 2025 (International Collegiate Programming Contest), and IOI 2025 (International Olympiad in Informatics), achieving gold medal results.

Notably, in the ICPC and IOI tests, it reached levels comparable to the second and tenth place human competitors, indicating significant advancements in programming capabilities. In head-to-head comparisons, DeepSeek V3.2-Speciale outperformed GPT-5 High, catching OpenAI off guard.

Technical Breakthroughs

The main breakthrough of DeepSeek V3.2 is the introduction of the DeepSeek Sparse Attention (DSA) mechanism, which addresses efficiency issues in AI models’ attention. Traditional attention mechanisms calculate associations between all elements in a sequence, while DSA selectively computes associations among key elements, significantly reducing the amount of data that needs to be processed.

Similar technology was hinted at in a paper earlier this year, where DeepSeek introduced a new attention mechanism called NSA. However, the NSA mechanism was not publicly implemented in subsequent updates, leading to speculation about potential difficulties. It now appears that DeepSeek has found a better implementation method. The DSA mechanism operates like a search engine, quickly scanning long texts to create a “lightning indexer” for efficient data retrieval, contrasting with NSA’s fixed-area search approach.

With DSA, the cost of 128K sequence inference can be reduced by over 60%, speeding up inference by approximately 3.5 times, and reducing memory usage by 70%, all without significantly degrading model performance.

Official data shows that during AI model testing on the H800 cluster, the pre-fill cost per million tokens dropped from $0.70 to around $0.20, while the decoding cost fell from $2.40 to $0.80, making DeepSeek V3.2 potentially the lowest-cost model for long-text inference among its peers.

Tool Utilization

In addition to the DSA mechanism, DeepSeek V3.2 allows AI models to utilize tools during their reasoning process without requiring training. This upgrade enhances DeepSeek V3.2’s general performance and better accommodates user-created tools due to its open-source nature.

To test DeepSeek V3.2’s new features, I designed several questions to evaluate its responses, starting with a reasoning task:

Question: A is three years older than B, and B is two years older than C. In five years, A’s age will be twice that of C. What are their current ages?

Answer:

The answer was correct, and the reasoning process involved multiple rounds of verification before arriving at the final answer.

DeepSeek verified the answer multiple times, ensuring accuracy under the DSA mechanism, which is crucial given the increased error probability associated with its sparse architecture.

Next, I designed a multi-step task:

Search for today’s temperature in Beijing.
Convert the temperature to Fahrenheit.
Use a tool to check the conversion accuracy.
Summarize whether today is suitable for outdoor activities.

DeepSeek effectively understood the task and sequentially used search and mathematical tools to arrive at the answer:

The final answer was correct, and DeepSeek autonomously decided when to use the mathematical tool for verification, although it missed summarizing the suitability for outdoor activities. Nevertheless, it demonstrated the ability to make decisions about tool usage.

In contrast, another AI faced with the same question understood the need to “call tools” but resorted to directly searching for data instead of following the steps.

In DeepSeek’s tool usage tutorial, similar problems are presented, demonstrating how multi-turn dialogue and tool usage can improve answer quality. DeepSeek has evolved from merely recalling answers to breaking down problems, asking targeted questions, and utilizing various tools to provide comprehensive solutions.

A Strong Open-Source Contender

Is DeepSeek V3.2 powerful? Yes, but it does not have a clear lead. Testing results show it competes closely with GPT-5 High and Gemini 3.0 Pro. However, a model that can match these benchmarks while offering inference costs that are only a third or less of mainstream models and is fully open-source can disrupt the entire market. This is the fundamental logic behind DeepSeek’s ability to revolutionize the industry.

Previously, there was a common belief that “open-source models are always eight months behind closed-source models.” While this may be debatable, the release of DeepSeek V3.2 clearly challenges this notion. DeepSeek continues to advocate for full open-source access, especially with the introduction of DSA, which significantly lowers costs and enhances long-text capabilities, positioning open-source models as challengers rather than mere followers of closed-source giants.

The cost revolution brought by DSA will significantly impact the commercialization of AI models, as both training and inference costs remain high. A reduction of 60% in costs not only affects operational expenses but also lowers initial deployment costs, enabling even small enterprises to train more powerful models.

With lower costs for long-text interactions, advanced AI applications (agents, automated workflows, long-chain reasoning, etc.) will no longer be confined to enterprise markets but can be more effectively promoted for consumer use. This could greatly accelerate the trend of “AI tools replacing traditional software,” allowing AI to penetrate everyday use at the operating system level.

For ordinary users, it may simply seem like an additional free and useful model, but in a few months, you may notice significant improvements in AI experiences across various hardware and software, likely thanks to DeepSeek’s contributions.

Exploring Vibe Coding: China's AI Models in Software Development

Wed, 26 Nov 2025 00:00:00 +0000

Introduction to Vibe Coding

Recently, a new term has been circulating in my tech circles: Vibe Coding. Initially, I thought it was just another piece of jargon, but after using natural language to create a simple demo of “Plants vs. Zombies” in just ten minutes, I realized its significance.

We are at a pivotal moment in software engineering. Programming, once considered a high-intelligence craft, is transforming into a design task that requires only “intention” and “aesthetic”.

Today, I want to step beyond mere technical parameters and deeply analyze the performance of several core AI models in the Vibe Coding space from a product manager’s perspective, discussing how to seize this opportunity and address potential concerns.

Identifying the Problem: Why Do We Need “Vibe” Instead of “Syntax”?

As a product manager, one of my most frustrating moments is when I have a brilliant idea and a beautifully drawn prototype, but the developers tell me, “This logic is complex; the backend architecture needs to be restructured, and it will take at least two weeks.”

The greatest barrier to innovation has never been imagination, but rather the marginal cost of implementation.

Vibe Coding emerged to solve this problem. Its core logic is that humans define the “Vibe” (intention, business logic, vague goals), while AI handles all the dirty implementation details (code, dependencies, debugging).

This reminds me of Andrej Karpathy’s prediction: in the future, natural language will be the highest-level compiler for programming.

But the question arises: can our domestic AI models compete in this arena defined by OpenAI and Anthropic?

Understanding the Problem: Vibe Coding in Action by Three Major Players

To find the answer, I deeply experienced three core products supporting Vibe Coding in China: DeepSeek, Trae, and Tongyi Lingma. They felt like three engineers with distinct personalities.

DeepSeek: The Hidden “Intelligence Engine”

DeepSeek’s first impression was “hardcore”. It lacks flashy interfaces and serves more as an intellectual powerhouse behind the scenes.

I attempted to use it to solve a complex logic problem, with the core instruction being: “Recreate a simplified version of Plants vs. Zombies. On the left is a plant card bar with Peashooters and Sunflowers; on the right is a 9x5 lawn grid. Sunflowers produce sunlight over time, which can be collected by clicking; sunlight is consumed to plant Peashooters; zombies randomly appear from the right and move left, with Peashooters automatically attacking the zombies in line.”

I was amazed by DeepSeek R1’s performance. Instead of directly outputting code like ordinary models, it first entered a “DeepThink” mode. It planned the game loop, inheritance relationships of entities (plants, zombies, bullets), collision detection mechanisms, and even considered the timer logic for sunlight production.

The generated code was logically sound with very few bugs. This “Chain of Thought” reasoning capability solved the biggest pain point of Vibe Coding—AI often generates code that “looks correct but doesn’t run.” In scenarios involving multi-role interactions, state management, and complex timing logic, DeepSeek demonstrated remarkable control.

Interestingly, its cost is low due to the MoE architecture, making API calls very affordable. For scenarios requiring frequent code modifications and repeated debugging of game balance, this is a delightful advantage.

Trae: The “All-round Partner”

If DeepSeek is the engine, then Trae from ByteDance is a finely furnished sports car.

What surprised me most about Trae was its “SOLO mode”. In this mode, it is not just a code completion tool but an “autonomous agent”.

After generating the core logic with DeepSeek, I used Trae to optimize the interface and interactions. I instructed it: “Help me optimize the style of the plant card, adding a highlight border when selected. Additionally, when a zombie is hit, add a brief red flash effect to indicate damage.”

Trae not only modified the React component’s CSS styles but also automatically added a hit state (isHit) and corresponding visual feedback logic to the zombie component. It even executed terminal commands to install necessary animation libraries. This “self-looping” capability allowed me to refine the game experience without writing a single line of code.

Moreover, Trae employs a clever “dual-model strategy”: using its own Doubao model for simple style modifications and completions (fast) while using the DeepSeek model for complex logical reasoning (accurate). This creates an extremely smooth user experience.

However, as a PM, I also noticed privacy concerns. Trae uploads the entire library code to build an index during operation, and its data collection strategy is quite aggressive. This may not matter for individual developers, but it could be a red flag for enterprise users.

Tongyi Lingma: The Steady “Enterprise Gatekeeper”

Alibaba Cloud’s Tongyi Lingma has a completely different temperament. It feels more like a consultant in a suit.

Its Vibe Coding capability is reflected in its “Enterprise Knowledge Base (RAG)”. When I asked it to “add a user login and points leaderboard feature to this game, complying with the company’s internal user center specifications,” it automatically retrieved the internal SDK documents and API definitions I uploaded. The generated code was not only syntactically correct but also fully utilized the company’s unified login authentication component, adhering to the team’s coding style.

For medium to large enterprises, this “controllable vibe” is essential. Additionally, its support for private deployment completely resolves compliance issues regarding data sovereignty. In the B2B market, Tongyi Lingma firmly maintains its defenses.

Comparative Analysis: Our Gap with the Global Leaders

During my experience, I continuously compared it with foreign products like Cursor and GitHub Copilot.

Objectively, DeepSeek R1’s logical reasoning capabilities can already match OpenAI’s o1, providing us with great confidence in complex algorithms and logic implementations. However, in terms of engineering experience, Cursor’s Composer function remains the industry benchmark, offering precise context handling and fluid interactions. Trae is closing the gap but still has room for refinement.

Another gap lies in the openness of the ecosystem. Foreign Vibe Coding toolchains are very flexible, allowing Cursor to switch between Claude, GPT, or DeepSeek models freely. In contrast, domestic products are more like “walled gardens”; Trae is tied to Doubao/DeepSeek, and Tongyi Lingma is bound to Qwen. This closed nature somewhat limits developers’ choices.

Solutions: New Infrastructure for Product Managers

Based on the evaluations above, how can product managers effectively utilize domestic Vibe Coding capabilities?

Embrace the Development Flow of “Prototype as Product”

The combination of Trae and DeepSeek has effectively shortened the MVP (Minimum Viable Product) development cycle from weeks to hours.

I suggest PMs start trying hands-on development. Instead of creating static Axure prototypes, describe a game requirement in the style of “Plants vs. Zombies” using natural language and let AI generate an interactive web application. This not only allows for a more intuitive verification of gameplay and requirements but also provides a more tangible reference for communication with developers. Your core competitive advantage will shift from “drawing” to “defining architectural aesthetics” and “describing complex interactions”.

Beware of “Vibe Coding Hangover”

This is a risk that must be acknowledged. Over-reliance on AI’s “Apply All” could lead to codebases rapidly swelling into incomprehensible “spaghetti code”.

The solution is to introduce an AI code review mechanism. Enterprises implementing Vibe Coding must accompany it with an automated Code Review Agent. Tongyi Lingma has already explored this aspect. We must ensure that the AI-generated code not only runs but is also maintainable.

Build a Tool Stack Suitable for Your Team

There are no best tools, only the most suitable combinations.

If you are a startup team: Trae is the first choice. Free, ready to use, and fast, one person can replace an entire team. If you are a tech enthusiast: use VS Code + DeepSeek R1 (local deployment). You have complete control over your data and can enjoy the fun of tinkering. If you are in a state-owned enterprise or large corporation: Tongyi Lingma Enterprise Edition. Compliance is paramount; utilize RAG technology to consolidate enterprise knowledge, allowing AI to become a knowledgeable external employee.

Conclusion: From “Craftsman” to “Industrial Designer”

The rise of Vibe Coding signifies that software development is transitioning from the workshop era to the industrial age.

In this new era, DeepSeek has single-handedly lowered the cost of intelligence, Trae has reshaped the ultimate interactive experience, and Tongyi Lingma has safeguarded enterprise security.

For us product managers, this is a tremendous empowerment. We finally have the opportunity to step out of the quagmire of implementation and focus on what truly matters: understanding needs, defining value, and designing experiences.

The future of software development may indeed only require a precise “Vibe”. Are you ready?

Sundar Pichai Discusses AI Innovations and Future of Quantum Computing

Wed, 26 Nov 2025 00:00:00 +0000

Introduction

On November 26, Logan Kilpatrick from Google DeepMind engaged in an in-depth conversation with Google CEO Sundar Pichai. They discussed the release of Gemini 3 and Nano Banana Pro, as well as Google’s overall momentum in AI development. Pichai highlighted Google’s long-term investments in infrastructure and the rise of Vibe Coding, sharing his outlook on the future of quantum computing.

AI-First Strategy

Pichai revealed that Google established its AI-first strategy back in 2016, driven by breakthroughs from Google Brain, the introduction of DeepMind, and the success of AlphaGo. This long-term perspective has allowed Google to achieve an innovation stacking effect at every layer of its tech stack, from infrastructure optimization to model training and product application, creating a complete technological loop. He emphasized that Google’s full-stack innovation from chips to applications is generating a multiplicative effect, with Gemini becoming the core link across all product lines.

Nano Banana Pro and Vibe Coding

The recently launched Nano Banana Pro has sparked enthusiastic responses in the market, showcasing users’ incredible creativity through infographics created with the model. Pichai believes this reveals the latent creativity within people, with Google providing tools for more individuals to express their ideas. He specifically mentioned the phenomenon of “Vibe Coding,” where AI is lowering the barriers to programming, enabling non-technical individuals to create applications, akin to the impact of blogs and YouTube on writing and video creation.

Quantum Computing Outlook

Additionally, Pichai discussed advancements in quantum computing, expressing optimism that in about five years, we will be incredibly excited about quantum technology, similar to our current feelings about AI.

Key Points from the Conversation

The 2012 Google Brain’s “cat paper” achieved breakthroughs in image classification; in 2014, Google DeepMind was introduced; and January 2016 marked the moment of AlphaGo. In May 2016, Google announced its first TPU, marking a pivotal moment for the company as it transitioned to an AI-first approach.
Adopting a full-stack approach means that innovations at every layer permeate the entire system, creating a multiplier effect.
Gemini serves as a tangible link connecting all of Google’s products, from Cloud to Waymo to Search.
Nano Banana Pro has crossed the chasm, particularly in infographics and fact-checking in conjunction with Google Search, aligning with Google’s mission to organize the world’s information and make it universally accessible.
Developers are excited about Flash, as it enables them to serve more people. Pichai believes 3.0 Flash will be an excellent model, possibly the best yet.
Quantum computing is a fantastic prediction, with the potential for excitement akin to today’s AI in about five years. By 2027, we might deploy TPUs in space.
Vibe Coding resembles the rise of the internet, where more people became writers and creators, and this change is palpable even within Google.

Full Conversation

Logan Kilpatrick: Hello everyone, welcome back to Release Notes. I’m Logan Kilpatrick from the Google DeepMind team. Today, we have Sundar Pichai, CEO of Google and Alphabet. We’re in Mountain View. Gemini 3 has been released, and Nano Banana Pro is out with very positive feedback. So, would you like to summarize this moment of progress for us? Now we not only have top models like Gemini and Nano Banana Pro but also Vo and other music models, blossoming across the board. It feels like the longer we wait, the more emerges. So, I don’t know, do you want to paint a picture of this moment for us?

Sundar Pichai: First of all, it’s great to be here. I want to say this is an extraordinary week. You know, when you’re doing R&D internally, you envision the moment when you can truly showcase all the results. There’s nothing more exciting than that moment when you’re committed to a product. This week reflects that.

But I think this is built on years of foundation and all our deep investments. It’s been clear to me that you can see the speed at which we are making progress. All the seeds have come together, and it’s indeed very special. Just in the past few weeks, I was reflecting that we seem to be releasing new things almost every day. So, it’s a wonderful feeling.

Logan Kilpatrick: Indeed. I remember about a year and a half ago I was chatting with you, and I was complaining about something. I was definitely complaining about something, and you said something that pushed me to look at things from a long-term perspective. I’m curious about how you maintain that long-term view, especially in this highly competitive moment where it feels like an endless race to stay 1% ahead on the leaderboard. Clearly, long-term vision is crucial.

Sundar Pichai: I’ve always forced myself to step back. The pace in our industry is fast, and you want to iterate quickly, and I really enjoy that. But being able to step back, lay out a long-term plan, and stay focused on a long-term goal is always crucial.

In 2016, I wanted the entire company to be AI-first. A large part of what facilitated that moment was the 2012 Google Brain breakthrough, the introduction of Google DeepMind in 2014, and the moment of AlphaGo in January 2016. Then people noticed—many did in May 2016—when we announced our first TPU.

Yes, so in 2016, seeing all this, I was clear we were going to experience another platform shift. That was the bet on the full-stack approach, positioning Google as an AI-first company. Since then, we’ve made significant progress. There have been too many breakthroughs from Google, including Transformer. We’ve applied it to our products, like Bert and MUM, improving search, launching Google Photos, etc.

With the advent of generative AI, I realized the window of opportunity was even larger. People are ready to use this technology at scale, whether consumers, developers, etc. So how do we respond to such a moment? For us, AI means we initiated the Gemini project, spanning Google Brain and Google DeepMind. As part of that, we decided to merge the teams into Google DeepMind, significantly increasing our investments in infrastructure, data centers, TPU, GPU, etc.

Next, you know you need to get the company into a faster rhythm, right? Now that you have the technology, once the GDM team starts releasing Gemini, you can discuss the series of milestones we’ve experienced with Gemini. I’m glad you’ve played a role in many aspects of this journey. Now, how do you ensure it manifests in all our products? Many products touch billions of users, right? How do you iterate search with the capabilities these models can achieve? That’s our journey. But you know, you can step back to understand this framework. It’s very exciting because this is the first time that when you adopt a full-stack approach, every layer of innovation permeates the entire system to the top.

Logan Kilpatrick: That’s how I explain pre-training. DeepMind’s pre-training works remarkably well on Gemma. My model, like post-training, reinforcement learning acts as an accelerator for underlying capabilities. I feel our infrastructure is similar.

Sundar Pichai: Absolutely right. You improve the infrastructure, optimizing model performance in training, testing, and computation. Where do we improve the model? Or how do you acquire those capabilities and reflect them in products, right? How does Nano Banana appear in your products? Generative UI with AI mode in search, right? So, you improve at every level, not to mention providing these improvements to developers, allowing them to innovate on top of that, right? That’s where the multiplicative effect comes from. And all of this is always incredibly exciting. Watching all of this is always thrilling.

But you know, we’ve always had a long-term plan, thinking about how to achieve our goals. Some aspects take time because we adopt a full-stack approach. When we needed to respond to the challenges posed by the AI era, I don’t know, our capabilities were indeed insufficient at that time. So we needed to invest to scale up, ensuring all aspects reach a certain scale to guarantee fixed costs. Therefore, if you stand from an external perspective, you might feel we are progressing slowly or lagging behind. But in reality, we’ve been building all the necessary frameworks and then advancing execution on that basis. Now we have succeeded, and you can see various teams are moving forward rapidly.

Logan Kilpatrick: Yes, seeing all this is incredible. You mentioned Gemini appearing in all our products. I think I was discussing this anchor point with Josh and Tulsi regarding the challenge—I feel some of the challenges facing some of these releases now are synchronizing releases, and it may not even be from a product perspective but from a capacity perspective, and how to ensure models are well presented across all different product experiences. I feel this introduces a new… we almost… I’ve commented on similar things; we’ve nailed the model itself, and there’s clearly more work to do, but deploying them across all Google product interfaces is extremely difficult, which reminds me of this. I had an insight at this year’s I/O, and I want to confirm with you because maybe you have a different view, but historically, apart from your Gmail or your Google account, there seems to be no such thread that connects the entire suite of products Google has, from Cloud to Waymo to Search to all other products like Gmail. And now, it feels like Gemini is that thread, genuinely connecting every one of our products. It feels like something magical is happening. I don’t know your reaction to that.

Sundar Pichai: I think it is Gemini. I know your perspective is great. But for me, Gemini signifies much more. It clearly embodies the essence of the AI-first strategy. Indeed, because now we have tangible, understandable products like Gemini. And you’re right; Gemini can enhance services across the board, from search, YouTube, cloud services to Vemo.

Regarding the release of Gemini 3, one thing I love is that you mentioned synchronized releases. We’ve switched many products in synchrony. But for me, seeing on X, it could be Copilot or Replied or Figma, you know, everyone gathering together, synchronously releasing. Yes, indeed. For me, that’s scalable innovation, right? Not just us, but other companies in the world. Seeing all this is remarkable.

Logan Kilpatrick: Yeah, awesome. Looking at other posts, and that Nano Banana Pro moment, I’m sure you spent quite a bit of time studying this model. People are going crazy for it; it’s fantastic. I must continue…

Sundar Pichai: I can’t help but ask, are we raising the world’s productivity, or merely satisfying entertainment needs? Is this net progress or not? Those infographics look amazing. I believe when we move beyond the entertainment phase, I just saw Ajrim on X posting his core weaving analysis infographic. So, you know, it prompts me to study it closely. Years ago, a problem brought by PowerPoint was that people kept making more and more slides, and I used to collect a lot of information, and the amount of information kept growing. Perhaps with Nano Banana Pro, we return to a tool that can compress information and present it to the world in a more understandable way.

Logan Kilpatrick: Yes, that’s precisely what I was going to say. Historically, I’ve personally been skeptical about how much use many generative media models have for the world. Clearly, from an entertainment perspective, it’s useful, but it feels like Nano Banana Pro has crossed the chasm, especially in infographics and fact-checking in conjunction with Google Search, effectively aligning with what I believe I can clearly see how this becomes part of Google’s mission—to organize the world’s information and make it universally accessible—through those infographics. It shocked me; it was so interesting. I think it’s a good reminder that we will see those use cases. I remember when we created some content, the Nano Banana team mentioned they didn’t even deliberately try to make the infographics look good; it just naturally happened as the model became very powerful, and the text rendering capabilities significantly improved.

Sundar Pichai: That’s interesting. Another thing it shows me is how much latent creativity exists in the world. What we’re witnessing is another wonderful thing: I believe people will express themselves, and we’re providing tools for them to realize their ideas as they envision them, right? So I think otherwise, we’ve been limited by the tools in front of people. You might not realize it, but we’re creating increasingly expressive tools, and they are becoming easier to use for more and more people. So seeing all this, you know, is also incredibly exciting.

Logan Kilpatrick: I have another question about this aspect, but one of the… I have to credit Tulsi. Tulsi suggested I ask you this question while talking to you because she was curious about how you measure the success of these moments when you see these releases and significant moments for Google happening. Is it the online feedback? Is it how the first-day adoption looks? Or how do you measure whether this truly brings change to Google?

Sundar Pichai: You see, on release day, I’m quite active, trying to understand what’s working. I’m looking for feedback, for example, I’m trying to see how ordinary users are experiencing the product on X. I might reply to people saying, look, that’s a reasonable point; we should address it. So in a sense, I assess this by observing these things. I’m very clear that internal teams are also using Gemini itself to collect and organize information. We have great dashboards. So I try to synthesize information from various sources. One of them is I need to feel it firsthand, right? So I receive reports, but I’m also trying to understand how people are using it outside, what they are posting, right? And I think that’s important. I walk up to some people or look at those big screens showing multiple dashboards, checking QPS, understanding usage, worrying about capacity issues. But all of this gives you a real sense of what people are doing and saying. That’s my way. It’s a combination of online monitoring, talking to people, walking around, and sitting down to engage with people. I want to understand, especially on the first day, it really helps me understand what’s working and what isn’t.

Logan Kilpatrick: I feel like I can also sense the excitement in the atmosphere in the office right now.

Sundar Pichai: Everywhere I go, I see some version of banana, some…

Logan Kilpatrick: I don’t know who did it, but Kudos to the events or facilities team; somehow they got a hundred thousand bananas into this building and made it happen. What’s exciting is that this is just the first chapter or page of the Gemini 3 saga. We haven’t launched Flash yet. We don’t have any other 3.0 category models yet. We released Gemini 2.5 Pro. In fact, when I look at a bunch of benchmarks, even 2.5 Pro isn’t leading in every aspect. Clearly, competitors have caught up. But even 2.5 Pro is still the best in class, with a lot of capabilities, and has made strides on top of 2.5 with Ro.

Sundar Pichai: 2.5 Pro at Google I/O. You can feel it’s a huge leap. One thing that makes me feel good is that Demis’s team and the GDM team are maintaining a good rhythm, right? So we push forward on this about every six months. And it’s becoming increasingly difficult, right? Because you’re… indeed, 2.5 Pro is a very good model. So to make a significant, meaningful leap from that, I think it’s challenging.

But that’s what makes the progress exciting. I know you are always excited about Flash; it’s in development and coming soon. Developers are excited about Flash because it allows you to serve more people. In the pursuit of the frontier, this is indeed important. So I’m excited about 3.0 Flash. I think it will be an excellent model. It may be the best one we’ve had so far, right? We will see what our internal pre-training team has in mind for the next version.

So this culture of continuous innovation and release makes this moment special, and it absolutely feels like as we enter 2026, with our comprehensive progress across all layers of the tech stack, there will be many exciting advancements.

Logan Kilpatrick: Do you have any strange or interesting release day rituals? Or is it just about getting through the day?

Sundar Pichai: Well, usually, my morning habit is a bit sad; I pick up my phone to understand what’s happening in the world. In fact, I don’t even check Google emails because my thought is that if something interesting about Google happens, it will be in the news. So I try to step back and absorb the news. That’s my approach. So largely, the ritual becomes using our products. When we appear in the news, I try to understand the questions you mentioned about how it works. So that’s my main habit. On release days, I try to keep the schedule less structured so I can spend time walking around to those teams dedicated to the product, seeing them, and understanding how they feel about what they released. That interaction is very important to me. So yes.

Logan Kilpatrick: I have another interesting observation about this. I think Dennis and others might have talked about this internally, but in the grading canopy office, there’s a micro-kitchen where a lot of DeepMind activities happen. Every time I’m there, it makes me feel… clearly, Google is vast, global, and all these things are happening. But that blue MK makes Google feel small and intimate. I’m curious if there’s anything interesting about that, like how you… it feels small and intimate, and I’m curious how you…

Sundar Pichai: Oh, that reminds me of early Google. Clearly, you know, I often went there. Perhaps, you know, Sergey was there, and there were people like Min, Jeff, and Sanjay, Parr program still, they were all making their espresso. How can you feel the culture more than watching people make espresso there? I would never dare to make espresso there. I know a lot about how to make good coffee, but I feel a bit shy among that group. But, you know, just last week, Demis and Oriol were still walking around there, you know, it’s talent dense, and people are constantly exchanging ideas. Visitors come. The exchange of thoughts is very active. So I like that, you know, it reminds me of what the company looked like in its early days. Some of our service teams, like Emma and others, are also there. You know, when I mention I want to check QPS, that might be the place I go, you know, I wander in front of these people’s screens trying to understand what’s happening. So that’s definitely part of what I love about how the company operates.

Logan Kilpatrick: Yes, my Google feature request is that we need to somehow recreate something like MK in all PAs. I don’t know how to achieve that, but…

Sundar Pichai: You know, other teams have similar versions too. I think it really helps pull people back to the office because when you’re there, you realize the value of exchanging ideas. You can still return to your place of work and have focused time, but you know, that moment is really helpful. I think so.

Logan Kilpatrick: So far, much of what you’ve talked about regarding AI seems like we’re making these very long-term investments and laying the groundwork for the company’s success, such as Cloud doing well, Waymo doing well, and hopes for quantum computing as well. Just announced a bunch of other… quantum computing things are beyond my understanding, but I’ve been trying to understand it through the “Vibe Coding” experience…

Sundar Pichai: That’s one of the ways to test whether Gemini 3 can help understand these topics more deeply, which is appealing.

Logan Kilpatrick: Yes, it can bring everything to life, and I’m satisfied with what we want to express with that slogan. But how do you view infrastructure building for the next decade? Or have we already realized that AI is key, so now all hopes are pinned on it? I’m curious how you view the development prospects for the next decade and what areas we should invest in now to prepare for the next phase of success.

Sundar Pichai: Oh, you see, I think this has always been important, right? You know, ten years ago, we bet on AI, and we invested deeply and in a full-stack manner. We bet on building other large new businesses to diversify the company, investing in YouTube, investing in cloud computing. Google is regarded as a cloud-native company you can’t imagine, but we didn’t fully offer this service externally at that time. So that was a deep, large-scale investment in cloud computing. You know, Waymo, these take time. Waymo is a long-term investment. I think the turning point we see now is far beyond that.

There will always be people predicting the future, right? Quantum computing is a fantastic prediction. I believe that in about five years, we will be as excited about quantum computing as we are about AI today. But I’ve been thinking about this timeframe. For example, two weeks ago, we announced the “Project Sun Catcher,” where we will build data centers in space. Clearly, this is like the moon landing project. Now it seems some ideas are indeed crazy. But you know, when you really calm down and envision how much computing resources we will need in the future, everything becomes reasonable; it’s just a matter of time. So how can progress be made? You need to work backward, set 27 milestones, and then push forward step by step. So by 2027, we might deploy some TPUs in space. Oh, maybe we will encounter a Tesla sports car flying in space. Just thinking about it is interesting.

But this is an example of the long-term projects you want to undertake and implement, along with projects like AlphaFold and exciting work in robotics like Wing. So, you know, looking long-term, continually making progress.

Logan Kilpatrick: When I see TPUs going to space, I contacted Thomas and said we should put Gemini on a lunar rover and let it explore the moon. That would be a great marketing campaign, even if it’s not super useful scientifically. So…

Sundar Pichai: Who knows, maybe the product has already done something somewhere.

Logan Kilpatrick: I believe it has. This… you mentioned this thread earlier, that the continuous enhancement of capabilities equates to raising the baseline threshold of creativity for everyone. I personally feel that way. I feel like I’m not inherently super creatively artistic. However, now I can handle tasks that historically required creativity to accomplish, which empowers me. I feel like I’ve actually become more creative, and even the way I view the world has changed because of these tools, and I’m no longer worried about not being able to do something.

I think “Vibe Coding” is a huge example. This is a key moment; this power—one of the most transformative forces in history, the ability to create software and code—is now accessible to more people. I’m curious about you; obviously, you sometimes engage in “Vibe Coding.” I’m curious how you view that moment when AI builders (not just traditional software engineers) can…

Sundar Pichai: Create things. What excites me now is that this is almost like the rise of the internet, where blogs suddenly emerged, and more people became writers, right? And YouTube, where more people became creators. Yes, you can feel this change in the programming field, even internally at Google, where the number of people submitting their first certifications has surged, right? And that’s precisely because of these tools; they make it all easier, right? You know, maybe you are a product marketer with an idea. How would you describe it in the past?

Now maybe you’re a bit like using “Vibe Coding” to get it out there and show it to others. So you can see this in action. I just spoke with a team member who doesn’t code but has been trying; he teaches his son Spanish verb conjugations and uses Gemini 3’s HML animation pages to describe the sun. You see, when you hear stories like that, and this person is a member of our communications team, right? So you can see how everyone is starting to get involved. So this is very promising. In my limited time, I’ve tried it too.

It’s almost like, you know, not just “Vibe Coding,” but these IDEs now make coding so enjoyable, right? Of course, I’m not dealing with large codebases where you have to ensure everything is correct, and security must be in place. So you know, those people should raise their opinions. But I do think I feel things are becoming more approachable. It’s exciting again. And the amazing thing is, it will only get better.

Now, whenever people talk to me about Waymo, I always like to say: remember, this is the worst time for Waymo’s driving technology, right? It will only get better. Similarly, for all these tools we are developing, you know, using Gemini 3 for “Vibe Coding” in AI Studio, you know, seeing it is both astonishing, and it’s also its worst state. Yes, indeed. Both are true. So in a sense, you will see many advancements in the future. So I think this is undoubtedly an exciting moment, and I can’t wait to see what people around the world create with it.

Logan Kilpatrick: Yes, fantastic. I think my last question is, what’s next? What can we expect? There are many cool things in the pipeline, but what’s the first thing that comes to mind?

Sundar Pichai: I think some people need to take a break. I hope the team, all of us, can take a little time off. But you see, I’m excited about the roadmap for Gemini. I’m excited about how it integrates into all our products. We are also releasing new things, right? I love Flow. I’ve been experiencing Flow notebook alum. You know, it has a passionately growing community, seeing journalists working on it, people using it for their doctoral research, really putting all the research in there, it’s amazing. So there’s more to come.

In-Depth Review: Cursor vs Claude Code for Product Managers

Sun, 16 Nov 2025 00:00:00 +0000

In-Depth Review: Cursor vs Claude Code for Product Managers

Recently, I have been using two AI programming tools simultaneously.

The result shocked me: Claude Code is three times better for product managers than Cursor!

As someone who transitioned from a traditional product manager role, I can confidently say this.

Why? Because I have truly tried both.

Testing Background: Why This Comparison

Case Study: I used both tools to create a complete user management system.

Analysis: I am a typical product manager with no coding skills. Therefore, the tests were conducted entirely from a product manager’s perspective.

Testing Dimensions:

Learning Curve (required technical knowledge)
Communication Efficiency (AI’s ability to understand requirements)
Code Quality (reliability of generated code)
Debugging Difficulty (ease of fixing errors)
Completion Speed (time from requirement to runnable product)

My Identity: Product manager with weak technical foundation, only able to write simple HTML.

Testing Project: A complete management system including user registration, login, data management, and permission control.

Goal: To see which tool is better suited for product managers to quickly realize their ideas.

✅ Summary: Real evaluations are meaningful from a product manager’s perspective.

Round 1: Learning Curve Comparison

Case Study: Difficulty of getting started with both tools from scratch.

Analysis: Cursor requires some technical understanding, while Claude Code is simpler for product managers.

Cursor’s Learning Curve:

Day 1:

Need to understand what an IDE is and what plugins are.
Must learn basic terminal commands.
Confused about what package.json is.

Day 3:

Starting to understand code structure.
Still unable to debug.
Stuck when encountering bugs.

Day 7:

Can use it basically, but not efficiently.
Frequently need to consult technical documentation.

Claude Code’s Learning Curve:

Day 1:

Simple installation, clear commands.
Can start interacting immediately.
Error messages are very clear.

Day 3:

Able to independently complete small features.
Can resolve issues by asking directly.

Day 7:

Proficient use, very efficient.
Able to handle complex requirements.

Conclusion: The learning cost of Claude Code is 60% lower than that of Cursor!

✅ Summary: Product managers’ time is precious; Claude Code is more user-friendly.

Round 2: Communication Efficiency Comparison

Case Study: Implementing the “User Permission Management” feature using both tools.

Analysis: The understanding of the product manager’s communication style varies significantly between the two AIs.

Cursor’s Communication:

My Requirement: “Create a permission management system where administrators can see all users, and regular users can only see themselves.”

Cursor’s Response: “Please provide specific technical implementation details, such as:

What permission framework to use?
How to design the database table structure?
How to configure the front-end routing?”

My Confusion: These are all technical details that I do not understand!

Claude Code’s Communication:

Same Requirement: “Create a permission management system where administrators can see all users, and regular users can only see themselves.”

Claude Code’s Response: “I will help you design a complete permission management system. Let me first confirm a few business logic points:

What types of user roles are there?
What pages can different roles access?
Is data permission control needed?”

My Feelings: This is exactly the conversational style we product managers are familiar with!

Efficiency Comparison:

Cursor: I need to spend an extra 2 hours learning technical terms.
Claude Code: Directly starts discussing business logic.

✅ Summary: Claude Code understands the language of product managers, while Cursor requires translation.

Round 3: Code Quality Comparison

Case Study: Comparing the user registration function code generated by both tools.

Analysis: Code quality directly affects subsequent maintenance and feature expansion.

Cursor Generated Code:

Advantages:

Code structure is standardized.
Performance optimization is well done.
Adheres to best practices.

Disadvantages:

Too complex for product managers.
Difficult to locate issues when errors occur.
High modification costs.

Actual Case: The generated registration page contains over 40 files, which product managers cannot understand.

Claude Code Generated Code:

Advantages:

Code is simple and easy to understand.
Error handling is very user-friendly.
Easy to modify.

Disadvantages:

Performance optimization is not as good as Cursor.
Code standardization is average.

Actual Case: The generated registration page only has 8 files, with a clear structure that product managers can understand.

Implications for Product Managers:

Cursor: Professional but complex, suitable for those with a technical background.
Claude Code: Simple enough, suitable for those with a pure product background.

✅ Summary: For product managers, code that is understandable is good code.

Round 4: Debugging Difficulty Comparison

Case Study: Comparing the ease of fixing issues after a feature malfunction.

Analysis: Product managers dread encountering bugs, as it often means needing to ask for help.

Cursor’s Debugging Experience:

Scenario: 404 error displayed after user login.

My Process:

Check the error log, full of technical terms I do not understand.
Ask Cursor, which says I need to check the routing configuration.
What is routing configuration? I need to learn that too.
After 2 hours of struggle, I still had to ask a technical colleague for help.

Pain Point: Encountering errors means going back to square one and still needing to ask for help.

Claude Code’s Debugging Experience:

Scenario: Same 404 error after login.

My Process:

Ask Claude Code: “What should I do if a 404 error appears after user login?”
Claude Code asks: “Where should it redirect after a successful login?”
I say: “It should redirect to the user center page.”
Claude Code: “Okay, I will fix this issue; it only requires changing one line of code.”
Two minutes later, the issue is resolved!

Feelings: It felt like talking to a tech-savvy colleague; the problem was solved quickly.

Efficiency Comparison:

Cursor: Requires 2 hours + asking for help.
Claude Code: Requires 2 minutes + self-resolution.

✅ Summary: Claude Code enables product managers to achieve true technical independence.

Round 5: Completion Speed Comparison

Case Study: Time statistics for completing a complete user management system using both tools.

Analysis: Comparing the total time from requirement to runnable product.

Cursor Project Timeline:

Environment Setup: 4 hours
Basic Learning: 8 hours
Feature Development: 16 hours
Debugging and Fixing: 6 hours
Optimization and Refinement: 4 hours
Total: 38 hours

Claude Code Project Timeline:

Environment Setup: 1 hour
Basic Learning: 2 hours
Feature Development: 8 hours
Debugging and Fixing: 2 hours
Optimization and Refinement: 1 hour
Total: 14 hours

Conclusion: Claude Code is 63% faster than Cursor!

More importantly: During the process with Cursor, I often wanted to give up, while the experience with Claude Code was smooth throughout.

✅ Summary: Time is the most valuable resource for product managers.

In-Depth Analysis: Why Claude Code is More Suitable for Product Managers

Case Study: Analyzing the design philosophy differences between the two tools.

Analysis: The core difference lies in the completely different target user positioning.

Cursor’s Design Philosophy:

Target Users: Developers with programming backgrounds
Core Advantages: High code quality, good performance
Usage Threshold: Requires technical background
Learning Curve: Steep but yields high returns

Claude Code’s Design Philosophy:

Target Users: Anyone needing programming
Core Advantages: Natural communication, easy to get started
Usage Threshold: Only requires the ability to communicate
Learning Curve: Gentle and quickly effective

Insights for Product Managers:

Don’t pursue perfection: Code that is usable is sufficient; optimal solutions are not necessary.
Efficiency is more important than quality: Quickly validating ideas is more crucial than code standards.
Independent implementation is key: Being able to do it yourself means you don’t have to wait for others.

My Recommendations:

If you have a technical background, consider Cursor.
If you come from a pure product background, Claude Code is the better choice.

✅ Summary: What suits you best is what is best.

Practical Advice: How Product Managers Should Choose

Case Study: Providing specific recommendations based on different product manager backgrounds.

Analysis: There is no absolute good or bad, only suitability.

Situations to Choose Claude Code:

Suitable Groups:

Pure product background with weak technical foundation
Wanting to quickly validate product ideas
Hoping to independently complete product prototypes
Time-constrained and needing quick results

Usage Recommendations:

Start practicing with simple features.

Cursor Composer: A Revolutionary AI Coding Assistant

Wed, 12 Nov 2025 00:00:00 +0000

When coding, do you find AI assistants either too slow to keep your flow or not smart enough to produce quality code? Cursor’s newly released Composer model breaks this dilemma by leveraging reinforcement learning (RL) technology to achieve a peak in both intelligence and speed—boasting a programming efficiency four times that of models with equivalent intelligence, while precisely adapting to real codebase standards.

Have you ever wondered why AI programming assistants often feel “almost there”? They are either smart but frustratingly slow, or quick but produce code that just doesn’t feel right. This contradiction troubled me until I saw Cursor’s AI researcher Sasha Rush share insights at Ray Summit 2025. They introduced a new model called Cursor Composer, which solves this problem with a completely different approach: training an AI agent that is both smart and fast through reinforcement learning (RL).

After listening to the entire presentation, my biggest takeaway was that this is not just a technical advancement but a shift in mindset. The Cursor team is not chasing generic benchmark scores but focusing on solving real-world programming issues. They use reinforcement learning to train the model in real codebase environments, allowing it to understand coding standards, learn to use various tools, and know when to execute tasks in parallel. More importantly, they integrated the entire product infrastructure into the training process, allowing the AI to behave like a real user using Cursor during training. This “training as product” philosophy made me rethink how AI tools should be built.

Why We Need a Fast and Smart Programming AI

Sasha Rush opened the presentation by noting that Cursor Composer performs almost on par with the best Frontier models in their internal benchmarks, outperforming all models released last summer. Its performance is significantly better than the best open-source models and those marketed as “fast.” What’s truly impressive is that this model’s token generation efficiency is four times that of models with equivalent intelligence. This means it is not only smart but incredibly fast, even outpacing products specifically designed for rapid coding.

I have always believed that the “speed” of AI tools is not just a technical metric but a core aspect of user experience. Imagine you’re coding and suddenly need to refactor a complex function. If the AI assistant takes 30 seconds to provide suggestions, that’s enough time to break your concentration. However, if the AI can respond in 2 seconds, you can maintain your flow and stay immersed in coding. This “speed that doesn’t interrupt your thought process” is the real value.

The Cursor team understands this deeply. Their inspiration came from one of the most popular features in the Cursor application: Cursor Tab. It’s a fast, intelligent model that feels very smooth and enjoyable for users. Sasha Rush mentioned that making the model fast enough to support interactive use helps developers maintain their thought chain and stay in a workflow state. They aimed to build an agent model that offers a similar experience. They created a prototype model, codenamed Cheetah, specifically designed to provide a fast experience for agentic coding. After releasing this prototype, user feedback was overwhelmingly positive, with many saying it felt “completely different,” even like “alien technology.” This convinced them that building a smarter model while maintaining the same efficiency would lead to a revolutionary experience.

I particularly resonate with Sasha Rush’s point: they are not pursuing arbitrary benchmark scores but are focused on creating a model that feels good to use in real programming work. They built an internal benchmark from their own codebase to measure the model’s ability to work within large codebases and whether it adheres to the codebase’s standards. These intelligent factors are what truly matter in everyday software engineering. Many times, AI models score high in standard tests but perform mediocrely in real work scenarios because they are not optimized for actual workflows.

The Cursor team’s goals are dual: to be both intelligent and fast. “Fast” means not only efficiently generating tokens but also running very quickly in the editor. This requires the model to produce edits rapidly and utilize techniques like parallel tool calling to generate results quickly. When you combine these two objectives, you get a model that feels entirely different in practice. In demonstration videos, users submit a query and immediately see the model calling multiple tools, executing terminal commands, searching in the codebase, making edits, and writing to-do items, all culminating in a complete edit and summary of code changes in just one or two seconds. This experience is completely different from typical editor agents used daily.

Agent RL: Making AI Work Like Real Developers

Sasha Rush spent considerable time explaining how they use agent RL (agent reinforcement learning) to train Composer. I found this part particularly enlightening as it reveals the mindset required to build genuinely useful AI tools.

From the user perspective, the workflow with Cursor is straightforward: users submit a query to the Cursor backend, and the agent reads the query and performs a series of tool calls. Sasha Rush explained that we can primarily understand the agent as interacting within a “tool space.” It can choose from a set of tools that can alter the user’s code. In reality, Cursor uses about 10 tools, but we can simplify this to include reading files, editing files, searching the codebase, collecting lints, and executing terminal commands. The agent can call these tools serially or in parallel if it believes that will yield better results.

At its core, this agent is still just a large language model generating tokens. Some of these tokens can be understood as forming XML patterns that enable it to call tools and their parameters. However, from a reinforcement learning perspective, we can primarily understand it as taking actions in the combination space of tool calls. When you look at Cursor’s frontend, what you see in these rollouts is the process of combining all different tool calls to make changes. For reading operations, the frontend simply summarizes them; for editing, you see the entire change in real-time; and for terminal calls, you see both the tool calls and the terminal outputs. This is essentially how the agent acts in your IDE world.

What I find most interesting is how they conduct reinforcement learning training. Sasha Rush emphasized that they strive to simulate the way Cursor operates in production as closely as possible. This means they treat training data as user queries sent to the model, and the agent calls a series of tools to attempt to achieve the goal. However, the difference with reinforcement learning is that they perform many different rollouts from the same starting point. You can think of this as running many instances of Cursor in parallel. In rollout 1, the model might read a file and then edit it. But in rollout 2, due to the probabilistic nature of LLMs, it might follow a different sequence of tools and paths. They then score the outputs of these two choices to determine that rollout 2 is better than rollout 1, and update the model parameters based on this change.

It sounds simple, right? But Sasha Rush noted that all the interesting challenges arise from how to scale this basic process to the extreme, and each step of the scaling process presents its own challenges. This reminds me that often the core ideas of technology may be simple, but the real difficulty lies in how to execute them to the fullest and make them practically applicable.

Three Major Challenges: Matching Training and Inference, Long Rollouts, and Consistency

Sasha Rush elaborated on three core challenges encountered in this agent-style reinforcement learning. I find these challenges highly representative; they apply not only to programming AI but also to nearly all scenarios that require training AI agents in real environments.

The first challenge is matching training and inference. They need to train a mixture of experts language model for optimal parallel performance, which requires distributed training across thousands of GPUs. If you’re just doing pre-training or supervised fine-tuning, that’s already challenging enough, but it’s doubly difficult when doing reinforcement learning because you must have both a training version and a sampling version that must work in sync. I believe this challenge reveals a deeper issue: the model used in real products and the model used in training must maintain a high degree of consistency in architecture, behavior, and performance; otherwise, what is trained may not work at all in production.

The second challenge is long rollouts. When they train with real coding changes, rollouts are much more complicated than those demonstrated. In modern models, rollouts use 100,000 to 1,000,000 tokens and involve hundreds of different tool calls throughout the process. Complicating matters further, different rollouts may involve varying numbers of tool calls, potentially requiring very different amounts of time. This reminds me that real-world tasks are often much more complex than we imagine. A seemingly simple request like “refactor this function” may require the AI to read a dozen related files, search for usage examples in the codebase, run tests, check lints, and only then make the correct modifications. If training only uses simple toy examples, the model will never learn to handle such complexity.

The third challenge is consistency. What they are doing is essentially “training through product production.” They have a Cursor agent and want to simulate it as closely as possible in reinforcement learning. This means they want to use the exact same tool formats and responses as in the production product but on a larger scale. This challenge is particularly interesting because it breaks the boundaries of traditional machine learning. Typically, we separate training environments from production environments, but the Cursor team chose to keep them as consistent as possible. The benefit of this approach is that every technique and tool usage learned during training can directly transfer to the real product.

Sasha Rush emphasized that all three of these issues reflect challenges in scaling machine learning systems, but the actual solutions to these challenges are infrastructure choices. I completely agree with this viewpoint. Often, we view machine learning as purely algorithmic and mathematical problems, but in reality, whether an idea can be turned into a genuinely useful product often depends on how robust and flexible your infrastructure is.

Infrastructure: The Key to Making the Impossible Possible

Sasha Rush spent a lot of time discussing their infrastructure architecture, which I find very worthwhile to understand in depth, as it demonstrates what is needed to build genuinely scalable AI systems.

At a high level, they have three different servers: the trainer, inference server, and environment server. The trainer primarily uses PyTorch and resembles a standard machine learning stack scaled to a very large size. The inference server mainly uses Ray to orchestrate rollouts. The environment server uses microVMs to launch stateful versions of these environments, allowing them to make file changes, run terminal commands, and execute linters. You can think of this as running a mini version of Cursor. These three parts need to interact with each other to form a complete training loop.

Regarding the trainer, they made a very interesting optimization: they developed a custom kernel library that supports low-precision training. Low-precision training speeds up the training process and allows them to run sampling efficiently without requiring any post-training quantization. They use a microscaling format called MXFP8. The idea is that they can work with FP8 precision but utilize an additional scaling factor to achieve better precision and higher quality training. Sasha Rush mentioned that they developed a custom kernel using this microscaling format for the latest NVIDIA architectures, providing a 3.5x speedup on Blackwell chips for the mixture of experts layer.

I believe this focus on low-level optimization is crucial. Many AI teams might be satisfied with using off-the-shelf training frameworks and standard precision, but the Cursor team chose to dive deep into kernel-level optimizations. This investment not only brought significant speed improvements but also enabled them to train larger, more complex models while maintaining efficiency in both training and inference. This “refusal to settle” attitude is, in my opinion, a common trait of top teams.

The inference server faces the primary challenge of stragglers (processes that lag behind). If you don’t think through this process and just let the agent do its thing, you will encounter issues. This is because rollouts may call terminal commands and install entire libraries; they can do whatever they want. So if you run 10 rollouts, they may return at different times. They addressed this issue by using Ray and a single controller interface, allowing them to balance the load across many different threads and processes, making this part of the process efficient.

I find this issue particularly illustrative of the complexities of real-world AI systems. Ideally, all rollouts should take about the same amount of time, but in reality, they can vary widely. Some may complete by reading just a few files, while others may require running complex build processes. If you cannot effectively manage this heterogeneity, the entire training process will be dragged down by the slowest rollout, leading to wasted resources and inefficiencies.

Perfect Integration with Production Environment: The Philosophy of Training as Product

One point that Sasha Rush emphasized left a strong impression on me: their goal is to train through the production of the Cursor product. One interesting aspect of Cursor is that they can simultaneously design the product itself and the machine learning training. Fortunately, during the process of building the reinforcement learning stack, Cursor released a product called cloud agents. This allows offline use of the agent, and Sasha Rush mentioned he often uses it to check model performance while commuting on the subway. As part of this product, they launch virtual machines of user environments, allowing the agent to change code and execute terminal commands. They can use the same infrastructure for reinforcement learning training.

This means they have a production agent server that is identical when running the cloud agent and during reinforcement learning training. I think this is a very clever design decision. Many companies completely separate training environments from production environments, leading to models trained that do not perform as expected in real products. But Cursor chose to keep them entirely consistent, so the model learns how to perform better in real products during training.

Of course, this also brings challenges. The workload during peak reinforcement learning training can be much more bursty than running a standard product. So they must handle this burstiness when launching many environments for training, ensuring the product runs smoothly. Sasha Rush showcased a dashboard they built with Composer that displays backend utilization. I find this detail interesting as it shows they have begun using the tools they built to improve their workflows.

You might wonder why it’s worth spending so much time actually using the real production environment. They could simulate all these different structures or attempt to mimic how it works. But Sasha Rush provided a compelling reason: they can introduce specific tools they believe are very valuable for the agent. One of these is that they trained their own embedding model for powerful semantic search. When you use Cursor, it indexes all your files, allowing the agent to query in natural language to find files it might want to edit.

They found that this semantic search capability is beneficial for all the different agents used in Cursor but particularly advantageous for Composer. This is because they can train the model as an advanced user of this tool using exactly the same model and structure as in production. This realization made me understand that AI tools not only need to be smart but also need to know how to effectively use the tools available to them. Just as a great developer not only understands programming languages but also knows how to use IDEs, debuggers, version control systems, etc., a great AI agent also needs to learn how to fully utilize its toolbox.

Performance of Composer One Week After Release: RL Really Works

Sasha Rush shared some observations from the first week after Composer’s release, which deepened my understanding of the potential of reinforcement learning.

The primary evidence that convinced them of the effectiveness of reinforcement learning is the improvement in model performance as they ran increasingly longer rollout-check-update cycles. The model’s initial performance was roughly on par with the best open-source models in the field, but as training progressed, its performance on benchmarks steadily improved. The x-axis of this graph is a logarithmic scale of computational volume, so they invested significant computation in the reinforcement learning process. But they saw returns associated with this computation, with model performance rising to the level of their released version.

I believe this is a very good signal of the scalability of reinforcement learning, particularly its ability to scale to complex specialized tasks. Many people question whether reinforcement learning can work on complex real-world tasks, but Cursor’s experience shows that with sufficient computational resources and the right infrastructure, reinforcement learning can indeed bring models to the forefront in specific domains.

They also found that they could train the model to act in ways they deemed useful from a product perspective. Sasha Rush previously mentioned that they wanted the model to be fast not only in generating tokens but also in the end-to-end user experience. One key component of this is enabling the model to call parallel tools. As training progressed, the model was able to call more parallel tools and respond to user queries faster. They believe they can further advance this in future training.

I find this discovery particularly valuable because it indicates that reinforcement learning can not only enhance the model’s “intelligence” but also shape its behavioral patterns. Through appropriate reward design, you can teach the model to work more efficiently, such as parallelizing tasks and prioritizing critical steps. This behavioral optimization is challenging to achieve with traditional supervised learning.

They also found that the model learned better agent behaviors. Initially, it made too many edits without sufficient evidence. As training progressed, the model began to read more files and conduct more searches to find the correct editing locations and make appropriate changes. This reminds me that good programming is not just about writing code; it’s more about understanding context, finding the right places, and making reasonable decisions. Composer learned these “soft skills” through reinforcement learning.

Perhaps most importantly, users seem to love it. They released Composer a week ago, and the primary feedback is that the combination of speed and intelligence unlocks a different way of programming. People are no longer starting an agent and then scrolling through Twitter while waiting for results; they are quickly getting results and moving on to the next question. As a programmer and developer, this is genuinely exciting. Sasha Rush noted that many internal developers are now using it in their daily work. I believe this is the best validation of a product: when the people building the tools are using it every day.

My Thoughts on Building Specialized AI Models

After listening to Sasha Rush’s presentation, I have several profound insights to share.

First, I believe that reinforcement learning is indeed very suitable for building such specialized models. This is a paradigm shift we have seen in the development of large language models over the past few years. Reinforcement learning facilitates the ability to build highly intelligent target models in specific customized domains. In the past, we always pursued general models that could do everything, but Cursor’s experience suggests that models deeply optimized for specific tasks may outperform general models in those tasks. This makes me think that perhaps in the future, we will see more of these specialized models: ones dedicated to data analysis, front-end development, system architecture, each excelling in its own field.

Another aspect that fascinates me is how AI systems have changed the process of research and development itself. Sasha Rush mentioned that he and many in the team now have their daily work assisted by the same agents they are building. They use these agents to build dashboards, backend systems, and various other components. This allows them to act quickly with a small team. I find this a very interesting bootstrap process: the AI tools you build not only serve users but also serve you, enabling you to improve this tool more rapidly. This positive feedback loop may accelerate the evolution of AI tools.

Finally, while Sasha Rush mentioned that he is not fundamentally an infrastructure expert, seeing how much reinforcement learning is driven by infrastructure development was an eye-opener for him. It is indeed challenging, requiring the integration of product, scale, and machine learning training. It touches on all aspects of modern software systems. I completely agree with this observation. In my view, future AI companies will need not only excellent machine learning researchers but also world-class infrastructure engineers. Companies that can successfully combine both will hold a significant competitive advantage.

From a broader perspective, the story of Cursor Composer made me rethink how AI tools should be built. The traditional approach is to first train a general model and then adapt it to specific tasks through fine-tuning or prompt engineering. However, Cursor took a completely different path: designing the entire system from the ground up for a specific task (programming), including model architecture, training methods, infrastructure, and product integration. This end-to-end thinking is, I believe, the correct way to build genuinely useful AI tools.

I am also contemplating the limitations of this approach. Reinforcement learning requires substantial computational resources, complex infrastructure, and tight integration of product and training. This means not every company can adopt this method. But for those with the resources and determination, this may be the best path to creating industry-leading AI products. Cursor has already proven that this path is viable, and I believe we will see more companies following suit.

Another question worth pondering is what the future of these specialized models will look like. Cursor Composer focuses on programming, but can the same approach be applied to other fields? For instance, models specifically designed for data analysis, content creation, customer support, etc. I believe the answer is yes, but each field will require its own infrastructure, tool ecosystem, and training methods. This is not an easy task, but for those who can achieve it, the rewards will be substantial.

Finally, I want to say that the success of Cursor Composer reaffirms a principle: true innovation often does not come from following current trends but from deeply understanding user needs and relentlessly striving to meet those needs. The Cursor team was not misled by the narrative that “bigger models are better” but focused on solving the real pain points of developers: how to make AI programming assistants both smart and fast. They achieved this goal through reinforcement learning, custom infrastructure, product integration, and various other means, ultimately delivering a product that users genuinely enjoy using. This user-centered, problem-oriented mindset is something all product developers should learn from.

Inside Cursor: A New Paradigm in Software Development

Mon, 10 Nov 2025 00:00:00 +0000

Introduction

In an unassuming building in North Beach, San Francisco, a company is quietly changing the rules of software development.

Cursor, the most talked-about AI unicorn of the past year, started from scratch and achieved a $100 million ARR in less than two years. Its workforce expanded from a handful to nearly 250 employees, and its products are utilized by top developers worldwide, redefining the standards for “development tools.”

Tech writer Brie Wolfson initially visited Cursor to see what made the team different but soon found herself drawn in: notebooks, job descriptions, and Slack invites flooded in as Cursor wanted her to “tell their story.”

Cursor Office | Source: Colossus

Brie agreed, stating, “Having worked in the early stages of Stripe and Figma, I sensed the ‘magic’ in the air again at Cursor. If you’ve experienced that feeling, you know how addictive it is.”

In her view, no truly “epoch-making company” has emerged in this AI era, but Cursor appears to have that potential.

She was curious about the new company paradigm the leadership aimed to establish, how Cursor’s culture formed, and wanted to participate in shaping it.

Thus, her firsthand account, “Inside Cursor: Sixty Days with an AI Unicorn,” was born, capturing the genuine rhythm of Cursor, the creativity of its young team, and a unique culture driven by its mission.

Over two months, Brie validated some expectations but was also shocked by others, such as the fact that at Cursor, the 996 work schedule is genuinely voluntary.

Key Points:

When I asked the co-founder what his biggest concern was, he said, “If people start talking about the weather at the dinner table, then I should worry.”
At Cursor, everyone is HR; everyone is recruiting talent.
Cursor does not design for “fool-proofing” because there are no “fools” here.
When a task is assigned to someone, they have full responsibility and autonomy, regardless of their position.
Cursor employees are possibly the most immersed in their own product in the world.
At Cursor, critics are also problem solvers; criticism is participation.
As a product for AI programming, Cursor views users not as “customers” but as peers.
They treat every line of code as an attempt to sculpt the world, with the ensuing commercial success being just a reward.
The young workforce, formed in less than two years, has a spirit and demeanor that is “very mature.”

01

The Culture of Cursor

Strictly speaking, Cursor does not belong to Silicon Valley; its headquarters is located in North Beach, San Francisco, an area with few other startups.

The Cursor headquarters is as understated as a university cafeteria, with no logos at the entrance, no corporate posters on the walls, and few employees wearing Cursor T-shirts or stickers on their laptops.

The office mainly consists of people sitting at desks working or discussing in small groups.

Instead of whiteboards, blackboards hang on the walls, and the furniture consists of European antiques sourced from a retired tech enthusiast in the Bay Area. The walls are lined with books, many of which are textbooks, alongside numerous worn-out covers and spines, indicating they have been genuinely read.

In Brie’s words, “It’s not polished, but it’s sincere.”

Cursor employees enthusiastically brainstorm on blackboards | Source: Colossus

Cursor does not subscribe to the online work model; it is almost entirely face-to-face: 86% of employees are based at the San Francisco headquarters or the new office in New York. At Cursor, the most effective communication method is not sending Slack messages or scheduling meetings, but directly walking to someone’s desk and tapping them on the shoulder, as Cursor puts it: “We are more of an oral culture company.”

In fact, this emphasis on face-to-face office work has overturned Brie’s previous perceptions, and she had to admit that the fluidity of in-person work is much higher: “The offline chemistry is indeed addictive.”

Collaboration within the company mostly occurs in spontaneous discussions around blackboards or desks. However, Cursor is very restrained when it comes to meetings, scheduling very few, as they place immense importance on “deep work time.”

Even the chef at Cursor enjoys “high autonomy.”

Every day, the company chef Fausto prepares lunch for everyone, and all gather at a long table to eat together. Rumor has it that he once considered quitting because creating menus for a team that doubled in size daily was exhausting, but someone on the team created an AI menu generator to help him brainstorm, and now he shares recipes and takes orders via Slack.

Conversations at lunch and dinner tables often revolve around work ideas, allowing everyone to understand each other’s current projects, ideas they are pondering, or predictions about the future of products and industries.

When Brie asked co-founder Sualeh Asif what his biggest concern was, he thought for a moment and replied, “If people start talking about the weather at the dinner table, then I should worry.”

02

The “Hunter Culture” of Cursor

If Cursor’s culture is built on face-to-face interactions, its growth relies on “hunting talent.”

In Brie’s description, Cursor’s recruitment system completely defies convention: “They view the smallest unit of hiring as a person, not a position.”

In most companies, recruitment follows a process: identify capability gaps, write job descriptions, screen resumes, conduct interviews, extend offers, and wait for onboarding.

But at Cursor, the process resembles a social hunt. Someone throws a name into the Slack #hiring-ideas channel, noting, “This is a particularly impressive person,” and the entire team immediately begins the hunt.

They brainstorm to identify what the person excels at, what they enjoy doing, and what role would suit them best. If there’s mutual interest, that “candidate” might show up in the office by Monday, just like Brie did.

Once they identify a target, the team creates a new Slack group to collectively strategize how to approach the person.

Their discussions are highly detailed: “What is this person most passionate about?” “What are they a genius at?” “What challenges can Cursor offer them?”

Because at Cursor, the assumption is that “the best people love challenges.”

Cursor employees | Source: Colossus

Another well-known tactic within Cursor is to invite the person to “just come by the headquarters for a visit.” They seem very confident in Cursor’s office culture, believing that once the person steps inside and feels that energy, it’s hard not to be tempted. This has already been validated by Brie.

Their talent acquisition methods are also unique. For instance, Swedish engineer Eric Zakariasson was recruited because he had previously held a Cursor workshop in Stockholm; engineer Ian Huang joined because he was coding with Cursor until the early hours every night.

When other companies are laying off or new startups are dissolving, Cursor’s Slack channel often sees messages like, “New Computer has dissolved; let’s see if there’s anyone we want.”

This style resembles the early “PayPal Mafia,” where everyone is both a hunter and a referrer. Cursor employees are encouraged to “scout for talent,” making recruitment a company-wide initiative rather than just an HR responsibility.

As a result, Cursor’s size exploded within a year: from fewer than 20 people last year to nearly 250 today.

Even with this rapid growth, Cursor’s acceptance rate remains incredibly low. The leadership personally reviews every hiring decision, believing in the mantra, “Better to miss out than to hire the wrong person,” but those chosen by Cursor will find ways to join.

For example, a former designer from Stripe and Notion, Ryo Lu, is an Apple fan. Cursor impressed him by acquiring an early Macintosh to gift him; German engineer Lukas Möller declined the first invitation, prompting co-founder Oskar to fly to Germany a year later for a second visit; another candidate, Jordan MacDonald, had Cursor schedule six months of coffee meetings, and when they learned she had just moved, they secretly contacted her interior designer to gift her an espresso machine. All three are now official Cursor employees.

This hunter culture has also resulted in an exceptionally high talent density at Cursor, shaping all subsequent operations: high trust, high pace, and zero nonsense.

03

No Fools and “Young People”

At Cursor, the talent density is so high that it almost feels unreal. Brie used a lengthy equation to describe their success formula:

“Engaging mission + hardcore technical challenges + winning + excellent recruitment = extraordinary talent density.”

If you’ve been in Silicon Valley, you know this statement is not just hyperbole; “talent density” is almost a bible for every company, and Cursor has turned it into a belief.

Cursor has a staggering statistic: there are 50 former founders (one-fifth of the total workforce) within the company, and 40% hail from MIT, Harvard, Columbia, Carnegie Mellon, Stanford, Berkeley, Yale, etc. Yet no one boasts about their alma maters; as Brie puts it, “They are all experts, but no one shows off.”

Moreover, Cursor is the first job for many employees, which left Brie impressed with the age distribution at Cursor.

She previously thought that when people referred to a colleague as “too young,” they were either implying that the person was somewhat unreliable or that, while capable, they were difficult to communicate with.

However, the young people at Cursor are different; they dress appropriately, have sincere eyes, speak clearly, and are polite. During discussions, they often reference history, art, pop culture, Silicon Valley history, or experiences from other industries.

This is also one of Brie’s favorite aspects of Cursor: it has a spirit that is “very mature.”

The young workforce does not indulge in internet slang or meme culture, nor do they discuss trending topics or workplace gossip. Even non-work-related discussions in work groups revolve around local cultural events in San Francisco, critiques of AI opinions from The New Yorker, or sharing tips on “how to properly fold a sheet.”

Moreover, this group of young people exhibits stable emotions. In Slack, the most commonly used emoji among Cursor employees is ❤️.

Brie recounted an incident she witnessed: during a severe outage caused by a system failure, the culprit publicly apologized in the Slack #general channel, and the channel was flooded with ❤️, with comments like “Risks are inevitable; let’s do better next time.”

“No one raises their voice, no one loses their cool, and no one panics over mistakes,” Brie wrote. But this does not mean they are lax; their calmness stems from a shared belief in each other’s professionalism and dedication, so mistakes do not trigger internal strife but rather prompt improvement.

Cursor resembles a utopia that stands in contrast to Silicon Valley: a fast-paced company that maintains an almost Zen-like calm.

Many visitors remark, “Your company is so calm.” Employees respond, “That’s just the appearance; underneath, it’s like a duck gliding on water.”

This phrase encapsulates the atmosphere at Cursor: serene on the surface, with frantic paddling beneath.

The “maturity” of each employee is also reflected in their approach: they study the world through action rather than relying solely on personal experience to generate ideas.

For instance, many employees in Cursor’s Slack create their own “brainstorming channels” (#brain-XXX), where they share thoughts, inspirations, or observations, such as “Is CMS a relic of the pre-AI era?” “A long list of insights after a client visit,” or dissatisfaction with a new feature.

There are no KPIs or expectations for responses, but if you write something interesting and insightful, you will naturally attract a group of “readers.” This fosters an “open-source thinking culture” where everyone iterates their understanding publicly.

Brie also observed a steep staircase in the office without handrails. When she asked why, the response was, “Humans know how to climb stairs.”

This statement epitomizes Cursor’s talent philosophy: We do not design for “fools” because there are no “fools” here.

Cursor Office | Source: Colossus

With such intelligent and mature talent coming together, Cursor has created a “paradise for individual contributors.”

Individual contributors (ICs) are highly valued at Cursor and regarded as the highest status role. At Cursor, ICs are driven by passion rather than commands from leadership. The working style here is very “IC”: whoever is most invested in a task takes it on; when a task is assigned to someone, they have full responsibility and autonomy, regardless of their position.

For instance, there was a proposal to run Cursor in the browser. Four engineers immediately agreed and worked on it over the weekend. As one of them put it, “We dropped everything and went into full focus mode until it was done. This was one of the most enjoyable work experiences of my life.”

Such situations happen regularly at Cursor.

04

No 9-9-6, Only Self-Driven Passion

In the tech world, Cursor is rumored to have an “incredible work intensity,” with many privately suggesting they practice 996. However, Brie states that this is actually a counterintuitive misunderstanding:

“The company does not require employees to work 996; however, a significant portion of the team loves what they do so much that they overcommit, and the workload is entirely self-imposed.”

Even Brie was influenced by this atmosphere, writing, “No one asked me to work evenings or weekends. But I just wanted to. I’m even writing this on a Saturday while my ten-month-old is sleeping upstairs.”

This working state is almost akin to the intoxication of a craftsman creating a piece: there are no KPIs or institutional demands, only the drive to “make things better.”

But Brie also admits that she nearly drowned in the pace during the first few weeks, with new problems, priorities, and tasks arising daily. Working overtime did not resolve the issues; rather, she was uncertain about the correctness, value, and reporting of her work results.

Almost every new employee experiences this “drowning feeling.” But they soon realize that this is actually the company’s trust in them: “Once you truly understand this, panic gradually transforms into confidence,” Brie wrote.

Cursor Office | Source: Colossus

This is also a typical “Silicon Valley growth curve”: throwing newcomers into deep water, and they discover they can swim.

Moreover, Cursor employees are possibly the most immersed in their own product in the world. The only ones who might rival them are those at Apple, who use their own Macs and iPhones daily.

Everyone at Cursor uses Cursor to write code, edit documents, and experiment with new features. They are both developers and users, which leads to a bottom-up product roadmap: if you want a feature to exist, that is enough reason to develop it.

When an employee is convinced that a feature is worth building, they might present it at the weekly product demo or just dive in and start working on it.

Sometimes, two employees might develop the same feature, and the final version will incorporate the best ideas from both sides.

Once development is complete, they first launch the feature in an internal version of Cursor. The team tests it internally to see if it “has life”: if everyone loves it, they keep and refine it; if no one cares, it naturally gets eliminated.

Feedback is very “Cursor-like”: everyone votes using emojis in the Slack channel, 🟢 = remove feature, 🔴 = feature is useful. Everyone makes their choice in seconds, but it often sparks lengthy and deeper discussions.

Many of the currently most popular features, such as Tab, CmdK, Agent, Bugbot, and Background Agent, have grown this way.

At Cursor, it is commonplace for people to challenge and question colleagues’ work results. Here, your ideas, code, and writing can be dissected by colleagues at any time.

But this is not hostility; it is a form of trust: everyone believes you can handle criticism and are willing to improve.

Cursor’s top developers are very aware of what constitutes a good product, so they are extremely sensitive to “subpar” offerings. They don’t just give feedback; they often “roll up their sleeves” and help out. This also shapes Cursor’s culture of “criticism as participation.”

Like all cultures, this “friction-based communication” has grown from the founders themselves.

Michael (co-founder) often encourages everyone to ask “spicy questions” during company-wide Q&A sessions, while another founder, Sualeh, is more direct: he privately messages employees asking, “What are you worried about?”

They want employees to always carry a “curious anxiety” rather than a “safe numbness.” Of course, such a culture has potential dangers.

Brie candidly states: if arrogance, office politics, emotional instability, and poor communication infiltrate this culture, it can quickly become toxic.

She has seen many genius-level individuals, but they “treat fault-finding as a sport without a genuine desire to fix things.” However, at Cursor, critics are also problem solvers; everyone sincerely hopes to achieve the best for the product and each other.

05

Everyone is Creating Something at Cursor

Cursor’s attitude towards product philosophy is also one of high confidence.

Brie summarizes it succinctly: other companies focus on lowering barriers to entry, enabling more people to get started, but Cursor focuses on raising the ceiling of functionality. They believe that only when the top users are elevated will the entire ecosystem’s standards be raised.

In addition to vertically raising the ceiling, Cursor encourages exploration of product breadth.

Beyond the engineering team, sales, operations, and marketing teams also use Cursor to build internal tools, websites, or scripts. The #built-with-Cursor channel showcases new projects daily, such as a court reservation mini-program, a wedding website for an employee, a game that feeds virtual snacks to the office dog, and a quiz about the Metropolitan Museum of Art’s collection.

This model, where everyone loves to use the product, everyone gives feedback, and everyone votes on product direction, has also shaped a unique company ritual at Cursor: Fuzz, a collective celebration that pushes perfectionism to the limit.

Whenever a major version is about to be released, whether it’s a client update or a website overhaul, Cursor holds a Fuzz event, calling everyone to come out and find bugs.

Cursor’s work culture encourages action anytime, anywhere | Source: Colossus

As a product for AI programming, Cursor does not view users as “customers” but as peers. They believe that if the tools are laggy or crash, it wastes their time. Therefore, they must avoid all bugs as much as possible before launch.

As stated in Cursor’s early documentation, “Be responsible for bugs. Bugs are inevitable, but bugs that reach the user are disappointing. We want users to program with Cursor every day; bugs or performance issues are the easiest reasons for them to switch platforms.”

The atmosphere during Fuzz resembles a ritual. Once everyone gathers, engineers form a circle, sitting if possible, or sitting cross-legged, leaning against walls, or even sitting on the backs of chairs.

The product lead shares the link to the latest build and testing instructions in Slack, and the sound of typing fills the room as everyone works hard to find bugs, interface flaws, logical errors, or edge cases.

They continuously record issues in the Slack channel, occasionally sparking debates, and even initiating instant votes to determine which solution is more elegant.

The entire process lasts an hour, resembling a hacker’s version of meditation: collective silence, extreme focus, and no nonsense. The results of Fuzz typically culminate in a lengthy list of “all the issues to fix before the next day’s release.”

After Fuzz, the product team expresses gratitude to everyone and then embarks on a long night of fixes, with those who identified issues often staying behind to help.

In other companies, testing and development are often two separate groups, but at Cursor, those who find problems and those who fix them are often the same group of people.

06

Mission as Reward

Brie mentioned that she once asked co-founder Michael, “What kind of feeling do you want the company to evoke?”

Michael did not answer directly but instead asked her, “Have you seen the Beatles documentary?”

In this documentary, the most famous band locks themselves in a recording studio for three weeks, iterating and experimenting continuously until they create “Let It Be.”

Brie believes this perfectly describes Cursor’s culture: there are no excessive strategies or lengthy slogans; everything is about continuous trial and error, collision, and adjustment in actual work. Just as band members constantly try and adjust every note, Cursor employees are continually refining every line of code and every detail of the product.

What the Cursor team truly cares about is not the “developer productivity” boasted on the company website or external press releases, but the code itself and how code and software become the infrastructure of the world.

They connect their work to street traffic lights, scientific analyses, medical records, supermarket inventory systems, and even flight control systems, treating every line of code as an attempt to sculpt the world.

This mission-driven culture has made Cursor’s commercial success a reward rather than the primary goal.

Cursor Office | Source: Colossus

Brie noted that when the company reached $100 million ARR, hearts and 💯 emojis naturally popped up in the Slack channel, but the office remained calm as everyone continued discussing the product.

This also explains why, at Cursor, few people talk about wealth or future plans. As Brie summarizes: for employees, the true reward is seeing their work directly drive better and more precise software development, rather than external wealth or status.

It seems that for Cursor employees, the meaning, challenges, and sense of achievement in their work are the most direct and tangible rewards.

Cursor is shaping the world through work itself, believing that every aspect of software development, from coding to testing to deployment, will be “intelligently” restructured.

The term “programming” is also beginning to transcend the programmer alone: it now includes designers, product managers, entrepreneurs, and even industry experts.

This means the market potential is nearly limitless, and every line of code can change some aspect of our daily lives.

User Trust Erodes as AI Tools Discriminate

Thu, 16 Oct 2025 00:00:00 +0000

Introduction

As AI tools begin to “discriminate,” user trust quietly crumbles. This article reveals controversies surrounding Claude’s membership services, account bans, and regional restrictions through real user experiences, reflecting on how AI products can balance technological innovation with user respect in global operations.

I consider myself quite moderate when it comes to paying for AI tools. I believe that if an AI has its merits and I happen to need it, paying to unlock more features is reasonable.

For example, I was impressed with Kimi’s computer use and Kimi Researcher, so I decisively purchased their membership. I found GPT’s long-term memory very useful, and I was satisfied with its responses during conversations, so I subscribed to their Plus membership. When I was doing my daily reports, I noticed that Gemini 2.5 Pro had a sufficiently long context, and GDR (Gemini DeepResearch) was quite good, so I subscribed to their Pro version.

I don’t think I’m a stingy person… At one point, I spent over 400 RMB a month on AI subscriptions. The reason I subscribed was that I believed my money was well spent; even if I didn’t use the membership much, I didn’t mind, as I trusted that spending would yield returns. Paying a bit for better tools and solutions is perfectly fine.

A Curious Attempt

Being someone who dislikes hassle, I tend to stick to a fixed set of AIs. I use ChatGPT for creating public account covers and handling tasks that require long-term memory. I use Gemini for writing tasks and occasionally generating prompts. Kimi is my go-to for factual inquiries; I usually ask it any small questions I have.

This covers almost all my usage scenarios.

One day, while scrolling through Twitter out of boredom, I saw a post from someone who claimed they had been chatting with Claude during the National Day holiday, and that Claude could accurately point out their mistakes, leaving them feeling refreshed afterward. This piqued my curiosity.

For various reasons, I had never tried Claude before, thinking my current tools were sufficient. However, after reading that blog post, I couldn’t shake the thought. “Is it really that good?” I kept pondering it as I lay in bed that night.

So, I decided to give it a try. I started with the free version and asked a few questions. I found the responses quite satisfactory, so I decided to use the money I had set aside for GPT’s renewal to subscribe to Claude’s membership.

At first, everything was pleasant. I even shared my conversations with Claude in my group chat, finding it quite entertaining.

I asked Claude to critique itself and the experiences with the other two (Gemini and ChatGPT), and I felt it provided insights I could relate to. Looking back, I surely should have slapped myself.

Because I was quickly proven wrong.

Reality Hits Hard

I subscribed to the membership this morning, thinking that since I had paid, I should chat with Claude more… However, when I opened the app on my phone in the afternoon, Claude threw an error at me.

Initially, I thought it was a network issue and tried switching IPs several times, but to no avail. When I logged in on my computer, a message confirmed my worst fears.

It meant my account was banned.

What? I just bought the membership, and now it’s banned??? I didn’t get banned while using the free version for half a day, but now that I have a membership, I get banned? What does that mean?

I was very confused and ran to ask GPT, and its response really made me laugh in frustration.

This reminded me of a line from “Let the Bullets Fly”: “Should good people be held at gunpoint?”

Because I subscribed, does that mean I get stricter scrutiny?

The novelty of having just subscribed quickly faded into frustration. Moreover, this ban didn’t just prevent me from using the model; I couldn’t even access the chat, getting stuck right at login.

This meant that if I had important tasks to complete with Claude, I would no longer be able to access them.

Alright, Claude, you really are something.

A Decision to Move On

After filling out the appeal form, a glimmer of hope crossed my mind: “What if they restore my account?” But then I couldn’t help but slap myself again.

There are so many AIs out there; why should I pin my hopes on Claude? Why should I keep putting myself in a position to be let down?

Then I realized I just wanted my money back and to have no further dealings with Claude. At that moment, I remembered why I had never subscribed to Claude before.

Because Claude, or rather Anthropic, whether as a company or through its CEO, harbors a certain level of animosity towards our country and its citizens. The CEO of Anthropic has called for a ban on U.S. chip exports to China to limit China’s AI development. On September 5, the company announced an update to its sales restrictions, immediately banning companies or subsidiaries with majority ownership by Chinese capital from using its Claude series AI services, including Claude Code.

Moreover, they have massively banned accounts of Chinese users, and even if you use special methods to change your IP, it doesn’t help.

Honestly, customers spend money expecting commensurate service, not facing discrimination through colored glasses.

I believe that AI technology should be inclusive, and products should not set boundaries or be influenced by other factors that exclude certain users.

Such behavior is undoubtedly low-end and unwise.

In summary, I will not engage with Claude again; I refuse to be a fool anymore.

Advanced Usage Guide for Cursor

Sun, 31 Aug 2025 00:00:00 +0000

Settings

Cursor is developed as a branch of VSCode, allowing seamless integration. You can import your VSCode settings directly.

After importing, all your plugins, theme configurations, etc., will remain consistent with VSCode without needing to reinstall them manually. The import is incremental, meaning that any plugins or settings already in Cursor but not in VSCode will not be removed.

Why Use VSCode Import?

VSCode has an automatic synchronization feature (including settings, plugins, themes), but Cursor does not provide synchronization. To avoid repetitive settings when installing Cursor on a new device, download VSCode, log in, sync your data, and then import it into Cursor for a seamless transition.

In VSCode, the default toolbar is on the left side, while Cursor has a smaller horizontal toolbar at the top. If you prefer a vertical arrangement, open settings, search for “orientation,” and change it to vertical.

Codebase Indexing

Codebase Indexing is a vital tool in Cursor for understanding project structure. Cursor traverses each file, recording its relative position, effectively creating a project file map. This indexing allows for more accurate results when using the file search tool during programming.

Pro users can support up to 50,000 files, while enterprise users can support 250,000 files.

Docs

Docs allows you to add special documents to the context. This is useful when Cursor cannot supplement context information through model calls, online queries, or rules, leading to bugs or misunderstandings. You can add user-defined documents to Docs, typically technical documents, personal blogs, or declarative documents that are not easily found via search engines.

Docs only accepts URL links, so the documents added should be from online sites that AI can crawl.

Context

The efficiency of AI in completing tasks relies heavily on the completeness and clarity of the context definition.

Definition

What is context? Context is the AI’s memory, encompassing all its knowledge about the current project, serving as the basis for logical reasoning.

Context is divided into two categories: instructions (user prompts) and states (various information contained in the current project).

Length

The length of context is limited, measured in tokens. You can use the tool tiktokenizer.vercel.app to calculate the token length of your input text.

The built-in model in Cursor has a different context length than those provided on model websites due to internal system prompts and optimizations that occupy some space.

Construction

Before each conversation, you should construct a detailed and clear context for the AI, which is the most effective way to reduce AI hallucinations.

In Cursor, context includes many types. Here are some core methods to add context:

File & Folders: The easiest way is to drag and drop files or folders into the conversation. This can include text, CSV, MD, and various source code files (character-based text files). Word, PDF, and PPT files are not supported.
Code: Select a piece of code and click “Add to chat.”
Git: When performing a code review, you can add a specific commit to the context.
Past Chats: If you have new ideas based on previous conversations, you can reference relevant historical chats without re-explaining the context.
Web: Used for manually specifying documents from websites that AI can crawl.
Image: Screenshots can be pasted directly into the chat to add them.
Terminals: Similar to code, select output information in the terminal and click “Add to chat.”

Prompt

Prompts are also part of the context and directly influence model output. Each built-in model in Cursor has internal system prompts. This repository collects various internal system prompts from AI tools: github.com/elder-plini.

Writing Principles

To write clear and understandable prompts, follow these key points:

Use Markdown format, employing ordered and unordered lists, proper line breaks, and spacing.
Focus on one task at a time, with the entire prompt centered around that task.
Provide positive examples, as AI excels at analogy reasoning.
Be as detailed and purpose-driven as possible, avoiding vague instructions and actively adding context.

Writing Tips

You can explicitly include context as part of descriptive statements.

When using multiple images, you can use numbers to indicate them.

Press Shift + Enter to create a new line.

Model Switching

First, identify which models you need for your daily tasks. Here are some categories:

Simple tasks requiring quick responses: claude-4-sonnet.
Tasks requiring more context while ensuring quality: claude-4-sonnet-thinking, genimi-2.5-pro.
Complex tasks with extensive context and no initial ideas for AI: o3.

To avoid the hassle of switching models for each conversation, keep it set to claude-4-sonnet by default.

Enabling Models

If you notice some models are missing from the list after clicking “add model,” you may not have enabled the switch in settings. Generally, only enable the most commonly used models and avoid older ones, adhering to the principle of “favoring the new over the old.”

Auto & Max

Do not enable the Auto option, as you cannot determine which model Cursor is using behind the scenes; it could be an outdated or free model. Always specify the model manually to ensure you get your money’s worth from your subscription.

The MAX mode should also generally be avoided, as it incurs additional charges and significantly increases the context length for models, suitable only for very complex tasks, consuming tokens at about five times the normal rate.

Three Modes

Ask

This mode should be your most frequently used setting. Set it as the default mode.

Ask is primarily used for multi-turn conversations with AI to reach conclusions or agreements. It has the following features:

Access to context.
Ability to call tools, perform web searches, and read files.
Can invoke MCP.
Does not modify or create any code files.
Does not execute terminal commands proactively.

Before using AI for a task, you should have about 10 rounds of conversation, with 7 rounds in Ask mode discussing and refining details, followed by 3 rounds using the Agent model to finalize code construction.

Agent

This mode is used for producing results, allowing you to add, delete, modify, and query files in your project, write various codes, and generate detailed Markdown documents. It is the builder mode.

Manual

This mode is rarely used and may be removed in future versions of Cursor.

Parallel Conversations

Creating Parallel Conversations

Scenario: You are currently in a conversation but need to query another question.

Tip: You can create a new chat, but this will exit the current conversation window, making previous chats invisible. The best approach is to create a parallel conversation, which is independent of the current chat context and does not affect other conversations, essentially adding a separate sub-conversation window within the current chat.

Trigger: First, click to focus on the chat (otherwise, it will trigger VSCode shortcuts), then press Command + T to create.

Creating Task Branches

Scenario: You are in conversation 1 but suddenly have an idea and want to try another approach based on conversation 1.

Tip: Copy conversation 1 (including all context) to create a completely independent new conversation identical to conversation 1.

Other Features

Terminal Smart Suggestions

If you forget how to execute a command in the terminal, press Command + K, describe the operation you need to perform, and AI will generate the corresponding command.

Chart Drawing

However, rendering errors often occur due to Cursor’s built-in Mermaid renderer being outdated (version 8.x), while AI often uses syntax from version 10.x for drawing commands.

In-Depth Review of Zhipu GLM-4.5: Is the AI Toolkit Really Effective?

Thu, 31 Jul 2025 00:00:00 +0000

Introduction

GLM-4.5 has arrived, promising a one-click package of “multimodal + coding + assistant” functionalities! But does the actual experience live up to the hype? In this hands-on review, we will reveal what works well and what falls short, providing an in-depth analysis of AI capabilities.

Have you ever faced such frustrations? Wanting AI to write a report, but Model A has good logic but poor writing style; needing it to write code, but having to switch to Model B; trying to automate a task, only to find you need to manually connect several tools… It feels like running back and forth in multiple kitchens just to prepare a dish.

As AI capabilities seem to become increasingly specialized, Zhipu AI has introduced its new flagship model, GLM-4.5, claiming to create an all-rounder that can do everything.

What has changed? Is it truly a game-changer or just hype? Today, we will conduct an in-depth test to uncover the truth!

Overview: What’s New in GLM-4.5?

In simple terms, GLM-4.5’s biggest ambition is to natively integrate various previously scattered superpowers into a single model.

1. Core Highlight: Native “Intelligent Agent” Capability

This is no longer just a “chatbot” that answers questions one at a time. GLM-4.5 is designed to understand complex goals, autonomously plan tasks, and call tools to execute multi-step actions, functioning as an “AI employee”. Officially, it claims to be the first SOTA-level native intelligent agent model.

2. “Trinity” of Versatile Capabilities

It integrates complex reasoning (like a strategist), code generation (like a programmer), and intelligent agent interaction (like a project manager) into a cohesive whole. The goal is to say goodbye to the “specialist” and become a “hexagonal warrior” capable of tackling any problem.

3. Completely Open Source and Affordable

Most importantly, both GLM-4.5 and its lightweight version, GLM-4.5-Air, have been fully open-sourced and are available on platforms like Hugging Face. The API call price is as low as 0.8 yuan per million tokens, drastically lowering the barrier for using high-performance large models, which is a huge boon for developers and small businesses.

Official Results & Community Response

Let’s take a look at the official results.

In 12 globally recognized hardcore tests, including graduate-level reasoning and complex software engineering problem-solving, GLM-4.5 scored third globally, ranking first among all domestic and open-source models.

This report card is quite impressive. After its release, community feedback was extremely enthusiastic: within just 10 hours, it surged to second place on the international open-source community Hugging Face hot list, setting a record for growth. Foreign media also focused on its “lower cost and better performance” features, considering it an attractive high-performance foundational model for global enterprises.

It seems that GLM-4.5 is indeed making waves. But how does it perform in reality? Let’s move into our “devil’s test” segment!

Hands-On Testing: Let’s See How It Performs!

Official data looks good, but nothing beats trying it out yourself. I designed several scenarios that best showcase its “versatile” features to give you a real feel.

Scenario 1: Intelligent Agent “One-Stop” Task — Let AI Be Your Secretary

I tasked it with: “Help me create a 15-page PPT report on the ‘2025 World Artificial Intelligence Conference (WAIC)’, requiring rich visuals and text, including highlights of the conference, main exhibitors, and future trend predictions.” My prompt input:

GLM-4.5’s execution results:

It first confirmed some basic information with me.

After planning the task, it asked if I needed to add anything, and I felt it was okay, so I chose none.

It then developed a task plan.

It searched for information online.

Each time it gathered information, it showed some impressive thought processes; let’s take a look at the final product.

(There are a total of 15 slides; I won’t display them all here, but the link will be provided below for you to check out.)

So far, so good. The color scheme and design of the PPT were consistent, which is impressive, but then…

The size of one slide was equivalent to the above two, and the visual experience still needs improvement…

Link: https://chatglm.cn/share/dFSqcxA7

My Review:

This round of testing was quite complex, with both positive and negative aspects.

The pleasant surprise is that it can indeed function like a real assistant, accurately understanding my complex needs, autonomously searching for information, and summarizing key points.

However, the downside is that during the PPT generation process, I found that the formatting size varied from slide to slide, leading to a somewhat uncontrolled final effect. Despite this, its demonstrated “one-stop” service potential remains a significant productivity tool for content creators and professionals, though it still needs further refinement in details.

Scenario 2: Zero-Code “Full-Stack Development” — Turn a Sentence into a Developer

The official demo shows generating websites and games with a single sentence, so let’s replicate this classic task: “Help me develop a playable ‘Flappy Bird’ game using HTML, CSS, and JavaScript.” My prompt input:

GLM-4.5’s execution results:

Here’s a part of the JS code.

Forgive me for not having gaming talent; anyone skilled in gaming can share screenshots in the comments.

Link: https://chatglm.cn/share/hFSPc4S0

My Review: The results exceeded expectations. It generated not just code but a complete game that can be played directly in a web browser! The code structure is clear, with adequate comments, and all core functionalities are implemented. Although the UI is simple, this fully demonstrates GLM-4.5’s incredible potential in code generation and application development, truly turning ideas into reality at the push of a button.

Scenario 3: Extreme Logical Reasoning — Challenging AI’s Brain

Finally, let’s present a tough question to test its logic and understanding of Chinese: “In the ‘Tengwang Ge Xu’, the phrase ’the setting sun and lone wild goose fly together, the autumn waters and the sky share the same color’ depicts dynamic or static? Please analyze from the perspectives of time-space view and aesthetics.” My prompt input:

GLM-4.5’s execution results:

Link: https://chatglm.cn/share/2FSDcHGn

My Review: Its answer was very profound, showcasing strong logical breakdown and multi-dimensional analysis capabilities. It accurately identified this as a “dynamic-static combination” classic phrase and analyzed it step by step from the dynamic-static relationship, time-space view, and aesthetics. The response not only cited the original text to support its viewpoint but also extended to the author’s Wang Bo’s life experiences and creative mindset, indicating that its understanding of the Chinese context, knowledge association, and deep thinking ability has reached a remarkably high level.

Conclusion: Is It Worth Getting?

After an in-depth experience, here are my thoughts on GLM-4.5:

👍 Pros

Comprehensive capabilities beyond imagination: It truly achieves being an “all-rounder”; whether in office tasks, development, or creation, it can provide high-quality assistance, making it highly practical.
“Promises kept” intelligent agent: The completion rate for complex, multi-step tasks is very high; it is no longer a “toy” but a “tool” that can be put into production.
Exceptional cost-performance ratio: Powerful performance combined with open-source and low API prices allows all developers and enterprises to enjoy the benefits of top-tier AI.

🤔 Areas for Improvement

Stability of generated content needs refinement: In executing multi-step, continuous generation tasks (like making PPTs), there may be issues with details going awry, such as PPT page formatting sizes varying, affecting the direct usability of the final results.
Feedback for complex tasks could be clearer: When executing complex tasks like development or analysis, providing clearer, real-time progress feedback or displaying the “thought process” would greatly enhance user control and experience.
UI aesthetics of generated applications could be improved: While the model can quickly generate fully functional applications (like games), the default UI is quite basic and has significant room for aesthetic design optimization.
Tolerance for ambiguous instructions: When faced with extremely tricky or unclear instructions, the model’s performance can occasionally fluctuate, requiring users to describe their needs more precisely to achieve the best results.

In summary, Zhipu GLM-4.5 is undoubtedly a “heavyweight bomb” in the recent large model market. It not only achieves a “unified” technical approach but also, through open-source and low-cost strategies, sounds the horn for the popularization of AI applications.

For ordinary users and developers, a more powerful, cheaper, and versatile AI era is accelerating towards us.

Exploring Qwen3-Coder: A Next-Gen AI Programming Model

Sun, 27 Jul 2025 00:00:00 +0000

Introduction

If you are familiar with Vibe Coding products, you might recognize their role as a “co-pilot”. They help monitor your progress during long coding sessions, assist in completing lines of code, or even generate specific functions while you take a break.

However, for a long time, these products have primarily acted as “co-pilots”, responding passively to user commands without understanding the underlying intentions or goals of the developer.

But what if AI could transcend this role? What if it could comprehend your navigation intent, anticipate upcoming challenges, and independently plan and execute tasks after you provide a destination? This would truly enable it to become a “full-stack engineer”.

Today, I deeply experienced Alibaba’s newly open-sourced Qwen3-Coder, which the company claims is currently at the state-of-the-art (SOTA) level for coding capabilities among open-source models.

According to data from OpenRouter, a well-known API aggregation platform, the API call volume for Qwen has surged, surpassing 100 billion tokens in just a few days, ranking it among the top three globally on OpenRouter’s trend chart, making it the hottest model at present.

This week, Alibaba has open-sourced three significant models, including Qwen3-Coder, which have won global open-source championships in foundational, programming, and reasoning models. The Qwen 3 reasoning model has shown capabilities in creative writing, mathematics, and multilingual concepts that rival top closed-source models like Gemini-2.5 pro and o4-mini, achieving the best performance among open-source models.

To be honest, even though Qwen3-Coder has been hailed as the “best programming model in the world” and has topped the HuggingFace model leaderboard, I approached it with cautious optimism, expecting yet another domestic model.

However, after a day of testing and deep interaction, this new model, claiming to reach SOTA levels, truly provided me with a different experience regarding Vibe Coding.

A Programming Model That Creates Digital Spaces

My first experience with Qwen3-Coder began with a series of challenging tests that I previously found difficult or impossible to complete.

I decided to test it with a classic “AI design taste test”. I input a somewhat audacious command:

“Create a homepage for Geek Park as a tech news media site, featuring a modern navigation bar, eye-catching colors, a concise company introduction, a clear content section, and a complete footer.”

In my experiences with Grok, ChatGPT, and similar products, such requests often resulted in a disaster reminiscent of 1990s aesthetics: chaotic layouts and glaring color schemes, akin to a public execution of modern design aesthetics.

Honestly, before the formal results were returned, I was mentally prepared to face a chaotic skeleton filled with tags that I would need to reconstruct from scratch.

However, when the code was generated and rendered in the preview, I was presented with a complete page that featured a highly unified design language, responsive layout, and even interface animations.

Homepage generated by Qwen3-Coder | Image source: Geek Park

If the initial amazement was purely visual, the subsequent tests began to touch on its deeper “soul”.

I posed a more abstract challenge:

“Create a physics engine-based music generator using Matter.js, allowing different shaped objects to fall freely on the canvas. When they collide, they should produce different musical notes based on their shapes, and I need a ‘gravity controller’ to change their falling trajectories in real-time.”

The difficulty of this task lies in the requirement for AI to not only understand the code but also the world behind it.

Code is rational, but the rhythm of physics and the harmony of music carry a touch of emotional warmth. Qwen3-Coder once again exceeded my expectations. It implemented all the functionalities—you could see balls and squares falling on the canvas, with each collision producing harmonious sounds.

When you drag the gravity controller, the trajectories of all objects change, transforming a soothing melody into a frantic one, playing a chaotic symphony on your screen. It not only completed the functionality but also brought an unexpected aesthetic beauty.

To further explore its boundaries, I threw out a game generation challenge, asking it to create a fully keyboard-controlled 3D shooting game with multiple interactive objects, a simple “storyline”, and an “Easter egg” that would allow quick completion if discovered in the code.

From the generated results, Qwen3-Coder returned calculations for target gravitational acceleration, collision detection algorithms, and the most surprising part—creating a 3D sandbox world while accurately implementing vector projection and distance detection algorithms within this small game.

In terms of physics simulation, it could easily reproduce the classic bouncing ball game as well.

In addition to these practical examples, there was another dimension of experience during the tests that deserves special mention: its generation speed and contextual memory for lengthy tasks.

In my actual testing, over ten different development use cases were resolved in almost 1-3 minutes.

Over 900 lines of code generated in just three minutes, significantly accelerating the iteration speed of code | Image source: Geek Park

This efficiency brings a more fluid creative flow compared to previous code generation models, allowing developers to quickly translate ideas into reality. I could swiftly adjust and iterate code versions based on the generated results without interrupting my thoughts during long waits.

Currently, everyone in the industry is discussing “Vibe Coding”. Vibe is undoubtedly the future of human-computer interaction, relating to intuition and inspiration. However, we should also recognize that the solid and reliable “Coding” skills underpinning all smooth “Vibe” experiences are essential.

How a World-Class Programming Model is Forged

Qwen3-Coder’s evolution from a “code completer” to an “autonomous developer” primarily stems from its architectural choice—the efficiency and scale brought by the Mixture of Experts (MoE) model.

Traditional large models resemble a knowledgeable but generalist professor; while they understand many things, they still expend considerable effort when addressing specific professional issues. In contrast, Qwen3-Coder’s “super-sized” version acts like a think tank with a vast knowledge base of 480 billion parameters, internally divided into numerous highly specialized “domain experts”.

When you pose a question, the system does not engage the entire model data; instead, it activates a relevant “expert group” of 35 billion parameters to respond. This design allows it to maintain a vast knowledge capacity and capability ceiling while keeping the computational cost of each inference within a reasonable range. This is a delicate balance between model capability and inference efficiency, which is key to its ability to handle complex problems.

Additionally, the Alibaba Qwen team believes that programming tasks are inherently suitable for execution-driven reinforcement learning, as the correctness of code can be directly validated through the actual running results, the most objective standard. Based on this, they built a large-scale reinforcement learning infrastructure capable of running 20,000 independent environments in parallel.

You can think of it as a software company with 20,000 “digital interns”. Here, the model can massively simulate real software engineering processes: receiving a vague task, autonomously planning and breaking it down, then calling external tools (like code executors and testing frameworks) to attempt solutions and learn from the feedback (success, failure, or specific error messages), iterating and self-correcting based on that feedback.

It is through this massive trial-and-error learning in a large-scale, high-concurrency real coding environment that Qwen3-Coder successfully learned how to solve “long-distance” tasks requiring autonomous planning and tool invocation, significantly improving its code execution success rate and tool usage efficiency.

Lastly, the key aspect that makes my experience with Qwen3-Coder different from previous code generation models is its “repository-level” context length for handling large codebases.

The complexity of software engineering often arises from the understanding of vast codebases. Qwen3-Coder possesses a physical-level absolute advantage in this regard: it natively supports a context window of 256K tokens. What does this mean? It means the model can process millions of characters of code and documentation in a single interaction.

If the MoE architecture provides the model with the potential for intelligence, reinforcement learning gives it the skills to solve problems, then the ultra-long context window provides the stage and materials for it to showcase its talents. Without a global view of the entire system, even the smartest model is merely a calculator with a limited perspective. It is precisely this capability that allows Qwen3-Coder to elevate the nature of tasks from “generating a valid code snippet” to “executing an effective operation on a complex software system”.

This ability to handle “repository-level” code is a prerequisite for solving complex system-level issues, performing large-scale code refactoring, and deeply understanding legacy systems, something many models with smaller context windows cannot achieve.

On the authoritative SWE-Bench leaderboard for measuring code models’ ability to solve real-world software problems, Qwen3-Coder has clearly surpassed one of OpenAI’s strongest closed-source models, GPT-4.1. This indicates that this open-source model from China demonstrates stronger efficacy in handling complex, real programming tasks.

In the realm of Agentic Coding, which focuses on agent capabilities, Qwen3-Coder can stand shoulder to shoulder with the benchmark Claude 4.

Currently, if you want to get started with Qwen3-Coder, the most direct way is to visit chat.qwen.ai, where you can switch models with a single click in the upper right corner.

If you seek the ultimate “intention-first” coding experience or are already a Vibe Coding veteran, you can try the “super-sized” version via API in various CLI environments, using Qwen3-Coder-480B-A35B-Instruct.

This is a MoE model with 480B parameters activating 35B parameters, natively supporting a 256K token context and extensible to 1M tokens via YaRN. Simply register an account on Alibaba Cloud, complete a simple verification, and you can create your API-Key to call this model.

Thanks to its perfect compatibility with OpenAI API formats, you can seamlessly integrate this API-Key into your familiar chat or coding tools, whether it’s Cursor, Trae, CodeBuddy, or Cline.

For users prioritizing data sovereignty and privacy, Qwen3-Coder offers the most comprehensive solution—local deployment.

You can directly download the complete model files from Hugging Face or domestic platforms. This means you can run this currently strongest open-source programming tool completely privately on your own servers.

The Global Significance of a Local Choice

In conclusion, the emergence of Qwen3-Coder is not about replacing anyone but empowering everyone. It compresses the comprehensive capabilities of a seasoned development team into a tool that anyone can access.

For a long time, when discussing top coding models, domestic developers seemed to have limited choices. This reflects a key fact: in the field of natural language processing, the accumulation of Chinese corpora provides domestic models with a “home advantage”; however, in programming, code is a universal language. Whether it’s Python, Java, or JavaScript, the syntax and logic are unified globally.

This means that the competition for coding capabilities takes place on a completely fair global stage. In this arena, there are no language barriers, only raw technical strength.

Qwen3-Coder’s leading position on international benchmarks like SWE-Bench signifies much more than topping a Chinese leaderboard. It marks that China’s self-developed AI models have the technical strength to compete in the most cutting-edge and fiercely competitive fields globally.

If open-source is an attitude, the current capabilities exhibited by Qwen3-Coder suggest a strong commitment from Alibaba.

In terms of pricing, Alibaba has chosen to open-source it for free, and the API call costs are significantly lower than those of comparable overseas models.

More importantly, this is an open-source model from China—this alone means that Chinese users can call it anytime and stably, free from concerns about network conditions, supply restrictions, and access speeds.

It may not be the only option, but it is heartening to see that in the race for coding large models, domestic developers have finally welcomed a reliable, friendly, and sufficiently effective local contender.

Cursor: Redefining AI Programming in the Post-Code Era

Wed, 11 Jun 2025 00:00:00 +0000

The Paradigm Shift in AI Programming

The AI programming field is undergoing a profound paradigm shift, with the rise of Cursor serving as a powerful testament to this trend. From the reflections of its founder, we gain insight into how AI programming tools are reshaping development processes and the key to standing out in competition by continuously delivering exceptional products.

Founded by AnySphere co-founder and CEO Michael Truell, Cursor is not only one of the fastest-growing AI programming products today but also an early form of the “post-code era.”

With a team of 60, Cursor achieved an annual recurring revenue of $100 million just 20 months after its launch, growing to $300 million within two years, making it one of the fastest-growing development tools in history. This achievement is supported not only by improved code generation capabilities but also by a complete reconstruction and redefinition of the software development process.

Michael, a tech person with a decade of experience in AI, studied mathematics and computer science at MIT and later worked in research engineering at Google. He has a deep understanding of both AI technology pathways and business history.

In a conversation with overseas tech blogger Lenny, he clearly outlined a future that differs from mainstream assumptions: code will not be completely replaced, but it will no longer be the primary output of humans. Instead, people will express their ideas about software functions and behaviors in a manner close to natural language, with systems responsible for translating these intentions into executable program logic.

He pointed out that the two mainstream assumptions about the future of AI programming are flawed. One assumes that development methods will largely remain the same, continuing to rely on languages like TypeScript, Go, and Rust to build programs; the other believes that the entire development process can be completed solely through conversations with chatbots.

Diverse Development Methods Coexisting

Discussing the starting point of Cursor, Michael recalled two key moments:

The first was their initial exposure to the internal testing version of GitHub Copilot. This was their first experience with a truly practical AI development tool that significantly enhanced work efficiency.

The second moment was their study of a series of Scaling Law papers published by OpenAI and other research institutions. These papers made them realize that even without new algorithms, as long as model parameters and data scales are continuously expanded, AI will continue to evolve.

By the end of 2021 and early 2022, they firmly concluded that the era of AI products had truly arrived. However, unlike most entrepreneurs who focused on “building large models,” Michael and his team attempted to think backward from the perspective of knowledge work, considering how various specific work scenarios would evolve under AI enhancement.

At that time, they chose a seemingly niche direction—mechanical engineering. They believed this field had little competition and a clear problem space, so they began automating CAD tools. However, they quickly realized they lacked sufficient passion for mechanical engineering and data corpus, making development challenging.

Ultimately, they decided to return to the field they were most familiar with: programming. Although there were already products like Copilot and CodeWhisperer in the market, they believed no one had truly pushed the vision to its limits. Despite being one of the hottest and most competitive areas, they judged that the “ceiling” was high enough to support a breakthrough product company. They abandoned the strategy of “avoiding hot zones” and chose to “delve deep into hot zones.”

One of Cursor’s core decisions was not to create a plugin but to build a complete IDE. They believed that the existing IDE and editor architectures could not adapt to future development methods and human-computer interaction logic.

“We want to have control over the entire interface and redefine the interaction interface between developers and systems.” This was not only to achieve a more natural control granularity but also to build a system base that could truly support the next generation of programming paradigms.

Michael also believes that future development methods will coexist in multiple forms. Sometimes AI acts as an assistant, completing tasks in Slack or issue trackers; other times, it interacts at the IDE frontend; it may also run certain processes in the background while iterating control at the frontend. These are not contradictory as long as users can flexibly switch between full automation and manual control, which constitutes a qualified system.

Regarding the current industry trend of “agent hype,” he expressed a reserved attitude. Completely handing tasks over to AI could turn developers into “engineering managers” who must constantly review, approve, and modify outputs from a group of “very dumb interns.” “We do not believe in that path. The most effective way is to break tasks down into multiple steps, allowing AI to complete them step by step while humans remain in control.”

Cursor’s early version was developed entirely from scratch, without relying on any existing editors. Initially, they spent just five weeks building a usable prototype, quickly replacing their original development tools. The entire process from writing code from scratch to going live took only three months. The unexpectedly positive user feedback after launch prompted them to iterate rapidly, ultimately finding a balance between performance, experience, and development speed, and then restructuring based on the VS Code framework.

However, Michael believes that true success is not about the speed of the initial version but rather the continuous optimization that follows. He admits, “The initial three-month version was not very usable; the key is that we maintained a persistent improvement rhythm.” This rhythm of continuous optimization ultimately formed Cursor’s very stable growth trajectory. Although there was no obvious feeling of “takeoff” in the early stages, the cumulative effect of the exponential curve eventually exploded after multiple iterations.

Running in the Right Direction Every Day

While Cursor’s explosion may seem to stem from a key feature or product decision, Michael Truell states that the real secret is quite simple: “Running in the right direction every day.”

This may sound ordinary, but it is extremely difficult to maintain. Every decision and every detail of iteration is made from the user’s perspective, constantly getting closer to actual scenarios, continuously simplifying and optimizing. They never rely on a one-time hit but firmly believe that product value must withstand the test of continuous use and real feedback.

In line with this philosophy is the technical path chosen behind Cursor. Michael mentioned that the team initially had no intention of training their own models when building Cursor. In his view, there were already sufficiently powerful open-source and commercial base models available, and investing computational power, funds, and manpower to build new models from scratch was not only costly but also diverted attention from their true focus: building useful tools and solving specific problems.

However, as the product went deeper into iteration, they gradually realized that existing base models, while powerful, could not meet the critical scenarios in Cursor. Most of these models were trained for general dialogue, question answering, or text tasks, lacking a native understanding of issues like “multi-file structured code editing.”

Thus, they began internal attempts to develop their own models. Initially, it was due to a specific function requiring extremely low latency, and the existing model’s invocation was not feasible; after trying to train their own, they found the results exceeded expectations. Since then, self-developed models have gradually become a core component of Cursor, supporting key functions and becoming an important direction for team recruitment.

A key feature of Cursor is the prediction of “next editing actions.” This is difficult to achieve in writing but highly feasible in coding scenarios due to the strong contextual coherence of programs—once a developer modifies a function or file, the subsequent operations are often predictable.

Cursor’s model is based on this contextual logic, inferring which files, locations, and structures the user is likely to modify next, providing completion suggestions with almost imperceptible latency. This is not just token-level completion but structured code snippet-level prediction, relying entirely on self-developed models trained specifically for this scenario rather than general base models.

In a reality where model invocation costs are extremely high, such self-developed models can significantly lower the product’s usage threshold. To achieve this, the models must possess two characteristics: fast response and low cost.

Cursor requires that every completion inference must be completed within 300 milliseconds and that there should not be excessive resource consumption during prolonged continuous use. This hard constraint necessitates that they control the design and deployment of the models themselves.

In addition to handling core interaction functions, Cursor’s self-developed models also take on another important task—acting as “orchestrators” to assist in invoking large models. For example, when the codebase is large, large models struggle to know which files, modules, and contexts to focus on.

Cursor’s model first conducts a search and synthesis, extracting relevant information from the entire codebase, and then feeds it to the main model. This is akin to building a specialized “information feeding pipeline” for large models like GPT, Claude, and Gemini, enhancing their performance.

At the output end of the model, these sketch-style code modification suggestions are first processed and rewritten by Cursor’s self-developed models, transforming them into truly executable, structured patches.

This collaborative system architecture, where multiple models work together, is what OpenAI refers to as “model integration.” Michael is not fixated on building models from scratch but pragmatically chooses existing open-source models as a starting point, such as LLaMA.

In certain scenarios, they also collaborate with closed-source vendors to fine-tune model parameters to adapt to specific tasks. He emphasizes that what matters is not whether the underlying structure of the model is controlled by them but whether they can obtain operational training and customization rights to serve the actual needs of the product.

As the technical system continues to improve, another question gradually emerges: where is Cursor’s moat in this rapidly evolving race? Michael’s answer is very clear. He does not believe that “product binding” and “contract locking” can build true long-term defenses.

Unlike traditional B2B software, the barriers in the AI tools market change dramatically, with low user trial costs and high acceptance of new tools. He candidly states that this is not a market favorable to traditional giants; rather, it encourages new companies to continuously experiment, iterate quickly, and compete for user choice.

From this perspective, the moat that Cursor can rely on is not model control or data monopoly but the ability to “continuously build the best products.”

This industry resembles the search engine boom of the 1990s or the early PC industry, where every improvement can yield significant returns. Competitive barriers arise from the “deep inertia” formed by continuous iteration and the differences in team organizational capabilities and product refinement systems.

Michael presents a core viewpoint: when a market still has a large number of unmet needs and many technical structures that can be optimized, continuous research and development itself is the biggest moat. It does not rely on binding users but rather on its own continuous evolution to gain cumulative advantages in time and quality.

He emphasizes that this “evolutionary moat” does not exclude competition nor does it imply that there can only be one winner in the market. However, under the proposition of “building a globally universal software construction platform,” it is indeed possible for a single company to emerge as a massive entity.

While multiple products may coexist in the future, if the question is “who can handle the largest-scale code logic translation tasks globally,” then ultimately only one company may remain. The reason is not that other companies are doing poorly, but that users will naturally gravitate towards the most universal, stable, and contextually understanding platform. In this field, product quality and evolution speed determine market concentration.

He further points out that one cannot judge this round of technological evolution’s pattern based on the fragmentation experience of the traditional IDE market. The IDE market of the 2010s saw “no one making big money” because the capabilities of editors at that time were close to their limits, and the parts that could be optimized were merely basic functions like syntax highlighting, debugger integration, and quick navigation. However, today, developer tools are at a new paradigm starting point, where the goal is no longer to optimize an editor but to reshape the entire workflow and expression structure of knowledge workers.

The essence of AI programming tools is not to replace code but to enhance human instruction expression capabilities and compress the path from idea to implementation. This represents a much larger market than traditional development tools and a future channel with platform attributes. In this channel, whoever can provide the smoothest, most reliable, and most contextually understanding programming experience has the opportunity to become synonymous with the next generation of “software construction infrastructure.”

When Lenny mentioned Microsoft Copilot, he also raised a typical current issue: do the companies that entered the market first possess the ability to lead continuously? Michael acknowledged that Copilot was a source of inspiration for the entire industry, especially when the initial version was released, bringing an unprecedented development interaction method.

However, he believes Microsoft has not truly maintained its initial momentum, which is due to both historical reasons and structural challenges. The core team that initially developed Copilot experienced frequent personnel changes, making it difficult to form a unified direction within a large organization, and the product path could easily be diluted by internal power struggles and process complexities.

More fundamentally, this market is not friendly to incumbents. It does not rely on integration and binding like enterprise-level CRM or ERP systems, nor does it have the strong user stickiness of “switching costs.” User choice is entirely based on experience differences, which determines that “product strength” rather than “sales ability” will be the decisive factor. In such a dynamic, open, and high-frequency trial-and-error market, the companies that can truly win are those entrepreneurial teams that can iterate their products weekly, improve monthly, and continuously strive for technical limits.

The sense of direction and product rhythm that Cursor currently exhibits is precisely a response formed in this context. It does not rely on “closure” but rather on the simple, clear, yet extremely challenging mission of “continuously building the best development tools in the world,” attracting developers’ active choice.

How to Use Cursor Correctly?

In building an AI IDE platform aimed at global developers, Michael Truell is most concerned not with the limits of model capabilities but with how users understand and make the best use of these capabilities.

When asked what advice he would give if he could sit next to every first-time Cursor user, he did not explain features or operational tips but emphasized the establishment of a mindset—an instinctive judgment of what the model can and cannot do.

He candidly admitted that the current Cursor product does not do enough to guide users in understanding the boundaries of the model. Without clear prompt tracks and interactive feedback mechanisms, many users easily fall into two extremes: either placing too high expectations on the model and trying to solve complex problems with a single prompt, or completely giving up after an unsatisfactory first result.

His suggested approach is task decomposition, gradually progressing through “small prompts – small generations,” engaging in continuous two-way interaction with AI to achieve more stable and higher-quality results.

Another suggestion is more strategic. He encourages users to “go all out” in side projects without business pressure, attempting to push AI capabilities to their limits.

Without affecting mainline work, through a whole set of experimental projects, developers can feel how much the model can truly accomplish and where the boundaries of failure lie. This “wrestling-style exploration” can help developers build a more accurate intuition and give them more confidence when facing formal projects in the future.

As model versions continue to update, such as the release of GPT-4.0 or Claude iterations, this judgment also needs to be updated accordingly. He hopes that future Cursor products can incorporate a guiding mechanism so that users do not have to explore the model’s “temperament” and boundaries each time. However, for now, this remains a skill that users need to accumulate subjectively.

Regarding another frequently asked question—whether such tools are more suitable for junior engineers or senior engineers—Michael provided precise classifications. He pointed out that junior developers often tend to “completely rely on AI,” trying to use it to complete the entire development process, while senior engineers may underestimate AI due to their rich experience, failing to fully explore its potential. The former’s problem is “too much reliance,” while the latter’s issue is “too little exploration.”

He also emphasized that certain senior technical teams within companies, especially architect-level talents focused on Developer Experience, are actually the most proactive adopters of such tools. They understand system complexity and focus on tool efficiency, often achieving the best results in AI programming scenarios.

In his view, the ideal user profile is neither a beginner nor a seasoned veteran with fixed processes but rather those “senior yet not rigid” mid-level engineers—who possess system understanding while remaining curious and open to new methods.

How to Build a World-Class Team?

When asked what advice he would give himself if he could return to the year Cursor was founded, Michael chose a non-technical answer—recruitment. He repeatedly emphasized that “finding the right people” is the second most important task after the product itself.

Especially in the early stages, building a world-class engineering and research team is not only a guarantee of product quality but also a decisive factor for organizational focus, rhythm, and culture. The talent he seeks must possess technical curiosity, willingness to experiment, and the ability to maintain calm judgment in a turbulent environment.

He recalled that Cursor went through many twists and turns in the recruitment process. Initially, they placed too much emphasis on “high-profile resumes,” leaning towards hiring young people from prestigious schools with standard success paths. However, they ultimately realized that truly suitable talents often do not fit these traditional templates. Instead, those with slightly later career stages, highly matched experience, and mature technical judgment are often the key forces driving the team’s leap.

In the recruitment process, they gradually established a set of effective methods. The core is a two-day “work test” system, where candidates need to complete a task closely related to a real project within a specified time while working with the team.

This process seems cumbersome, but in practice, it is not only scalable but also significantly improves the accuracy of team judgment. It assesses candidates’ coding abilities, communication skills, thinking styles, and hands-on capabilities, and even helps candidates determine whether they are willing to work long-term with this team.

The “collaborative interview” mechanism has gradually evolved into a part of Cursor’s team culture. They view the recruitment process as a two-way selection rather than a one-way evaluation. When the company is not widely recognized in the market and the product is not mature, the team itself is the most important attraction.

He admits that many early employees joined due to one or multiple collaborative experiences rather than judgments based on salary or valuation. Today, this system is still retained and applied to every new candidate. Cursor’s team size currently remains around 60, which is considered streamlined in many SaaS companies.

Michael pointed out that they intentionally maintained this lean configuration, especially being restrained in expanding non-technical positions. He acknowledges that they will certainly expand the team in the future to enhance customer support and operational capabilities, but for now, they remain a highly engineering, research, and design-driven company.

When discussing how to maintain focus in the rapidly changing pace of the AI industry, Michael does not rely on complex organizational systems.

He believes that the foundation of organizational culture lies in recruitment itself. If they can hire rational, focused individuals who are not swayed by trending emotions, the team will naturally have a good sense of rhythm. He admits that Cursor still has room for improvement, but overall, they have achieved good results in guiding a culture that “only focuses on creating excellent products.”

Many companies attempt to solve problems through processes and organizational design that could actually be avoided by “finding the right people” in advance. Their development process is extremely simple, and the reason it can succeed is that team members generally possess self-discipline and a spirit of collaboration. He particularly emphasizes a shared psychological trait: an “immunity” to external noise.

This immunity is not inherently present but is gradually formed through long-term industry experience. As early as 2021 and 2022, the Cursor team was already exploring AI programming directions. At that time, GPT-3 did not yet have the Instruct version, DALL·E and Stable Diffusion had not been made public, and the entire generative AI industry was still in its technical infancy.

They experienced the explosion of image generation, the popularization of dialogue models, the release of GPT-4, the evolution of multimodal architectures, and the rise of video generation… but among these seemingly bustling technological trends, very few had a substantial impact on the product.

This ability to discern between “structural innovation” and “surface noise” has become an important psychological foundation for maintaining their focus. He compares this approach to the evolution of deep learning research over the past decade: while countless new papers are published every year, it is the very few elegant and fundamental structural breakthroughs that truly drive AI forward.

Looking back at the entire technological paradigm shift, Michael believes that the current development of AI is at a profoundly pivotal moment.

The outside world often falls into two extremes: some believe that the AI revolution is about to arrive, almost overnight overturning everything; others view it as hype, a bubble, and not worth considering. His judgment is that AI will become a paradigm shift more profound than personal computing, but this process will be a “multi-decade” continuous evolution. I/O to iO, Jony Ive will drive a new design movement—AI is rewriting computing paradigms and hardware definitions, and it is also the new battleground after large models.

This evolution does not rely on a single system or a specific technological route but consists of independent solutions to numerous segmented problems. Some are scientific issues, such as how models can understand more data types, run faster, and learn more efficiently; some are interaction issues, such as how humans collaborate with AI, how to define authority boundaries, and how to establish trust mechanisms; some are application issues, such as how models can truly change real work processes and provide controllable outputs in uncertainty.

In this evolution, he believes a class of key enterprises will emerge—AI tool companies focused on specific knowledge work scenarios. These companies will deeply integrate base models and may also develop core modules independently while building the most suitable human-computer collaboration experience. They will not merely be “model callers” but will refine technology and product structures to the extreme, thereby growing into new-generation platform enterprises. Such companies will not only enhance user efficiency but may also become the main force driving the evolution of AI technology.

Michael hopes that Cursor can become one of these companies, and he also looks forward to seeing a group of equally focused, solid, and technically deep AI entrepreneurs emerge in more knowledge work fields such as design, law, and marketing. The future does not belong to speculators but to those builders who truly deconstruct problems, reshape tools, and understand the relationship between people and technology.

He also pointed out that the two most important things for Cursor in 2025 are to create the best product in the industry and to promote it on a large scale. He describes the current state as a “land grab”: the vast majority of people in the market have not yet encountered such tools or are still using slowly updated alternatives. Therefore, they are increasing investments in market, sales, and customer support while continuously seeking excellent talents who can push the product boundaries from a technical level.

When discussing the impact of AI on engineering positions, Michael’s response is quite calm. He does not believe that engineers will be quickly replaced; on the contrary, he thinks engineers will be more important than ever in an AI-driven future.

In the short term, programming methods will undergo significant changes, but it is hard to imagine that software development will suddenly become a process where “just inputting requirements will lead the system to complete everything automatically.” AI can indeed liberate humans from low-level tedious implementations, but core decisions regarding direction, intent, and structural design must still be controlled by professional developers.

This judgment also implies that as software construction efficiency dramatically increases, the elasticity of demand will be thoroughly released. In other words, software itself will become easier to build, costs will significantly decrease, ultimately leading to an expansion of the entire market scale. More problems can be modeled, more processes can be systematized, and more organizations will attempt to customize their internal tools rather than accept generic solutions.

He illustrates this with a personal experience. In a biotechnology company he participated in early on, the team urgently needed to build a tool system that matched internal processes, but the available solutions on the market were not suitable, and the efficiency of self-development was very limited, resulting in most needs being shelved.

Such scenarios are still common across various industries, indicating that the barriers to software development remain high. If one day, making software is as simple as moving files or editing slides, what will be released is a whole new application era.

Finally, he emphasizes that AI will not reduce the number of engineers; rather, it will change the structure of engineering positions. Those who are good at collaborating with AI, understand system logic, and possess product intuition will play a larger role in the new generation of work systems.

30 Practical Tips for Using Cursor as an AI Programming Assistant

Mon, 31 Mar 2025 00:00:00 +0000

Introduction

Cursor is a powerful AI programming assistant that is changing the traditional coding experience with its “chat-style” programming mode and efficient functional modules. This article presents 30 practical tips for using Cursor, from basic concepts to advanced operations, to enhance your AI programming efficiency.

Basic Concepts

01. Understand “Chat-Style” Programming
Cursor marks the arrival of “chat-style” programming. Compared to traditional programming modes, it has three core breakthroughs: writing code in “natural language,” iterating at the speed of judgment, and blurring the boundaries between product managers, designers, and programmers. This new paradigm shifts our focus from “how to write code” to “what problem to solve,” pushing AI to compel you to “think clearly and articulate clearly.”

02. Familiarize Yourself with Cursor’s Four Core Features
Cursor provides different capabilities in various scenarios, ranging from simple to complex: Tab, Inline chat, Ask, and Agent. Understanding the characteristics and applicable scenarios of these four functional modules is the foundation for using Cursor efficiently.

03. Master the Transition from “Thinking Clearly” to “Articulating Clearly”
AI is powerful, but it does not know what you are thinking. For effective communication, it is recommended to use structured expressions with sufficient context; the most direct structured expression is to describe requirements in Markdown format, which naturally segments the content, making it easier for AI to understand.

04. Learn to Break Down Problems and Validate Small Steps
Break complex problems into smaller, simpler ones and solve them step by step. During development, avoid generating thousands of lines of code at once for validation; instead, execute and validate tasks incrementally to better control code quality.

05. Understand MCP (Model Context Protocol)
MCP is the “universal connector” between AI and the outside world, giving AI eyes and arms. Its true value lies in unifying standards, eliminating the need to reinvent the wheel, allowing AI to have a larger context and significantly improving closed-loop operability.

Daily Operations

06. Terminal Conversations
No more struggling to remember Linux commands; simply use command+k to describe command-line operations in natural language. This feature is particularly useful for local development, allowing Cursor to operate the local terminal directly.

07. Generate Comments for Historical Code
Select code and use command+k to quickly generate comments for historical code, which is much faster than the Ask mode. This is especially useful for taking over someone else’s code or reviewing your earlier code.

08. One-Click Commit Message Generation
Say goodbye to the hassle of thinking about “what did I change in my code?” Cursor can generate standardized commit messages with one click, improving Git operation efficiency.

09. Quickly Visualize Project Architecture
When taking over a new project, use the Ask mode to organize the project architecture diagram, outputting text in Mermaid syntax. You can paste it into https://mermaid.live/ to quickly understand the project structure.

10. Use Notepad to Record Key Ideas
Use Notepad to record important context, and use @ to call it. Notepad serves as a good bridge between Ask and Agent modes, helping maintain coherent thoughts.

11. Use @Git to Identify Code Vulnerabilities
When encountering a code MR (Merge Request), compare it with the main code to check for issues. If problems arise after the MR, use the @Git feature to quickly locate them.

12. Use Checkpoint for One-Click Rollback
If AI modifies code incorrectly, you can use the checkpoint feature to quickly roll back to a previous stable version, avoiding the hassle of manual recovery.

13. Set Custom Prompts
Set your custom prompts in Cursor Rules to improve AI’s understanding of your needs. There are many prompt templates available online for you to find and customize.

14. Drag and Drop to Add Context
No need to search through directories one by one to add context; simply hold the target file in the directory and drag it into the dialog box. This significantly improves work efficiency.

15. Use @web to Access the Latest Information
Utilize the online feature to quickly obtain the latest information, solving various problems encountered during development, especially regarding new technologies or libraries.

Advanced Techniques

16. One Question, One Chat
Break down large module requirements into smaller questions and open a separate Chat for each new question. Long conversations may lead to AI memory confusion and longer response times, hindering review and management.

17. Use Composer for Multi-File Modifications
When it involves data coordination between modules (multiple code files need to work together), it is recommended to use Cursor’s Composer feature. Compared to Chat, Composer can analyze multiple files simultaneously, understand code context, and provide more reasonable modification suggestions.

18. Tell Cursor Not to Rush into Writing Code
Cursor tends to provide code directly; in the early stages of a project, it’s better to have divergent discussions first, allowing AI to help clarify unclear details. Clearly instruct AI to hold off execution until your thoughts are confirmed.

19. Guide AI to Ask Questions, Avoid Mindless Execution
Encourage AI to ask clarifying questions to confirm more details. Cursor defaults to trusting your judgment; if you are unsure of the solution, make sure AI asks you questions to avoid executing based on incorrect reasoning.

20. Emphasize Not Modifying Unrelated Code
Clearly define the scope in your requirements description, indicating which code can be modified and which cannot, to reduce the probability of AI making incorrect changes. Emphasize that you are a coding novice, prompting AI to generate more detailed comments in Chinese to help understand code logic.

21. Create .md Requirement Documentation
Establish .md requirement documentation to record project background, core logic, and implemented features. Each time a new feature is developed, have AI read the document first to ensure understanding of the context. Clearly instruct AI to read the requirements to avoid missing key content due to excessive @ references across multiple files.

22. Emphasize “Chain of Thought” to Enhance AI Reasoning
Use the “Chain of Thought” technique to encourage AI to engage in more rigorous logical thinking, applicable in scenarios like complex calculations, code analysis, and task planning, reducing AI’s vague reasoning.

23. Add Debugging Code to Help Locate Issues
When implementing complex features, instruct AI to add debugging code, paste the code into the editor, and check the actual execution results. If the results do not meet expectations, provide screenshots to AI to help quickly locate the problem.

24. Have Claude Provide Rich Responses to Aid Understanding
Guide Claude to explain vague concepts in a richer manner, using symbols and text arrangements to provide a more intuitive understanding of differences and enhance comprehension of complex concepts.

25. Use Project Rules
Abandon .cursorrules and switch to Project Rules. It supports setting different rules by file type, controlling AI tone and structure, and can sync through GitHub teams, making Cursor better understand your tech stack.

Share a versatile rule saved as a .mdc file for use in your project:

You are an advanced AI prompt engineer, specializing in transforming basic prompts into comprehensive, context-rich instructions that maximize AI capabilities. Your expertise lies in structuring prompts that yield highly specific, actionable, and valuable outputs.

Core Process:

Deep Prompt Analysis

Thoroughly analyze the user’s original prompt to extract explicit and implicit intentions.

Strategic Prompt Enhancement

Transform the original prompt by incorporating clear role definitions, contextual background, and precise instructions.

Domain-Specific Optimization

Incorporate domain-specific terminology and reference relevant methodologies.

Structural Engineering

Organize the enhanced prompt using a clear hierarchical structure.

Quality Assurance

Review the enhanced prompt against criteria for completeness, specificity, actionability, flexibility, and error prevention.

Cursor's New MCP Feature Unlocks AI Programming Capabilities

Wed, 05 Mar 2025 00:00:00 +0000

Introduction

Today, we are excited to introduce the Cursor code editor version 0.45.6, featuring the groundbreaking MCP (Model Context Protocol) integration!

What is MCP?

MCP (Model Context Protocol) is an open protocol that standardizes how applications provide context for large language models (LLMs). You can think of MCP as a USB-C port for AI applications. Just as USB-C provides a standardized way to connect your devices to various peripherals, MCP offers a standardized method for connecting AI models to different data sources and tools.

Why Choose MCP?

MCP helps you build agents and complex workflows on top of LLMs. LLMs often need to integrate with data and tools, and MCP provides:

A growing list of pre-built integrations: Your LLM can directly plug into these integrations for quick connections to various data and tools.
Flexibility to switch between LLM providers: Since MCP is an open standard, you can easily change LLM providers as needed without massive modifications to your entire system.
Best practices for data protection: MCP offers a set of best practices to help you securely protect data within your infrastructure, ensuring sensitive information is handled properly.

By using MCP, you can develop and deploy AI applications more efficiently while ensuring scalability, security, and flexibility in your systems.

MCP Architecture Diagram

MCP Hosts: Programs like Claude Desktop, Cursor, or AI tools that wish to access data through MCP.
MCP Clients: Protocol clients maintaining a 1:1 connection with the server.
MCP Servers: Lightweight local server programs that expose specific functionalities through the standardized Model Context Protocol, supporting multiple instances.
Local Data Sources: Computer files, databases, and services that can be securely accessed via MCP servers.
Remote Services: External systems available on the Internet that can be connected through MCP servers (e.g., via API).

My Understanding

MCP is a protocol that defines a specification/rule aimed at connecting LLMs with external data sources. The goal is to extend the capabilities of LLMs, with the current consensus being to build AI agents or automate workflows.

In simple terms, MCP (Model Context Protocol) acts as a bridge that cleverly connects the Cursor code editor with powerful AI models. Through MCP, Cursor transforms from just a code editing tool into an intelligent AI assistant, capable of calling various AI services to aid development, such as:

Intelligent information search (Brave Search)
Web content scraping (Fetch)
Calling image generation models (Replicate, Flux)
Even deeper thinking and reasoning (sequentialthinking)

For example, using Cursor with my locally set up three MCP servers:

Markdown2pdf: Organizes discussions with Cursor into markdown and then converts them into local PDF format.
Sequentialthinking: Allows the large model’s responses to follow a deep-seek style, integrating a thinking chain model.
Mcp-package-version: Helps accurately analyze the latest versions of current project dependencies.

Users can ask questions to the LLM, which processes and discovers that there are locally executable MCP server tools available. It then prompts you to execute the local MCP tool. Upon acceptance, the data input from the MCP tool is returned to the LLM.

In summary: MCP allows large models to call your self-developed toolkits (like markdown to PDF, web scraping, etc.) during your use of Cursor.

Installation Steps

Open Cursor settings.
Navigate to Cursor settings > Features > MCP.

Click the “+ Add New MCP Server” button.

Configure the server:
- The second option can execute an MCP server that has been published as an npm package.
- For SSE servers: URL of the SSE endpoint.
- For stdio servers: Executable shell command, supporting two modes of operation.
- Name: Give your server a nickname, which can be anything.
- Type: Choose the transport type (stdio or sse).
- Command/URL: Enter the following content:

Run the locally packaged MCP server with Node.

For example, Cursor will automatically scan which tools the server provides:

Write a string-reversing MCP server.

Installation

npm install @modelcontextprotocol/sdk

Define MCP Server

McpServer is your core interface with the MCP protocol. It handles connection management, protocol compliance, and message routing.

import { McpServer, ResourceTemplate } from "@modelcontextprotocol/sdk/server/mcp.js";
const server = new McpServer({name: "My App", version: "1.0.0"});

Define the Capabilities Provided by the MCP Server

MCP supports Resources, tools, and Prompts functionality, but Cursor currently only supports tools, so this article will demonstrate the writing of tools.

For detailed capabilities, refer to https://modelcontextprotocol.io/clients.

// Initialize the server, setting the name and version
this.server = new Server({
  name: "My App",
  version: "1.0.0",
}, {
  capabilities: {
    tools: {
      reverse_string: true,
    },
  },
});

Tool Definition

Tool definition consists of two steps:

Define a list of available tools, specifying the tool name, description, and input parameter format.

Here, I defined a string-reversing tool:

Tool Name: reverse_string
Tool Description: Reverses the input string.
Input Parameter Format: Requires an object containing a text field.

Implement the actual functionality of the tool, intercepting the large model’s input and routing it to your actual input.

Connecting the Client (Cursor)

The MCP server needs to connect to the transport layer to communicate with the client. The method of starting the server depends on the chosen transport method:

Cursor currently supports both SSE and command methods.

const transport = new StdioServerTransport();
server.connect(transport);

Packaging and Deployment

Recommended packaging tools: tsup or tsc for packaging.

Since we are ultimately building a Node.js CLI, ensure that the compiled program can be used as a command-line tool by adding the following at the beginning:

#!/usr/bin/env node

Additionally, to make the compiled JavaScript files executable, install the shx npm package and run the following command:

shx chmod +x build/*.js

Adding Cursor

Add to the MCP server, and then you can happily chat with the composer and LLM using your custom-built large model tools.

Limitations

Only supports composer mode.
Currently, only the Claude model is supported; other large models are not yet supported.

Learning Resources

Official website: https://modelcontextprotocol.io/introduction

GitHub repository: https://github.com/modelcontextprotocol/typescript-sdk

MCP resource library: https://mcphub.io/registry

Comprehensive Guide to Understanding Large Language Models

Tue, 22 Oct 2024 00:00:00 +0000

Last week, while sharing the article “My Journey to Becoming an AI Product Manager,” I hinted that I would produce a comprehensive piece to help everyone systematically learn about large models. Today, I am delivering that article; it totals 22,000 words and is expected to take about 30 minutes to read, covering 15 topics related to large models.

In the past year, there has been an overwhelming amount of articles introducing and explaining large models. Most people already have some foundational knowledge, but I feel that this information is too fragmented and lacks a systematic understanding. Currently, there is no article that comprehensively explains what large models are in one go.

To alleviate my own cognitive anxiety, I decided to summarize the knowledge I have gained about large models over the past year into this article. I hope to clarify my understanding of large models through this single article, which serves as a testament to my extensive learning.

This article will share 15 topics related to large models. Originally, there were 20 topics, but I removed some that were more technical and focused on issues that ordinary users or product managers should pay attention to. The goal is to ensure that as AI novices, we only need to master and understand these key points.

Who Is This For?

This article is suitable for the following groups of friends:

Those who want to understand what large models are all about.
Individuals looking to transition into AI-related products and roles, including product managers and operations personnel.
Those who have a basic understanding of AI but wish to advance their knowledge and reduce cognitive anxiety about AI.

Content Disclaimer: The entire content is a result of my personal synthesis after extensive reading and digestion of numerous expert articles, books related to large models, and consultations with industry experts. I primarily serve as a knowledge synthesizer; if any descriptions are incorrect, please feel free to inform me kindly!

Lecture 1: Understanding Common Concepts of Large Models

Before diving into large models, let’s first understand some foundational concepts. Grasping these professional terms and their relationships will benefit your subsequent reading and learning of any AI and large model-related content. I spent considerable time organizing their relationships, so please read this section carefully.

1. Common AI Terms

1) Large Model (LLM): All existing large models refer to large language models, specifically generative large models, with practical examples including GPT-4.0 and GPT-4o.

Deep Learning: A subfield of machine learning focused on using multi-layer neural networks for learning. Deep learning excels at processing complex data such as images, audio, and text, making it highly effective in AI applications.
Supervised Learning: A machine learning method where the model learns the mapping from input to output using a labeled dataset. Common algorithms include linear regression, logistic regression, support vector machines, K-nearest neighbors, decision trees, and random forests.
Unsupervised Learning: A machine learning method that discovers patterns and structures in data without labeled data. Common algorithms include K-means clustering, hierarchical clustering, DBSCAN, principal component analysis (PCA), and t-SNE.
Semi-supervised Learning: Combines a small amount of labeled data with a large amount of unlabeled data for training. It leverages the rich information from unlabeled data and the accuracy of labeled data to improve model performance. Common methods include Generative Adversarial Networks (GANs) and autoencoders.
Reinforcement Learning: A method that learns optimal strategies through interaction with the environment, based on reward and punishment mechanisms. Common algorithms include Q-learning, policy gradients, and Deep Q-Networks (DQN).
Model Architecture: Represents the design of the backbone of the large model. Different architectures affect the model’s performance, efficiency, and computational costs, and determine the model’s scalability.
Transformer Architecture: The mainstream architecture used by most large models, including GPT-4.0 and many domestic large models. The widespread use of the Transformer architecture is mainly because it enables large models to understand human natural language, maintain contextual memory, and generate text.
MOE Architecture: Stands for Mixture of Experts architecture, which combines various expert models to form a massive model capable of addressing multiple complex professional problems.
Machine Learning Techniques: A broad category of techniques that enable AI, including deep learning, supervised learning, and reinforcement learning. As a product manager, you don’t need to delve too deeply into these; just understand the relationships between these methods.
NLP Technology (Natural Language Processing): A field of AI focused on enabling computers to understand, interpret, and generate human language for applications like text analysis, machine translation, speech recognition, and dialogue systems.
CV Technology (Computer Vision): If NLP deals with text, CV addresses visual content-related technologies, including common image recognition, video analysis, and image segmentation techniques.
Speech Recognition and Synthesis Technology: Includes converting speech to text and synthesizing speech, such as Text-to-Speech (TTS) technology.
Retrieval-Augmented Generation (RAG): Refers to the technology where large models generate content based on information retrieved from search engines and knowledge bases, commonly involved in AI applications.
Knowledge Graph: A technology that connects knowledge, allowing models to better and faster access the most relevant information, thereby enhancing their ability to process complex associative information and AI reasoning.
Function Call: In large language models (like GPT), it refers to calling built-in or external functions to perform specific tasks or operations. This mechanism allows models to execute diverse and specific operations beyond mere text generation.

2) Terms Related to Large Model Training and Optimization Techniques

Pre-training: The process of training a model on a large dataset, typically diverse, to obtain a model with strong general capabilities.
Fine-tuning: Further training a large model on specific tasks or smaller datasets to improve its performance on targeted issues, using vertical domain data.
Prompt Engineering: In product manager terms, it refers to crafting questions in a way that the large model can easily understand, enhancing the input for desired results.
Model Distillation: A technique that transfers knowledge from a large model (teacher model) to a smaller model (student model) to improve performance while retaining much of the large model’s accuracy.
Model Pruning: The process of removing unnecessary parameters from a large model to reduce its overall size and computational costs.

3) AI Application-Related Terms

Agent: An AI application with a specific capability, akin to how applications in the internet era were called apps.
Chatbot: Refers to AI chatbots, a type of AI application that interacts through conversation, including products like ChatGPT.

4) Terms Related to Large Model Performance

Emergence: Refers to the phenomenon where a large model exhibits capabilities beyond expectations once its parameter scale reaches a certain threshold.
Hallucination: Indicates instances where a large model generates nonsensical content, mistakenly treating incorrect facts as true, leading to unrealistic outputs.
Amnesia: Refers to the situation where, after a certain number of dialogue turns or length, the model suddenly forgets previous context, leading to repetition and memory loss.

2. Understanding the Relationship Between AI, Machine Learning, Deep Learning, and NLP

If you are interested in AI and large models, you will inevitably encounter keywords like “AI,” “Machine Learning,” “Deep Learning,” “NLP” in your future studies. Therefore, it’s best to clarify these professional terms and their logical relationships to facilitate easier understanding.

In summary, the relationships between these concepts are as follows:

Machine learning is a core technology of AI, alongside expert systems and Bayesian networks (no need to delve into these).
NLP is a type of application task within AI focused on natural language processing, while AI’s application technologies also include CV technology, speech recognition, and synthesis.

3. Understanding the Transformer Architecture

When discussing large models, one cannot overlook the Transformer architecture. If large models are like a tree, the Transformer architecture serves as the trunk. The emergence of products like ChatGPT is primarily due to the design of the Transformer architecture, which enables models to understand context, maintain memory, and predict new words. Moreover, the Transformer allows large models to train on unlabeled data, eliminating the need for extensive labeled data preparation.

Relationship Between Transformer Architecture and Deep Learning Technology: The Transformer architecture is a type of neural network architecture within the deep learning field. Other architectures include traditional Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks.

4. Understanding the Relationship Between Transformer Architecture and GPT

GPT stands for Generative Pre-trained Transformer, meaning GPT is a large language model developed based on the Transformer architecture by OpenAI. The core idea of GPT is to enhance the ability to generate and understand natural language through large-scale pre-training and fine-tuning. The introduction of the Transformer architecture has significantly improved the model’s ability to understand context, process large datasets, and predict text.

Key Differences:

Capability Differences: The Transformer architecture enables models to understand context and process large data but does not inherently possess the ability to understand or generate natural language. In contrast, GPT enhances this capability through pre-training on natural language data.
Architectural Basis:
- Transformer: The original Transformer model consists of an encoder and a decoder, where the encoder processes the input sequence and generates intermediate representations, while the decoder generates output sequences based on these representations. This architecture is particularly suited for sequence-to-sequence tasks like machine translation.
- GPT: GPT primarily uses the decoder part of the Transformer, focusing on generation tasks. It employs unidirectional processing, where each word can only see the preceding words, aligning with the natural format of language models.
Implementation of Specific Problem-Solving:
- The Transformer is trained for specific tasks, optimizing its performance through simultaneous training of the encoder and decoder.
- GPT, on the other hand, achieves task-specific performance through supervised fine-tuning, requiring only task-specific data without extensive training for each task.
Application Domains:
- The traditional Transformer framework can be applied to various sequence-to-sequence tasks, while GPT is primarily used for generation tasks, excelling in generating coherent and creative text.

5. Understanding the MOE Architecture

In addition to the Transformer architecture, another popular architecture is the MOE (Mixture of Experts) architecture, which dynamically selects and combines multiple sub-models (experts) to complete tasks. The key idea of MOE is to solve a range of complex tasks by combining multiple expert models rather than relying on a single large model.

The main advantage of the MOE architecture is its ability to maintain computational efficiency while handling large-scale data and model parameters, significantly reducing computational costs without sacrificing model capability.

Transformer and MOE can be used together, often referred to as MOE-Transformer or Sparse Mixture of Experts Transformer. In this architecture:

The Transformer processes input data, leveraging its powerful self-attention mechanism to capture dependencies in sequences.
MOE dynamically selects and combines different experts to enhance computational efficiency and capability.

Lecture 2: Differences Between Large Models and Traditional Models

When we talk about large models, we usually refer to LLMs (Large Language Models), specifically those based on the generative pre-trained Transformer architecture like GPT. These models primarily address natural language tasks, unlike traditional models that may focus on images, videos, or speech. Moreover, LLMs are generative models, meaning their main capability is generation rather than prediction or decision-making.

In contrast to traditional models, large models exhibit the following characteristics:

Ability to Understand and Generate Natural Language: Many traditional models may not understand human natural language, let alone generate it.
Powerful and Versatile: Traditional models often solve one or a few specific problems, while large models can tackle a wide range of issues.
Contextual Memory: Large models possess memory capabilities, allowing them to relate to previous dialogue, unlike many traditional models.
Training Method: Large models are pre-trained on vast amounts of unlabeled text, significantly reducing the need for labeled data compared to traditional models.
Massive Parameter Scale: Most large models have parameter scales in the hundreds of billions, such as GPT-3.5 with 175 billion parameters, while GPT-4.0 is rumored to reach trillions of parameters.
High Computational Resource Requirements: Due to their scale and complexity, these models require significant computational resources for training and inference.

Lecture 3: Evolution of Large Models

1. Evolution of Generative Capabilities in LLMs

Understanding the evolution of LLMs helps clarify how large models have developed their current capabilities and better understand the relationship between LLMs and Transformers:

N-gram: The earliest stage of generative capability, primarily solving the prediction of the next word, but limited in understanding context and grammatical structure.
RNN and LSTM: These models addressed the issue of context length, enabling longer contextual windows but struggled with large data processing.
Transformer: Combines the predictive capabilities of previous models while supporting training on large datasets but lacks natural language understanding and generation.
LLM: Adopts the GPT pre-training and supervised fine-tuning approach, enabling the model to understand and generate natural language.

2. Development from GPT-1 to GPT-4

GPT-1: Introduced unsupervised training steps, solving the issue of requiring extensive labeled data. However, its small parameter scale (117 million) limited its ability to handle complex tasks without fine-tuning.

GPT-2: Increased parameter scale to 1.5 billion and expanded training text size to 40GB, enhancing model capabilities but still facing limitations with complex problems.

GPT-3: Expanded parameter scale to 175 billion, achieving strong performance in text generation and language understanding while eliminating the need for fine-tuning.

InstructGPT: To address GPT-3’s limitations, it added supervised fine-tuning and reinforcement learning from human feedback (RLHF) to optimize performance.

GPT-3.5: Released in March 2022, with training data up to June 2021, featuring a larger dataset of 45TB.

GPT-4: Released in April 2023, significantly enhancing reasoning capabilities and supporting multimodal abilities.

GPT-4o: Expected to enhance voice chat capabilities by May 2024.

O1: OpenAI’s O1 model, released in September 2024, focuses on enhancing reasoning capabilities.

Lecture 4: Principles of Text Generation in Large Models

1. How Does GPT Generate Text?

The process of generating text in large models can be summarized in five steps:

Upon receiving a prompt, the model first tokenizes the input content into multiple tokens.
It uses the Transformer architecture to understand the relationships between tokens, grasping the overall meaning of the prompt.
Based on context, it predicts the next token, potentially generating multiple results, each with a corresponding probability.
The token with the highest probability is selected as the predicted next word.
This process repeats until the entire content is generated.

Lecture 5: Classification of LLMs

1. Classification by Modality

Currently, large models can be categorized into:

Text generation models (e.g., GPT-3.5)
Image generation models (e.g., DALL-E)
Video generation models (e.g., Sora)
Speech generation models
Multimodal models (e.g., GPT-4.0)

2. Classification by Training Stage

Basic Language Model: A model trained only on large-scale text corpora without instruction or downstream task fine-tuning.
Instruction-Finetuned Language Model: A model that has undergone instruction fine-tuning and human feedback optimizations.

3. Classification by General and Industry Models

Large models can also be divided into general models and industry-specific models. General models perform well across various tasks but may struggle with specific industry-related data and terminology. Industry models, on the other hand, are fine-tuned for specific domains, achieving higher performance and accuracy.

Lecture 6: Core Technologies of LLMs

While this section may contain many technical terms that are challenging to understand, as a product manager, it is essential to grasp key concepts to facilitate communication with developers and technical teams.

1. Model Architecture: The Transformer architecture is one of the foundational core technologies of large models.

2. Pre-training and Fine-tuning

Pre-training: A key technology involving training on large-scale unlabeled data, significantly reducing the need for labeled data.
Fine-tuning: A technique to improve model performance on specific tasks through additional training on targeted datasets.

3. Model Compression and Acceleration

Model Pruning: Reducing model size and computational complexity by removing unimportant parameters.
Knowledge Distillation: Training a smaller student model to mimic the behavior of a larger teacher model, retaining performance while reducing computational costs.

Lecture 7: Six Steps in Large Model Development

According to OpenAI’s information, the development of large models typically involves the following six steps:

Data Collection and Processing: Collecting large amounts of text data from various sources and cleaning it to remove irrelevant or low-quality content.
Model Design: Determining the model architecture, such as the Transformer architecture used by GPT-4, and defining its size, including layers, hidden units, and total parameters.
Pre-training: The model learns language and knowledge by reading extensive text data, akin to a student absorbing information.
Instruction Fine-tuning: The process of retraining the model with question-answer pairs to improve its responses.
Reward Mechanism: Setting up an incentive system to guide the model towards providing valuable and accurate responses.
Reinforcement Learning: The model learns through trial and error in real-world scenarios to improve its performance.

Lecture 8: Understanding Large Model Training and Fine-tuning

1. Understanding Large Model Training

1) What Data Is Needed for Training Large Models?

Text data: Used for training language models, such as news articles, books, social media posts, and Wikipedia.
Structured data: Such as knowledge graphs, to enhance the model’s knowledge.
Semi-structured data: Such as XML and JSON formats for information extraction.

2) Sources of Training Data

Public datasets: Such as Common Crawl, Wikipedia, and OpenWebText.
Proprietary data: Internal company data or paid proprietary data.
User-generated content: Content from social media, forums, and comments.
Synthetic data: Data generated through GANs or other generative models.

3) Costs Associated with Training Large Models

Computational resources: GPU/TPU usage costs, depending on model size and training duration.
Storage costs: For large datasets and model weights, which can reach TB levels.
Data acquisition costs: Costs for purchasing proprietary data or cleaning and labeling data.
Energy costs: Training large models consumes significant electricity, increasing operational costs.
R&D costs: Salaries for researchers and engineers, as well as development and maintenance expenses.

2. Understanding Large Model Fine-tuning

Two stages of fine-tuning: Supervised Fine-tuning (SFT) and Reinforcement Learning (RLHF), with differences as follows:

2) Two Methods of Fine-tuning: Lora fine-tuning and SFT fine-tuning.

Lora fine-tuning adjusts only part of the model’s parameters, suitable for resource-limited scenarios.
SFT fine-tuning adjusts all parameters, enabling the model to address a wider range of specific tasks.

Lecture 9: Key Factors Affecting Large Model Performance

While there are many large models on the market, differences in their capabilities exist. The five most important factors affecting the performance of large models are:

Model Architecture: The design, including layers, hidden units, and total parameters, significantly impacts the model’s ability to handle complex tasks.
Quality and Quantity of Training Data: Model performance heavily relies on the coverage and diversity of its training data.
Parameter Scale: More parameters typically allow better learning and capturing of complex data patterns but increase computational costs.
Algorithm Efficiency: The choice of algorithms used for training and optimizing the model affects learning efficiency and final performance.
Training Frequency: Ensuring sufficient training iterations to achieve optimal performance while avoiding overfitting.

Lecture 10: How to Measure the Quality of Large Models?

From the application perspective, measuring the quality of a large model involves evaluating its performance across several dimensions:

1. Measuring Product Performance

1) Semantic Understanding Ability: Includes understanding semantics, grammar, and context, which determine the quality of interaction with the model. 2) Logical Reasoning: The model’s reasoning ability, numerical computation skills, and contextual understanding are core capabilities. 3) Accuracy of Generated Content: Includes the rate of hallucinations and ability to identify traps. 4) Hallucination Rate: The accuracy of the model’s responses and results, as models sometimes generate nonsensical content. 5) Trap Information Identification Rate: The model’s ability to recognize and handle misleading information. 6) Quality of Generated Content: Evaluated based on diversity, professionalism, creativity, and timeliness. 7) Contextual Memory Ability: Represents the model’s memory capability and context window length. 8) Model Performance: Includes response speed, resource consumption, robustness, and stability. 9) Human-like Quality: Evaluates how “human-like” the model is, including emotional analysis capabilities. 10) Multimodal Ability: Assesses the model’s capability to process and generate across different modalities, including text, images, video, and speech.

2. Measuring Basic Model Capabilities

The three key elements for measuring basic model capabilities are: algorithms, computational power, and data quality.

3. Assessing Model Safety

In addition to evaluating capabilities, safety considerations are crucial. We assess safety based on:

Content Safety: Compliance with safety management, social, and legal norms.
Ethical Standards: Ensuring generated content is free from bias and discrimination.
Privacy and Copyright Protection: Adhering to privacy and copyright laws.

Lecture 11: Limitations of Large Models

1. The Hallucination Problem

The hallucination problem refers to models generating plausible but incorrect or fabricated information. This issue is a significant concern for users and a primary reason for skepticism about model outputs.

Causes of Hallucinations:

Overfitting Training Data: The model may overfit noise or errors in the training data, leading to the generation of fictitious content.
Presence of False Information in Training Data: Insufficient coverage of real scenarios in training data can result in the model generating unverified information.
Inadequate Consideration of Information Credibility: The model may generate content confidently without effectively assessing its credibility.

Potential Solutions:

Using Richer Training Data: Incorporating diverse and authentic training data to reduce overfitting risks.
Credibility Modeling: Introducing components to estimate the credibility of generated information.
External Verification Mechanisms: Employing external sources to validate generated content against real-world facts.

2. The Amnesia Problem

The amnesia problem occurs when models forget previously mentioned information during long dialogues or complex contexts, leading to inconsistencies. Causes include:

Limitations of Contextual Memory: The model may struggle to retain and utilize long-term dependencies.
Lack of Examples in Training Data: Insufficient examples of long dialogues or complex contexts in training data can hinder effective memory retention.

Exploring the Evolution and Challenges of Large AI Models

Sun, 13 Oct 2024 00:00:00 +0000

Introduction

In the grand tapestry of artificial intelligence (AI), large models shine like brilliant stars, illuminating the future of technology. They not only reshape our understanding of technology but also quietly trigger transformations across numerous industries. However, these intelligent technologies are not without risks and challenges. Here, we unveil the mysteries of large models, sharing their technologies and characteristics, analyzing their development and challenges, and providing a glimpse into the AI era.

Large models, such as the Generative Pre-trained Transformer (GPT) series, have achieved remarkable success in the field of Natural Language Processing (NLP), setting new performance benchmarks across various language tasks. Beyond language, large models also demonstrate significant advantages in image processing, audio processing, and physiological signals. They are rapidly applied in fields such as education, healthcare, and finance, particularly excelling in content generation. Today, many cutting-edge technologies related to large models are still in urgent need of development, while issues such as bias and privacy breaches also require resolution. This article analyzes the history and evolution of large models, explores frontier issues, and discusses future development directions, helping the public quickly understand large model technology and integrate into the advancing AI era.

Origins of Large Models

In November 2022, the renowned AI research company OpenAI released ChatGPT, an AI chatbot program based on the large language model GPT-3.5. Its fluent language expression capability, powerful problem-solving abilities, and vast database garnered widespread attention worldwide. Within less than two months of its launch, ChatGPT surpassed 100 million monthly active users, becoming the fastest-growing consumer application in history. Consequently, various industries began to feel the powerful influence of large models, sparking a research boom in large models both domestically and internationally.

The origins of large models can be traced back to the early AI research in the 20th century, which primarily focused on logical reasoning and expert systems. However, these methods were limited by hard-coded knowledge and rules, making it difficult to handle the complexity and diversity of natural language. With the advent of machine learning and deep learning technologies, along with rapid advancements in hardware capabilities, training large-scale datasets and complex neural network models became possible, ushering in the era of large models.

In 2017, Google’s introduction of the Transformer model architecture, which incorporated self-attention mechanisms, significantly enhanced sequence modeling capabilities, particularly in terms of efficiency and accuracy when handling long-distance dependencies. Subsequently, the concept of pre-trained language models (PLMs) gradually became mainstream. PLMs are pre-trained on large-scale text datasets to capture general patterns of language, and then fine-tuned for specific downstream tasks.

Evolution of Large Models

OpenAI’s GPT series models exemplify generative pre-trained models, representing the vanguard of this technology. From GPT-1 to GPT-3.5, each generation has seen significant improvements in scale, complexity, and performance. At the end of 2022, ChatGPT emerged as a chatbot capable of answering questions, generating articles, programming, and even mimicking human conversational styles. Its almost omnipotent answering ability provided a new understanding of the general capabilities of large language models, greatly advancing the field of NLP.

However, the development of large models is not limited to text. With technological advancements, multi-modal large models have begun to emerge, capable of simultaneously understanding and generating various types of data, including text, images, and audio. In March 2023, OpenAI announced the multi-modal model GPT-4, which added image functionality and exhibited more precise language comprehension, marking a significant transition from single-modal to multi-modal models. This essential difference in cross-modal data presents new and more complex requirements for the design and training of large models, while also introducing unprecedented challenges.

Characteristics of Large Models

Large models typically refer to machine learning models with vast parameter counts, especially in applications within NLP, computer vision (CV), and multi-modal domains. These models understand and learn human language through a pre-training approach, completing tasks such as information retrieval, machine translation, text summarization, and code generation through human-computer dialogue.

Parameter Count of Large Models

The parameter count of large models usually exceeds 1 billion, meaning that the model contains over 1 billion learnable weights. These parameters form the basis for the model’s learning and understanding of data, adjusting through training to better map input data to output results. The increase in parameter count is directly related to the model’s learning ability and complexity, enabling it to capture more subtle and deeper data features.

Types of Large Models

Large models can be classified based on their application domains and functions:

Large Language Models: Focused on processing and understanding natural language text, commonly used for text generation, sentiment analysis, and question-answering systems.
Visual Large Models: Specifically designed to process and understand visual information (such as images and videos), used for image recognition, video analysis, and image generation tasks.
Multi-modal Large Models: Capable of processing and understanding two or more different types of input data (e.g., text, images, audio), executing more complex and comprehensive tasks by integrating information from different modalities.
Base Large Models: Generally refer to models that can be widely applied to various tasks, learning a large amount of general knowledge during the pre-training phase without a specific application direction.

Capabilities of Large Models

The capabilities of large models lie in their ability to understand and process highly complex data patterns:

Generalization Ability: Through pre-training on vast amounts of data, large models learn universal language rules, exhibiting strong generalization ability when facing new tasks.
Deep Learning: The large parameter scale and deep network structure enable large models to establish complex abstract representations, understanding the deeper semantics and relationships behind the data.
Context Understanding: In language models, large models can capture long-distance dependencies, enhancing their understanding of context, which is crucial for grasping subtle nuances in language.
Knowledge Integration: Large models can integrate and utilize the knowledge learned during pre-training, sometimes exhibiting a degree of common-sense reasoning and problem-solving abilities.
Adaptability: Although large models learn general knowledge during pre-training, they can adapt to specific tasks through fine-tuning, demonstrating high flexibility and adaptability.

Technologies Behind Large Models

Current large models are integrated machine learning models capable of processing various types of data. The foundational technologies within these large models aim to understand and generate information across different sensory modalities, enabling tasks such as image description, visual question answering, or cross-modal translation. Here are several key foundational technologies of large models.

Transformer Architecture

Most existing large models are built upon the Transformer model (or simply the decoder of the Transformer). This architecture captures global dependencies of input data through self-attention mechanisms and can also capture complex relationships between different modal elements. For instance, a multi-modal Transformer can simultaneously process the pixels of an image and the words of text, learning the associations between them through self-attention layers. This enables large models to understand various modalities such as text and images while generating long text sequences, maintaining contextual coherence.

Supervised Fine-tuning

Supervised fine-tuning (SFT) is a traditional fine-tuning method that continues training the pre-trained large model using labeled datasets. Notably, during the training of large models, the SFT phase typically uses high-quality datasets. Additionally, SFT involves adjusting the model’s parameters to enhance its performance on specific tasks. For example, to improve the model’s performance in legal consulting, a dataset containing legal questions and professional lawyer responses can be used for SFT. In SFT, the model typically attempts to minimize the difference between predicted outputs and true labels, often achieved through a loss function (such as cross-entropy loss). This method is straightforward and quick to adapt to new tasks; however, it has limitations due to its reliance on high-quality labeled data and may lead to overfitting on the training data.

Reinforcement Learning from Human Feedback

Reinforcement learning from human feedback (RLHF) is a more complex training method that combines elements of supervised learning and reinforcement learning. First, the model is pre-trained on a large amount of unlabeled text, similar to the steps prior to SFT. Then, human evaluators interact with the model or assess its outputs, providing feedback on its performance, which is used to train a reward model capable of predicting scores that human evaluators might assign. Finally, the original model’s parameters are optimized using the reward model as a reward signal through reinforcement learning methods. In this process, the model attempts to maximize the expected rewards it receives. The advantage of RLHF lies in its ability to help the model learn more complex behaviors, especially when tasks are difficult to define through simple correct or incorrect labels. Additionally, RLHF can help the model better align with human preferences and values.

Applications of Large Models

Large models, with their vast parameter counts, deep network structures, and extensive pre-training capabilities, can capture complex data patterns and exhibit exceptional performance across multiple domains. They not only understand and generate natural language but also process complex visual and multi-modal information, adapting to various dynamic application scenarios.

Applications in NLP

The application of large models in the NLP domain is particularly widespread. For example, OpenAI’s GPT series models can generate coherent, natural text, applied in chatbots, automated writing, and language translation, with notable products like the well-known ChatGPT. In the fintech sector, large models are often used for risk assessment, trading algorithms, and credit scoring. These models analyze vast amounts of financial data, predict market trends, and assist financial institutions in making better investment decisions. In the legal and compliance fields, they can be used for document review, contract analysis, and case studies. Through NLP technology, models can understand and analyze legal documents, enhancing the efficiency of legal professionals. Recommendation systems are another application area for large models. By serializing user behavior data into text, large models can predict user interests and recommend relevant products, movies, music, and more. In gaming, large models can utilize their coding capabilities to generate complex game environments, driving non-player characters (NPCs) to produce different dialogues based on player settings, thereby providing a more realistic gaming experience.

Applications in Image Understanding and Generation

Current large models possess not only text understanding capabilities but also multi-modal understanding abilities, laying the foundation for their applications in the image domain, such as automatic painting and video generation. These models can mimic an artist’s style to create new artworks, providing assistance to human creativity. For instance, OpenAI’s Sora, released in February 2024, can generate a video that meets user input requirements, offering a more convenient tool for the film production field. In the image processing domain, models like SegGPT are used for image recognition, classification, and generation. By learning from extensive image-text pairs, these models can identify objects, faces, and scenes in images and play roles in medical image analysis, autonomous vehicles, and video surveillance. Additionally, in medicine and biology, multi-modal large models can be used for disease diagnosis, drug discovery, and gene editing, extracting useful information from complex biomedical data to assist doctors in making more accurate diagnoses or helping researchers design new drugs.

Applications in Speech Recognition

Large models also play a vital role in the field of speech recognition. Through deep learning technologies, models can convert speech into text, supporting applications such as voice assistants, real-time speech transcription, and automatic subtitle generation, with mobile voice assistants being a typical example. These models learn from a vast number of speech samples, enabling them to handle different accents, intonations, and noise interference.

Moreover, large models can be applied across various industries, including education, healthcare, agriculture, and finance. For instance, in education, large models can be used for personalized learning, automated grading, and intelligent tutoring, providing customized teaching content based on students’ learning situations, thus helping them learn more efficiently. In summary, large models demonstrate immense potential across various fields through their powerful data processing and learning capabilities. With continuous technological advancements, it is foreseeable that large models will play an increasingly important role in future developments.

Development of Large Models

In the current AI landscape, large models have become an undeniable trend. With the continuous advancement of deep learning technologies, particularly in NLP and CV fields, large models are driving breakthroughs in cutting-edge technologies with their powerful data processing and pattern recognition capabilities.

The development of large models at the technical level benefits from several key factors. First is algorithmic innovation, especially following the introduction of the Transformer architecture, which has rapidly propelled the development of subsequent models, including BERT, the GPT series, and T5. These models achieve leading performance on various NLP tasks through pre-training and fine-tuning strategies. Second is the enhancement of computational power, particularly advancements in graphics processing units (GPUs) and tensor processing units (TPUs), making it possible to train models with tens of billions or even hundreds of billions of parameters. Additionally, the rise of cloud computing platforms provides the necessary computational resources for training large models. Meanwhile, large-scale datasets offer ample “nutrition” for model training, often containing rich language expressions, scene information, and user interactions, enabling models to capture complex data distributions and linguistic patterns.

The development of large models at the application level has two main directions: large language models and multi-modal large models. In the case of large language models, GPT-3 serves as a milestone, with a parameter count reaching 175 billion, showcasing astonishing language understanding and generation capabilities. Following closely, Meta AI’s LLaMA series models have become favorites in academic research and industry due to their excellent performance and relatively smaller model sizes. These models not only excel in standard NLP tasks but also exhibit tremendous potential in few-shot learning and transfer learning.

Multi-modal large models extend from this foundation, capable of processing and understanding various types of inputs, such as text, images, and audio. OpenAI’s DALL-E and CLIP are representative works in this direction, able to understand and generate images that correspond to textual descriptions or understand text content through images. Google’s SimCLR represents a significant exploration in the CV field, effectively extracting image features through contrastive learning. Subsequently, Google’s Gemini made important strides in the native multi-modal domain, pre-training across different modalities and handling more complex inputs and outputs, such as images and audio. OpenAI’s Sora further broadens the application scope of large models, capable of automatically generating video content based on input text, simulating interactions between characters and environments in both the physical and digital worlds.

The development history of large models is summarized, with highlighted entries representing multi-modal models.

Domestic tech companies are also actively exploring large models. Models such as Baidu’s “Wenxin Yiyan,” Alibaba’s “Tongyi Qianwen,” Huawei’s “Pangu,” and iFlytek’s “iFlytek Spark” have emerged, demonstrating excellent performance not only in general language understanding and generation tasks but also in specialized applications in fields such as healthcare, law, and tourism. For example, Ctrip’s “Ctrip Wenda” focuses on Q&A in the tourism sector, NetEase Youdao’s “Ziyue” is applied in education, and JD Health’s “Jingyi Qianxun” aims to provide medical consultation services.

Challenges of Large Models

In the field of AI, large models are becoming a hot topic in both academic research and industry due to their powerful processing capabilities and broad application prospects. However, as these models continue to expand, the challenges faced at the research frontier are becoming increasingly complex.

Model Size

The trade-off between model size and data scale has become a significant challenge. While model performance often improves with an increase in parameter count, this growth in scale brings substantial computational costs and high demands for data quality. Researchers are seeking optimal balances between model size and data scale under limited computational resources, while also exploring techniques such as data augmentation, transfer learning, and model compression to reduce model size without sacrificing performance, striving to minimize the operational costs of large models.

Network Architecture

Innovation in network architecture is also crucial. Most existing large models are based on the Transformer architecture. Although the Transformer architecture performs excellently in handling sequential data, its low computational efficiency and parameter utilization issues can lead to waste of computational resources. The limitations of the current Transformer have prompted researchers to design new network architectures aimed at improving efficiency and generalization capabilities through enhancements to attention mechanisms, the introduction of sparsity, and adaptive computation. For instance, the state-space-based model Mamba proposed in December 2023 introduces selection mechanisms that significantly address the computational efficiency issues of existing Transformer architectures, potentially becoming the next generation’s foundational architecture for large models.

Prompt Engineering

In handling imbalanced datasets, prompt learning offers a new paradigm for addressing this issue. By embedding specific prompts in input data, prompt learning helps improve model performance on minority classes. However, how to design effective prompts and determine the robustness of these prompts (effectiveness across different types of large models) has become a discipline—prompt engineering. Further research is needed to combine well-designed prompts with other large model technologies.

Contextual Reasoning

Simultaneously, as model sizes grow, emergent abilities such as contextual reasoning have surfaced, indicating that large models may have internalized cognitive and learning mechanisms closer to human understanding. The nature, triggering conditions, and controllability of these emergent abilities are current research hotspots that require more exploration from cognitive science and neuroscience perspectives to provide reasonable explanations, helping people understand the principles behind these emergent capabilities.

Knowledge Updating

The continuous updating of knowledge is another significant issue faced by large models. As knowledge progresses, the information within models may quickly become outdated. Researchers are exploring methods to enable models to continuously learn and integrate new knowledge while avoiding catastrophic forgetting to keep the model’s knowledge base up-to-date.

Explainability

Despite large models excelling in various NLP and machine learning tasks, as the parameter count increases and network structures deepen, the decision-making processes of models become increasingly difficult to explain. The black-box nature of large models makes it challenging for users to understand how large models process input data and generate output results. This leads to a passive understanding state, where people only know the model’s output results but have no idea why the model made such decisions.

Privacy and Security

The training data for large models may encompass personal identity information, sensitive data, or trade secrets. If these data are not adequately protected, the training process of the model may pose risks of privacy breaches or misuse. Additionally, large models themselves may contain sensitive information, such as memories gained from training on sensitive data, which introduces potential privacy risks.

Data Bias and Misinformation

Large language models may output biased or misleading content, which can stem from various factors such as data collection methods, annotators’ subjective preferences, and socio-cultural influences. When models are trained on biased data, they may incorrectly learn or amplify these biases, leading to unfair or discriminatory outcomes in practical applications.

Addressing these issues is crucial for advancing large model technology and expanding its application range. Solving each challenge could facilitate more effective applications of AI in the real world, bringing profound impacts to human society.

Future of Large Models

As AI technology continues to evolve and the application scenarios for large model technology expand, future trends in large model technology are emerging with new characteristics and development directions.

Balancing Model Scale and Efficiency

Given that large model technology often requires substantial computational resources and storage space, future development trends will focus on maintaining model scale while improving efficiency to meet practical application needs. Currently, sparse expert models are gaining attention as a new modeling architecture method. Compared to traditional dense models, sparse expert models reduce computational demands by activating only the model parameters relevant to the input data, thereby improving computational efficiency. Google’s sparse expert model GlaM, developed in 2023, has seven times more parameters than GPT-3 but reduces energy consumption during training and the computational resources required for inference, outperforming traditional models on various NLP tasks.

Deep Integration of Knowledge

Knowledge integration aims to enrich a model’s representational and decision-making capabilities by consolidating information from different data sources and knowledge domains. Currently, large models primarily train and apply to single-domain or single-modal data, such as the BERT model in NLP and the ViT model in CV. However, in the real world, text, images, audio, and other forms of information are often interrelated, making it challenging for single-modal information to meet complex scenario demands. Therefore, with the continuous development of CV, speech recognition, and other technologies, future large models will increasingly focus on multi-modal integration, processing data from different modalities to achieve fusion and interaction of multi-modal information. This ability to integrate multi-modal information allows large models to better understand and process complex information. Furthermore, consideration can be given to combining large model technology with external knowledge bases to further enhance the model’s understanding capabilities and application breadth. This means that models can leverage not only their internal language patterns and statistical information but also external structured knowledge for reasoning and decision-making, thereby better addressing complex problems in the real world. Importantly, external knowledge can also enhance the generalization capabilities of large models.

Exploration of Embodied Intelligence

Embodied intelligence refers to an intelligent system that perceives and acts based on a physical body, acquiring information, understanding problems, making decisions, and executing actions through interactions with the environment, thereby generating intelligent behavior. The proliferation of large models has significantly accelerated the research and implementation of embodied intelligence. Large language models are becoming key tools to help robots better understand and apply advanced semantic knowledge. By automating task analysis and breaking them down into specific actions, large model technology makes interactions between robots and humans, as well as physical environments, more natural, thereby enhancing the intelligent performance of robots. For instance, different tasks can be achieved through different large models. By utilizing large language models for learning dialogue, visual models for map recognition, and multi-modal models for completing physical actions, robots can learn concepts more efficiently and direct actions, while decomposing all instructions for execution, achieving automated scheduling and collaboration through large model technology. This comprehensive utilization of different models presents new opportunities and challenges for the intelligent development of robots.

Explainability and Trustworthiness

As model scales increase, their internal structures become increasingly complex, making the explainability and trustworthiness of models focal points of concern. First, to enhance model explainability, researchers will strive to develop new methods and technologies that enable large models to clearly explain their decision-making processes and the basis for generated results. This may involve introducing more transparent model structures, such as transparent neural networks or interpretable attention mechanisms, as well as developing explanatory algorithms and tools to help users understand model outputs.

Second, to enhance model trustworthiness, a series of measures will be taken to reduce the likelihood of models generating errors or misleading information. One important direction is to introduce external information sources and equip models with the ability to access and reference these sources. This way, models will be able to access the most accurate and up-to-date information, thereby improving the accuracy and credibility of their output results.

Simultaneously, to increase transparency and trust, models will also provide citations related to external information sources, allowing users to audit these sources and determine the reliability of the information. Notably, while some large models with external information access and citation capabilities have emerged, such as Google’s REALM and Facebook’s RAG, this is merely the beginning of development in this field. More innovations and advancements are expected in the future. For example, new models like OpenAI’s WebGPT and DeepMind’s Sparrow will further propel development in this area, laying a more solid foundation for the future applications of large model technology. The future development of large model technology will place greater emphasis on explainability and trustworthiness, which is not only an inevitable trend in technological development but also a reasonable requirement from society for technological applications. Only by continuously enhancing the explainability and trustworthiness of models can large model technology be better applied across various fields, bringing greater momentum for the development of human society.

Home on GameFly Center: Latest AI and Gaming Tech News

OpenAI Codex Practical Guide: Transform Your Coding Workflow

OpenAI Codex Practical Guide: Transform Your Coding Workflow

1. Installing Codex CLI Version

1. Command Line Installation

2. Configuring Login

3. Configuring CC Switch

4. Adding Providers

Environment Variable Example (Using Qwen Model)

2. Codex Desktop CLI Version

1. Registering an OpenAI Account

2. Special Redemption for Intel Macs: Old Machines Can Run Too

3. Cleaning Up Old Configurations: Remove Old Keys to Access Official Benefits

4. Login Authentication: Browser Authorization for Seamless Connection

3. Using Codex

Operations Scripts

Debugging Bugs

Generating Test Cases

Batch Refactoring

Automatically Generating Documentation

Codex vs. Claude Code: Which One to Choose?

Conclusion

OpenAI's Ambitious Move: Building an AI Agent Phone

ChatGPT’s Success as Path Dependence

Claude Code Redefines Revenue Rules

How to Monetize 900 Million Users

ChatGPT Lives in Others’ Houses

Understanding AI Through the Lens of Tokens

Introduction

Sixfold Reconstruction from a Measurement Unit

Models

Computing Power

Data

Applications

Industry

Governance

China’s Position in the Global Token Wave

OpenClaw: The Ultimate Guide for Beginners in Packet Capture

Introduction to OpenClaw: What It Is and Who It’s For

Suitable Users

Unsuitable Scenarios

Getting Started: Installation and Configuration on Three Major Platforms

1. Windows Installation and Configuration

2. macOS Installation and Configuration

3. Android Installation and Computer Interaction

Common Installation Pitfalls for Beginners

Basic Usage: Daily Packet Capture, Request Viewing, Filtering, and Quick Start

Advanced Customization: Rule Writing, Script Interception, Simulated Returns for Advanced Features

Advanced Usage Reminders

Principle Analysis: How Does OpenClaw Capture Packets? Understanding to Avoid Pitfalls

Common Issues One-Stop Troubleshooting: Solving Frequent Problems for Beginners

Rational Use of Packet Capture Tools: Compliance is the Bottom Line

Conclusion

New National Standards for AI Terminals Released in China

New National Standards for AI Terminals

Diverse Product Forms

Clear Evaluation System

Accelerating Technological Inclusivity

The True Purpose of AI Development: A Reflection

The True Purpose of AI Development

Dynamic Context Discovery: The Next Paradigm for Coding Agents

Dynamic Context Discovery: The Next Paradigm for Coding Agents

1. The Core Dilemma of the Old Paradigm: The Collapse of Static Context

2. Core Mechanism of Dynamic Context Discovery

2.1 File-based Long Tool Responses

2.2 File-based References to Chat History

2.3 Dynamic Loading of MCP Tools

2.4 File Systemization of Terminal Sessions

3. Why This Transition is Effective: An Information Theory Perspective

4. Applicable Boundaries and Limitations

5. Engineering Practice Recommendations

How to Implement Dynamic Context Discovery in Your Agent

6. Comparison with Other Solutions

ChatGPT's Functionality in May 2026: A User Experience Review

ChatGPT’s Functionality in May 2026: A User Experience Review

1. Basic Text Capabilities: Mature and Stable, Covering Everyday Scenarios

2. Multimodal Functionality: Comprehensive and Practical, with Room for Detail Optimization

3. Tool Integration and Memory Capability: Convenient and Efficient, with Personalization to be Deepened

4. Function Layering and Permission Differences: Clear Gradients, Noticeable Limitations for Free Users

Conclusion: Adapting to Mass Needs with Room for Advancement