When Four Investing Legends Become Your AI Research Team

Staff Writer

A Claude Code skill set turns the methods of Buffett, Munger, Duan Yongping, and Li Lu into dueling AI agents that force buy-or-sell convictions instead of balanced equivocation.

xbtlin/ai-berkshire

★8.2k stars · 7-day velocity: +974 ★/day



The real Berkshire Hathaway is having an identity crisis. Greg Abel, who took over as CEO in January, has steered the conglomerate into Alphabet with a $10 billion private placement, raising the stake by 224 percent in his first quarter and making Google the fourth-largest holding behind Apple, American Express, and Coca-Cola. Warren Buffett spent six decades avoiding Big Tech on the grounds that he did not understand it well enough to bet on it; Abel clearly disagrees. The move signals that even the most conservative capital on the planet now views AI infrastructure as a necessary utility rather than a speculative frontier. It is against this backdrop that a GitHub repository called AI Berkshire has captured attention. The name is opportunistic, but the timing is apt. The project is not affiliated with Buffett or Berkshire Hathaway; it is a collection of Claude Code skills designed to replicate the decision-making discipline of value investing using large language models.

The problem with generic AI investment analysis is well understood by anyone who has tried it. Ask a model whether Pinduoduo is undervalued and you receive a syntactically perfect essay that balances bull and bear cases, cites a few metrics, and concludes with a disclaimer that investing carries risk and you should consult a professional. The README quotes this exact failure mode: the model says Pinduoduo has growth potential but also faces competitive pressure, leaving the user exactly where they started. The output is epistemically useless because it is designed to avoid error rather than make a call. AI Berkshire treats this hedging equilibrium as a process failure. Its framework imposes a decision-forcing architecture: every analysis must return a pass, fail, or gray-zone verdict, complete with price intervals, position sizes, and a mirror test that requires the user to state the thesis in five sentences or fewer. If the argument cannot survive that compression, the framework mandates a veto.
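To make the decision-forcing contract concrete, here is a minimal Python sketch of what such an output schema might look like. The names (`Verdict`, `Analysis`, `mirror_test`), the field layout, and the choice to turn a failed compression test into a zero-size FAIL are illustrative assumptions, not code taken from the repository.

```python
from dataclasses import dataclass, replace
from decimal import Decimal
from enum import Enum


class Verdict(Enum):
    PASS = "pass"
    FAIL = "fail"
    GRAY_ZONE = "gray_zone"


@dataclass(frozen=True)
class Analysis:
    """Hypothetical output schema: every field is mandatory, so there is no room to hedge."""
    ticker: str
    verdict: Verdict
    buy_below: Decimal        # top of the accumulation price interval
    sell_above: Decimal       # price at which the thesis is considered fully priced in
    position_pct: Decimal     # suggested position size, percent of portfolio
    thesis: tuple[str, ...]   # mirror-test statement, one sentence per entry


MAX_THESIS_SENTENCES = 5


def mirror_test(analysis: Analysis) -> Analysis:
    """Veto any call whose thesis cannot be stated in five sentences or fewer."""
    if analysis.buy_below >= analysis.sell_above:
        raise ValueError("price interval is inverted: buy_below must be below sell_above")
    sentences = [s for s in analysis.thesis if s.strip()]
    if not sentences or len(sentences) > MAX_THESIS_SENTENCES:
        # Failing compression overrides the proposed verdict and zeroes the position.
        return replace(analysis, verdict=Verdict.FAIL, position_pct=Decimal("0"))
    return analysis
```

Making every field required is the whole trick: a model that wants to say "it depends" has nowhere to put that answer except an explicit gray-zone verdict.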

What distinguishes the repository from the growing ecosystem of AI investing tools is its multi-agent design. The flagship skill spawns four parallel AI agents, each adopting the methodology of a different master: Duan Yongping for business-model essence, Warren Buffett for moat and valuation, Charlie Munger for inversion and risk, and Li Lu for long-term civilization trends. They do not collaborate. They contradict. In a sample analysis of Pinduoduo, Buffett’s agent flags a price-to-earnings ratio of 6.3x and calls it a money-printing machine, while Li Lu’s agent assigns a low score because management culture fails the ten-year certainty test. Duan Yongping’s agent likes the consumer-to-manufacturer model, while Munger’s agent warns that the moat is shallower than it looks because a rival reached four trillion in gross merchandise volume (GMV) within three years. The user acts as team lead, adjudicating the conflict.

This is the repository’s genuine technical insight. Most multi-agent frameworks seek consensus or average out scores. AI Berkshire seeks structured dissent. It replicates the dialectic of a real investment committee, where analysts talk past each other and the portfolio manager must decide which argument carries the weight. The parallelism also quadruples search coverage: four independent context windows, four web searches, and four verification paths run simultaneously. It is not a single prompt chopped into sections; it is four full research sprints merged by a human judge. The architecture is explicitly designed to prevent the single-context blindness that occurs when one user chats alone with a model.
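The fan-out-and-disagree pattern can be sketched with Python's `asyncio`. This is a hypothetical illustration, not the repository's implementation: `run_agent` is a stub standing in for a model call with its own context window and web searches, and the canned scores, the 1-to-10 scale, and the `dissent_gap` threshold are invented for the example.

```python
import asyncio
from dataclasses import dataclass
from statistics import mean


@dataclass
class Opinion:
    master: str
    score: int       # hypothetical 1-10 conviction score
    rationale: str


LENSES = {
    "Duan Yongping": "business-model essence",
    "Buffett": "moat and valuation",
    "Munger": "inversion and risk",
    "Li Lu": "long-term civilization trends",
}


async def run_agent(master: str, lens: str, ticker: str) -> Opinion:
    """Stub for one independent research sprint with its own context window.

    A real version would prompt a model with the master's persona and run its
    own searches; canned scores keep this sketch runnable offline.
    """
    await asyncio.sleep(0)  # stand-in for network-bound research
    canned = {"Duan Yongping": 8, "Buffett": 9, "Munger": 4, "Li Lu": 3}
    return Opinion(master, canned[master], f"{lens} view on {ticker}")


async def committee(ticker: str, dissent_gap: int = 4) -> dict:
    # Fan out: all four agents run concurrently and none sees the others' output.
    opinions = await asyncio.gather(
        *(run_agent(m, lens, ticker) for m, lens in LENSES.items())
    )
    scores = [o.score for o in opinions]
    spread = max(scores) - min(scores)
    # Do not collapse to one number: hand the disagreement to the human lead.
    return {
        "opinions": opinions,
        "naive_average": mean(scores),
        "spread": spread,
        "needs_adjudication": spread >= dissent_gap,
    }


if __name__ == "__main__":
    report = asyncio.run(committee("EXAMPLE"))
    for o in report["opinions"]:
        print(o.master, o.score)
    print("spread:", report["spread"], "adjudicate:", report["needs_adjudication"])
```

`naive_average` is computed only to show what consensus-seeking would throw away: with these canned scores it comes out at 6, which reads like a lukewarm hold, while the six-point spread tells the team lead that two camps are talking past each other.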

The repository organizes its capabilities into sixteen skills, ranging from deep research and earnings analysis to portfolio management and news pulse tracking. These sit atop a three-layer architecture: the skill layer provides explicit entry points for tasks, the agent layer runs the four parallel personas, and the tool layer handles verification and calculation. This separation matters because it prevents the research process from collapsing into a single chat thread where context drifts and standards erode. Specialized skills include an industry funnel that narrows a universe of stocks to three finalists using explicit elimination criteria, and a news-pulse skill designed to attribute price swings to fundamentals or sentiment within ten minutes rather than producing exhaustive but slow reports.
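An elimination funnel of this kind reduces to an ordered list of pass/fail gates followed by a ranking. The sketch below assumes hypothetical metrics and thresholds (return on equity, net debt to EBITDA, gross margin, and a moat score from an earlier research step); the repository's actual elimination criteria are not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Candidate:
    name: str
    roe_pct: float             # return on equity, percent
    net_debt_to_ebitda: float  # leverage
    gross_margin_pct: float    # gross margin, percent
    moat_score: int            # 1-10, assumed to come from an earlier research step


# Hypothetical gates, applied in order; each predicate returns True to keep a name.
GATES: list[tuple[str, Callable[[Candidate], bool]]] = [
    ("ROE >= 15%", lambda c: c.roe_pct >= 15),
    ("net debt/EBITDA <= 2", lambda c: c.net_debt_to_ebitda <= 2),
    ("gross margin >= 30%", lambda c: c.gross_margin_pct >= 30),
]


def funnel(universe: list[Candidate], finalists: int = 3):
    """Apply each gate in turn, record who was eliminated where, then rank survivors."""
    survivors = list(universe)
    log: list[tuple[str, list[str]]] = []
    for label, keep in GATES:
        log.append((label, [c.name for c in survivors if not keep(c)]))
        survivors = [c for c in survivors if keep(c)]
    survivors.sort(key=lambda c: c.moat_score, reverse=True)
    return survivors[:finalists], log
```

Keeping the log of which name fell at which gate is the point of the design: the narrowing stays auditable instead of disappearing into a chat transcript.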

The boring infrastructure is where the value hides. The framework layers a series of anti-bias mechanisms onto the AI to combat its most dangerous failure mode: answers that look correct because they are syntactically polished and numerically close enough. An information richness rating forces the model to admit when it is extrapolating from thin data, assigning grades like A for dense disclosures and C for speculative reconstructions. A Munger-style inversion module mandates explicit failure scenarios, requiring the model to answer what would kill the company. A veto checklist contains eight red lines, including management integrity stains, that trigger automatic rejection regardless of valuation. A contrarian check requires the model to articulate why smart people might be short the stock. And a leave-blank principle forbids the model from disguising uncertainty as analysis; insufficient data must be labeled a gray zone.
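The ordering of these guards matters more than any single one. A hypothetical gate function might apply the veto first, then the inversion and contrarian requirements, then the leave-blank rule, so that valuation never gets a vote once a red line is crossed. Only the A and C richness grades and the management-integrity red line come from the article; the B grade, the other red-line names, and the function signature are placeholders.

```python
from enum import Enum


class Richness(Enum):
    A = "dense disclosures"
    B = "partial data"                 # assumed middle grade
    C = "speculative reconstruction"


# Placeholder red lines. The article says the real checklist has eight and names
# only management integrity stains; the other entries here are illustrative.
RED_LINES = frozenset({
    "management_integrity_stain",
    "auditor_resignation",
    "going_concern_warning",
})


def apply_guards(proposed: str, flags: set[str], richness: Richness,
                 failure_scenarios: list[str], short_case: str) -> tuple[str, str]:
    """Return (verdict, reason). Guards run in a fixed order; the first one that fires wins."""
    tripped = sorted(flags & RED_LINES)
    if tripped:
        # Veto: valuation is never consulted once a red line is crossed.
        return "fail", "red line: " + ", ".join(tripped)
    if not failure_scenarios:
        # Inversion: no answer to "what would kill the company?" means no call.
        return "gray_zone", "no failure scenarios stated"
    if not short_case.strip():
        # Contrarian check: the model must say why smart people might be short.
        return "gray_zone", "no short thesis articulated"
    if richness is Richness.C:
        # Leave blank: thin data is labeled as such, not dressed up as analysis.
        return "gray_zone", "information richness grade C"
    return proposed, "all guards passed"
```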

Financial rigor is handled by a separate Python toolset that replaces the model’s arithmetic. All calculations use exact decimal arithmetic rather than floating-point approximations, because in finance 0.1 plus 0.2 must equal 0.3, not 0.30000000000000004. The tools cross-verify market capitalizations by multiplying share price against total shares outstanding, compare reported figures against independent sources, and flag variances above a one percent tolerance. One module even applies Benford’s law to detect anomalous distributions in financial data. The framework treats the language model as a bright but sloppy intern who is required to show his work and use a calculator.
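Each of the three checks described above fits in a few lines of standard-library Python. This is a sketch under assumptions: the function names are hypothetical, inputs are passed as strings so that `Decimal` never inherits float rounding, and the Benford test reports a mean absolute deviation from the expected first-digit frequencies P(d) = log10(1 + 1/d) without claiming any particular cutoff the repository uses.

```python
import math
from collections import Counter
from decimal import Decimal

# Binary floats cannot represent 0.1 exactly; Decimals built from strings can.
assert 0.1 + 0.2 != 0.3
assert Decimal("0.1") + Decimal("0.2") == Decimal("0.3")

TOLERANCE = Decimal("0.01")  # the one percent variance threshold


def cross_check_market_cap(price: str, shares_outstanding: str,
                           reported_cap: str) -> tuple[Decimal, bool]:
    """Recompute market cap as price x shares; return (relative variance, flagged)."""
    computed = Decimal(price) * Decimal(shares_outstanding)
    reported = Decimal(reported_cap)
    variance = abs(computed - reported) / reported  # assumes reported_cap is nonzero
    return variance, variance > TOLERANCE


# Benford's law: expected share of numbers whose leading digit is d.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(value) -> int:
    """First nonzero digit of an int, float, or Decimal (sign and leading zeros skipped)."""
    return int(next(ch for ch in str(value) if ch in "123456789"))


def benford_mad(values) -> float:
    """Mean absolute deviation of observed first-digit shares from Benford's law."""
    digits = [leading_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    n = len(digits)
    return sum(abs(counts[d] / n - BENFORD[d]) for d in range(1, 10)) / 9
```

For example, a price of 10.00 on 1,000,000 shares against a reported cap of 10,200,000 gives a variance of about 1.96 percent and is flagged, while a reported cap of 10,050,000 (about 0.5 percent off) passes. A higher Benford deviation does not prove manipulation; it only marks a data set worth a closer look.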

In the landscape of AI investing tools, AI Berkshire occupies a specific niche. A comparable project, claude-equity-research, generates institutional-grade reports with Goldman Sachs-style formatting, technical analysis, options flow, and ESG scores. That tool is quant-driven and plugin-native. Elsewhere, the Substack ecosystem around Claude Cowork focuses on automating Excel models and folder structures for equity research workflows, while a YouTube session from Fundamental Edge promotes an AI exoskeleton that speeds up analysis without cutting corners. A Facebook post shows a German investor using Claude to build divergence dashboards. AI Berkshire is less concerned with automation or technical indicators and more with epistemic architecture: the design of a thinking process that prevents the user from fooling themselves.

An arXiv paper on an AI value-investing strategy for the Brazilian stock market warns of look-ahead bias and overfitting in backtested AI strategies, noting that many developers report strong simulated performance that deteriorates in live trading. AI Berkshire sidesteps this critique by positioning itself not as an autonomous trading agent but as a research workflow. It does not claim alpha from pattern recognition; it claims alpha from better human decision-making under uncertainty. That is a more defensible, though harder to prove, proposition.

The README advertises a 2024 return of plus sixty-nine percent and 2025 year-to-date returns of plus sixty-six percent, with screenshots attributed to a Futu Securities account. The standard disclaimers are present, but the numbers are extraordinary, outperforming the S&P 500 by roughly forty-six to fifty percentage points. Without audited statements, readers must treat this as anecdotal validation rather than evidence. Still, it signals that the author is eating their own cooking.

Tensions remain. The framework is tightly coupled to Claude Code, an Anthropic product with subscription tiers and context limits that sit outside the user’s control. The roadmap acknowledges that real-time data plumbing is still missing, with future plans to integrate financial data providers through the Model Context Protocol. The published reports focus heavily on Chinese and American tech equities, from Moutai and Tencent to Meituan and Pinduoduo. Whether the framework generalizes to European industrials, Japanese trading houses, or commodity producers is untested. And the four-masters approach, while philosophically rich, may be overfit to a specific era of platform economics and consumer internet moats.

What AI Berkshire ultimately sells is not stock tips but decision hygiene. In a market flooded with AI tools that generate beautiful charts and equivocal prose, it is notable for building a system that forces conviction, documents doubt, and treats disagreement as a feature rather than a bug. Whether the project endures will depend on its ability to backtest its own recommendations against historical data (a roadmap item that remains unchecked) and to decouple from Claude Code’s proprietary command structure. For now, it stands as a rare example of AI tooling built around the psychology of decision-making rather than the automation of answers.


Sources