TLDR AI 2026-05-07

55% of Americans already use AI for finance. Are fintechs ready for mass adoption? (Sponsor)

How much AI are your customers ready for? Well, 86% of AI users say it helps them better understand their finances. Consumers aren't afraid.

Plaid's latest report breaks it down:

55% of people have used AI for money tasks in the last 12 months
50% of people say managing money without AI will soon feel outdated
“Intelligent” is becoming the new “digital” faster than anyone expected

Learn what your customers actually want out of AI and how to meet their expectations with insights from our new report.

Dig into the full findings

🚀

Headlines & Launches

Higher usage limits for Claude and a compute deal with SpaceX (3 minute read)

Anthropic increased usage limits for Claude through a new compute partnership with SpaceX, accessing over 220,000 NVIDIA GPUs. This expansion follows deals with Amazon, Google, Broadcom, Microsoft, NVIDIA, and Fluidstack for significant compute capacity. The company also plans international expansion to address compliance needs for enterprise customers in regulated industries.

Claude adds Self-Improving Agents (5 minute read)

Claude Managed Agents added new features, including dreaming, outcomes, and multiagent orchestration. Dreaming helps agents improve by analyzing past sessions to identify patterns, while outcomes let agents self-correct against predefined success criteria. Multiagent orchestration lets agents handle complex tasks by delegating work to specialized subagents, and is already used by companies like Harvey, Netflix, Spiral by Every, and Wisedocs.

China to Invest in DeepSeek at $50 Billion Valuation (4 minute read)

DeepSeek is in talks to raise money from China's National Artificial Intelligence Industry Investment Fund, a one-year-old government-backed fund with around $8.8 billion in capital. The startup aims to raise a few billion dollars in the new round, which values it at around $50 billion. DeepSeek is a key component in China's plan to have top-class homegrown companies in a range of AI fields. The strategy is a way to hedge against US export controls and to take leadership in bringing AI to the world.

🧠

Deep Dives & Analysis

OpenAI Flips the Script (10 minute read)

OpenAI's Codex has pulled ahead of Anthropic's Claude Code after integrating GPT-5.5 and improving app performance. Austin Tedesco highlights Codex's use in creating strategy documents from diverse sources, while Dan Shipper uses it for recruiting based on career trajectories. Marcus Moretti takes a cautious approach to new AI tech, adopting only tools that solve real problems and have a proven track record with reputable users.

How AI agent memory works (28 minute read)

Large language models forget everything the moment they finish replying. Memory systems help them 'remember' things so they can have conversations. Agent memory systems are the part of the loop that carries information forward. This article looks at different ideas about what information should be passed on in each loop.
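To make the loop idea concrete, here is a minimal sketch in Python. It is not taken from the article: the `Memory` class, the `fake_model` stand-in, and the "keep the last few notes" policy are all hypothetical choices made for illustration. The point is only that the model call is stateless, so anything it should "remember" has to be fed back in on every turn.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Hypothetical memory store: keeps only the most recent `max_items` notes."""
    max_items: int = 3
    notes: list[str] = field(default_factory=list)

    def remember(self, note: str) -> None:
        self.notes.append(note)
        # Drop the oldest notes once the budget is exceeded.
        self.notes = self.notes[-self.max_items:]

    def as_context(self) -> str:
        return "\n".join(self.notes)


def fake_model(prompt: str) -> str:
    """Stand-in for a stateless LLM call: it only sees what is in `prompt`."""
    return f"reply based on {len(prompt.splitlines())} prompt line(s)"


def run_turn(memory: Memory, user_message: str) -> str:
    # The model has no state of its own, so stored notes are prepended each turn.
    separator = "\n" if memory.notes else ""
    prompt = memory.as_context() + separator + f"user: {user_message}"
    reply = fake_model(prompt)
    # Carry this turn forward so the next call can see it.
    memory.remember(f"user: {user_message}")
    memory.remember(f"assistant: {reply}")
    return reply
```

Deciding what `remember` keeps and what `as_context` returns is the design question the article is about. A sliding window is only the simplest possible answer, and summaries, retrieval, or structured stores would slot into the same two methods.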

🧑‍💻

Engineering & Research

Four levers to specialize your AI agents (Sponsor)

General-purpose AI agents fail in specialized domains — subtly wrong in edge cases. Domain specialization fixes this. Build AI agents with four levers: system prompt, knowledge corpus, tool selection, guardrails. Demonstrated across customer engagement, logistics, and voice on AWS. Workshop + guide.

NVIDIA Spectrum-X — the Open, AI-Native Ethernet Fabric — Sets the Standard for Gigascale AI, Now With MRC (3 minute read)

Multipath Reliable Connection (MRC) is an RDMA transport protocol that enables a single RDMA connection to distribute traffic across multiple network paths. This improves throughput, load balancing, and availability for large-scale AI training fabrics. MRC delivers high levels of GPU utilization by load-balancing traffic across all available paths. It gives administrators fine-grained visibility and control over traffic paths to simplify operations and accelerate troubleshooting at scale.
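As a rough illustration of why spreading one connection over several paths helps load balancing, the toy Python sketch below compares a connection pinned to a single path with one whose packets are distributed round-robin. This is not MRC's actual algorithm: the function names, the hash-by-flow-id pinning, and the round-robin policy are assumptions made purely for the example.

```python
from collections import Counter
from itertools import cycle


def single_path(num_packets: int, paths: list[str], flow_id: int) -> Counter:
    """One connection pinned to one path, chosen by hashing the flow id."""
    chosen = paths[flow_id % len(paths)]
    return Counter({chosen: num_packets})


def multipath(num_packets: int, paths: list[str]) -> Counter:
    """One connection whose packets are spread round-robin over all paths."""
    load = Counter()
    for _, path in zip(range(num_packets), cycle(paths)):
        load[path] += 1
    return load


if __name__ == "__main__":
    paths = ["p0", "p1", "p2", "p3"]
    print("pinned:   ", dict(single_path(1000, paths, flow_id=6)))
    print("multipath:", dict(multipath(1000, paths)))
```

In the pinned case a single large flow puts all of its packets on one link, while the multipath case divides the same traffic evenly across every link. A real transport also has to handle out-of-order arrival and path failures, which this sketch ignores.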

TokenSpeed: A Speed-of-Light LLM Inference Engine for Agentic Workloads (5 minute read)

TokenSpeed, a high-performance LLM inference engine, optimizes agentic workloads with speed-of-light efficiency, leveraging a compiler-backed modeling mechanism and a high-performance scheduler. It delivers faster throughput than TensorRT-LLM for coding agents, with optimizations like TokenSpeed MLA to enhance NVIDIA Blackwell's performance. Developed with NVIDIA DevTech and other collaborators, TokenSpeed significantly reduces latency and increases throughput in typical agentic workloads.

vLLM V0 to V1: Correctness Before Corrections in RL (8 minute read)

The vLLM V1 update improved inference correctness by addressing discrepancies in logprob computation, runtime defaults, inflight weight updates, and final projection precision. Key fixes included adjusting processed logprobs, disabling prefix caching, matching weight update models, and ensuring fp32 lm_head computation to align with vLLM V0's behavior. These changes resolved initial training mismatches, ensuring the new engine maintains expected RL performance without unnecessary objective-side corrections.

ProgramBench (5 minute read)

ProgramBench challenges agents to recreate software executables without source code, using only documentation and experimentation. The tasks range from terminal utilities to complex software like compilers and libraries, with over 248,000 behavioral tests across 200 tasks. Agents must design and implement each program entirely from scratch in a secure, sandboxed environment, which puts the emphasis on software architecture skills without external aids or decompilation.

🎁

Miscellaneous

Google is not building a consultancy. It is writing a licensing agreement. That may be the smarter play (9 minute read)

Google is betting that enterprise AI is a platform problem, not a services problem. It is in talks with Blackstone, KKR, and EQT to give their portfolio companies access to Gemini models through omnibus licensing agreements. The discussions are not exclusive, and no deals have been finalized. Google is offering private equity firms a commercial wrapper that gives their entire portfolio access to Gemini, then relying on the consulting ecosystem it has already financed to handle implementation. The approach trades consulting revenue for distribution speed.

AI inference just plays by different rules (9 minute read)

AI inference demands extreme data performance, overwhelming traditional storage and data infrastructures. Vector DBs, sub-millisecond access times, and decoupled cloud storage are essential to handle unprecedented concurrency and unpredictable workloads. Silk offers a solution that boosts storage performance without heavy provisioning, keeping systems resilient against AI-driven demand spikes.

World Models Can Change Everything (20 minute read)

World models aim to advance AI from mere pattern recognition to understanding and interacting with the physical world, but they face challenges like data friction and variation. Billions of dollars in investment from AI pioneers like Yann LeCun are going toward these obstacles, funding models that capture complex physical interactions beyond current LLM capabilities. The hard part remains obtaining the diverse, high-quality, real-world data these models need to function effectively, which is both a significant challenge and an opportunity for AI progress.

⚡

Quick Links

200ms p99 query latency over 100 billion vectors (Sponsor)

turbopuffer wrote about building a 100B-vector search index. The post examines turbopuffer's architecture, travels up the modern memory hierarchy, zooms into a single CPU core, and backs out to the scale of a distributed cluster. Read the blog.

Supercomputer networking to accelerate large scale AI training (14 minute read)

Frontier model training depends on reliable supercomputer networks that can quickly move data between GPUs.

All the demons hiding in your AIs… ranked! (40 minute read)

Sometimes, stable, self-reinforcing behavioral states emerge in large language models that resist suppression and sometimes spread into contexts far removed from the ones that produced them.

The Problem with “Mathematically Proven” Claims About LLMs (15 minute read)

Systems keep getting better, and theorems keep arriving to explain why they cannot. Both can be true because they're usually about different things.

The April every AI plan broke (18 minute read)

The design of subscription plans is being challenged by evolving product capabilities and usage patterns.

Kimi Chatbot Maker Moonshot AI Valued at $20 Billion in Meituan-Led Round (2 minute read)

Moonshot has more than quadrupled its valuation in the span of just a few months.

Introducing Harvey's Legal Agent Benchmark (12 minute read)

Harvey's Legal Agent Benchmark (LAB) is an open-source tool for assessing AI agents' performance in legal tasks.

Google tests screen sharing and custom agents in Antigravity (2 minute read)

Google is testing screen sharing and custom agents in its Antigravity IDE.

Want to advertise in TLDR? 📰

If your company is interested in reaching an audience of AI professionals and decision makers, you may want to advertise with us.

Want to work at TLDR? 💼

Apply here, create your own role or send a friend's resume to [email protected] and get $1k if we hire them! TLDR is one of Inc.'s Best Bootstrapped businesses of 2025.

If you have any comments or feedback, just respond to this email!

Thanks for reading,
Andrew Tan, Ali Aminian, & Jacob Turner

Claude self-improving agents 🤖, Anthropic SpaceX deal 🚀, ProgramBench launch 💻