TLDR AI 2026-05-18

Your agent needs a harness, not a framework. 69% of engineers building in prod agree (Sponsor)

Inngest asked 130 engineers about running AI in production—only 19% were very confident their stack could scale, with gaps in tracing being a key issue. 1 in 5 now spend up to half their time on reliability work just piecing together context.

Read the full benchmark report to see what's working, what's just marketing (respectfully), and what teams your size actually use to ship production-ready apps and agents.

Or...just add the #1 thing the most confident teams use, for free 🤠

🚀

Headlines & Launches

Gemini app rolling out 'Extended' thinking level, new 3rd-party app integrations (3 minute read)

Google is rolling out a new 'Thinking level' option for Gemini. The option has appeared for some users when they select Fast or Gemini 3.1 Pro. Google is also preparing to add more integrations with third-party apps in Gemini. Support for Canva, Instacart, and OpenTable appears to be coming.

Codex will soon be able to control other desktop devices via Computer Use (2 minute read)

OpenAI is working on a capability that lets its coding agent operate macOS applications through Computer Use even when a laptop is locked or asleep. Computer Use currently requires an unlocked, awake session to see the screen, move the cursor, and type. Lifting the restriction will allow users to direct their agents without having to walk back to their machines to log in first. It is unknown when the feature will be released.

ChatGPT Personal Finance (6 minute read)

OpenAI released a preview of a new personal finance experience in ChatGPT for Pro users in the US. The feature lets users securely connect financial accounts, view spending dashboards, and ask questions grounded in their financial context and goals.

🧠

Deep Dives & Analysis

Tokenomics: the 62.5-minute rule for Claude's cache (8 minute read)

If you expect to reuse a cached prompt within 62.5 minutes, keep refreshing the cache; otherwise, let it expire. The threshold is the same across models and doesn't depend on the size of the cache: the dollar amounts change, but the decision point does not.
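Here is a short sketch of where 62.5 comes from. It assumes Anthropic's published prompt-caching multipliers: a 5-minute cache write is billed at 1.25x the base input price, a cache read at 0.1x, and every read resets the 5-minute TTL. It also assumes you keep the cache warm by issuing one read every 5 minutes. Both options scale with the same base price and the same token count, so those factors cancel out of the breakeven, which is why model and cache size drop out. The function names are illustrative, and the sketch ignores the refresh request's own output tokens.

```python
# Sketch: breakeven for keeping a 5-minute Claude prompt cache warm.
# Assumed multipliers of the base input-token price (check current pricing).
CACHE_WRITE_5M = 1.25  # writing a 5-minute cache entry
CACHE_READ = 0.10      # cache hit; a hit also resets the 5-minute TTL
TTL_MINUTES = 5.0


def keep_warm_cost(minutes_until_reuse: float) -> float:
    """Cost (in units of base input price per cached token) of refreshing every
    TTL until the reuse, counting the reuse itself as the final read."""
    reads = minutes_until_reuse / TTL_MINUTES  # continuous approximation
    return reads * CACHE_READ


def rewrite_cost() -> float:
    """Cost of letting the entry expire and writing it again at reuse time."""
    return CACHE_WRITE_5M


def breakeven_minutes() -> float:
    # keep_warm_cost(T) == rewrite_cost()  =>  T = TTL * write / read
    return TTL_MINUTES * CACHE_WRITE_5M / CACHE_READ


def should_refresh(minutes_until_reuse: float) -> bool:
    return keep_warm_cost(minutes_until_reuse) < rewrite_cost()


if __name__ == "__main__":
    print(breakeven_minutes())  # about 62.5, whatever the model or cache size
    print(should_refresh(30), should_refresh(90))
```

Because every term is a multiple of the same base price, swapping in another model's price or a larger prompt rescales both sides equally and leaves the threshold where it is.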

AI economics part 2 (11 minute read)

AI labs are in an ongoing war over GPU resources. The article examines demand and supply and argues that the infrastructure powering AI today may not be sufficient. Adding GPUs doesn't scale compute linearly, and with finite supply, efficiency matters more than raw scale.

Portability Is a Myth: Why the Best AI Stacks Will Never Be Hardware-Agnostic (15 minute read)

AI kernel portability is structurally impossible because TPU's Pallas, NVIDIA's CuTile and CUTLASS, AWS's NKI, AMD's FlyDSL, and Tenstorrent's tt-Metalium each expose hardware-specific concepts that no universal DSL can unify. The evidence: MaxText's MoE grouped matmul ships as 282 lines of Pallas on TPU while flashinfer's equivalent for Blackwell SM100 takes 4 million lines of generated CUDA, with zero shared code because the algorithms themselves diverge across hardware.

🧑‍💻

Engineering & Research

May 26 workshop: Agent orchestration on AWS (Sponsor)

Multi-agent AI systems fail when agents can't share state, coordinate approvals, or recover from failures. The root cause: no orchestration layer managing execution and approval gates.

Build that layer using AWS Step Functions, Amazon Bedrock Agents, and Apache Airflow. See demos of retry logic, human approvals, and graceful failure handling in the May 26 workshop.

How Claude Code works in large codebases: Best practices and where to start (5 minute read)

Claude Code is now being used in production across multiple large codebases in organizations with thousands of developers. These environments bring challenges that smaller codebases don't. This article covers patterns Anthropic has seen behind successful adoption of Claude Code at scale, drawing on its use in monorepos with millions of lines, legacy systems built over decades, and microservices spread across separate repositories.

Notes on pretraining parallelisms and failed training runs (12 minute read)

Pretraining runs often fail. This article looks at all the ways that things can go wrong and why training is such a precarious operation. The key culprits seem to be breaking causality and adding bias.
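This assumes that "breaking causality" means information from later tokens leaking into earlier positions, for example through a mis-sharded attention mask. Under that assumption, one cheap guard is a perturbation test. In a causal model, the output at position t must not depend on any token after t. So if you change a later token and an earlier output moves, something is leaking. The two toy "models" below are hypothetical stand-ins for a real forward pass, and the test is a generic sketch, not the article's procedure.

```python
from typing import Callable, List

Model = Callable[[List[float]], List[float]]


def causal_prefix_mean(xs: List[float]) -> List[float]:
    """Toy causal model: output t averages inputs 0..t only."""
    out, total = [], 0.0
    for t, x in enumerate(xs):
        total += x
        out.append(total / (t + 1))
    return out


def leaky_window_mean(xs: List[float]) -> List[float]:
    """Toy buggy model: output t also peeks at input t+1 (a causality leak)."""
    return [sum(xs[t:t + 2]) / len(xs[t:t + 2]) for t in range(len(xs))]


def is_causal(model: Model, xs: List[float], tol: float = 1e-9) -> bool:
    """Perturb each position j and check that outputs at positions < j don't move."""
    base = model(xs)
    for j in range(len(xs)):
        perturbed = list(xs)
        perturbed[j] += 1.0
        out = model(perturbed)
        if any(abs(out[t] - base[t]) > tol for t in range(j)):
            return False
    return True


if __name__ == "__main__":
    xs = [0.5, -1.0, 2.0, 3.0]
    print(is_causal(causal_prefix_mean, xs))  # True
    print(is_causal(leaky_window_mean, xs))   # False
```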

Recent Developments in LLM Architectures: KV Sharing, mHC, and Compressed Attention (33 minute read)

KV-cache size, memory traffic, and attention cost quickly become the main constraints as reasoning models and agent workflows keep more tokens around for longer. LLM developers are adding a growing number of architecture tricks to reduce costs. Most of the changes look like small tweaks, but some are quite intricate design changes. This article looks at these architecture changes with a focus on what changes inside the transformer block, residual stream, KV cache, and attention computation.
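For a sense of scale, the usual back-of-the-envelope KV-cache size is 2 (keys and values) × layers × KV heads × head dim × tokens × bytes per element. The sketch below uses a hypothetical 32-layer model with 8 KV heads of dimension 128, stored in bf16. It models cross-layer KV sharing simply as fewer layers owning their own cache. This is not any specific model from the article.

```python
def kv_cache_bytes(
    layers_with_kv: int,
    kv_heads: int,
    head_dim: int,
    seq_len: int,
    bytes_per_elem: int = 2,  # bf16/fp16
    batch: int = 1,
) -> int:
    """Bytes needed to cache keys and values for a batch of sequences."""
    # Factor 2: one tensor for keys, one for values.
    return 2 * layers_with_kv * kv_heads * head_dim * seq_len * bytes_per_elem * batch


GIB = 1024 ** 3

if __name__ == "__main__":
    # Hypothetical config: 32 layers, 8 KV heads, head_dim 128, 128k tokens.
    full = kv_cache_bytes(32, 8, 128, 128_000)
    # Hypothetical cross-layer sharing: each pair of adjacent layers reuses one KV cache.
    shared = kv_cache_bytes(16, 8, 128, 128_000)
    print(f"{full / GIB:.4f} GiB vs {shared / GIB:.4f} GiB")  # 15.6250 GiB vs 7.8125 GiB
```

The cache grows linearly in every factor. That is why the tricks covered in the article each target one factor: sharing KV across layers cuts layers_with_kv, fewer or compressed KV heads cut kv_heads × head_dim, and lower-precision caches cut bytes_per_elem.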

Lighthouse Attention (11 minute read)

Lighthouse Attention, a selection-based hierarchical attention mechanism, runs forward and backward passes up to 17x faster than standard attention at long context lengths. It runs FlashAttention on a dense sub-sequence, so it stays efficient and remains compatible with upstream FlashAttention improvements. Because it enables efficient long-context training while retaining the capabilities of a dense model, Lighthouse Attention achieves a 1.4x to 1.7x speedup in pretraining and reduces compute costs.
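The summary doesn't spell out how Lighthouse chooses its sub-sequence. The sketch below is therefore a generic selection-based attention for a single query with no causal mask, not the Lighthouse algorithm. It assumes the keys are split into fixed-size blocks and that each block is scored by the query's dot product with the block's mean key. It keeps the top_k blocks and runs ordinary dense softmax attention over the gathered tokens. The block size, the scoring rule, and the function names are all illustrative assumptions.

```python
import math
from typing import List

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax_attention(q: Vec, keys: List[Vec], values: List[Vec]) -> Vec:
    """Standard scaled dot-product attention for a single query."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / z for d in range(dim)]


def block_selected_attention(
    q: Vec, keys: List[Vec], values: List[Vec], block: int, top_k: int
) -> Vec:
    """Generic selection-based attention (hypothetical, not Lighthouse's rule):
    score each block by q . mean(block keys), keep the top_k blocks in original
    order, then run dense attention over that gathered sub-sequence."""
    starts = list(range(0, len(keys), block))

    def block_score(s: int) -> float:
        ks = keys[s:s + block]
        mean_k = [sum(col) / len(ks) for col in zip(*ks)]
        return dot(q, mean_k)

    chosen = sorted(sorted(starts, key=block_score, reverse=True)[:top_k])
    idx = [i for s in chosen for i in range(s, min(s + block, len(keys)))]
    return softmax_attention(q, [keys[i] for i in idx], [values[i] for i in idx])


if __name__ == "__main__":
    keys = [[math.sin(i), math.cos(i)] for i in range(16)]
    values = [[float(i)] for i in range(16)]
    q = [1.0, 0.0]
    # Selecting every block reproduces full dense attention.
    print(block_selected_attention(q, keys, values, block=4, top_k=4))
    print(softmax_attention(q, keys, values))
    # Keeping 2 of 4 blocks attends over only 8 of the 16 tokens.
    print(block_selected_attention(q, keys, values, block=4, top_k=2))
```

Selecting every block reproduces dense attention exactly. Shrinking top_k means attending over fewer tokens, which is where selection-based schemes get their speedup. The dense step on the gathered tokens is also the part a real implementation could hand to an optimized kernel such as FlashAttention.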

🎁

Miscellaneous

Runway started by helping filmmakers — now it wants to beat Google at AI (11 minute read)

Runway's founders believe that the next form of AI will be built from video and world models that learn how the world works. The company is training models directly on observational data to reach the next frontier of AI. Runway was one of the first to develop AI video generation, but world models are a different race with deep-pocketed competitors. The company has raised $860 million to date, but it is going against incumbents like OpenAI and Google.

The haves and have nots of the AI gold rush (1 minute read)

The AI boom has created a wealth divide, with an estimated 10,000 individuals from companies like OpenAI and Nvidia achieving over $20M in wealth, while others face uncertain futures with stagnant job prospects and layoffs. Software engineers express concerns about their skills becoming obsolete, raising anxiety about career paths. This disparity fuels tension in San Francisco's tech scene as some criticize the dual role of AI as a wealth source and a career threat.

⚡

Quick Links

TLDR is hiring a Senior Software Engineer, Applied AI ($250k-$350k, Fully Remote)

TLDR's Applied AI team is tasked with making every process at TLDR legible to code, runnable by anyone, and composable into larger workflows. Join a small, fast moving team using the latest AI tools with an unlimited token budget. Learn more.

Apple Silicon costs more than OpenRouter (3 minute read)

OpenRouter costs about one third as much as running comparable models locally on Apple Silicon, at roughly twice the speed.

Headroom (GitHub Repo)

Headroom compresses everything an agent reads before it reaches the LLM to produce the same answers at a fraction of the tokens.

DeepSeek-V4-Flash means LLM steering is interesting again (9 minute read)

Steering is the idea that LLM outputs can be guided by directly manipulating the activations of a model mid-flight.
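Below is a minimal sketch of one common steering recipe, difference-of-means activation addition, run on made-up toy vectors. The steering vector is the mean hidden state on prompts that show a behavior minus the mean on prompts that don't. During generation, that vector is scaled by alpha and added to the hidden state at one layer. The activations and alpha are hypothetical. In a real model, the activations would come from a chosen layer's residual stream. This is not necessarily the method the article applies to DeepSeek-V4-Flash.

```python
from typing import List

Vec = List[float]


def mean(vectors: List[Vec]) -> Vec:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def steering_vector(pos_acts: List[Vec], neg_acts: List[Vec]) -> Vec:
    """Difference of mean activations between two contrasting prompt sets."""
    mp, mn = mean(pos_acts), mean(neg_acts)
    return [p - n for p, n in zip(mp, mn)]


def steer(hidden: Vec, v: Vec, alpha: float) -> Vec:
    """Add the scaled steering direction to a hidden state mid-forward-pass."""
    return [h + alpha * d for h, d in zip(hidden, v)]


if __name__ == "__main__":
    # Hypothetical activations at one layer for "positive" vs "negative" prompts.
    pos = [[1.0, 0.2, 0.0], [0.8, 0.4, 0.1]]
    neg = [[0.0, 0.3, 0.0], [0.2, 0.1, 0.1]]
    v = steering_vector(pos, neg)
    print(v)
    print(steer([0.1, 0.1, 0.1], v, alpha=2.0))
```

Alpha controls strength. Values too large tend to push activations off-distribution, so in practice it is tuned per layer and per behavior.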

OpenAI Quietly Bought Voice-Cloning Startup Weights.gg, Then Folded the Team (3 minute read)

OpenAI acquired the six-person team and its intellectual property, then shut down Weights.gg and dispersed its team across multiple OpenAI groups.

Want to advertise in TLDR? 📰

If your company is interested in reaching an audience of AI professionals and decision makers, you may want to advertise with us.

Want to work at TLDR? 💼

Apply here, create your own role or send a friend's resume to [email protected] and get $1k if we hire them! TLDR is one of Inc.'s Best Bootstrapped businesses of 2025.

If you have any comments or feedback, just respond to this email!

Thanks for reading,
Andrew Tan, Ali Aminian, & Jacob Turner

Manage your subscriptions to our other newsletters on tech, startups, and programming. Or if TLDR AI isn't for you, please unsubscribe.