Codex will soon be able to control other desktop devices via Computer Use (2 minute read)
OpenAI is working on a capability that lets its coding agent operate macOS applications through Computer Use even when a laptop is locked or asleep. Computer Use currently requires an unlocked, awake session to see the screen, move the cursor, and type. Lifting the restriction will allow users to direct their agents without having to walk back to their machines to log in first. It is unknown when the feature will be released.
|
ChatGPT Personal Finance (6 minute read)
OpenAI released a preview of a new personal finance experience in ChatGPT for Pro users in the US. The feature lets users securely connect financial accounts, view spending dashboards, and ask questions grounded in their financial context and goals.
|
|
AI economics part 2 (11 minute read)
AI labs are in an ongoing war over GPU resources. This article looks into supply and demand and how the infrastructure powering AI today may not be sufficient. Adding GPUs doesn't scale compute linearly, so efficiency matters more at scale, given finite supply.
|
Portability Is a Myth: Why the Best AI Stacks Will Never Be Hardware-Agnostic (15 minute read)
AI kernel portability is structurally impossible because TPU's Pallas, NVIDIA's CuTile and CUTLASS, AWS's NKI, AMD's FlyDSL, and Tenstorrent's tt-Metalium each expose hardware-specific concepts that no universal DSL can unify. The evidence: MaxText's MoE grouped matmul ships as 282 lines of Pallas on TPU while flashinfer's equivalent for Blackwell SM100 takes 4 million lines of generated CUDA, with zero shared code because the algorithms themselves diverge across hardware.
|
|
May 26 workshop: Agent orchestration on AWS (Sponsor)
Multi-agent AI systems fail when agents can't share state, coordinate approvals, or recover from failures. The root cause: no orchestration layer managing execution and approval gates. Build that layer using AWS Step Functions, Amazon Bedrock Agents, and Apache Airflow. See demos of retry logic, human approvals, and graceful failure handling in the May 26 workshop.
|
How Claude Code works in large codebases: Best practices and where to start (5 minute read)
Claude Code is now being used in production across multiple large codebases in organizations with thousands of developers. These environments bring challenges that smaller codebases don't. This article covers patterns that Anthropic has seen that have led to the successful adoption of Claude Code at scale. It looks at how Claude Code has been used in monorepos with millions of lines, legacy systems built over decades, and microservices across separate repositories.
|
Recent Developments in LLM Architectures: KV Sharing, mHC, and Compressed Attention (33 minute read)
KV-cache size, memory traffic, and attention cost quickly become the main constraints as reasoning models and agent workflows keep more tokens around for longer. LLM developers are adding a growing number of architecture tricks to reduce costs. Most of the changes look like small tweaks, but some are quite intricate design changes. This article looks at these architecture changes with a focus on what changes inside the transformer block, residual stream, KV cache, and attention computation.
|
Lighthouse Attention (11 minute read)
Lighthouse Attention, a selection-based hierarchical attention method, offers up to 17x faster forward and backward passes than standard attention at large context lengths. It runs FlashAttention on a dense sub-sequence, maintaining efficiency and compatibility with upstream improvements. By enabling efficient long-context training and retaining dense model competence, Lighthouse Attention achieves a 1.4x to 1.7x speedup in pretraining while reducing computational costs.
|
|
Runway started by helping filmmakers — now it wants to beat Google at AI (11 minute read)
Runway's founders believe that the next form of AI will be built from video and world models that learn how the world works. The company is training models directly on observational data to reach the next frontier of AI. Runway was one of the first to develop AI video generation, but world models are a different race with deep-pocketed competitors. The company has raised $860 million to date, but it is going up against incumbents like OpenAI and Google.
|
The haves and have nots of the AI gold rush (1 minute read)
The AI boom has created a wealth divide, with an estimated 10,000 individuals from companies like OpenAI and Nvidia achieving over $20M in wealth, while others face uncertain futures with stagnant job prospects and layoffs. Software engineers express concerns about their skills becoming obsolete, raising anxiety about career paths. This disparity fuels tension in San Francisco's tech scene as some criticize the dual role of AI as a wealth source and a career threat.
|
|
Headroom (GitHub Repo)
Headroom compresses everything an agent reads before it reaches the LLM to produce the same answers at a fraction of the tokens.