Fast mode for Claude Opus 4.7 (2 minute read)
Fast mode for Claude Opus 4.7 is now available as a research preview in the API and Claude Code, as well as on Cursor, Emergent, Factory, v0, Warp, and Windsurf. Fast mode is currently opt-in but will eventually become the default. Developers can join a waitlist for fast mode.
|
|
How to achieve truly serverless GPUs (20 minute read)
Inference workloads are more variable and less predictable than training workloads, which makes them a natural fit for serverless computing. However, serverless computing only works if new replicas can be spun up as fast as demand changes. This article explains how Modal cut the time to scale up AI inference servers from thousands of seconds to tens of seconds.
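Why cold-start time decides whether serverless works can be shown with a toy simulation. This is a hypothetical sketch, not Modal's autoscaler: `desired_replicas`, `underprovisioned_seconds`, the per-replica capacity, and the demand trace are illustrative assumptions. It counts the seconds a service runs short of capacity after a load spike, for a cold start of tens of seconds versus one of about 20 minutes.

```python
import math


def desired_replicas(load: int, per_replica_capacity: int) -> int:
    """Target-tracking rule: just enough replicas to serve the current load."""
    return max(1, math.ceil(load / per_replica_capacity))


def underprovisioned_seconds(
    demand: list[int], per_replica_capacity: int, cold_start_s: int
) -> int:
    """Count the seconds in which ready capacity is below demand.

    demand[t] is the number of concurrent requests at second t. A replica
    requested at second t only becomes ready at t + cold_start_s.
    This toy model never scales down.
    """
    ready = desired_replicas(demand[0], per_replica_capacity)
    booting: list[int] = []  # ready-times of replicas still cold-starting
    short = 0
    for t, load in enumerate(demand):
        # Replicas whose cold start has finished join the ready pool.
        ready += sum(1 for r in booting if r <= t)
        booting = [r for r in booting if r > t]
        # Request new replicas to reach the target, counting those still booting.
        missing = desired_replicas(load, per_replica_capacity) - ready - len(booting)
        booting.extend([t + cold_start_s] * max(0, missing))
        # Load beyond ready capacity must queue or fail during this second.
        if ready * per_replica_capacity < load:
            short += 1
    return short


# Hypothetical trace: load jumps 10x after one minute and stays high for ten minutes.
demand = [10] * 60 + [100] * 600
for cold_start in (30, 1_200):  # tens of seconds vs. roughly 20 minutes
    print(cold_start, underprovisioned_seconds(demand, 10, cold_start))
```

With a 30-second cold start the service is short for 30 seconds; with a 1,200-second cold start it is short for the entire 600-second spike, because the new replicas never arrive in time.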
|
Semis Memo: Supply Chain Inheritance (4 minute read)
The AI infrastructure boom has increased demand for analog and power semiconductors, with multilayer ceramic capacitors (MLCCs) a notable beneficiary, following a period of oversupply and heavy competition. Companies like Texas Instruments and NXP Semiconductors are holding off on capacity expansion, focusing instead on raising prices and improving profitability. The supply chain that previously served the EV and solar industries is now being put to work for AI-driven demand.
|
What Parameter Golf taught us (7 minute read)
Parameter Golf attracted over 1,000 participants and 2,000 submissions focused on minimizing loss on a dataset within strict constraints. Participants leveraged a range of techniques, including careful tuning, quantization, and novel modeling ideas, with AI coding agents playing a significant role. This challenge revealed new talent and highlighted the evolving role of AI agents in research competitions.
|
|
Cactus Needle (GitHub Repo)
Cactus Needle is a 26M-parameter Simple Attention Network distilled from Gemini 3.1 that can be fine-tuned locally on a Mac or PC. It runs on Cactus at 6,000 tokens per second for prefill and 1,200 tokens per second for decode. The weights are fully open. Simple Attention Networks aim to redefine AI for consumer devices such as phones, watches, and glasses.
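The two throughput figures translate into a rough latency estimate: prompt tokens divided by the prefill rate, plus output tokens divided by the decode rate. This is a back-of-the-envelope sketch that assumes both quoted rates hold constant and ignores model load time and hardware differences; the request sizes are hypothetical.

```python
# Throughput figures quoted for Cactus Needle running on Cactus.
PREFILL_TOKENS_PER_S = 6_000
DECODE_TOKENS_PER_S = 1_200


def estimated_latency_s(prompt_tokens: int, output_tokens: int) -> float:
    """Rough end-to-end latency, assuming both rates stay constant."""
    prefill_s = prompt_tokens / PREFILL_TOKENS_PER_S
    decode_s = output_tokens / DECODE_TOKENS_PER_S
    return prefill_s + decode_s


print(estimated_latency_s(3_000, 600))  # 0.5 s prefill + 0.5 s decode
```

Because decode is five times slower per token than prefill, output length dominates latency for short prompts.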
|
Reinforcing Recursive Language Models (18 minute read)
The article discusses using reinforcement learning to fine-tune 4B-parameter models as recursive language models (RLMs) for production, achieving efficient task-specific behavior at lower cost. Training a single shared policy for both parent and child RLM calls maintains task performance and removes the need for separate models. In the reported tests, the fine-tuned models match larger models such as Claude Sonnet 4.6 at a fraction of the size and cost.
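The shared-policy idea can be sketched as a recursion in which one policy plays every role: answering over a small chunk (child) and merging child answers (parent). This is a simplified divide-and-conquer illustration under stated assumptions, not the article's training setup or a full RLM, where the model itself decides how to decompose the context. `rlm_call`, `toy_policy`, the fixed halving split, and the `max_chars`/`max_depth` limits are all hypothetical; `toy_policy` is a deterministic stub standing in for a model call.

```python
from typing import Callable

# Hypothetical policy signature: (role, task, context) -> answer.
# In a real RLM both roles would be calls to the same fine-tuned model.
Policy = Callable[[str, str, str], str]


def rlm_call(
    policy: Policy,
    task: str,
    context: str,
    max_chars: int = 1_000,
    depth: int = 0,
    max_depth: int = 4,
) -> str:
    """Simplified divide-and-conquer recursion with one shared policy."""
    # Base case: the chunk is small enough, or the recursion budget is spent.
    if len(context) <= max_chars or depth >= max_depth:
        return policy("leaf", task, context)
    # Parent role: split the context and hand each half to a child call.
    mid = len(context) // 2
    left = rlm_call(policy, task, context[:mid], max_chars, depth + 1, max_depth)
    right = rlm_call(policy, task, context[mid:], max_chars, depth + 1, max_depth)
    # The same policy merges the children's answers.
    return policy("combine", task, f"{left}\n{right}")


def toy_policy(role: str, task: str, context: str) -> str:
    """Deterministic stub so the sketch runs without a model: counts 'x'."""
    if role == "combine":
        return str(sum(int(line) for line in context.splitlines()))
    return str(context.count("x"))


print(rlm_call(toy_policy, "count x", "ax" * 2_000))  # 2000
```

Because parent and child share one policy, a single set of weights is trained and served, which is the cost advantage the article points to.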
|
Compute Optimal Tokenization (2 minute read)
Researchers derived compression-aware neural scaling laws by training nearly 1,300 models, revealing how bytes per token affect compute allocation. The results challenge the common heuristic of training on about 20 tokens per parameter, showing that this ratio is an artifact of the specific tokenizers used. The study suggests measuring training data in bytes rather than tokens for better compute efficiency across diverse languages.
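A short calculation shows why a token-denominated rule depends on the tokenizer. It uses the 20-tokens-per-parameter heuristic and the standard dense-transformer approximation C ≈ 6·N·D; the two bytes-per-token values are hypothetical, and this does not reproduce the study's fitted scaling laws.

```python
def heuristic_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    """The ~20-tokens-per-parameter rule of thumb the study challenges."""
    return n_params * tokens_per_param


def training_flops(n_params: float, n_tokens: float) -> float:
    """Standard dense-transformer approximation: C ~ 6 * N * D."""
    return 6.0 * n_params * n_tokens


def bytes_seen(n_tokens: float, bytes_per_token: float) -> float:
    """Raw text consumed, given a tokenizer's average compression."""
    return n_tokens * bytes_per_token


n = 1e9  # 1B-parameter model
d = heuristic_tokens(n)
print(f"tokens: {d:.2e}, FLOPs: {training_flops(n, d):.2e}")
# Hypothetical tokenizers with different compression rates.
for name, bpt in (("tokenizer A", 4.0), ("tokenizer B", 2.5)):
    print(f"{name}: {bytes_seen(d, bpt):.2e} bytes")
```

Both runs use the same parameters, token count, and FLOPs, yet the model trained with tokenizer A sees 1.6 times as much raw text. A rule stated in bytes does not shift when the tokenizer changes.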
|
|
AI for the Real World: A conversation with Yann LeCun (12 minute read)
Today's LLMs may be commercially valuable, but predicting text alone will not lead to human-level intelligence because language is only a small fraction of how humans understand the world. Future AI systems will instead rely on “world models” that learn abstract representations of physics, causality, and consequences, enabling planning, reasoning, and adaptation in real-world environments like robotics, healthcare, factories, and industrial systems.
|
|
Qwen-Image-2.0 Technical Report (57 minute read)
The Qwen team released Qwen-Image-2.0, their latest multimodal image generation model, showing improved typography, instruction following, photorealism, and long-text rendering across generation and editing tasks.
|
Claude for Legal (GitHub Repo)
This repository contains reference agents, skills, and data sectors for the legal workflows that Anthropic most commonly sees.