Introducing Grok Build (3 minute read)
Grok Build, a new coding agent and CLI, has launched in beta for SuperGrok and X Premium Plus subscribers. It supports complex coding projects by letting users review work in plan mode, and it adapts to users' existing project conventions. A headless mode and specialized subagents let users run Grok for automation and parallel tasks.
|
Notes on Pope Leo XIV's encyclical on AI (12 minute read)
Pope Leo XIV recently released an encyclical on the ethics of integrating AI into modern society. It covers the technology's environmental impact, the risks of algorithmic systems that make decisions affecting people's lives, and how AI amplifies the power of those who already have resources, among other topics. The article links to the full document, which is written in an approachable style, even for non-Catholics.
|
On AI Hardware (7 minute read)
The AI hardware market is increasingly a stack of memory problems. Hardware changes slowly, while software and model architectures can shift quickly, so hardware companies will need to build architectures that remain useful as the bottleneck moves.
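One common way to see why memory dominates is a roofline estimate: a workload is memory-bound when it performs too few FLOPs per byte moved to keep the compute units busy. The sketch below is a rough model, not taken from the article. It assumes a hypothetical accelerator (1 PFLOP/s peak, 3 TB/s memory bandwidth), bf16 weights (2 bytes each), and a dense matrix multiply during batched decoding where weight reads dominate memory traffic. The names `Accelerator`, `decode_intensity`, and `regime` are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accelerator:
    """Hypothetical chip described only by peak compute and memory bandwidth."""
    peak_flops: float     # FLOP/s
    mem_bandwidth: float  # bytes/s

    @property
    def ridge_point(self) -> float:
        """Arithmetic intensity (FLOP/byte) at which compute becomes the limit."""
        return self.peak_flops / self.mem_bandwidth

    def attainable_flops(self, intensity: float) -> float:
        """Roofline: throughput is capped by compute or by bandwidth * intensity."""
        return min(self.peak_flops, intensity * self.mem_bandwidth)


def decode_intensity(batch_size: int, bytes_per_param: float = 2.0) -> float:
    """Approximate FLOP/byte for a dense matmul in one decoding step.

    Each weight is read once per step and used for one multiply-add
    (2 FLOPs) per sequence in the batch. KV-cache and activation traffic
    are ignored, which makes this an optimistic estimate.
    """
    return 2.0 * batch_size / bytes_per_param


def regime(acc: Accelerator, intensity: float) -> str:
    """Classify a workload as memory-bound or compute-bound on this chip."""
    return "compute-bound" if intensity >= acc.ridge_point else "memory-bound"


if __name__ == "__main__":
    chip = Accelerator(peak_flops=1e15, mem_bandwidth=3e12)  # hypothetical numbers
    for batch in (1, 64, 512):
        intensity = decode_intensity(batch)
        utilization = chip.attainable_flops(intensity) / chip.peak_flops
        print(f"batch={batch:4d} intensity={intensity:6.1f} FLOP/B "
              f"{regime(chip, intensity)}, {utilization:.1%} of peak")
```

Under these assumptions, a batch-size-1 decode step reaches only about 0.3% of peak compute, and the workload crosses the ridge point only once the batch exceeds roughly 333 sequences. Changing batch size, precision, or the chip's bandwidth moves that crossover, which is one concrete form of the shifting bottleneck described above.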
|
Gemini 3.5 Flash Looks Good For How Fast It Is (8 minute read)
Zvi judges Gemini 3.5 Flash the best model at its speed point but unconvincing against Opus 4.7 or GPT-5.5 outside latency-sensitive workloads. Google positions it as a daily driver for agentic workflows; it outscores Gemini 3.1 Pro on Terminal-Bench and MCP Atlas while running 4x faster.
|
On-Policy Distillation (5 minute read)
On-policy distillation trains a student model on trajectories sampled from its own policy, while a teacher provides dense, token-level supervision through KL-based regularization. This closes the train-inference distribution mismatch that off-policy methods suffer from. The canonical formulation unifies forward-KL, reverse-KL, and JSD losses, with reverse KL, which is mode-seeking, emerging as the default for smaller students. The technique can be implemented on top of an RL stack such as Tinker with a one-line change: swapping the KL regularizer's reference model for the teacher.
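The core quantities can be sketched in plain Python. This is a minimal illustration, not Tinker's API or the article's code: it assumes the student has already sampled a sequence and that per-position logits from both models are available as lists (in practice they would come from forward passes over the student's own rollout). All helper names are hypothetical. `reverse_kl`, `forward_kl`, and `jsd` compute the three per-position losses over the full vocabulary; `per_token_rewards` computes the cheaper single-sample reverse-KL estimate on the token the student actually chose, the kind of dense per-token signal an RL loop could use in place of a sparse sequence-level reward.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary distribution."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def kl(log_p: Sequence[float], log_q: Sequence[float]) -> float:
    """KL(p || q) for two distributions given as log-probabilities."""
    return sum(math.exp(a) * (a - b) for a, b in zip(log_p, log_q))


def reverse_kl(student_logits, teacher_logits) -> float:
    """Mode-seeking KL(student || teacher) at one token position."""
    return kl(log_softmax(student_logits), log_softmax(teacher_logits))


def forward_kl(student_logits, teacher_logits) -> float:
    """Mass-covering KL(teacher || student) at one token position."""
    return kl(log_softmax(teacher_logits), log_softmax(student_logits))


def jsd(student_logits, teacher_logits) -> float:
    """Jensen-Shannon divergence using an equal-weight mixture."""
    ls, lt = log_softmax(student_logits), log_softmax(teacher_logits)
    mix = [math.log(0.5 * math.exp(a) + 0.5 * math.exp(b)) for a, b in zip(ls, lt)]
    return 0.5 * kl(ls, mix) + 0.5 * kl(lt, mix)


def per_token_rewards(sampled_ids, student_logits_seq, teacher_logits_seq) -> List[float]:
    """Dense reward for each token the student sampled itself.

    Uses the single-sample reverse-KL estimate
    log p_teacher(x_t) - log p_student(x_t): positive when the teacher
    rates the student's token as more likely than the student did.
    """
    rewards = []
    for tok, s_logits, t_logits in zip(sampled_ids, student_logits_seq, teacher_logits_seq):
        ls, lt = log_softmax(s_logits), log_softmax(t_logits)
        rewards.append(lt[tok] - ls[tok])
    return rewards


if __name__ == "__main__":
    student = [[2.0, 0.5, -1.0], [0.1, 0.1, 3.0]]
    teacher = [[2.0, 0.5, -1.0], [3.0, 0.1, 0.1]]
    # Position 0: identical distributions, so the reward is 0.0.
    # Position 1: the student picked token 2, which the teacher finds unlikely,
    # so the reward is negative.
    print(per_token_rewards([0, 2], student, teacher))
```

Reverse KL is weighted by the student's own probabilities, so it mainly penalizes the student for putting mass where the teacher puts little; that is why it is described as mode-seeking, and why a small student is not pushed to spread probability over everything the teacher considers plausible.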
|
Introducing BenchBench (5 minute read)
BenchBench is a benchmark that tests how well models can create benchmarks. It measures model ability and also probes models' self-awareness, and it rewards creativity rather than just problem-solving. In testing, GPT-5.2 was the only winner; every other model, from Opus 4.6 to GPT-5.5, struggled to create a genuinely useful benchmark that other models found hard to solve.