The Alignment Problem
BOOK
AI Agents
by Brian Christian
Overview
A deeply reported investigation into the challenge of building AI systems that reliably do what we want, covering reinforcement learning, reward hacking, fairness, and AI safety.
Full Description
Brian Christian's 'The Alignment Problem' is among the most thorough non-technical accounts of AI safety and the challenge of building agents that behave as intended. Drawing on extensive interviews with AI researchers at OpenAI, DeepMind, Berkeley, and MIT, the book explains reward hacking (agents optimising for the wrong thing), distributional shift, fairness and bias, interpretability, and the long-horizon challenge of aligning superintelligent systems. It is a must-read for anyone who wants to understand why getting AI agents to do what we actually want is so hard.