ARTE LOGICA
The Alignment Problem

BOOK
AI Agents
by Brian Christian

Overview

A deeply reported investigation into the challenge of building AI systems that reliably do what we want, covering reinforcement learning, reward hacking, fairness, and AI safety.

Full Description

Brian Christian's 'The Alignment Problem' is one of the most thorough non-technical accounts of AI safety and the challenge of building agents that behave as intended. The book draws on extensive interviews with AI researchers at OpenAI, DeepMind, Berkeley, and MIT to explain reward hacking (agents optimising the reward they are given rather than the behaviour their designers intended), distributional shift, fairness and bias, interpretability, and the long-horizon challenge of aligning superintelligent systems. A must-read for anyone who wants to understand why getting AI agents to do what we actually want is so hard.
