
Measuring Machine Intelligence with Chris Painter

How should we evaluate AI’s capabilities, risks, and autonomy as systems grow more powerful?

Artificial intelligence is advancing rapidly, but our ability to measure what these systems can actually do, and the risks they may pose, has lagged behind. Headline benchmarks and viral demos offer snapshots of a system's performance, but they say little about how AI behaves in complex real-world settings or how much autonomy models can sustain over time. As these systems take on more consequential roles, the challenge is not just building more powerful models but developing credible ways to evaluate their capabilities and limits.

Chris Painter, president of Model Evaluation and Threat Research (METR), joins Oren to discuss how researchers are building new frameworks to assess AI systems and what those efforts reveal about the trajectory of machine intelligence. They explore “time horizon” as a measure of autonomy, the difficulty of evaluating alignment and sabotage risks, and the constraints posed by compute and organizational bottlenecks. They also consider what it will look like when AI systems begin contributing even more to their own development and their capabilities outpace our ability to measure them.
